My Two Cents

ID 671701
Updated 7/2/2020

By Steve Hughes, Intel Software Application Engineer

I’ve been threading code in games for years. First as a game programmer back in the nineties and noughties, and then in a more focused way since joining Intel about 12 years ago. In my time with Intel, I’ve had the honor of working with some of the big hitters in the games industry, helping to thread up great titles while hopefully bringing some extra gaming joy to the public.  

In the early days, when we were making chips with two to eight cores, threading up code was an easy performance win. You could take a bunch of AI, split it into tasks, throw it into either your own threading engine or something like Intel® Threading Building Blocks (other threading tools are available), and bingo, it would run faster. Of course, you never got the maximum possible scaling, because you needed synchronization between threads. After all, those pesky AI critters need to know what the others are doing.

The Synchronization Problem

I think this issue will become more of a problem as CPUs get ever larger core counts and games get ever more complex in order to take advantage of them. Here’s why: As the number of tasks running in parallel rises linearly, the amount of synchronization required between them rises much faster, roughly quadratically in the worst case. Put very simply, if you have N entities in a game that interact, and you thread them, then in the worst case you need N*N synchronization events (who does that?), and even in the best case you will never get down to N or fewer events. To clarify, I’m not talking about those embarrassingly parallel tasks that need no synchronization and benefit directly from high core counts. In games, almost everything that you thread needs some degree of synchronization.
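To put rough numbers on that growth, here is a minimal Python sketch. It assumes the simplest possible model, in which every pair of interacting entities needs one synchronization point. The function name `pairwise_sync_points` is made up for this illustration and does not come from any engine or library.

```python
def pairwise_sync_points(n: int) -> int:
    """Distinct pairs among n entities (assumption: one sync point per pair)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        # Doubling n roughly quadruples the number of pairs.
        pairs = pairwise_sync_points(n)
        print(f"{n:3d} entities -> {pairs:5d} pairs (N*N upper bound = {n * n})")
```

Under this assumption, going from 8 to 16 entities doubles the work available to thread but takes the pair count from 28 to 120.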

Saying all that, you do get some embarrassingly parallel tasks in games. They are normally found in chunks of code designed to increase visual differentiation of things that become more realistic with higher core counts: weather effects, grass, water, particles. But in all these cases, as many of you are aware, to get a linear increase in visual quality you need a far steeper than linear increase in processing.

A Problem of Complexity

Picture something that processes a 3D grid of air cells for wind/pressure simulation in a sphere around the camera. You can get nice visual differentiation from that. Now, as you increase the radius (r) of the sphere while maintaining the density of the cells, the number of cells rises with r cubed. So, you get a linear increase in realism for a cubic increase in processing. If you are threading this code (it would be rude not to) then the number of extra cores you need to get a linear increase in quality rises just as steeply.
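The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cubic cells of a fixed size, a cell count approximated as sphere volume divided by cell volume, and perfect scaling across cores (which real code will not achieve). The function names are hypothetical.

```python
import math


def air_cells_in_sphere(radius: float, cell_size: float = 1.0) -> float:
    """Approximate cell count: sphere volume divided by the volume of one cubic cell."""
    return (4.0 / 3.0) * math.pi * radius ** 3 / cell_size ** 3


def cores_for_same_frame_time(radius: float, base_radius: float, base_cores: float = 1.0) -> float:
    """Cores needed to keep the frame time constant, assuming perfect scaling."""
    return base_cores * (radius / base_radius) ** 3


if __name__ == "__main__":
    for r in (10.0, 20.0, 30.0, 40.0):
        cells = air_cells_in_sphere(r)
        cores = cores_for_same_frame_time(r, base_radius=10.0)
        print(f"radius {r:4.0f}: ~{cells:9.0f} cells, {cores:4.0f}x the cores")
```

Doubling the radius multiplies the cell count, and therefore the idealized core requirement, by eight. Tripling it multiplies both by 27.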

And I didn’t even mention Amdahl!
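For anyone who hasn't met it, Amdahl's law says that the serial part of a program caps the speedup you can get from adding cores. A minimal Python sketch of the standard formula follows. The 90% parallel fraction is an arbitrary figure chosen for illustration, not a measurement from any game.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.9  # assumed fraction of the frame that can run in parallel
    for n in (2, 4, 8, 12, 64, 1000):
        print(f"{n:5d} cores -> {amdahl_speedup(p, n):.2f}x")
```

With 90% of the work parallelized, the speedup can never reach 10x however many cores are added, and most of the gain has already arrived by around a dozen cores.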

My Two Cents

So where am I going with all this? You lose the possible benefit of core count once you pass a certain number of cores. Not only do you eventually lose the benefit of parallelization when the sync becomes sufficiently complicated, but once you raise the radius of your effect beyond a certain size, the required increase in core count also gets very, very large for just a tiny increase in quality.

I may upset some by saying this, but now that we have 10 to 12 cores, we’re reaching the point where clock speed and core efficiency are more important than the number of cores. We need to get individual tasks through the CPU faster to speed up both the single-threaded and the multithreaded code in games. That in turn will improve the experience for developers and gamers alike.

The Plug 

Intel has already taken this step with an amazing gaming chip, the Intel® Core™ i9-10900K Processor (formerly code named Comet Lake, for those in the know). With a clock speed of up to 5.3 GHz and some excellent cache efficiency, it’s a great choice for a high-end gaming rig for today’s games.

Useful Links to help you get more out of each core:

Intel® Implicit SPMD Program Compiler (ISPC) is a relatively new compiler which will help you develop more efficient SIMD code and pack more math into each cycle. And don’t just take my word for it: check out what developers are doing with ISPC on Polystream.

Intel® Graphics Performance Analyzers helps you make sure the GPU doesn't get in your way. 

Intel® VTune™ Profiler keeps an eye on the efficiency of what you are developing and helps you spot potential synchronization issues.

Inside the Intel and Creative Assembly* Collaboration describes how we helped Creative Assembly* release award-winning titles in their Total War* series that take advantage of all the power on any given PC platform.