This post also known as: “Why I should listen to advice future me gave past me in an alternate timeline”

So this week, I finished a lot of little bits I’ve been working on lately, mostly to do with best practice and optimisation, so I figured I’d run through them. Hopefully there’s some OpenGL and mobile developers out there this might help :)

Speeding up OpenGL

I just implemented a draw stack for OpenGL and holy crap I should have done this months ago. Before I had a model class, which had functions to load and draw, so I could call something like:

shader->useShader();

//Setup shader uniforms, about 5 lines

model->draw(position, texture);

Spot any issues? The two major problems are that I have no control over when the model gets drawn, and that shaders and their uniforms have to be set by each draw (you changed a uniform name? Good luck changing every reference to it!). Also, if the shader needs anything extra (such as another texture for bump mapping), I’ve got to overload the draw function, which is just another pain to maintain. With my new fancy draw stack, I basically do this:

cDrawElement *myDE = new cDrawElement( SHADER_TYPE_LIT, &mCar, &tCar, &mRotationMatrix);

drawStack[activeDrawStack].push_back(myDE);

I’m now adding an instance of cDrawElement with shader type, model, texture and transform matrix to a list. If the shader will need any extra info, I can just give cDrawElement an extra variable and set it like myDE->specTex = x . The draw code itself is a little complex as it sorts everything before drawing and does some checks to avoid changing state. When you select a new shader to use, most devices have to wait for all operations using the old shader to finish, so it can end up being a pretty lengthy operation, sorting all draw calls by shader was a great speedup. I did also sort by texture, since binding textures can take a little while, but the sorting code actually took longer than the gain. That’s pretty common in optimisation – some things just have the impact you’re looking for, so try not to get attached to anything you write!

Bake and Divide

The second huge speedup was a load more complicated. Most of the map stays the same throughout playing the game, so the level editor now combines all the static parts of the map into one big model. Then, it precomputes and bakes lighting for each vert (so we can have as many static lights as we want!) and, finally, it splits all the triangles between 64 (8×8 square) sections of the track. Having a large map split into square sections is great, because when the player’s not looking at a section they don’t need to be drawn. The downside here is the additional draw calls, since we need one call per texture per section. Initially I had 16×16 sections, which is 256 sections with 3 or 4 textures each; even when I was only drawing a third, that’s still too many draw calls, which really slow things down. 8×8 sections seemed like a sweet spot in this case, but it does depend on the game. Binary space partitioning, which is similar but splits the map cleverly based on the density and geometry of triangles, instead of just location, would have been better, but I don’t think the improvement would justify the time taken to implement. Optimisations are important, but it’s easy to get carried away!

The actual game doesn’t need to compute lighting for each vertex any more, so that shader is now really fast – it basically reads position, vertex and colour from a VBO and blends the colour with the texture. There’s also a little global illumination code but that’s super fast too. This has a real impact when playing the game, just have a look at the crappy neon sign before and after pics, the glowing green effect really helps it fit in. Best of all, it’s free to have more lights, so I can use it everywhere to add some atmosphere :)

The second thing I baked was collisions. Being a racing game, it has a fairly linear collision mesh (i.e. just the track) so I set out to divide into distinct sections. I quickly thought of loads of methods to do this, but the best trade-off will use a number of nodes that are user-defined in the level editor. I’ll need to have nodes anyway, so the game knows where the track is, particularly for keeping score and guiding AI. Adding the nodes is a pain, but we end up with something like this:

The large red balls are the nodes and their size is their width. When baking the track, the level editor links together the nodes into lines, and it checks each collision vertex to see if it’s in any line, using the size of the nodes as width at each end. This lets us build up the entire track out of small sections, so instead of having 10k triangles to collide with, we now have less than a hundred in each of 130 section, and we can check collisions with a single section at a time. Actually we need to check the next and previous sections as well, just in case we’re on a boundary, but that’s still ~400 triangle checks instead of ~10k, a saving of 96%! Another awesome speed boost comes from the convex-ish nature of the sections; whereas before I had to find the closest collision since the track can loop over itself, I can now stop at the first collision. Unlike OpenGL, which does its own stuff parallel to our code, timing the increase here is easy and accurate. Collision checking for the player’s ship (4 collision checks per step), used to take ~0.0075s, but now takes ~0.0002s, or about 2% of the original time!

The speed up from these changes has been drastic, each frame takes way less time to perform physics and render. My test device is the beastly Nexus 7 so it was already running at full speed, but for a mobile developer it’s really important to squeeze every last drop of performance, both so it’ll run on weaker devices and to save battery, hopefully any future users will appreciate this :)