Mipmapping with ETC1 textures on Android (with a C++ and NDK example)

This post also known as “trade-offs make for long sentences”

When I ported my game engine to Android (and thus GLES 2.0), I switched from using PNG images to the Ericsson Texture Compression format, ETC1. The advantages were pretty clear: a significant speed-up, slightly smaller files, an acceptable loss of quality, faster loading and simpler code. Of course, when do we ever get all that without a trade-off? ETC1 doesn’t support alpha channels and, more crucially, OpenGL can’t generate mipmaps for compressed textures. My solution was to write a Python script which resizes a PNG multiple times and packs the resulting ETC1 data into a single file. You can find the script and code at the end of the article, but I’ll quickly explain what it’s doing; it’s always best to have a deeper understanding than just “oh, this tool did it for me”.

Here’s how my PNG loading pipeline looked, and chances are loading from another format or library is pretty similar:

And here’s my mipmapped ETC1 system:

Clearly, because the diagram is more green, it is much better! Maybe “runtime” is the wrong word; I mean “when the player runs the game on their device”, but there’s not enough space for that! What we’re doing is performing the slow scaling and compression before runtime (i.e. before we even create the APK), so only the fast bit is left to do at runtime. Probably my favourite thing about using ETC1 on Android is that OpenGL accepts raw compressed ETC1 data directly; we can literally just extract the data into a temporary buffer and upload it to OpenGL for super-fast loading!
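To make that concrete, here’s a minimal sketch (not my engine code) of handing one level of raw ETC1 data to OpenGL from the NDK. The function and parameter names are made up for illustration, and it assumes any PKM header has already been stripped so data points at the raw block data:

#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>   // defines GL_ETC1_RGB8_OES

// Upload one level of raw ETC1 data that's already sitting in memory.
// ETC1 stores each 4x4 block of pixels in 8 bytes, so the size is easy to work out.
void uploadETC1Level(GLint level, GLsizei width, GLsizei height, const void *data)
{
    GLsizei dataSize = ((width + 3) / 4) * ((height + 3) / 4) * 8;
    glCompressedTexImage2D(GL_TEXTURE_2D, level, GL_ETC1_RGB8_OES,
                           width, height, 0, dataSize, data);
}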

Here’s the quick Python script I wrote; just install Python and the Python Imaging Library (PIL) and run it like so:

python makemipmaps.py image.png

Note that the script needs access to etc1tool (normally found in android-sdk/tools/). I run it by putting it in the same directory as etc1tool, adding that directory to my PATH environment variable, then running a batch script wrapper like the one below.

@ECHO OFF
python "C:\Program Files (x86)\Android\android-sdk\tools\makemipmaps.py" %1

Obviously you’ll need to change the path as appropriate, but this lets you run the Python script from any directory, provided you put the wrapper in a directory that’s on your PATH environment variable.


So now we have a file containing the compressed, mipmapped ETC1 textures; how do we load them? That’s actually pretty easy, but you might need to make a few adjustments to my code. Also of note: I’m using C++ with the Android NDK, but it should be pretty simple to port to Java. The code is pretty well commented, so it’s probably best if you just check it out: loadETC1.h
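If you can’t grab that file, here’s a rough sketch of the same idea. It assumes a made-up packed layout ([level count][width][height][ETC1 data], repeated, with the sizes as 32-bit values), which is not necessarily what my script writes, so treat it as an outline rather than a drop-in replacement for loadETC1.h:

#include <cstdint>
#include <cstdio>
#include <vector>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Outline only: the real file format may differ.
GLuint loadMipmappedETC1(FILE *file)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    uint32_t levelCount = 0;
    fread(&levelCount, sizeof(levelCount), 1, file);

    for (uint32_t level = 0; level < levelCount; ++level)
    {
        uint32_t w = 0, h = 0;
        fread(&w, sizeof(w), 1, file);
        fread(&h, sizeof(h), 1, file);

        // ETC1 packs each 4x4 pixel block into 8 bytes
        uint32_t dataSize = ((w + 3) / 4) * ((h + 3) / 4) * 8;
        std::vector<uint8_t> data(dataSize);
        fread(data.data(), 1, dataSize, file);

        glCompressedTexImage2D(GL_TEXTURE_2D, level, GL_ETC1_RGB8_OES,
                               w, h, 0, dataSize, data.data());
    }

    // GLES 2.0 wants the chain to go all the way down to 1x1 when using a
    // mipmapped minification filter.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}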

Let me know if any part of this didn’t work for you, or if I haven’t explained something very well!

As a follow-up, here’s my game with and without mipmapping (click for a larger image).

Even with terrible JPG compression to get the image down to ~100KB, the difference is massive! Since the GPU is doing less scaling at runtime, it’s also slightly faster. Whilst we do have a larger file size (we’re storing more textures on disk), and we’re using more memory than having no mipmaps at all, we should be using less memory than uncompressed textures with automatically generated mipmaps, since our mipmaps stay ETC1-compressed!
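To put some rough illustrative numbers on that (these aren’t measured from my game): a 512×512 RGBA texture is 1MB uncompressed, or about 1.33MB once OpenGL generates a full mip chain, while the same texture in ETC1 (4 bits per pixel) is 128KB, or roughly 171KB with the whole mipmap chain packed into the file. So even storing every mip level, the ETC1 version comes out well under the uncompressed base texture on its own.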

Optimisations

This post also known as: “Why I should listen to advice future me gave past me in an alternate timeline”

So this week I finished a lot of little bits I’ve been working on lately, mostly to do with best practices and optimisation, so I figured I’d run through them. Hopefully there are some OpenGL and mobile developers out there this might help :)

Speeding up OpenGL

I just implemented a draw stack for OpenGL and holy crap, I should have done this months ago. Before, I had a model class with functions to load and draw, so I could call something like:

shader->useShader();
//Setup shader uniforms, about 5 lines
model->draw(position, texture);

Spot any issues? The two major problems are that I have no control over when the model gets drawn, and that shaders and their uniforms have to be set up for every draw (you changed a uniform name? Good luck changing every reference to it!). Also, if the shader needs anything extra (such as another texture for bump mapping), I’ve got to overload the draw function, which is just another pain to maintain. With my new fancy draw stack, I basically do this:

cDrawElement *myDE = new cDrawElement(SHADER_TYPE_LIT, &mCar, &tCar, &mRotationMatrix);
drawStack[activeDrawStack].push_back(myDE);

I’m now adding an instance of cDrawElement, with shader type, model, texture and transform matrix, to a list. If a shader needs any extra info, I can just give cDrawElement an extra variable and set it like myDE->specTex = x. The draw code itself is a little complex, as it sorts everything before drawing and does some checks to avoid redundant state changes. When you select a new shader, most devices have to wait for all operations using the old shader to finish, so it can end up being a pretty lengthy operation; sorting all draw calls by shader was a great speedup. I also tried sorting by texture, since binding textures can take a little while, but the sorting actually cost more time than it saved. That’s pretty common in optimisation: some things just don’t have the impact you’re looking for, so try not to get attached to anything you write!
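To give a feel for the structure, here’s a toy sketch of the sorting idea. It isn’t my engine code: the real cDrawElement holds pointers to models, textures and matrices, while this version just uses IDs and prints what it would do:

#include <algorithm>
#include <cstdio>
#include <vector>

// Toy stand-in for cDrawElement: everything is just an ID here.
struct DrawElement
{
    int shaderType;   // e.g. SHADER_TYPE_LIT
    int model;
    int texture;
};

// Sort by shader first (switching shaders is the expensive state change),
// then walk the list and only switch shader when it actually changes.
void flushDrawStack(std::vector<DrawElement> &stack)
{
    std::sort(stack.begin(), stack.end(),
              [](const DrawElement &a, const DrawElement &b)
              { return a.shaderType < b.shaderType; });

    int currentShader = -1;
    for (const DrawElement &de : stack)
    {
        if (de.shaderType != currentShader)
        {
            currentShader = de.shaderType;
            std::printf("switch to shader %d\n", currentShader);   // the engine would call useShader() here
        }
        std::printf("draw model %d with texture %d\n", de.model, de.texture);   // ...and the model's draw() here
    }
    stack.clear();
}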

Bake and Divide

The second huge speedup was a load more complicated. Most of the map stays the same throughout the game, so the level editor now combines all the static parts of the map into one big model. Then it precomputes and bakes lighting for each vertex (so we can have as many static lights as we want!) and, finally, it splits all the triangles between 64 (8×8) square sections of the track. Having a large map split into square sections is great, because when the player isn’t looking at a section it doesn’t need to be drawn. The downside is the additional draw calls, since we need one call per texture per section. Initially I had 16×16 sections, which is 256 sections with 3 or 4 textures each; even when I was only drawing a third of them, that’s still too many draw calls, which really slows things down. 8×8 sections seemed like a sweet spot in this case, but it does depend on the game. Binary space partitioning, which is similar but splits the map cleverly based on the density and geometry of the triangles instead of just their location, would have been better, but I don’t think the improvement would justify the time taken to implement it. Optimisations are important, but it’s easy to get carried away!
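Here’s a very rough sketch of the “don’t draw sections the player isn’t looking at” idea; the grid size, map size and the visibility test are all made-up placeholders rather than what my level editor actually produces:

#include <cmath>
#include <cstdio>

struct Vec2 { float x, z; };

const int   GRID     = 8;        // 8x8 sections
const float MAP_SIZE = 512.0f;   // made-up world size for the whole map
const float SECTION  = MAP_SIZE / GRID;

// Crude visibility test: draw a section if its centre is either very close to
// the camera or roughly in front of it (camForward is assumed normalised).
// A proper frustum test would be tighter, but this captures the basic idea.
bool sectionVisible(Vec2 centre, Vec2 camPos, Vec2 camForward)
{
    float dx = centre.x - camPos.x, dz = centre.z - camPos.z;
    float dist = std::sqrt(dx * dx + dz * dz);
    if (dist < SECTION * 1.5f) return true;                         // always draw nearby sections
    float facing = (dx * camForward.x + dz * camForward.z) / dist;  // cosine of the angle to the section
    return facing > 0.2f;
}

void drawVisibleSections(Vec2 camPos, Vec2 camForward)
{
    for (int gx = 0; gx < GRID; ++gx)
        for (int gz = 0; gz < GRID; ++gz)
        {
            Vec2 centre = { (gx + 0.5f) * SECTION, (gz + 0.5f) * SECTION };
            if (sectionVisible(centre, camPos, camForward))
                std::printf("draw section %d,%d\n", gx, gz);        // the engine would issue the per-texture draw calls here
        }
}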

The actual game doesn’t need to compute lighting for each vertex any more, so that shader is now really fast: it basically reads the vertex position, texture coordinates and baked colour from a VBO and blends the colour with the texture. There’s also a little global illumination code, but that’s super fast too. This has a real impact when playing the game; just have a look at the crappy neon sign before-and-after pics, where the glowing green effect really helps it fit in. Best of all, extra lights are free, so I can use them everywhere to add some atmosphere :)
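For reference, a baked-lighting shader pair really can be that small. This is a generic GLES 2.0 sketch, not my actual shaders, written as the string literals you’d feed to glShaderSource from NDK code:

// The vertex colour was computed offline by the level editor, so the shaders
// just pass it through and multiply it with the texture.
static const char *bakedVertSrc =
    "attribute vec4 aPosition;\n"
    "attribute vec2 aTexCoord;\n"
    "attribute vec4 aBakedColour;\n"
    "uniform mat4 uMVP;\n"
    "varying vec2 vTexCoord;\n"
    "varying vec4 vColour;\n"
    "void main() {\n"
    "    vTexCoord = aTexCoord;\n"
    "    vColour = aBakedColour;\n"
    "    gl_Position = uMVP * aPosition;\n"
    "}\n";

static const char *bakedFragSrc =
    "precision mediump float;\n"
    "uniform sampler2D uTexture;\n"
    "varying vec2 vTexCoord;\n"
    "varying vec4 vColour;\n"
    "void main() {\n"
    "    gl_FragColor = texture2D(uTexture, vTexCoord) * vColour;\n"
    "}\n";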

The second thing I baked was collisions. Being a racing game, it has a fairly linear collision mesh (i.e. just the track), so I set out to divide it into distinct sections. I quickly thought of loads of ways to do this, but the best trade-off was to use a set of nodes placed by the user in the level editor. I’ll need nodes anyway so the game knows where the track is, particularly for keeping score and guiding the AI. Adding the nodes is a pain, but we end up with something like this:

The large red balls are the nodes, and their size sets the track’s width at that point. When baking the track, the level editor links the nodes together into segments, and it checks each collision vertex to see which segment it falls in, using the size of the nodes as the width at each end. This lets us build up the entire track out of small sections, so instead of having ~10k triangles to collide with, we now have fewer than a hundred in each of 130 sections, and we can check collisions against a single section at a time. Actually, we need to check the next and previous sections as well, just in case we’re on a boundary, but that’s still ~400 triangle checks instead of ~10k, a saving of 96%! Another awesome speed boost comes from the convex-ish nature of the sections: whereas before I had to find the closest collision, since the track can loop over itself, I can now stop at the first collision. Unlike the OpenGL changes, where the GPU does its own work in parallel with our code, timing the improvement here is easy and accurate. Collision checking for the player’s ship (4 collision checks per step) used to take ~0.0075s, but now takes ~0.0002s, or about 2% of the original time!
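Here’s a sketch of the lookup at play time. The triangle test itself is passed in as a placeholder, since that part of the engine isn’t shown here, and the section and triangle types are simplified stand-ins for the baked data:

#include <functional>
#include <vector>

struct Triangle { /* three vertices, plane, etc. */ };
struct TrackSection { std::vector<Triangle> triangles; };

// Only test the player's current section plus its neighbours, in case we're
// sitting on a boundary. Because the sections are convex-ish we can return at
// the first hit instead of searching for the closest one.
bool checkTrackCollision(const std::vector<TrackSection> &sections,
                         int playerSection,
                         const std::function<bool(const Triangle&)> &hitsTriangle)
{
    int count = (int)sections.size();
    for (int offset = -1; offset <= 1; ++offset)
    {
        int idx = (playerSection + offset + count) % count;   // the track loops, so wrap around
        for (const Triangle &tri : sections[idx].triangles)
            if (hitsTriangle(tri))
                return true;
    }
    return false;
}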

The speedup from these changes has been drastic; each frame takes far less time to run the physics and render. My test device is the beastly Nexus 7, so it was already running at full speed, but for a mobile developer it’s really important to squeeze out every last drop of performance, both so the game will run on weaker devices and to save battery. Hopefully any future users will appreciate this :)