GSoC 2012 is now over, and the graphics refactor code is ready for merging. During the last two weeks I’ve been in overdrive to ensure my work is complete and ready for inclusion in Stellarium.
I wrote the GL1 renderer backend, a new texture API, and modular shaders. I also integrated the solar system shadow code written by Jörg Müller in his GSoC project, merged new changes from Stellarium trunk and ported them to the new graphics API. Finally, I removed the remaining direct GL calls, isolating OpenGL completely behind the new API.
The refactor involved large changes to Stellarium core and broke compatibility with Stellarium plugins using old graphics code or direct GL (which is pretty much all plugins). Breaking compatibility is needed from time to time to avoid the Java syndrome, but it requires anyone depending on the broken API to update their code. It should be done as rarely as possible, or with long - but not infinite (Java) - deprecation periods. Therefore it was important that the refactor be complete and merged once, without requiring more breaking changes in the foreseeable future.
During the project I wrote a new implementation for textures, but it was still used through the old API (not exposing the new features). Another problem was shader projection, which without some API changes would have to be done manually by drawing code. I also still had some code that was not yet ported, in particular StelPainter initialization. Keeping parts of the old graphics API would be a bad idea, as new code might come to depend on it, causing more API breakage down the line.
At first I focused on removing the remaining direct GL calls and old graphics code (StelPainter, StelVertexArray). That was easier than expected: StelPainter state was no longer used and the new graphics code worked differently, so removing StelPainter didn’t break it. I removed all direct GL calls, but OctahedronPolygon depends on the GLUES library, which in turn depends on GL includes. However, it seems GLUES only uses GL defines and enums and does not actually call GL functions.
Then I completed the API changes. The shader projection implementation only exists for the stereographic projector, but at least the breaking changes are over and hopefully no more will be needed (at least until another refactor is needed in 10 years :) ).
Once I was done with that, I started to merge new changes from trunk so my branch could be merged back. I planned to do this every week, but after a few weeks I forgot about it and my branch diverged significantly. Some new changes still used the old graphics API and direct GL, so I had to port that (Exoplanets and Observability plugins, night mode changes).
Then I merged the solar system shadow code. This went without problems; the only API change needed was support for float textures, which the shadow shader uses to store planet positions.
On August 18 the code became synchronized with trunk and ready to merge. Currently, trunk is frozen before a release (so the GSoC changes can be moved to the next major release), which will happen on August 26. I will continue to maintain the branch until it can be merged, and fix any bugs uncovered by testing by other Stellarium project members. I also have some optimizations I want to try out. In the longer term I plan to maintain the Renderer subsystem, fixing bugs and making sure API additions are as general as possible to avoid bloat; but that won’t be full time, and I won’t be able to do much work during the university year.
Stellarium needs a GL1 renderer backend for old machines and buggy drivers. GL1 fixed function pipeline is easier to use than GL2 shaders and all it took was a straightforward rewrite of the GL2 backend into GL1 code. Also, a lot of GL2 backend code could be shared with GL1 (especially state setup/reset), so that was moved to StelQGLRenderer.
The GL1 backend turned out to be much faster than the GL2 backend; almost as fast as current Stellarium trunk, and sometimes faster (e.g. planet drawing). This is unexpected, as the code is identical apart from being shader-less; I expected it to be slower than GL2. In any case, this is good news for exactly those systems that need the GL1 renderer.
Right now, the GL2 backend is the default, and GL1 is a fallback used when GL2 fails to initialize. On older GPUs/drivers that support GL2, the GL2 backend might initialize correctly but still have issues (e.g. shaders too complex for old GPUs, or just low performance). If this happens, we can blacklist some older drivers/GPUs to automatically fail GL2 initialization.
An existing example of such blacklisting is the newly added float texture support. Floating point textures are needed for solar system shadows, but are not supported by open source graphics drivers because of patents (yes, 2D arrays of RGBA values that are floats instead of bytes are patented; don’t ask me why). A StelRenderer function was added to detect float texture support so solar system shadows can be disabled where it is missing. The check requires GL 3.0 support (which includes float textures), but always returns false for open source drivers.
The most significant performance improvement was a rewrite of the internal StelQGLRenderer::makeGLContextCurrent() member function, which previously switched the context even if Stellarium’s GL context was already current. This caused a significant slowdown that didn’t show up in the profiler. Now the context is only switched if a different context is current. This helped most in situations with many separate textured draws, which previously sometimes had visible lag.
The Pulsars and Quasars plugins made many redundant draw calls drawing empty text. I’m not sure why this was done, but it seemed like a hack to prevent GL state set by previous draw calls from affecting the Qt paint engine. As GL state is now contained within draw calls, these calls were removed.
Rectangle drawing previously created and destroyed a vertex buffer on every call. I moved it to StelQGLRenderer so the same buffer can be reused. More optimization is possible (custom draw code specific to rectangles, a small-array optimization in the vertex buffer backend, batching of rectangle draws).
The vertex buffer backend originally created for testing (StelTestQGL2VertexBufferBackend) was renamed to StelQGLArrayVertexBufferBackend, with GL1- and GL2-specific code moved to derived StelQGL1 and StelQGL2 classes. This vertex buffer backend still uses vertex arrays. The GL2 version should eventually be replaced by VBOs or, if Qt3D gets into Qt, Qt3D classes. However, this is unlikely to improve performance and might actually cause a slowdown without large changes to some other Stellarium code.
Right now, most graphics data is still generated or modified each frame, and explicitly uploading it to the GPU each frame would hardly bring any speedup. So before considering a VBO backend, most per-frame work, like projection, should be moved to the GPU or removed. The main problem is StelSkyDrawer, which regenerates all star sprites each frame. If the main StelSkyDrawer users (e.g. ZoneArray) are rewritten to store the stars in vertex buffers that are initialized once and never modified, performance should increase despite drawing more vertices per frame, and a VBO backend (which should bring a massive speedup) should become viable. This is not a simple project, however, so it should probably be a long-term goal.
As I wrote previously, I reimplemented textures so that they could have renderer-specific backends, but due to the scale of the changes required, I wrapped the new implementation in the old API (StelTexture, StelTextureMgr), which worked differently and didn’t expose all the functionality of the new implementation.
I have now replaced StelTexture with a new interface class, StelTextureNew. I didn’t keep the name StelTexture to avoid silently breaking older code, as some member functions have identical names. StelTextureNew is constructed by StelRenderer and wraps a StelTextureBackend provided by the renderer backend. Having separate backend and interface classes allows transparent texture caching on the backend; if the user deletes a StelTextureNew with a backend stored in a cache, the cache can decide whether to delete the backend or just decrease its reference count.
As I explained before, the new texture code defines states a texture can be in: Uninitialized, Loading, Loaded and Error. Texture creation never fails, but the created texture can be in the Error state (e.g. if file reading failed). I wanted to simplify this so it would not be necessary to check for these states. I changed the renderer so that a texture in any state other than Loaded can still be used (bound), but a placeholder texture will be bound instead. Now, even if loading fails, code using the texture can still run, and the error is easy to spot as a generic checkers texture will be seen instead.
The placeholder texture is also used during loading. This can be seen with planets, which load textures asynchronously. When first zooming to see a planet, the placeholder texture can be seen for a brief moment. Previously, the planet would simply not be drawn during this moment. This can be reverted if undesirable, but I think it’s better to see something than nothing, and seeing a sphere with a checkers texture (instead of nothing) makes it obvious that texture loading is stuck, making it easier to fix.
Another addition is texture creation from QImage and raw data. Both are supported to allow in-memory generation of texture data. QImage is supported for convenience, but as QImage doesn’t support every possible texture format (in particular, floating point textures), a raw data loading function was added. This is also necessary for solar system shadows.
After these changes StelTextureMgr was removed as it is no longer used (StelRenderer now creates textures).
Projection is one of the main blockers for VBO viability. Most vertices are projected on the CPU every frame. This can’t be avoided with the fixed function pipeline, but vertex shaders can do the projection on the GPU, removing the need to modify and re-upload vertex data each frame (some code still needs to do this, e.g. StelSkyDrawer, so this is not the only VBO blocker).
Previously, this would only be possible with separate custom shaders for each case (e.g. planets, point light sources, etc.). This would require a great amount of duplicate work, so I wanted to handle it internally.
I knew that modular shaders could be done with GLSL, but I had no idea how. Googling didn’t help much: there were various threads about something not working, but I found no detailed information about how it works and no in-depth tutorials. Some people seem to preprocess shaders, generating one large shader for each combination of modules. I tried this, but it turned out to be too hacky. In the end I used multiple shader sources with separate function declarations/definitions, like in C. Fortunately, it works just like C.
GLSL has a C-like compilation model. Shader sources can be compiled separately as long as they declare functions they use, even if they don’t contain their definitions. These definitions are only required once the shaders are linked to form a complete program that can run on the GPU. Each shader source just needs to declare the uniforms and functions it uses.
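As an illustration, the declaration/definition split might look like this (a trivial made-up example; the pass-through project() just stands in for a real projector such as the stereographic one):

```
// ---- Shader source 1, compiled separately ----
// Uses project() but only declares it, like a C prototype; the
// definition lives in another source and is resolved at link time.
vec4 project(in vec4 v);

attribute vec4 vertex;
void main(void)
{
    gl_Position = project(vertex);
}

// ---- Shader source 2, compiled separately ----
// Defines project(); here a trivial pass-through for illustration.
vec4 project(in vec4 v)
{
    return v;
}
```

Both sources compile on their own; only when they are linked into one program does the project() call get resolved to the definition.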
It would be inconvenient to specify which shaders to link every time we want to change a function, and it would also require rewriting the shader API I had already written, so I turned this into an implementation detail. The user can add a shader with a name and, using that name, enable or disable it. User-specified shaders work with shader projection without effort on the user’s part: the projector just disables the default (built-in dummy) projection shader and enables its own version. Any linking happens behind the scenes. As re-linking shaders on each draw call would be too slow (I tried it; ~0.1 FPS), shader programs are cached so no combination of shaders needs to be linked more than once. All a user-specified shader needs to do is declare and call a project() function.
Re-linking/exchanging programs caused a new problem: as the StelGLSLShader bind()/release() functions no longer directly bind/release the underlying program, uniform passing was broken. Uniforms can only be passed once the final program is bound, but this only happens in the middle of a draw call. To work around this, I needed to delay the passing of uniform variables. Due to speed requirements I ended up using a fixed-size array inside the shader backend: uniforms set by StelQGLGLSLShader::setUniformValue() are stored as raw data in an array of bytes, and this data is interpreted and passed just before drawing. The only drawback is that the total size of uniforms is limited; currently 512 bytes, but this can be expanded if needed (an assert will trigger in case of overflow, with a message saying that the limit can be expanded). Currently, no shader (including solar system shadows) uses more than 256 bytes of uniform data.