I’m now porting remaining direct GL and StelPainter code to the Renderer API. This again required extending the API, mostly to work with the stencil and depth buffer and to generate 3D shapes such as spheres and cylinders.
Some code in src/core/modules needs to draw various 3D shapes. Various types of Landscapes need to draw the surrounding landscape and fog on the interior faces of a sphere or a cylinder, planets need to be drawn as spheres and so on.
Previously, these were usually generated from scratch every frame using static vertex arrays that were reused (but completely rewritten) every frame. StelVertexBuffer, however, is designed to work with any kind of storage (e.g. VBOs), so this kind of optimized generation on every frame is not viable without adding special functions such as drawSphere, drawCylinder and so on to StelRenderer.
I didn’t want to make StelRenderer bloated and force every backend to implement such a huge amount of code, so I rewrote shape generation on top of StelRenderer and StelVertexBuffer instead. Building vertex buffers this way is not as efficient as working directly on arrays, so regenerating this geometry every frame would cause a slowdown. Instead, I concentrated on generating the data only once, or only when needed.
To organize the generation code, I added a new class, StelGeometryBuilder. Right now, this class only acts as a namespace and has no data members, but its functions are non-static so that member variables could later be used to share state between subfunctions.
StelGeometryBuilder has member functions such as buildCylinder or buildRing. The general idea is that the user manages vertex (and sometimes index) buffers and provides them to the generator functions. This allows the user to reuse the buffers.
With spheres (generated in multiple ways - lit, unlit, and fisheye-mapped), this turned out to be too inconvenient due to the huge number of generation parameters and the need to provide a particular number of index buffers (one for each stack forming the sphere, each of which is a separate triangle strip). To simplify work with spheres, another class was added: StelGeometrySphere, which encapsulates a generated sphere. This is also constructed by StelGeometryBuilder, by one of three functions: buildSphereUnlit, buildSphereLit and buildSphereFisheye. However, the geometry itself is only generated on demand, when the sphere is first drawn (using the draw member function). The vertex and index buffers of the sphere are inaccessible to the user, but to reuse the buffers, sphere parameters (such as geometry detail, i.e. how many faces the sphere has) can be changed, which triggers the sphere to regenerate (using the same buffers) at the next draw call.
As the sphere class turned out to be very convenient, it might be a good idea to wrap other possible generated shapes in such classes, where generation is used heavily.
A few examples on how geometry is generated with StelGeometryBuilder follow.
Generating a cylinder (without the top and bottom caps)
// Note that this would normally be divided into an initialization function,
// a drawing function and a deinitialization function.

// VertexP3T2 is a generic vertex type that can be included through a header.
StelVertexBuffer<VertexP3T2>* cylinder =
    renderer->createVertexBuffer<VertexP3T2>(PrimitiveType_TriangleStrip);

// Build a cylinder, writing vertex data to the specified vertex buffer.
// The cylinder has a radius of 5, a height of 15, and is composed of 64 slices.
StelGeometryBuilder().buildCylinder(cylinder, 5.0f, 15.0f, 64);

renderer->drawVertexBuffer(cylinder, NULL, projector);

delete cylinder;
Generating a sphere (unlit sphere in this case)
// Note that this would normally be divided into an initialization function,
// a drawing function and a deinitialization function.

// Radius 20, 40 rows and 40 columns forming the sphere, and a bit oblate.
const SphereParams params = SphereParams(20.0f).resolution(40, 40)
                                               .oneMinusOblateness(0.9);
StelGeometrySphere* sphere = StelGeometryBuilder().buildSphereUnlit(params);

sphere->draw(renderer, projector);

// Decrease the resolution.
sphere->setResolution(20, 20);

// The sphere is regenerated as the parameters have changed,
// and is drawn with lower resolution.
sphere->draw(renderer, projector);

delete sphere;
Previously, many generated draws didn’t use index buffers, increasing vertex count by a factor of 2 or more. I rewrote them to use index buffers where possible. Due to the unfortunate fact that we’re projecting on the CPU, this initially slowed down the code as most of the vertex buffer was projected for each index buffer used in drawing. This was particularly bad with spheres, which now use many index buffers with a single vertex buffer.
I optimized vertex projection for cases when index buffers are relatively small to only project the vertices used. However, a much better solution would be to track when a StelProjector is changed (so a single vertex buffer can be projected once per frame, regardless of how many draws it was involved in), and even better, GPU based projection with GLSL.
The worst case of geometry generation was Planet, which regenerated the spheres used to draw the planets on each frame. This was easy to avoid when lighting was disabled (e.g. for the Sun), as sphere generation always used the same parameters except for the level of subdivision - and generating the sphere only once outweighed regenerating it every frame at potentially lower detail, since generation was more expensive than the on-CPU projection (the other major piece of code run on every vertex every frame).
A bigger problem was generating a lit planet, which baked the lighting into vertex colors. The lighting generation depends on StelProjector, which might change between frames. However, the lighting code was very simple and only depended on a few variables. This made it easy to implement it in shaders, passing these parameters as uniforms.
Shader API and vertex buffer backend were modified so a vertex attribute storing vertex position before projection can be added if requested - this is needed for lighting and might be useful for other shaders. Currently, the shader lighting code is an exact copy of the lighting done on CPU when generating a lit sphere, but with shaders it might also be viable to implement more advanced lighting without significant performance penalty.
I continued to port the code based on direct OpenGL and StelPainter to the new Renderer API. I’m now done with the code in the src/core/modules directory and am now working on the plugins included in the Stellarium repository (plugins directory). This was the code that depended on geometry generation detailed above.
I also noticed a few bugs in my code while porting, in particular, the Oculars plugin was broken for a while due to a culling bug. These have been fixed.
Code in src/core/modules/Planet.cpp uses some depth and stencil buffer tricks to draw planet rings and lunar eclipses. As StelRenderer should support any backend, it would not be a good idea to expose the full power of the depth and stencil buffers, as some backends might have problems implementing it. Instead, I chose to enumerate the various ways we need to use the depth and stencil buffers. For example, the DepthTest enum allows depth testing to be disabled completely, set to read-only (draws don’t write their own depth values), or set to read-write.
Stencil test can be disabled, set so that draws write the value 1 to the stencil buffer for pixels drawn, and it can be set so that only pixels with value 1 in the stencil buffer can be drawn to.
Enumerating the ways we want to use the stencil/depth buffer might seem inelegant, and it greatly reduces the power of OpenGL. New depth/stencil effects might require expanding the enumerations with more “modes”. However, this is an advantage for backend implementors, as it is easier to support a finite number of ‘tricks’ than a whole range of functions and their combinations (for example glStencilFunc, glStencilOp and the possibilities they cover).
OpenGL behaves as a state machine, and any state that is set (say, stencil test) and not reset might unintentionally affect following draw calls. This is somewhat OK if only Stellarium code does the drawing, as we can find and fix the bug. However, on some machines Qt uses an OpenGL backend for drawing, which might act in unpredictable ways; from time to time, stray state can mess up Qt drawing. To avoid this, such state is now set at the start of a draw call (drawVertexBuffer) and reset afterwards. This might be changed if it measurably decreases performance, but it is quite advantageous to control exactly what GL state a draw call leaves behind.
I’m now close to moving all user code to draw with the Renderer subsystem. Once that is done, I can remove StelPainter, its initialization and its global variables, and the GL code will be completely isolated.
The graphics refactor is not the only graphics related project in this GSoC: Jörg Müller is working on solar system shadows which should be built on top of the refactor. The Renderer API might be modified for this purpose if needed.
It’s been a while since the last time I pulled back new changes from the trunk, so I’m also going to work on that. Finally, the OpenGL 1 backend work has not even started. This is not as large a problem as it might seem, though, since most of the code is shared (in StelQGLRenderer, which is the parent of StelQGL2Renderer and the not-yet-existing StelQGL1Renderer).
As a whole, I’m behind the schedule I set at the beginning. I’m still on track to get the refactor done and ready for merging, but any serious optimization work might have to wait until after the end of GSoC - I intend to continue working on it (although once the semester starts I will have much less time). The backend is still very similar to the previous code, e.g. vertex arrays are still being used. However, now that the code is isolated, changes like replacing vertex arrays with VBOs or Qt3D classes can be done in one location (in this case, the vertex buffer backend) instead of all over the code.
Currently, performance of the refactored code is significantly worse than Stellarium trunk. Release build on my machine is running with only half the FPS compared to trunk. This is expected - I did very little optimization work so far. It is likely that a bit of work with perf/oprofile/callgrind will considerably improve performance.
However, I expect that even after optimization the new code is going to be somewhat slower until the main graphics problem is solved - which is that most graphics data is generated or processed from scratch on every frame. (The most significant cases are vertex projection, which could be moved to the GPU using shaders, and point source drawing, which might be partially cacheable.) The main reason for this slowdown is that the new API doesn’t allow direct manipulation of vertex data in memory - as it might be stored elsewhere, e.g. in VRAM.