|
Architecture of a 3D Software Stack for Peak PentiumŪ III Processor Performance (continued)
3D Application Layer Optimizations
Optimizing a 3D application stack starts at the very top, with the application code and the content itself. The structure of the content (type of primitive, number of vertices per scene, amount of textures, etc.) has a huge impact on the performance of an application. The manner with which this content is presented to the 3D library layer is also very important. The scene manager used in our study implements a few key optimizations that generate significant benefits. Render State Sorting In order to convert the scene manager's 3D models into pictures on the screen, they must maintain a render state. The render state is a collection of information that tells the geometry engine and the graphics controller how to process incoming information (and display it on the screen). A render state contains information ranging from various transform matrix values to texture map addresses, which should be selected by the graphics card for rasterization7. Switching any portion of the render state, within a standard 3D library, is very expensive for the application. The first cost associated with render state changes is in the 3D library itself. For the most part, a call into the 3D library to alter the current state involves a large number of processor cycles. Some of this time is spent in the library routine validating the new state and manipulating state variables. Another chunk of time goes to the device driver, where it typically goes through its own process of validation and setup. The second cost of frequent render state changes is exhibited by the 3D graphics card. As graphics card frequencies and performance increase, so do the depths of their rasterization pipelines. It is typically necessary to flush the raster pipe of existing primitives and only restart after this has completed. This flushing leads to excessive bubbles in the rasterization pipeline and a less than effective utilization of precious pixel fill and triangle setup rate bandwidth. By identifying the types of render states utilized and grouping primitives by distinct state setting at the time of the creation of the model/scene graph, our application was able to eliminate much of this overhead. Object-Level Clipping There are various ways a 3D application can avoid processing non-visible geometry. Our application uses bounding boxes around portions of the scene being processed to trivially accept or reject the primitives for further processing. The scene manager compares the points, which define the corners of the bounding box, with the dimensions of the viewport. There are three possible results of this comparison:
Figure 10: Application-level performance difference with bounding boxes turned on Using a bounding box with eight corner points for this comparison can save much processing time. The comparison involves transforming all the vertices for the primitive from their own object space into world or screen space. With a bounding box, only the eight corner points need to be checked. To check an individual point against an arbitrary bounding plane requires a minimum of four multiplication and three addition operations. These operations need to happen for the top, bottom, right, left, front, and back planes on the viewing frustum (a total of six planes). This becomes a total of 24 multiplication and 18 addition operations per point to account for each plane [3]. Implementing bounding boxes around 3D models minimizes the amount of computation necessary to trivially accept or reject vertices for further processing. Examples of the application-level performance impact of this optimization can be seen in Figure 10. This chart demonstrates a significant, application-level performance impact with bounding boxes enabled for object-level clipping. This effect was measured by turning the feature on or off within the application on three different scenes of varying geometric complexity. (Scene 1 contains the greatest number of vertices and Scene 3 the least.) During our study on bounding box usage, we discovered that you can actually use too many bounding boxes around elements of a scene. The minimum number of vertices to include in a bounding box depends on many different factors and should be experimented with to determine what will work most effectively in any particular 3D software stack. Implementing render state sorting and object-level clipping in our application layer has the potential to significantly boost the performance of the optimized ArchGE engine. The object-level clipping shows a 16%-35% application-level performance increase, and it brings ArchGE closer to peak PentiumŪ III processor performance by at least that much. 7 Rasterization is the process by which a graphics controller converts geometry information into pixel position and color information on a computer monitor. 8 For more information regarding clipping primitives, please see [5] and [6]. |