DirectX 12, what does it change?
The release of Windows 10 marks the first time that the broad PC gaming public will have access to a low-level, cross-vendor graphics API. Ever since AMD first presented Mantle in 2013, there has been a lot of back-and-forth discussion about how significant the gains to be made by low-level APIs really are for games. Opinions range from considering them nothing less than a revolution in graphics processing to little more than an overblown marketing campaign. This article aims to provide a level-headed outlook on what exactly DirectX 12 will offer gamers, in which situations, and when we will see these gains.
To explain not just the what, but also the why of it, I’ll detail the tradeoffs involved in various API design decisions, and the historical growth that led to the current state of the art. This will get technical. Very technical. If you are primarily interested in knowing how these changes will affect you as a gamer, and what you can expect from an upgrade to Windows 10 now and in the near future, then skip forward to the final section, which touches on the important points without the deep dive.
What this article is not about is the handful of new graphics hardware pipeline features exposed in DirectX 12. Every new release of the API adds support for a smattering of hardware features, and the fact that you can, for instance, implement order-independent transparency more effectively on DX12 feature-level graphics hardware is orthogonal to the high-level/low-level API tradeoff discussion. This separation is underlined by the fact that these features are also being added to DX11.3 for developers who do not want to switch to DX12. The expectation is for such hardware features to become important in the years to come, but to have minimal impact on the first wave of Direct3D 12 games.
A brief history of DirectX and APIs
Before we get into the details of what changes in DirectX 12, some basics need to be established. First of all, what is a 3D API, really? When 3D accelerator hardware first appeared on the market, there was a need to provide an interface for programmers to leverage its capabilities—an Application Programming Interface, so to speak. While the vendor-specific Glide API developed by 3dfx Interactive managed to hold some ground for a while, it soon became obvious that continued growth of the market would necessitate a hardware-agnostic, abstract interface. This interface was to be provided by OpenGL and DirectX.
GPUs back then were rather simple devices compared to today’s. In the end, the API only really needed to let the developer render textured (and perhaps lit) triangles to the screen, because that is what the hardware did. Hardware capabilities have evolved rapidly ever since, and APIs have been gradually adjusted to keep up, adding a slew of features here and entire pipeline stages there. That is not to say that there haven’t been significant API-side changes before—the complete move away from immediate-mode geometry generation in both OpenGL and DirectX was very significant, for example.
Nonetheless, high-level APIs on PC still operate on the basic principle of protecting programmers from worrying about hardware-level details. This can be a convenient feature, but sometimes fully understanding what is going on in hardware—which is the central principle of a low-level API—can be crucial for performance.
Why switch to low-level APIs?
Low-level APIs are a large change to 3D programming, and their introduction requires major design and engineering efforts on the part of platform providers and hardware vendors, as well as game and middleware developers. For roughly two decades, high-level APIs were ‘good enough’ on the PC to push 3D rendering forward like no other platform has. So why is this effort being undertaken now? I believe the reason is a combination of several distinct developments which, taken together, now outweigh the effort required to implement this change.
The Hardware Side
One primary driver is certainly hardware development, both on the GPU side and, perhaps even more so, on the CPU side of things. It might seem counter-intuitive to think of CPU hardware changes causing an upheaval in graphics APIs, but since the API serves as a bridge between a program running on the CPU and the rendering power of the GPU, it is not really surprising.

As noted earlier, high-level APIs were seemingly good enough for about two decades of graphics development, starting in the early 90s. The chart above illustrates the floating-point performance of high-end desktop CPUs over that same period, and should give you an idea why the CPU performance implications and parallelization of graphics APIs were not at the forefront of their designers’ thoughts in the 90s and early 00s. Blue depicts parallel performance, while purely sequential performance is shown in orange. Until 2004 the two are in lockstep, but since then increases in sequential performance have slowed down considerably. Therefore, one important goal of changing up the API landscape is improving parallelization.
However, while the severe slowdown in sequential CPU performance growth is likely the most important driver of these API changes, modern GPUs are also fundamentally different from the devices we used in the 90s. They perform arbitrary calculations on a wide variety of data, create geometry on their own rather than just processing an input stream, and leverage parallelism on a staggering number of levels. All of these changes can be (and have been) bolted onto existing APIs, but not without some increasingly severe semantic mismatch.
The Software Side
Although the hardware side of the equation is arguably the most important, and certainly the most commonly publicized, there is another set of reasons to make a clean break with existing graphics APIs, and it’s entirely related to software. Every PC gamer should be familiar with this sequence of events: a new highly-anticipated AAA game is released, and at roughly the same time both relevant GPU vendors release new drivers “optimized” for the title. These might be significantly faster, or perhaps even necessary for the game to work at all.
This is not because game developers or driver engineers are incompetent. It’s due to the sheer size that modern high-level graphics APIs have reached after years of incremental changes, and the even more staggering complexity of graphics drivers, which need to somehow wrestle these abstract APIs into a stream of instructions the GPU hardware can understand and execute efficiently. Slimming down the driver’s responsibilities seems to be effective at improving reliability—developers familiar with both claim that DX12 drivers are already in far better shape than DX11 drivers were at a similar point in their development timeline, despite the greater changes imposed by the API.

Having such a monstrous driver code base is not just difficult for GPU vendors: it also makes it more challenging—sometimes even virtually impossible—for developers to diagnose bugs or performance issues on their own. Everything in the actual game code base can be investigated and reasoned about using conventional development tools, but only the hardware vendor can say for certain what goes on behind the API wall.
If the driver decides to spend a lot of time every few seconds rebuilding some data structure, introducing stutter, it might be almost impossible to reverse engineer which part of the game code—which is the only thing developers have control over—is responsible. As illustrated above, making the API and driver smaller and more lightweight by moving more responsibilities to the game code will allow developers to get a more complete picture of what is going on in all cases.
The Modern Game Development Ecosystem
Of course, moving all these responsibilities from the API and driver level to the game itself will increase the code size and development effort for the latter. In fact, a common argument in favor of higher-level interfaces in all areas of computing is shielding the programmer from dealing with all the complexities of each underlying system. However, the way most modern games are built works out in favor of low-level APIs in this regard.

Unlike in the 90s or early 00s, it’s very rare these days for a game to be based on a one-shot engine built directly on top of a given graphics API. Many large publishers have their own in-house engine teams dedicated purely to keeping their technology up to date, and even independent developers have plenty of high-quality, professionally maintained engines to choose from. As such, the increased programming complexity of low-level APIs will in many cases be absorbed by middleware developers—who have the resources and expertise to deal with these challenges—rather than hitting game developers directly.
Additionally, just because an API is not required to check the correctness of everything it is asked to do—which can be expensive in terms of performance—does not mean that it is not allowed to do so. All low-level APIs offer optional validation layers, which should help mitigate developers’ increased responsibilities. Nonetheless, tool support always lags behind the introduction of any new technology, and it will take some time for tools to catch up to where they are for established APIs.
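As an illustration, here is a minimal sketch of how a D3D12 application might opt into that optional validation layer during development. The debug-build guard and the function name are my own assumptions about a typical setup, not something the API requires.

```cpp
// Minimal sketch: enabling the optional D3D12 debug (validation) layer.
// It must be enabled before the device is created, and is usually compiled
// in only for debug builds, since the extra checking costs performance.
#include <d3d12.h>
#include <d3d12sdklayers.h>

void EnableValidationLayer()
{
#if defined(_DEBUG)
    ID3D12Debug* debugController = nullptr;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
    {
        debugController->EnableDebugLayer();
        debugController->Release();
    }
#endif
}
```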
How do low-level APIs work?
Clearly, there are significant incentives from both the hardware and the software perspective for a clean-slate, low-level approach to graphics API design, and the current engine ecosystem seems to allow for it. But what exactly does “low-level” entail? DirectX 12 brings a plethora of changes to the API landscape, but the primary efficiency gains to be expected stem from three central topics:
- Work creation and submission
- Pipeline state management
- Asynchronicity in memory and resource management
All these changes are connected to some extent, and work well together to minimize overhead, but for clarity we’ll look at each of them in turn, in isolation from the rest.
Work Creation and Submission
The ultimate goal of a graphics API is for the program using it to assemble a set of instructions for the GPU to follow, usually called a command list. In higher-level APIs this is a rather abstract process, while in DirectX 12 it is performed much more directly. The following figure compares the approaches of a ‘classical’ high-level API (DirectX 9), the first attempt at making parallelization possible in DirectX 11, and finally the more direct process in DirectX 12.

In the good old days of annually increasing processor frequencies, what Direct3D 9 offered was just fine: the application uses a single context to submit draw calls, which the API checks and forwards to the driver, which in turn generates a command list for the GPU to consume. All of this is a sequential process, so in the age of consumer multicore chips a first attempt was made to allow for parallelization.
It manifested in the Direct3D 11 concept of deferred contexts, which allow games to submit draw commands independently in multiple threads. However, all of them still ultimately need to pass through a single immediate context, and the driver can only complete the command list once this final pass has happened. Thus, the D3D11 approach allows for some degree of parallelism, but still causes a heavy load imbalance on the thread executing the tail end of the process.
Direct3D 12 cuts out the metaphorical middle man. Game code can directly generate an arbitrary number of command lists, in parallel. It also controls when these are submitted to the GPU, which can happen with far less overhead than in previous concepts as they are already much closer to a format the hardware can consume directly. This does come with additional responsibilities for the game code, such as ensuring the availability of all resources used in a given command list during the entirety of its execution, but should in turn allow for much better parallel load balancing.
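To make that pattern more concrete, here is a rough C++ sketch of the idea described above: several threads each record their own command list, and the main thread then submits them all in a single call. The device, queue, and per-thread allocators and lists are assumed to have been created beforehand, and the function and variable names are illustrative rather than part of any particular engine.

```cpp
// Sketch only: parallel command list recording followed by one submission.
#include <d3d12.h>
#include <thread>
#include <vector>

void RecordAndSubmit(ID3D12CommandQueue* queue,
                     std::vector<ID3D12CommandAllocator*>& allocators,
                     std::vector<ID3D12GraphicsCommandList*>& lists)
{
    std::vector<std::thread> workers;
    for (size_t i = 0; i < lists.size(); ++i)
    {
        workers.emplace_back([&, i]
        {
            allocators[i]->Reset();
            lists[i]->Reset(allocators[i], nullptr); // no initial pipeline state
            // ... record this thread's share of the scene's draw calls ...
            lists[i]->Close();
        });
    }
    for (auto& w : workers) w.join();

    // Submission is a single, comparatively cheap call; the lists are
    // already close to a format the hardware can consume.
    std::vector<ID3D12CommandList*> raw(lists.begin(), lists.end());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}
```

Note that the game code, not the driver, is responsible for keeping every resource referenced by these lists alive and resident until the GPU has finished executing them.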
Pipeline State Management
The graphics or rendering pipeline consists of several interconnected steps, such as pixel shading or rasterization, which work together to render a given 3D scene. Managing the state of various components in this pipeline is an important task in any game, as this state will directly influence how each drawing operation is performed and thus the final output image.
Most high-level APIs provide an abstract, logical view of this state, divided into several categories which can be independently set. This provides a convenient mental image of the rendering process, but does not necessarily lend itself to high draw call throughput with low overhead.

The figure above, while greatly simplified, illustrates the basic issue. Many independent high-level state descriptions can exist, and at the start of any draw call any subset of them may be active. As such, the driver has to build the actual hardware representation of this combination of states—and maybe even check its validity—before it can actually start performing the draw call. While caching may alleviate this issue, it means moving complexity into the driver and running the risk of unpredictable and hard-to-debug behavior from the application’s perspective, as discussed earlier.
In DX12, state information is instead gathered in pipeline state objects. These are immutable once constructed, and thus allow the driver to check them and build their hardware representation only once when they are created. Using them is then ideally a simple matter of just copying the relevant description directly to the right hardware memory location before starting to draw.
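As a rough illustration, the sketch below gathers a (heavily abbreviated) set of state into one description and hands it to the driver a single time, up front. The shaders, root signature, and render target formats are assumed inputs, and a real renderer would fill in considerably more fields.

```cpp
// Sketch only: building one immutable pipeline state object up front.
#include <d3d12.h>
#include <climits>

HRESULT CreateOpaquePipeline(ID3D12Device* device,
                             ID3D12RootSignature* rootSignature,
                             D3D12_SHADER_BYTECODE vs,
                             D3D12_SHADER_BYTECODE ps,
                             ID3D12PipelineState** outPso)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature = rootSignature;
    desc.VS = vs;   // vertex shader bytecode
    desc.PS = ps;   // pixel shader bytecode
    desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
    desc.BlendState.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;
    desc.DepthStencilState.DepthEnable = TRUE;
    desc.DepthStencilState.DepthFunc = D3D12_COMPARISON_FUNC_LESS;
    desc.DepthStencilState.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
    desc.SampleMask = UINT_MAX;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets = 1;
    desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.DSVFormat = DXGI_FORMAT_D32_FLOAT;
    desc.SampleDesc.Count = 1;

    // The driver validates this combination and builds its hardware
    // representation once, here, rather than at every draw call that uses it.
    return device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(outPso));
}
```

At draw time, switching to this state is then a single SetPipelineState call on the command list.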
Asynchronicity in Memory and Resource Management
Direct3D 12 will allow—and force—game developers to manage all resources used by their games much more directly than before. While higher-level APIs offer convenient views of resources such as textures, and may hide others, like the GPU-native representation of command lists or the storage for some types of state, from developers completely, in D3D12 all of these can and must be managed by the developers.
This not only means direct control over which resources reside in which memory at all times, but also makes game programmers responsible for ensuring that all data is where the GPU needs it to be when it is accessed. The GPU and CPU have always acted independently from each other in an asynchronous manner, but the potential problems (e.g. so-called pipeline hazards) arising from this asynchronicity were handled by the driver in higher-level APIs.

To give you a better idea why this can be a significant performance and resource drain, the figure above illustrates a simple example of such a hazard occurring. From the CPU side, the situation is clear-cut: a draw call X utilizing two resources A and B is performed, and later on resource B is modified. However, due to the asynchronous nature of CPU and GPU processing, the GPU may only start actually executing the draw call in question significantly after the game code on the CPU initially performed it, and after B has already changed. In these situations, drivers for high-level APIs could in the worst case be forced to create a full shadow copy of the required resources during the initial call. In D3D12, game developers are tasked with ensuring that such situations do not occur.
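One common way for game code to uphold that responsibility is to use a fence to find out when the GPU has actually finished with a resource before touching it again. The sketch below shows the basic idea; the names are illustrative, and a real engine would typically cycle through several buffered copies of such a resource rather than stalling the CPU like this.

```cpp
// Sketch only: waiting on a fence before reusing a resource the GPU reads.
#include <windows.h>
#include <d3d12.h>

void WaitBeforeReuse(ID3D12CommandQueue* queue,
                     ID3D12Fence* fence,
                     UINT64& fenceValue,
                     HANDLE fenceEvent)
{
    // Ask the GPU to raise the fence once all previously submitted work,
    // including the draw call X that reads resource B, has finished.
    const UINT64 valueToWaitFor = ++fenceValue;
    queue->Signal(fence, valueToWaitFor);

    // Only block if the GPU has not reached that point yet.
    if (fence->GetCompletedValue() < valueToWaitFor)
    {
        fence->SetEventOnCompletion(valueToWaitFor, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // From here on it is safe for the CPU to modify resource B in place.
}
```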
In Summary
Tying it all together, the changes discussed above should result in a lean API which has the potential to help developers build games that perform better and, perhaps even more importantly, perform more consistently.
Application-level control and assembly of command lists allows for well-balanced parallel work distribution and thus better utilization of multicore CPUs. Managing pipeline state in a way which more closely mirrors actual hardware requirements will reduce overall CPU load and in particular result in higher draw count throughput. And manual resource management, while complex, will allow game developers to fully understand all the data movement in their game. This insight should make it easier for them to craft a consistent experience and perhaps ultimately eliminate loading/streaming-related stutter completely.
What does this mean for gamers?
Given all the clear technical advantages outlined above, you might get the impression that current high-level APIs are a massive waste of resources, and that DirectX 12 and similar low-level APIs will immediately boost PC game performance to entirely new heights. While it is certainly possible to construct scenarios where that is exactly what will happen, in many common cases the benefits could be minor or even unnoticeable.
The most significant improvements to be expected from these changes are reduced CPU overhead and better parallelization. In turn, this means that performance in situations which were not CPU-limited with previous APIs will not change much. In many games and genres, particularly on PCs with fast CPUs, the most noticeable effect could well be a decrease in CPU usage and power consumption.
That is not to say there is nothing to get excited about from a gamer’s perspective. While the average cinematic third person shooter on a decent desktop CPU will remain mostly unaffected, there are plenty of use cases which should benefit.

- Detailed, highly interactive open world games or large-scale strategy titles with lots of moving parts are commonly CPU-limited in at least some scenarios, even on higher-end systems.
- High-framerate gaming is becoming more and more popular, and even if you have the requisite GPU power and monitor technology to push 144 FPS, in many games you will run into CPU overhead limitations long before that.
- An interesting and often overlooked case is emulation—particularly projects like Xenia which seek to emulate more recent systems should benefit greatly from lower-level access allowing them to more directly model the behavior of the emulated hardware.
- Somewhat similarly, it might benefit console porting efforts to have a target available on PC which is closer in terms of API abstraction to their original platform.
- Finally, rendering for virtual reality has extremely stringent frame latency and consistency requirements which will be better served by a low-level API.
In addition to these situations in which a clear overall performance benefit can be expected, performance consistency should improve in almost all games. As developer Dan Baker of Oxide Games put it when we talked about DX12, “We also see huge benefits in frame consistency, to the point where we believe it is possible to ‘never hitch’ in D3D12 if we are appropriately mastering the API.”
I consider that a very exciting prospect, especially given the stuttering issues which plague some recent high-profile games.
Conclusion
With DirectX 12 on Windows 10, low-level hardware-agnostic graphics access is now a reality on PC—and it won’t be long before it is joined by Vulkan, an alternative which isn’t bound to Microsoft’s platforms. With a set of significant changes to how programs interact with 3D hardware, these APIs offer an opportunity to create games which scale better to higher degrees of hardware parallelism, and offer more consistent performance. Reducing the size of the API and the responsibilities of drivers should also eliminate AAA titles’ reliance on day-one driver updates for compatibility and performance, and make the PC an even more stable and versatile gaming platform overall. And to top it off, the vast majority of these advantages are independent of new hardware features and can be leveraged with most existing DX11 GPUs.
Does this mean that you should immediately drop everything and upgrade to Windows 10, expecting large improvements for gaming? Not really. The first games offering DX12 support are expected to arrive during the holiday season, and those will very likely still be built with support for high-level APIs in mind as well. Furthermore, how the performance impact of low-level APIs truly plays out will always depend on many factors, including CPU performance, the type of workload a game features, and the target framerate.
However, what DX12 and other low-level graphics APIs provide most of all is a long-term perspective of increased performance, better support for current and future hardware trends, more predictable and consistent frame delivery, and more solid software engineering in drivers and games alike.
This article benefited greatly from information provided by Dan Baker of Oxide Games, who has worked with low-level APIs on PC since Mantle was created, and is an early adopter of DX12. If you spot any technical errors, they are most certainly mine and not his.
Source: PC Gamer