Amid all the fanfare and hype at NVIDIA’s virtual GPU Technology Conference (GTC) this week, the California-based AI and gaming platform powerhouse finally announced its next-generation PC gaming graphics cards, based on its Ada Lovelace GPU architecture. Named after the English mathematician and computing pioneer, NVIDIA’s Lovelace is indeed a beast of silicon with a more-of-everything design approach, built on TSMC’s state-of-the-art 4N chip fabrication process. However, the underlying chip architecture also brings new innovations in its various silicon engines, in an effort to scale performance beyond the constraints of Moore’s law, which seems to be reaching a point of diminishing returns in transistor density with each new fab node.
GeForce RTX 4090 and 4080 Brute-Force Silicon Enhancements
Indeed, there is little doubt that NVIDIA’s Lovelace GPU is much more powerful than the previous-generation Ampere architecture. The new GeForce RTX 4090 has 16,384 CUDA cores and 24GB of GDDR6X memory, versus 10,752 CUDA cores (and the same memory) in an RTX 3090. The new GeForce RTX 4080 12GB, however, has 7,680 CUDA cores versus 8,960 in an RTX 3080, while the RTX 4080 16GB card has 9,728 cores, which is fewer than the RTX 3080 Ti’s 10,240 CUDA cores. These RTX 4080 series specs and model branding might be a sticking point for those counting cores alone, but performance scaling just isn’t linear here, especially when you consider that the new GeForce RTX 40 series cards have boost clocks north of 2.5GHz, while the previous generation came in at 1.75GHz.
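To see why fewer CUDA cores at much higher clocks can still mean more raw horsepower, here is a back-of-the-envelope sketch of peak FP32 throughput. It assumes each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock, and it plugs in the round boost clocks cited above (about 2.5GHz for RTX 40 and 1.75GHz for the prior generation) rather than each card’s exact spec; the helper name `peak_fp32_tflops` is purely illustrative. Peak math throughput ignores memory bandwidth, caches and the new RT and Tensor hardware, so treat it as a rough upper bound, not a benchmark.

```python
# Back-of-the-envelope peak FP32 throughput, assuming one fused multiply-add
# (2 floating-point ops) per CUDA core per clock. Illustrative only.

def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Peak FP32 TFLOPS = cores x 2 ops/clock x clock (GHz) / 1000."""
    return cuda_cores * 2 * boost_ghz / 1000.0

# Core counts are from the article; clocks are the article's round figures,
# assumed for every card in each generation (not exact per-card specs).
RTX40_GHZ = 2.5
RTX30_GHZ = 1.75

matchups = [
    ("RTX 4090", 16_384, "RTX 3090", 10_752),
    ("RTX 4080 16GB", 9_728, "RTX 3080 Ti", 10_240),
    ("RTX 4080 12GB", 7_680, "RTX 3080", 8_960),
]

for new_name, new_cores, old_name, old_cores in matchups:
    new_tf = peak_fp32_tflops(new_cores, RTX40_GHZ)
    old_tf = peak_fp32_tflops(old_cores, RTX30_GHZ)
    core_ratio = new_cores / old_cores
    print(f"{new_name} vs {old_name}: cores x{core_ratio:.2f}, "
          f"peak FP32 {new_tf:.1f} vs {old_tf:.1f} TFLOPS (x{new_tf / old_tf:.2f})")
```

Under these assumed clocks, even the two RTX 4080 models, which have fewer cores than the cards they are compared against, come out ahead on paper (roughly 1.36x and 1.22x), which is why core counts alone don’t tell the whole story.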
In addition to these core counts, speeds and feeds, NVIDIA is pointing to several new improvements and innovations in Ada Lovelace for performance gains, which should ultimately lead to new levels of image fidelity and immersion for gamers. The most important of these are the new Ray Tracing core innovations, as well as 4th Gen Tensor cores now claimed to deliver more than 2x the TFLOP throughput. In addition, Lovelace will also support AV1 video encoding/decoding in hardware, much like Intel’s Arc series, which should be a boon to game streaming performance, with much lower overhead, at some point in the future.
Shader Execution Reordering Innovation Improves Ray Tracing Performance and Delivers Better RT Effects
Ray tracing (RT) is a graphics rendering technique for lighting and reflection effects that delivers much higher and more accurate visual fidelity than traditional rasterization, although it also carries a much higher computational overhead. Before the advent of ray tracing, traditional rasterization was a very orderly, deterministic process. RT doesn’t afford this natural coherence, since rays scatter in different directions and hit different materials, so parts of a 3D-rendered scene can’t be processed efficiently in parallel, resulting in pipeline stalls.
This problem severely limits ray-traced effects in modern game engines. However, NVIDIA’s Ada Lovelace GPU architecture supports a new technique called Shader Execution Reordering (SER), which adds a stage to the RT pipeline that batches and reorders work so that rays executing the same shader program can be processed together more efficiently (see image above).
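As a rough illustration of why reordering helps, the toy model below (a CPU-side sketch, not NVIDIA’s API or its hardware behavior) tags each ray with the ID of the shader it hits and assumes a 32-thread warp must run once for every distinct shader among its rays. The cost model and the names `warp_passes` and `reorder_by_shader` are hypothetical simplifications; real SER is performed by the GPU using hints supplied by the developer.

```python
import random

WARP_SIZE = 32  # threads scheduled together on NVIDIA GPUs

def warp_passes(shader_ids: list[int]) -> int:
    """Toy cost model (assumption): a warp serializes once per distinct
    shader among its rays, so cost = sum of distinct shaders per warp."""
    total = 0
    for start in range(0, len(shader_ids), WARP_SIZE):
        warp = shader_ids[start:start + WARP_SIZE]
        total += len(set(warp))
    return total

def reorder_by_shader(shader_ids: list[int]) -> list[int]:
    """Hypothetical stand-in for SER: group rays that run the same shader."""
    return sorted(shader_ids)

# 1,024 secondary rays scattering onto 8 different materials at random.
rng = random.Random(42)
rays = [rng.randrange(8) for _ in range(1024)]

before = warp_passes(rays)
after = warp_passes(reorder_by_shader(rays))
print(f"warp passes before reordering: {before}, after: {after}")
```

With 32 warps and 8 shaders, the reordered rays need at most 32 + 7 = 39 passes, while the random order typically needs close to 8 passes per warp. Real GPUs are far more complicated than this model, so it only shows the direction of the effect, not its size.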
NVIDIA claims that SER can deliver up to a 2x improvement in RT rendering performance, specifically highlighting a new version of Cyberpunk 2077 with even higher levels of RT effects, including an Overdrive mode with 635 RT operations per pixel (above) for stunning visuals. Finally, it should be noted that game developers need to supply hints about how best to sort and optimize their RT workloads, so NVIDIA is making an API available for developers to tune their game engines and rendering techniques for this feature.
DLSS 3 image upscaling and the AI supercomputer behind your gaming experience
NVIDIA’s DLSS, or Deep Learning Super Sampling, technology is a performance recovery technique that has delivered nice performance gains for gamers who want to either dial up visual fidelity with ray tracing or increase FPS (frames per second) for higher-resolution gameplay on GeForce cards. The technology uses machine learning models pre-trained in NVIDIA’s data centers to reconstruct higher-resolution frames, which lets the rest of the graphics pipeline run at a lower resolution for higher performance and lower latency, but with image quality similar to the native higher-resolution image. While AMD and Intel also have competing upscaling techniques (FSR and XeSS), DLSS is now in its third iteration and has been well received and implemented by game developers, with 200 game titles and apps currently using the technology.
Where NVIDIA’s new DLSS 3 (supported only on RTX 40 series cards) differs from the previous-generation DLSS 2 is that AI processing has become so fast on NVIDIA’s new architecture that the GPU can now generate entire frames in real time for much higher performance, while maintaining excellent image quality. NVIDIA gave examples in which DLSS 3 uses AI to generate every other frame in a sequence and, with upscaling and frame generation combined, 7 out of every 8 pixels displayed.
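The “7 out of 8 pixels” figure falls out of simple arithmetic once you assume DLSS Super Resolution’s Performance mode, which renders at half the output width and half the output height (for example, 1080p rendered for a 4K display), plus frame generation producing every other displayed frame. Both settings are assumptions for illustration, and the function name `rendered_pixel_fraction` is hypothetical; higher-quality upscaling modes render a larger share of pixels.

```python
from fractions import Fraction

def rendered_pixel_fraction(axis_scale: Fraction, rendered_frame_share: Fraction) -> Fraction:
    """Share of displayed pixels the GPU renders the traditional way.

    axis_scale: render resolution / output resolution along each axis.
    rendered_frame_share: share of displayed frames that are rendered
    (the remainder are AI-generated).
    """
    per_frame = axis_scale * axis_scale  # upscaling: fewer pixels per rendered frame
    return per_frame * rendered_frame_share  # frame generation: fewer rendered frames

# Assumed: Performance mode (1/2 per axis) plus every other frame generated.
rendered = rendered_pixel_fraction(Fraction(1, 2), Fraction(1, 2))
ai_generated = 1 - rendered
print(f"rendered: {rendered}, AI-generated: {ai_generated}")  # rendered: 1/8, AI-generated: 7/8
```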
Without going too deep into the weeds, GeForce RTX 40 cards achieve this in part thanks to a much faster optical flow accelerator that calculates how pixels move through a scene. This accelerator captures how lighting and shadows shift as objects move, then feeds all that information into the chip’s Tensor AI engines (diagram above), which decide how best to generate the new frames. Through this frame generation technique, the technology can also help improve performance in game engines that tend to be more CPU-bound.
NVIDIA showed Cyberpunk and Microsoft Flight Simulator demos with the technology, delivering impressive 2x performance improvements with great visual fidelity. NVIDIA will also offer a streamlined DLSS 3 AI plugin for easier integration by game developers, and both the Unity and Unreal game engines will natively support the technology as well. Furthermore, the company noted that, in addition to Cyberpunk and MS Flight Sim, 35 game titles will support DLSS 3 at launch, with more to come, in what NVIDIA claims is the fastest-ever uptake of any of its technologies.
GeForce RTX 4090 and RTX 4080 Performance Expectations and Takeaways
Forgive the chart below, which is a bit of an eye chart here in the Forbes engine. Anyway, NVIDIA was pretty straightforward about performance expectations for the new GeForce RTX 40 series, with the $899 GeForce RTX 4080 matching or sometimes exceeding the previous-generation GeForce RTX 3090 Ti, which carried a suggested retail price of $1,999 at launch. The company also showed much higher performance levels in next-gen games that support DLSS 3, such as Cyberpunk 2077.
As you can vaguely see in the chart above, GeForce RTX 4090 and 4080 series cards can perform up to 2x to 4x faster (RTX 4090) than the mighty GeForce RTX 3090 Ti. However, the chart above shows only DLSS-enabled comparisons: DLSS (left, up to 2x) and DLSS 3 (right, up to 4x). It’ll be very interesting to see how performance shakes out when DLSS is disabled in traditional gameplay, although you could argue there’s little reason to disable it as long as a game supports the technology.
NVIDIA’s initial family of GeForce RTX 40 cards is listed above with their respective price points. Speeds, feeds and configurations aside, the company has rolled out an extremely powerful product lineup here, which it claims will deliver a big performance-per-dollar lift over its previous generation, averaging 3x for its RTX 4080 cards and 4x for its RTX 4090. It’s important to note that these performance claims are made with the new DLSS 3 technology in play, so again, it’ll be interesting to see how performance shakes out across the board, with DLSS on and off, in both ray-traced games and traditional rasterized gaming workloads.
Thoughts on Ada Lovelace and the Future of Gaming from NVIDIA’s CEO
Last but certainly not least, I had the chance to meet NVIDIA CEO Jensen Huang at a conference this week, and I asked him how beneficial the switch from Samsung’s 8N chip manufacturing process to TSMC 4N was for this generation. Jensen noted that his design team got “about 15%” from the process improvement alone, while the rest of the RTX 40 series’ performance gains come from silicon innovations such as SER (Shader Execution Reordering) and DLSS. Huang noted that while TSMC’s 4N process is much more advanced, “Unfortunately, costs are increasing by more than 15%,” and that scaling up transistor density alone isn’t enough and no longer gets the job done, because “Moore’s law is dead.” Furthermore, Jensen noted, “and it’s not because TSMC is trying to make more profit. That’s just not true. Their costs have gone up. You can see that their cycle time has increased as the number of steps of the process has increased.”
Huang further explained: “The way we solved it, Dave, with Ada is architecture. The compound advantage of different architectures and the great lever, the giant lever, was artificial intelligence and tensor cores. That’s the giant lever… And so I think we have to overcome the weakness that we’re at the end of Moore’s law, not by giving up, but by coming up with much smarter techniques, and thank goodness artificial intelligence just came on time.”
You have to admire Jensen’s passion for the company, its products and the burgeoning field of AI. There is no doubt that artificial intelligence is a “great lever,” as Huang points out. AI is becoming ubiquitous in so many tech areas, and using it to drive higher-fidelity graphics for PC gaming is, to be sure, a natural evolution.