
New in the RDNA 3 architecture and the introduction of the Radeon RX 7900 XTX and RX 7900 XT

Last night AMD unveiled the new Radeon RX 7000 series generation, built on the new RDNA 3 architecture. We now have more information about the architectural changes, the first performance figures with and without ray tracing, detailed specifications and prices. What is missing is a performance comparison with the competition, but even there we can draw at least some conclusions.

The first two models of the next Radeon generation with the RDNA 3 architecture, which AMD describes as “the most advanced gaming graphics in the world”, will launch on December 13. AMD presented them yesterday evening, shortly after nine o'clock.

The more powerful model, the RX 7900 XTX, is expected to deliver up to 70% higher performance at 4K resolution than the previous generation's flagship, the RX 6950 XT.

Probably the most complete coverage so far, including the full presentation, was published by AnandTech; you can also find the presentation in the gallery at the end of this article. The AnandTech article also contains some details that were not mentioned in the presentation on YouTube, so we will use it as the primary source. There was a lot of information and not much time, so I will probably keep adding to this article. You can find more information directly on the AMD website, where the RX 7900 XTX and RX 7900 XT product pages, and even a short piece on the RDNA 3 architecture, appeared shortly after the presentation.

Navi 31 is the first gaming graphics chip to use a chiplet design. The chip is divided into two types of dies: a Graphics Compute Die (GCD) manufactured on TSMC’s 5nm process and several Memory Cache Dies (MCD) manufactured on TSMC’s 6nm process.

The entire graphics chip, including the chiplets, has 58 billion transistors (Navi 21 had 26.8 billion). AMD does not state the transistor count of the individual dies. The GCD itself has an area of 300 mm².

It contains the compute units, auxiliary units, display engine and multimedia engine, practically everything that was on the chip in the last generation. The Infinity Cache and the memory controllers have been moved out.

Chips with the RDNA 3 architecture can be equipped with different numbers of MCD chiplets. Each of the six MCDs contains a slice of second-generation Infinity Cache (L3 cache) and a 64-bit (or 2x 32-bit) GDDR6 memory controller, and has an area of 37 mm². AMD does not specify the number of transistors per chiplet.

AMD's marketing is trying to sell the chiplets in RDNA 3 as the same kind of revolution that Ryzen brought a few years ago, but I can’t help feeling that in this form it is not the same. In the case of Ryzen, the core part of the processor (the compute cores) was moved into the chiplets, while the Radeon chiplets hold the Infinity Cache, i.e. a buffer memory, and the memory controllers.

Chiplets with memory next to the GPU are not such a revolutionary thing for Radeons either. AMD already tried them with the Radeon Fury and Vega, which had “chiplets” with HBM and HBM2 memory placed next to the GPU. The difference is that in the R9 Fury and RX Vega the chiplets served directly as the card’s video memory, whereas in the Radeon RX 7900 they have significantly less capacity, are connected by a much faster interface and work as the Infinity Cache. The communication speed with the chiplets and the overall throughput have therefore changed significantly. The chiplets also contain the memory controllers; the role of video memory is played by conventional GDDR6, with which the chip communicates through those controllers.

Where the RDNA 3 chiplets do resemble the “Zen” moment in processors is that moving the Infinity Cache out of the GPU lets the monolith be cut into smaller pieces, which improves yield and makes production cheaper. It also allows different parts of the chip to be produced on different manufacturing processes. But even here I can’t help thinking that the savings will not be as dramatic in this generation. The GPU die itself should have an area of only 300 mm² and is accompanied by six cheaper chiplets of 37 mm² each. Compared to the RTX 4090, whose AD102 chip has an area of 608.5 mm², this is a huge difference, but everyone is positioning the new Radeons against the GeForce RTX 4080, whose AD103 chip is supposed to have an area of 379 mm². That chip is more expensive to manufacture, but in exchange its packaging will be simpler and cheaper.
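The area comparison above can be tallied in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in this article; the constant and function names are illustrative, and adding up die areas says nothing definite about real cost, which also depends on yield, wafer pricing per process and packaging.

```python
# Die areas in mm², as quoted in the article.
GCD_AREA = 300.0      # Navi 31 graphics compute die (5nm)
MCD_AREA = 37.0       # one memory cache die (6nm)
MCD_COUNT = 6
AD102_AREA = 608.5    # chip used in the GeForce RTX 4090
AD103_AREA = 379.0    # chip used in the GeForce RTX 4080


def navi31_total_area(mcd_count: int = MCD_COUNT) -> float:
    """Total silicon area of the GCD plus the given number of MCDs."""
    return GCD_AREA + mcd_count * MCD_AREA


total = navi31_total_area()
print(f"Navi 31 total silicon: {total:.1f} mm²")
print(f"  share on the 5nm process: {GCD_AREA / total:.0%}")
print(f"  relative to AD102: {total / AD102_AREA:.2f}x")
print(f"  relative to AD103: {total / AD103_AREA:.2f}x")
```

The six MCDs together add 222 mm², so the package carries about 522 mm² of silicon in total, of which only the 300 mm² GCD needs the more expensive 5nm process.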

In the full configuration, the RX 7900 XTX has all six MCDs active. The RX 7900 XT has five active MCDs; the sixth is non-functional and acts as a spacer. Manufacturing the MCDs on the more advanced 5nm process would have brought minimal benefit, so AMD decided to use the cheaper 6nm process for them.

The RX 7900 XTX with six active MCDs offers a 384-bit GDDR6 memory bus along with 96MB of L3 cache.

With five active MCDs, the RX 7900 XT gets a 320-bit GDDR6 bus and 80MB of L3 cache. Even in the more powerful model, the Infinity Cache is smaller than in the RX 6800 XT and higher Navi 21 models, which had 128MB of cache. The reduction in capacity is said to be made possible by improvements in the reuse of cached data, but AMD has not provided further details.
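The relationship between the number of active MCDs, the bus width and the cache size can be sketched as follows. The 64-bit controller per MCD comes from the article; the 16MB of cache per MCD is not stated directly and is derived here by dividing 96MB by six, so treat it as an assumption. The class and constant names are made up for illustration.

```python
from dataclasses import dataclass

BUS_BITS_PER_MCD = 64   # one 64-bit GDDR6 controller per MCD (from the article)
CACHE_MB_PER_MCD = 16   # assumption: derived as 96 MB / 6 MCDs


@dataclass(frozen=True)
class MemoryConfig:
    """Memory subsystem of a card as a function of its active MCD count."""
    active_mcds: int

    @property
    def bus_width_bits(self) -> int:
        return self.active_mcds * BUS_BITS_PER_MCD

    @property
    def infinity_cache_mb(self) -> int:
        return self.active_mcds * CACHE_MB_PER_MCD


for name, mcds in (("RX 7900 XTX", 6), ("RX 7900 XT", 5)):
    cfg = MemoryConfig(mcds)
    print(f"{name}: {cfg.bus_width_bits}-bit bus, {cfg.infinity_cache_mb} MB Infinity Cache")
```

Under these assumptions, six MCDs give the 384-bit bus and 96MB of cache quoted for the RX 7900 XTX, and five give the 320-bit bus and 80MB of the RX 7900 XT.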

To achieve the necessary throughput, AMD used the Elevated Fanout Bridge (EFB) packaging technology first used in the MI200 series accelerators (CDNA 2). There it connected the GPU and the HBM2e memory; in RDNA 3 it connects the MCDs to the GCD. The total throughput between the MCDs and the GCD is 5.3 TB/s.

RDNA 3 inherits a number of elements from RDNA 2 and RDNA, but it also brings several localized yet substantial changes.

The most significant change in the chip is the redesign of the ALUs. AMD doubled the number of ALUs (stream processors) in a Compute Unit (CU), from 64 ALUs in a Dual Compute Unit to 128. It did not achieve this by physically duplicating them, but by allowing them to dual-issue instructions. Each SIMD lane can then execute up to two instructions per cycle. This is not always possible, however, only for instructions that can be executed in parallel. If the instructions cannot run in parallel, the ALU works in the classic single-issue way.

In practice, this means that the ALUs cannot always be fully utilized, and the theoretical FLOPS figure declared for RDNA 3 cannot be directly compared with the FLOPS of RDNA 2. On paper, the FP32 compute throughput of the 7900 XTX is 2.6x that of its predecessor, but the real gain is lower, which is why AMD claims a performance increase of 1.7x.
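Why dual-issue helps only some of the time can be shown with a toy model. The sketch below is not AMD's actual pairing logic, which the article does not describe; it assumes a simplified rule in which two adjacent instructions may issue in the same cycle only if the second neither reads nor overwrites the result of the first. All names are hypothetical.

```python
from typing import List, Tuple

# (destination register, source registers): a deliberately simplified instruction
Instr = Tuple[str, Tuple[str, ...]]


def can_pair(first: Instr, second: Instr) -> bool:
    """Toy rule: the second instruction must not read or overwrite the first's result."""
    dest1, _ = first
    dest2, srcs2 = second
    return dest1 not in srcs2 and dest1 != dest2


def cycles_needed(stream: List[Instr], dual_issue: bool) -> int:
    """Count issue cycles, greedily pairing adjacent instructions when allowed."""
    cycles = 0
    i = 0
    while i < len(stream):
        if dual_issue and i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            i += 2  # both instructions issue in the same cycle
        else:
            i += 1  # fall back to classic single-issue
        cycles += 1
    return cycles


independent = [("a", ("x", "y")), ("b", ("z", "w")), ("c", ("x", "z")), ("d", ("y", "w"))]
dependent = [("a", ("x", "y")), ("b", ("a", "z")), ("c", ("b", "w")), ("d", ("c", "x"))]

for name, stream in (("independent", independent), ("dependent chain", dependent)):
    single = cycles_needed(stream, dual_issue=False)
    dual = cycles_needed(stream, dual_issue=True)
    print(f"{name}: {single} cycles single-issue, {dual} with dual-issue ({single / dual:.1f}x)")
```

In this model the four independent instructions finish in two cycles instead of four, while the dependent chain gains nothing. Real shader code falls somewhere in between, which is consistent with the measured speedup being lower than the theoretical FLOPS ratio.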

Because of this change, there will probably be some confusion about the reported number of stream processors until the nomenclature settles. In its specifications, AMD lists 6144 stream processors for the RX 7900 XTX and 5376 stream processors for the RX 7900 XT, but in some materials you will find double those values.
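The two ways of counting, and their effect on the theoretical FP32 figure, can be illustrated with a short calculation. It uses the usual convention that one fused multiply-add counts as two floating-point operations, and it assumes the 2.3 GHz shader clock quoted later in this article. Vendor peak figures are typically quoted at a boost clock, which is not given here, so the result is not expected to match AMD's official number. The function name is made up for illustration.

```python
def peak_fp32_tflops(stream_processors: int, clock_ghz: float, dual_issue: bool) -> float:
    """Peak FP32 rate: lanes x 2 FLOPs per fused multiply-add x issue width x clock."""
    flops_per_lane_per_cycle = 2 * (2 if dual_issue else 1)
    gflops = stream_processors * flops_per_lane_per_cycle * clock_ghz
    return gflops / 1000.0


# Stream processor counts from AMD's specifications, as listed in the article.
SPECS = {"RX 7900 XTX": 6144, "RX 7900 XT": 5376}

for card, sps in SPECS.items():
    print(f"{card}: {sps} stream processors, or {2 * sps} if dual-issue is counted")

# Assumption: 2.3 GHz shader (game) clock of the RX 7900 XTX, taken from the article.
XTX_CLOCK_GHZ = 2.3
for dual in (False, True):
    tflops = peak_fp32_tflops(SPECS["RX 7900 XTX"], XTX_CLOCK_GHZ, dual_issue=dual)
    print(f"RX 7900 XTX, dual_issue={dual}: {tflops:.1f} TFLOPS peak FP32")
```

The doubled values that appear in some materials are 12288 and 10752. Whether the peak figure is computed with or without dual-issue changes it by a factor of two, which is exactly why it is hard to compare with RDNA 2.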

Another significant change is the addition of separate acceleration units for AI computation. AMD hasn’t revealed much about them yet. Each RDNA 3 CU is said to contain two AI accelerators with support for new instructions, thanks to which the RX 7900 XTX should perform bfloat16 calculations 2.7 times faster than the RX 6950 XT.

AMD may use them in the future in a way similar to how Nvidia uses its units for DLSS, but it is not yet known whether the developers have any such plans (and it would also mean that new FSR iterations would be limited to the Radeon 7000 series). For now, they will matter more in professional use.

Another change is the decoupling of the shader clock from the front-end clock. AMD says this is meant to improve efficiency. The shaders run at 2.3 GHz (which is the claimed game clock), while the chip’s front end runs at a higher 2.5 GHz. The shaders occupy a significantly larger area of the chip than the front end, so a slightly lower shader clock has a positive effect on the power consumption of the entire chip. At the same time, it should help keep the performance of the two parts better balanced, so that neither holds the other back.
