
In Theory: Can a four teraflop GPU cut it for a next-gen console?


A next generation console with just 4 teraflops of GPU power? Well, that’s the rumour. While Microsoft teases and tempts us with the 12TF behemoth that is Xbox Series X, rumours persist that a second box is in development, designed to hit the market at a much lower price-point, undercutting PlayStation 5 while still being able to play each and every next-gen Xbox game. Lockhart is its codename and I find the basic concept behind its creation absolutely fascinating.

While I suspect that there are important nuances in the design that have yet to be revealed, it’s safe to assume that the basic premise rests on the theory that graphics are far more scalable than any other component of a game, with the idea being that Series X targets 4K while Lockhart aims for 1440p instead. This is borne out by the various spec leaks we’ve seen, which paint a picture of a console that has far more commonalities with Series X than it has differences. Leaks suggest that Lockhart has the same eight-core/16-thread CPU cluster as the Series X (CPU clocks may be very slightly different) while it still uses an NVMe-based solid state storage solution. As it’s designed to run at lower native resolutions than Series X, we should also expect a lower provision of GDDR6 memory: 12GB vs the more capable machine’s 16GB seems likely.

However, it’s the pared back GPU that presents the biggest marketing challenge for Microsoft. In a world where Xbox One X hit the market with six teraflops back in 2017, how can a 4TF machine possibly cut it for next-gen? I suspect that this is all about a combination of AMD’s Navi architectural improvements, which see a lot more ‘performance for your teraflop’, likely combined with more modern GPU features that the current console GCN architectures simply don’t have. The architectural side of the equation is a matter of record already. Back in October last year, we put together an AMD build with 9.2 teraflops of Navi GPU power and found that we got over 80 per cent more performance from just 53 per cent more compute.
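To put a number on ‘performance for your teraflop’, here is a minimal sketch of the arithmetic. It assumes the 53 per cent compute figure is measured against a 6.0TF GCN baseline (the Xbox One X figure quoted above), and it treats "over 80 per cent" as a lower bound of exactly 1.80x, so the result is a floor rather than a measurement.

```python
# Assumption: the 9.2TF Navi build is compared against a 6.0TF GCN baseline.
navi_tf = 9.2
gcn_tf = 6.0

# Relative compute: how many times more teraflops the Navi build has.
compute_gain = navi_tf / gcn_tf

# "Over 80 per cent more performance" taken as a lower bound of 1.80x.
perf_gain_lower_bound = 1.80

# Performance delivered per teraflop, relative to GCN.
perf_per_tf = perf_gain_lower_bound / compute_gain

print(f"Compute gain: {compute_gain:.2f}x")
print(f"Performance per teraflop vs GCN: at least {perf_per_tf:.2f}x")
```

On these figures, each Navi teraflop delivers at least around 17 per cent more performance than a GCN teraflop in that test.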

A Digital Foundry video where you can see all of this theory-crafting and benchmarking play out before your very eyes.

I thought it would be interesting to see how the results would adjust if this time we stacked a circa four teraflop Navi graphics card up against both Xbox One X and PC-based GCN products tuned to deliver 6TF of compute performance. I used a Radeon RX 5500 XT, downclocked from its ‘real life’ game clock-based 4.8TF to around 4.3TF. The RX 5500 XT features 22 active compute units from 24 total – and my gut feeling (and nothing more!) is that an actual Lockhart machine would likely feature increased clock speeds and fewer CUs, necessitating a slight rebalancing on the part of my surrogate PC.

Even then, I think that our Navi-based GPU may be falling short of a prospective Lockhart spec. The RX 5500 XT only has a 128-bit interface with 224GB/s of memory bandwidth connected to four or eight gigabytes of GDDR6. If Lockhart ships with 12GB of memory, the most cost-effective solution is likely a 192-bit bus, with something in the region of 288GB/s of bandwidth.
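The bandwidth figures follow from bus width and per-pin data rate. The sketch below works backwards from the numbers in the text to the per-pin rate each one implies. The function names are illustrative, and the resulting per-pin rates are derived here rather than stated in the article.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: (bus width in bytes) x (per-pin rate in Gbit/s)."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_data_rate_gbps(bus_width_bits: int, bandwidth: float) -> float:
    """Per-pin data rate in Gbit/s implied by a bus width and a bandwidth in GB/s."""
    return bandwidth / (bus_width_bits / 8)


# RX 5500 XT as described: 128-bit bus, 224GB/s.
rx5500xt_rate = implied_data_rate_gbps(128, 224)

# Speculative Lockhart configuration from the text: 192-bit bus, 288GB/s.
lockhart_rate = implied_data_rate_gbps(192, 288)

print(f"RX 5500 XT implied per-pin rate: {rx5500xt_rate:.0f}Gbps")
print(f"192-bit / 288GB/s implied per-pin rate: {lockhart_rate:.0f}Gbps")
```

In other words, the 288GB/s guess corresponds to a wider bus running slower memory than the RX 5500 XT uses, which fits the idea of a cost-effective configuration.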

There’s plenty of guesswork here, but the first question I wanted to address was whether a circa 4TF Navi solution could compete with 6TF GCN parts. Xbox One X was an obvious comparison point, but I also threw in an underclocked RX 590 and an overclocked R9 390, which really didn’t like being pushed to 1172MHz, but was just about stable enough for our results. There are plenty of variables in our tests here, so this table should sum everything up – and yes, while the RX 590 was underclocked, I did overclock the memory as 288GB/s seems like a good fit for Lockhart and the 590 could easily accommodate this. I’ve also included a tuned Navi-based Radeon RX 5700 XT delivering 9.2 teraflops of compute, a ballpark equivalent to leaked PS5 specs.

| | Radeon RX 5500 XT Tuned | Radeon RX 5700 XT Tuned | Radeon RX 590 Tuned | Radeon R9 390 Tuned | Xbox One X |
|---|---|---|---|---|---|
| Compute Units | 22 | 40 | 36 | 40 | 40 |
| GPU Compute Power | 4.3TF | 9.2TF | 6.0TF | 6.0TF | 6.0TF |
| Core Clock | 1550MHz | 1800MHz | 1310MHz | 1172MHz | 1172MHz |
| Memory Interface | 128-bit | 256-bit | 256-bit | 512-bit | 384-bit |
| Memory Bandwidth | 224GB/s | 448GB/s | 288GB/s | 384GB/s | 326GB/s |
| Die Size | 158mm² | 251mm² | 232mm² | 438mm² | 360mm² |
| Process | 7nm TSMC | 7nm TSMC | 12/14nm GlobalFoundries | 28nm TSMC | 16nmFF TSMC |
| Silicon Release Date | 2019 | 2019 | 2016 | 2013 | 2017 |
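The compute figures in the table can be sanity-checked from the compute unit counts and core clocks. This sketch uses the standard peak FP32 calculation for AMD's GCN and Navi parts (64 shaders per compute unit, with a fused multiply-add counted as two operations per clock). The dictionary and function names are illustrative only.

```python
SHADERS_PER_CU = 64   # stream processors per compute unit on GCN and Navi
OPS_PER_CLOCK = 2     # one fused multiply-add counts as two floating-point ops


def peak_teraflops(compute_units: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in teraflops."""
    ops_per_second = compute_units * SHADERS_PER_CU * OPS_PER_CLOCK * clock_mhz * 1e6
    return ops_per_second / 1e12


# (compute units, core clock in MHz) as listed in the table above
kit = {
    "RX 5500 XT (tuned)": (22, 1550),
    "RX 5700 XT (tuned)": (40, 1800),
    "RX 590 (tuned)": (36, 1310),
    "R9 390 (tuned)": (40, 1172),
    "Xbox One X": (40, 1172),
}

for name, (cus, mhz) in kit.items():
    print(f"{name}: {peak_teraflops(cus, mhz):.2f}TF")
```

Each result lands within about 0.1TF of the table's quoted figure, so the teraflop labels used throughout are rounded versions of these theoretical peaks.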

With all of our prospective hardware lined up, let’s start off with a comparison point that features as much of our kit as possible. But first of all, let’s stress that these are not supposed to be anything like next-gen gameplay tests – we should expect GPU utilisation to change significantly in the next wave of games. What we’re trying to do here is stress-test the GPUs with similar, sensible workloads. To that end, Io Interactive shared with us its PC equivalent quality settings for the Xbox One X version of Hitman 2, which runs at 2160p resolution. I benched all of our 6.0TF GPUs and the lower-end Navi across 1080p, 1440p and 2160p resolutions – and I also included the UHD result of our tuned RX 5700 XT as well. As you can see from the benchmarks below, there are some interesting results here.

The biggest takeaway is that the 4.3TF Navi is capable of outperforming all of the 6.0TF GCN cards I stacked it up against, despite its apparent lack of raw compute power and its huge memory bandwidth disadvantage against every other contender on the field. It’s not a comprehensive win, obviously, but I do feel that despite my best attempts to balance the RX 5500 XT, bandwidth is likely too low – and unfortunately, for reasons I don’t quite understand, AMD’s Navi GPUs really don’t like having their GDDR6 modules over-clocked, so there’s not much I could do to redress the balance.

Interestingly, Xbox One X at 2160p has a small but significant advantage over our tuned low-end Navi, but what I find especially fascinating is how scalability looks up against the 9.2TF-tuned Radeon RX 5700 XT. With a compute advantage of 2.2x, the higher-end Navi card delivers an extra 92 per cent of performance. My understanding is that Lockhart is supposed to deliver at 1440p what Xbox Series X offers at 4K, but based on the scalability seen here, resolution would need to be lower to achieve this – perhaps at some mid-point between 1080p and 1440p. This could conceivably work – the 4.3TF Navi at 1080p still commands a nigh-on 30 per cent lead over the 9.2TF Navi at 4K.
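For context on the resolution trade-off, this rough sketch compares pixel counts across the three test resolutions with the compute and performance ratios quoted above. It uses the rounded 9.2TF and 4.3TF labels, so the compute ratio comes out slightly below the 2.2x figure in the text. Frame-rate rarely scales linearly with pixel count, so these ratios are a guide and not a prediction.

```python
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "2160p": (3840, 2160),
}


def pixels(name: str) -> int:
    """Total pixels per frame at a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


# How many times more pixels 4K pushes than each lower resolution.
ratio_vs_1440p = pixels("2160p") / pixels("1440p")
ratio_vs_1080p = pixels("2160p") / pixels("1080p")

# Compute ratio from the rounded teraflop labels; the text quotes this as 2.2x.
compute_ratio = 9.2 / 4.3

# "An extra 92 per cent of performance" in the Hitman 2 test.
perf_ratio = 1.92

# Fraction of the extra compute that shows up as extra performance.
scaling_efficiency = perf_ratio / compute_ratio

print(f"2160p vs 1440p pixels: {ratio_vs_1440p:.2f}x")
print(f"2160p vs 1080p pixels: {ratio_vs_1080p:.2f}x")
print(f"Compute ratio: {compute_ratio:.2f}x, scaling efficiency: {scaling_efficiency:.2f}")
```

The drop from 2160p to 1440p cuts the pixel count by 2.25x, a similar order to the compute gap between the two Navi cards, while 1080p cuts it by 4x.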

Hitman 2: Console Settings/TAA

[Benchmark chart: RX 5500 XT 4.3TF, RX 590 6.0TF and R9 390 6.0TF at 1080p, 1440p and 2160p, plus Xbox One X (V-Sync) and RX 5700 XT 9.2TF at 2160p.]

Moving on to our next test case, we’re pretty confident that the id Tech 6 games on consoles are equivalent to the PC versions running at medium settings (and we should stress that they still look great, even if we’re not running at high or ultra). Again, we’re able to factor in both Xbox One X running the same content and indeed our 9.2TF-tuned Radeon RX 5700 XT at 2160p.

There are similar results overall here to what we experienced in Hitman 2, though the RX 5500 XT loses its 1080p performance lead against its GCN-based 6.0TF rivals as we scale up the resolution ladder. Once again, Xbox One X holds a performance advantage at 2160p, much as it did in Hitman 2. And again, I do have to wonder whether constrained memory bandwidth may be the issue here.

Similar results to Hitman 2 are also observed when we take a look at scalability between our two tuned Navi parts. In this scenario, the 2.2x compute advantage that the 9.2TF RX 5700 XT possesses translates into a 93 per cent performance increase. However, this time, at 1440p resolution, our 4.3TF RX 5500 XT runs the same content faster than the 5700 XT running at 4K, while the performance delta at 1080p vs 2160p is frankly massive. It seems that different game engines scale in different ways across the resolution ladder, but looking at just how fast the 5500 XT is here at lower resolutions, you start to wonder whether a prospective lower resolution console could actually pay dividends.

Wolfenstein – The New Colossus: Medium, TSSAA

[Benchmark chart: RX 5500 XT 4.3TF, RX 590 6.0TF and R9 390 6.0TF at 1080p, 1440p and 2160p, plus Xbox One X (V-Sync) and RX 5700 XT 9.2TF at 2160p.]

Our final benchmark in the main analysis piece here focuses on Remedy’s Control, demonstrating that there are some workloads where the kind of scalability we’ve seen in prior tests simply doesn’t translate. Up against the 6.0TF GCN cards, our 4.3TF-tuned Radeon RX 5500 XT falls dramatically short in performance terms. It’s especially surprising that the relatively ancient Hawaii-based R9 390 manages to do as well as it does in any of these tests considering that in all cases we’re using modern graphics APIs – DX12 and Vulkan. We’ve previously seen that the first-gen GCN cards based on the Tahiti design seem to exhibit really poor performance on some DX12 titles, but the R9 390 holds up beautifully in almost all of our tests. And to see it comprehensively best the tweaked RX 5500 XT when it has performed so well elsewhere remains a bit of a mystery.

With that said though, scalability across the Navi cards remains relatively consistent. In fact, the RX 5500 XT at 1440p is point-for-point identical in performance terms to the 9.2TF RX 5700 XT operating at 2160p, with a predictably huge frame-rate advantage at full HD resolution. On paper, the concept of any platform holder launching a four teraflop console for next-gen gaming seems almost absurd, but the numbers here continue to suggest that the idea of scaling down resolution – and resolution only – could pay off.

In the case of Control, I was able to push even more frequency through the RX 5700 XT, taking the compute performance up to 10.2TF. I was curious about this because some have suggested that Sony may aim for this number to get the absolute maximum level of performance possible out of a chip with a potential 40 compute units fully enabled. Let’s just say that the RX 5700 XT wasn’t particularly happy about doing this and the performance uplift simply wasn’t worth the effort – an extra 11 per cent of compute gleaned just 6.5 per cent of actual performance and the frame-rate advantage over the 4.3TF-adjusted RX 5500 XT running at 1440p is a mere 5.3 per cent.
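The diminishing return from the overclock can be expressed as a simple ratio. This is a minimal sketch using only the figures quoted above; the variable names are illustrative.

```python
base_tf = 9.2          # tuned RX 5700 XT
overclocked_tf = 10.2  # same card pushed further in Control

# Fractional increase in theoretical compute.
compute_uplift = overclocked_tf / base_tf - 1

# Measured frame-rate gain quoted in the text (6.5 per cent).
perf_uplift = 0.065

# Share of the extra compute that turned into extra frame-rate.
marginal_efficiency = perf_uplift / compute_uplift

print(f"Compute uplift: {compute_uplift:.1%}")
print(f"Marginal scaling efficiency: {marginal_efficiency:.2f}")
```

Only around 60 per cent of the added compute shows up as frame-rate in this test, which suggests something other than raw shader throughput is limiting the card at that point.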

Control: Medium, TAA

[Benchmark chart: RX 5500 XT 4.3TF, RX 590 6.0TF and R9 390 6.0TF at 1080p, 1440p and 2160p, plus RX 5700 XT 9.2TF and RX 5700 XT 10.2TF at 2160p.]

As you’ll see in the video embedded at the top of the page, and indeed in the range of additional benchmarks you’ll find tucked away on page two of this article, our exercise in scalability across the Navi architecture produces broadly consistent results. While the delta in performance between 1440p on a lower-end Navi up against 2160p on a higher-end equivalent varies on a title by title basis, the numbers do suggest that a four teraflop console could conceivably work. However, what is clear is that a prospective Lockhart console at 1440p would deliver much closer results to a notional 9.2TF PlayStation 5, with the 12TF Xbox Series X perhaps too powerful by comparison. But maybe that’s the point – if we look at Microsoft’s broader strategy, a spoiler product against PlayStation 5 makes a lot more sense than trying to steal sales from Xbox Series X.

However, we also need to factor in that there’s still a lot we don’t know about the next generation of consoles – the ‘important nuances’ in the design I referred to earlier on. First of all, we now have explicit confirmation of RDNA 2 as the basis of the architecture. Aside from hardware-accelerated ray tracing features and VRS (variable rate shading), how else has the architecture evolved compared to the PC parts we’ve tested here? The truth is that we simply don’t know. On top of that, there’s also the notion of the classic console ‘secret sauce’ – or more specifically, the hardware-level customisations the platform holders bake into their designs.

Beyond that, there is the basic idea that we’re swiftly moving into what you might call the post-resolution era, as my colleague John Linneman discussed in his analysis of Metro Redux on Switch, where he demonstrated that a 720p image from Switch delivers far higher image quality than a 720p image from the same game running on the last-gen consoles. Switch is a fascinating example of just how scalable graphics are, of course, with Metro one of the most compelling examples of how a game can retain its visual identity despite running on a far less capable GPU.

Microsoft has experimented with AI upscaling via DirectML using Forza Horizon 3 for testing.

Finally, image reconstruction techniques are improving in quality to the point where Nvidia’s DLSS AI upscaling is now capable of producing hugely impressive results from just one quarter native resolution – and we actually have an example of some Microsoft research where the firm is using its own machine learning-based API, DirectML, to produce some remarkably good AI upscaling on Forza Horizon 3. So far, we’ve not had any kind of hints on hardware-accelerated deep learning features baked into either the next-gen consoles or indeed RDNA 2, but DirectML has been architected in parallel with the DXR ray tracing API and I find it hard to believe that Microsoft would develop this technology when only Nvidia has the hardware to fully leverage it.

Factoring in the advances in GPU technology, the scaling back of native resolution as a primary indication of overall image quality, and the ultra-extreme examples of graphics scalability witnessed on Switch, there’s an argument that the stars are aligning for Lockhart in delivering an entry-level console that could actually work. Pricing will be critical, because there’s only so much money to be saved with a smaller processor and less RAM and if, say, PS5 comes to market with around 2x the performance and costs just $100-$150 more, Lockhart may struggle to win the argument. Indeed, if the economics don’t work out, maybe the machine will never be released at all.

It’s also fair to say that marketing such a machine would be a considerable challenge. I genuinely thought that the platform holders would avoid teraflops as a performance metric for the new wave of machines simply because the concept of a Navi teraflop bears little relation to a last-gen GCN teraflop – every test shows that Navi is just so much more capable. Despite this, however, Microsoft has gone straight in with 12TF as a key marketing message for Xbox Series X, which surely can’t reflect favourably on the cheaper machine.

But perhaps the biggest concern with the idea of a machine like this is that any kind of lower technological baseline may limit developers in architecting next-gen experiences – and as we’ve seen with the current generation, reduced memory and graphics power has made an impact (sometimes quite profound) on certain titles running on Xbox One S and PlayStation 4 Pro. However, while the tests here can only really be considered a very rough ballpark approximation of what the real thing may deliver, the basic concept of scalability seems to check out – and with next-gen pricing apparently causing concern, maybe a more value-orientated console is what the market is going to need.




