We’re winding down coverage of Vega at this point, but we’ve got a couple more curiosities to explore. This content piece looks at clock scalability for Vega across a few key core and HBM2 clocks, and attempts to isolate the CU difference, to some extent. We obviously can’t fully control down to the shader level (as CUs carry more than just shaders), but we can get close to it. Note that the video content does generally refer to the V56 & V64 difference as one of shaders, but more than shaders are contained in the CUs.
In our initial AMD shader comparison between Vega 56 and Vega 64, we saw nearly identical performance between the cards when clock-matched to roughly 1580~1590MHz core and 945MHz HBM2. We’re now exploring performance across a range of frequency settings, from ~1400MHz core to ~1660MHz core, and from 800MHz HBM2 to ~1050MHz HBM2.
This content piece was originally written and filmed about ten days ago, ready to go live, but we then decided to put a hold on the content and update it. Against initial plans, we ended up flashing V64 VBIOS onto the V56 to give us more voltage headroom for HBM2 clocks, allowing us to get up to 1020MHz easily on V56. There might be room in there for a bit more of an OC, but 1020MHz proved stable on both our V64 and V56 cards, making it easy to test the two comparatively.
Reminder on Test Challenges
As a reminder from the first piece, these tests are somewhat difficult to run with the level of accuracy we’d like. The trouble with clock-constrained testing on the latest architectures from both nVidia (Boost 3.0) and AMD (Vega) is that clocks largely do what they want. This makes sense, to an extent, as the card can min-max its performance without user intervention. For the vast majority of users who don’t explore overclocking or volt-frequency tuning, this behavior is the way of the future. For people who do manually tune clocks, it’s a challenge, as the clocks will still bounce around based on load. Vega, for instance, might have a user-defined clock of 1422MHz, but jump between 1380 and 1398MHz in one test, then might do 1422-1430MHz in the next game, and so on. Testing in FireStrike may yield a clock close to the specified frequency, where games will immediately tank that clock by ~20MHz (or more, in some cases). It just depends on the load level.
To solve for this, the only real approach is to log frequencies during each test, then create frequency vs. time plots for each device. We also average the frequency during the test run, just to get an idea for rough ranges. This approach introduces some additional error, as it’s hard to get the clocks exactly the same for each card and each game, but the method does control relatively well.
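As a rough illustration of the averaging step, here is a minimal sketch that reduces one run’s clock log to an average and a range. The function name and the sample values are hypothetical (the samples only mirror the 1380-1398MHz example above), and this is not the logging tool we actually used.

```python
from statistics import mean


def summarize_clock_log(samples_mhz):
    """Return (average, minimum, maximum) core clock in MHz for one test run."""
    if not samples_mhz:
        raise ValueError("clock log is empty")
    return mean(samples_mhz), min(samples_mhz), max(samples_mhz)


# Hypothetical samples for a card set to 1422MHz that settles lower under load.
run = [1380, 1385, 1398, 1390, 1397]
avg, low, high = summarize_clock_log(run)
print(f"avg {avg:.0f}MHz, range {low}-{high}MHz")
```

Comparing that average between the two cards for the same test gives a quick read on how far apart the clocks really were, which is the error we keep referring to.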
GN Test Bench 2017
Component | Part
GPU | This is what we’re testing
CPU | Intel i7-7700K 4.2GHz locked
Memory | Corsair Vengeance LPX 3200MHz
Motherboard | Gigabyte Aorus Gaming 7 Z270X
Power Supply | NZXT 1200W HALE90 V2
Case | Top Deck Tech Station
BIOS settings include C-states completely disabled with the CPU locked to 4.2GHz, so these results are not directly comparable to our tests at 4.5GHz. Memory is at XMP1. Tested using driver version 17.8.2.
3DMark Clock Scalability
We’re opening with FireStrike. 3DMark FireStrike shows some of the most visible scaling with this new round of tests, so keep that in mind – we’ll lose resolution on the shader (CU) impact as we explore games.
3DMark FireStrike scoring shows slightly more noticeable shader impact toward the low-end, when underclocking to 1395MHz core and 800MHz HBM2. As we move to 1590MHz and 800MHz HBM2, we’re maintaining a somewhat equal distance as at 1395MHz. Towards the higher-end of the frequency scale, the lines merge closer to one another, and start dithering around within margin of error territory, demarcated by the vertical error bars. We haven’t yet pushed Vega 56 to 1050MHz HBM2, but we included the Vega 64 data point for reference: Our performance flatlined for Vega 64 once we hit 1657MHz core and 945MHz HBM2. We still gained about 5.5% performance between the 945MHz and 980MHz numbers on Vega 64 (also aided by clock increases), but nothing close to the gains seen earlier in the line plot. Diminishing returns are encountered once we hit 1050MHz. We may need to push the core higher to benefit from those gains.
Relating these scores into a more distilled FPS value, we can break things into FPS 1 and FPS 2: GT1 heavily loads the GPU with polys, primitives, and tessellation, but doesn’t apply much of a compute workload. GT2 increases compute task workloads and stresses memory harder. Knowing these two facts about 3DMark’s testing, we can better understand why each number behaves the way it does.
Starting with GT1 FPS, the gap is largest at 1395MHz core and 800MHz HBM2: We’re at 97FPS for Vega 56 and 102FPS for Vega 64, plus or minus some error in our clocks. That gives us about a 5% advantage for Vega 64, which is just outside of our tolerance for clock error and 3DMark variance. It appears that at these lower clocks, we are seeing a more noticeable impact from the CU count increase on Vega 64. That 5% gain largely persists to the 1589MHz core and 800MHz HBM2 class, where we still see about a 5-6% gain from the shader count. This difference begins to fall within error margins toward the higher-end of the clock speeds, though overall maintains a slight 1-4% lead over Vega 56 as we approach the higher end. There are times when the scores were effectively identical, as seen in our initial round of tests and in a couple of these other data points, and that falls within test variance and clock error margins. Once we get past the 1589/800MHz territory, we see some more clock scaling toward the very far-end of the scale, but would need a good means to push further to validate fully.
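For reference, the percentage math behind that low-clock gap can be sketched as below. The helper name is our own for illustration; the FPS values are the rounded GT1 figures quoted above.

```python
def percent_advantage(baseline, candidate):
    """Percent by which candidate exceeds baseline."""
    return (candidate - baseline) / baseline * 100.0


v56_gt1_fps = 97.0   # Vega 56, 1395MHz core / 800MHz HBM2
v64_gt1_fps = 102.0  # Vega 64, same nominal clocks
gap = percent_advantage(v56_gt1_fps, v64_gt1_fps)
print(f"Vega 64 GT1 advantage: {gap:.1f}%")  # prints 5.2%
```

Vega 56 is used as the baseline, so the result reads as what the extra CUs add on top of the smaller chip.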
GT2, the second FPS score from 3DMark, is more compute-intensive than GT1. These scores are nearly identical across the board. Vega 64 does not hold a significant lead in any of these tests except for the first two, where we were clocked at 1395MHz and 1589MHz, both with 800MHz HBM2.
For Honor Clock Scaling
Applying this to games is where we start losing some of that resolution: For Honor at 4K plots us at about 47-48FPS AVG with our original test battery, where we were around 1580MHz and 945MHz HBM2. At 1390MHz core and 800MHz HBM2, performance hovered around 42FPS AVG for the Vega 56 GPU, and around 43FPS AVG for the Vega 64 GPU – but note that Vega 64 was running about 8MHz faster. We are within error, here. There is effectively no difference in For Honor at 4K with these clocks, looking at just CUs. But then there are a lot of other elements of the card engaged when gaming, so it’s tough to tell what other bottlenecks we might be encountering.
1080p doesn’t change much of this: Our original numbers were at 137FPS AVG, nearly dead even for each device, and had 1% and 0.1% lows close to one another. At 1390-1398MHz core and 800MHz HBM2, both Vega 56 and Vega 64 are within 1FPS of each other. We’re within error margins here, again. For Honor is exceptionally GPU bound, so we’re not in a scenario of CPU bottlenecking.
Ashes of the Singularity Clock Scaling
Ashes of the Singularity at 4K seemed to be bottlenecking – potentially on the CPU, given how this title behaves – or was just showing exceptionally limited differences. At 1390MHz and 800MHz HBM2, the difference between the two GPUs is within margin of error. That said, they are also not too distant from the 58FPS AVG of the 1590MHz core and 945MHz HBM2 tests. These differences are also within margins, just barely, and indicate another bottleneck of some kind. We can’t use this test for much, so let’s move on to Hellblade.
Hellblade Clock Scaling
Hellblade at 4K shows no scaling between the Vega 56 and Vega 64 cards when at 1580~1590MHz core and 945MHz HBM2, as discussed previously, and also shows no differences at 1390MHz core and 800MHz HBM2. There is no real difference, here.
Ghost Recon Clock Scaling
Ghost Recon: Wildlands at 4K again shows effectively no scaling at 1400MHz core and 800MHz HBM2, with our scores sitting within fractions of a frame of each other. At roughly 1590MHz core and 945MHz HBM2, we also see no scaling. This trend continues up to 1660MHz core and 980MHz HBM2 overclocks.
After 10 Days: The Addendum
Up to this point, all of the testing was conducted a few weeks ago, before we left for the LMG shoot and before the 7980XE launch. We held publication of this content, though, and that was to allow time to test a few more synthetic options. The other part of our addendum went back on an initial plan not to flash Vega 56 with Vega 64 BIOS, and we ended up flashing anyway, then overclocking memory to 1020MHz on each device.
Heaven and Superposition were those new options, added because we thought they’d be more likely to draw out differences. We performed Heaven testing using the “Extreme” preset at 1600×900, then ran custom testing at 1080p/Ultra, with AA at 8x and with Dx11, while manually adjusting tessellation across all options. We thought this might give some visibility as to a potential bottleneck in the geometry pipeline.
Heaven – Extreme
Starting with the Extreme preset at 1600×900, we’ll use FPS to first show differences: The Vega 56 card at 1390 and 800MHz averages 96.4FPS after five passes, with the Vega 64 card at similar speeds averaging 96.1FPS. These are functionally the same, particularly considering we were about 5MHz lower on average with the Vega 64 card. With the BIOS flash and Vega 56 set to 1660 and 1020MHz, we score 112.05FPS AVG, compared to 114.75FPS AVG on the Vega 64. That’s a 2.4% difference, which is close to our earlier defined error margins – but close enough to the limit that we can say there is a pattern emerging. We’ll keep this 2.4% advantage in mind for now, as it may come into play as we analyze more data.
With 1080p testing and tessellation scaling, we’re seeing these results. At 1660 and 1020MHz clocks, the Vega 64 card posts a score of 3050.7 versus 3017 on Vega 56, or an increase of 1.1%. The gap widens as we increase tessellation to Moderate, resulting in a score of 2801.5 on Vega 64 versus 2730.5 on Vega 56, or a 2.6% improvement on Vega 64. Normal tessellation also posts a 2.6% difference, at 2628.5 versus 2561.5. Extreme has us at 2.9% improved with Vega 64, showing one of the bigger gains we’ve seen. Given the consistency of these results, we can safely say that the Vega 64 card’s extra CUs do help it in this instance, somewhere between 1% and 3% for gains, depending on test settings.
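To make the spread easier to check, here is a small sketch that recomputes the gains from the Heaven scores quoted above. The dictionary layout and function name are illustrative only; the first entry is labeled generically because the text doesn’t name its tessellation setting, and Extreme is left out because only its percentage is quoted.

```python
# (Vega 56 score, Vega 64 score) at 1660MHz core / 1020MHz HBM2, from the text.
heaven_scores = {
    "first data point": (3017.0, 3050.7),
    "Moderate": (2730.5, 2801.5),
    "Normal": (2561.5, 2628.5),
}


def vega64_gains(scores):
    """Map each setting to Vega 64's percent gain over Vega 56."""
    return {
        setting: (v64 - v56) / v56 * 100.0
        for setting, (v56, v64) in scores.items()
    }


for setting, gain in vega64_gains(heaven_scores).items():
    print(f"{setting}: Vega 64 +{gain:.1f}%")
```

Run as written, this reproduces the 1.1%, 2.6%, and 2.6% figures.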
As for the 1390MHz and 800MHz clocks, the scoring is roughly the same across the board. We’re within tolerance for error and the 5MHz clock difference, here. Unlike FireStrike, the results appear to be mostly the same at the low-end clocks when testing with Heaven.
Superposition Scaling – Vega 56 vs. 64 at Same Clocks
We’re seeing about +/-3% in the best cases here, if looking for differences. Superposition is made by the same company as Heaven, so it makes sense that performance would be similar. We can’t get this to replicate in most of our game tests, but suspect that perhaps games with greater async reliance could show something similar to the ~3% swings. We’d have to run another piece on that, but will call it for now.
One final note: Although we like to do this testing to try and determine shader differences, what we’re really testing is CU differences – each CU contains more than just shaders, like texture units, so other elements of the CU can come into play before shaders do. We have no good way of isolating beyond CUs.
Conclusion: Just Overclock Vega 56
There’s some scaling in 3DMark FireStrike GT1, which is poly and tessellation intensive, and scaling in Unigine synthetic benchmarks. Even when there is scaling at the more realistic upper-end of performance, though, it’s not much – we’re talking 1-3% for an extra $100-$150. Not at all worth it, and often not replicable in gaming scenarios. Again, there are likely compute applications and some very specifically-made games that could benefit from the CU increase, but 97% of the performance comes down to clocks – if not more.
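Put in cost terms, a quick back-of-envelope sketch using only the ranges above (1-3% gain, $100-$150 extra) looks like this. The helper name is hypothetical, and the best and worst cases simply pair the ends of those two ranges.

```python
def dollars_per_percent(extra_cost_usd, gain_percent):
    """Extra dollars paid per percentage point of performance gained."""
    return extra_cost_usd / gain_percent


best = dollars_per_percent(100, 3)   # cheapest premium, largest gain
worst = dollars_per_percent(150, 1)  # priciest premium, smallest gain
print(f"${best:.0f} to ${worst:.0f} per percentage point of performance")
```

That works out to roughly $33 to $150 for each percentage point, and only in the tests where the gain shows up at all.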
Given how easy it is to flash V56 and overclock, we’d recommend just going that route. OC Vega 56, and walk away with more money and functionally equivalent performance.