Dimitrios Kechagias

Posted on Dec 8, 2024 • Edited on Feb 1

Google Axion: A New Leader in ARM Server Performance

#googlecloud #arm #aws #axion

Although our current cloud deployment at SpareRoom is x86, I’ve had the opportunity to test Google’s first in-house ARM server CPU "Axion" for a few months before its recent public release. Without giving too much away upfront - let's just say I was not left unimpressed. I’ll let the numbers, presented as charts, show how Axion compares in both performance and value with the best offerings in Google and Amazon clouds.

Table of Contents:

The Contenders
Test setup
Performance Results
Performance / Price
- On Demand & Reserved
- Spot Instances
Conclusion

The Contenders

Here are the contenders for this comparison, the best/most relevant drawn from my recent Cloud VM Comparison test, with prices updated for the 3rd week of November 2024:

Instance Type	CPU type	HT / SMT	Price* $/Month	1Y Res.* $/Month	3Y Res.* $/Month	Spot* $/Month
Amazon C7a	AMD EPYC Genoa	-	77.36	51.97	35.39	26.20
Google c4a	Google Axion	-	57.69	38.89	27.29	24.50
Amazon C8g	AWS Graviton4	-	66.45	44.60	28.75	10.80
Google c4	Intel Emerald Rapids	Y	64.49	41.52	30.34	27.23
Google t2d	AMD EPYC Milan	-	64.68	41.86	30.76	11.77
Google c3d	AMD EPYC Genoa	Y	57.12	36.87	27.02	24.28

* Monthly price for 2x vCPU / 4GB RAM / 30GB disk instance, except t2d with 8GB RAM. For c3d, 4x vCPU is the minimum so price extrapolated to 2x vCPU.
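To put the reserved columns in perspective, here is a small sketch that turns the table's prices into percentage savings against on-demand. The helper name `reserved_discounts` is my own invention for illustration; the numbers are copied from the table above.

```shell
# Columns: instance, on-demand, 1Y reserved, 3Y reserved ($/month),
# copied from the table above.
prices='C7a 77.36 51.97 35.39
c4a 57.69 38.89 27.29
C8g 66.45 44.60 28.75
c4 64.49 41.52 30.34
t2d 64.68 41.86 30.76
c3d 57.12 36.87 27.02'

# reserved_discounts is a hypothetical helper (not part of any cloud CLI):
# it reads "name on_demand 1y 3y" lines and prints the % saved vs on-demand.
reserved_discounts() {
  awk '{ printf "%s 1Y:-%.1f%% 3Y:-%.1f%%\n", $1, (1 - $3 / $2) * 100, (1 - $4 / $2) * 100 }'
}

echo "$prices" | reserved_discounts
```

The discounts only describe the pricing snapshot used here (3rd week of November 2024), so treat the output as a point-in-time comparison.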

Amazon had the fastest ARM server VMs as we saw, featuring the Graviton4, and I am including the new compute-optimized type C8g. On the x86 front, I selected their C7a, featuring non-SMT AMD Genoa, which remains the fastest x86 VM in terms of per-vCPU performance.
From Google, I compared against three x86 types: the SMT/HT-enabled AMD Genoa (c3d) and Intel Emerald Rapids (c4), and the older AMD Milan (t2d). The t2d, while older, remains competitive in per-vCPU metrics due to its lack of SMT.

Test setup

I used the same methodology as in my recent cloud comparison test, where it is described in detail, with one addition: a real-world FFmpeg video compression benchmark.

Here’s how I set it up:

# For ARM instances - replace 'arm64' with 'amd64' for x86:

wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-arm64-static.tar.xz
tar -xJf ffmpeg-release-arm64-static.tar.xz --wildcards --no-anchored 'ffmpeg' -O > /usr/bin/ffmpeg
chmod +x /usr/bin/ffmpeg
wget https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_h264.mov
time ffmpeg -i big_buck_bunny_720p_h264.mov -c:v libx264 -threads 1 out264a.mp4
time ffmpeg -i big_buck_bunny_720p_h264.mov -c:v libx264 -threads 2 out264b.mp4
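If you run the same script on both ARM and x86 instances, the 'arm64'/'amd64' swap can be automated. This is only a sketch: `ffmpeg_arch` is a hypothetical helper I made up, and it assumes `uname -m` reports `aarch64` or `x86_64`, as it typically does on Linux.

```shell
# ffmpeg_arch is a hypothetical helper: it maps a `uname -m` value to the
# architecture suffix used in the static build filename above.
ffmpeg_arch() {
  case "$1" in
    aarch64|arm64) echo arm64 ;;
    x86_64|amd64)  echo amd64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the download URL for the current machine (nothing is downloaded here).
if arch=$(ffmpeg_arch "$(uname -m)"); then
  echo "https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-${arch}-static.tar.xz"
fi
```

The printed URL can then be passed to `wget`, and the same `${arch}` value reused in the `tar` command.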

Performance Results

Benchmark suite results

DKbench is probably the most telling benchmark for the general workloads we run on our servers, and Geekbench 5 is added because of the wide availability of comparison results:

Immediately we see that Axion not only edges out Graviton4, it’s surprisingly close to the C7a, the fastest x86 instance overall.

DKbench is a good indicator of general performance, but we'll get on with some more specialised testing.

Compilation

For a developer-specific workload, I compiled Perl on two threads:

Axion came second only to Amazon's Genoa, and the difference was marginal.

7zip performance

This is the most impressive showing for Axion, leading in both compression and decompression.

Video compression

As mentioned above, I am transcoding using FFmpeg/libx264. libx264 is a very mature library that should be well-optimized for Intel/AMD, so I was interested to see how well the new Google CPU could do:

And the answer is: not bad at all. It falls behind Emerald Rapids & Genoa on a single thread, but not by a significant margin. And given that for video compression we don't really care about single-threaded runs, only the C7a is actually faster per vCPU.

OpenSSL (AVX-512)

Moving on to a benchmark that is even more heavily optimized for Intel/AMD CPUs: OpenSSL can use the latest AVX-512 instructions for increased performance. This is basically the worst-case scenario for ARM, as the architecture has much more limited SIMD extensions (NEON):

Here, Axion improves over Graviton4 as in all tests, but cannot keep up even with the older x86.

Summary: performance delta vs Genoa & Graviton4

Let's have a look at the performance delta of the Axion vs Genoa (in purple) and Graviton4 (in yellow) for all the benchmarks we ran (skipping the special case of OpenSSL):

There are consistent gains over the Graviton4 (from 3 to 15%). On the other hand, Genoa maintains the lead for most tests, with the maximum difference at 15%, but Axion keeps much closer than that in general and even bests Genoa in some cases. I would say that, for most uses, Axion will be closer to Genoa than it is to Graviton4.
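For reference, a percentage delta like the ones quoted above can be computed as follows. This is a sketch rather than the script behind the charts: `pct_delta` is a hypothetical helper, and the inputs are placeholders, not measurements from this article. Note that for timings (such as the `time ffmpeg` runs) lower is better, so the ratio is inverted.

```shell
# pct_delta is a hypothetical helper, not the author's tooling.
# Usage: pct_delta <candidate> <baseline> [higher|lower]
#   higher (default): bigger numbers are better (benchmark scores)
#   lower:            smaller numbers are better (seconds from `time`)
pct_delta() {
  awk -v a="$1" -v b="$2" -v mode="${3:-higher}" 'BEGIN {
    ratio = (mode == "lower") ? b / a : a / b
    printf "%+.1f%%\n", (ratio - 1) * 100
  }'
}

# Placeholder inputs, NOT results from this article:
pct_delta 115 100        # a score of 115 against a baseline score of 100
pct_delta 50 60 lower    # a 50 s run against a 60 s baseline
```

A positive result means the candidate is faster than the baseline in both modes.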

Performance / Price

The main reason Amazon & Google developed their own ARM solutions is to provide the best possible value (for themselves and their customers). Hence, a look at performance / price is possibly even more useful than raw performance. I will be looking at multicore performance with DKbench, as its varied benchmarks give reasonably balanced performance results.

On Demand & Reserved

Looks like it's mission accomplished for Google. Axion is by far the best value amongst the tested VMs, both for On demand and 1y/3y reserved pricing.
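One way to sanity-check a value comparison using only the price table: work out how much faster each competitor would need to be just to match Axion's performance / price at on-demand rates. A minimal sketch, where `breakeven_speedup` is a hypothetical helper and the prices are the on-demand $/month figures from the table in "The Contenders":

```shell
# breakeven_speedup is a hypothetical helper: given a baseline monthly price,
# it reads "name price" lines and prints how many times faster than the
# baseline each instance must be to offer the same performance per dollar.
breakeven_speedup() {
  awk -v base="$1" '{ printf "%s %.2fx\n", $1, $2 / base }'
}

# On-demand $/month from the table; 57.69 is the c4a (Axion) price.
printf '%s\n' 'C7a 77.36' 'C8g 66.45' 'c4 64.49' 't2d 64.68' 'c3d 57.12' |
  breakeven_speedup 57.69
```

Any instance whose measured multicore advantage over Axion is smaller than its printed factor ends up as worse value at these prices.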

Spot Instances

Spot prices vary wildly, both with time and location. Based on Eastern-US pricing in the 3rd week of November, when I was compiling the results, this is what we see:

I don't know if Amazon is doing this on purpose, but they are offering Graviton4 at an unbeatable spot price in US-east, where Axion has availability, while it is priced almost 2x in US-west, where Axion is not yet available! In any case, for good value on Axion spot instances you'll have to wait for wider availability. Right now on Google, you'd have to go with the Milan Tau instances (t2d) if you wanted the best value.

This chart is mostly to make you research spot prices, as there are always great deals to be found, especially if you are not limited to a specific region. The deals change often, so try to keep track of pricing.

Conclusion

Google’s Axion CPU proves to be an exceptional contender in the ARM server space, offering stellar performance and value. Expect an almost 10% performance improvement over Graviton4. In addition, while it trails x86 CPUs in some specialized workloads (e.g. AVX-512), it is not far behind the best x86 CPUs in the majority of tasks, making it a viable alternative for those seeking to switch to ARM while keeping top-tier performance levels.
