The end of my AArch64 desktop experiment

(juszkiewicz.com.pl)

78 points | by signa11 17 hours ago

15 comments

HerbManic 15 hours ago
I'm not surprised at the outcome. These Ampere systems' single-core/thread performance is pretty low, and that is where you feel it. The OS/software simply cannot spread the threads across enough cores effectively to make up for this difference.
This is why things like the Apple M series feel so fast: while they don't win on multi-core performance, especially when going up against an 80-core beast like this, they have single-thread performance exactly where it is needed.
Maybe we will need 80 cores in future, that is cool but for daily home use it is still just way too much for what we need.
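One way to put rough numbers on this argument is Amdahl's law, which caps the speedup from extra cores by the share of work that cannot be parallelised. The sketch below is only an illustration: the two machines and their per-core speed ratio are made-up assumptions, not benchmarks of an Ampere Altra or an Apple M-series chip.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core of the same type, per Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(per_core_speed: float, parallel_fraction: float, cores: int) -> float:
    """Throughput relative to a single core whose speed is 1.0."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


# Hypothetical machines: the per-core speeds are assumed ratios, not measurements.
MANY_SLOW = {"per_core_speed": 1.0, "cores": 80}
FEW_FAST = {"per_core_speed": 2.0, "cores": 8}

if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        slow = relative_throughput(parallel_fraction=p, **MANY_SLOW)
        fast = relative_throughput(parallel_fraction=p, **FEW_FAST)
        print(f"parallel={p:.2f}  80 slow cores={slow:6.2f}  8 fast cores={fast:6.2f}")
```

Under these assumed ratios, the 80-core machine only pulls ahead once the workload is almost entirely parallel, which typical desktop use is not.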
- akoboldfrying 11 hours ago
  Apple M series is also aarch64 architecture, isn't it? Could you explain more why you expect Ampere to be slow but M series to be fast?
  - rwmj 11 hours ago
    Apple design their own Arm-compatible cores from scratch. Ampere use a modified Arm Neoverse N1 core. In addition, the Ampere server that Marcin is using is about 6 years old, and would have been tuned for core count over single thread performance (good for web serving). Basically Arm's own cores aren't nearly as good as Apple's at the best of times, and having a 6 year old server makes things even worse.
  - dzaima 11 hours ago
    Ampere Altra is for cloud/datacenters/servers where multithreaded throughput is approximately all that matters. Apple M series is for consumers.
  - jdub 11 hours ago
    Because they're designed for different things.
    Ampere's primary focus is running lots of simple tasks concurrently, at relatively low power, with lots of I/O. So, many tens to hundreds of cores, not too fast, at lower power draw than amd64, with lots of PCIe lanes for storage and network.
    Apple's primary focus is user experience and power efficiency. That's why you'll find a handful of fast performance cores and low power efficiency cores, along with graphics acceleration to drive high resolution displays.
kjellsbells 43 minutes ago
I feel that OP should be congratulated on trying!
I don't see this experiment as any kind of "failure". Something was learned, and OP is better off for it. Computing and science literature would be a lot better off if people, like OP, honestly documented where things went wrong.
dn3500 12 hours ago
I don't understand the kernel problem. Why did he feel he had to rebuild the kernel weekly? When the amdgpu stopped working why couldn't he just go back to the last working kernel?
- M95D 11 hours ago
  He said he needed patches to make the GPUs work. The auto-updated kernel package does not have those patches, and it overwrote the custom kernel he built every time an update was available.
  - trentor 11 hours ago
    You can disable automatic kernel updates on almost every distribution. Most people that use secure boot and Nvidia do it.
    - M95D 10 hours ago
      I wouldn't know. I run Gentoo. :)
- nevi-me 11 hours ago
  I presumed that, as a kernel developer, he would run the kernel he works on, which would require rebuilding periodically. Daily doesn't make sense; monthly is too infrequent given the rate of change in the kernel.
  My speculation though. When I was building an app I was using, I used to run a recent stable build on my device instead of just the one released in the Play Store. Simplifies having to keep multiple devices.
edg5000 12 hours ago
> there was no org.freedesktop.Platform.GL.nvidia in Flatpak repositories for AArch64
All he had to do was build some packages from source, right? It's really worth learning how to do this, since it removes a lot of constraints.
And the kernel patch should land in the kernel pretty soon, I hope? He won't have to run a patched kernel forever. Should be possible to get that in a release in a year or so?
- jsnell 11 hours ago
  I don't know if it's your intent, but that reads really condescending. It's obvious the author knows how to build packages from source. They're working professionally for a Linux distro on arch support!
  But that was several layers deep into yak shaving broken graphics, and at some point you need to actually get your real work done.
  - edg5000 9 hours ago
    That he ran into blockers with the exotic setup is understandable. After he had overcome a bunch of challenges, I was surprised to read that the missing Flatpak package stopped him in his tracks. It feels like a lesser challenge than the issues he had already addressed.
  - yjftsjthsd-h 7 hours ago
    > at some point you need to actually get your real work done
    I guess my impression was that in this case the yak shaving was the real work, or at least a part of it? If you're trying to make ARM support fully work in your distro, then daily-driving it and dealing with these things is how you get there. Granted, if that's not the goal and they were just having fun by using an ARM box, that's fair.
yjftsjthsd-h 7 hours ago
I would like to take a moment for this line:
> The “wooster” system stays powered on, churning through RISC-V package builds. It may be weak in single-thread, but it flies when it comes to multi-core load.
Feels vaguely hilarious that the ARM box didn't work out as a desktop, so instead it gets repurposed to cross-compiling RISC-V packages :)
10000truths 11 hours ago
I'm not sure why the author didn't attempt to dive deeper into the error message he saw. amd_vcn_dec sounds like it's an issue with the GPU's video decoding logic. If there's a timeout when trying to process a decode request, it may be that power management for the GPU is buggy somehow. Given that this is a server build and idle power consumption is likely not a big deal, I'd suggest pinning the GPU power state to see if it resolves the issue (see amdgpu.ppfeaturemask and amdgpu.runpm kernel parameters).
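As a rough sketch of what pinning those parameters could look like, the helper below rewrites a kernel command line string so that a given parameter is set exactly once. `add_kernel_params` is a hypothetical helper written for this illustration, not part of any distro tooling. `amdgpu.runpm=0` (runtime power management off) is the only value used here; a suitable `amdgpu.ppfeaturemask` value depends on the hardware and is deliberately left out.

```python
def add_kernel_params(cmdline: str, params: dict[str, str]) -> str:
    """Return cmdline with each key=value set, replacing any existing value for that key."""
    tokens = cmdline.split()
    # Drop existing occurrences of the parameters we are about to set.
    kept = [t for t in tokens if t.split("=", 1)[0] not in params]
    kept.extend(f"{key}={value}" for key, value in params.items())
    return " ".join(kept)


if __name__ == "__main__":
    # Example input; a real command line would come from the bootloader config.
    current = "quiet splash amdgpu.runpm=1"
    print(add_kernel_params(current, {"amdgpu.runpm": "0"}))
```

The resulting string would still have to be written into the bootloader configuration by hand, and whether it helps with the decode timeout is an open question.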
- bayindirh 10 hours ago
  I believe something I call "the window phenomenon" has occurred. Sometimes life allows you to have the time to do these big experiments, and then it gets busy again and you can't dive into them with the same capacity, so you have to do what you have to do while surviving with what you have at hand.
  I have gone through many patches like this, and I believe he had to handle life while his experimental workstation had to limp through.
  Then, when he had the time, he just pulled the plug.
  - vk5tu 9 hours ago
    I designed and built my own DSL router: component selection, PCB design, and so on. When NBN upgraded the link to my home I simply went and bought a 10Gbps ethernet router, even though any compact PC with SFP+ cages would do the same job more cheaply.
    Exactly because the window of time I had for fooling with home networking had closed.
    - bayindirh 8 hours ago
      Same here, I was digging into a bug in TrueNAS. I traced the bug, dug the code up, isolated it and let people know. Before I could do the detailed bug report, life happened.
      At least the code is there, the info is there, and other people are picking up the flag where I left off. This is how I comfort myself. At least I was able to push the process a little further.
anthonj 13 hours ago
I see the problem, but I don't see a clear analysis on the actual source of the problem. I assumed the issue was mainly single core performance, but he is also suggesting context switches could be the cause?
So could you fix that with a new scheduler? Or do you just need another SoC with better single-core performance? I could imagine that the latter already exists, just not in SoCs with >16 cores. My naive view is that such high-core-count systems come with tradeoffs in core size and interconnect/memory bus complexity.
And I mean.. my phone is a middle lower end device and for sure I can play youtube videos (maybe in a popup as well) and run the browser without noticing that much difference from my laptop.
- NavinF 12 hours ago
  I don't think youtube playback is a relevant comparison since it uses ~0 CPU. Pretty much all phones have hardware accelerated decoding. Lots of TVs and streaming devices use an ancient Android phone SoC yet they too can play YT and run a browser. The entire UI is often a local web app.
  - anthonj 10 hours ago
    I imagine, but he mentions video playback on YouTube making things worse, and he does have a dedicated AMD GPU.
    But iirc for both Firefox and chromium on Linux desktop hw acceleration is tricky so maybe it's that.
    - NavinF 2 hours ago
      Yes anything GPU related other than CUDA is a shit show on Linux desktop. Another issue is that YT loves to use AV1 if they know you're on desktop. Almost all desktop users have a CPU powerful enough to software decode it in realtime, but if you're on a prebuilt PC you'll definitely notice the fans kick in
- KaiserPro 13 hours ago
  I think the single core performance would be bearable if it wasn't combined with maintaining a custom built kernel.
fragmede 12 hours ago
Fascinating! I've been running the laptop version-ish of this experiment with the 14M9610, and my major complaint is Device Tree sucks. It's been explained to me why all of ARM can't just enumerate devices like PCs do, but it still sucks. This means every ARM device starts off in custom kernel territory, which makes all sorts of hacks okay to begin with, since you need a custom kernel anyway.
- bpye 12 hours ago
  ACPI does exist for AArch64, but is only really used for Windows client devices and server hardware, though I think the Ampere hardware in the article would use ACPI, not DT.
  If you want to run Linux on one of the modern Qualcomm Windows laptops, you still generally end up needing to use device tree.
- bestouff 11 hours ago
  This is not completely true. You can use a generic kernel with a custom device tree.
  The only problem is that distributions currently tend to package them together, but that shouldn't be obligatory.
  - dezgeg 10 hours ago
    You can't if the firmware provided DTB doesn't follow any upstream Linux approved bindings and instead uses some vendor kernel specific bindings.
    - M95D 9 hours ago
      Why would you combine mainline kernel with manufacturer device tree? Kernel includes its own device trees.
      - bestouff 8 hours ago
        (S)he has a point. Sometimes the vendor dtb is all you have.
- M95D 11 hours ago
  > my major complaint is Device Tree sucks
  Why? Device tree is great. You can patch it yourself if something doesn't work, add overlays, etc.
  - rwmj 11 hours ago
    It's a bad solution compared to having the hardware just enumerate itself like PCI does. (No one uses the firmware supplied DTs because they're usually broken.)
    - M95D 10 hours ago
      All IBM PC clones had (or emulated) the same 8253 timer, 8259 PIC, 8237 DMA controller, 8042 keyboard controller, CMOS RTC, 8250/16550 serial port, standard IDE/PATA, standard framebuffer addresses, standard PCI and ISA register addresses, FPU was always at IRQ13, mouse at IRQ12, RTC at IRQ8, LPT at 0x378, PC speaker at 0x61, etc.
      All this doesn't require any enumeration and was still standard until BIOS/CSM was removed. PCs could use the same IDE driver for 30 years of hardware! All chipsets were compatible, from 386 to today's SATA in compatibility mode.
      ARM made the mistake of not standardizing anything beside CPU instructions (and even those aren't always the same - see the mess armv7 created with thumb, thumb-ee, simd, neon, crypto acceleration, etc.). Of course it needs enumeration. But x86 is now catching up with the mess. Just wait...
      Enumeration instead of standardized hardware is bad, but of the options I prefer the least bad one, device tree.
    - jdub 11 hours ago
      Take a look at how modern PCs enumerate all of their non-PCI hardware. I'll put a bucket over here.
    - M95D 8 hours ago
      > No one uses the firmware supplied DTs because they're usually broken.
      Oh, and an even more complex UEFI+ACPI solution won't be broken?
      - yjftsjthsd-h 7 hours ago
        Many years of x86_64 PCs would seem to imply that it empirically has better outcomes.
    - M95D 9 hours ago
      But ARM has PCI, including its enumeration. For the many other devices (timer, UART, I2C, the PCI controller itself) no enumeration is possible: you can't enumerate by searching for a timer without having a working timer, so only a hardware description stored somewhere is possible. The device tree is the most logical, easy to understand, fixable, updateable and extendable way to describe hardware. It doesn't have executable code like ACPI does, and that's also one of the good things.
      Let's take an example. Raspberry Pi doesn't have an RTC, but it has a GPIO header. You add an RTC module on that header, one of several models of RTCs.
      With the device tree, you load an overlay with some parameters and a kernel driver module. And it works.
      How do you do that with ACPI? Ask the manufacturer for a UEFI update that scans for dozens of RTC types on each I2C bus? Good luck with that! What happens 5 years later when the board is long abandoned (not Raspberry's case, but think of an ordinary Chinese manufacturer)?
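The overlay idea in the comment above can be modelled conceptually as merging one plain data structure into another, with no executable code involved. The sketch below uses Python dicts as a stand-in. It is not the real device tree source or blob format, and `apply_overlay`, the node names and the `vendor,example-i2c` string are all invented for illustration. The `maxim,ds3231` compatible string and the 0x68 address are those of a commonly used RTC chip.

```python
from copy import deepcopy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a new tree with overlay nodes merged in; overlay properties win."""
    merged = deepcopy(base)
    for name, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(name), dict):
            # Both sides are nodes: merge them recursively.
            merged[name] = apply_overlay(merged[name], value)
        else:
            # New node or plain property: take the overlay's value.
            merged[name] = deepcopy(value)
    return merged


# Toy board description; node names and strings are illustrative only.
BASE_TREE = {
    "i2c@0": {
        "compatible": "vendor,example-i2c",
        "status": "disabled",
    },
}

# Toy overlay: enable the bus and describe an RTC chip at I2C address 0x68.
RTC_OVERLAY = {
    "i2c@0": {
        "status": "okay",
        "rtc@68": {"compatible": "maxim,ds3231", "reg": 0x68},
    },
}

if __name__ == "__main__":
    print(apply_overlay(BASE_TREE, RTC_OVERLAY))
```

Because the overlay is only data, swapping in a different RTC model in this toy model means changing one small description rather than shipping new firmware code.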
__patchbit__ 16 hours ago
Can the ThinkPad T14 ARM Snapdragon variant function without pain as a daily Linux/BSD driver?
- jordand 1 hour ago
  Qualcomm are slowly but steadily improving Linux support for the X1/X2 Snapdragon CPUs (such as the qcom-hamoa-ec driver in 7.2), so it's still a wait. I think there are some challenges with Secure Boot and the firmware on these Lenovo devices though.
- sedatk 14 hours ago
  Snapdragon has excellent single-thread performance (unlike Ampere) if that’s what you’re asking.
- neobrain 10 hours ago
  Unlikely. I've been daily-driving the predecessor (X13s). While it's usable and technically all drivers are there, it's far from "without pain" due to an endless number of small but annoying quirks. Just to give you an idea: boot fails 4 out of 5 times, external displays aren't recognized unless plugged in/out several times, sporadic resets during overnight sleep, etc. On top of that, the speakers will sound prohibitively tinny due to unimplemented software-side speaker protection. I haven't tried the T14s, but at least the audio issues will still apply there.
  Apple devices supported by Asahi are a far more polished experience.
  - M95D 9 hours ago
    > software-side speaker protection
    What's that?
    - neobrain 6 hours ago
      See https://asahilinux.org/2024/01/fedora-asahi-new/#speakers
      The effect is understated there, perhaps because Apple speakers are actually somewhat usable without this feature. For the X13s, the speakers might as well not exist in the current state on Linux.
- bpye 11 hours ago
  When I looked at this before I found https://github.com/kuruczgy/x1e-nixos-config - reasonable though not 100% support.
  I believe Ubuntu also has semi official X1 elite support, no idea if they're working on the latest generation.
- izacus 14 hours ago
  No. The driver support is very poor and it won't run at all well.
  Even setting it up is a pain: https://github.com/Jeremiah-Hawley/Linux-on-Snapdragon
  It can run Windows well though.
- shevy-java 15 hours ago
  Without pain? I mean, there is pain when using Linux. It just works better than, say, Windows.
  - hparadiz 12 hours ago
    I just set up Gentoo on a Lenovo laptop last week. It was the least painful process for a Linux laptop of my entire career. Everything just works. Even sleep and the fingerprint sensor for sudo. LLM TUIs replaced Google entirely.
    I can't even say there was any pain whatsoever. The experience is now more akin to Mac OS X in the 10.6.x years.
    - izacus 12 hours ago
      Was it a Snapdragon laptop? Because if it wasn't, then it has nothing to do with the OP's question.
      - hparadiz 12 hours ago
        Driver support for that particular Lenovo is 100%. You just recompile. The issue is more to do with the CPU not being as good as say an AMD AI Max or an M4.
ginko 12 hours ago
I wonder if a source-based distro like Gentoo would have made OP's life slightly easier. Portage for instance should allow you to maintain a set of patches to automatically apply when you update your kernel. Those flatpak problems also shouldn't exist there.
- lonelyasacloud 11 hours ago
  At very least it would have given all those cores something to do :-)
cmrdporcupine 13 hours ago
I use a DGX Spark every day as my daily driver and it's great. I barely use the "AI" facilities of it, but as an AArch64 desktop Linux machine, I have no complaints.
- aj_hackman 3 hours ago
  I'm glad I'm not the only weirdo like this. I dropped an unfathomable $800 on a Jetson AGX Xavier in early 2020 simply because I was obsessed with SBCs at the time and couldn't stop thinking about it. This was before the Raspberry/Orange Pi 5 and Apple Silicon. I still use it as a graphics development workstation.
  - cmrdporcupine 3 hours ago
    I dunno, I kept working on projects and at shops that had an AArch64 deployment scenario that ended up involving cross compilation anyway. Either in the cloud in Docker images on an AArch64 VPS, or on an SBC for embedded systems. To me it was partially about eating my own dog food: having an ARM workstation without giving up Linux. And having a giant amount of RAM and high speed networking at the same time.
    Also a chance to learn some of the serving stack for inference.
    In the end, it's worked out. It is power efficient, it shipped with a vendor supported Ubuntu. I can run Qwen 3.6 27b reasonably well on it. And it basically does everything I need applications wise.
    It's also small and convenient enough I can toss it in a backpack and take it with me on trips when I'm staying at my elderly parents, just needing monitor/keyboard/mouse.
    A laptop with same chipset would be nice but has its own downsides.
- anthonj 8 hours ago
  Well it's also more than double the price
  - cmrdporcupine 8 hours ago
    It wasn't super badly priced when I bought it back in December. It was high, but not insane. It's memory and storage prices that have spiked it. Remember the thing has 128GB of RAM. If you spec a Mac out with the same quantity, it will be in the same price range.
    Certainly way cheaper than an Ampere system like the one the author here is talking about. I actually looked into building such a system and ... it feels weird to gripe about DGX Spark prices when building out a system like that. The Altra requires ECC RAM (though DDR4 at least). Have fun kitting that out.
    Those systems were built for highly highly concurrent multicore server (or some workstation) loads. Meant to be carved up into multiple virtual machines, really. I have plenty of applications that would do well on a machine like that, but playing YouTube videos etc is not one of them.
    - anthonj 6 hours ago
      Ahh that's true, I forgot how badly prices are rising
dlahoda 11 hours ago
He does not mention AI usage:
how it helped to solve problems and search over git sources.
It would be interesting to see what he could achieve mixing NixOS and AI for patches.
- bayindirh 10 hours ago
  AI will be more harmful than helpful on a very big codebase that is unexplored (for them).
  Moreover, playing with code which fiddles with hardware directly is neither simple, nor easy, nor fun.
  - dlahoda 4 hours ago
    Big is what AI is good at.
    Which codebase exactly is unexplored in the article? The patches? Just load them into context. Linux is already in the model, as well as Nix and the hardware specs.
    AI is good at search and at playing (constrained synthesis) over hardware, Linux and configuration specs. That is the not-fun part.
john_alan 11 hours ago
I’ve been using ARM Debian desktop stably for a long time. I don’t see what the issue is, am I missing something or is this just his hardware choice?
- yjftsjthsd-h 7 hours ago
  Largely hardware-specific problems, yes, which is typical for ARM in my experience. Although, I would argue that
  > It turned out that there was no org.freedesktop.Platform.GL.nvidia in Flatpak repositories for AArch64. And I used both of those tools quite often.
  is more on the side of being a software problem... with this particular hardware.
rvz 13 hours ago
The AArch64 desktop experiment started in 2020 with the Macbook M1 and it ended in 2026 with great success with Apple phasing out support for Intel.
It is called Apple Silicon.
- jeroenhd 12 hours ago
  If you think running a Linux desktop on an Ampere is bad, try running it on an M5 Mac!
- preisschild 13 hours ago
  Which is somewhat useless because it doesn't properly support ACPI/UEFI so that you can boot other operating systems
  - laurencerowe 13 hours ago
    Wasn’t booting other operating systems supported from early on (two months after release of M1)? It was reverse engineering the graphics hardware that took time and effort.
  - boxed 13 hours ago
    Linux on apple silicon is a thing though: https://asahilinux.org/
    - preisschild 13 hours ago
      True, but they had to implement their own bootloader chain, and because of that overhead they need a lot of effort to port to each new Apple SoC generation
      - dezgeg 10 hours ago
        That is the reality for a huge amount of ARM powered hardware, unless you fancy running vendor forks of kernel, u-boot, etc.
        - preisschild 6 hours ago
          True, but not for all ARM powered hardware. Especially the more expensive ones. The Ampere Altra based boards, for example, do support booting a UEFI ISO just like on amd64 PCs.
          Look out for SystemReady
      - boxed 11 hours ago
        Ok.. and? That's a job someone has already done, so what does it matter?
        From what I've understood there's significant backwards compatibility for the new SoCs, so the significant work they need to do is to support new features, not getting things running.
shevy-java 15 hours ago
The Desktop Linux will take over from here guys. Next year it will be ready, together with GNU Hurd for everyone and their Grandma.
- mort96 13 hours ago
  Has anyone ever pretended that (non-Apple) ARM hardware running Linux makes for a remotely suitable desktop experience for the general public or are you shadow boxing here?