Saturday, August 1, 2020

Hacking up a fix for the broken AppleTalk kernel module in Linux 5.1 and newer

I was looking at one of my classic Macs a few weeks ago and noticed that my Ubuntu 18.04 netatalk server wasn’t showing up in the Chooser anymore. If you’re not familiar with netatalk, it’s an implementation of Apple Filing Protocol (AFP) that runs on Unix-like operating systems such as Linux and NetBSD. It allows those operating systems to act as Mac file servers. Version 2.x, which I use, supports the ancient AppleTalk protocol, so it works with really old classic Macs that don’t even have a TCP/IP stack installed. Support for AppleTalk was removed in version 3.x, which is why I’m still using 2.x.

I checked out the server, and noticed that atalkd wasn’t running.

doug@miniserver:~$ ps ax | grep atalkd
3351 pts/0 R+ 0:00 grep --color=auto atalkd

Hmmm… why wouldn’t atalkd be running? I went ahead and tried to restart netatalk:

doug@miniserver:~$ sudo service netatalk restart
Job for netatalk.service failed because the control process exited with error code.
See "systemctl status netatalk.service" and "journalctl -xe" for details.

Uh oh! Why won’t netatalk start?

doug@miniserver:~$ systemctl status netatalk.service
● netatalk.service
Loaded: loaded (/etc/init.d/netatalk; generated)
Active: failed (Result: exit-code) since Sat 2020-08-01 10:29:36 PDT; 54s ago
Docs: man:systemd-sysv-generator(8)
Process: 1320 ExecStart=/etc/init.d/netatalk start (code=exited, status=1/FAILURE)

Aug 01 10:29:36 miniserver systemd[1]: Starting netatalk.service…
Aug 01 10:29:36 miniserver netatalk[1320]: Starting Netatalk services (this will take a while): socket: Address family not supported by protocol
Aug 01 10:29:36 miniserver netatalk[1320]: socket: Address family not supported by protocol
Aug 01 10:29:36 miniserver netatalk[1320]: atalkd: can't get interfaces, exiting.
Aug 01 10:29:36 miniserver systemd[1]: netatalk.service: Control process exited, code=exited status=1
Aug 01 10:29:36 miniserver systemd[1]: netatalk.service: Failed with result 'exit-code'.
Aug 01 10:29:36 miniserver systemd[1]: Failed to start netatalk.service.
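
The “Address family not supported by protocol” lines are the message text for EAFNOSUPPORT. When socket() is asked for an address family that nothing has registered, the kernel normally tries to autoload a module for it (appletalk should carry a net-pf-5 alias for this) and returns EAFNOSUPPORT if that fails. That would tie this message to a module problem, though the log alone doesn't prove it. Here is a minimal Python probe for the same condition. It assumes the failing call was an AppleTalk datagram socket, because the log doesn't say which socket atalkd was opening. If the socket module lacks the AF_APPLETALK constant, the probe falls back to 5, the Linux value.

```python
import errno
import socket

# AF_APPLETALK is 5 on Linux; fall back to that if this Python build lacks the constant.
AF_APPLETALK = getattr(socket, "AF_APPLETALK", 5)


def probe_appletalk_socket() -> str:
    """Try to open an AppleTalk datagram socket.

    Assumption: this mirrors the kind of socket atalkd fails to open; the
    netatalk log above doesn't say exactly which call failed.
    Returns "OK" on success, otherwise the symbolic errno name, e.g.
    "EAFNOSUPPORT" when the kernel has no AppleTalk support loaded.
    """
    try:
        sock = socket.socket(AF_APPLETALK, socket.SOCK_DGRAM)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, f"errno {exc.errno}")
    sock.close()
    return "OK"


if __name__ == "__main__":
    print(probe_appletalk_socket())
```

A datagram socket is used rather than a raw one so that the CAP_NET_RAW restriction doesn't muddy the result.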

This setup had been working forever. What could have possibly changed? I was keeping up to date with all of my Ubuntu updates. I had already needed to manually patch the netatalk binary due to another bug. Maybe I needed to reapply the patch? But no, the netatalk binary hadn’t been updated. That wasn’t it.

Some Googling turned up a recent AppleTalk patch that prevents creating raw sockets without the CAP_NET_RAW capability. I fiddled with setcap to give the atalkd binary that capability, but it didn’t seem to fix anything, so I undid all the capability changes I had tested.

After further experimentation, I realized that the appletalk kernel module wasn’t being loaded:

doug@miniserver:~$ lsmod | grep appletalk
doug@miniserver:~$
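
lsmod gets its list from /proc/modules, so the same check can be done by reading that file directly. The Python sketch below uses a made-up function name. It optionally takes a copy of the file's contents so it can be tried on saved output.

```python
from typing import Optional


def is_module_loaded(name: str, proc_modules: Optional[str] = None) -> bool:
    """Return True if `name` is listed in /proc/modules (the data lsmod prints).

    Pass the file's contents as `proc_modules` to check a saved copy instead
    of the live system.
    """
    if proc_modules is None:
        with open("/proc/modules") as f:
            proc_modules = f.read()
    # Loaded module names use underscores, and modprobe treats "-" and "_"
    # as equivalent, so normalize the requested name the same way.
    wanted = name.replace("-", "_")
    # Each non-empty line starts with the module name, followed by size,
    # reference count, dependents, state, and load address.
    return any(
        line.split(maxsplit=1)[0] == wanted
        for line in proc_modules.splitlines()
        if line.strip()
    )
```

On the affected machine, is_module_loaded("appletalk") would return False, matching the empty grep output above.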

Naturally, I tried to load it myself:

doug@miniserver:~$ sudo modprobe appletalk
modprobe: ERROR: could not insert 'appletalk': Cannot allocate memory

Aha! There’s the real problem. Why can’t it allocate memory? I wondered if it was something specific to this particular machine. To test my theory, I headed over to my desktop Linux machine and ran the same modprobe command. It failed with the exact same error.

At this point, after some more unsuccessful research, I gave up for a while because I had more important stuff to worry about. It’s kind of difficult to search for info about this type of problem, because hardly anybody uses the AppleTalk networking layer in Linux anymore. “There are dozens of us!”

I finally came back and did some more troubleshooting. Since I knew it had worked before, I tried installing various kernel versions in a VM. Sure enough, Ubuntu’s 5.0.0 kernel worked fine, which confirmed that this was a kernel issue, if I wasn’t already convinced.

Next, I tried a bunch of upstream kernel versions. I narrowed the problem down to sometime between kernels 5.0 and 5.1-rc1. Then I ran a git bisect between those versions, following the instructions on the Ubuntu wiki for bisecting upstream kernels. I also used “make localmodconfig” (followed by enabling appletalk in “make menuconfig”) to speed up the compile process after I noticed that most of the compile time was being spent building kernel modules that I wouldn’t be loading anyway.

The bisect process took quite a while. I probably should have figured out a way to automate it with qemu, using a strategy similar to the one in this excellent blog post. Nevertheless, it eventually identified this commit from March 2019 as the start of the problem:

[6377f787aeb945cae7abbb6474798de129e1f3ac] appletalk: Fix use-after-free in atalk_proc_exit

This commit simply does a better job of checking return values of the functions called by atalk_init, and cleaning up properly if they fail. In particular, the return values of these functions, which were previously ignored, are now checked to ensure they succeed:

  • sock_register
  • register_netdevice_notifier
  • atalk_proc_init
  • atalk_register_sysctl
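
The error handling this commit adds follows the usual kernel init idiom. Each registration function is called in turn. If one fails, the ones that already succeeded are undone in reverse order before its error is returned; in C this is a ladder of goto labels. Below is a rough Python model of that control flow, not the kernel's code. The Step tuples are hypothetical stand-ins for the four functions listed above.

```python
from typing import Callable, List, Tuple

# One initialization step: (name, setup, teardown). As in kernel code, setup
# returns 0 on success or a negative errno value on failure.
Step = Tuple[str, Callable[[], int], Callable[[], None]]


def init_with_unwind(steps: List[Step]) -> int:
    """Run each step's setup in order.

    On the first failure, call the teardown of every step that already
    succeeded, most recent first, then return the failing step's error code.
    Later steps never run. Returns 0 if every setup succeeds.
    """
    completed: List[Step] = []
    for step in steps:
        _name, setup, _teardown = step
        rc = setup()
        if rc != 0:
            for _done_name, _done_setup, teardown in reversed(completed):
                teardown()
            return rc
        completed.append(step)
    return 0
```

Suppose there are four steps and the third returns -12 (-ENOMEM). The function undoes the second step, then the first, and returns -12; the fourth step never runs. So a single bogus error code from one function is enough to make the whole module refuse to load.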

Further inspection revealed atalk_proc_init as the real culprit. A refactor of atalk_proc_init, which happens to be the commit immediately before the one above, accidentally left the function returning -ENOMEM instead of 0 on success. In other words, it always returns -ENOMEM, regardless of success or failure. That explains the “Cannot allocate memory” error I got when I tried to insert the appletalk module.
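
Here is a simplified Python model of that regression. It is written from the description above rather than copied from the kernel. Entries are created one at a time. A failure jumps to shared cleanup code, with break standing in for a C goto. The success path is supposed to return 0 before reaching that cleanup. The create and remove_all callbacks and the entry names are illustrative assumptions.

```python
import errno
from typing import Callable, List

# Illustrative procfs entry names (an assumption, not the kernel's exact list).
PROC_ENTRIES: List[str] = ["atalk/interface", "atalk/route", "atalk/socket", "atalk/arp"]


def proc_init_buggy(create: Callable[[str], bool], remove_all: Callable[[], None]) -> int:
    """Models the regression: success falls through into the error path."""
    for entry in PROC_ENTRIES:
        if not create(entry):
            break  # plays the role of "goto out"
    # BUG: the success return is missing here, so a fully successful run
    # also reaches the cleanup below and reports -ENOMEM.
    remove_all()
    return -errno.ENOMEM


def proc_init_fixed(create: Callable[[str], bool], remove_all: Callable[[], None]) -> int:
    """Same control flow with the success return restored."""
    for entry in PROC_ENTRIES:
        if not create(entry):
            break  # plays the role of "goto out"
    else:
        return 0  # loop finished without a failure: report success
    remove_all()
    return -errno.ENOMEM
```

Give both versions a create callback that always succeeds. proc_init_buggy removes everything it just created and returns -12, which modprobe reports as “Cannot allocate memory.” proc_init_fixed returns 0 and leaves the entries in place.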

Armed with this info, I did something really disgusting. I made a copy of the kernel module in /tmp and hacked it by tweaking bytes in a hex editor. A disassembly of atalk_proc_init reveals a line of code that loads a value of 0xFFFFFFF4 (-12) into the EAX register just before it exits. ENOMEM is defined as 12. So this is the line that’s causing it to return -ENOMEM. I simply hacked this line to load 0 into EAX instead. This basically leaves the logic working the same way it used to work before the two aforementioned patches were applied, because the return value was previously being ignored anyway.

Before:

294:   b8 f4 ff ff ff          mov $0xfffffff4,%eax

After:

294:   b8 00 00 00 00          mov $0x0,%eax
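
The patch is easy to check by hand. 0xB8 is the x86 opcode for moving a 32-bit immediate into EAX, and the next four bytes are that immediate in little-endian order. So f4 ff ff ff is 0xFFFFFFF4, which is -12 as a signed 32-bit value. Below is a Python sketch of the same edit. The function names are made up, and it assumes you already know the instruction's offset within the file.

```python
import struct

MOV_EAX_ENOMEM = bytes.fromhex("b8f4ffffff")  # mov $0xfffffff4,%eax (opcode b8 + imm32)
MOV_EAX_ZERO = bytes.fromhex("b800000000")    # mov $0x0,%eax


def imm32(insn: bytes) -> int:
    """Decode the signed little-endian immediate of a 5-byte `mov imm32,%eax`."""
    if len(insn) != 5 or insn[0] != 0xB8:
        raise ValueError("not a mov imm32,%eax instruction")
    return struct.unpack("<i", insn[1:])[0]


def patch_return_value(module: bytes, file_offset: int) -> bytes:
    """Replace `mov $-12,%eax` with `mov $0,%eax` at `file_offset`.

    Refuses to patch unless the expected bytes are found there, so a wrong
    offset or a different build of the module raises an error instead of
    silently corrupting the file.
    """
    end = file_offset + len(MOV_EAX_ENOMEM)
    if module[file_offset:end] != MOV_EAX_ENOMEM:
        raise ValueError(f"expected {MOV_EAX_ENOMEM.hex()} at offset {file_offset:#x}")
    return module[:file_offset] + MOV_EAX_ZERO + module[end:]
```

The file offset is not simply 0x294. For a relocatable object like a .ko, objdump -d prints addresses relative to the start of each section. You would need to add the file offset of the section containing atalk_proc_init, which readelf -S lists in its Offset column. That layout depends on the exact kernel build, which is why the sketch verifies the original bytes before writing.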

This hack by itself didn’t solve the problem:

doug@miniserver:~$ sudo insmod /tmp/appletalk.ko
insmod: ERROR: could not insert module /tmp/appletalk.ko: Key was rejected by service

The problem here is that Ubuntu’s kernel modules are signed. After hacking the binary, it no longer matched its signature. This is an indicator of just how ugly my hack is. So I did something even uglier: I stripped the signature out of the module completely:

doug@miniserver:~$ strip --strip-debug /tmp/appletalk.ko
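
Why does strip get rid of the signature? In the kernel's module-signing scheme, the signature sits outside the ELF structure. It is appended after the ELF data, followed by a 12-byte struct module_signature and the marker string "~Module signature appended~\n". The last four bytes of that struct hold the signature length, big-endian. strip writes a new file out of the ELF sections it keeps, so the appended trailer doesn't survive. Below is a Python sketch, assuming that layout, that splits a module into its ELF part and its signature. The function name is made up.

```python
import sys
from typing import Optional, Tuple

MODULE_SIG_MAGIC = b"~Module signature appended~\n"
SIG_INFO_LEN = 12  # assumed sizeof(struct module_signature)


def split_module_signature(data: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (elf_part, signature); signature is None for an unsigned module."""
    if not data.endswith(MODULE_SIG_MAGIC):
        return data, None
    info_end = len(data) - len(MODULE_SIG_MAGIC)
    info_start = info_end - SIG_INFO_LEN
    if info_start < 0:
        raise ValueError("truncated signature trailer")
    # The last field of struct module_signature is sig_len, a big-endian u32.
    sig_len = int.from_bytes(data[info_end - 4:info_end], "big")
    sig_start = info_start - sig_len
    if sig_start < 0:
        raise ValueError("signature length exceeds file size")
    return data[:sig_start], data[sig_start:info_start]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        _elf, sig = split_module_signature(f.read())
    print("unsigned" if sig is None else f"signed: {len(sig)}-byte signature")
```

This fits what happened here, at least when signature enforcement isn't turned on. A module whose signature no longer matches is rejected outright. A module with no signature at all loads, but taints the kernel.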

Then, I tried loading it again:

doug@miniserver:~$ sudo insmod /tmp/appletalk.ko
doug@miniserver:~$

Success! dmesg does show a signature verification failure, which taints the kernel:

[ 4479.495054] appletalk: module verification failed: signature and/or required key missing - tainting kernel

But this doesn’t matter for my purposes. netatalk works now:

doug@miniserver:~$ sudo service netatalk restart
doug@miniserver:~$ ps ax | grep atalkd
1698 ? S 0:00 /usr/sbin/atalkd
1716 pts/0 S+ 0:00 grep --color=auto atalkd

Now I can store the hacked version of the module in the correct subdirectory in /lib/modules, and everything works automatically when I reboot. Yay!

This is a really ugly fix, though. Whenever I upgrade my kernel, I’m going to have to patch the module manually, at least until the fix hits the mainline kernel and, hopefully, Ubuntu pulls it into their kernel. I’m documenting the binary patch here because, realistically, it’s going to take forever for the kernel fix to be released. I have no idea whether it will be possible to convince Ubuntu to pull the fix into their kernels once it is. It’s not a high-priority bug fix, because nobody uses AppleTalk anymore. And I don’t want to limit myself to an old kernel just for this silly reason.

So the first step here is to submit the patch to the linux-netdev mailing list and get the fix merged into the mainline kernel. I searched the mailing list and discovered that I wasn’t the first one to run into this bug. In fact, two different people have already tried to fix it, but their patches were rejected for minor reasons:

Obviously not too many people are using the appletalk kernel module these days, or else people would be up in arms about how it has been broken since kernel 5.1. I would offer up my own third attempt at getting this patch merged, but since two attempts were made just last month, I suspect one of them will succeed shortly. I hope so, anyway. In the meantime, I guess I’ll just continue binary patching my kernel module.

I think a bigger long-term solution is needed beyond this kernel fix. Classic Mac users who use netatalk with AppleTalk need to join forces to address a few things. netatalk 2.x is dead, and its last release is broken on Linux, at least the AppleTalk portion of it, which is the only reason I would still use it. Maybe we should fork netatalk 2.x? Also, I’m pretty sure the AppleTalk subsystem in the Linux kernel is full of little issues. It works well enough for most people who need netatalk support, but I know it is broken in other ways. Maybe this project idea for a portable userspace AppleTalk stack will gain some traction going forward. If it ever does, perhaps we could add support for it in a netatalk fork.

Anyway, I thought it might be interesting to share this troubleshooting journey and the eventual resolution. I just tested connecting to my Ubuntu 18.04 server from my old Mac IIci running System 7.1, and everything works perfectly again…until the next kernel upgrade, that is!



from Hacker News https://ift.tt/39NTwOd
