Monday, November 28, 2022

Building NAS with ZFS, AFP/Samba for Time Machine

It's pretty common to have some PCs whose CPUs are no longer fast enough for heavy tasks like rendering 8K video or running numerical simulations. However, these PCs may still be perfectly suitable as a NAS server. Today, I'm going to write down how I built my NAS, which also works as a Time Machine backup target for my MacBook.

Planning Disk Usage

In a NAS, we usually combine many hard disks into a single large storage volume for convenience. We also want some redundancy, so that the data in the large pool can survive a few hard disk failures. I'd recommend using RAID or ZFS to set up such a storage volume for home use. A software RAID can be built with the mdadm command, while a ZFS setup is managed with the zpool and zfs commands.
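For comparison, here is a minimal software-RAID sketch with mdadm; the device names /dev/sda and /dev/sdb are placeholders for your actual disks:

# Create a two-disk RAID-1 (mirror) array exposed as /dev/md0
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb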

When it comes to redundancy, the general concept behind RAID and ZFS is quite similar: they either simply mirror the data onto a second disk or store some parity data on extra disks. There are good introductions to RAID online if you want more background.

In the case of mirroring (RAID-1), you can only use half of your disks' capacity for data storage; the other half holds a full copy of your data. This is the simplest scheme and requires minimal computation for redundancy. If the budget allows, you should go for it. But do you really need to sacrifice that much space for redundancy? If you want more usable space from the same number of disks, take a look at the solutions based on parity data.

When you use N disks to build a RAID-5 array (or a raidz1 VDEV), you can tolerate one disk failure at a time: you replace the failed disk and rebuild its contents from the data on the other N-1 disks. This works because a RAID-5 array stores parity data, amounting to the size of one disk, computed whenever you write to the array. Parity data is like the check digit of your credit card number: it is derived from your data, and if part of the original data goes missing, it can be reconstructed from the rest plus the parity. How much missing data is recoverable depends on the amount of parity data and the algorithm. In short, a RAID-5 array survives one disk failure at a time and offers usable storage equal to N-1 disks. A more secure setup is a RAID-6 array (or a raidz2 VDEV), which survives two simultaneous disk failures; its usable storage equals N-2 disks.
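As a toy illustration of the parity idea (this is just the XOR trick that RAID-5 is built on, not what a real array literally runs), a shell can show how a lost block is rebuilt from the surviving block plus the parity:

# Two data blocks and their XOR parity
$ a=0x5A; b=0x3C
$ parity=$(( a ^ b ))                               # stored on the parity disk
$ printf 'recovered a = 0x%X\n' $(( parity ^ b ))   # "disk a" failed; rebuild it
recovered a = 0x5A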

Since I've decided to use ZFS for my NAS, let me say a bit more about the constraints in ZFS.

In the ZFS architecture, we construct a VDEV from several hard disks, and the redundancy scheme is fixed when the VDEV is constructed. For example, one VDEV may be made of two hard disks operating in mirror mode, while another is built from five hard disks in raidz2 (RAID-6) mode; a ZFS pool is then built on top of these two VDEVs. An important note is that you CANNOT add extra disks to an existing VDEV. The only way to expand your ZFS pool is to add a new VDEV to it, so plan your disk purchases with this constraint in mind. There are good discussions of pool layout online if you want to dig deeper.
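Concretely, expanding a pool later means attaching a whole new VDEV, roughly like this (the disk identifiers are placeholders):

# Grow mypool by adding a second mirror VDEV made of two new disks
$ sudo zpool add mypool mirror ata-NEWDISK1_SERIAL ata-NEWDISK2_SERIAL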

Purchase Hard Disks

Once you understand RAID and ZFS pools, you can estimate how much storage space you need and how much redundancy is warranted, based on the importance of your data and your budget. In my case, I wanted 12 TB of space in a ZFS mirrored setup (equivalent to RAID-1), which requires two 12 TB hard disks, so I purchased two Seagate IronWolf 12TB drives.

Besides, if you are building a software RAID (with mdadm) or a ZFS pool, you don't need to worry too much about the exact specification of the hard drives. However, if you are going to build a hardware RAID, remember to purchase enterprise hard drives: a hardware RAID controller will sometimes report a disk failure simply because a cheaper drive responds slightly too slowly.

Furthermore, if you care about performance, you may purchase two extra SSDs and more system RAM for caching. In the case of ZFS, a dedicated device (usually an SSD) called a Separate Intent Log (SLOG) can be added to your system to improve write performance, specifically for synchronous writes. For read performance, you can increase your system RAM, which is used as the Adaptive Replacement Cache (ARC), and set up another SSD as the Second Level Adaptive Replacement Cache (L2ARC).

Although we haven't yet covered how to create a ZFS pool, for later reference, the following two commands add a SLOG and an L2ARC to the pool mypool, respectively, where nvme0 and nvme1 are the names of your SSDs in /dev/. You may also use the names in /dev/disk/by-id/ to make the setup more robust in case you rearrange the SSDs' hardware connections:

# SLOG (for better write performance)
$ sudo zpool add mypool log nvme0
# L2ARC (for better read performance)
$ sudo zpool add mypool cache nvme1

For more information regarding SLOG and L2ARC, please refer to the OpenZFS documentation.

Create ZFS pool and datasets

First of all, install the ZFS utilities:

$ sudo apt install zfsutils-linux

ZFS pool

A ZFS pool mypool with a single VDEV made of two mirrored disks can be created by:

$ sudo zpool create -o ashift=12 mypool mirror \
    ata-ST3000DM001-9YN166_S1F0KDGY \
    ata-ST3000DM001-9YN166_S1F0JKRR

The -o ashift=12 parameter gives better performance on hard drives with 4K sectors. The ata-… strings are the disks' identifiers, which can be found under /dev/disk/by-id/. With the mirror keyword, the zpool command creates a VDEV in mirror mode. To build the VDEV in other modes, such as raidz (RAID-5) or raidz2 (RAID-6), refer to the zpool documentation.
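For instance, a raidz2 VDEV built from five disks would instead be created roughly like this (the disk identifiers are placeholders):

$ sudo zpool create -o ashift=12 mypool raidz2 \
    ata-DISK1_SERIAL ata-DISK2_SERIAL ata-DISK3_SERIAL \
    ata-DISK4_SERIAL ata-DISK5_SERIAL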

After creating the pool, we can check it with the zpool status command:

$ zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME                                 STATE     READ WRITE CKSUM
        mypool                               ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            ata-ST3000DM001-9YN166_S1F0KDGY  ONLINE       0     0     0
            ata-ST3000DM001-9YN166_S1F0JKRR  ONLINE       0     0     0

ZFS datasets

To create a dataset mypool/feynman_tm in the pool mypool, we can use the following command:

$ sudo zfs create -o compression=lz4 mypool/feynman_tm

Here, the optional argument -o compression=lz4 turns on compression for this dataset. The dataset will be mounted at /mypool/feynman_tm/ automatically. Since we will use this dataset for Time Machine, it's a good idea to set a quota on it so that Time Machine won't eat up all the space in mypool.

$ sudo zfs set quota=2T mypool/feynman_tm
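You can verify both properties afterwards with zfs get:

$ zfs get compression,quota mypool/feynman_tm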

Finally, you can check the list of your datasets:

$ zfs list
NAME                 USED  AVAIL     REFER  MOUNTPOINT
mypool              1.46M    12T       96K  /mypool
mypool/feynman_tm    720K     2T      104K  /mypool/feynman_tm

Remember to change the owner:group of the folder if you want:

$ sudo chown feynman:caltech /mypool/feynman_tm

Set up a Time Machine Server

To set up a Time Machine server, we need to install two packages on our NAS server: Netatalk and Avahi. Netatalk is an open-source implementation of the Apple Filing Protocol (AFP), which we will use as our file-sharing protocol. Avahi is an open-source implementation of Apple's Bonjour/Zeroconf, which will be used to advertise our service on the LAN.

Installation

$ sudo apt install netatalk
$ sudo apt install avahi-daemon

Netatalk Configuration

Add the following share definitions to /etc/netatalk/afp.conf; the time machine = yes line marks a share as a Time Machine target:

# /etc/netatalk/afp.conf

[myTimeMachine]
path = /mypool/feynman_tm
valid users = feynman
time machine = yes
[myShared]
path = /mypool/shared
valid users = feynman schwinger

After the modification, remember to restart the Netatalk service:

$ sudo systemctl restart netatalk.service

Avahi Configuration

Create the file /etc/avahi/services/afpd.service with the following content:

<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h</name>
  <service>
    <type>_afpovertcp._tcp</type>
    <port>548</port>
  </service>
  <service>
    <type>_device-info._tcp</type>
    <port>0</port>
    <txt-record>model=Xserve</txt-record>
  </service>
</service-group>

Then, restart the Avahi daemon:

$ sudo systemctl restart avahi-daemon.service
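To verify that the service is being advertised, you can browse for it from another Linux machine on the LAN (avahi-browse is provided by the avahi-utils package):

$ avahi-browse -r _afpovertcp._tcp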

Connect to the Time Machine Server

First, press [Cmd]+[K] in Finder to open the Connect to Server dialog, then enter afp://IP_address_of_your_server.

You will be asked for a username and password on the server; the account must be one of those listed under valid users = xxx in the server's /etc/netatalk/afp.conf. After that, you can connect to the server. You will find the shared folders in Finder under Network/hostname_of_your_server.

Also, in the Time Machine settings, you should be able to see myTimeMachine, which was exported in the server's /etc/netatalk/afp.conf.

Now, you can start your Time Machine backup to the NAS.

Sparsebundle and Band Size

A few minutes after starting your Time Machine backup, you may cancel it and look at what is actually happening. If you look into the myTimeMachine folder from your Mac, which corresponds to /mypool/feynman_tm on the server, you will find an XXX.sparsebundle file. That is a kind of disk image created by Time Machine. By double-clicking it in Finder, you can mount it on your Mac; it contains an APFS- or HFS+-formatted volume. Inside the mounted volume, a folder named Backups.backupdb contains all the backups of your Mac. You may browse its contents if you want.

On the other hand, instead of mounting the XXX.sparsebundle file, you may right-click on it in Finder and choose Show Package Contents (remember to eject/unmount the image first, for safety). You will find many files in the bands folder, each about 8 MB in size, which is the default band size of a sparsebundle. These are the bands of the sparsebundle: as you put more backup data into XXX.sparsebundle, the number of bands grows.

However, the default 8 MB band size might be too small in some cases. If your backup is large, you may end up with a huge number of band files. Although you are unlikely to hit any limit of ZFS, this can cause performance issues; I haven't figured out exactly what problem you would run into, but there are articles suggesting that you increase the band size to avoid trouble. If you wish to do so, there is a convenient GUI tool, Spundle, a utility for creating and adjusting sparse bundles. After you change the band size or create a new sparsebundle, remember to copy the com.apple.TimeMachine.* files from the original sparsebundle into the new one so that Time Machine can recognize it.
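If you prefer the command line, macOS's hdiutil can also rewrite a sparse bundle with a different band size. A minimal sketch, assuming the band size is given in 512-byte sectors (so 262144 sectors means 128 MB bands; the file names are placeholders):

# Convert the existing bundle into a new one with 128 MB bands
$ hdiutil convert old.sparsebundle -format UDSB \
    -imagekey sparse-band-size=262144 -o new.sparsebundle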

Connecting from Anywhere (with Tailscale)

One common problem is that we do not have a public static IP for our NAS. This can be solved by setting up a VPN (Virtual Private Network). An extremely convenient solution is Tailscale, a zero-config VPN based on WireGuard. You can simply follow the instructions on their website to install Tailscale on both the NAS and your Mac client. By signing up with a Google account, you can use the free solo plan to connect up to 100 devices in one VPN. The free plan also lets you share a specific node with friends, which is useful for sharing your NAS. The cumbersome WireGuard authentication setup is fully taken care of by Tailscale; it's almost a zero-config solution, with a nice web console covering Sharing/DNS/Routing/ACL settings.
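On a Debian/Ubuntu NAS, installation typically boils down to the following; this sketch assumes Tailscale's official install script, so check their website for the current instructions:

# Install the Tailscale daemon and CLI
$ curl -fsSL https://tailscale.com/install.sh | sh
# Bring the machine up and authenticate it in your browser
$ sudo tailscale up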

After installing Tailscale on your NAS server and Mac client, both devices will be assigned IP addresses in the 100.64.0.0/10 subnet (from 100.64.0.0 to 100.127.255.255). This is the "Carrier-Grade NAT" (CGNAT) address space reserved by RFC 6598, which you can consult for more information about these 100.x.y.z addresses. All you need to do now is reconnect to your NAS using this new IP address, i.e., afp://100.x.y.z. The NAS's actual address can be seen in the Mac's Tailscale user interface, on the web console, or with the ip addr command on the NAS.
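On the NAS itself, a recent Tailscale CLI can also print the assigned address directly:

# Show this machine's Tailscale IPv4 address
$ tailscale ip -4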

Now you can connect to your NAS from anywhere with Internet access! You can also SSH into your NAS over this Tailscale VPN IP!

Using Samba instead of AFP (recommended in 2021)

As many people on Reddit have pointed out, Apple has effectively deprecated AFP, so I am adding this section on how to use Samba instead of AFP for Time Machine and file sharing.

Installation

$ sudo apt install samba

Adding User to Samba

$ sudo smbpasswd -a feynman
New SMB password:
Retype new SMB password:

Remember that you will need to use this password, not the Unix password, to connect to Samba. To change a password or remove a user from Samba:

# Change feynman's password
$ sudo smbpasswd feynman
# Delete user feynman
$ sudo smbpasswd -x feynman
# List Samba users
$ sudo pdbedit -L -v

Configuration

Add these sections to /etc/samba/smb.conf. The @ sign in valid users denotes a group name, like @mit here. Better support for Mac clients comes from the vfs_fruit module; for more explanation, please read the article Configure Samba to Work Better with Mac OS X. For a Time Machine entry, remember to add fruit:time machine = yes, as in the second share below.

[mySharedforMac]
comment = Shared with feynman, schwinger, and the group mit
path = /zfs_pool/shared
browseable = yes
guest ok = no
writable = yes
valid users = feynman schwinger @mit
vfs objects = fruit streams_xattr
fruit:metadata = stream
fruit:model = MacSamba
fruit:posix_rename = yes
fruit:zero_file_id = yes
fruit:veto_appledouble = no
fruit:wipe_intentionally_left_blank_rfork = yes
fruit:delete_empty_adfiles = yes

[myTimeMachine]
comment = Feynman's Time Machine
path = /zfs_pool/feynman_tm
browseable = yes
guest ok = no
writable = yes
valid users = feynman schwinger @mit
vfs objects = fruit streams_xattr
fruit:metadata = stream
fruit:model = MacSamba
fruit:posix_rename = yes
fruit:zero_file_id = yes
fruit:veto_appledouble = no
fruit:wipe_intentionally_left_blank_rfork = yes
fruit:delete_empty_adfiles = yes
fruit:time machine = yes

Finally, remember to restart smbd:

$ sudo systemctl restart smbd.service
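You can also sanity-check the configuration at any time with Samba's built-in testparm tool, which parses smb.conf and reports any errors:

$ testparm -s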

Then you can connect to your NAS via Samba! Just replace the afp://…… from the sections above with smb://…… when connecting from Finder.

