Thursday, July 29, 2021

Let’s Talk OpenZFS Snapshots


If you have been following this series, you may have already discovered how easy it is to create and manage OpenZFS snapshots. If you haven’t used snapshots yet, give them a try! We’re confident you’ll quickly wonder how you ever got along without them.

If you’re already using snapshots and aren’t an aggressive snapshot pruner, you’ve probably wondered: how many snapshots is too many? Since there’s no such thing as infinite storage capacity, your available disk space is an obvious limiting factor. But at what point will snapshots result in a performance hit?

Unlike other filesystems, the existence of one or one thousand snapshots has no impact on the performance of the filesystem itself: reading and writing files performs the same either way. However, the performance of administrative operations, like listing and deleting snapshots, is affected by the number of snapshots that exist in each dataset.

Is it OK to have hundreds of snapshots? Assuming sufficient storage capacity, what about thousands or tens of thousands? In our experience, more than 1000 snapshots per dataset starts to cause significant performance issues when listing, creating, replicating, and destroying snapshots. The performance impact is not related to the total number of snapshots on the system, but to the number of snapshots on each dataset. A hundred datasets with one hundred snapshots each will see no performance impact on listing, while a single dataset with 2000 snapshots may take many seconds to return the list of snapshots. While you may never need to store that many snapshots, you still want to get the most value for the space snapshots consume over time.
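To see where a system stands against that per-dataset figure, a small helper along these lines may be useful. This is only a sketch: the function name snapshots_per_dataset and its optional limit argument are made up for illustration, and it simply counts the dataset@snapshot names it is given on standard input.

```shell
# Sketch: count snapshots per dataset and flag datasets above a limit.
# Reads snapshot names ("dataset@snapshot", one per line) on stdin.
# The function name and the default limit of 1000 are illustrative assumptions.
snapshots_per_dataset() {
    limit=${1:-1000}
    awk -F'@' -v limit="$limit" '
        NF == 2 { count[$1]++ }          # key on the part before the "@"
        END {
            for (ds in count) {
                flag = (count[ds] > limit) ? "  <-- over limit" : ""
                printf "%s\t%d%s\n", ds, count[ds], flag
            }
        }
    ' | sort
}

# Possible use on a live system (not run here):
#   zfs list -H -t snapshot -o name | snapshots_per_dataset 1000
```

The -H flag asks zfs list for header-free output, which keeps the input to one snapshot name per line.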

An internet search won’t give a definitive answer to how many snapshots is too many, with answers ranging from “don’t worry about it” to “it depends”. While not satisfying, the crux of the matter is there is no definitive answer as everyone’s storage system and data use is different.

This article introduces some questions to ask yourself as the answers will help you better understand your snapshot use. You can then use that information to determine a snapshot creation and pruning schedule that fits your needs without introducing a performance hit.

How is data modified?

The first question may not be obvious, but it is crucial to understanding when and how often it makes sense to create snapshots. Ideally, you want to create snapshots that matter and deliver the most value.

As an example: consider a web server where the content changes only when there’s a new product launch, a new software release for an existing product, or a periodic sweep by the web team to refresh and improve content. It makes sense to take a snapshot before the content changes, and the web team may want to keep an archive of previous versions of the website for several years. In this case, the number of snapshots is minimal, they are stored for a long time, and depending upon the amount of content changes, there may be quite a few differences between snapshots.

This case is quite different from a file server which stores the home directories of many users or even a personal workstation that you work on all day. These use cases tend to benefit from automated snapshots on a regular schedule, say every 15, 30, or 60 minutes during work hours. This results in a lot of snapshots whose value tends to quickly diminish over time.

When users are making changes to files, how do you determine the value (and hence how often to snapshot and how long to keep the snapshot) of file changes? Of course, it depends. If the system administrator is making changes to config files, there is great value in keeping previous changes, at least until the changes are validated. If a user is making changes to a spreadsheet, a periodic snapshot may or may not catch a specific change they wish to recapture.

Which brings us to the question: which applications are users using? Many modern desktop applications and operating systems provide a built-in file version history. Most developers use a revision control system and are taught the mantra “commit early and often”. A lot of business applications operate online or are hosted in an external cloud, often providing a version history.

Only you can understand what applications your users are using, if they are taking advantage of built-in history/revision systems, and if they are bugging you for file restores because they aren’t using revisioning applications or keep forgetting to commit or save versions. You also know which systems are under your control and what type of data is important enough to warrant keeping previous versions using OpenZFS snapshots.

What is the cost of storing snapshots?

If you have lots of storage capacity, the cost of archiving snapshots can be low. However, scheduled snapshots do add up. Consider the math: taking 1 snapshot of a dataset every hour results in 24 snapshots per day, or 168 per week. In other words, it would take about 6 weeks on that schedule to reach the 1000+ snapshots per dataset at which performance starts to suffer. For this example, one would want to consider whether a snapshot is needed every hour of every day, as well as when to start pruning older snapshots.
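The arithmetic is easy to repeat for other schedules. The sketch below uses two hypothetical helpers: days_to_limit estimates how long an unpruned schedule takes to reach a given snapshot count, and tiered_total adds up what a pruned, tiered retention policy would hold at steady state. The tier sizes in the comments are examples, not recommendations.

```shell
# Sketch: snapshot-count arithmetic for a schedule (names are illustrative).

# Days until an unpruned schedule reaches a snapshot limit.
# Arguments: snapshots taken per day, snapshot limit.
days_to_limit() {
    per_day=$1
    limit=$2
    # Integer ceiling division: round up to whole days.
    echo $(( (limit + per_day - 1) / per_day ))
}

# Steady-state snapshot count for a tiered policy.
# Arguments: hourly, daily, and weekly snapshots retained.
tiered_total() {
    echo $(( $1 + $2 + $3 ))
}

# Hourly snapshots, never pruned:
#   days_to_limit 24 1000     # prints 42 (days), i.e. 6 weeks
# Keeping 48 hourly, 30 daily, and 52 weekly snapshots instead:
#   tiered_total 48 30 52     # prints 130
```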

Ask yourself: is there value in keeping a snapshot of a dataset at 10:00 am and 11:00 am from 3 months ago? 1 month ago? Last week?

What is the cost of deleting snapshots?

This is the other side of the previous question. Will it be a big deal if you delete that snapshot of the filesystem at 10:00 am from 5 weeks ago? If not, how far back do you need to go to still have snapshots of value?

Perhaps your snapshots are activity-based rather than schedule-driven? If so, do you still need to access data from 3 pkg-updates ago?

Ask yourself: how much will it cost you in time and effort if a specific revision is no longer available?
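Once you have settled on a maximum age, picking out pruning candidates can be scripted. The following is a minimal sketch, assuming snapshot names and creation times arrive as tab-separated lines with the creation time in seconds since the epoch (which is what zfs list prints for creation when -p is given). The function name snapshots_older_than is hypothetical, and date +%s is assumed to be available, as it is on FreeBSD and Linux.

```shell
# Sketch: print snapshots older than a given number of days.
# Reads "name<TAB>creation_epoch" lines on stdin.
# Arguments: maximum age in days, optional "now" in epoch seconds.
snapshots_older_than() {
    max_age_days=$1
    now_epoch=${2:-$(date +%s)}
    awk -F'\t' -v now="$now_epoch" -v days="$max_age_days" '
        NF >= 2 && (now - $2) > days * 86400 { print $1 }
    '
}

# Possible use on a live system (not run here); review the list before
# passing any of these names to zfs destroy:
#   zfs list -H -p -t snapshot -d 1 -o name,creation tank/usr/home/dru \
#       | snapshots_older_than 35
```

The helper only prints names. Keeping the destroy step separate leaves room to check each candidate against the questions above first.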

How much space is being used by snapshots?

By now, you should have a better idea of what data is important to snapshot and how often you want to capture that data. Next, you’ll want to determine if you have enough storage capacity to maintain the desired number of snapshots. If capacity becomes a concern, you can decide if it is worthwhile to add more capacity or to reconsider your snapshot pruning schedule.

Start by listing the space usage of the pool with the -o space option of zfs list. Here is a snipped example of the tank pool on my laptop:

zfs list -o space
NAME               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
tank                270G  69.2G         0     88K              0      69.2G
tank/ROOT           270G  44.4G         0     88K              0      44.4G
tank/ROOT/mar26     270G  41.6G     18.7G   23.0G              0          0
tank/usr/home/dru   270G  4.34G     1.17G   3.17G              0      2.36M

The columns in this listing contain this information:

  • NAME: the name of the filesystem (pool or dataset)
  • AVAIL: available storage capacity
  • USED: amount being used (as with any filesystem, OpenZFS performance will start to suffer when it gets close to capacity; typically you want to stay below 80% or consider adding more capacity as the system starts to approach 90%)
  • USEDSNAP: amount consumed by snapshots of this filesystem
  • USEDDS: amount being used by this filesystem
  • USEDREFRESERV: amount of space used by a refreservation set on this filesystem (space that would be freed if the refreservation were removed)
  • USEDCHILD: amount being used by children of this filesystem

In this example, there is still plenty of storage capacity on this system. It is interesting to note that over 25% of the space usage in dru’s home directory is used by snapshots.

On a system with many snapshots, this type of listing gives a quick glance of which filesystems are consuming the most snapshot space as well as an overall view of how much capacity is still available on the specified pool.
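The same comparison can be turned into a percentage per dataset. This sketch assumes tab-separated input of name, used, and usedbysnapshots in exact byte values, as produced by zfs list with -H and -p. The function name snapshot_share is made up for illustration.

```shell
# Sketch: percentage of each dataset's USED space held by snapshots.
# Reads "name<TAB>used<TAB>usedbysnapshots" lines (byte values) on stdin.
snapshot_share() {
    awk -F'\t' '
        # Skip datasets with no space in use to avoid dividing by zero.
        NF >= 3 && $2 > 0 { printf "%s\t%.1f%%\n", $1, 100 * $3 / $2 }
    '
}

# Possible use on a live system (not run here):
#   zfs list -H -p -r -o name,used,usedbysnapshots tank | snapshot_share
```

For the home directory dataset above, 1.17G out of 4.34G works out to roughly 27%.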

You can also zero in on a particular dataset. The previous command used zfs list to show space usage across the pool; this time I’ll use zfs get to view the usedbysnapshots property of just my home directory dataset:

zfs get usedbysnapshots tank/usr/home/dru
NAME               PROPERTY         VALUE  SOURCE
tank/usr/home/dru  usedbysnapshots  1.17G  -

As expected, the space used by snapshots matches the 1.17G seen in the previous listing.

While the usedbysnapshots property gives an idea of how much space is consumed by snapshots, as well as how much space would be freed if all the snapshots in a dataset were destroyed, it does not indicate how much space you’ll get back if you start pruning only some of the snapshots. Due to its copy-on-write (COW) nature, OpenZFS can’t free blocks that are still referenced by another snapshot or by the live dataset.

As an example, I’ll create a listing that shows the NAME, WRITTEN, REFER, and USED columns (in that order) of just the snapshots in my home directory:

zfs list -t all -o name,written,refer,used | grep dru@
tank/usr/home/dru@test-backup  2.71G  2.71G   176M
tank/usr/home/dru@homedir       176M  2.71G  12.6M
tank/usr/home/dru@homedir-mod  18.5M  2.71G  18.1M

The written property is useful for understanding snapshot growth: for a snapshot, it represents the amount of referenced space that was written between the previous snapshot and that snapshot. The used column indicates how much of the data is unique to that snapshot; in other words, how much space will be freed if that particular snapshot is deleted.
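Adding up the used column makes the gap with usedbysnapshots visible. In the listing above, 176M + 12.6M + 18.1M comes to about 207M, well short of the 1.17G reported by usedbysnapshots; the remainder is space shared by more than one snapshot, which no single snapshot counts as its own. A sketch of that sum, assuming tab-separated name and used values in bytes on standard input (the function name is hypothetical):

```shell
# Sketch: total the per-snapshot "used" values (space unique to each snapshot).
# Reads "name<TAB>used" lines (byte values) on stdin.
sum_unique_snapshot_space() {
    awk -F'\t' '
        NF >= 2 { total += $2 }
        END { printf "%.0f\n", total }   # %.0f avoids integer overflow in some awks
    '
}

# Possible use on a live system (not run here); compare the result with
#   zfs get -H -p -o value usedbysnapshots tank/usr/home/dru
#
#   zfs list -H -p -t snapshot -d 1 -o name,used tank/usr/home/dru \
#       | sum_unique_snapshot_space
```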

Performing a verbose dry-run (-nv) will show the amount of space that would be reclaimed by destroying the specified snapshot. The amount will match the used column seen in the listing above:

zfs destroy -nv tank/usr/home/dru@test-backup
would destroy tank/usr/home/dru@test-backup
would reclaim 176M
zfs destroy -nv tank/usr/home/dru@homedir
would destroy tank/usr/home/dru@homedir
would reclaim 12.6M
zfs destroy -nv tank/usr/home/dru@homedir-mod
would destroy tank/usr/home/dru@homedir-mod
would reclaim 18.1M

Putting it all together

Understanding which data benefits from being in a snapshot and how long it makes sense to keep snapshots will help you get the most out of OpenZFS snapshots. Pruning snapshots to just the ones you need will make it easier to find the data you want to restore, save disk capacity, and prevent performance bottlenecks on your OpenZFS system.
