Making Resource-Fork-Aware Backups with rsync on Mac OS X

by Marion Bates <mbates at whoopis.com>

D Andrew Reynhout, author of a patched, HFS-metadata-aware version of rsync, has described the problem better than I ever could've. So here it is verbatim, reprinted without permission from http://www.quesera.com/reynhout/misc/rsync+hfsmode/:

Mac OS X uses the HFS+ filesystem, by default. HFS+ files are often composed of a data fork, a resource fork, and Finder metadata. The data and resource forks contain what you normally think of when you think of file information: data, program code, etc. Finder metadata includes information like file type and creator, comments, modification dates, locked and invisible status, and Finder colors.

Traditional UNIX filesystems only store a single stream of file data (the HFS+ equivalent of the data fork). Mac OS X (or Darwin, more precisely) is a genuine BSD UNIX, but with a nontraditional filesystem. Because of this, standard UNIX tools can only see certain portions of OS X files.

The difficulties caused by HFS/HFS+ aren't new. Early Mac users of BBSes and the Internet had the same problems when uploading files from a Mac for storage by other operating systems. Since the foreign OS had no way to store the additional HFS data, the uploads would be incomplete/useless. To solve this problem, Apple and others invented conversion formats that collected the full set of file information into a single data stream. Examples (some with varying and/or additional design goals) include AppleSingle, AppleDouble, MacBinary, BinHex, and Stuffit.

The relatively new development is OS X. Now that Mac users can run decades of software written for traditional UNIX machines, some of that software needs to be updated to work properly with HFS+ files.

One such standard UNIX tool is rsync, an excellent file synchronization utility which is also great for use as filesystem backup software. Rsync builds and runs without errors on OS X, but because it is unaware of resource forks and other metadata, it creates incomplete (and therefore corrupt) backups.

...

This patch will make rsync HFS+ metadata-aware. Resource forks and Finder metadata are assembled on the sender into an ephemeral file in standard AppleDouble format, before being sent to the destination.

This method preserves disk space on both sides, with zero redundant data and only a small amount of overhead per file (~100 bytes of AppleDouble headers for each file that has a resource fork and/or HFS+ metadata). It works with any destination filesystem and operating system (tested with Solaris and Linux), and even with older or unpatched versions of rsync.

My document here is simply a recipe for making backups using his patched rsync. It is especially geared toward people who are still using and/or storing "Classic" (pre-OS X) Macintosh files on modern, even Intel-based, OS X Macs. I am a collector of old Macintosh computers, and naturally I archive lots of old software on my modern OS X iMac, since old media fails with age. I discovered with horror a while back that my rsync-based backup was completely useless because of the metadata problem outlined above. I hope this is helpful to others in similar situations.


Backup:

First, download the HFS-metadata-enabled version of rsync from the URL above. Place it somewhere NOT in the regular system path. Why? Because you never know when Apple's next Software Update will overwrite rsync with an updated version, one that does not support HFS mode. In my case, I have a directory called "bin" under my home directory, and I stick my "custom" command-line programs in there, so that an update will never interfere with them.

Then, in the Terminal, run

chmod 755 ~/bin/rsync
to make it executable.
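
Since a stock rsync can end up being the one you actually run, it may be worth checking that the binary you just installed really is the patched one. Here is a minimal sketch. The function name rsync_supports_hfs_mode is my own invention, and the check ASSUMES that the patched build lists --hfs-mode in its --help output, which I have not confirmed.

```shell
#!/bin/bash
# Sketch: report whether a given rsync binary advertises --hfs-mode.
# ASSUMPTION: the patched build mentions the option in its --help text.
rsync_supports_hfs_mode() {
    local rsync_bin="$1"
    # Return 2 if the binary is missing or not executable
    [ -x "$rsync_bin" ] || return 2
    # Return 0 if the help text mentions --hfs-mode, 1 otherwise
    "$rsync_bin" --help 2>&1 | grep -q -e '--hfs-mode'
}
```

For example: rsync_supports_hfs_mode ~/bin/rsync && echo "HFS mode available"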

Make a backup shell script. Here is mine:

#!/bin/bash

# Space-separated list of directories to back up; edit as needed
DIRS="/Users/mbates/personal /Users/mbates/Pictures /Users/mbates/Work"

# Options to pass to rsync; edit as needed
# "--hfs-mode=appledouble" = copies resource fork data 
# "--delete" = destructive backup 
#   (i.e. if you delete a local file, it will be deleted 
#   on the server at next backup; keeps local+backup synchronized)
# "--update" = update only (don't overwrite newer versions of files)
OPTS="--hfs-mode=appledouble --delete --update"

# Backup destination. In this case, it is another hard disk on the same machine.
# Incidentally, it is DOS-formatted, which is irrelevant here.
# If you wish to back up to a server via ssh, change the line to something like
# BACKUPDIR="remoteusername@someserver.something:/path/to/backup/destination"
BACKUPDIR="/Volumes/BackupDrive/MAC-BACKUP"

# ignore Mac droppings
EXCLUDES="--exclude .DS_Store --exclude .Trash --exclude Cache --exclude Caches"

# Build the actual command
# NOTE the specific path to the "special" version of rsync
COMMAND="/Users/mbates/bin/rsync -av $OPTS $EXCLUDES $DIRS $BACKUPDIR"

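# Informative output
echo "About to run:"
echo "$COMMAND"
echo "Please do not close this window until it is finished."

# DO IT!
# NOTE: $COMMAND is left unquoted on purpose so the shell splits it into
# separate arguments; this means none of the paths above may contain spaces.
$COMMAND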

echo Done.

# the end.

You can copy and paste this whole thing into a text file, using a real text editor (NOT Word or TextEdit!), make changes for your system, and save it as something like "backup.sh" and put it somewhere safe (in my case, the same ~/bin dir as the rsync binary).

Then, like before, do

chmod 755 ~/bin/backup.sh
to make it executable.

To run the script, type

~/bin/backup.sh
and press Return.

The first time it runs, it will take a long time (make sure you have enough space on the destination!). Thereafter, it will run much more quickly, as it will only back up new or changed files.
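
Because "--delete" removes files from the destination, you may want to preview a run before trusting it. The sketch below wraps the same options in a dry run. The -n (--dry-run) flag is a standard rsync option; the function name preview_backup and the RSYNC variable are hypothetical names of my own, and --hfs-mode=appledouble of course only works with the patched binary.

```shell
#!/bin/bash
# Sketch: show what a backup run WOULD do, without changing the destination.
# RSYNC defaults to the custom binary's location used in this document.
RSYNC="${RSYNC:-$HOME/bin/rsync}"

preview_backup() {
    # First argument: backup destination; remaining arguments: source dirs
    local dest="$1"; shift
    # -n makes rsync list the transfers and deletions instead of doing them
    "$RSYNC" -avn --hfs-mode=appledouble --delete --update "$@" "$dest"
}
```

For example: preview_backup /Volumes/BackupDrive/MAC-BACKUP /Users/mbates/Pictures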

Results and restoration:

Running this script produces, on the backup volume, a directory of all your files along with their associated resource fork files, which are hidden in the Finder. Many of the files will have blank document or generic "unix shell script" icons, and Classic Mac application files will show as zero kilobytes in size when you Get Info on them. Yikes! But wait, everything will be okay...

In the following example, I have a folder called "untitled folder" on my main hard disk, and it contains one item -- a Classic Mac app called "Quill". I want to back it up to my secondary hard drive, called Prelude_to_Oblivion. This is how things look before I do anything:

(Note that there is a white circle-slash symbol superimposed over the app icon; this is because I am using an Intel Mac, which of course cannot run Classic. But it can still identify Classic apps as such.)

Now, I run my backup, like so:

If you look carefully at rsync's output, you will notice that a file called "._Quill" has been created on the backup, as well as the actual file. We can't see this dot-underscore file in the Finder. (There is a view preference to override this, I believe, but most people don't have it turned on.)

Here's what the backup looks like in the Finder:

Ewww... blank document, and it's zero KB to boot! But the Terminal tells a different story:

There's that dot-underscore version -- that's the resource fork data.
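
If you want to see every dot-underscore file in a backup folder without hunting through ls output, something like the following should do it. This is only a sketch; list_fork_files is a name I made up for illustration.

```shell
#!/bin/bash
# Sketch: list the "._" companion files anywhere under a backup folder.
list_fork_files() {
    local dir="$1"
    # -name '._*' matches the dot-underscore files the Finder hides
    find "$dir" -type f -name '._*' | sort
}
```

For example: list_fork_files "/Volumes/Prelude_to_Oblivion/untitled folder"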

Now, we can put the genie back in the bottle. First, change directories one level up -- the command to glue the resource fork back to its original file expects to have the enclosing directory as its target. Then run the command:

/System/Library/CoreServices/FixupResourceForks untitled\ folder
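
The "go up one level, then run the command on the folder name" dance can be wrapped in a small function. This is a minimal sketch under the assumption described above, namely that the tool wants to be run from the enclosing directory; restore_folder and the FIXUP variable are hypothetical names, not part of any Apple tool.

```shell
#!/bin/bash
# Sketch: run FixupResourceForks on one folder from its parent directory.
# FIXUP can be overridden; the default is the path used in this document.
FIXUP="${FIXUP:-/System/Library/CoreServices/FixupResourceForks}"

restore_folder() {
    local target="$1"
    local parent name
    parent=$(dirname "$target")
    name=$(basename "$target")
    # Subshell, so the caller's working directory is left alone
    ( cd "$parent" && "$FIXUP" "$name" )
}
```

For example: restore_folder "/Volumes/Prelude_to_Oblivion/untitled folder"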

It's a little confusing in the screenshot below, because of line-wrapping and because the FixupResourceForks program outputs the names of the resource-fork files as it runs. After it finished, the next thing I did was cd into the directory and run ls, then ls -al, to show that the dot-underscore file is now gone -- it has been reunited with its "parent" file.

Viewing it in the Finder shows what we want to see -- its icon and data restored!


Now, the caveats: If, before running the Fixup command, you were to rename the backed-up file, then the Fixup command would fail (unless you use the Terminal to rename the dot-underscore version accordingly). Similarly, if you move the file and then try to run the Fixup command, it will fail. The file and its dot-underscore companion must have the same name, and they must be in the same enclosing folder.
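
If you do need to rename a backed-up file before restoring it, renaming both halves together keeps the pair matched. Here is a sketch; rename_with_fork is a hypothetical helper of my own, and it only handles renames within the same folder.

```shell
#!/bin/bash
# Sketch: rename a backed-up file together with its "._" companion, so the
# two names still match when FixupResourceForks runs later.
rename_with_fork() {
    local old="$1" new_name="$2"
    local dir old_name
    dir=$(dirname "$old")
    old_name=$(basename "$old")
    mv "$dir/$old_name" "$dir/$new_name" || return 1
    # The companion may not exist if the file had no fork data
    if [ -e "$dir/._$old_name" ]; then
        mv "$dir/._$old_name" "$dir/._$new_name"
    fi
}
```

For example: rename_with_fork "untitled folder/Quill" "Quill 1.0". Keep in mind that with "--delete", the next rsync run will treat the renamed pair as files that no longer exist on the source.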

My backup "strategy" is simply to leave the files separated from their fork data until or unless I need to use/restore them -- only then do I run FixupResourceForks. You can, alternatively, run FixupResourceForks on the entire backup directory -- it recurses automatically -- but it takes a very long time. Worse, the next time you run your rsync backup, it sees all these updated versions of files and "missing" resource fork data files, and it overwrites them again (thus "breaking" them).

There is an option to FixupResourceForks, "-nodelete", which leaves the separate fork file in place after it does the restoration, and I use the "--update" flag with rsync to try to get it NOT to overwrite the "newer" version on the backup destination; but even with both of these tricks, I find that rsync still seems to want to re-copy and re-break every file. There is surely a better way, but I haven't found it yet. If you have suggestions, please let me know!


References/Thanks:

  • D Andrew Reynhout. http://www.quesera.com/reynhout/misc/rsync+hfsmode/