#rsync FAQ
Maintained by Kevin Korb (BasketCase)
I wrote this document to answer some of the common questions that I have encountered in the #rsync support channel on irc.libera.chat. Some of these may be covered in the official FAQ, and all of them are probably covered in the man page, but people ask anyway, so here we are.
Index
- How do I sync both directions?
- How do I rsync a database?
- Rsync seems to want to update file(s) even though they are all the same
- Why doesn't --delete delete anything?
- How do I connect to an sshd on a non-standard port number or use other ssh parameters?
- Do I need to set up the rsync service or rsyncd.conf file?
- How do I rsync on Windows?
- I can't get includes/excludes to work
- How do I get rsync to transfer a file as soon as it is modified?
- Do I need rsync installed on both systems?
- Why does rsync re-copy the entire file when most of it is already there?
- I am afraid of allowing root to ssh in to the system. What are my options?
- I ran rsync but the source and the target are not the same size?
- I am using --checksum and it is really slow. Sometimes so slow that it times out.
- Why can't I access the files I just copied with rsync on Windows 7 or Vista?
- What are some cool command line switches to read about and try with rsync?
- What are some alternatives to rsync?
- Q: How do I sync both directions?
- Explanation: You don't. Rsync has no memory of what it has done in the past. Therefore, when a file exists on A but not B, rsync has no idea whether you added that file to A or deleted it from B. Also, if a file was modified on both sides, rsync can only tell which copy is newer. The older copy would then be overwritten with no warning that it contained conflicting new data.
- A1: Unison has many of the same features as rsync plus it does remember what it has done in the past so the problems listed above are not an issue with it.
- A2: If you are in this situation it is likely that what you really want is a source code revision control system like git, Subversion, or CVS.
- A3: Or perhaps an internally provided cloud storage system such as OwnCloud and the underlying csync tool.
- A4: It is possible to do a sort of 2-way sync. You can accomplish this by running rsync twice, once in each direction, using the --update flag and NOT the --delete flag. Use at your own risk.
- A5: Here is an article from a user who put together a way to allow --delete to work in a multi-pass setup. Note that there is no conflict resolution.
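The two-pass approach from A4 can be sketched as a small shell function. This is only an illustration: the function name two_way_sync and the example paths are hypothetical, and the caveats above still apply (deletions are never propagated, and a file changed on both sides is silently replaced by the newer copy).

```shell
# two_way_sync DIR_A DIR_B
# Hypothetical helper: copy newer or missing files from A to B, then from B to A.
# --update skips files that are newer on the receiving side.
# --delete is deliberately NOT used, so a file removed on one side comes back.
two_way_sync() {
    rsync --archive --update "$1"/ "$2"/ &&
    rsync --archive --update "$2"/ "$1"/
}

# Example with made-up paths (either side may also be user@host:/path):
# two_way_sync /home/me/notes /mnt/usb/notes
```

The second pass only runs if the first one succeeded, so a failed transfer does not get copied back in the other direction.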
- Q: How do I rsync a database?
- A1: You don't. Database files are constantly being updated at the block level. Using rsync or any other file-based tool on them will create an inconsistent and probably corrupt copy.
- A2: You can get around that problem by telling the database engine to freeze the database files (lock the tables) so that they aren't being modified. Then do a snapshot using your LVM or filesystem tools. The other way is to simply shut down the database while rsync is copying it.
- A3: Of course you can also just use the tool that comes with your database to dump the database to a text file (mysqldump, pg_dump, etc) and then back up those files with rsync. However, this takes significantly more time and creates significantly more server load than the LVM snapshot method.
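The dump-then-rsync approach from A3 might look like the sketch below. It assumes PostgreSQL (pg_dump writes a plain SQL dump to standard output) and that connecting to the database already works for the user running it; with MySQL you would substitute mysqldump. The function name, database name, host, and paths are all made up for illustration.

```shell
# dump_and_sync DBNAME DUMPDIR DEST
# Hypothetical helper: write a plain SQL dump with pg_dump, then rsync the
# dump directory. DEST may be a local path or user@host:/path.
dump_and_sync() {
    pg_dump "$1" > "$2/$1.sql" &&
    rsync --archive "$2"/ "$3"/
}

# Example with made-up names:
# dump_and_sync inventory /var/backups/db backupuser@backup.example.com:/srv/dumps
```

Because of the &&, rsync is skipped when the dump fails, so a truncated dump is not pushed over a good copy on the target.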
- Q: Rsync seems to want to update file(s) even though they are all the same
- A1: Run rsync with --itemize-changes. This will prefix each file name in the output with a string telling you why rsync thinks it needs to modify that file. If it shows only a timestamp difference then make sure you are running rsync with --times (or --archive which includes --times). Note: the meaning of the output is explained in the --itemize-changes section of the rsync man page.
- A2: If rsync is showing only a timestamp difference but --times doesn't seem to help then the problem is probably that your target is FAT formatted. The FAT filesystem can only store timestamps with 2 second resolution, so the timestamps on the target will not exactly match the source. Use the --modify-window=2 parameter to compensate for that. Note that you will need to change that to --modify-window=3602 if you have a daylight saving time change.
- A3: If rsync shows ownership, group, or permission changes and you are running with --archive then you are probably writing to a filesystem (like FAT or CIFS) that does not support those attributes.
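A quick way to apply the advice above is to combine --itemize-changes with --dry-run so that nothing is modified while you investigate. The wrapper below is a sketch; the name why_update and the example paths are hypothetical.

```shell
# why_update SRC DEST [extra rsync options...]
# Hypothetical helper: show what rsync WOULD change and why, without changing
# anything (--dry-run). Each output line starts with the --itemize-changes code.
why_update() {
    src=$1
    dest=$2
    shift 2
    rsync --archive --dry-run --itemize-changes "$@" "$src" "$dest"
}

# Examples with made-up paths:
# why_update /home/me/photos/ /mnt/backup/photos/
# why_update /home/me/photos/ /mnt/fatdisk/photos/ --modify-window=2
```

A line such as ">f..t...... a.txt" means the file would be transferred only because its modification time differs.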
- Q: Why doesn't --delete delete anything?
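- A1: If rsync encounters any I/O errors on the sending side it disables the deletion process, so that a temporary failure does not cause a mass deletion on the target. Check your output for errors and correct them or use --ignore-errors.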
- A2: If you use a * or other wildcard in your source path rsync will not delete as you expect it to. Instead of using /path/to/dir/* just use /path/to/dir/ in your source parameter. Technical details: This is because your shell is expanding the * to a list of everything that is in there. If you run rsync again with the * that list will not include things that are no longer there and rsync will not touch them on the target.
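The difference between the two source forms in A2 can be sketched as follows. The function name mirror_dir is hypothetical and the paths in the comments are placeholders.

```shell
# mirror_dir SRC_DIR DEST_DIR
# Hypothetical helper: make DEST_DIR an exact copy of SRC_DIR, deleting
# anything on the target that no longer exists in the source.
mirror_dir() {
    # The trailing slash means "the contents of this directory". rsync walks
    # the directory itself, so it can see which entries have disappeared.
    rsync --archive --delete "$1"/ "$2"/
}

# Not equivalent: the shell expands the * before rsync starts, so rsync is
# only told about the entries that still exist and never deletes the others.
# rsync --archive --delete /path/to/dir/* /path/to/target/
```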
- Q: How do I connect to an sshd on a non-standard port number or use other ssh parameters?
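One common way to do this is rsync's -e/--rsh option, which takes the complete remote shell command line, so anything ssh accepts (such as -p for the port or -i for an identity file) can be passed along. The sketch below wraps that in a function; the name rsync_via_port, the port, the host, and the paths are all hypothetical.

```shell
# rsync_via_port PORT [rsync arguments...]
# Hypothetical helper: run rsync over an ssh that listens on a non-standard port.
rsync_via_port() {
    port=$1
    shift
    rsync --archive --rsh="ssh -p $port" "$@"
}

# Example with made-up values:
# rsync_via_port 2222 /srv/data/ backupuser@backup.example.com:/srv/backup/data/
```

Settings that you always need for a given host can instead live in a Host entry in ~/.ssh/config, in which case rsync needs no extra options at all.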
- Q: Do I need to set up the rsync service or rsyncd.conf file?
- A: No, you can run rsync over ssh without either of these things. Just use user@host:/path as your source or your target and rsync will use ssh to get there. Note that the user@ part is optional.
- Q: How do I rsync on Windows?
Note: I do not recommend any of these for backing up Windows itself.
- A1: Get DeltaCopy
- A2: Install cygwin and use rsync as you normally would.
- A3: Get Acrosync (a native Windows implementation of rsync and libssh2).
- A4: Install cwrsync and use rsync from the Windows Command Prompt. This can be used to rsync locally, through ssh, or to a remote rsyncd.
- A5: Install cwrsync then configure and run the rsyncd that comes with it. Then you can connect to it from any other host that supports rsync.
- Q: I can't get includes/excludes to work
- A1: Excludes and includes must be relative to the source path. In other words if you are rsyncing /a/b/c/ and you want to exclude /a/b/c/d/e then your exclude would be /d/e.
- A2: Includes only override excludes. If you thought otherwise then you probably want --files-from.
- A3: Order matters. An include must come before the exclude that it overrides.
- A4: An exclude excludes something from rsync completely, including recursion and indexing. That means that if you exclude a directory rsync will never look inside of it to see if any files match your includes. The most common example of this is when you attempt to include a pattern (like say *.txt) while excluding *. Once you exclude * you also exclude all subdirectories so rsync will never look deeper to find your *.txt files. Therefore you will also need to include */ (all directories) so that rsync will look inside of directories to find *.txt files. You can also use --prune-empty-dirs to get rid of the empty parts of the tree.
- A5: Are you sure it isn't working? An excluded item will be neither updated nor deleted from the target. That means that a previously copied version may be there causing you confusion. If you wanted the excluded stuff to be deleted from the target add --delete-excluded.
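The *.txt case from A4 can be written out as the sketch below. The function name copy_only_txt is hypothetical; the rule order follows A3 (includes before the exclude they override).

```shell
# copy_only_txt SRC_DIR DEST_DIR
# Hypothetical helper: copy only *.txt files, keeping the directory layout.
copy_only_txt() {
    # Rules are checked in order and the first match wins:
    #   --include='*/'    keep every directory so rsync can descend into it
    #   --include='*.txt' keep the files we actually want
    #   --exclude='*'     drop everything else
    # --prune-empty-dirs removes directories that end up with nothing in them.
    rsync --archive --prune-empty-dirs \
        --include='*/' --include='*.txt' --exclude='*' \
        "$1"/ "$2"/
}
```

The single quotes matter: without them the shell may expand the patterns against the current directory before rsync ever sees them.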
- Q: How do I get rsync to transfer a file as soon as it is modified?
- A1: Rsync is not a real time mirroring system. You probably want DRBD or whatever the equivalent is on your OS. Or possibly even one of the distributed filesystem infrastructures that are available.
- A2: Lsyncd may do what you want.
- A3: Or perhaps an internally provided cloud storage system such as OwnCloud.
- Q: Do I need rsync installed on both systems?
- A: Yes. Rsync's delta transfer system depends on 2 rsync processes, each with local disk access, communicating over the network. If you can't install or use rsync on the remote system then your only choice is to use a network mount; however, the delta transfer system will be disabled.
- Q: Why does rsync re-copy the entire file when most of it is already there?
- A1: You are doing a local rsync rather than using rsync over the network. When both source and target are local paths rsync defaults to --whole-file, since it is faster to simply recopy a file that is different than to read both versions and hash them to find out which parts differ. This is a good thing.
- A2: If you are using a network mounted filesystem (NFS, CIFS, SMBFS, SSHFS, etc) rsync sees two local paths, so it still defaults to --whole-file. See the previous answer for more information.
- Q: I am afraid of allowing root to ssh in to the system. What are my options?
- A1: Rsync over ssh is actually pretty secure as long as you are using key authentication and as long as the sshd is configured to allow root access only through an ssh key. You can set that up by changing the PermitRootLogin setting to without-password in /etc/ssh/sshd_config. Note that this option has been renamed to prohibit-password and is now the default.
- A2: You can further tighten down the ssh key that rsync is using by restricting it to a single IP address and command line. The details of how to do this are in the "AUTHORIZED_KEYS FILE FORMAT" section of man sshd.
- A3: You can tighten it down even further using the rrsync script that is in the support directory of the rsync source tree.
- A4: You can use --fake-super to emulate *some* things that require root access.
- A5: I know that some people (Ubuntu users) insist that all administrative tasks be done through sudo. They want sudo to be used through an unprivileged account on both systems. I personally feel that doing so is LESS secure than the method listed above but here is how to do it anyway. Note: This will allow an unprivileged user on the server to run rsync as root with no sudo password prompt.
- Create an unprivileged account on the server. For the purposes of this document I will assume you called it rsyncbackup.
- Modify the sudoers file on the server to contain: rsyncbackup ALL= NOPASSWD: /usr/bin/rsync
- Make sure that the requiretty feature is not enabled in the sudoers file. This is completely incompatible with rsync over ssh.
- Set up key authentication for that user on the server to accept connections from root on your client.
- In addition to running rsync under sudo on the client you must tell it to run sudo on the other end. To do that add --rsync-path="/usr/bin/sudo /usr/bin/rsync" to your rsync options.
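Put together, the client side of the sudo setup above might look like this sketch. It assumes the rsyncbackup account, sudoers entry, and key authentication from the steps above are already in place; the function name pull_as_root, the host name, and the paths are hypothetical.

```shell
# pull_as_root SERVER REMOTE_PATH LOCAL_TARGET
# Hypothetical helper, meant to be run as root (or under sudo) on the client.
# It logs in to SERVER as the unprivileged rsyncbackup account and has sudo
# start the remote rsync, so the remote side can read files as root.
pull_as_root() {
    rsync --archive --rsync-path="/usr/bin/sudo /usr/bin/rsync" \
        "rsyncbackup@$1:$2" "$3"
}

# Example with made-up values:
# pull_as_root server.example.com /etc/ /backup/server/etc/
```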
- Q: I ran rsync but the source and the target are not the same size?
- The first thing to consider would be excludes. If you excluded stuff then obviously the sizes will never be the same.
- If your target is slightly smaller than your source the likely cause is a difference in directory sizes. This is simply due to how directories allocate disk space and can't really be helped. I have devised a quick shell command to add up all of the file sizes in the current directory without including the directory sizes:
find . -type f -ls | awk '{ total += $7 } END { printf "%.0f\n", total }'
This can be used to confirm that the files themselves are the same sizes even if the directories are not.
- If your target is 1-10% larger than your source the most likely cause is hard links. Rsync by default or even with -a/--archive does not preserve hard links so if you rsync 2 hard links they will end up as duplicate files on the target taking up twice the disk space. If you want to preserve hard links add the -H/--hard-links option.
- If your target is >10% larger than your source the most likely cause is sparse files. By default rsync does not write any files as sparse files, even if they are sparse on the source (it can't actually tell). If you have sparse files (most commonly used as virtual machine images and incomplete p2p downloads) then you will want to use the --sparse option. Note that this can turn things around and make the target smaller than the source, as rsync with --sparse will not allocate disk space for any long string of null characters, possibly making files on the target sparse when they were not on the source.
- There are also differences in filesystem types, block sizes, file slack overhead, etc. that can cause the outcome to be different.
- If you are comparing sizes using -h or --human-readable you should be aware that all of the GNU file utilities use binary units (powers of 1024) while rsync uses decimal units (powers of 1000). If you use --human-readable twice on rsync that difference will go away.
- If you have checked all of this and you are still bugged by an unexplained size difference then I would like to point out that comparing sizes is not a very useful check for completeness or accuracy on a data copying operation. It doesn't check the contents of the files at all and it is subject to the variations I explained above. I would suggest using an actual file verification utility such as cfv to verify your files using real cryptographic hashes. The cfv utility is very similar to the simple md5sum utility except that it is recursive, faster, and has a percent-complete progress bar.
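If cfv is not at hand, the same idea (compare contents by hash instead of comparing sizes) can be approximated with standard tools. The sketch below swaps in sha256sum from GNU coreutils in place of cfv; the function names tree_hashes and same_content are hypothetical, and file names containing newlines are not handled.

```shell
# tree_hashes DIR
# Hypothetical helper: print "hash  ./relative/path" for every regular file
# under DIR, sorted by path so two trees can be compared line by line.
tree_hashes() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# same_content DIR_A DIR_B
# Succeeds only if both trees hold the same files with the same contents.
# Metadata such as owners, permissions, and times is not compared.
same_content() {
    [ "$(tree_hashes "$1")" = "$(tree_hashes "$2")" ]
}

# Example with made-up paths:
# same_content /home/me/photos /mnt/backup/photos && echo "contents match"
```

To check a remote copy, run tree_hashes on each system and compare the two listings with diff.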
- Q: I am using --checksum and it is really slow. Sometimes so slow that it times out.
- A: Don't do that. --checksum is primarily used when you suspect that a hardware failure has silently corrupted data without modifying the metadata and you want to verify that belief. To make matters worse, --checksum will not notify you of this problem. You will need to use --itemize-changes and look for files with a 'c' change but not a 't' change. Otherwise you probably want --ignore-times.
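Picking out the files with a 'c' change but no 't' change can be done with a small filter over the --itemize-changes output. This is a sketch: the function name flag_silent_changes and the example paths are hypothetical, and it relies on the code layout described in the man page (update type, file type, then the c, s, and t flags).

```shell
# flag_silent_changes
# Hypothetical filter: reads --itemize-changes output on stdin and keeps only
# regular files whose checksum differs (3rd character "c") while the
# modification time is unchanged (5th character ".").
flag_silent_changes() {
    grep -E '^[<>]fc.\.'
}

# Example with made-up paths; --dry-run reports without copying anything:
# rsync --archive --checksum --dry-run --itemize-changes /data/ /mnt/backup/data/ | flag_silent_changes
```

Anything this prints deserves a closer look before you let rsync overwrite either copy, since rsync cannot tell you which side is the damaged one.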
- Q: Why can't I access the files I just copied with rsync on Windows 7 or Vista?
- A: This is a common problem with using rsync (a UNIX oriented tool) on Windows systems. There is no good solution to this problem but this tool makes it convenient to correct.
- Q: What are some cool command line switches to read about and try with rsync?
- --archive: This is often a minimal setting for rsync. It includes a bunch of different options including recursive operation and a bunch of metadata preservation options.
- --dry-run: Tells rsync to tell you what it is planning to do but doesn't actually do anything. Use this for testing.
- --itemize-changes: Tells rsync to tell you why it is updating each file that it updates. Useful for determining why rsync is doing more work than it should. The codes are explained in the man page.
- --hard-links: Preserve hard links. This is not included in --archive but is often needed for backups.
- --sparse: Tells rsync to write out files with large sections of nulls as sparse files. Useful if transferring VM images. Unfortunately it is incompatible with --inplace in older rsync releases; check the --sparse entry in your man page.
- --inplace: Tells rsync to update files in place instead of using a temporary file. Please read warnings in man page.
- --link-dest: Allows multiple incremental backups using hard links to prevent wasted disk space.
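As an illustration of --link-dest, the sketch below keeps one directory per backup run and hard links unchanged files to the previous run. The function name snapshot, the snapshot names, and the paths are hypothetical; a real backup script would also want error handling and rotation.

```shell
# snapshot SRC_DIR BACKUP_ROOT NAME [PREVIOUS_NAME]
# Hypothetical helper: copy SRC_DIR to BACKUP_ROOT/NAME. If PREVIOUS_NAME is
# given, files unchanged since that snapshot are hard linked to it instead of
# being stored again. BACKUP_ROOT should be an absolute path, because a
# relative --link-dest is interpreted relative to the destination directory.
snapshot() {
    if [ -n "${4:-}" ]; then
        rsync --archive --link-dest="$2/$4" "$1"/ "$2/$3"/
    else
        rsync --archive "$1"/ "$2/$3"/
    fi
}

# Example with made-up paths and names:
# snapshot /home /backups/home 2024-01-01
# snapshot /home /backups/home 2024-01-02 2024-01-01
```

Each snapshot directory looks like a complete copy, yet only changed files take up new disk space, and deleting an old snapshot never harms the newer ones.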
- Q: What are some alternatives to rsync?
- OwnCloud - OwnCloud is a web application that provides cloud storage services similar to Dropbox. The difference is that it can be installed on any web server with support for PHP and either MySQL or SQLite. It also provides calendars and contacts much like GMail. Personally, I use rsync for lots of things but I also use OwnCloud for a few.
- Unison - Unison is a cross-platform program with similar capabilities and uses as rsync. Unlike rsync it CAN operate bi-directionally and it has a native Windows client.
- DeltaCopy - DeltaCopy is a Windows port of rsync that includes Windows specific capabilities.
- rdiff-backup - Rdiff-backup is a backup tool that stores incremental backups as binary diff files rather than storing complete files. It is based on librsync NOT rsync.
- pylsyncd - Python based daemon that uses rsync and inotify to update files on multiple rsync servers in parallel.