root # emerge net-misc/rsync
rsync is like an advanced cp: it can tell whether a file has already been copied and skip it if so. This makes rsync useful for generating backups.
root # rsync -p source.file.txt destination.file.txt
Enable Timestamp Updates
Here's an important tip for maximizing rsync performance over a network connection.
When synchronizing files over a network connection, keep in mind that rsync, by default, uses a file's modification time and size to determine whether the copy at the destination needs to be updated. This matters because, by default, rsync does not preserve modification times: each file written to the destination is stamped with the time of the transfer rather than the modification time of the source. This has important implications for performance when rsync is run again to synchronize the same files.
If rsync is run again without the -t option specified, rsync will check the file size (which will match) and the modification time (which will not), and thus assume the file is different. This will cause rsync to use its delta-transfer algorithm to attempt to update the file over the network. The delta-transfer algorithm has been optimized to minimize network utilization, but it still causes both the local and the remote system to read the entire file from disk in order to calculate checksums. This means that a 50 GB file, when synchronized this way, will cause about 50 GB of disk IO locally and about 50 GB of disk IO on the remote system. This can slow things down significantly, especially when transmitting large quantities of data, or when the systems are already experiencing heavy IO load.
The solution to this problem is to use the -t option (which is also enabled as part of -a) to enable modification time updates. When you do this, the modification time of the remote file will be updated to match that of the local file. Then, on a subsequent rsync invocation, rsync will compare the local and remote size and modification time, find that they both match, and will not invoke the delta-transfer algorithm. If the files you were rsyncing totalled 50 GB, you just saved about 100 GB of disk IO.
If rsync has difficulty setting modification times on remote symlinks or directories, use the -J (--omit-link-times) and -O (--omit-dir-times) options to disable setting of times on symlinks and directories respectively. This can sometimes be an issue depending on ACLs and other permission differences on the remote host.