= Enable Timestamp Updates
Here's an important tip for maximizing rsync performance over a network connection.
When using rsync to synchronize files over a network connection, keep in mind that rsync, by default, uses a file's modification time and size to determine whether the file at the destination needs to be updated. This matters because, by default, rsync does not preserve file modification times on the destination system, which has significant performance implications when rsync is run again to synchronize the same files.
Without the -t option specified, rsync will check the file size (which will match) and the modification time (which will not), and thus assume the file is different. This will cause rsync to use its delta-transfer algorithm to attempt to update the file over the network. The delta-transfer algorithm has been optimized to minimize network utilization, but it still requires the local system to read the entire local file from disk, and the remote system to read the entire remote file from disk, in order to calculate checksums. This means that a 50GB file, when synchronized this way, will cause about 50GB of disk IO locally and about 50GB of disk IO on the remote system. This can be quite slow, especially when synchronizing large quantities of data.
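The size-and-time comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not rsync's actual code: the function names are made up for this example, and comparing whole seconds is a simplifying assumption. The demo copies a file without preserving its modification time, which is roughly what a destination file looks like after an rsync run without -t.

```python
import os
import shutil
import tempfile


def quick_check_matches(src: str, dst: str) -> bool:
    """Return True when size and whole-second mtime agree (no transfer needed).

    Hypothetical helper; comparing whole seconds is an assumption of this sketch.
    """
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def demo_without_time_preservation() -> bool:
    """Copy a file without preserving its mtime, then run the quick check."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "dest.bin")
        with open(src, "wb") as fh:
            fh.write(b"x" * 4096)
        # Backdate the source by one hour so the two timestamps cannot collide.
        old = os.stat(src).st_mtime - 3600
        os.utime(src, (old, old))
        # shutil.copyfile copies contents only, so dst keeps the time it was
        # written, much like a destination file created without -t.
        shutil.copyfile(src, dst)
        return quick_check_matches(src, dst)


if __name__ == "__main__":
    # Sizes match but times do not, so the check reports a difference.
    print("quick check matches:", demo_without_time_preservation())
```

Because the contents are identical but the check still fails, a tool relying on this test alone would go on to re-examine the whole file, which is the wasted work described above.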
The solution to this problem is to use the -t option (also enabled as part of -a) to enable modification time updates. When you do this, the modification time of the remote file will be updated to match that of the local file. Then, on a subsequent rsync invocation, rsync will compare the local and remote size and modification time, find that they both match, and will not invoke the delta-transfer algorithm. Congratulations: if the files you were rsyncing totaled 50GB, you just saved about 100GB of disk IO.
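If you drive rsync from a script, a small wrapper can make sure the time-preserving flag is always present. The following is a minimal sketch: build_rsync_command is a hypothetical helper, and the path and host name are placeholders invented for illustration. Only the -t and -a flags come from rsync itself.

```python
import shlex
from typing import List


def build_rsync_command(src: str, dest: str, archive: bool = False) -> List[str]:
    """Build an rsync argument list that preserves modification times."""
    # -a (archive) already implies -t, so only one of the two is needed.
    flag = "-a" if archive else "-t"
    return ["rsync", flag, src, dest]


if __name__ == "__main__":
    # Hypothetical source path and destination host, for illustration only.
    cmd = build_rsync_command("/data/big.img", "backup.example.com:/data/")
    print(shlex.join(cmd))
    # To actually run the transfer: subprocess.run(cmd, check=True)
```

The command is returned as a list rather than a single string so it can be passed to subprocess.run without shell quoting issues.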