Difference between revisions of "Package:Rsync"

From Funtoo
Line 12: Line 12:
When using rsync to synchronize files over a network connection, keep in mind that rsync, by default, uses the file's ''modification time and size'' to determine if a file at the destination needs to be updated. This is important to note because by default, rsync does not update file modification times on the destination system. This has important implications for performance when rsync is run again to synchronize the same files.


Without the {{c|-a}} or {{c|-t}} option specified, rsync will check the file size (which will match) and modification time (which will not,) and thus assume the file is different. This will cause rsync to use its delta-transfer algorithm to attempt to update the file over the network. The delta-transfer algorithm has been optimized to minimize network utilization, but it still causes both the local and the remote system to load the entire file from disk in order to calculate checksums. This means that a 50GB file, when synchronized this way, will cause about 50GB of disk IO locally, and about 50GB of disk IO on the remote system. This can slow things down significantly, especially when transmitting large quantities of data, or when one or both systems are already experiencing heavy IO load.
Without the {{c|-a}} or {{c|-t}} option specified, rsync will check the file size (which will match) and modification time (which will not,) and thus assume the file is different. This will cause rsync to use its delta-transfer algorithm to attempt to update the file over the network. The delta-transfer algorithm has been optimized to minimize network utilization, but it still causes both the local and the remote system to load the entire file from disk in order to calculate checksums. This means that a 50GB file, when synchronized this way, will cause about 50GB of disk IO locally, and about 50GB of disk IO on the remote system. This can slow things down significantly, especially when transmitting large quantities of data, or when the systems are already experiencing heavy IO load.


The solution to this problem is to use the {{c|-t}} option (enabled as part of {{c|-a}} as well) to enable modification time updates. When you do this, the modification time of the remote file will be updated to match that of the local file. Then, on a successive rsync invocation, rsync will compare the local and remote size and modification time, find that they both match, and will not invoke the delta-transfer algorithm. Congratulations -- if the files you were rsyncing were 50GB, then you just saved about 100GB of disk IO.
{{EbuildFooter}}

Revision as of 05:23, January 15, 2015

Rsync

Rsync Tips

Enable Timestamp Updates

Here's an important tip for maximizing rsync performance over a network connection.

When synchronizing files over a network connection, keep in mind that rsync, by default, uses a file's modification time and size to decide whether the copy at the destination needs to be updated. This matters because, by default, rsync does not set modification times on the destination system to match the source: each transferred file is stamped with the time of the transfer instead. This has important implications for performance when rsync is run again to synchronize the same files.

Without the -a or -t option specified, the next rsync run will check the file size (which will match) and the modification time (which will not), and thus assume the file is different. This will cause rsync to use its delta-transfer algorithm to attempt to update the file over the network. The delta-transfer algorithm has been optimized to minimize network utilization, but it still causes both the local and the remote system to read the entire file from disk in order to calculate checksums. This means that a 50 GB file, when synchronized this way, will cause about 50 GB of disk I/O locally and about 50 GB of disk I/O on the remote system. This can slow things down significantly, especially when transmitting large quantities of data, or when the systems are already experiencing heavy I/O load.
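To make the failure mode concrete, here is a small Python sketch that models the size-and-modification-time comparison described above. It is not rsync's actual code: the function names are made up for illustration, the comparison is done in whole seconds as a simplifying assumption, and a plain data-only file copy stands in for an rsync transfer made without -t.

```python
import os
import shutil
import tempfile


def quick_check_matches(src: str, dst: str) -> bool:
    """Simplified model of the default check: same size and same mtime.

    Assumption: timestamps are compared in whole seconds.
    """
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def demo_without_timestamps() -> bool:
    """Copy a file without preserving its mtime, then run the check."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "dest.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        # Pretend the source was last modified an hour ago.
        old = os.stat(src).st_mtime - 3600
        os.utime(src, (old, old))

        # copyfile copies data only, so dst is stamped with the current
        # time, much like a file transferred without -t.
        shutil.copyfile(src, dst)

        return quick_check_matches(src, dst)


if __name__ == "__main__":
    # The sizes match but the mtimes do not, so the check fails and the
    # file would be treated as needing an update.
    print("quick check passes:", demo_without_timestamps())
```

The contents of the two files are identical, yet the check still fails, which is exactly the situation that sends rsync into the delta-transfer algorithm on the next run.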

The solution to this problem is to use the -t option (which -a also enables) to turn on modification time updates. When you do this, the modification time of the remote file will be updated to match that of the local file. Then, on a subsequent rsync invocation, rsync will compare the local and remote size and modification time, find that they both match, and skip the file without invoking the delta-transfer algorithm. Congratulations: if the files you were rsyncing totaled 50 GB, you just saved about 100 GB of disk I/O.
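The effect of preserving modification times can be sketched the same way in Python. Again, this is only a model under stated assumptions: copy_preserving_mtime is a hypothetical stand-in for a transfer made with -t, and the check compares whole-second timestamps rather than reproducing rsync's real logic.

```python
import os
import shutil
import tempfile


def quick_check_matches(src: str, dst: str) -> bool:
    """Simplified model of the default check: same size and same mtime.

    Assumption: timestamps are compared in whole seconds.
    """
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def copy_preserving_mtime(src: str, dst: str) -> None:
    """Hypothetical stand-in for a transfer with -t: copy data, then mtime."""
    shutil.copyfile(src, dst)
    src_stat = os.stat(src)
    dst_stat = os.stat(dst)
    # Keep the destination's access time; only the mtime is carried over.
    os.utime(dst, ns=(dst_stat.st_atime_ns, src_stat.st_mtime_ns))


def demo_with_timestamps() -> bool:
    """Copy a file while preserving its mtime, then run the check."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.bin")
        dst = os.path.join(tmp, "dest.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        # Pretend the source was last modified an hour ago.
        old = os.stat(src).st_mtime - 3600
        os.utime(src, (old, old))

        copy_preserving_mtime(src, dst)
        return quick_check_matches(src, dst)


if __name__ == "__main__":
    # Size and mtime both match, so a second run would skip the file.
    print("quick check passes:", demo_with_timestamps())
```

With the modification time carried over, the size-and-time comparison succeeds and neither side has to reread the file to compute checksums.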