Getting the most out of swap
When you set up a brand new Linux server, do you create a single 128 MB swap partition? If so, did you know that you are severely limiting swap performance? Would you like to increase swap performance by several orders of magnitude, and to create swap partitions larger than 1 GB? It's possible, requiring no kernel patches or special hardware, just pure geek know-how!
Some of you may not really care about swap. After all, Linux systems are typically very memory efficient, and swap is often barely touched. While often true on desktop systems, servers are another story. Because servers may handle unexpected stresses, such as runaway processes, denial of service attacks, or even the Slashdot effect, they need to have adequate high-speed swap so that they do not grind to a halt and possibly crash when all physical memory (and then some) is exhausted.
Still not convinced that this is a big deal? I'll show you how easy it is to bring down a server by launching a massive amount of new processes.
Please, if you try this, do it only on a non-production server that you actually administer!
Let's say you have two customized grep commands in /usr/bin, called bobgrep and jimgrep. Now, let's assume that bobgrep is simply a shell script that calls the ELF executable jimgrep, as follows:
Everything looks good so far, but what happens if jimgrep gets accidentally replaced with a symbolic link to bobgrep? Well, in that case, calling bobgrep or jimgrep will cause an infinite loop, causing hundreds of bash processes to be spawned in mere seconds. This actually happened to me once, and believe me, it hurt!
If a server doesn't have adequate swap, a situation like this can cause the machine to lock up in much less than a minute. How do we fix the problem? One way is to increase the swap size beyond 128 MB. With Linux 2.0, there was a swap limit of 128MB on x86 systems. With 2.2, this limit was raised to 2GB. Currently, with Linux 3.x kernels, the limit is about 17.1TB, so you are unlikely to find yourself unable to allocate enough swap.
While it's nice to be able to increase swap partition size to beyond 128 MB, how about increasing performance? Ideally, it would be nice if we could set up swap partitions in a RAID 0 stripe, so that reads and writes are equally distributed between all partitions. If these partitions are on separate drives and/or controllers, this will multiply swap file performance, allowing your servers to handle temporary memory usage "spikes" without getting dramatically bogged down.
Amazingly, all modern Linux kernels, by default (with no special kernel options or patches) allow you to parallelize swap, just like a RAID 0 stripe. By using the pri option in /etc/fstab to set multiple swap partitions to the same priority, we tell Linux to use them in parallel:
In the above example, Linux will use swap partitions sda2, sdb2, and sdc2 in parallel. Since these partitions are on different drives, and possibly even different SCSI controllers, read and write throughput will nearly triple. The fourth swap partition, sdd2, will be used only after the first three partitions have been exhausted.
The pri option is really easy to use. The priority must be a number between 0 and 32767, with 32767 being the highest priority. The swap partitions will be used from highest priority to lowest priority, meaning that a partition with a priority of x will only be used only if all partitions with a priority greater than x are already full. If several partitions have the same priority, Linux will automatically parallelize access between them. This allows you to not only parallelize swap, but also prioritize access so that the partitions on the fastest drives (or regions of the drives) are used first. So, you can set up an emergency swap partition on an old, slower drive that will be used only if all high-speed swap is exhausted first.
Now it's time to put some of this swapping knowledge into action. To loosely quote Mr. Miyagi of Karate Kid fame: "Swap on, swap off, geek-san!"
Daniel Robbins is best known as the creator of Gentoo Linux and author of many IBM developerWorks articles about Linux. Daniel currently serves as Benevolent Dictator for Life (BDFL) of Funtoo Linux. Funtoo Linux is a Gentoo-based distribution and continuation of Daniel's original Gentoo vision.
RSS/Atom SupportYou can now follow this news feed at http://www.funtoo.org/news/atom.xml .
Creating a Friendly Funtoo CultureThis news item details some recent steps that have been taken to help ensure that Funtoo is a friendly and welcoming place for our users.
CPU FLAGS X86CPU_FLAGS_X86 are being introduced to group together USE flags managing CPU instruction sets.
Browse all our Linux-related articles, below:
- Funtoo Filesystem Guide, Part 1
- Funtoo Filesystem Guide, Part 2
- Funtoo Filesystem Guide, Part 3
- Funtoo Filesystem Guide, Part 4
- Funtoo Filesystem Guide, Part 5
- LVM Fun
- Learning Linux LVM, Part 1
- Learning Linux LVM, Part 2
- Linux Fundamentals, Part 1
- Linux Fundamentals, Part 1/pt-br
- Linux Fundamentals, Part 2
- Linux Fundamentals, Part 3
- Linux Fundamentals, Part 4
- Making the Distribution, Part 1
- Making the Distribution, Part 2
- Making the Distribution, Part 3
- Maximum Swappage
- POSIX Threads Explained, Part 1
- POSIX Threads Explained, Part 2
- POSIX Threads Explained, Part 3
- Partition Planning Tips
- Partitioning in Action, Part 1
- Partitioning in Action, Part 2
- Prompt Magic
- SAN Box used via iSCSI
- Sed by Example, Part 1
- Sed by Example, Part 2
- Sed by Example, Part 3
- Slowloris DOS Mitigation Guide
- The Gentoo.org Redesign, Part 1
- The Gentoo.org Redesign, Part 2
- The Gentoo.org Redesign, Part 3
- The Gentoo.org Redesign, Part 4
- Traffic Control