Difference between pages "Funtoo Filesystem Guide, Part 4" and "Funtoo Filesystem Guide, Part 5"

(Difference between pages)
 
 
Line 1: Line 1:
 
{{Article
 
{{Article
 
|Author=Drobbins
 
|Author=Drobbins
|Previous in Series=Funtoo Filesystem Guide, Part 3
+
|Previous in Series=Funtoo Filesystem Guide, Part 4
|Next in Series=Funtoo Filesystem Guide, Part 5
+
 
}}
 
}}
 
== Introduction ==
 
== Introduction ==
  
In the past few installments, we've taken a bit of a detour by looking at
+
Back around 2002, when this article was originally written, Andrew Morton had a
non-traditional filesystems such as tmpfs and devfs. Now, it's time to get back
+
nice little introduction to using ext3 on his Web site. His site has since
to disk-based filesystems, and we do this by taking a look at ext3. The ext3
+
disappeared, so in this updated article, I'll summarize Andrew Morton's
filesystem, designed by Dr. Stephen Tweedie, is built on the framework of the
+
original documentation. Then, in the second half of the article, we'll delve
existing ext2 filesystem; in fact, ext3 is very similar to ext2 except for one
+
into some meatier ext3 topics, ones that I think you'll find very useful.
small (but important) difference -- it supports journaling. Yet even with this
+
small addition, I think you'll find that ext3 has several surprising and
+
intriguing capabilities. In this article, I'll give you a good understanding of
+
how ext3 compares to the other journaling filesystems currently available. In
+
my next article, we'll get ext3 up and running.
+
  
== Understanding Ext3 ==
+
== Ext3 QuickStart ==
  
So, how does ext3 compare to ReiserFS? In previous articles, I explained how
+
=== (Not) Patching the Kernel ===
ReiserFS is well suited to handling small files (under 4K), and in certain
+
situations, ReiserFS' small file performance is ''ten to fifteen times''
+
greater than that of ext2 and ext3. In contrast, ext3 is a very
+
''well-rounded'' filesystem. It's a lot like ext2; it's not going to give you
+
the blazingly fast small-file performance that ReiserFS gives you, but it
+
provides journalling and decent performance and is much more easily deployable
+
on legacy ext2 systems, as we'll soon see.
+
  
One of the nice things about ext3 is that because it is based on the ext2 code,
+
These days, there is no need to patch your kernel for ext3. Ext3 has been
ext2 and ext3's on-disk format is identical; this means that a cleanly
+
incorporated into the Linux kernel for a long time, and is very mature.
unmounted ext3 filesystem can be remounted as an ext2 filesystem with
+
absolutely no problems. And that's not all. Thanks to the fact that ext2 and
+
ext3 use identical metadata, it's possible to perform in-place ext2 to ext3
+
filesystem upgrades. Yes, you read that right. By upgrading a few key system
+
utilities, installing a modern 2.4 or 2.6 kernel and typing in a single tune2fs
+
command per filesystem, you can convert your existing ext2 servers into
+
journaling ext3 systems. You can even do this while your ext2 filesystems are
+
mounted. The transition is safe, reversible, and incredibly easy, and unlike a
+
conversion to XFS, JFS, or ReiserFS, you don't need to back up and recreate
+
your filesystems from scratch. Now, for a moment, consider the thousands of
+
production ext2 servers in existence that are just minutes away from an ext3
+
upgrade; then, you'll have a good grasp of ext3's importance to the Linux
+
community.
+
  
If I had to describe ext3 in one word, I'd call it "comfortable".
+
=== Converting ext2 to ext3 ===
It's incredibly easy to ext3-enable an existing ext2 system, and after you do,
+
you're still going to have an ext2-compatible filesystem. And there's yet
+
another way that ext3 excels in the comfort department; ext3 leverages the
+
maturity of ext2 as well as its user-space filesystem tools.
+
  
== Ext3 Reliability ==
+
One of the nice things about ext3 is that you can easily convert an ext2
 +
filesystem to be an ext3 filesystem. All that you need to do is to create a
 +
journal on the ext2 filesystem, as follows:
  
In addition to being ext2-compatible, ext3 inherits other benefits by sharing
+
<pre>tune2fs -j /dev/sdXX</pre>
ext2's metadata format. For one, ext3 users gain access to a rock-solid fsck
+
tool. You'll recall that one of the points of using a journaling filesystem is
+
to avoid the need for an exhaustive fsck in the first place; however if you do
+
end up getting corrupt metadata, either from a flaky kernel, bad hard drive, or
+
something else, you'll greatly appreciate the fact that ext3 inherits ext2's
+
fsck. In contrast, ReiserFS' fsck is decent but hasn't been through as many
+
&quot;real world&quot; scenarios as e2fsck.
+
  
== Metadata-only Journaling ==
+
You can even do this while the filesystem is mounted. If it ''is'' currently
 +
mounted, it will not function as an ext3 filesystem until you are able to
 +
remount it as ext3. Remember to update <tt>/etc/fstab</tt> so that it refers
 +
to the new filesystem as an <tt>ext3</tt> filesystem.
  
Interestingly, ext3 handles journaling very differently than ReiserFS and other
+
=== Creating New Ext3 Filesystems ===
journaling filesystems do. With ReiserFS, XFS, and JFS, the filesystem driver
+
journals metadata, but makes no provisions for journaling data. With
+
metadata-only journaling, your filesystem metadata is going to be rock solid,
+
and you will probably never need to perform an exhaustive fsck. However,
+
unexpected reboots and system lock-ups can result in significant corruption of
+
recently-modified data. Ext3 uses a couple of innovative solutions to avoid
+
these problems, which we'll look at in a bit.
+
  
But first, it's important to understand exactly how metadata-only journaling
+
Simply use the <tt>-j</tt> option with <tt>mke2fs</tt>, as follows:
could end up biting you. As an example, let's say that you were modifying a
+
file called <tt>/tmp/myfile.txt</tt> when the machine unexpectedly locked up,
+
forcing a reboot. If you were using a metadata-only journaling filesystem such
+
as ReiserFS, XFS or JFS, your filesystem metadata would be easily repaired,
+
thanks to the metadata journal, and you wouldn't need to sit through a
+
laborious fsck. Your filesystem's ''meta'' information would not get messed up.
+
  
However, there's the distinct possibility that when you load
+
<pre>mke2fs -j /dev/sdXX</pre>
<tt>/tmp/myfile.txt</tt> into a text editor, your file will not simply be
+
missing recent changes, but will contain a good amount of garbage and depending
+
upon the circumstances may even be completely unreadable. This is particularly
+
true with XFS. Now, this isn't something that will necessarily happen, but it
+
could happen and often does.
+
  
Here's why. Typical journaled filesystems like ReiserFS, XFS, and JFS take
+
=== Switching Between Ext2 and Ext3 ===
extra special care of metadata, but don't pay as much attention to data. In our
+
above example, the filesystem was in the process of modifying several
+
filesystem blocks. The filesystem updated the appropriate metadata, but didn't
+
have time to flush the data from its caches to the new blocks on disk. Thus,
+
when you loaded up <tt>/tmp/myfile.txt</tt> into a text editor, part or all of
+
the file contained garbage -- blocks of data that didn't get recorded to disk
+
in time before the system locked up.
+
  
== The Ext3 Approach ==
+
It is possible to mount an ext3 filesystem as an ext2 filesystem, as long as
 +
the ext3 filesystem has been cleanly unmounted. Simply specify a filesystem
 +
type of <tt>ext2</tt> when mounting an ext3 filesystem with older kernels. You
 +
can also specify a filesystem type of <tt>auto</tt> in <tt>/etc/fstab</tt>,
 +
which will tell the <tt>mount</tt> and <tt>fsck</tt> commands to utilize the
 +
filesystem as ext3 if such support is available, otherwise falling back to
 +
ext2.
  
Now that we have a good general understanding of this problem, let's look at how
+
=== Fixing Dirty Ext3 Filesystems ===
ext3 implements journaling. In ext3, the journaling code uses a special API
+
called the Journaling Block Device layer, or JBD. The JBD has been designed for
+
the express purpose of implementing a journal on any kind of block device. Ext3
+
implements its journaling by &quot;hooking in&quot; to the JBD API. For
+
example, the ext3 filesystem code will inform the JBD of modifications it is
+
performing, and will also request permission from the JBD before modifying
+
certain data on disk. By doing so, the JBD is given the appropriate
+
opportunities to manage the journal on behalf of the ext3 filesystem. It's
+
quite a nice arrangement, and because the JBD is being developed as a separate,
+
generic entity, it could be used to add journaling capabilities to other
+
filesystems in the future.
+
  
Here are a couple of neat things about the JBD-managed ext3 journal. For one,
+
If your ext3 filesystem was not cleanly unmounted, then you should be able to clean it up using <tt>e2fsck</tt> as follows:
ext3's journal is stored in an inode -- a file, basically. Depending on how you &quot;ext3-enable&quot; your filesystem, you may or may not be able to see this file, located at <tt>/.journal</tt>. Of course, by storing the journal in an
+
inode, ext3 is able to add the needed journal to the filesystem without
+
breaking compatibility with ext2 metadata. This is one of the key ways that an ext3 filesystem maintains backwards compatibility with ext2 metadata, and in turn, the ext2 filesystem code in the Linux kernel.
+
  
== Different Journaling Approaches ==
+
<pre>e2fsck -fy /dev/sdXX</pre>
 +
The filesystem should now be mountable as ext2.
  
Not surprisingly, it turns out that there are a number of ways to implement a journal. For example, a filesystem developer could design a journal that stores variable spans of bytes that need to be modified on the host filesystem. The advantage of this approach is that your journal would be able to store lots of
+
=== Ext3 Root Filesystem Tricks ===
tiny little modifications to the filesystem in a very efficient way, since it
+
would only record the specific data that needed to be changed and nothing more.
+
  
JBD takes another, and in some ways better, approach. Rather than recording spans of bytes that must be changed, JBD stores the complete modified
+
If you want to force a Linux kernel to mount an ext3 filesystem as an ext2
filesystem blocks themselves. The ext3 filesystem driver also uses this
+
filesystem, add the <tt>rootfstype=ext2</tt> kernel boot parameter to the
approach and stores complete replicas of the modified blocks (either 1K, 2K, or 4K) in memory to track pending IO operations. At first, this may seem a bit wasteful. After all, complete blocks contain modified data but may also contain
+
kernel boot parameters in <tt>lilo.conf</tt> or <tt>grub.conf</tt>.
unmodified (already on disk) data as well.
+
  
The approach that the JBD uses is called physical journaling, which means that the JBD uses complete physical blocks as the underlying currency for implementing the journal. In contrast, the approach of only storing modified
+
To tell the Linux kernel to mount your root ext3 filesystem using a particular
spans of bytes rather than complete blocks is called logical journaling, and is
+
journaling mode, use the <tt>rootflags</tt> kernel boot parameter as follows
the approach used by XFS. Because ext3 uses physical journaling, an ext3
+
(using a <tt>grub.conf</tt> example):
journal will have a larger relative on-disk footprint than, say, an XFS
+
journal. But because ext3 uses complete blocks internally and in the journal,
+
ext3 doesn't deal with as much complexity as it would if it were to implement
+
logical journaling. In addition, the use of full blocks allows ext3 to perform
+
some additional optimizations, such as &quot;squishing&quot; multiple pending
+
IO operations within a single block into the same in-memory data structure.
+
This, in turn, allows ext3 to write these multiple changes to disk in a single
+
write operation, rather than many. In addition, because the literal block data
+
is stored in memory, little or no massaging of the in-memory data is required
+
before writing it to disk, saving CPU cycles.
+
  
== Ext3, Protector of Data ==
+
<pre>kernel /boot/bzImage rootflags=data=journal root=/dev/sda3</pre>
  
And now, we finally get to see how the ext3 filesystem effectively provides both metadata and data journaling, avoiding the potential data corruption
+
=== Filesystem Check Intervals ===
problem I described earlier in this article that can bite metadata-only
+
journals. In fact, ext3 actually has two methods to ensure data and metadata integrity.
+
Originally, ext3 was designed to perform full data and metadata journaling. In
+
this mode (called <tt>data=journal</tt> mode), the JBD journals all changes to the filesystem, whether they are made to data or metadata. Because both data and metadata are journaled, JBD can use the journal to bring both metadata and data back to a consistent state. The drawback of full data journaling is that
+
it can be slow, although you can reduce the performance penalty by setting up a
+
relatively large journal.
+
  
More interestingly, ext3 also offers another journaling mode that provides the benefits of full journaling but without introducing a severe performance penalty. This new mode works by journaling metadata only. However, the ext3
+
By default, <tt>e2fsck</tt> will perform an exhaustive filesystem check every
filesystem driver keeps track of the particular data blocks that correspond
+
now and then, even if the filesystem was cleanly unmounted. By default, this
with each metadata update, grouping them into a single entity called a transaction. When a transaction is applied to the filesystem proper, the data blocks are written to disk first. Once they are written, the metadata changes
+
happens every twentieth mount or every 180 days, whichever comes first.
are then written to the journal. By using this technique (called
+
<tt>data=ordered</tt> mode), ext3 can provide data and metadata consistency,
+
even though only metadata changes are recorded in the journal. ext3 uses this
+
mode by default.
+
  
== Conclusion ==
+
While this might have been handy for ext2 filesystems, it's not optimal for
 +
ext3. To turn this automatic exhaustive filesystem check off, use the following
 +
<tt>tune2fs</tt> command:
  
These days, a lot of people are trying to determine which Linux journaling filesystem is &quot;best&quot;. In truth, there is no one &quot;right&quot; filesystem for every application; each one has its own strengths. This is one
+
<pre>tune2fs -i 0 -c 0 /dev/sdXX</pre>
of the benefits from having so many next-generation Linux filesystems from
+
which to choose. So, instead of picking an arbitrary &quot;best&quot; filesystem and using it for every conceivable application, it's far preferable to understand each filesystem's strengths and weaknesses so that you can make
+
an educated decision as to which one to use.
+
  
Ext3 has a number of strengths. It has been designed to be extremely easy to  
+
You can view the filesystem check interval (as well as lots of other
deploy. It's based on the solid ext2 filesystem code and it inherits a great fsck tool. And ext3's journaling capabilities have been specially designed to  
+
interesting information) by typing <tt>tune2fs -l /dev/sdXX</tt>.
ensure the integrity of both metadata and data. All in all, ext3 is a truly
+
 
great filesystem, and a worthy successor to the now-venerable ext2 filesystem.
+
=== External Journals ===
Join me in my next article, when we get ext3 up and running. Until then, you
+
 
may want to check out the following resources.
+
Ext3 supports the ability to place the journal on a separate persistent device. To create an external journal on <tt>/dev/sdb</tt>, type:
 +
 
 +
<pre>mke2fs -O journal_dev /dev/sdb</pre>
 +
Then, to create an ext3 filesystem on <tt>/dev/sda3</tt> that uses this external journal, type:
 +
 
 +
<pre>mke2fs -J device=/dev/sdb /dev/sda3
 +
mount /dev/sda3 /mnt/test -t ext3</pre>
 +
== Ext3 Journaling Options and Write Latency ==
 +
 
 +
Ext3 allows you to choose from one of three data journaling modes at filesystem
 +
mount time: <tt>data=writeback</tt>, <tt>data=ordered</tt>, and
 +
<tt>data=journal</tt>.
 +
 
 +
To specify a journal mode, you can add the appropriate string
 +
(<tt>data=journal</tt>, for example) to the options section of your
 +
<tt>/etc/fstab</tt>, or specify the <tt>-o data=journal</tt> command-line
 +
option when calling mount directly.
 +
 
 +
As we covered earlier in this article, if you'd like to specify the data
 +
journaling method used for your root filesystem (<tt>data=ordered</tt> is the
 +
default), you can use a special kernel boot option called
 +
<tt>rootflags</tt>. So, if you'd like to put your root filesystem into full
 +
data journaling mode, add <tt>rootflags=data=journal</tt> to your kernel boot
 +
options.
 +
 
 +
=== <tt>data=writeback</tt> Mode ===
 +
 
 +
In <tt>data=writeback</tt> mode, ext3 doesn't do any form of data journaling at
 +
all, providing you with journaling similar to that found in the XFS, JFS, and ReiserFS
 +
filesystems -- metadata only. As I explained in my
 +
[http://www.funtoo.org/en/articles/linux/ffg/4/ previous article], this could
 +
allow recently modified files to become corrupted in the event of an unexpected
 +
reboot. Despite this drawback, <tt>data=writeback</tt> mode should give you the
 +
best ext3 performance under most conditions.
 +
 
 +
=== <tt>data=ordered</tt> Mode ===
 +
 
 +
In <tt>data=ordered</tt> mode, ext3 only officially journals metadata, but it logically groups metadata and data blocks into a single unit called a transaction. When it's time to write the new metadata out to disk, the associated data blocks are written first. <tt>data=ordered</tt> mode
 +
effectively solves the corruption problem found in <tt>data=writeback</tt> mode and most other journaled filesystems, and it does so without requiring full data journaling. In general, <tt>data=ordered</tt> ext3 filesystems perform slightly slower than <tt>data=writeback</tt> filesystems, but significantly faster than their full data journaling counterparts. When appending data to files, <tt>data=ordered</tt> mode provides all of the integrity guarantees offered by ext3's full data journaling mode. However, if part of a file is being overwritten and the system crashes, it's possible that
 +
the region being written will contain a combination of original blocks
 +
interspersed with updated blocks. This is because <tt>data=ordered</tt>
 +
provides no guarantees as to which blocks are overwritten first, so you can't
 +
assume that just because overwritten block x was updated, that overwritten block x-1 was updated as well. Instead, <tt>data=ordered</tt> leaves the write ordering up to the hard drive's write cache. In general, this limitation
 +
doesn't end up negatively impacting people very often, since file appends are
 +
generally much more common than file overwrites. For this reason, <tt>data=ordered</tt> mode is a good higher-performance replacement for full data journaling.
 +
 
 +
=== <tt>data=journal</tt> Mode ===
 +
 
 +
<tt>data=journal</tt> mode provides full data and metadata journaling. All new
 +
data is written to the journal first, and then to its final location. In the
 +
event of a crash, the journal can be replayed, bringing both data and metadata
 +
into a consistent state.
 +
 
 +
Theoretically, <tt>data=journal</tt> mode is the slowest journaling mode of all, since data gets written to disk twice rather than once. However, it turns
 +
out that in certain situations, <tt>data=journal</tt> mode can be blazingly
 +
fast. Andrew Morton, after hearing reports on LKML that ext3
 +
<tt>data=journal</tt> filesystems were giving people unbelievably great interactive filesystem performance, decided to put together a little test.
 +
First, he created a simple shell script designed to write data to a test
 +
filesystem as quickly as possible:
 +
 
 +
<pre>while true
 +
do       
 +
    dd if=/dev/zero of=largefile bs=16384 count=131072
 +
done</pre>
 +
While data was being written to the test filesystem, he attempted to read 16 MB of data from another ext2 filesystem on the same disk, timing the results:
 +
 
 +
<pre># time cat 16-meg-file &gt; /dev/null</pre>
 +
 
 +
The results were astounding. <tt>data=journal</tt> mode allowed the 16 MB file to be read from 9 to over 13 times faster than other ext3 modes, ReiserFS, and even ext2 (which has no journaling overhead):
 +
 
 +
{| {{Table}}
 +
!Filesystem||16 MB read time (seconds)
 +
|-
 +
|ext2
 +
|78
 +
|-
 +
|ReiserFS
 +
|67
 +
|-
 +
|ext3 data=ordered
 +
|93
 +
|-
 +
|ext3 data=writeback
 +
|74
 +
|-
 +
|ext3 data=journal
 +
|7
 +
|}
 +
 
 +
Andrew repeated this test, but tried to read a 16 MB file from the test filesystem (rather than a different filesystem), and he got identical results. So, what does this mean? Somehow, ext3's <tt>data=journal</tt> mode is incredibly well-suited to situations where data needs to be read from and written to disk at the same time. Therefore, ext3's <tt>data=journal</tt> mode, which was assumed to be the slowest of all ext3 modes in nearly all conditions, actually turns out to have a major performance advantage in busy environments where interactive IO performance needs to be maximized. Maybe <tt>data=journal</tt> mode isn't so sluggish after all!
 +
 
 +
Andrew is still trying to figure out exactly why data=journal mode is doing so much better than everything else. When he does, he may be able to add the necessary tweaks to the rest of ext3 so that data=writeback and data=ordered modes see some benefit as well.
 +
 
 +
I hope to see you soon when I release the next installment in the Funtoo Filesystem Guide!
  
 
== Resources ==
 
== Resources ==
Line 180: Line 185:
 
* [[Funtoo Filesystem Guide, Part 4|Part 4]]: Introducing Ext3
 
* [[Funtoo Filesystem Guide, Part 4|Part 4]]: Introducing Ext3
 
* [[Funtoo Filesystem Guide, Part 5|Part 5]]: Ext3 in action
 
* [[Funtoo Filesystem Guide, Part 5|Part 5]]: Ext3 in action
 
Dr. Stephen Tweedie introduced the Ext3 Journaling Filesystem at the Ottawa Linux Symposium in July 2000. For more information on Ext3, read Dr. Stephen
 
Tweedie's
 
[http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html 2000 OLS ext3 transcript].
 
 
To keep abreast of the latest ext3 developments, be sure to visit the
 
[https://listman.redhat.com/archives/ext3-users ext3-users mailing list archive]
 
  
 
[[Category:Filesystem Guides]]
 
[[Category:Filesystem Guides]]
 
[[Category:Articles]]
 
[[Category:Articles]]
 
{{ArticleFooter}}
 
{{ArticleFooter}}

Revision as of 08:54, December 28, 2014


Previous in series: Funtoo Filesystem Guide, Part 4


Introduction

Back around 2002, when this article was originally written, Andrew Morton had a nice little introduction to using ext3 on his Web site. His site has since disappeared, so in this updated article, I'll summarize Andrew Morton's original documentation. Then, in the second half of the article, we'll delve into some meatier ext3 topics, ones that I think you'll find very useful.

Ext3 QuickStart

(Not) Patching the Kernel

These days, there is no need to patch your kernel for ext3. Ext3 has been incorporated into the Linux kernel for a long time, and is very mature.

Converting ext2 to ext3

One of the nice things about ext3 is that you can easily convert an ext2 filesystem to be an ext3 filesystem. All that you need to do is to create a journal on the ext2 filesystem, as follows:

tune2fs -j /dev/sdXX

You can even do this while the filesystem is mounted. If it is currently mounted, it will not function as an ext3 filesystem until you unmount it and mount it again as ext3. Remember to update /etc/fstab so that it refers to the filesystem as an ext3 filesystem.
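The /etc/fstab update can be scripted. Here's a minimal sketch, assuming a conventional six-field fstab; the function name fstab_to_ext3 is hypothetical and not part of any package. It reads fstab text on standard input and prints it with the type of one device switched from ext2 to ext3, leaving everything else alone. Note that awk rejoins the fields of the line it changes with single spaces.

```shell
# Hypothetical helper (not part of e2fsprogs): print fstab text with one
# device's filesystem type switched from ext2 to ext3.
# Reads fstab-formatted text on stdin; $1 is the device field to match.
fstab_to_ext3() {
    awk -v dev="$1" '
        # Pass comments and blank lines through untouched.
        /^[ \t]*(#|$)/ { print; next }
        # Field 1 is the device, field 3 is the filesystem type.
        $1 == dev && $3 == "ext2" { $3 = "ext3" }
        { print }
    '
}
```

A cautious way to use it is to write the result to a scratch file, review it, and only then copy it into place: fstab_to_ext3 /dev/sdXX < /etc/fstab > /tmp/fstab.new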

Creating New Ext3 Filesystems

Simply use the -j option with mke2fs, as follows:

mke2fs -j /dev/sdXX

Switching Between Ext2 and Ext3

It is possible to mount an ext3 filesystem as an ext2 filesystem, as long as the ext3 filesystem has been cleanly unmounted. Simply specify a filesystem type of ext2 when mounting an ext3 filesystem with older kernels. You can also specify a filesystem type of auto in /etc/fstab, which will tell the mount and fsck commands to utilize the filesystem as ext3 if such support is available, otherwise falling back to ext2.
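To make the fallback idea concrete, here's a rough sketch of the same decision done by hand. It assumes that the kernel's list of registered filesystem drivers (normally /proc/filesystems) is a fair stand-in for "ext3 support is available"; a driver built as a module that hasn't been loaded yet won't be listed. The function name pick_ext_type is hypothetical.

```shell
# Hypothetical helper: choose ext3 when the kernel lists it, else ext2.
# $1 is a file in /proc/filesystems format (defaults to /proc/filesystems).
pick_ext_type() {
    # The filesystem name is the last field on each line; lines for
    # virtual filesystems carry a leading "nodev" field.
    if awk '$NF == "ext3" { found = 1 } END { exit !found }' "${1:-/proc/filesystems}"; then
        echo ext3
    else
        echo ext2
    fi
}
```

The printed type could then be handed to mount with -t, for example: mount -t "$(pick_ext_type)" /dev/sdXX /mnt/test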

Fixing Dirty Ext3 Filesystems

If your ext3 filesystem was not cleanly unmounted, then you should be able to clean it up using e2fsck as follows:

e2fsck -fy /dev/sdXX

The filesystem should now be mountable as ext2.
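Before mounting, it's worth looking at how e2fsck exited. Its exit status is a sum of the bit values listed in the e2fsck(8) man page, and the sketch below decodes that sum into readable text. The function name describe_e2fsck_status is hypothetical; the bit values are the documented ones.

```shell
# Hypothetical helper: translate an e2fsck exit status into text.
# The status is a sum of the bit values documented in e2fsck(8).
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for pair in \
        "1:errors corrected" \
        "2:errors corrected, system should be rebooted" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:canceled by user" \
        "128:shared library error"
    do
        bit=${pair%%:*}     # number before the first colon
        text=${pair#*:}     # description after it
        if [ $((status & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
}
```

Typical use would be to run e2fsck -fy /dev/sdXX and then call describe_e2fsck_status $? on the very next line. A result that includes "errors left uncorrected" means the filesystem is not yet safe to mount.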

Ext3 Root Filesystem Tricks

If you want to force a Linux kernel to mount an ext3 root filesystem as an ext2 filesystem, add rootfstype=ext2 to the kernel boot parameters in lilo.conf or grub.conf.

To tell the Linux kernel to mount your root ext3 filesystem using a particular journaling mode, use the rootflags kernel boot parameter as follows (using a grub.conf example):

kernel /boot/bzImage rootflags=data=journal root=/dev/sda3
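To confirm which boot parameters a kernel line actually carries, you can pull a single parameter out of the text. This is a small sketch with a hypothetical function name, boot_param; for a running system, the text to feed it is the contents of /proc/cmdline.

```shell
# Hypothetical helper: print the value of one kernel boot parameter.
# $1 is the parameter name, $2 is the kernel command line text.
# The body runs in a subshell so that "set -f" does not leak out.
boot_param() (
    set -f    # no filename expansion while splitting into words
    for word in $2; do
        case $word in
            "$1"=*)
                # Strip everything up to the first "=", keeping the rest,
                # so rootflags=data=journal yields data=journal.
                echo "${word#*=}"
                exit 0
                ;;
        esac
    done
    exit 1    # parameter not present
)
```

For example, boot_param rootflags "$(cat /proc/cmdline)" prints data=journal on a system booted with the grub.conf line above, and prints nothing (with a non-zero exit status) if rootflags was never given.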

Filesystem Check Intervals

Every now and then, e2fsck will perform an exhaustive filesystem check even if the filesystem was cleanly unmounted. By default, this happens every twentieth mount or every 180 days, whichever comes first.

While this might have been handy for ext2 filesystems, it's not optimal for ext3. To turn this automatic exhaustive filesystem check off, use the following tune2fs command:

tune2fs -i 0 -c 0 /dev/sdXX

You can view the filesystem check interval (as well as lots of other interesting information) by typing tune2fs -l /dev/sdXX.
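The tune2fs -l listing is long, so here's a sketch that picks out just the two settings discussed above. It assumes the "Maximum mount count:" and "Check interval:" labels that e2fsprogs prints, with the interval given in seconds; if your version labels them differently, the patterns will need adjusting. The function name check_settings is hypothetical.

```shell
# Hypothetical helper: summarize the periodic-check settings found in
# `tune2fs -l` output supplied on stdin.
check_settings() {
    awk -F': *' '
        $1 == "Maximum mount count" { count = $2 }
        # The value looks like "15552000 (6 months)"; keep the seconds.
        $1 == "Check interval"      { split($2, part, " "); interval = part[1] }
        END {
            count_text = "off"      # 0 or -1 means no mount-count check
            if (count + 0 > 0)
                count_text = "every " count " mounts"
            time_text = "off"       # 0 seconds means no time-based check
            if (interval + 0 > 0)
                time_text = "every " (interval / 86400) " days"
            print "mount-count check: " count_text
            print "time check: " time_text
        }
    '
}
```

Usage would be tune2fs -l /dev/sdXX | check_settings, run once before and once after the tune2fs -i 0 -c 0 command to see both checks go from their defaults to off.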

External Journals

Ext3 supports the ability to place the journal on a separate persistent device. To create an external journal on /dev/sdb, type:

mke2fs -O journal_dev /dev/sdb

Then, to create an ext3 filesystem on /dev/sda3 that uses this external journal, type:

mke2fs -J device=/dev/sdb /dev/sda3
mount /dev/sda3 /mnt/test -t ext3

Ext3 Journaling Options and Write Latency

Ext3 allows you to choose from one of three data journaling modes at filesystem mount time: data=writeback, data=ordered, and data=journal.

To specify a journal mode, you can add the appropriate string (data=journal, for example) to the options section of your /etc/fstab, or specify the -o data=journal command-line option when calling mount directly.
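Since the mode is just one entry in a comma-separated options string, it's easy to check what a given fstab options field would select. The sketch below uses a hypothetical function name, data_mode, and falls back to ordered when no data= option is present, on the assumption (stated in this article) that data=ordered is the default.

```shell
# Hypothetical helper: report the ext3 data journaling mode selected by a
# comma-separated mount options string, such as the fourth fstab field.
# The body runs in a subshell so the IFS and "set -f" changes stay local.
data_mode() (
    set -f          # no filename expansion while splitting
    IFS=,           # split the options on commas
    mode=ordered    # assumed default when no data= option is given
    for opt in $1; do
        case $opt in
            data=*) mode=${opt#data=} ;;
        esac
    done
    echo "$mode"
)
```

For instance, data_mode 'rw,noatime,data=journal' prints journal, while data_mode defaults prints ordered.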

As we covered earlier in this article, if you'd like to specify the data journaling method used for your root filesystem (data=ordered is the default), you can use a special kernel boot option called rootflags. So, if you'd like to put your root filesystem into full data journaling mode, add rootflags=data=journal to your kernel boot options.

data=writeback Mode

In data=writeback mode, ext3 doesn't do any form of data journaling at all, providing you with journaling similar to that found in the XFS, JFS, and ReiserFS filesystems -- metadata only. As I explained in my previous article, this could allow recently modified files to become corrupted in the event of an unexpected reboot. Despite this drawback, data=writeback mode should give you the best ext3 performance under most conditions.

data=ordered Mode

In data=ordered mode, ext3 only officially journals metadata, but it logically groups metadata and data blocks into a single unit called a transaction. When it's time to write the new metadata out to disk, the associated data blocks are written first. data=ordered mode effectively solves the corruption problem found in data=writeback mode and most other journaled filesystems, and it does so without requiring full data journaling. In general, data=ordered ext3 filesystems perform slightly slower than data=writeback filesystems, but significantly faster than their full data journaling counterparts.

When appending data to files, data=ordered mode provides all of the integrity guarantees offered by ext3's full data journaling mode. However, if part of a file is being overwritten and the system crashes, it's possible that the region being written will contain a combination of original blocks interspersed with updated blocks. This is because data=ordered provides no guarantees as to which blocks are overwritten first, so you can't assume that just because overwritten block x was updated, overwritten block x-1 was updated as well. Instead, data=ordered leaves the write ordering up to the hard drive's write cache.

In general, this limitation doesn't end up negatively impacting people very often, since file appends are generally much more common than file overwrites. For this reason, data=ordered mode is a good higher-performance replacement for full data journaling.

data=journal Mode

data=journal mode provides full data and metadata journaling. All new data is written to the journal first, and then to its final location. In the event of a crash, the journal can be replayed, bringing both data and metadata into a consistent state.

Theoretically, data=journal mode is the slowest journaling mode of all, since data gets written to disk twice rather than once. However, it turns out that in certain situations, data=journal mode can be blazingly fast. Andrew Morton, after hearing reports on LKML that ext3 data=journal filesystems were giving people unbelievably great interactive filesystem performance, decided to put together a little test. First, he created a simple shell script designed to write data to a test filesystem as quickly as possible:

while true
do        
    dd if=/dev/zero of=largefile bs=16384 count=131072
done

While data was being written to the test filesystem, he attempted to read 16 MB of data from another ext2 filesystem on the same disk, timing the results:

# time cat 16-meg-file > /dev/null

The results were astounding. data=journal mode allowed the 16 MB file to be read from 9 to over 13 times faster than other ext3 modes, ReiserFS, and even ext2 (which has no journaling overhead):

Filesystem             16 MB read time (seconds)
ext2                   78
ReiserFS               67
ext3 data=ordered      93
ext3 data=writeback    74
ext3 data=journal      7
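The "9 to over 13 times" range can be checked with a little arithmetic: divide each read time by the 7 seconds that data=journal took. The sketch below does exactly that with the figures from the table; the function name speedups is hypothetical, and no new measurements are involved.

```shell
# Divide each read time from the table above by the data=journal time
# (7 seconds) to see how many times longer the other setups took.
speedups() {
    printf '%s\n' \
        'ext2 78' \
        'ReiserFS 67' \
        'ext3 data=ordered 93' \
        'ext3 data=writeback 74' |
    awk -v journal=7 '{
        seconds = $NF               # the time is the last field
        sub(/ [0-9]+$/, "")         # strip it, leaving the name in $0
        printf "%s: %.1fx\n", $0, seconds / journal
    }'
}
```

Running speedups prints ratios of 11.1x for ext2, 9.6x for ReiserFS, 13.3x for data=ordered and 10.6x for data=writeback, which is where the 9-to-13 range comes from.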

Andrew repeated this test, but tried to read a 16 MB file from the test filesystem (rather than a different filesystem), and he got identical results. So, what does this mean? Somehow, ext3's data=journal mode is incredibly well-suited to situations where data needs to be read from and written to disk at the same time. Therefore, ext3's data=journal mode, which was assumed to be the slowest of all ext3 modes in nearly all conditions, actually turns out to have a major performance advantage in busy environments where interactive IO performance needs to be maximized. Maybe data=journal mode isn't so sluggish after all!

Andrew is still trying to figure out exactly why data=journal mode is doing so much better than everything else. When he does, he may be able to add the necessary tweaks to the rest of ext3 so that data=writeback and data=ordered modes see some benefit as well.

I hope to see you soon when I release the next installment in the Funtoo Filesystem Guide!

Resources

Be sure to check out the other articles in this series:

  • Part 1: Journaling and ReiserFS
  • Part 2: Using ReiserFS and Linux
  • Part 3: Tmpfs and bind mounts
  • Part 4: Introducing Ext3
  • Part 5: Ext3 in action



About the Author

Daniel Robbins is best known as the creator of Gentoo Linux and author of many IBM developerWorks articles about Linux. Daniel currently serves as Benevolent Dictator for Life (BDFL) of Funtoo Linux. Funtoo Linux is a Gentoo-based distribution and continuation of Daniel's original Gentoo vision.
