Bash by Example, Part 3

From Funtoo

{{Article
|Author=Drobbins
|Previous in Series=Bash by Example, Part 2
}}
== Exploring the ebuild system ==


=== Enter the ebuild system ===
I've really been looking forward to this third and final ''Bash by example'' article, because now that we've already covered bash programming fundamentals in [[Bash by example, Part1|Part 1]] and [[Bash by example, Part 2|Part 2]], we can focus on more advanced topics, like bash application development and program design. For this article, I will give you a good dose of practical, real-world bash development experience by presenting a project that I've spent many hours coding and refining: the Gentoo Linux ebuild system.


As the creator of Gentoo Linux and the guy behind Funtoo Linux, one of my primary responsibilities is to make sure that all of the operating system packages (similar to RPM packages) are created properly and work together. As you probably know, a standard Linux system is not composed of a single unified source tree (like BSD), but is actually made up of 25 or more core packages that work together. Some of these packages include:




{{TableStart}}
<tr><td class="active">Package</td><td class="active">Description</td></tr>
<tr><td>linux</td><td>The actual kernel</td></tr>
<tr><td>util-linux</td><td>A collection of miscellaneous Linux-related programs</td></tr>
<tr><td>e2fsprogs</td><td>A collection of ext2 filesystem-related utilities</td></tr>
<tr><td>glibc</td><td>The GNU C library</td></tr>
{{TableEnd}}


{{Note|Gentoo fans: the original text above used to say "I'm the chief architect of Gentoo Linux, a next-generation Linux OS currently in beta. One of my primary responsibilities is to make sure that all of the binary packages (similar to RPM packages) are created properly and work together." This is noteworthy because the initial focus of Gentoo was to provide working binary packages.}}


Each package is in its own tarball and is maintained by separate, independent developers or teams of developers. To create a distribution, each package has to be separately downloaded, compiled, and packaged. Every time a package must be fixed, upgraded, or improved, the compilation and packaging steps must be repeated (and this gets old really fast). To help eliminate the repetitive steps involved in creating and updating packages, I created the ebuild system, written almost entirely in bash. To enhance your bash knowledge, I'll show you how I implemented the unpack and compile portions of the ebuild system, step by step. As I explain each step, I'll also discuss why certain design decisions were made. By the end of this article, not only will you have an excellent grasp of larger-scale bash programming projects, but you'll also have implemented a good portion of a complete auto-build system.


=== Why bash? ===
Bash is an essential component of the Gentoo Linux ebuild system. It was chosen as ebuild's primary language for a number of reasons. First, it has an uncomplicated and familiar syntax that is especially well suited for calling external programs. An auto-build system is "glue code" that automates the calling of external programs, and bash is very well suited to this type of application. Second, bash's support for functions allowed the ebuild system to have modular, easy-to-understand code. Third, the ebuild system takes advantage of bash's support for environment variables, allowing package maintainers and developers to configure it easily, on the fly.


=== Build process review ===
Before we look at the ebuild system, let's review what's involved in getting a package compiled and installed. For our example, we will look at the "sed" package, a standard GNU text stream editing utility that is part of all Linux distributions. First, download the source tarball ('''sed-3.02.tar.gz''') (see [[#Resources|Resources]]). We will store this archive in '''/usr/src/distfiles''', a directory we will refer to using the environment variable <span style="color:green">$DISTDIR</span>. <span style="color:green">$DISTDIR</span> is the directory where all of our original source tarballs live; it's a big vault of source code.


Our next step is to create a temporary directory called '''work''', which houses the uncompressed sources. We'll refer to this directory later using the <span style="color:green">$WORKDIR</span> environment variable. To do this, change to a directory where we have write permission and type the following:
<source lang="bash">
$ mkdir work
$ cd work
$ tar xzf /usr/src/distfiles/sed-3.02.tar.gz
</source>
The tarball is then decompressed, creating a directory called '''sed-3.02''' that contains all of the sources. We'll refer to the '''sed-3.02''' directory later using the environment variable <span style="color:green">$SRCDIR</span>. To compile the program, type the following:
<source lang="bash">
$ cd sed-3.02
$ ./configure --prefix=/usr
(autoconf generates appropriate makefiles, this can take a while)


$ make
(the package is compiled from sources, also takes a bit of time)
</source>
We're going to skip the "make install" step, since we are just covering the unpack and compile steps in this article. If we wanted to write a bash script to perform all these steps for us, it could look something like this:
<source lang="bash">
#!/usr/bin/env bash


if [ -d work ]
then
        # remove old work directory if it exists
        rm -rf work
fi
mkdir work
cd work
tar xzf /usr/src/distfiles/sed-3.02.tar.gz
cd sed-3.02
./configure --prefix=/usr
make
</source>


=== Generalizing the code ===
Although this autocompile script works, it's not very flexible. Basically, the bash script just contains the listing of all the commands that were typed at the command line. While this solution works, it would be nice to make a generic script that can be configured quickly to unpack and compile any package just by changing a few lines. That way, it's much less work for the package maintainer to add new packages to the distribution. Let's take a first stab at doing this by using lots of different environment variables, making our build script more generic:
<source lang="bash">
#!/usr/bin/env bash


# P is the package name
P=sed-3.02

# A is the archive name
A=${P}.tar.gz

export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work
export SRCDIR=${WORKDIR}/${P}

if [ -z "$DISTDIR" ]
then
        # set DISTDIR to /usr/src/distfiles if not already set
        DISTDIR=/usr/src/distfiles
fi
export DISTDIR

if [ -d ${WORKDIR} ]
then
        # remove old work directory if it exists
        rm -rf ${WORKDIR}
fi

mkdir ${WORKDIR}
cd ${WORKDIR}
tar xzf ${DISTDIR}/${A}
cd ${SRCDIR}
./configure --prefix=/usr
make
</source>
We've added a lot of environment variables to the code, but it still does basically the same thing. However, now, to compile any standard GNU autoconf-based source tarball, we can simply copy this file to a new file (with an appropriate name to reflect the name of the new package it compiles), and then change the values of <span style="color:green">$A</span> and <span style="color:green">$P</span> to new values. All other environment variables automatically adjust to the correct settings, and the script works as expected. While this is handy, there's a further improvement that can be made to the code. This particular code is much longer than the original "transcript" script that we created. Since one of the goals for any programming project should be the reduction of complexity for the user, it would be nice to dramatically shrink the code, or at least organize it better. We can do this by performing a neat trick -- we'll split the code into two separate files. Save this file as '''sed-3.02.ebuild''':
<source lang="bash">
#the sed ebuild file -- very simple!
P=sed-3.02
A=${P}.tar.gz
</source>
Our first file is trivial, and contains only those environment variables that must be configured on a per-package basis. Here's the second file, which contains the brains of the operation. Save this one as "ebuild" and make it executable:
<source lang="bash">
#!/usr/bin/env bash


 
if [ $# -ne 1 ]
then
        echo "one argument expected."
        exit 1
fi
 
if [ -e "$1" ]
then
        source $1
else
        echo "ebuild file $1 not found."
        exit 1
fi
 
export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work
export SRCDIR=${WORKDIR}/${P}
 
if [ -z "$DISTDIR" ]
then
        # set DISTDIR to /usr/src/distfiles if not already set
        DISTDIR=/usr/src/distfiles
fi
export DISTDIR
 
if [ -d ${WORKDIR} ]
then   
        # remove old work directory if it exists
        rm -rf ${WORKDIR}
fi
 
mkdir ${WORKDIR}
cd ${WORKDIR}
tar xzf ${DISTDIR}/${A}
cd ${SRCDIR}
./configure --prefix=/usr
make
</source>
Now that we've split our build system into two files, I bet you're wondering how it works. Basically, to compile sed, type:
<source lang="bash">
$ ./ebuild sed-3.02.ebuild
</source>
When "ebuild" executes, it first tries to "source" variable <span style="color:green">$1</span>. What does this mean? From my previous article, recall that <span style="color:green">$1</span> is the first command line argument -- in this case, '''sed-3.02.ebuild'''. In bash, the "source" command reads in bash statements from a file, and executes them as if they appeared immediately in the file the "source" command is in. So, "source ${1}" causes the "ebuild" script to execute the commands in '''sed-3.02.ebuild''', which cause <span style="color:green">$P</span> and <span style="color:green">$A</span> to be defined. This design change is really handy, because if we want to compile another program instead of sed, we can simply create a new '''.ebuild''' file and pass it as an argument to our "ebuild" script. That way, the '''.ebuild''' files end up being really simple, while the complicated brains of the ebuild system get stored in one place -- our "ebuild" script. This way, we can upgrade or enhance the ebuild system simply by editing the "ebuild" script, keeping the implementation details outside of the ebuild files. Here's a sample ebuild file for <span style="color:green">gzip</span>:
<source lang="bash">
#another really simple ebuild script!
P=gzip-1.2.4a
A=${P}.tar.gz
</source>
 
=== Adding functionality ===
OK, we're making some progress. But, there is some additional functionality I'd like to add. I'd like the ebuild script to accept a second command-line argument, which will be <span style="color:green">compile</span>, <span style="color:green">unpack</span>, or <span style="color:green">all</span>. This second command-line argument tells the ebuild script which particular step of the build process to perform. That way, I can tell ebuild to unpack the archive, but not compile it (just in case I need to inspect the source archive before compilation begins). To do this, I'll add a case statement that will test variable <span style="color:green">$2</span>, and do different things based on its value. Here's what the code looks like now:
<source lang="bash">
#!/usr/bin/env bash
 
if [ $# -ne 2 ]
then
        echo "Please specify two args - .ebuild file and unpack, compile or all"
        exit 1
fi
 
 
if [ -z "$DISTDIR" ]
then
        # set DISTDIR to /usr/src/distfiles if not already set
        DISTDIR=/usr/src/distfiles
fi
export DISTDIR
 
ebuild_unpack() {
        #make sure we're in the right directory
        cd ${ORIGDIR}
       
        if [ -d ${WORKDIR} ]
        then   
                rm -rf ${WORKDIR}
        fi
 
        mkdir ${WORKDIR}
        cd ${WORKDIR}
        if [ ! -e ${DISTDIR}/${A} ]
        then
            echo "${DISTDIR}/${A} does not exist.  Please download first."
            exit 1
        fi   
        tar xzf ${DISTDIR}/${A}
        echo "Unpacked ${DISTDIR}/${A}."
        #source is now correctly unpacked
}
 
 
ebuild_compile() {
       
        #make sure we're in the right directory
        cd ${SRCDIR}
        if [ ! -d "${SRCDIR}" ]
        then
                echo "${SRCDIR} does not exist -- please unpack first."
                exit 1
        fi
        ./configure --prefix=/usr
        make   
}
 
export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work
 
if [ -e "$1" ]
then
        source $1
else
        echo "Ebuild file $1 not found."
        exit 1
fi
 
export SRCDIR=${WORKDIR}/${P}
 
case "${2}" in
        unpack)
                ebuild_unpack
                ;;
        compile)
                ebuild_compile
                ;;
        all)
                ebuild_unpack
                ebuild_compile
                ;;
        *)
                echo "Please specify unpack, compile or all as the second arg"
                exit 1
                ;;
esac
</source>
We've made a lot of changes, so let's review them. First, we placed the compile and unpack steps into their own functions, called <span style="color:green">ebuild_compile()</span> and <span style="color:green">ebuild_unpack()</span>, respectively. This is a good move, since the code is getting more complicated, and the new functions provide some modularity, which helps to keep things organized. At the start of each function, I explicitly <span style="color:green">cd</span> into the directory I want to be in because, as our code becomes more modular rather than linear, it's more likely that we might slip up and execute a function in the wrong current working directory. The <span style="color:green">cd</span> commands explicitly put us in the right place, and prevent us from making a mistake later -- an important precaution, especially if you will be deleting files inside the functions.
 
Also, I added a useful check to the beginning of the <span style="color:green">ebuild_compile()</span> function. Now, it checks to make sure the <span style="color:green">$SRCDIR</span> exists, and, if not, it prints an error message telling the user to unpack the archive first, and then exits. If you like, you can change this behavior so that if <span style="color:green">$SRCDIR</span> doesn't exist, our ebuild script will unpack the source archive automatically. You can do this by replacing <span style="color:green">ebuild_compile()</span> with the following code:
<source lang="bash">
ebuild_compile() {
        #make sure we're in the right directory
        if [ ! -d "${SRCDIR}" ]
        then
                ebuild_unpack
        fi
        cd ${SRCDIR}
        ./configure --prefix=/usr
        make   
}
</source>
One of the most obvious changes in our second version of the ebuild script is the new case statement at the end of the code. This case statement simply checks the second command-line argument, and performs the correct action, depending on its value. If we now type:
<source lang="bash">
$ ebuild sed-3.02.ebuild
</source>
We'll actually get an error message. ebuild now wants to be told what to do, as follows:
<source lang="bash">
$ ebuild sed-3.02.ebuild unpack
</source>
or:
<source lang="bash">
$ ebuild sed-3.02.ebuild compile
</source>
or:
<source lang="bash">
$ ebuild sed-3.02.ebuild all
</source>
 
{{fancyimportant|If you provide a second command-line argument other than those listed above, you'll get an error message (from the * clause), and the program will exit.}}
 
=== Modularizing the code ===
Now that the code is quite advanced and functional, you may be tempted to create several more ebuild scripts to unpack and compile your favorite programs. If you do, sooner or later you'll come across some sources that do not use autoconf (<span style="color:green">./configure</span>) or possibly others that have non-standard compilation processes. We need to make some more changes to the ebuild system to accommodate these programs. But before we do, it is a good idea to think a bit about how to accomplish this.
 
One of the great things about hard-coding <span style="color:green">./configure --prefix=/usr; make</span> into our compile stage is that, most of the time, it works. But, we must also have the ebuild system accommodate sources that do not use autoconf or normal Makefiles. To solve this problem, I propose that our ebuild script should, by default, do the following:
 
# If there is a configure script in <span style="color:green">${SRCDIR}</span>, execute it as follows: <span style="color:green">./configure --prefix=/usr</span>. Otherwise, skip this step.
# Execute the following command: make
 
Since ebuild only runs configure if it actually exists, we can now automatically accommodate those programs that don't use autoconf and have standard makefiles. But what if a simple "make" doesn't do the trick for some sources? We need a way to override our reasonable defaults with some specific code to handle these situations. To do this, we'll transform our <span style="color:green">ebuild_compile()</span> function into two functions. The first function, which can be looked at as a "parent" function, will still be called <span style="color:green">ebuild_compile()</span>. However, we'll have a new function, called <span style="color:green">user_compile()</span>, which contains only our reasonable default actions:
<source lang="bash">
user_compile() {
        #we're already in ${SRCDIR}
        if [ -e configure ]
        then
                #run configure script if it exists
                ./configure --prefix=/usr
        fi
        #run make
        make
}             
 
ebuild_compile() {
        if [ ! -d "${SRCDIR}" ]
        then
                echo "${SRCDIR} does not exist -- please unpack first."
                exit 1
        fi
        #make sure we're in the right directory
        cd ${SRCDIR}
        user_compile
}
</source>
It may not seem obvious why I'm doing this right now, but bear with me. While the code works almost identically to our previous version of ebuild, we can now do something that we couldn't do before -- we can override <span style="color:green">user_compile()</span> in '''sed-3.02.ebuild'''. So, if the default <span style="color:green">user_compile()</span> function doesn't meet our needs, we can define a new one in our '''.ebuild''' file that contains the commands required to compile the package. For example, here's an ebuild file for <span style="color:green">e2fsprogs-1.18</span>, which requires a slightly different <span style="color:green">./configure</span> line:
<source lang="bash">
#this ebuild file overrides the default user_compile()
P=e2fsprogs-1.18
A=${P}.tar.gz
user_compile() {
      ./configure --enable-elf-shlibs
      make
}
</source>
Now, <span style="color:green">e2fsprogs</span> will be compiled exactly the way we want it to be. But, for most packages, we can omit any custom <span style="color:green">user_compile()</span> function in the '''.ebuild''' file, and the default user_compile() function is used instead.
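
To see the override in action, we can feed this file to the same two-argument interface we built earlier. This invocation isn't spelled out in the original article, but it follows directly from the case statement above:
<source lang="bash">
$ ./ebuild e2fsprogs-1.18.ebuild all
</source>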
 
How exactly does the ebuild script know which <span style="color:green">user_compile()</span> function to use? This is actually quite simple. In the ebuild script, the default <span style="color:green">user_compile()</span> function is defined before the '''e2fsprogs-1.18.ebuild''' file is sourced. If there is a <span style="color:green">user_compile()</span> in '''e2fsprogs-1.18.ebuild''', it overwrites the default version defined previously. If not, the default <span style="color:green">user_compile()</span> function is used.
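
If you'd like to convince yourself of this behavior, here's a minimal, self-contained demonstration (my own illustration, not part of the ebuild sources). Sourcing a file that defines <span style="color:green">user_compile()</span> has exactly the same effect as the second definition below:
<source lang="bash">
#!/usr/bin/env bash
# in bash, the most recent function definition wins -- the same
# mechanism that lets an .ebuild file replace our default user_compile()

user_compile() {
        echo "default: ./configure --prefix=/usr; make"
}

# imagine this next definition arriving via 'source foo.ebuild':
user_compile() {
        echo "override: ./configure --enable-elf-shlibs; make"
}

user_compile     # prints the override, not the default
</source>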
 
This is great stuff; we've added a lot of flexibility without requiring any complex code if it's not needed. We won't cover it here, but you could also make similar modifications to <span style="color:green">ebuild_unpack()</span> so that users can override the default unpacking process. This could come in handy if any patching has to be done, or if the files are contained in multiple archives. It is also a good idea to modify our unpacking code so that it recognizes bzip2-compressed tarballs by default.
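
For the curious, here's one way such a bzip2-aware <span style="color:green">ebuild_unpack()</span> might look. This sketch is mine, not part of the article's ebuild code; it assumes GNU tar, whose -j flag handles bzip2 decompression:
<source lang="bash">
ebuild_unpack() {
        #make sure we're in the right directory
        cd ${ORIGDIR}

        if [ -d ${WORKDIR} ]
        then
                rm -rf ${WORKDIR}
        fi

        mkdir ${WORKDIR}
        cd ${WORKDIR}
        if [ ! -e ${DISTDIR}/${A} ]
        then
                echo "${DISTDIR}/${A} does not exist.  Please download first."
                exit 1
        fi
        #choose the right tar flags based on the archive extension
        case "${A}" in
                *.tar.gz|*.tgz)
                        tar xzf ${DISTDIR}/${A}
                        ;;
                *.tar.bz2)
                        tar xjf ${DISTDIR}/${A}
                        ;;
                *)
                        echo "Unrecognized archive format: ${A}"
                        exit 1
                        ;;
        esac
        echo "Unpacked ${DISTDIR}/${A}."
}
</source>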
 
=== Configuration files ===
We've covered a lot of sneaky bash techniques so far, and now it's time to cover one more. Often, it's handy for a program to have a global configuration file that resides in '''/etc'''. Fortunately, this is easy to do using bash. Simply create the following file and save it as '''/etc/ebuild.conf''':
<source lang="bash">
# /etc/ebuild.conf: set system-wide ebuild options in this file
 
# MAKEOPTS are options passed to make
MAKEOPTS="-j2"
</source>
In this example, I've included just one configuration option, but you could include many more. One of the beautiful things about bash is that this file can be parsed by simply sourcing it. This is a design trick that works with most interpreted languages. After '''/etc/ebuild.conf''' is sourced, <span style="color:green">$MAKEOPTS</span> is defined inside our ebuild script. We'll use it to allow the user to pass options to make. Normally, this option would be used to allow the user to tell ebuild to do a parallel make. This is explained below.
 
{{fancynote|'''What is a parallel make?''' <nowiki>To speed compilation on multiprocessor systems, make supports compiling a program in parallel. This means that instead of compiling just one source file at a time, make compiles a user-specified number of source files simultaneously (so those extra processors in a multiprocessor system are used). Parallel makes are enabled by passing the -j # option to make, as follows: make -j4 MAKE="make -j4". This instructs make to run up to four compile jobs simultaneously. The MAKE="make -j4" argument tells make to pass the -j4 option to any child make processes it launches.</nowiki>}}
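
As a side note, on modern systems you don't have to hard-code the job count. This variant of '''/etc/ebuild.conf''' is my own addition (nproc is part of GNU coreutils and postdates the original article), but it shows the idea:
<source lang="bash">
# /etc/ebuild.conf: set system-wide ebuild options in this file

# run one make job per available CPU; nproc prints the CPU count
MAKEOPTS="-j$(nproc)"
</source>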
 
Here's the final version of our ebuild program:
<source lang="bash">
#!/usr/bin/env bash
 
if [ $# -ne 2 ]
then
        echo "Please specify ebuild file and unpack, compile or all"
        exit 1
fi
 
source /etc/ebuild.conf
 
if [ -z "$DISTDIR" ]
then
        # set DISTDIR to /usr/src/distfiles if not already set
        DISTDIR=/usr/src/distfiles
fi
export DISTDIR
 
ebuild_unpack() {
        #make sure we're in the right directory
        cd ${ORIGDIR}
       
        if [ -d ${WORKDIR} ]
        then   
                rm -rf ${WORKDIR}
        fi
 
        mkdir ${WORKDIR}
        cd ${WORKDIR}
        if [ ! -e ${DISTDIR}/${A} ]
        then
                echo "${DISTDIR}/${A} does not exist.  Please download first."
                exit 1
        fi
        tar xzf ${DISTDIR}/${A}
        echo "Unpacked ${DISTDIR}/${A}."
        #source is now correctly unpacked
}
 
user_compile() {
        #we're already in ${SRCDIR}
        if [ -e configure ]
        then
                #run configure script if it exists
                ./configure --prefix=/usr
        fi
        #run make
        make $MAKEOPTS MAKE="make $MAKEOPTS" 
}
 
ebuild_compile() {
        if [ ! -d "${SRCDIR}" ]
        then
                echo "${SRCDIR} does not exist -- please unpack first."
                exit 1
        fi
        #make sure we're in the right directory
        cd ${SRCDIR}
        user_compile
}
 
export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work
 
if [ -e "$1" ]
then
        source $1
else
        echo "Ebuild file $1 not found."
        exit 1
fi
 
export SRCDIR=${WORKDIR}/${P}
 
case "${2}" in
        unpack)
                ebuild_unpack
                ;;
        compile)
                ebuild_compile
                ;;
        all)
                ebuild_unpack
                ebuild_compile
                ;;
        *)
                echo "Please specify unpack, compile or all as the second arg"
                exit 1
                ;;
esac
</source>
Notice '''/etc/ebuild.conf''' is sourced near the beginning of the file. Also, notice that we use <span style="color:green">$MAKEOPTS</span> in our default <span style="color:green">user_compile()</span> function. You may be wondering how this will work -- after all, we refer to <span style="color:green">$MAKEOPTS</span> before we source '''/etc/ebuild.conf''', which actually defines <span style="color:green">$MAKEOPTS</span> in the first place. Fortunately for us, this is OK because variable expansion only happens when <span style="color:green">user_compile()</span> is executed. By the time <span style="color:green">user_compile()</span> is executed, '''/etc/ebuild.conf''' has already been sourced, and <span style="color:green">$MAKEOPTS</span> is set to the correct value.
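
Here's a minimal demonstration of this expansion-timing behavior (my own example, not from the article):
<source lang="bash">
#!/usr/bin/env bash

show_opts() {
        # $MAKEOPTS is expanded when the function runs, not when it is defined
        echo "MAKEOPTS is: $MAKEOPTS"
}

show_opts            # prints "MAKEOPTS is: " -- the variable isn't set yet
MAKEOPTS="-j2"
show_opts            # prints "MAKEOPTS is: -j2"
</source>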
 
=== Wrapping it up ===
We've covered a lot of bash programming techniques in this article, but we've only scratched the surface of the power of bash. For example, the production Gentoo Linux ebuild system not only automatically unpacks and compiles each package, but it can also:
 
* Automatically download the sources if they are not found in $DISTDIR
* Verify that the sources are not corrupted, using MD5 message digests
* If requested, install the compiled application into the live filesystem, recording all installed files so that the package can be easily uninstalled later
* If requested, package the compiled application in a tarball (compressed the way you like it) so that it can be installed later, on another computer, or during the CD-based installation process (if you are building a distribution CD)
 
In addition, the production ebuild system has several other global configuration options, allowing the user to specify options such as what optimization flags to use during compilation, and whether optional support for packages like GNOME and slang should be enabled by default in those packages that support it.
 
It's clear that bash can accomplish much more than what I've touched on in this series of articles. I hope you've learned a lot about this incredible tool, and are excited about using bash to speed up and enhance your development projects.


== Resources ==
* Download the source tarball ('''sed-3.02.tar.gz''') from ftp://ftp.gnu.org/pub/gnu/sed.
* Read [[Bash by example, Part1]].
* Read [[Bash by example, Part 2]].
* Check out the [http://www.gnu.org/software/bash/manual/bash.html bash online reference manual].
 
__NOTOC__
[[Category:Linux Core Concepts]]
[[Category:Articles]]
{{ArticleFooter}}

Funtoo Filesystem Guide, Part 1

From Funtoo

{{Article
|Author=Drobbins
|Next in Series=Funtoo Filesystem Guide, Part 2
}}
== Journaling and ReiserFS ==

=== What's in Store ===

The purpose of this series is to give you a solid, practical introduction to Linux's various new filesystems, including ReiserFS, XFS, JFS, GFS, ext3 and others. I want to equip you with the necessary practical knowledge you need to actually start using these filesystems. My goal is to help you avoid as many potential pitfalls as possible; this means that we're going to take a careful look at filesystem stability, performance issues (both good and bad), any negative application interactions that you should be aware of, the best kernel/patch combinations, and more. Consider this series an "insider's guide" to these next-generation filesystems.

So, that's what's in store. But to begin this series, I'm going to diverge from this plan for just one article and prepare you for the journey ahead. I'll cover two topics very important to the Linux development community -- journaling, and the design vision behind ReiserFS. Journaling is very important because it's a technology that we've been anticipating for a long time, and it's finally here. It's used in ReiserFS, XFS, JFS, ext3 and GFS. It's important to understand exactly what journaling does and why Linux needs it. Even if you have a good grasp of journaling, I hope that my journaling intro will serve as a good model for explaining the technology to others, something that'll be common practice as departments and organizations worldwide begin transitioning to these new journaling filesystems. Often, this process begins with a "Linux guy/gal" such as yourself convincing others that it's the right thing to do.

In the second half of this article, we're going to take a look at the design vision behind ReiserFS. By doing so, we're going to get a good grasp on the fact that these new filesystems aren't just about doing the same old thing a bit faster. They also allow us to do things in ways that simply weren't possible before. Developers, keep this in mind as you read this series. The capabilities of these new filesystems will likely affect how you code your future Linux software development projects.

=== Understanding Journaling: Meta-data ===

As you well know, filesystems exist to allow you to store, retrieve and manipulate data. And, in order to do this, a filesystem needs to maintain an internal data structure that keeps all your data organized and readily accessible. This internal data structure (literally, "the data about the data") is called meta-data. It is the structure of this meta-data that gives a filesystem its particular identity and performance characteristics.

Normally, we don't interact with a filesystem's meta-data directly. Instead, a specific Linux filesystem driver takes care of that job for us. A Linux filesystem driver is specially written to manipulate this maze of meta-data. However, in order for the filesystem driver to work properly, it has one important requirement; it expects to find the meta-data in some kind of reasonable, consistent, non-corrupted state. Otherwise, the filesystem driver won't be able to understand or manipulate the meta-data, and you won't be able to access your files.

=== Understanding Journaling: fsck ===

This is where <span style="color:green">fsck</span> comes in. When a Linux system boots, <span style="color:green">fsck</span> starts up and scans all local filesystems listed in the system's '''/etc/fstab''' file. <span style="color:green">fsck</span>'s job is to ensure that the to-be-mounted filesystems' meta-data is in a usable state. Most of the time, it is. When Linux shuts down, it carefully flushes all cached data to disk and ensures that the filesystem is cleanly unmounted, so that it's ready for use when the system starts up again. Typically, <span style="color:green">fsck</span> scans the to-be-mounted filesystems and finds that they were cleanly unmounted, and makes the reasonable assumption that all meta-data is OK.
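
For reference, here's what the relevant '''/etc/fstab''' entries look like. The sixth field is the fsck pass number; the device names below are placeholders of my choosing:
<source lang="bash">
# excerpt from /etc/fstab -- the last field is the fsck pass number:
#   0 = never check, 1 = check first (the root filesystem),
#   2 = check after the root filesystem (other local filesystems)
#
# <fs>      <mountpoint>  <type>  <opts>      <dump> <pass>
/dev/hda1   /             ext2    defaults    0      1
/dev/hda2   /home         ext2    defaults    0      2
</source>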

However, we all know that every now and then, something atypical happens, such as an unexpected power failure or system lock-up. When these unfortunate situations occur, Linux doesn't have the opportunity to cleanly unmount the filesystem. When the system is rebooted and <span style="color:green">fsck</span> starts its scan, it detects that these filesystems were not cleanly unmounted and makes a reasonable assumption that the filesystems probably aren't ready to be seen by the Linux filesystem drivers. It's very likely that the meta-data is messed up in some way.

So, to fix this situation, <span style="color:green">fsck</span> will begin an exhaustive scan and sanity check on the meta-data, correcting any errors that it finds along the way. Once <span style="color:green">fsck</span> is complete, the filesystem is ready for use. Although some recently-modified data may have been lost due to the unexpected power failure or system lockup, since the meta-data is now consistent, the filesystem is ready to be mounted and be put to use.

=== The Problem With fsck ===

So far, this may not sound like a bad approach to ensuring filesystem consistency, but the solution isn't optimal. Problems arise from the fact that <span style="color:green">fsck</span> must scan a filesystem's entire meta-data in order to ensure filesystem consistency. Doing a complete consistency check on all meta-data is a time-consuming task in itself, normally taking at least several minutes to complete. Even worse, the bigger the filesystem, the longer this exhaustive scan takes. This is a big problem, because while <span style="color:green">fsck</span> is doing its thing, your Linux system is effectively offline, and if you have a large amount of filesystem storage, your system could be <span style="color:green">fsck</span>-ing for half an hour or more. Of course, standard <span style="color:green">fsck</span> behavior can have devastating results in mission-critical datacenter environments where system uptime is extremely important. Fortunately, there's a better solution.

=== The Journal ===

Journaling filesystems solve this <span style="color:green">fsck</span> problem by adding a new data structure, called a journal, to the mix. This journal is an on-disk structure. Before the filesystem driver makes any changes to the meta-data, it writes an entry to the journal that describes what it's about to do. Then, it goes ahead and modifies the meta-data. By doing so, a journaling filesystem maintains a log of recent meta-data modifications, and this comes in handy when it comes time to check the consistency of a filesystem that wasn't cleanly unmounted.

Think of journaling filesystems this way -- in addition to storing data (your stuff) and meta-data (the data about the stuff), they also have a journal, which you could call meta-meta-data (the data about the data about the stuff).
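
To make the write-ahead idea concrete, here's a toy sketch in shell form. It is emphatically not how a kernel filesystem driver works, and every name in it (the journal file, meta_write, replay_journal) is my own invention -- what matters is the ordering: describe the change in the journal, commit that description to disk, and only then touch the "meta-data" itself:
<source lang="bash">
#!/usr/bin/env bash
# toy write-ahead logging sketch (illustration only)

JOURNAL=journal.log
META=metadata.db

meta_write() {
        # 1. record our intent in the journal first...
        echo "SET $1" >> "$JOURNAL"
        sync    # ...and make sure the journal entry reaches the disk
        # 2. only now modify the "meta-data" itself
        echo "$1" >> "$META"
        sync
        # 3. the change is safely on disk; retire the journal entry
        > "$JOURNAL"
}

replay_journal() {
        # after a crash, re-apply whatever the journal still describes --
        # no exhaustive scan of $META is ever needed (a real journal replay
        # is idempotent; this toy may harmlessly re-append a finished change)
        while read -r op entry
        do
                [ "$op" = "SET" ] && echo "$entry" >> "$META"
        done < "$JOURNAL"
        > "$JOURNAL"
}

meta_write "home_dir=/home/drobbins"
</source>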

=== Journaling in Action ===

So, what does <span style="color:green">fsck</span> do with a journaling filesystem? Actually, normally, it does nothing. It simply ignores the filesystem and allows it to be mounted. The real magic behind quickly restoring the filesystem to a consistent state is found in the Linux filesystem driver. When the filesystem is mounted, the Linux filesystem driver checks to see whether the filesystem is OK. If for some reason it isn't, then the meta-data needs to be fixed, but instead of performing an exhaustive meta-data scan (like <span style="color:green">fsck</span>), it simply takes a look at the journal. Since the journal contains a chronological log of all recent meta-data changes, it only needs to inspect those portions of the meta-data that have been recently modified. Thus, it is able to bring the filesystem back to a consistent state in a matter of seconds. And unlike the more traditional approach that <span style="color:green">fsck</span> takes, this journal replaying process does not take longer on larger filesystems. Thanks to the journal, hundreds of gigabytes of filesystem meta-data can be brought to a consistent state almost instantaneously.

=== ReiserFS ===

Now, we come to ReiserFS, the first of several journaling filesystems we're going to be investigating. ReiserFS 3.6.x (the version included as part of Linux 2.4+) is designed and developed by Hans Reiser and his team of developers at Namesys. Hans and his team share the philosophy that the best filesystems are those that help create a single shared environment, or namespace, where applications can interact more directly, efficiently and powerfully. To do this, a filesystem should meet the performance and feature needs of its users. That way, users can continue using the filesystem directly rather than building special-purpose layers that run on top of the filesystem, such as databases and the like.

=== Small File Performance ===

So, how does one go about making the filesystem more accommodating? Namesys has decided to focus on one aspect of the filesystem, at least initially -- small file performance. In general, filesystems like ext2 and ufs don't do very well in this area, often forcing developers to turn to databases or special organizational hacks to get the kind of performance they need. Over time, this kind of "I'll code around the problem" approach encourages code bloat and lots of incompatible special-purpose APIs, which isn't a good thing.

Here's an example of how ext2 can tend to encourage this kind of programming. ext2 is good at storing lots of twenty-plus k files, but isn't an ideal technology for storing 2,000 50-byte files. Not only does performance drop significantly when ext2 has to deal with extremely small files, but storage efficiency drops as well, since ext2 allocates space in either one or four k chunks (configurable when the filesystem is created).
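
That block size is fixed when the filesystem is created; with e2fsprogs, the -b option to mke2fs selects it. The device name below is a placeholder:
<source lang="bash">
# create an ext2 filesystem with 1024-byte blocks instead of 4096-byte ones
mke2fs -b 1024 /dev/hda2
</source>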

Now, conventional wisdom would say that you aren't supposed to store that many ridiculously small files on a filesystem. Instead, they should be stored in some kind of database that runs above the filesystem. In reply, Hans Reiser would point out that whenever you need to build a layer on top of the filesystem, it means that the filesystem isn't meeting your needs. If the filesystem met your needs, then you could avoid using a special-purpose solution in the first place. You would thus save development time and eliminate the code bloat that you would have created by hand-rolling your own proprietary storage or caching mechanism, interfacing with a database library, etc.

Well, that's the theory. But how good is ReiserFS' small file performance in practice? Amazingly good. In fact, ReiserFS is around eight to fifteen times faster than ext2 when handling files smaller than one k in size! Even better, these performance improvements don't come at the expense of performance for other file types. In general, ReiserFS outperforms ext2 in nearly every area, but really shines when it comes to handling small files.

=== ReiserFS Technology ===

So how does ReiserFS go about offering such excellent small file performance? ReiserFS uses a specially optimized b* balanced tree (one per filesystem) to organize all filesystem data. This in itself offers a nice performance boost, as well as easing artificial restrictions on filesystem layouts. It's now possible to have a directory that contains 100,000 other directories, for example. Another benefit of using a b*tree is that ReiserFS, like most other next-generation filesystems, dynamically allocates inodes as needed rather than creating a fixed set of inodes at filesystem creation time. This helps the filesystem to be more flexible to the various storage requirements that may be thrown at it, while at the same time allowing for some additional space-efficiency.

ReiserFS also has a host of features aimed specifically at improving small file performance. Unlike ext2, ReiserFS doesn't allocate storage space in fixed one k or four k blocks. Instead, it can allocate the exact size it needs. And ReiserFS also includes some special optimizations centered around tails, a name for files and end portions of files that are smaller than a filesystem block. In order to increase performance, ReiserFS is able to store files inside the b*tree leaf nodes themselves, rather than storing the data somewhere else on the disk and pointing to it.

This does two things. First, it dramatically increases small file performance. Since the file data and the stat_data (inode) information are stored right next to each other, they can normally be read with a single disk IO operation. Second, ReiserFS is able to pack the tails together, saving a lot of space. In fact, a ReiserFS filesystem with tail packing enabled (the default) can store six percent more data than the equivalent ext2 filesystem, which is amazing in itself.

However, tail packing does cause a slight performance hit since it forces ReiserFS to repack data as files are modified. For this reason, ReiserFS tail packing can be turned off, allowing the administrator to choose between good speed and space efficiency, or opt for even more speed at the cost of some storage capacity.
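
Tail packing is controlled at mount time; the notail mount option disables it. The device and mount point below are placeholders of my choosing:
<source lang="bash">
# mount a ReiserFS volume with tail packing disabled
mount -t reiserfs -o notail /dev/hda3 /mnt/reiser

# or permanently, via an /etc/fstab entry:
# /dev/hda3   /mnt/reiser   reiserfs   notail   0 0
</source>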

ReiserFS truly is an excellent filesystem. In my next article, I'll guide you through the process of setting up ReiserFS under Linux. We'll also take a close look at performance tuning, application interactions (and how to work around them), the best kernels to use, and more.

== Resources ==
Be sure to check out the other articles in this series:
* [[Funtoo Filesystem Guide, Part 1|Part 1]]: Journaling and ReiserFS
* [[Funtoo Filesystem Guide, Part 2|Part 2]]: Using ReiserFS and Linux
* [[Funtoo Filesystem Guide, Part 3|Part 3]]: Tmpfs and bind mounts
* [[Funtoo Filesystem Guide, Part 4|Part 4]]: Introducing Ext3
* [[Funtoo Filesystem Guide, Part 5|Part 5]]: Ext3 in action

[[Category:Filesystem Guides]]
[[Category:Articles]]
{{ArticleFooter}}
