Difference between pages "Sed by Example, Part 2" and "Bash by Example, Part 3"

 
 
 
{{Article
 
{{Article
 
|Author=Drobbins
 
|Author=Drobbins
|Previous in Series=Sed by Example, Part 1
+
|Previous in Series=Bash by Example, Part 2
|Next in Series=Sed by Example, Part 3
+
 
}}
 
}}
== How to further take advantage of the UNIX text editor ==
+
== Exploring the ebuild system ==
  
=== Substitution! ===
+
=== Enter the ebuild system ===
Let's look at one of sed's most useful commands, the substitution command. Using it, we can replace a particular string or matched regular expression with another string. Here's an example of the most basic use of this command:
+
I've really been looking forward to this third and final ''Bash by example'' article, because now that we've already covered bash programming fundamentals in [[Bash by example, Part1|Part 1]] and [[Bash by example, Part 2|Part 2]], we can focus on more advanced topics, like bash application development and program design. For this article, I will give you a good dose of practical, real-world bash development experience by presenting a project that I've spent many hours coding and refining: the Gentoo Linux ebuild system.
  
<console>$##i## sed -e 's/foo/bar/' myfile.txt</console>
+
As the creator of Gentoo Linux and the guy behind Funtoo Linux, one of my primary responsibilities is to make sure that all of the operating system packages (similar to RPM packages) are created properly and work together. As you probably know, a standard Linux system is not composed of a single unified source tree (like BSD), but is actually made up of 25 or more core packages that work together. Some of the packages include:
  
The above command will output the contents of myfile.txt to stdout, with the first occurrence of 'foo' (if any) on each line replaced with the string 'bar'. Please note that I said first occurrence on each line, though this is normally not what you want. Normally, when I do a string replacement, I want to perform it globally. That is, I want to replace all occurrences on every line, as follows:
 
  
<console>$##i## sed -e 's/foo/bar/g' myfile.txt</console>
+
{{TableStart}}
 +
<tr><td class="info">Package</td><td class="info">Description</td></tr>
 +
<tr><td>linux</td><td>The actual kernel</td></tr>
 +
<tr><td>util-linux</td><td>A collection of miscellaneous Linux-related programs</td></tr>
 +
<tr><td>e2fsprogs</td><td>A collection of ext2 filesystem-related utilities</td></tr>
 +
<tr><td>glibc</td><td>The GNU C library</td></tr>
 +
{{TableEnd}}
  
The additional 'g' option after the last slash tells sed to perform a global replace.
+
{{Note|Gentoo fans: the original text above used to say "I'm the chief architect of Gentoo Linux, a next-generation Linux OS currently in beta. One of my primary responsibilities is to make sure that all of the binary packages (similar to RPM packages) are created properly and work together." This is noteworthy due to the fact that the initial focus of Gentoo was to provide working binary packages.}}
  
Here are a few other things you should know about the <span style="color:green">s///</span> substitution command. First, it is a command, and a command only; there are no addresses specified in any of the above examples. This means that the <span style="color:green">s///</span> command can also be used with addresses to control what lines it will be applied to, as follows:
+
Each package is in its own tarball and is maintained by separate independent developers, or teams of developers. To create a distribution, each package has to be separately downloaded, compiled, and packaged. Every time a package must be fixed, upgraded, or improved, the compilation and packaging steps must be repeated (and this gets old really fast). To help eliminate the repetitive steps involved in creating and updating packages, I created the ebuild system, written almost entirely in bash. To enhance your bash knowledge, I'll show you how I implemented the unpack and compile portions of the ebuild system, step by step. As I explain each step, I'll also discuss why certain design decisions were made. By the end of this article, not only will you have an excellent grasp of larger-scale bash programming projects, but you'll also have implemented a good portion of a complete auto-build system.
  
<console>$##i## sed -e '1,10s/enchantment/entrapment/g' myfile2.txt</console>
+
=== Why bash? ===
 +
Bash is an essential component of the Gentoo Linux ebuild system. It was chosen as ebuild's primary language for a number of reasons. First, it has an uncomplicated and familiar syntax that is especially well suited for calling external programs. An auto-build system is "glue code" that automates the calling of external programs, and bash is very well suited to this type of application. Second, Bash's support for functions allowed the ebuild system to have modular, easy-to-understand code. Third, the ebuild system takes advantage of bash's support for environment variables, allowing package maintainers and developers to configure it easily, on-the-fly.
  
The above example will cause all occurrences of the phrase 'enchantment' to be replaced with the phrase 'entrapment', but only on lines one through ten, inclusive.
+
=== Build process review ===
 +
Before we look at the ebuild system, let's review what's involved in getting a package compiled and installed. For our example, we will look at the "sed" package, a standard GNU text stream editing utility that is part of all Linux distributions. First, download the source tarball ('''sed-3.02.tar.gz''') (see [[#Resources|Resources]]). We will store this archive in '''/usr/src/distfiles''', a directory we will refer to using the environment variable <span style="color:green">$DISTDIR</span>. <span style="color:green">$DISTDIR</span> is the directory where all of our original source tarballs live; it's a big vault of source code.
  
<console>$##i## sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt</console>
+
Our next step is to create a temporary directory called '''work''', which houses the uncompressed sources. We'll refer to this directory later using the <span style="color:green">$WORKDIR</span> environment variable. To do this, change to a directory where we have write permission and type the following:
 +
<source lang="bash">
 +
$ mkdir work
 +
$ cd work
 +
$ tar xzf /usr/src/distfiles/sed-3.02.tar.gz
 +
</source>
 +
The tarball is then decompressed, creating a directory called '''sed-3.02''' that contains all of the sources. We'll refer to the '''sed-3.02''' directory later using the environment variable <span style="color:green">$SRCDIR</span>. To compile the program, type the following:
 +
<source lang="bash">
 +
$ cd sed-3.02
 +
$ ./configure --prefix=/usr
 +
(configure generates appropriate makefiles; this can take a while)
  
This example will replace 'hills' with 'mountains', but only in blocks of text beginning with a blank line and ending with a line beginning with the three characters 'END', inclusive.
+
$ make
  
Another nice thing about the <span style="color:green">s///</span> command is that we have a lot of options when it comes to those <span style="color:green">/</span> separators. If we're performing string substitution and the regular expression or replacement string has a lot of slashes in it, we can change the separator by specifying a different character after the 's'. For example, this will replace all occurrences of '''/usr/local''' with '''/usr''':
+
(the package is compiled from sources, also takes a bit of time)
 +
</source>
 +
We're going to skip the "make install" step, since we are just covering the unpack and compile steps in this article. If we wanted to write a bash script to perform all these steps for us, it could look something like this:
 +
<source lang="bash">
 +
#!/usr/bin/env bash
  
<console>$##i## sed -e 's:/usr/local:/usr:g' mylist.txt</console>
+
if [ -d work ]
 +
then
 +
# remove old work directory if it exists
 +
      rm -rf work
 +
fi
 +
mkdir work
 +
cd work
 +
tar xzf /usr/src/distfiles/sed-3.02.tar.gz
 +
cd sed-3.02
 +
./configure --prefix=/usr
 +
make
 +
</source>
  
{{note|In this example, we're using the colon as a separator. If you ever need to specify the separator character in the regular expression, put a backslash before it.}}
+
=== Generalizing the code ===
 +
Although this autocompile script works, it's not very flexible. Basically, the bash script just contains the listing of all the commands that were typed at the command line. While this solution works, it would be nice to make a generic script that can be configured quickly to unpack and compile any package just by changing a few lines. That way, it's much less work for the package maintainer to add new packages to the distribution. Let's take a first stab at doing this by using lots of different environment variables, making our build script more generic:
 +
<source lang="bash">
 +
#!/usr/bin/env bash
  
=== Regexp snafus ===
+
# P is the package name
Up until now, we've only performed simple string substitution. While this is handy, we can also match a regular expression. For example, the following sed command will match a phrase beginning with '<' and ending with '>', and containing any number of characters in between. This phrase will be deleted (replaced with an empty string):
+
  
<console>$##i## sed -e 's/<.*>//g' myfile.html</console>
+
P=sed-3.02
  
This is a good first attempt at a sed script that will remove HTML tags from a file, but it won't work well, due to a regular expression quirk. The reason? When sed tries to match the regular expression on a line, it finds the longest match on the line. This wasn't an issue in my previous sed article, because we were using the d and p commands, which would delete or print the entire line anyway. But when we use the s/// command, it definitely makes a big difference, because the entire portion that the regular expression matches will be replaced with the target string, or in this case, deleted. This means that the above example will turn the following line:
+
# A is the archive name
<pre>
+
<b>This</b> is what <b>I</b> meant.
+
</pre>
+
Into this:
+
<pre>
+
meant.
+
</pre>
+
Rather than this, which is what we wanted to do:
+
<pre>
+
This is what I meant.
+
</pre>
+
Fortunately, there is an easy way to fix this. Instead of typing in a regular expression that says "a '<' character followed by any number of characters, and ending with a '>' character", we just need to type in a regexp that says "a '<' character followed by any number of non-'>' characters, and ending with a '>' character". This will have the effect of matching the shortest possible match, rather than the longest possible one. The new command looks like this:
+
  
<console>$##i## sed -e 's/<[^>]*>//g' myfile.html</console>
+
A=${P}.tar.gz
  
In the above example, the '[^>]' specifies a "non-'>'" character, and the '*' after it completes this expression to mean "zero or more non-'>' characters". Test this command on a few sample HTML files, pipe the output to more, and review the results.
+
export ORIGDIR=`pwd`
 +
export WORKDIR=${ORIGDIR}/work
 +
export SRCDIR=${WORKDIR}/${P}
  
=== More character matching ===
+
if [ -z "$DISTDIR" ]
The '[ ]' regular expression syntax has some additional options. To specify a range of characters, you can use a '-' as long as it isn't in the first or last position, as follows:
+
then
<pre>
+
# set DISTDIR to /usr/src/distfiles if not already set
'[a-x]*'
+
        DISTDIR=/usr/src/distfiles
</pre>
+
fi
This will match zero or more characters, as long as all of them are 'a','b','c'...'v','w','x'. In addition, the '[:space:]' character class is available for matching whitespace; like the other classes, it is written inside a bracket expression, as in '<nowiki>[[:space:]]</nowiki>'. Here's a fairly complete list of available character classes:
+
export DISTDIR
{| border=1
+
!'''Character class'''
+
!'''Description'''
+
|-
+
|<nowiki>[:alnum:]</nowiki>
+
|<nowiki>Alphanumeric [a-z A-Z 0-9]</nowiki>
+
|-
+
|<nowiki>[:alpha:]</nowiki>
+
|<nowiki>Alphabetic [a-z A-Z]</nowiki>
+
|-
+
|<nowiki>[:blank:]</nowiki>
+
|Spaces or tabs
+
|-
+
|<nowiki>[:cntrl:]</nowiki>
+
|Any control characters
+
|-
+
|<nowiki>[:digit:]</nowiki>
+
|<nowiki>Numeric digits [0-9]</nowiki>
+
|-
+
|<nowiki>[:graph:]</nowiki>
+
|Any visible characters (no whitespace)
+
|-
+
|<nowiki>[:lower:]</nowiki>
+
|<nowiki>Lower-case [a-z]</nowiki>
+
|-
+
|<nowiki>[:print:]</nowiki>
+
|Non-control characters
+
|-
+
|<nowiki>[:punct:]</nowiki>
+
|Punctuation characters
+
|-
+
|<nowiki>[:space:]</nowiki>
+
|Whitespace
+
|-
+
|<nowiki>[:upper:]</nowiki>
+
|<nowiki>Upper-case [A-Z]</nowiki>
+
|-
+
|<nowiki>[:xdigit:]</nowiki>
+
|<nowiki>hex digits [0-9 a-f A-F]</nowiki>
+
|}
+
It's advantageous to use character classes whenever possible, because they adapt better to non-English-speaking locales (including accented characters when necessary, etc.).
+
  
=== Advanced substitution stuff ===
+
if [ -d ${WORKDIR} ]
We've looked at how to perform simple and even reasonably complex straight substitutions, but sed can do even more. We can actually refer to either parts of or the entire matched regular expression, and use these parts to construct the replacement string. As an example, let's say you were replying to a message. The following example would prefix each line with the phrase "ralph said: ":
+
then   
 +
# remove old work directory if it exists
 +
        rm -rf ${WORKDIR}
 +
fi
  
<console>$##i## sed -e 's/.*/ralph said: &/' origmsg.txt</console>
+
mkdir ${WORKDIR}
 +
cd ${WORKDIR}
 +
tar xzf ${DISTDIR}/${A}
 +
cd ${SRCDIR}
 +
./configure --prefix=/usr
 +
make
 +
</source>
 +
We've added a lot of environment variables to the code, but it still does basically the same thing. However, now, to compile any standard GNU autoconf-based source tarball, we can simply copy this file to a new file (with an appropriate name to reflect the name of the new package it compiles), and then change the values of <span style="color:green">$A</span> and <span style="color:green">$P</span> to new values. All other environment variables automatically adjust to the correct settings, and the script works as expected. While this is handy, there's a further improvement that can be made to the code. This particular code is much longer than the original "transcript" script that we created. Since one of the goals for any programming project should be the reduction of complexity for the user, it would be nice to dramatically shrink the code, or at least organize it better. We can do this by performing a neat trick -- we'll split the code into two separate files. Save this file as '''sed-3.02.ebuild''':
 +
<source lang="bash">
 +
#the sed ebuild file -- very simple!
 +
P=sed-3.02
 +
A=${P}.tar.gz
 +
</source>
 +
Our first file is trivial, and contains only those environment variables that must be configured on a per-package basis. Here's the second file, which contains the brains of the operation. Save this one as "ebuild" and make it executable:
 +
<source lang="bash">
 +
#!/usr/bin/env bash
  
The output will look like this:
 
<pre>
 
ralph said: Hiya Jim,
 
ralph said:
 
ralph said: I sure like this sed stuff!
 
ralph said:
 
</pre>
 
In this example, we use the '&' character in the replacement string, which tells sed to insert the entire matched regular expression. So, whatever was matched by '.*' (the largest group of zero or more characters on the line, or the entire line) can be inserted anywhere in the replacement string, even multiple times. This is great, but sed is even more powerful.
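If you'd like to try the '&' trick without creating a message file first, here is a minimal sketch. The <span style="color:green">quote_reply</span> function name is my own invention for illustration, and it assumes the name you pass in contains no slashes, ampersands, or backslashes, since those would be interpreted by sed:

```bash
# Hypothetical helper: prefix every line of stdin with "<name> said: ".
# Assumes $1 contains no '/', '&', or backslash characters.
quote_reply() {
    sed -e "s/.*/$1 said: &/"
}

# Feed a short message through the helper.
printf 'Hiya Jim,\nI sure like this sed stuff!\n' | quote_reply ralph

# '&' can appear more than once in the replacement string.
doubled=$(printf 'ab\n' | sed -e 's/.*/&&/')
```

The first command prefixes both lines of the message, and the second stores the input line repeated twice in <span style="color:green">$doubled</span>.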
 
  
=== Those wonderful backslashed parentheses ===
+
if [ $# -ne 1 ]
Even better than '&', the <span style="color:green">s///</span> command allows us to define regions in our regular expression, and we can refer to these specific regions in our replacement string. As an example, let's say we have a file that contains the following text:
+
then
<pre>
+
        echo "one argument expected."
foo bar oni
+
        exit 1
eeny meeny miny
+
fi
larry curly moe
+
jimmy the weasel
+
</pre>
+
Now, let's say we wanted to write a sed script that would replace "eeny meeny miny" with "Victor eeny-meeny Von miny", etc. To do this, first we would write a regular expression that would match the three strings, separated by spaces:
+
<pre>
+
'.* .* .*'
+
</pre>
+
There. Now, we will define regions by inserting backslashed parentheses around each region of interest:
+
<pre>
+
'\(.*\) \(.*\) \(.*\)'
+
</pre>
+
This regular expression will work the same as our first one, except that it will define three logical regions that we can refer to in our replacement string. Here's the final script:
+
  
<console>$##i## sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt</console>
+
if [ -e "$1" ]
 +
then
 +
        source $1
 +
else
 +
        echo "ebuild file $1 not found."
 +
        exit 1
 +
fi
  
As you can see, we refer to each parentheses-delimited region by typing '\x', where x is the number of the region, starting at one. Output is as follows:
+
export ORIGDIR=`pwd`
<pre>
+
export WORKDIR=${ORIGDIR}/work
Victor foo-bar Von oni
+
export SRCDIR=${WORKDIR}/${P}
Victor eeny-meeny Von miny
+
Victor larry-curly Von moe
+
Victor jimmy-the Von weasel
+
</pre>
+
As you become more familiar with sed, you will be able to perform fairly powerful text processing with a minimum of effort. You may want to think about how you'd have approached this problem using your favorite scripting language -- could you have easily fit the solution in one line?
+
  
=== Mixing things up ===
+
if [ -z "$DISTDIR" ]
As we begin creating more complex sed scripts, we need the ability to enter more than one command. There are several ways to do this. First, we can use semicolons between the commands. For example, this series of commands uses the '=' command, which tells sed to print the line number, as well as the p command, which explicitly tells sed to print the line (since we're in '-n' mode):
+
then
 +
        # set DISTDIR to /usr/src/distfiles if not already set
 +
        DISTDIR=/usr/src/distfiles
 +
fi
 +
export DISTDIR
  
<console>$##i## sed -n -e '=;p' myfile.txt</console>
+
if [ -d ${WORKDIR} ]
 +
then   
 +
        # remove old work directory if it exists
 +
        rm -rf ${WORKDIR}
 +
fi
  
Whenever two or more commands are specified, each command is applied (in order) to every line in the file. In the above example, first the '=' command is applied to line 1, and then the p command is applied. Then, sed proceeds to line 2, and repeats the process. While the semicolon is handy, there are instances where it won't work. Another alternative is to use two -e options to specify two separate commands:
+
mkdir ${WORKDIR}
 +
cd ${WORKDIR}
 +
tar xzf ${DISTDIR}/${A}
 +
cd ${SRCDIR}
 +
./configure --prefix=/usr
 +
make
 +
</source>
 +
Now that we've split our build system into two files, I bet you're wondering how it works. Basically, to compile sed, type:
 +
<source lang="bash">
 +
$ ./ebuild sed-3.02.ebuild
 +
</source>
 +
When "ebuild" executes, it first tries to "source" the file named by <span style="color:green">$1</span>. What does this mean? From my previous article, recall that <span style="color:green">$1</span> is the first command line argument -- in this case, '''sed-3.02.ebuild'''. In bash, the "source" command reads in bash statements from a file, and executes them as if they appeared directly in the file the "source" command is in. So, "source ${1}" causes the "ebuild" script to execute the commands in '''sed-3.02.ebuild''', which cause <span style="color:green">$P</span> and <span style="color:green">$A</span> to be defined. This design change is really handy, because if we want to compile another program instead of sed, we can simply create a new '''.ebuild''' file and pass it as an argument to our "ebuild" script. That way, the '''.ebuild''' files end up being really simple, while the complicated brains of the ebuild system get stored in one place -- our "ebuild" script. This way, we can upgrade or enhance the ebuild system simply by editing the "ebuild" script, keeping the implementation details outside of the ebuild files. Here's a sample ebuild file for <span style="color:green">gzip</span>:
 +
<source lang="bash">
 +
#another really simple ebuild script!
 +
P=gzip-1.2.4a
 +
A=${P}.tar.gz
 +
</source>
  
<console>$##i## sed -n -e '=' -e 'p' myfile.txt</console>
+
=== Adding functionality ===
 +
OK, we're making some progress. But, there is some additional functionality I'd like to add. I'd like the ebuild script to accept a second command-line argument, which will be <span style="color:green">compile</span>, <span style="color:green">unpack</span>, or <span style="color:green">all</span>. This second command-line argument tells the ebuild script which particular step of the build process to perform. That way, I can tell ebuild to unpack the archive, but not compile it (just in case I need to inspect the source archive before compilation begins). To do this, I'll add a case statement that will test variable <span style="color:green">$2</span>, and do different things based on its value. Here's what the code looks like now:
 +
<source lang="bash">
 +
#!/usr/bin/env bash
  
However, when we get to the more complex append and insert commands, even multiple '-e' options won't help us. For complex multiline scripts, the best way is to put your commands in a separate file. Then, reference this script file with the -f option:
+
if [ $# -ne 2 ]
 +
then
 +
        echo "Please specify two args - .ebuild file and unpack, compile or all"
 +
        exit 1
 +
fi
  
<console>$##i## sed -n -f mycommands.sed myfile.txt</console>
 
  
This method, although arguably less convenient, will always work.
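Here is a small sketch of the script-file approach, using the same '=' and 'p' commands from earlier. The temporary file created by mktemp stands in for '''mycommands.sed''', and the two-line input is made up for the demonstration:

```bash
# Write a two-command sed script to a temporary file (the name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
=
p
EOF

# -n suppresses automatic printing, so only '=' and 'p' produce output.
numbered=$(printf 'first\nsecond\n' | sed -n -f "$script")
printf '%s\n' "$numbered"

# Clean up the temporary script file.
rm -f "$script"
```

Each input line is preceded by its line number, just as with the semicolon and multiple '-e' versions.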
+
if [ -z "$DISTDIR" ]
 +
then
 +
# set DISTDIR to /usr/src/distfiles if not already set
 +
        DISTDIR=/usr/src/distfiles
 +
fi
 +
export DISTDIR
  
=== Multiple commands for one address ===
+
ebuild_unpack() {
Sometimes, you may want to specify multiple commands that will apply to a single address. This comes in especially handy when you are performing lots of s/// to transform words or syntax in the source file. To perform multiple commands per address, enter your sed commands in a file, and use the '{ }' characters to group commands, as follows:
+
        #make sure we're in the right directory
<pre>
+
        cd ${ORIGDIR}
1,20{
+
       
         s/[Ll]inux/GNU\/Linux/g
+
        if [ -d ${WORKDIR} ]
         s/samba/Samba/g
+
         then   
         s/posix/POSIX/g
+
                rm -rf ${WORKDIR}
 +
        fi
 +
 
 +
        mkdir ${WORKDIR}
 +
        cd ${WORKDIR}
 +
        if [ ! -e ${DISTDIR}/${A} ]
 +
        then
 +
            echo "${DISTDIR}/${A} does not exist.  Please download first."
 +
            exit 1
 +
         fi   
 +
        tar xzf ${DISTDIR}/${A}
 +
         echo "Unpacked ${DISTDIR}/${A}."
 +
        #source is now correctly unpacked
 
}
 
}
</pre>
+
 
The above example will apply three substitution commands to lines 1 through 20, inclusive. You can also use regular expression addresses, or a combination of the two:
+
 
<pre>
+
ebuild_compile() {
1,/^END/{
+
       
         s/[Ll]inux/GNU\/Linux/g
+
        #make sure we're in the right directory
         s/samba/Samba/g
+
        cd ${SRCDIR}
         s/posix/POSIX/g
+
         if [ ! -d "${SRCDIR}" ]
      p
+
         then
 +
                echo "${SRCDIR} does not exist -- please unpack first."
 +
                exit 1
 +
         fi
 +
        ./configure --prefix=/usr
 +
        make   
 
}
 
}
</pre>
 
This example will apply all the commands between '{ }' to the lines starting at 1 and up to a line beginning with the letters "END", or the end of file if "END" is not found in the source file.
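To see the grouping in action, here is a minimal sketch that trims the script down to two substitutions and runs it against three made-up input lines. The temporary script file and the sample text are assumptions for illustration only:

```bash
# Store a grouped sed script in a temporary file (the name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
1,/^END/{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
}
EOF

# Lines up to and including the END line are processed; later lines are not.
result=$(printf 'linux and samba\nEND\nlinux and samba\n' | sed -f "$script")
printf '%s\n' "$result"

# Clean up the temporary script file.
rm -f "$script"
```

Only the first line is rewritten; the copy that follows the "END" line passes through untouched, because it falls outside the address range.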
 
  
=== Append, insert, and change line ===
+
export ORIGDIR=`pwd`
Now that we're writing sed scripts in separate files, we can take advantage of the append, insert, and change line commands. These commands will insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output. The insert line command is used as follows:
+
export WORKDIR=${ORIGDIR}/work
<pre>
+
i\
+
This line will be inserted before each line
+
</pre>
+
If you don't specify an address for this command, it will be applied to each line and produce output that looks like this:
+
<pre>
+
This line will be inserted before each line
+
line 1 here
+
This line will be inserted before each line
+
line 2 here
+
This line will be inserted before each line
+
line 3 here
+
This line will be inserted before each line
+
line 4 here
+
</pre>
+
If you'd like to insert multiple lines before the current line, you can add additional lines by appending a backslash to the previous line, like so:
+
<pre>
+
i\
+
insert this line\
+
and this one\
+
and this one\
+
and, uh, this one too.
+
</pre>
+
The append command works similarly, but will insert a line or lines after the current line in the pattern space. It's used as follows:
+
<pre>
+
a\
+
insert this line after each line.  Thanks! :)
+
</pre>
+
On the other hand, the "change line" command will actually replace the current line in the pattern space, and is used as follows:
+
<pre>
+
c\
+
You're history, original line! Muhahaha!
+
</pre>
+
Because the append, insert, and change line commands need to be entered on multiple lines, you'll want to type them into sed script files and tell sed to read them by using the '-f' option. Using the other methods to pass commands to sed will result in problems.
+
  
=== Next time ===
+
if [ -e "$1" ]
Next time, in the final article of this series on sed, I'll show you lots of excellent real-world examples of using sed for many different kinds of tasks. Not only will I show you what the scripts do, but why they do what they do. After you're done, you'll have additional excellent ideas of how to use sed in your various projects. I'll see you then!
+
then
 +
        source $1
 +
else
 +
        echo "Ebuild file $1 not found."
 +
        exit 1
 +
fi
 +
 
 +
export SRCDIR=${WORKDIR}/${P}
 +
 
 +
case "${2}" in
 +
        unpack)
 +
                ebuild_unpack
 +
                ;;
 +
        compile)
 +
                ebuild_compile
 +
                ;;
 +
        all)
 +
                ebuild_unpack
 +
                ebuild_compile
 +
                ;;
 +
        *)
 +
                echo "Please specify unpack, compile or all as the second arg"
 +
                exit 1
 +
                ;;
 +
esac
 +
</source>
 +
We've made a lot of changes, so let's review them. First, we placed the compile and unpack steps in their own functions, called <span style="color:green">ebuild_compile()</span> and <span style="color:green">ebuild_unpack()</span>, respectively. This is a good move, since the code is getting more complicated, and the new functions provide some modularity, which helps to keep things organized. On the first line in each function, I explicitly <span style="color:green">cd</span> into the directory I want to be in because, as our code is becoming more modular rather than linear, it's more likely that we might slip up and execute a function in the wrong current working directory. The <span style="color:green">cd</span> commands explicitly put us in the right place, and prevent us from making a mistake later -- an important step -- especially if you will be deleting files inside the functions.
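One possible hardening of this idea, which is not part of the ebuild script shown in this article, is to stop as soon as a <span style="color:green">cd</span> fails, so that later commands never run in the wrong directory. Here is a rough sketch; the <span style="color:green">enter_dir</span> helper and the scratch directory are hypothetical names made up for the example:

```bash
# Hypothetical helper: change directory, or report the failure to the caller.
enter_dir() {
    cd "$1" 2>/dev/null || {
        echo "cannot enter directory: $1" >&2
        return 1
    }
}

# Example: only remove a file once we know we are inside the scratch directory.
scratch=$(mktemp -d)
touch "$scratch/leftover.o"
( enter_dir "$scratch" && rm -f leftover.o )

# Record what is left, then remove the (now empty) scratch directory.
remaining=$(ls -A "$scratch")
rmdir "$scratch"
```

Because the <span style="color:green">rm</span> is chained after <span style="color:green">enter_dir</span> with <span style="color:green">&&</span>, it is skipped entirely if the directory cannot be entered.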
 +
 
 +
Also, I added a useful check to the beginning of the <span style="color:green">ebuild_compile()</span> function. Now, it checks to make sure the <span style="color:green">$SRCDIR</span> exists, and, if not, it prints an error message telling the user to unpack the archive first, and then exits. If you like, you can change this behavior so that if <span style="color:green">$SRCDIR</span> doesn't exist, our ebuild script will unpack the source archive automatically. You can do this by replacing <span style="color:green">ebuild_compile()</span> with the following code:
 +
<source lang="bash">
 +
ebuild_compile() {
 +
        #make sure we're in the right directory
 +
        if [ ! -d "${SRCDIR}" ]
 +
        then
 +
                ebuild_unpack
 +
        fi
 +
        cd ${SRCDIR}
 +
        ./configure --prefix=/usr
 +
        make   
 +
}
 +
</source>
 +
One of the most obvious changes in our second version of the ebuild script is the new case statement at the end of the code. This case statement simply checks the second command-line argument, and performs the correct action, depending on its value. If we now type:
 +
<source lang="bash">
 +
$ ./ebuild sed-3.02.ebuild
 +
</source>
 +
We'll actually get an error message. ebuild now wants to be told what to do, as follows:
 +
<source lang="bash">
 +
$ ./ebuild sed-3.02.ebuild unpack
 +
</source>
 +
or:
 +
<source lang="bash">
 +
$ ./ebuild sed-3.02.ebuild compile
 +
</source>
 +
or:
 +
<source lang="bash">
 +
$ ./ebuild sed-3.02.ebuild all
 +
</source>
 +
 
 +
{{fancyimportant|If you provide a second command-line argument, other than those listed above, you get an error message (the * clause), and the program exits.}}
 +
 
 +
=== Modularizing the code ===
 +
Now that the code is quite advanced and functional, you may be tempted to create several more ebuild scripts to unpack and compile your favorite programs. If you do, sooner or later you'll come across some sources that do not use autoconf (<span style="color:green">./configure</span>) or possibly others that have non-standard compilation processes. We need to make some more changes to the ebuild system to accommodate these programs. But before we do, it is a good idea to think a bit about how to accomplish this.
 +
 
 +
One of the great things about hard-coding <span style="color:green">./configure --prefix=/usr; make</span> into our compile stage is that, most of the time, it works. But, we must also have the ebuild system accommodate sources that do not use autoconf or normal Makefiles. To solve this problem, I propose that our ebuild script should, by default, do the following:
 +
 
 +
# If there is a configure script in <span style="color:green">${SRCDIR}</span>, execute it as follows: <span style="color:green">./configure --prefix=/usr</span>. Otherwise, skip this step.
 +
# Execute the following command: make
 +
 
 +
Since ebuild only runs configure if it actually exists, we can now automatically accommodate those programs that don't use autoconf and have standard makefiles. But what if a simple "make" doesn't do the trick for some sources? We need a way to override our reasonable defaults with some specific code to handle these situations. To do this, we'll transform our <span style="color:green">ebuild_compile()</span> function into two functions. The first function, which can be looked at as a "parent" function, will still be called <span style="color:green">ebuild_compile()</span>. However, we'll have a new function, called <span style="color:green">user_compile()</span>, which contains only our reasonable default actions:
 +
<source lang="bash">
 +
user_compile() {
 +
        #we're already in ${SRCDIR}
 +
        if [ -e configure ]
 +
        then
 +
                #run configure script if it exists
 +
                ./configure --prefix=/usr
 +
        fi
 +
        #run make
 +
        make
 +
}             
 +
 
 +
ebuild_compile() {
 +
        if [ ! -d "${SRCDIR}" ]
 +
        then
 +
                echo "${SRCDIR} does not exist -- please unpack first."
 +
                exit 1
 +
        fi
 +
        #make sure we're in the right directory
 +
        cd ${SRCDIR}
 +
        user_compile
 +
}
 +
</source>
 +
It may not seem obvious why I'm doing this right now, but bear with me. While the code works almost identically to our previous version of ebuild, we can now do something that we couldn't do before -- we can override <span style="color:green">user_compile()</span> in '''sed-3.02.ebuild'''. So, if the default <span style="color:green">user_compile()</span> function doesn't meet our needs, we can define a new one in our '''.ebuild''' file that contains the commands required to compile the package. For example, here's an ebuild file for <span style="color:green">e2fsprogs-1.18</span>, which requires a slightly different <span style="color:green">./configure</span> line:
<source lang="bash">
#this ebuild file overrides the default user_compile()
P=e2fsprogs-1.18
A=${P}.tar.gz

user_compile() {
       ./configure --enable-elf-shlibs
       make
}
</source>
Now, <span style="color:green">e2fsprogs</span> will be compiled exactly the way we want it to be. For most packages, though, we can omit the custom <span style="color:green">user_compile()</span> function from the '''.ebuild''' file, and the default <span style="color:green">user_compile()</span> function is used instead.
 
How exactly does the ebuild script know which <span style="color:green">user_compile()</span> function to use? This is actually quite simple. In the ebuild script, the default <span style="color:green">user_compile()</span> function is defined before the '''e2fsprogs-1.18.ebuild''' file is sourced. If there is a <span style="color:green">user_compile()</span> in '''e2fsprogs-1.18.ebuild''', it overwrites the default version defined previously. If not, the default <span style="color:green">user_compile()</span> function is used.
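
To see the override mechanism in isolation, here is a minimal sketch. It is not part of the ebuild script: the temporary file merely stands in for an '''.ebuild''' file, and the package name <span style="color:green">demo-1.0</span> is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: a later definition of a function replaces an earlier one.

# Default implementation, defined first (as the ebuild script does).
user_compile() {
        echo "default compile"
}

# Stand-in for an .ebuild file (hypothetical package demo-1.0).
ebuild_file=$(mktemp)
cat > "$ebuild_file" <<'EOF'
P=demo-1.0
user_compile() {
        echo "custom compile for ${P}"
}
EOF

before=$(user_compile)      # runs the default definition
source "$ebuild_file"       # sets P and redefines user_compile
after=$(user_compile)       # runs the definition from the sourced file
rm -f "$ebuild_file"

echo "before: ${before}"
echo "after:  ${after}"
```

The first call reports the default behaviour and the second reports the custom one, even though the calling code is identical in both cases. If the sourced file had not defined <span style="color:green">user_compile()</span>, the second call would simply have run the default again.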
 
This is great stuff; we've added a lot of flexibility without requiring any complex code if it's not needed. We won't cover it here, but you could also make similar modifications to <span style="color:green">ebuild_unpack()</span> so that users can override the default unpacking process. This could come in handy if any patching has to be done, or if the files are contained in multiple archives. It is also a good idea to modify our unpacking code so that it recognizes bzip2-compressed tarballs by default.
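
As a rough illustration of both ideas, the sketch below adds an overridable unpack step and picks the tar flags from the archive name. The names <span style="color:green">user_unpack()</span> and <span style="color:green">tar_flags_for()</span> are my own inventions for this example, not part of the ebuild script shown in this article, and the <span style="color:green">j</span> flag assumes a tar with bzip2 support (GNU tar has it).

```bash
# Hypothetical helper: choose tar flags based on the archive's file name.
tar_flags_for() {
        case "$1" in
                *.tar.gz|*.tgz)   echo "xzf" ;;
                *.tar.bz2|*.tbz2) echo "xjf" ;;
                *.tar)            echo "xf" ;;
                *)                return 1 ;;
        esac
}

# Default unpack step. An .ebuild file could redefine user_unpack()
# in the same way it can redefine user_compile().
user_unpack() {
        local flags
        flags=$(tar_flags_for "${A}") || {
                echo "Don't know how to unpack ${A}." >&2
                return 1
        }
        tar "${flags}" "${DISTDIR}/${A}"
}
```

With something like this in place, <span style="color:green">ebuild_unpack()</span> would call <span style="color:green">user_unpack</span> where it currently runs <span style="color:green">tar xzf</span> directly, and a package that needs patching could supply its own version.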
 
=== Configuration files ===
We've covered a lot of sneaky bash techniques so far, and now it's time to cover one more. Often, it's handy for a program to have a global configuration file that resides in '''/etc'''. Fortunately, this is easy to do using bash. Simply create the following file and save it as '''/etc/ebuild.conf''':
<source lang="bash">
# /etc/ebuild.conf: set system-wide ebuild options in this file

# MAKEOPTS are options passed to make
MAKEOPTS="-j2"
</source>
In this example, I've included just one configuration option, but you could include many more. One of the beautiful things about bash is that this file can be parsed by simply sourcing it. This is a design trick that works with most interpreted languages. After '''/etc/ebuild.conf''' is sourced, <span style="color:green">$MAKEOPTS</span> is defined inside our ebuild script. We'll use it to allow the user to pass options to make. Normally, this option would be used to allow the user to tell ebuild to do a parallel make. This is explained below.
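
Here is a small sketch of the sourcing technique that you can try without touching '''/etc'''. The helper name <span style="color:green">load_conf()</span> and the fallback value <span style="color:green">-j1</span> are assumptions made for this example; the ebuild script in this article sources the file directly and sets no fallback.

```bash
# load_conf is a hypothetical helper: source a config file if it is readable.
load_conf() {
        local conf="$1"
        if [ -r "$conf" ]
        then
                source "$conf"
        fi
        # fall back to a default if the file did not set MAKEOPTS
        : "${MAKEOPTS:=-j1}"
}

# Try it against a throwaway file instead of /etc/ebuild.conf
demo_conf=$(mktemp)
echo 'MAKEOPTS="-j2"' > "$demo_conf"
unset MAKEOPTS
load_conf "$demo_conf"
rm -f "$demo_conf"
echo "MAKEOPTS=${MAKEOPTS}"
```

Because the file is executed as bash code, variables it assigns become ordinary shell variables in the calling script. The same property means the file must be trusted: any command placed in it will run with the privileges of whoever invokes the script.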
 
{{fancynote|'''What is a parallel make?''' <nowiki>To speed compilation on multiprocessor systems, make supports compiling a program in parallel. This means that instead of compiling just one source file at a time, make compiles a user-specified number of source files simultaneously (so those extra processors in a multiprocessor system are used). Parallel makes are enabled by passing the -j # option to make, as follows: make -j4 MAKE="make -j4". This command instructs make to run up to four compile jobs simultaneously. The MAKE="make -j4" argument tells make to pass the -j4 option to any child make processes it launches.</nowiki>}}
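
If you are curious what make actually receives when the options come from a variable, the following sketch prints the argument list instead of running make. The function <span style="color:green">show_args()</span> is a stand-in I made up for this demonstration, and the second option, <span style="color:green">-k</span> (keep going after errors), is only there to show how several options behave.

```bash
# Stand-in for make: print each argument on its own line.
show_args() {
        printf '<%s>\n' "$@"
}

MAKEOPTS="-j4 -k"

# $MAKEOPTS is unquoted, so it splits into separate arguments;
# MAKE="make $MAKEOPTS" is quoted, so it stays a single argument.
show_args $MAKEOPTS MAKE="make $MAKEOPTS"
```

The unquoted variable becomes two arguments, while the quoted <span style="color:green">MAKE=...</span> assignment reaches make as one word. That is the reason to leave a variable like this unquoted when it may hold more than one option.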
 
Here's the final version of our ebuild program:
<source lang="bash">
#!/usr/bin/env bash

if [ $# -ne 2 ]
then
        echo "Please specify ebuild file and unpack, compile or all"
        exit 1
fi

source /etc/ebuild.conf

if [ -z "$DISTDIR" ]
then
        # set DISTDIR to /usr/src/distfiles if not already set
        DISTDIR=/usr/src/distfiles
fi
export DISTDIR

ebuild_unpack() {
        #make sure we're in the right directory
        cd "${ORIGDIR}"

        if [ -d "${WORKDIR}" ]
        then
                rm -rf "${WORKDIR}"
        fi

        mkdir "${WORKDIR}"
        cd "${WORKDIR}"
        if [ ! -e "${DISTDIR}/${A}" ]
        then
                echo "${DISTDIR}/${A} does not exist.  Please download first."
                exit 1
        fi
        tar xzf "${DISTDIR}/${A}"
        echo "Unpacked ${DISTDIR}/${A}."
        #source is now correctly unpacked
}

user_compile() {
        #we're already in ${SRCDIR}
        if [ -e configure ]
        then
                #run configure script if it exists
                ./configure --prefix=/usr
        fi
        #run make; $MAKEOPTS is deliberately unquoted so that
        #several options split into separate arguments
        make $MAKEOPTS MAKE="make $MAKEOPTS"
}

ebuild_compile() {
        if [ ! -d "${SRCDIR}" ]
        then
                echo "${SRCDIR} does not exist -- please unpack first."
                exit 1
        fi
        #make sure we're in the right directory
        cd "${SRCDIR}"
        user_compile
}

export ORIGDIR=$(pwd)
export WORKDIR=${ORIGDIR}/work

if [ -e "$1" ]
then
        source "$1"
else
        echo "Ebuild file $1 not found."
        exit 1
fi

export SRCDIR=${WORKDIR}/${P}

case "${2}" in
        unpack)
                ebuild_unpack
                ;;
        compile)
                ebuild_compile
                ;;
        all)
                ebuild_unpack
                ebuild_compile
                ;;
        *)
                echo "Please specify unpack, compile or all as the second arg"
                exit 1
                ;;
esac
</source>
Notice that '''/etc/ebuild.conf''' is sourced near the beginning of the file. Also, notice that we use <span style="color:green">$MAKEOPTS</span> in our default <span style="color:green">user_compile()</span> function. You may be wondering whether the order matters: would this still work if <span style="color:green">user_compile()</span>, which refers to <span style="color:green">$MAKEOPTS</span>, were defined before '''/etc/ebuild.conf''' is sourced? It would, because variable expansion only happens when <span style="color:green">user_compile()</span> is executed, not when it is defined. By the time <span style="color:green">user_compile()</span> is executed, '''/etc/ebuild.conf''' has already been sourced, and <span style="color:green">$MAKEOPTS</span> is set to the correct value.
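
A few lines are enough to confirm that expansion happens at call time. In this sketch the function <span style="color:green">show_make_command()</span> is hypothetical and only echoes the command that would be run, so no real build is started.

```bash
# The body mentions $MAKEOPTS, but nothing is expanded at definition time.
show_make_command() {
        echo "make $MAKEOPTS"
}

unset MAKEOPTS
first=$(show_make_command)    # MAKEOPTS is unset when this call runs

MAKEOPTS="-j2"                # assigned after the function was defined
second=$(show_make_command)   # the same function now sees the new value

echo "first:  '${first}'"
echo "second: '${second}'"
```

The first call produces a bare <span style="color:green">make</span> followed by an empty expansion, and the second includes <span style="color:green">-j2</span>, although the function itself never changed.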
 
=== Wrapping it up ===
We've covered a lot of bash programming techniques in this article, but we've only scratched the surface of the power of bash. For example, the production Gentoo Linux ebuild system not only automatically unpacks and compiles each package, but it can also:
* Automatically download the sources if they are not found in $DISTDIR
* Verify that the sources are not corrupted by using MD5 message digests
* If requested, install the compiled application into the live filesystem, recording all installed files so that the package can be easily uninstalled at a later date
* If requested, package the compiled application in a tarball (compressed the way you like it) so that it can be installed later, on another computer, or during the CD-based installation process (if you are building a distribution CD)
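
To give a feel for one of these features, here is a minimal sketch of a digest check. This is not the production ebuild code: <span style="color:green">verify_md5()</span> is a hypothetical helper, and it assumes the <span style="color:green">md5sum</span> command from GNU coreutils is installed.

```bash
# Hypothetical helper: succeed only if FILE has the EXPECTED MD5 digest.
# Usage: verify_md5 FILE EXPECTED
verify_md5() {
        local file="$1" expected="$2" actual
        # a missing or unreadable file counts as a failed check
        [ -r "$file" ] || return 1
        # md5sum prints "<digest>  <file name>"; keep only the digest
        actual=$(md5sum "$file" | cut -d' ' -f1)
        [ "$actual" = "$expected" ]
}
```

An unpack step could call <span style="color:green">verify_md5 "${DISTDIR}/${A}"</span> with a digest stored in the '''.ebuild''' file and refuse to run tar when the check fails. Where the expected digest is kept is a design decision this sketch leaves open.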
 
In addition, the production ebuild system has several other global configuration options, allowing the user to specify options such as what optimization flags to use during compilation, and whether optional support for packages like GNOME and slang should be enabled by default in those packages that support it.
 
It's clear that bash can accomplish much more than what I've touched on in this series of articles. I hope you've learned a lot about this incredible tool, and are excited about using bash to speed up and enhance your development projects.
  
 
== Resources ==
* Download the source tarball ('''sed-3.02.tar.gz''') from ftp://ftp.gnu.org/pub/gnu/sed.
* Read [[Bash by example, Part1]].
* Read [[Bash by example, Part 2]].
* Check out the [http://www.gnu.org/software/bash/manual/bash.html bash online reference manual].

__NOTOC__

Revision as of 08:51, December 28, 2014


</div>
</div>
About the Author

Daniel Robbins is best known as the creator of Gentoo Linux and author of many IBM developerWorks articles about Linux. Daniel currently serves as Benevolent Dictator for Life (BDFL) of Funtoo Linux. Funtoo Linux is a Gentoo-based distribution and continuation of Daniel's original Gentoo vision.

</div></div>