Difference between pages "Bash by Example, Part 1" and "Forking An Ebuild"

Line 1: Line 1:
{{Article
+
Often, a Funtoo developer needs to fork an upstream ebuild. This is necessary when we want to apply fixes to it. This page will explain the concepts of forking and how this works in the context of Funtoo.
|Author=Drobbins
+
|Next in Series=Bash by Example, Part 2
+
}}
+
== Fundamental programming in the Bourne again shell (bash) ==
+
  
=== Introduction ===
+
== Portage Tree Generation ==
You might wonder why you ought to learn Bash programming. Well, here are a couple of compelling reasons:
+
  
=== You're already running it ===
+
Funtoo Linux generates its Portage tree using a special script that essentially takes a Gentoo tree as its starting point, and then applies various modifications to it. The modifications involve adding packages from various overlays, including our [https://github.com/funtoo/funtoo-overlay Funtoo-overlay]. Some packages added are brand new, while other packages are our special forked versions that replace existing packages.
If you check, you'll probably find that you are running bash right now. Even if you changed your default shell, bash is probably still running somewhere on your system, because it's the standard Linux shell and is used for a variety of purposes. Because bash is already running, any additional bash scripts that you run are inherently memory-efficient because they share memory with any already-running bash processes. Why load a 500K interpreter if you already are running something that will do the job, and do it well?
+
  
=== You're already using it ===
+
In the vast majority of cases, when we fork a package, we take full responsibility for all ebuilds associated with that package, meaning that we have a full copy of the <tt>sys-foo/bar</tt> directory in one of our overlays.
Not only are you already running bash, but you're actually interacting with bash on a daily basis. It's always there, so it makes sense to learn how to use it to its fullest potential. Doing so will make your bash experience more fun and productive. But why should you learn bash programming? Easy, because you already think in terms of running commands, copying files, and piping and redirecting output. Shouldn't you learn a language that allows you to use and build upon these powerful time-saving constructs you already know how to use? Command shells unlock the potential of a UNIX system, and bash is the Linux shell. It's the high-level glue between you and the machine. Grow in your knowledge of bash, and you'll automatically increase your productivity under Linux and UNIX -- it's that simple.
+
  
=== Bash confusion ===
+
If you're interested in seeing the actual script that does all these things, take a look at the following files:
Learning bash the wrong way can be a very confusing process. Many newbies type <span style="color:green;">man bash</span> to view the bash man page, only to be confronted with a very terse and technical description of shell functionality. Others type <span style="color:green;">info bash</span> (to view the GNU info documentation), causing either the man page to be redisplayed, or (if they are lucky) only slightly more friendly info documentation to appear.
+
  
While this may be somewhat disappointing to novices, the standard bash documentation can't be all things to all people, and caters towards those already familiar with shell programming in general. There's definitely a lot of excellent technical information in the man page, but its helpfulness to beginners is limited.
+
; http://git.funtoo.org/funtoo-overlay/tree/funtoo/scripts/current-update.sh: cronned script that calls <tt>merge.py</tt>.
 +
;http://git.funtoo.org/funtoo-overlay/tree/funtoo/scripts/merge.py: Python script that does the heavy lifting of combining the Gentoo tree with various overlays, including our flora and funtoo-overlay. When we want to change which overlays we merge or which packages we exclude as a matter of policy (such as stale packages in some overlays), we make changes to this file.
 +
; http://git.funtoo.org/funtoo-overlay/tree/funtoo/scripts/merge_utils.py: python module that contains classes and methods that implement the merging functionality.
  
That's where this series comes in. In it, I'll show you how to actually use bash programming constructs, so that you will be able to write your own scripts. Instead of technical descriptions, I'll provide you with explanations in plain English, so that you will know not only what something does, but when you should actually use it. By the end of this three-part series, you'll be able to write your own intricate bash scripts, and be at the level where you can comfortably use bash and supplement your knowledge by reading (and understanding!) the standard bash documentation. Let's begin.
+
== Forking an Ebuild ==
  
=== Environment variables ===
+
In general, we fork ebuilds from Gentoo that we want to modify in some way. Before you fork an ebuild, it's important to understand that in general we fork entire packages, not just a single ebuild. This means that if you want to make some changes to <tt>sys-foo/bar</tt>, you are going to fork all <tt>sys-foo/bar</tt> ebuilds, and then Funtoo will be responsible for continuing to maintain these ebuilds until the package is unforked. Here are the steps we would use to fork <tt>sys-foo/bar</tt>:
Under bash and almost all other shells, the user can define environment variables, which are stored internally as ASCII strings. One of the handiest things about environment variables is that they are a standard part of the UNIX process model. This means that environment variables are not exclusive to shell scripts, but can be used by standard compiled programs as well. When we "export" an environment variable under bash, any subsequent program that we run can read our setting, whether it is a shell script or not. A good example is the <span style="color:green">vipw</span> command, which normally allows root to edit the system password file. By setting the <span style="color:green">EDITOR</span> environment variable to the name of your favorite text editor, you can configure vipw to use it instead of vi, a handy thing if you are used to xemacs and really dislike vi.
+
  
The standard way to define an environment variable under bash is:
+
# Find <tt>sys-foo/bar</tt> in your regular Portage tree. Make sure you have run <tt>emerge --sync</tt> recently to ensure it is up-to-date. If you want to fork from very recent changes that are not yet in our tree, you may need to grab the most recent Gentoo Portage tree to serve as your source for <tt>sys-foo/bar</tt> (this typically isn't necessary).
<pre>
+
<console>
$ myvar='This is my environment variable!'
+
# alias to recursively grab latest from Gentoo Portage tree WITHOUT history
</pre>
+
# usage: getgen gentoo-x86/dev-db/mongodb
The above command defines an environment variable called "myvar" that contains the string "This is my environment variable!". There are several things to notice above: first, there is no space on either side of the "=" sign; any space will result in an error (try it and see). The second thing to notice is that while we could have done away with the quotes if we were defining a single word, they are necessary when the value of the environment variable is more than a single word (contains spaces or tabs).
+
alias getgen="cvs -d :pserver:anonymous@anoncvs.gentoo.org:/var/cvsroot export -D$(date '+%Y-%m-%d')"
 +
</console>
 +
# Copy the <tt>sys-foo/bar</tt> directory in its entirety to <tt>funtoo-overlay/sys-foo/bar</tt>.
 +
# Make any necessary modifications to <tt>funtoo-overlay/sys-foo/bar</tt>.
 +
# Perform some funtoo-ification steps prior to commit.
 +
# Add and commit the changes to funtoo-overlay.
 +
# Push changes to funtoo-overlay.
  
{{fancynote|For extremely detailed information on how quotes should be used in bash, you may  want to look at the "QUOTING" section in the bash man page. The existence of special character sequences that get "expanded" (replaced) with other values does complicate how strings are handled in bash. We will just cover the most often-used quoting functionality in this series.}}
+
At this point, the forked <tt>sys-foo/bar</tt> package will be part of funtoo-overlay. The next time <tt>merge.py</tt> generates our unified Portage tree (the one that users have in their <tt>/usr/portage</tt> and update via <tt>emerge --sync</tt>), your forked ebuilds will be used in place of the Gentoo ebuilds. Why is this? Our <tt>merge.py</tt> script has been defined with a policy that any ebuilds in funtoo-overlay will replace the corresponding Gentoo ebuilds, if they exist. The mechanism of replacement is that our <tt>sys-foo/bar</tt> directory will be used in place of Gentoo's <tt>sys-foo/bar</tt> directory. This is how the forking process works.
  
Thirdly, while we can normally use double quotes instead of single quotes, doing so in the above example would have caused an error. Why? Because using single quotes disables a bash feature called expansion, where special characters and sequences of characters are replaced with values. For example, the "!" character is the history expansion character, which bash normally replaces with a previously-typed command. (We won't be covering history expansion in this series of articles, because it is not frequently used in bash programming. For more information on it, see the "HISTORY EXPANSION" section in the bash man page.) While this macro-like functionality can come in handy, right now we want a literal exclamation point at the end of our environment variable, rather than a macro.
+
== Funtoo-ification ==
  
Now, let's take a look at how one actually uses environment variables. Here's an example:
+
When we fork a package from Gentoo, we perform the following tweaks to the package directory before committing:
<pre>
+
$ echo $myvar
+
This is my environment variable!
+
</pre>
+
By preceding the name of our environment variable with a $, we can cause bash to replace it with the value of myvar. In bash terminology, this is called "variable expansion". But, what if we try the following:
+
<pre>
+
$ echo foo$myvarbar
+
foo
+
</pre>
+
We wanted this to echo "fooThis is my environment variable!bar", but it didn't work. What went wrong? In a nutshell, bash's variable expansion facility got confused. It couldn't tell whether we wanted to expand the variable $m, $my, $myvar, $myvarbar, etc. How can we be more explicit and clearly tell bash what variable we are referring to? Try this:
+
<pre>
+
$ echo foo${myvar}bar
+
fooThis is my environment variable!bar
+
</pre>
+
As you can see, we can enclose the environment variable name in curly braces when it is not clearly separated from the surrounding text. While $myvar is faster to type and will work most of the time, ${myvar} can be parsed correctly in almost any situation. Other than that, they both do the same thing, and you will see both forms of variable expansion in the rest of this series. You'll want to remember to use the more explicit curly-brace form when your environment variable is not isolated from the surrounding text by whitespace (spaces or tabs).
+
  
Recall that we also mentioned that we can "export" variables. When we export an environment variable, it's automatically available in the environment of any subsequently-run script or executable. Shell scripts can "get to" the environment variable using that shell's built-in environment-variable support, while C programs can use the getenv() function call. Here's some example C code that you should type in and compile -- it'll allow us to understand environment variables from the perspective of C:
+
# Removal of <tt>ChangeLog</tt>.
<syntaxhighlight lang="c">
+
# Run <tt>ebuild foo-1.0.ebuild digest</tt> before committing. This will cause the <tt>Manifest</tt> file to be regenerated. Gentoo has a lot more entries in this file than we do, since we use mini-Manifests that only include DIST listings (for distfiles only). We want to commit our mini-Manifest (still called <tt>Manifest</tt>, just with fewer entries in it) rather than the one that came from Gentoo.
#include <stdio.h>
+
# Edit the top of each ebuild, and remove all <tt>Copyright</tt> and <tt>$Header:</tt> lines at the top of the file. We have a LICENSE.txt and COPYRIGHT.txt file in the root of our Portage tree, which is easier to maintain than keeping all the years up-to-date in each ebuild. Also, the <tt>$Header:</tt> line is there for the CVS version control system in Gentoo, which Funtoo does not use. ''The only comment that should remain on the top of the ebuild is the one stating that it is distributed under the GPLv2.''
#include <stdlib.h>
+
  
int main(void) {
+
<console>
  char *myenvvar=getenv("EDITOR");
+
# If you find yourself doing this often, place this function in your .bashrc, .zshrc, etc
  printf("The editor environment variable is set to %s\n",myenvvar);
+
funtooize() {
 +
    if [ -z "$1" ]; then
 +
        search_path='.'
 +
    else
 +
        search_path=$1
 +
    fi
 +
 
 +
    find "$search_path" -type f -exec sed -i -e '/^# Copyright\|^# \$Header/d' {} +
 +
    find "$search_path" -type f -name "ChangeLog*" -delete
 +
    find "$search_path" -type f -name '*.ebuild' -exec ebuild {} manifest \;
 
}
 
}
</syntaxhighlight>
+
</console>
Save the above source into a file called '''myenv.c''', and then compile it by issuing the command:
+
<pre>
+
$ gcc myenv.c -o myenv
+
</pre>
+
Now, there will be an executable program in your directory that, when run, will print the value of the <span style="color:green">EDITOR</span> environment variable, if any. This is what happens when I run it on my machine:
+
<pre>
+
$ ./myenv
+
The editor environment variable is set to (null)
+
</pre>
+
Hmmm... because the <span style="color:green">EDITOR</span> environment variable was not set to anything, getenv() returns a null pointer, which printf on this system displays as (null). Let's try setting it to a specific value:
+
<pre>
+
$ EDITOR=xemacs
+
$ ./myenv
+
The editor environment variable is set to (null)
+
</pre>
+
While you might have expected myenv to print the value "xemacs", it didn't quite work, because we didn't export the EDITOR environment variable. This time, we'll get it working:
+
<pre>
+
$ export EDITOR
+
$ ./myenv
+
The editor environment variable is set to xemacs
+
</pre>
+
So, you have seen with your very own eyes that another process (in this case our example C program) cannot see the environment variable until it is exported. Incidentally, if you want, you can define and export an environment variable using one line, as follows:
+
<pre>
+
$ export EDITOR=xemacs
+
</pre>
+
It works identically to the two-line version. This would be a good time to show how to erase an environment variable by using <span style="color:green">unset</span>:
+
<pre>
+
$ unset EDITOR
+
$ ./myenv
+
The editor environment variable is set to (null)
+
</pre>
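
If you don't have a C compiler handy, the same export behaviour can be sketched in the shell alone. This is only an illustration: the variable name MYDEMO and the helper show_child_view are made up for this example, and a child <span style="color:green">sh</span> process stands in for the C program.

```bash
#!/bin/bash
# Hypothetical helper: a child shell prints the value of MYDEMO that it
# inherited from its parent, or a placeholder if it inherited nothing.
show_child_view() {
    sh -c 'echo "${MYDEMO:-(not set)}"'
}

unset MYDEMO
MYDEMO=xemacs      # defined, but not exported
show_child_view    # prints (not set)

export MYDEMO      # now part of the environment passed to children
show_child_view    # prints xemacs

unset MYDEMO       # erased again
show_child_view    # prints (not set)
```

The child process only ever sees what was exported at the moment it was started.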
+
 
+
=== Chopping strings overview ===
+
Chopping strings -- that is, splitting an original string into smaller, separate chunk(s) -- is one of those tasks that is performed daily by your average shell script. Many times, shell scripts need to take a fully-qualified path, and find the terminating file or directory. While it's possible (and fun!) to code this in bash, the standard <span style="color:green">basename</span> UNIX executable performs this extremely well:
+
<pre>
+
$ basename /usr/local/share/doc/foo/foo.txt
+
foo.txt
+
$ basename /usr/home/drobbins
+
drobbins
+
</pre>
+
<span style="color:green">basename</span> is quite a handy tool for chopping up strings. Its companion, called <span style="color:green">dirname</span>, returns the "other" part of the path that <span style="color:green">basename</span> throws away:
+
<pre>
+
$ dirname /usr/local/share/doc/foo/foo.txt
+
/usr/local/share/doc/foo
+
$ dirname /usr/home/drobbins/
+
/usr/home
+
</pre>
+
{{fancynote|Both dirname and basename do not look at any files or directories on disk; they are purely string manipulation commands.}}
+
 
+
=== Command substitution ===
+
One very handy thing to know is how to create an environment variable that contains the result of an executable command. This is very easy to do:
+
<pre>
+
$ MYDIR=$(dirname /usr/local/share/doc/foo/foo.txt)
+
$ echo $MYDIR
+
/usr/local/share/doc/foo
+
</pre>
+
What we did above is called ''command substitution''. Several things are worth noticing in this example. On the first line, we simply enclosed the command we wanted to execute with ''$( )''.
+
 
+
Note that it is also possible to do the same thing using backquotes, the keyboard key that normally sits above the Tab key:
+
<pre>
+
$ MYDIR=`dirname /usr/local/share/doc/foo/foo.txt`
+
$ echo $MYDIR
+
/usr/local/share/doc/foo
+
</pre>
+
As you can see, bash provides multiple ways to perform exactly the same thing. Using command substitution, we can place any command or pipeline of commands in between ''` `'' or ''$( )'' and assign it to an environment variable. Handy stuff! Here's an example of how to use a pipeline with command substitution:
+
 
+
<pre>
+
$ MYFILES=$(ls /etc | grep pa)
+
$ echo $MYFILES
+
pam.d passwd
+
</pre>
+
 
+
It's also worth pointing out that ''$( )'' is generally preferred over ''` `'' in shell scripts because it is part of the POSIX shell standard, is easier to read, and is less complicated to use in a nested form, as follows:
+
<pre>
+
$ MYFILES=$(ls $(dirname foo/bar/oni))
+
</pre>
+
 
+
=== Chopping strings like a pro ===
+
While <span style="color:green">basename</span> and <span style="color:green">dirname</span> are great tools, there are times where we may need to perform more advanced string "chopping" operations than just standard pathname manipulations. When we need more punch, we can take advantage of bash's advanced built-in variable expansion functionality. We've already used the standard kind of variable expansion, which looks like this: ${MYVAR}. But bash can also perform some handy string chopping on its own. Take a look at these examples:
+
<pre>
+
$ MYVAR=foodforthought.jpg
+
$ echo ${MYVAR##*fo}
+
rthought.jpg
+
$ echo ${MYVAR#*fo}
+
odforthought.jpg
+
</pre>
+
In the first example, we typed ${MYVAR##*fo}. What exactly does this mean? Basically, inside the ''${ }'', we typed the name of the environment variable, two ##s, and a wildcard ("*fo"). Then, bash took <span style="color:green">MYVAR</span>, found the longest substring from the beginning of the string "foodforthought.jpg" that matched the wildcard "*fo", and chopped it off the beginning of the string. That's a bit hard to grasp at first, so to get a feel for how this special "##" option works, let's step through how bash completed this expansion. First, it began searching for substrings at the beginning of "foodforthought.jpg" that matched the "*fo" wildcard. Here are the substrings that it checked:
+
<pre>
+
f     
+
fo              MATCHES *fo
+
foo   
+
food
+
foodf         
+
foodfo          MATCHES *fo
+
foodfor
+
foodfort       
+
foodforth
+
foodfortho     
+
foodforthou
+
foodforthoug
+
foodforthought
+
foodforthought.j
+
foodforthought.jp
+
foodforthought.jpg
+
</pre>
+
After searching the string for matches, you can see that bash found two. It selects the longest match, removes it from the beginning of the original string, and returns the result.
+
 
+
The second form of variable expansion shown above appears identical to the first, except it uses only one "#" -- and bash performs an almost identical process. It checks the same set of substrings as our first example did, except that bash removes the shortest match from our original string, and returns the result. So, as soon as it checks the "fo" substring, it removes "fo" from our string and returns "odforthought.jpg".
+
 
+
This may seem extremely cryptic, so I'll show you an easy way to remember this functionality. When searching for the longest match, use ## (because ## is longer than #). When searching for the shortest match, use #. See, not that hard to remember at all! Wait, how do you remember that we are supposed to use the '#' character to remove from the *beginning* of a string? Simple! You will notice that on a US keyboard, shift-4 is "$", which is the bash variable expansion character. On the keyboard, immediately to the left of "$" is "#". So, you can see that "#" is "at the beginning" of "$", and thus (according to our mnemonic), "#" removes characters from the beginning of the string. You may wonder how we remove characters from the end of the string. If you guessed that we use the character immediately to the right of "$" on the US keyboard ("%"), you're right! Here are some quick examples of how to chop off trailing portions of strings:
+
<pre>
+
$ MYFOO="chickensoup.tar.gz"
+
$ echo ${MYFOO%%.*}
+
chickensoup
+
$ echo ${MYFOO%.*}
+
chickensoup.tar
+
</pre>
+
As you can see, the % and %% variable expansion options work identically to # and ##, except they remove the matching wildcard from the end of the string. Note that you don't have to use the "*" character if you wish to remove a specific substring from the end:
+
<pre>
+
$ MYFOOD="chickensoup"
+
$ echo ${MYFOOD%%soup}
+
chicken
+
</pre>
+
In this example, it doesn't matter whether we use "%%" or "%", since only one match is possible. And remember, if you forget whether to use "#" or "%", look at the 3, 4, and 5 keys on your keyboard and figure it out.
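
As a small sketch of how "#" and "%" fit together, here are pure-bash stand-ins for the basename and dirname commands we saw earlier. The helper names my_basename and my_dirname are made up for this example, and the sketch assumes a path that contains a "/" and has no trailing slash; the real commands handle more edge cases.

```bash
#!/bin/bash
# Hypothetical helpers; simple paths only (contains "/", no trailing slash).
my_basename() { echo "${1##*/}"; }   # remove longest match of "*/" from the front
my_dirname()  { echo "${1%/*}"; }    # remove shortest match of "/*" from the end

my_basename /usr/local/share/doc/foo/foo.txt   # prints foo.txt
my_dirname  /usr/local/share/doc/foo/foo.txt   # prints /usr/local/share/doc/foo
```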
+
 
+
We can use another form of variable expansion to select a specific substring, based on a specific character offset and length. Try typing in the following lines under bash:
+
<pre>
+
$ EXCLAIM=cowabunga
+
$ echo ${EXCLAIM:0:3}
+
cow
+
$ echo ${EXCLAIM:3:7}
+
abunga
+
</pre>
+
This form of string chopping can come in quite handy; simply specify the character to start from and the length of the substring, all separated by colons.
+
 
+
=== Applying string chopping ===
+
Now that we've learned all about chopping strings, let's write a simple little shell script. Our script will accept a single file as an argument, and will print out whether it appears to be a tarball. To determine if it is a tarball, it will look for the pattern ".tar" at the end of the file name. Here it is:
+
<syntaxhighlight lang="bash">
+
#!/bin/bash
+
 
+
if [ "${1##*.}" = "tar" ]
+
then
+
      echo This appears to be a tarball.
+
else
+
      echo At first glance, this does not appear to be a tarball.
+
fi
+
</syntaxhighlight>
+
To run this script, enter it into a file called '''mytar.sh''', and type <span style="color:green">chmod 755 mytar.sh</span> to make it executable. Then, give it a try on a tarball, as follows:
+
<pre>
+
$ ./mytar.sh thisfile.tar
+
This appears to be a tarball.
+
$ ./mytar.sh thatfile.gz
+
At first glance, this does not appear to be a tarball.
+
</pre>
+
OK, it works, but it's not very functional. Before we make it more useful, let's take a look at the "if" statement used above. In it, we have a boolean expression enclosed in square brackets, in which the "=" comparison operator checks for string equality. But what does the boolean expression actually test for? Let's take a look at the left side. According to what we've learned about string chopping, "${1##*.}" will remove the longest match of "*." from the beginning of the string contained in the environment variable "1", returning the result. This will cause everything after the last "." in the file name to be returned. Obviously, if the file name ends in ".tar", we will get "tar" as a result, and the condition will be true.
+
 
+
You may be wondering what the "1" environment variable is in the first place. Very simple -- $1 is the first command-line argument to the script, $2 is the second, etc. OK, now that we've reviewed the function, we can take our first look at "if" statements.
+
 
+
=== If statements ===
+
Like most languages, bash has its own form of conditional. When using them, stick to the format above; that is, keep the "if" and the "then" on separate lines, and keep the "else" and the terminating and required "fi" in horizontal alignment with them. This makes the code easier to read and debug. In addition to the "if,else" form, there are several other forms of "if" statements:
+
<syntaxhighlight lang="bash">
+
if      [ condition ]
+
then
+
        action
+
fi
+
</syntaxhighlight>
+
This one performs an action only if condition is true, otherwise it performs no action and continues executing any lines following the "fi".
+
<syntaxhighlight lang="bash">
+
if [ condition ]
+
then
+
        action
+
elif [ condition2 ]
+
then
+
        action2
+
.
+
.
+
.
+
elif [ condition3 ]
+
then
+
        action3
+
else
+
        actionx
+
fi
+
</syntaxhighlight>
+
The above "elif" form will consecutively test each condition and execute the action corresponding to the first true condition. If none of the conditions are true, it will execute the "else" action, if one is present, and then continue executing lines following the entire "if,elif,else" statement.
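
To make the "elif" form concrete, here is a minimal sketch that extends the idea from mytar.sh. The function name classify and the messages it prints are made up for this example; it only looks at the file name, never at the file itself.

```bash
#!/bin/bash
# Hypothetical helper: guess a file type from the extension of its name.
classify() {
    if [ "${1##*.}" = "tar" ]
    then
        echo "tarball"
    elif [ "${1##*.}" = "gz" ]
    then
        echo "gzip-compressed file"
    else
        echo "unknown"
    fi
}

classify thisfile.tar    # prints tarball
classify thatfile.gz     # prints gzip-compressed file
classify notes.txt       # prints unknown
```

Only the action for the first true condition runs; the "else" branch catches everything that matched none of them.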
+
  
=== Next time ===
+
Here are a few additional changes that you are allowed to make to any forked ebuilds:
Now that we've covered the most basic bash functionality, it's time to pick up the pace and get ready to write some real scripts. In the next article, I'll cover looping constructs, functions, namespace, and other essential topics. Then, we'll be ready to write some more complicated scripts. In the third article, we'll focus almost exclusively on very complex scripts and functions, as well as several bash script design options. See you then!
+
  
== Resources ==
+
# Line length greater than 80 characters. Gentoo enforces an 80-character line length limit. We don't.
* Read [[Bash by Example, Part 2]]
+
# <tt>KEYWORDS</tt> of <tt>*</tt> and <tt>~*</tt>. Gentoo does not allow these shortcuts. We do. They allow you to say "all arches" and "all unstable arches" in a concise way. Gentoo doesn't allow these shortcuts because it's Gentoo's policy to have each arch team manually approve each package. We do not have this policy so we can use the shortcuts.
* Read [[Bash by Example, Part 3]]
+
# Use of <tt>4-python</tt> EAPI. We allow the use of this EAPI for enhanced python functionality.
* Visit [http://www.gnu.org/software/bash/bash.html GNU's bash home page]
+
  
__NOTOC__
+
[[Category:Development]]
[[Category:Linux Core Concepts]]
+
[[Category:Articles]]
+
{{ArticleFooter}}
+

Latest revision as of 05:25, May 30, 2015

Often, a Funtoo developer needs to fork an upstream ebuild. This is necessary when we want to apply fixes to it. This page will explain the concepts of forking and how this works in the context of Funtoo.

Portage Tree Generation

Funtoo Linux generates its Portage tree using a special script that essentially takes a Gentoo tree as its starting point, and then applies various modifications to it. The modifications involve adding packages from various overlays, including our Funtoo-overlay. Some packages added are brand new, while other packages are our special forked versions that replace existing packages.

In the vast majority of cases, when we fork a package, we take full responsibility for all ebuilds associated with that package, meaning that we have a full copy of the sys-foo/bar directory in one of our overlays.

If you're interested in seeing the actual script that does all these things, take a look at the following files:

http://git.funtoo.org/funtoo-overlay/tree/funtoo/scripts/current-update.sh
cronned script that calls merge.py.
http://git.funtoo.org/funtoo-overlay/tree/funtoo/scripts/merge.py
Python script that does the heavy lifting of combining the Gentoo tree with various overlays, including our flora and funtoo-overlay. When we want to change which overlays we merge or which packages we exclude as a matter of policy (such as stale packages in some overlays), we make changes to this file.
http://git.funtoo.org/funtoo-overlay/tree/funtoo/scripts/merge_utils.py
python module that contains classes and methods that implement the merging functionality.

Forking an Ebuild

In general, we fork ebuilds from Gentoo that we want to modify in some way. Before you fork an ebuild, it's important to understand that in general we fork entire packages, not just a single ebuild. This means that if you want to make some changes to sys-foo/bar, you are going to fork all sys-foo/bar ebuilds, and then Funtoo will be responsible for continuing to maintain these ebuilds until the package is unforked. Here are the steps we would use to fork sys-foo/bar:

  1. Find sys-foo/bar in your regular Portage tree. Make sure you have run emerge --sync recently to ensure it is up-to-date. If you want to fork from very recent changes that are not yet in our tree, you may need to grab the most recent Gentoo Portage tree to serve as your source for sys-foo/bar (this typically isn't necessary).
# alias to recursively grab latest from Gentoo Portage tree WITHOUT history
# usage: getgen gentoo-x86/dev-db/mongodb
alias getgen="cvs -d :pserver:anonymous@anoncvs.gentoo.org:/var/cvsroot export -D$(date '+%Y-%m-%d')"
  2. Copy the sys-foo/bar directory in its entirety to funtoo-overlay/sys-foo/bar.
  3. Make any necessary modifications to funtoo-overlay/sys-foo/bar.
  4. Perform some funtoo-ification steps prior to commit.
  5. Add and commit the changes to funtoo-overlay.
  6. Push changes to funtoo-overlay.
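
The copy step in the list above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name fork_package and its three arguments (Portage tree, overlay checkout, category/package) are hypothetical, and the paths in the usage note are examples, not fixed locations.

```bash
#!/bin/bash
# Hypothetical helper: copy an ENTIRE package directory from a Portage tree
# into an overlay checkout, so the whole package is forked, not one ebuild.
fork_package() {
    local tree=$1 overlay=$2 catpkg=$3
    if [ ! -d "$tree/$catpkg" ]; then
        echo "no such package: $tree/$catpkg" >&2
        return 1
    fi
    mkdir -p "$overlay/${catpkg%/*}"          # category dir, e.g. sys-foo
    rm -rf "${overlay:?}/$catpkg"             # drop any stale copy first
    cp -a "$tree/$catpkg" "$overlay/$catpkg"  # full copy, including files/
}
```

A call might look like fork_package /usr/portage ~/funtoo-overlay sys-foo/bar, followed by your modifications, the funtoo-ification steps, and the usual git add, git commit and git push in the overlay checkout.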

At this point, the forked sys-foo/bar package will be part of funtoo-overlay. The next time merge.py generates our unified Portage tree (the one that users have in their /usr/portage and update via emerge --sync), your forked ebuilds will be used in place of the Gentoo ebuilds. Why is this? Our merge.py script has been defined with a policy that any ebuilds in funtoo-overlay will replace the corresponding Gentoo ebuilds, if they exist. The mechanism of replacement is that our sys-foo/bar directory will be used in place of Gentoo's sys-foo/bar directory. This is how the forking process works.
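
The replacement mechanism described above (a whole package directory from the overlay takes the place of Gentoo's) can be illustrated with a short shell sketch. This is not the real merge.py logic, which is written in Python; the function name merge_overlay and its arguments are made up for this example, and it assumes the usual category/package/*.ebuild layout.

```bash
#!/bin/bash
# Illustration only: this is NOT the real merge.py logic.
# Every package directory found in the overlay replaces the directory of
# the same name in the destination tree; other packages are left alone.
merge_overlay() {
    local overlay=$1 dest=$2 ebuild pkgdir catpkg last=
    for ebuild in "$overlay"/*/*/*.ebuild; do
        [ -e "$ebuild" ] || continue          # glob matched nothing
        pkgdir=${ebuild%/*}                   # e.g. <overlay>/sys-foo/bar
        [ "$pkgdir" = "$last" ] && continue   # package already handled
        last=$pkgdir
        catpkg=${pkgdir#"$overlay"/}          # e.g. sys-foo/bar
        rm -rf "${dest:?}/$catpkg"            # drop the existing copy, if any
        mkdir -p "$dest/${catpkg%/*}"
        cp -a "$pkgdir" "$dest/$catpkg"       # overlay's directory wins
    done
}
```

Because the whole directory is swapped, any Gentoo ebuild for sys-foo/bar that we did not carry over into our fork disappears from the generated tree, which is why we fork entire packages.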

Funtoo-ification

When we fork a package from Gentoo, we perform the following tweaks to the package directory before committing:

  1. Removal of ChangeLog.
  2. Run ebuild foo-1.0.ebuild digest before committing. This will cause the Manifest file to be regenerated. Gentoo has a lot more entries in this file than we do, since we use mini-Manifests that only include DIST listings (for distfiles only). We want to commit our mini-Manifest (still called Manifest, just with fewer entries in it) rather than the one that came from Gentoo.
  3. Edit the top of each ebuild, and remove all Copyright and $Header: lines at the top of the file. We have a LICENSE.txt and COPYRIGHT.txt file in the root of our Portage tree, which is easier to maintain than keeping all the years up-to-date in each ebuild. Also, the $Header: line is there for the CVS version control system in Gentoo, which Funtoo does not use. The only comment that should remain on the top of the ebuild is the one stating that it is distributed under the GPLv2.
# If you find yourself doing this often, place this function in your .bashrc, .zshrc, etc
funtooize() {
    if [ -z "$1" ]; then
        search_path='.'
    else
        search_path=$1
    fi

    find "$search_path" -type f -exec sed -i -e '/^# Copyright\|^# \$Header/d' {} +
    find "$search_path" -type f -name "ChangeLog*" -delete
    find "$search_path" -type f -name '*.ebuild' -exec ebuild {} manifest \;
}
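
To see what the header-stripping part of funtooize does without touching any real files, here is a small sketch that applies an equivalent filter to a sample header on standard input. The helper name strip_header and the sample ebuild header are made up for this example; the two separate sed expressions are assumed to be equivalent to the single alternation used above, written that way only so it does not rely on GNU sed's \| extension.

```bash
#!/bin/bash
# Hypothetical helper: delete Copyright and $Header lines from stdin.
strip_header() {
    sed -e '/^# Copyright/d' -e '/^# \$Header/d'
}

# Made-up sample of the top of an ebuild as it might arrive from Gentoo.
sample='# Copyright 1999-2014 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/sys-foo/bar/bar-1.0.ebuild,v 1.1 $

EAPI=5'

# Only the GPLv2 comment, the blank line and the EAPI line remain.
printf '%s\n' "$sample" | strip_header
```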

Here are a few additional changes that you are allowed to make to any forked ebuilds:

  1. Line length greater than 80 characters. Gentoo enforces an 80-character line length limit. We don't.
  2. KEYWORDS of * and ~*. Gentoo does not allow these shortcuts. We do. They allow you to say "all arches" and "all unstable arches" in a concise way. Gentoo doesn't allow these shortcuts because it's Gentoo's policy to have each arch team manually approve each package. We do not have this policy so we can use the shortcuts.
  3. Use of 4-python EAPI. We allow the use of this EAPI for enhanced python functionality.
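
As a brief illustration of the KEYWORDS shortcuts, a forked ebuild might contain a line like the following. This is a hypothetical fragment, not taken from any real ebuild in our tree.

```bash
# Hypothetical fragment of a forked ebuild; only the KEYWORDS line matters.
KEYWORDS="*"       # all arches
# KEYWORDS="~*"    # alternative: all unstable arches
```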