Difference between pages "Talk:User and Group Management" and "Bash by Example, Part 1"

From Funtoo
{{fancynote|the Discussion page contains a much more ambitious proposal that I decided was too complex to tackle all at once. It is being split into bite-sized portions (Phases) on the main page.}}
== Fundamental programming in the Bourne again shell (bash) ==


== User and Group Dependencies ==
=== Introduction ===
You might wonder why you ought to learn Bash programming. Well, here are a couple of compelling reasons:


[http://www.exherbo.org/docs/exheres-for-smarties.html#repository_metadata Exheres] defines a dependency-based mechanism for ebuilds to specify their user and group dependencies, which is an appropriate mechanism for specifying dependencies. The specific syntax used is <tt>user/foo</tt> to specify a dependency on user <tt>foo</tt>, and <tt>group/bar</tt> to specify a dependency on group <tt>bar</tt> existing. Dependencies can be build-time or run-time, as required.
=== You're already running it ===
If you check, you'll probably find that you are running bash right now. Even if you changed your default shell, bash is probably still running somewhere on your system, because it's the standard Linux shell and is used for a variety of purposes. Because bash is already running, any additional bash scripts that you run are inherently memory-efficient because they share memory with any already-running bash processes. Why load a 500K interpreter if you already are running something that will do the job, and do it well?


Using dependencies for this purpose allows Portage to create these users and groups at exactly the right time -- prior to build or prior to install, as necessary, and will work just fine with binary packages, with some potential caveats (noted later in the document.) It also allows user and group creation to be affected by <tt>USE</tt> variable settings.
=== You're already using it ===
Not only are you already running bash, but you're actually interacting with bash on a daily basis. It's always there, so it makes sense to learn how to use it to its fullest potential. Doing so will make your bash experience more fun and productive. But why should you learn bash programming? Easy: you already think in terms of running commands, copying files with cp, and piping and redirecting output. Shouldn't you learn a language that allows you to use and build upon these powerful time-saving constructs you already know how to use? Command shells unlock the potential of a UNIX system, and bash is the Linux shell. It's the high-level glue between you and the machine. Grow in your knowledge of bash, and you'll automatically increase your productivity under Linux and UNIX -- it's that simple.


This document suggests following the Exheres syntax:
=== Bash confusion ===
Learning bash the wrong way can be a very confusing process. Many newbies type <span style="color:green;">man bash</span> to view the bash man page, only to be confronted with a very terse and technical description of shell functionality. Others type <span style="color:green;">info bash</span> (to view the GNU info documentation), causing either the man page to be redisplayed, or (if they are lucky) only slightly more friendly info documentation to appear.


<pre>
While this may be somewhat disappointing to novices, the standard bash documentation can't be all things to all people, and caters to those already familiar with shell programming in general. There's definitely a lot of excellent technical information in the man page, but its helpfulness to beginners is limited.
DEPEND="user/lighttpd group/web-server"
RDEPEND="user/lighttpd group/web-server"
</pre>


All this tells Portage is that "This ebuild needs a <tt>lighttpd</tt> user and <tt>web-server</tt> group." But it does not tell Portage what UID it should be, nor does it provide other necessary settings for the user. This data is defined within the Portage tree, and the mechanism for defining this data is described below.
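To make the <tt>user/foo</tt> and <tt>group/bar</tt> syntax concrete, here is a minimal shell sketch that splits such a dependency string into the user and group names it requests. It is only an illustration and not how Portage parses dependencies; the variable names <tt>needed_users</tt> and <tt>needed_groups</tt>, and the extra <tt>dev-libs/openssl</tt> entry, are assumptions made up for this example.

```shell
#!/bin/bash
# Illustrative only: a dependency string using the proposed syntax.
# The third entry is a hypothetical ordinary package dependency.
DEPEND="user/lighttpd group/web-server dev-libs/openssl"

needed_users=""
needed_groups=""

# Deliberately unquoted so the string is split on whitespace.
for dep in $DEPEND
do
    case "$dep" in
        # ${dep#user/} strips the leading "user/" prefix.
        user/*)  needed_users="${needed_users:+$needed_users }${dep#user/}" ;;
        group/*) needed_groups="${needed_groups:+$needed_groups }${dep#group/}" ;;
        # Anything else is not an account dependency and is ignored here.
    esac
done

echo "users:  $needed_users"
echo "groups: $needed_groups"
```

The names alone carry no UID, shell or home directory, which is why the remaining settings have to come from somewhere else.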
That's where this series comes in. In it, I'll show you how to actually use bash programming constructs, so that you will be able to write your own scripts. Instead of technical descriptions, I'll provide you with explanations in plain English, so that you will know not only what something does, but when you should actually use it. By the end of this three-part series, you'll be able to write your own intricate bash scripts, and be at the level where you can comfortably use bash and supplement your knowledge by reading (and understanding!) the standard bash documentation. Let's begin.


== Profile Settings ==
=== Environment variables ===
Under bash and almost all other shells, the user can define environment variables, which are stored internally as ASCII strings. One of the handiest things about environment variables is that they are a standard part of the UNIX process model. This means that environment variables are not exclusive to shell scripts, but can be used by standard compiled programs as well. When we "export" an environment variable under bash, any subsequent program that we run can read our setting, whether it is a shell script or not. A good example is the <span style="color:green">vipw</span> command, which normally allows root to edit the system password file. By setting the <span style="color:green">EDITOR</span> environment variable to the name of your favorite text editor, you can configure vipw to use it instead of vi, a handy thing if you are used to xemacs and really dislike vi.


The user or group dependency will just tell Portage that this particular package requires a particular user or group, but any detailed information related to this user or group, such as suggested UID/GID, shell, etc, is stored in the Portage tree itself, and specifically in the Portage ''profile''. The mechanism for defining this information is described below:
The standard way to define an environment variable under bash is:
<pre>
$ myvar='This is my environment variable!'
</pre>
The above command defined an environment variable called "myvar" that contains the string "This is my environment variable!". There are several things to notice above: first, there is no space on either side of the "=" sign; any space will result in an error (try it and see). The second thing to notice is that while we could have done away with the quotes if we were defining a single word, they are necessary when the value of the environment variable is more than a single word (contains spaces or tabs).


=== Core Portage Trees ===
{{fancynote|For extremely detailed information on how quotes should be used in bash, you may  want to look at the "QUOTING" section in the bash man page. The existence of special character sequences that get "expanded" (replaced) with other values does complicate how strings are handled in bash. We will just cover the most often-used quoting functionality in this series.}}


For "core" Portage trees (not overlays,) specific user and group settings are defined using Portage's ''cascading profile'' functionality. Portage would be enhanced to recognize <tt>accounts/users</tt> and <tt>accounts/groups</tt> directories inside profile directories. Users and groups would be defined in these directories, with one user or group per file, and the filename specifying the name of the user or group. Cascading functionality would be enabled so that the full set of user and group data could be a collection of all users and groups defined in parent profiles. This would provide a handy mechanism to share user and group definitions across different operating systems, while allowing for local variations when needed. It makes sense to leverage cascading profiles as much as possible.
Thirdly, while we can normally use double quotes instead of single quotes, doing so in the above example would have caused an error. Why? Because using single quotes disables a bash feature called expansion, where special characters and sequences of characters are replaced with values. For example, the "!" character is the history expansion character, which bash normally replaces with a previously-typed command. (We won't be covering history expansion in this series of articles, because it is not frequently used in bash programming. For more information on it, see the "HISTORY EXPANSION" section in the bash man page.) While this macro-like functionality can come in handy, right now we want a literal exclamation point at the end of our environment variable, rather than a macro.
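Here is a small sketch of the difference between the two kinds of quotes, using variable expansion rather than history expansion. History expansion is normally enabled only in interactive shells, so it would not show up if you ran this from a script file. The variable names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical variable names, used only for illustration.
name='world'

# Single quotes: everything is literal, so $name is NOT expanded.
single='Hello, $name'

# Double quotes: $name is replaced with its value.
double="Hello, $name"

echo "$single"
echo "$double"
```

The first echo prints the text exactly as typed, dollar sign included, while the second prints the greeting with the value filled in.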


=== Overlays ===
Now, let's take a look at how one actually uses environment variables. Here's an example:
<pre>
$ echo $myvar
This is my environment variable!
</pre>
By preceding the name of our environment variable with a $, we can cause bash to replace it with the value of myvar. In bash terminology, this is called "variable expansion". But, what if we try the following:
<pre>
$ echo foo$myvarbar
foo
</pre>
We wanted this to echo "fooThis is my environment variable!bar", but it didn't work. What went wrong? In a nutshell, bash's variable expansion facility got confused. It couldn't tell where the variable name ended, so it read the longest valid name, $myvarbar, which isn't set and therefore expands to nothing. How can we be more explicit and clearly tell bash what variable we are referring to? Try this:
<pre>
$ echo foo${myvar}bar
fooThis is my environment variable!bar
</pre>
As you can see, we can enclose the environment variable name in curly braces when it is not clearly separated from the surrounding text. While $myvar is faster to type and will work most of the time, ${myvar} can be parsed correctly in almost any situation. Other than that, they both do the same thing, and you will see both forms of variable expansion in the rest of this series. You'll want to remember to use the more explicit curly-brace form when your environment variable is not isolated from the surrounding text by whitespace (spaces or tabs).
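If you'd like to see both behaviors side by side, the following sketch stores each result in a variable so they are easy to compare. The names <tt>without_braces</tt> and <tt>with_braces</tt> are just labels chosen for this example.

```shell
#!/bin/bash
myvar='This is my environment variable!'
unset myvarbar   # make sure no variable with the longer name exists

# Without braces, bash looks up a variable named "myvarbar".
without_braces="foo$myvarbar"

# With braces, the name ends at the closing brace.
with_braces="foo${myvar}bar"

echo "$without_braces"
echo "$with_braces"
```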


The approach described above does not work for overlays -- how are they to extend user and group settings automatically, as required by the ebuilds contained in the overlay?
Recall that we also mentioned that we can "export" variables. When we export an environment variable, it's automatically available in the environment of any subsequently-run script or executable. Shell scripts can "get to" the environment variable using that shell's built-in environment-variable support, while C programs can use the getenv() function call. Here's some example C code that you should type in and compile -- it'll allow us to understand environment variables from the perspective of C:
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>


The proposed solution is to allow overlays to add users and groups via the <tt>OVERLAY_DIR/profiles/accounts/groups</tt> and <tt>OVERLAY_DIR/profiles/accounts/users</tt> directories. These directories will ''always'' be searched for user and group data for all active overlays, and merged into the set defined by the profiles. This provides an automatic mechanism for overlays to inject user and group data that they require, without requiring any manual configuration on behalf of the Gentoo/Funtoo Linux user.
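int main(void) {
  /* getenv() returns NULL when the variable is not set */
  char *myenvvar=getenv("EDITOR");
  /* passing NULL for %s is not portable; glibc happens to print "(null)" */
  printf("The editor environment variable is set to %s\n",myenvvar);
  return 0;
}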
</syntaxhighlight>
Save the above source into a file called '''myenv.c''', and then compile it by issuing the command:
<pre>
$ gcc myenv.c -o myenv
</pre>
Now, there will be an executable program in your directory that, when run, will print the value of the <span style="color:green">EDITOR</span> environment variable, if any. This is what happens when I run it on my machine:
<pre>
$ ./myenv
The editor environment variable is set to (null)
</pre>
Hmmm... because the <span style="color:green">EDITOR</span> environment variable was not set to anything, getenv() returned a null pointer, which shows up here as "(null)". Let's try setting it to a specific value:
<pre>
$ EDITOR=xemacs
$ ./myenv
The editor environment variable is set to (null)
</pre>
While you might have expected myenv to print the value "xemacs", it didn't quite work, because we didn't export the EDITOR environment variable. This time, we'll get it working:
<pre>
$ export EDITOR
$ ./myenv
The editor environment variable is set to xemacs
</pre>
So, you have seen with your very own eyes that another process (in this case our example C program) cannot see the environment variable until it is exported. Incidentally, if you want, you can define and export an environment variable using one line, as follows:
<pre>
$ export EDITOR=xemacs
</pre>
It works identically to the two-line version. This would be a good time to show how to erase an environment variable by using <span style="color:green">unset</span>:
<pre>
$ unset EDITOR
$ ./myenv
The editor environment variable is set to (null)
</pre>
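If you don't have a C compiler handy, you can watch the same thing happen using a child bash process in place of the C program. This is only a sketch: <span style="color:green">bash -c</span> starts a new shell that runs the quoted command, and the <tt>${EDITOR:-unset}</tt> form, which we haven't covered, prints the word "unset" when the variable is missing or empty. The names <tt>before</tt>, <tt>after</tt> and <tt>removed</tt> are made up for this example.

```shell
#!/bin/bash
unset EDITOR
EDITOR=xemacs      # defined in this shell only, not yet exported

# The child process cannot see a variable that was never exported.
before=$(bash -c 'echo "${EDITOR:-unset}"')

export EDITOR      # now it is part of the environment of child processes
after=$(bash -c 'echo "${EDITOR:-unset}"')

unset EDITOR       # removes the variable entirely
removed=$(bash -c 'echo "${EDITOR:-unset}"')

echo "before export: $before"
echo "after export:  $after"
echo "after unset:   $removed"
```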


This way, Portage can have elegant overlay support inherent in the Exheres "global repository of user/group data" design, while still having an extensible mechanism to define users and groups using cascading profiles. In my opinion, this is the best of both worlds.
=== Chopping strings overview ===
Chopping strings -- that is, splitting an original string into smaller, separate chunk(s) -- is one of those tasks that is performed daily by your average shell script. Many times, shell scripts need to take a fully-qualified path, and find the terminating file or directory. While it's possible (and fun!) to code this in bash, the standard <span style="color:green">basename</span> UNIX executable performs this extremely well:
<pre>
$ basename /usr/local/share/doc/foo/foo.txt
foo.txt
$ basename /usr/home/drobbins
drobbins
</pre>
<span style="color:green">basename</span> is quite a handy tool for chopping up strings. Its companion, called <span style="color:green">dirname</span>, returns the "other" part of the path that <span style="color:green">basename</span> throws away:
<pre>
$ dirname /usr/local/share/doc/foo/foo.txt
/usr/local/share/doc/foo
$ dirname /usr/home/drobbins/
/usr/home
</pre>
{{fancynote|Neither dirname nor basename looks at any files or directories on disk; they are purely string manipulation commands.}}
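Two more things are worth trying here. <span style="color:green">basename</span> accepts an optional second argument, a suffix to strip from the result, and both commands are happy to work on paths that don't exist. The <tt>/no/such/dir</tt> path below is made up on purpose to show this.

```shell
#!/bin/bash
# The optional second argument is a suffix to remove from the result.
basename /usr/local/share/doc/foo/foo.txt .txt

# Neither command checks the disk, so a nonexistent path is fine.
dirname /no/such/dir/file.txt
basename /no/such/dir/file.txt
```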


=== Account Resolution ===
=== Command substitution ===
One very handy thing to know is how to create an environment variable that contains the result of an executable command. This is very easy to do:
<pre>
$ MYDIR=$(dirname /usr/local/share/doc/foo/foo.txt)
$ echo $MYDIR
/usr/local/share/doc/foo
</pre>
What we did above is called ''command substitution''. Several things are worth noticing in this example. On the first line, we simply enclosed the command we wanted to execute with ''$( )''.


See the following pseudo-code for how resolution of cascading profiles and overlays should work together to resolve user settings. One important thing to note is that user and group resolution cascades through the profiles to create a master list of users, groups and defaults. This master list is extended by any overlays that are active. Then, when user or group data is requested, the resolved user, group and defaults lists are used to generate the resultant data.
Note that it is also possible to do the same thing using backquotes, the keyboard key that normally sits above the Tab key:
<pre>
$ MYDIR=`dirname /usr/local/share/doc/foo/foo.txt`
$ echo $MYDIR
/usr/local/share/doc/foo
</pre>
As you can see, bash provides multiple ways to perform exactly the same thing. Using command substitution, we can place any command or pipeline of commands in between ''` `'' or ''$( )'' and assign it to an environment variable. Handy stuff! Here's an example of how to use a pipeline with command substitution:


'''Users pseudo-code, with Groups being implemented identically:'''
<pre>
$ MYFILES=$(ls /etc | grep pa)
$ echo $MYFILES
pam.d passwd
</pre>
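You may have noticed that the file names above came out on a single line, even though grep prints one match per line. That happens because $MYFILES was not quoted when we echoed it. The sketch below uses printf to stand in for the directory listing, so the file names are made-up sample data rather than the contents of a real /etc.

```shell
#!/bin/bash
# printf stands in for "ls /etc" so the result is predictable.
matches=$(printf 'pam.d\npasswd\nhosts\n' | grep pa)

# grep -c counts matching lines instead of printing them.
count=$(printf 'pam.d\npasswd\nhosts\n' | grep -c pa)

echo $matches      # unquoted: the newline is turned into a space
echo "$matches"    # quoted: the newline is preserved
echo "$count matching names"
```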


It's also worth pointing out that ''$( )'' is generally preferred over ''` `'' in shell scripts because it is part of the POSIX shell standard and supported by all modern shells, is easier to type and read, and is less complicated to use in a nested form, as follows:
<pre>
<pre>
import glob
import os

class Profile:
$ MYFILES=$(ls $(dirname foo/bar/oni))
</pre>


  def __init__(self,path):
=== Chopping strings like a pro ===
    self.path = path
While <span style="color:green">basename</span> and <span style="color:green">dirname</span> are great tools, there are times where we may need to perform more advanced string "chopping" operations than just standard pathname manipulations. When we need more punch, we can take advantage of bash's advanced built-in variable expansion functionality. We've already used the standard kind of variable expansion, which looks like this: ${MYVAR}. But bash can also perform some handy string chopping on its own. Take a look at these examples:
    self._processed_user_defaults = False
<pre>
    self._required_user_fields = []
$ MYVAR=foodforthought.jpg
    self._alternate_user_fields = {}
$ echo ${MYVAR##*fo}
    self.parents = []
rthought.jpg
    # sample code to recursively create Parent profiles:
$ echo ${MYVAR#*fo}
    if os.path.exists("%s/parents" % self.path):
odforthought.jpg
      a=open("%s/parents" % self.path,"r")
</pre>
       for line in a:
In the first example, we typed ${MYVAR##*fo}. What exactly does this mean? Basically, inside the ''${ }'', we typed the name of the environment variable, two ##s, and a wildcard ("*fo"). Then, bash took <span style="color:green">MYVAR</span>, found the longest substring from the beginning of the string "foodforthought.jpg" that matched the wildcard "*fo", and chopped it off the beginning of the string. That's a bit hard to grasp at first, so to get a feel for how this special "##" option works, let's step through how bash completed this expansion. First, it began searching for substrings at the beginning of "foodforthought.jpg" that matched the "*fo" wildcard. Here are the substrings that it checked:
        self.parents.append(Profile(self.resolve_path(line)))
<pre>
      a.close()
f        
fo              MATCHES *fo
foo   
food
foodf         
foodfo          MATCHES *fo
foodfor
foodfort       
foodforth
foodfortho     
foodforthou
foodforthoug
foodforthought
foodforthought.j
foodforthought.jp
foodforthought.jpg
</pre>
After searching the string for matches, you can see that bash found two. It selects the longest match, removes it from the beginning of the original string, and returns the result.


  @property
The second form of variable expansion shown above appears identical to the first, except it uses only one "#" -- and bash performs an almost identical process. It checks the same set of substrings as our first example did, except that bash removes the shortest match from our original string, and returns the result. So, as soon as it checks the "fo" substring, it removes "fo" from our string and returns "odforthought.jpg".
  def users(self):
    """ returns a dictionary mapping user names to the files on disk defining each user (cascading) """
    users = {}
    for parent in self.parents:
      users.update(parent.users)
    for userfile in glob.glob("%s/accounts/users/*" % self.path):
      # the file name (not the directory name) is the user name
      users[os.path.basename(userfile)] = os.path.abspath(userfile)
    for overlay in self.overlays:
      users.update(overlay.users)
    return users


  def userData(self,user):
This may seem extremely cryptic, so I'll show you an easy way to remember this functionality. When searching for the longest match, use ## (because ## is longer than #). When searching for the shortest match, use #. See, not that hard to remember at all! Wait, how do you remember that we are supposed to use the '#' character to remove from the *beginning* of a string? Simple! You will notice that on a US keyboard, shift-4 is "$", which is the bash variable expansion character. On the keyboard, immediately to the left of "$" is "#". So, you can see that "#" is "at the beginning" of "$", and thus (according to our mnemonic), "#" removes characters from the beginning of the string. You may wonder how we remove characters from the end of the string. If you guessed that we use the character immediately to the right of "$" on the US keyboard ("%"), you're right! Here are some quick examples of how to chop off trailing portions of strings:
    """ returns a dictionary of key/value pairs defining the variables for specified user. Note:
<pre>
        * alternative key names are mapped to primary key names
$ MYFOO="chickensoup.tar.gz"
        * an exception is thrown if required fields are missing
$ echo ${MYFOO%%.*}
    """
chickensoup
    out = {}
$ echo ${MYFOO%.*}
    if user in self.users:
chickensoup.tar
      user_data = grabFile(self.users[user])
</pre>
      out = grabFile(self.defaults["user"])
As you can see, the % and %% variable expansion options work identically to # and ##, except they remove the matching wildcard from the end of the string. Note that you don't have to use the "*" character if you wish to remove a specific substring from the end:
      required = []
<pre>
      alternatives = {}
$ MYFOOD="chickensoup"
      if "required" in out:
$ echo ${MYFOOD%%soup}
        for req_key in out["required"].split(','):
chicken
          alts = req_key.split('|')
</pre>
          required.append(alts[0])
In this example, it doesn't matter whether we use "%%" or "%", since only one match is possible. And remember, if you forget whether to use "#" or "%", look at the 3, 4, and 5 keys on your keyboard and figure it out.
          if len(alts) > 1:
            for alt_key in alts[1:]:
              alternatives[alt_key] = alts[0]
      if "parent" in user_data:
        # note, this next line requires a grabFile() implementation that supports alternatives, and
        # will use this dict to map any alternative names to the primary name in the return data:
        out.update(grabFile(self.defaults[user_data["parent"]],alternatives=alternatives))
        out.update(user_data,alternatives=alternatives)
    for req_key in required:
      if not req_key in out:
        raise RequiredKeyError(user,req_key)
    return out
 
  @property
  def defaults(self):
    """ returns a dictionary mapping defaults names to the files on disk defining each default (cascading) """
    defaults = {}
    for parent in self.parents:
      defaults.update(parent.defaults)
    for defaultsfile in glob.glob("%s/accounts/defaults/*" % self.path):
      # the file name (not the directory name) is the defaults name
      defaults[os.path.basename(defaultsfile)] = os.path.abspath(defaultsfile)
    for overlay in self.overlays:
      defaults.update(overlay.defaults)
    return defaults
 
profile = Profile("/etc/make.profile")
my_user = profile.userData("nginx")
print(my_user["desc"])


We can use another form of variable expansion to select a specific substring, based on a specific character offset and length. Try typing in the following lines under bash:
<pre>
$ EXCLAIM=cowabunga
$ echo ${EXCLAIM:0:3}
cow
$ echo ${EXCLAIM:3:7}
abunga
</pre>
</pre>
This form of string chopping can come in quite handy; simply specify the character to start from and the length of the substring, all separated by colons.
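Earlier I mentioned that it's possible (and fun!) to do the work of basename and dirname in bash itself. Here is a minimal sketch of that idea using the expansions we have just covered. The sample path and the variable names are made up for illustration, and unlike the real commands this sketch does not handle trailing slashes or paths without any slash.

```shell
#!/bin/bash
path=/usr/local/share/doc/foo/foo.tar.gz

file=${path##*/}     # remove the longest match of "*/" from the front
dir=${path%/*}       # remove the shortest match of "/*" from the end
ext=${file##*.}      # everything after the last "."
stem=${file%%.*}     # everything before the first "."
first3=${file:0:3}   # three characters starting at offset 0

echo "file:   $file"
echo "dir:    $dir"
echo "ext:    $ext"
echo "stem:   $stem"
echo "first3: $first3"
```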


=== User and Group Data Format ===
=== Applying string chopping ===
 
Now that we've learned all about chopping strings, let's write a simple little shell script. Our script will accept a single file as an argument, and will print out whether it appears to be a tarball. To determine if it is a tarball, it will look for the pattern ".tar" at the end of the file name. Here it is:
==== Users ====
<syntaxhighlight lang="bash">
 
#!/bin/bash
In a given profile directory, <tt>accounts/users/'''myuser'''</tt> will define settings for a user with the name of <tt>myuser</tt>. The file format used to define users is very similar to and compatible with Exheres, using standard <tt>make.conf</tt>-style key=value syntax, with quoting required for values with whitespace. The following field names are suggested to be used for the initial users implementation. Note that this file format is extensible -- Portage must not complain about any additional fields in the users, groups or defaults files that are not specified here. This allows these formats to be easily extended for alternate operating systems or other distributions without requiring patches to Portage.
 
{| {{table}}
! Name
! Alternate Name
! Description
! Example
! Notes
|-
|<tt>shell</tt>
|N/A
|login shell
|<tt>/bin/bash</tt>
|
|-
|<tt>home</tt>
|N/A
|home directory
|<tt>/dev/null</tt>
|
|-
|<tt>group</tt>
|<tt>primary_group</tt>
|primary group
|<tt>wheel</tt>
|
|-
|<tt>extra_groups</tt>
|N/A
|other group memberships
|<tt>"audio,cdrom"</tt>
|''comma-delimited list''
|-
|<tt>uid</tt>
|<tt>preferred_uid</tt>
|preferred user ID (not guaranteed)
|<tt>37</tt>
|Will be bound by <tt>SYS_UID_MIN</tt> and <tt>SYS_UID_MAX</tt> defined in <tt>/etc/login.defs</tt>?
|-
|<tt>desc</tt>
|<tt>gecos</tt>
|Description/GECOS field
|<tt>"An account for fun"</tt>
|
|-
|<tt>parent</tt>
|N/A
|parent default file
|<tt>user-server</tt>
|
|}
 
Example file <tt>accounts/users/foo</tt>:


if [ "${1##*.}" = "tar" ]
then
      echo "This appears to be a tarball."
else
      echo "At first glance, this does not appear to be a tarball."
fi
</syntaxhighlight>
To run this script, enter it into a file called '''mytar.sh''', and type <span style="color:green">chmod 755 mytar.sh</span> to make it executable. Then, give it a try on a tarball, as follows:
<pre>
<pre>
shell=/bin/bash
$ ./mytar.sh thisfile.tar
home=/dev/null
This appears to be a tarball.
group=foo
$ ./mytar.sh thatfile.gz
extra_groups="foo,bar,oni"
At first glance, this does not appear to be a tarball.
uid=37
desc="The cool account"
</pre>
</pre>
OK, it works, but it's not very functional. Before we make it more useful, let's take a look at the "if" statement used above. In it, we have a boolean expression enclosed in square brackets, in which the "=" comparison operator checks for string equality. But what does the boolean expression actually test for? Let's take a look at the left side. According to what we've learned about string chopping, "${1##*.}" will remove the longest match of "*." from the beginning of the string contained in the environment variable "1", returning the result. This will cause everything after the last "." in the file name to be returned. Obviously, if the file name ends in ".tar", we will get "tar" as a result, and the condition will be true.


==== Groups ====
You may be wondering what the "1" environment variable is in the first place. Very simple -- $1 is the first command-line argument to the script, $2 is the second, etc. OK, now that we've reviewed the script, we can take our first look at "if" statements.
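To experiment with $1 and $2 without creating a separate script file, you can use the <span style="color:green">set --</span> builtin, which replaces the current command-line arguments with whatever you list after it. This is just a sketch for experimenting; the file names and the <tt>first_ext</tt> and <tt>second_ext</tt> variables are made up, and "$#" (which we haven't covered) holds the number of arguments.

```shell
#!/bin/bash
# Pretend the script was started as: ./mytar.sh thisfile.tar thatfile.gz
set -- thisfile.tar thatfile.gz

first_ext=${1##*.}    # extension of the first argument
second_ext=${2##*.}   # extension of the second argument

echo "number of arguments: $#"
echo "first:  $1 (extension: $first_ext)"
echo "second: $2 (extension: $second_ext)"
```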
 
* <tt>accounts/groups/'''mygroup'''</tt> will define settings for a group with the name of <tt>mygroup</tt>.
 
==== Defaults ====
 
The UID/GID management framework supports the ability to explicitly define default values for all users and groups, or a subset of users and groups. These default values can be overridden by child profiles, and profiles can also specify which fields are required. This allows alternate platforms to have different required values, and allows different Gentoo-based distributions to have different policies regarding required fields, so that policy is defined per distribution rather than being hard-coded into Portage itself.
 
Defaults can be defined inside the <tt>accounts/defaults</tt> directory inside each profile directory. The file <tt>accounts/defaults/user</tt>, if it exists, will be used to define any default settings for user accounts. The file <tt>accounts/defaults/group</tt>, if it exists, will be used to define any default settings for group accounts. These files are typically defined ''in one location'' for an entire set of cascading profiles, such as <tt>profiles/base</tt>.
 
Defaults files consist of key=value pairs, identical to user and group files. Note that the <tt>parent</tt> keyword is not valid in defaults files. A new keyword <tt>required</tt> specifies the required fields for any child users or groups, and may only be specified in the master defaults file 'user' or 'group':
 
{| {{table}}
! Name
! Description
! Example
! Required
! Default
! Notes
|-
|<tt>required</tt>
|Required fields
|<tt>"shell,home,desc<nowiki>|</nowiki>gecos"</tt>
|No
|''None''
|''comma-delimited list'', with "<tt><nowiki>|</nowiki></tt>" used to specify alternate names
|}
 
==== Alternate Defaults ====


In addition, other files in <tt>defaults</tt> can be created, and these files may be used to specify alternate default settings for users and groups, which can be overridden by child profiles. For example, an <tt>accounts/users/foo</tt> file that contains a <tt>parent=user-server</tt> would use the file <tt>accounts/defaults/user-server</tt> for its inherited default settings. The suggested convention for <tt>defaults</tt> values is to prefix user defaults with "<tt>user-</tt>" and group defaults with "<tt>group-</tt>", but this convention must not be enforced by Portage.
=== If statements ===
Like most languages, bash has its own form of conditional. When using them, stick to the format above; that is, keep the "if" and the "then" on separate lines, and keep the "else" and the terminating and required "fi" in horizontal alignment with them. This makes the code easier to read and debug. In addition to the "if,else" form, there are several other forms of "if" statements:
<syntaxhighlight lang="bash">
if [ condition ]
then
        action
fi
</syntaxhighlight>
This one performs an action only if condition is true, otherwise it performs no action and continues executing any lines following the "fi".
<syntaxhighlight lang="bash">
if [ condition ]
then
        action
elif [ condition2 ]
then
        action2
.
.
.
elif [ condition3 ]
then


Any defaults files can be overridden by child profiles, which will result in the respective default settings changing for all users and groups that use those defaults.
else
        actionx
fi
</syntaxhighlight>
The above "elif" form will consecutively test each condition and execute the action corresponding to the first true condition. If none of the conditions are true, it will execute the "else" action, if one is present, and then continue executing lines following the entire "if,elif,else" statement.
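Here is a concrete instance of the "elif" form, built from the string chopping we did earlier. It is only a sketch: the file name and the descriptions stored in <tt>kind</tt> are made up, and it judges the file purely by its name, not its contents.

```shell
#!/bin/bash
filename=backup.tar.gz
ext=${filename##*.}    # everything after the last "."

if [ "$ext" = "tar" ]
then
        kind="uncompressed tarball"
elif [ "$ext" = "gz" ]
then
        kind="gzip-compressed file"
elif [ "$ext" = "bz2" ]
then
        kind="bzip2-compressed file"
else
        kind="unknown"
fi

echo "$filename: $kind"
```

Only the first true condition runs its action, so the order of the tests matters whenever more than one of them could match.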


==== Defaults Parsing Rules ====
=== Next time ===
Now that we've covered the most basic bash functionality, it's time to pick up the pace and get ready to write some real scripts. In the next article, I'll cover looping constructs, functions, namespace, and other essential topics. Then, we'll be ready to write some more complicated scripts. In the third article, we'll focus almost exclusively on very complex scripts and functions, as well as several bash script design options. See you then!


Note that all alternate defaults files (such as <tt>user-server</tt>) always inherit (and optionally override) the global defaults defined in <tt>user</tt> and <tt>group</tt>. This means that a <tt>required</tt> setting defined in <tt>user</tt> will be inherited by <tt>user-server</tt> automatically. This allows the <tt>required</tt> field for users to be set globally in <tt>user</tt>, and makes it possible to override it easily, by simply providing a new <tt>user</tt> file in a child profile.
== Resources ==
* Read [[Bash by Example, Part 2]]
* Read [[Bash by Example, Part 3]]
* Visit [http://www.gnu.org/software/bash/bash.html GNU's bash home page]


* A default setting defined in <tt>user</tt> or <tt>group</tt> can be ''unset'' by setting it to a value of <tt>""</tt>.
__NOTOC__
* Non-required fields that have not been explicitly defined have a default value of <tt>""</tt> (the empty string).
[[Category:Linux Core Concepts]]
* Required fields that are unset or have a value of <tt>""</tt> should not be allowed and should be flagged as invalid by Portage.
[[Category:Articles]]
 
=== User and Group Creation ===
 
The commands actually used by Portage to create users and groups need to be customizable, as they vary by operating system.
 
Here are some possible mechanisms to implement this functionality, listed in order of personal preference:
 
# Add a <tt>plugins</tt> directory to profiles and create <tt>user-add</tt> and <tt>group-add</tt> scripts within these directories. This allows the <tt>user-add</tt> and <tt>group-add</tt> scripts to be different between MacOS X and Linux, for example, while allowing common platforms to re-use existing scripts. Users could override the user-creation behavior by creating a <tt>/etc/portage/plugins/user-add</tt> script.
# Add <tt>virtual/user-manager</tt> to every system profile which would install <tt>user-add</tt> and <tt>group-add</tt> commands to a Portage plug-in directory. These commands would be used for creating all users and groups on the system, would have a defined command-line API, and could vary based on OS by tweaking the virtual in the system profile.
# Add internal logic to Portage for adding groups and users to various operating systems. I think this solution would be sub-optimal as it is less "tweakable". User and group creation is something that can be useful to tweak in various circumstances, especially by power users.
 
=== Migration ===
 
What remains to be defined is how to transition from <tt>enewgroup</tt> and <tt>enewuser</tt> that are currently being called from <tt>pkg_setup</tt>. The new implementation should be backwards-compatible with the old system to ease transition.
 
Options:
 
# call <tt>pkg_setup</tt> during dependency generation and use <tt>enewgroup</tt> and <tt>enewuser</tt> wrappers to inject dependency info into the metadata, and emit a deprecation warning. Pass only the user/group name to the new system, which would provide its own UID/GID info. This may not be feasible.
# brute-force - grep the ebuild for legacy commands during metadata generation. Integrate new-style dependencies into metadata. This is possibly the least elegant solution but may be the simplest approach.
# fallback - tweak the legacy commands to call the new framework. This means that older ebuilds would not be able to have their users and groups created at the same time as new-style ebuilds (dependency fulfillment time.) However, this may be the most elegant solution and also the least hackish.
 
The last option seems best.
 
=== Architecture ===
 
Here are the various architectural layers of the implementation:
 
# Portage internals to handle "user/" and "group/" as special words. Would be treated almost identically to ebuilds up until actual merge time. Version specifiers, as well as USE flags, would not be allowed.
# Python-based code to parse user and group data in the profiles, and determine proper UID/GID to use on the system. This is the parsing and policy framework, and can be controlled by variables defined in <tt>make.conf</tt>/<tt>make.defaults</tt>. This would all be written in Python and integrated into the Portage core.
## "Core" Portage trees would use cascading profiles to define users and groups. This would allow variations based on architecture (Portage on MacOS X vs. Linux, for example.)
## Overlays would use <tt>OVERLAY_DIR/profiles/users</tt> and <tt>OVERLAY_DIR/profiles/groups</tt> to define user and group information required for the overlay. This way, overlays could extend users and groups.
# Python-based backwards-compatibility code (implementation to be determined)
# Profile-based plugin architecture, again python-based.
# <tt>user-add</tt> and <tt>group-add</tt> scripts, implemented as stand-alone executables (likely written as a shell script.) This is the only part not in python and these scripts do not do any kind of high-level policy decisions. They simply create the user or group and report success or failure.
 
=== Possible Changes and Unresolved Issues ===
 
==== Disable User/Group Creation ====
 
<tt>FEATURES="-auto-accounts"</tt> (<tt>auto-accounts</tt> would be enabled by default)
 
This is a change from GLEP 27 to get rid of ugly "no" prefix and to follow naming conventions for existing <tt>FEATURES</tt> settings.
 
With <tt>auto-accounts</tt> disabled, Portage will do an initial check using libc (respecting <tt>/etc/nsswitch.conf</tt>) to see if all depended-upon users and groups exist. If they exist, the user/group dependency will be satisfied and <tt>ebuild</tt> can continue. If the dependencies are not satisfied, then the ebuild will abort with unsatisfied dependencies and display the users and groups that need to be created, and what their associated settings should be.
 
==== Allow User/Group Names to Be Specified At Build Time ====
 
Some users may want an <tt>nginx</tt> user, while others may want a generic <tt>www</tt> user to be used.
 
TBD.
 
==== Not Elegant for Specific Users/Groups ====
 
This implementation looks cool but is potentially annoying for specific users and groups. For example, for an <tt>nginx</tt> ebuild that needs an <tt>nginx</tt> user, it would need to be added to the system profile. We probably need to implement ebuild-local user/groups as well.
 
==== Specify Required Users and Groups for Profile ====
 
Some users and groups '''must''' be part of the system and should be in the system set. It would be nice to move some of this out of baselayout and into the profiles directly. Maybe a good solution is to have <tt>baselayout</tt> <tt>RDEPEND</tt> on these users and groups.
 
TBD.
 
==== Dependency Prefix ====
 
One possible area of improvement is with the <tt>user/</tt> and <tt>group/</tt> syntax itself, which could be changed slightly to indicate that we are depending on something other than a package. But this is not absolutely necessary and "user" and "group" could be treated as reserved names that cannot be used for categories, since they have a special meaning.
 
==== .tbz2 support ====
 
In general, the design proposed above will work well for binary packages, as long as the users and groups required by the <tt>.tbz2</tt> can be found in the local Portage tree and overlays. If not, then Portage will not have any metadata relating to the user(s) or group(s) that need to be created for the <tt>.tbz2</tt> and will not be able to create them, resulting in an install failure, which of course is not optimal.
 
Therefore, it may be necessary to embed user and group metadata within the <tt>.tbz2</tt> and have Portage use this data only if local user/group metadata for the requested users and groups is not available. In addition, this user/group metadata may need to be cached persistently inside <tt>/var/db/pkg</tt> or another location to ensure that it is continually available to the Portage UID/GID code. This could add a bit more complexity to the implementation but should solve the <tt>.tbz2</tt> failure problem. This would create three layers of user/group data:
 
# Core user/group metadata defined in <tt>/usr/portage</tt>.
# Overlay user/group metadata defined in <tt>OVERLAY_DIR/profiles/{users,groups}</tt>
# Package user/group metadata
 
Using pseudo-code, we could imagine resolution of user and group metadata at <tt>.tbz2</tt> install time to look like this:
 
<pre>
all_ug_metadata = profile_ug_metadata + overlay_ug_metadata
if (user_or_group in (all_ug_metadata)):
    return all_ug_metadata[user_or_group]
else:
    return binary_package_ug_metadata[user_or_group]
</pre>
==== Compatibility with other distributions ====
If our goal is to ensure a sane method of creating UIDs/GIDs in packages, we should also look at making them compatible with the wider world. The [http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/usernames.html LSB] specifies very lax standards for system accounts. Seemingly there are no hard standards for system/daemon UIDs/GIDs, and no one in the community I discussed this issue with showed any real desire to standardize. There is one important issue to note, and that is the lowest user account number.
* Fedora/RHEL: Presently RHEL starts assigning UIDs/GIDs to users of the system at 500 and moves up. This [http://lists.fedoraproject.org/pipermail/devel/2011-May/151663.html will change] so that numbering starts at 1000.
* Debian/Ubuntu: Presently Debian starts assigning UIDs/GIDs to users of the system at 1000 and moves up. This appears to be the standard that distributions are moving towards.
* Gentoo/Funtoo: Presently Funtoo and Gentoo both match Debian, and once Fedora 16 and the subsequent RHEL release make the change, this will be a standard across most major Linux distributions.

Revision as of 06:27, December 25, 2011

Fundamental programming in the Bourne again shell (bash)

Introduction

You might wonder why you ought to learn Bash programming. Well, here are a couple of compelling reasons:

You're already running it

If you check, you'll probably find that you are running bash right now. Even if you changed your default shell, bash is probably still running somewhere on your system, because it's the standard Linux shell and is used for a variety of purposes. Because bash is already running, any additional bash scripts that you run are inherently memory-efficient because they share memory with any already-running bash processes. Why load a 500K interpreter if you already are running something that will do the job, and do it well?

You're already using it

Not only are you already running bash, but you're actually interacting with bash on a daily basis. It's always there, so it makes sense to learn how to use it to its fullest potential. Doing so will make your bash experience more fun and productive. But why should you learn bash programming? Easy, because you already think in terms of running commands, CPing files, and piping and redirecting output. Shouldn't you learn a language that allows you to use and build upon these powerful time-saving constructs you already know how to use? Command shells unlock the potential of a UNIX system, and bash is the Linux shell. It's the high-level glue between you and the machine. Grow in your knowledge of bash, and you'll automatically increase your productivity under Linux and UNIX -- it's that simple.

Bash confusion

Learning bash the wrong way can be a very confusing process. Many newbies type man bash to view the bash man page, only to be confronted with a very terse and technical description of shell functionality. Others type info bash (to view the GNU info documentation), causing either the man page to be redisplayed, or (if they are lucky) only slightly more friendly info documentation to appear.

While this may be somewhat disappointing to novices, the standard bash documentation can't be all things to all people, and caters towards those already familiar with shell programming in general. There's definitely a lot of excellent technical information in the man page, but its helpfulness to beginners is limited.

That's where this series comes in. In it, I'll show you how to actually use bash programming constructs, so that you will be able to write your own scripts. Instead of technical descriptions, I'll provide you with explanations in plain English, so that you will know not only what something does, but when you should actually use it. By the end of this three-part series, you'll be able to write your own intricate bash scripts, and be at the level where you can comfortably use bash and supplement your knowledge by reading (and understanding!) the standard bash documentation. Let's begin.

Environment variables

Under bash and almost all other shells, the user can define environment variables, which are stored internally as ASCII strings. One of the handiest things about environment variables is that they are a standard part of the UNIX process model. This means that environment variables are not exclusive to shell scripts, but can be used by standard compiled programs as well. When we "export" an environment variable under bash, any subsequent program that we run can read our setting, whether it is a shell script or not. A good example is the vipw command, which normally allows root to edit the system password file. By setting the EDITOR environment variable to the name of your favorite text editor, you can configure vipw to use it instead of vi, a handy thing if you are used to xemacs and really dislike vi.

The standard way to define an environment variable under bash is:

$ myvar='This is my environment variable!'

The above command defined an environment variable called "myvar" that contains the string "This is my environment variable!". There are several things to notice above: first, there is no space on either side of the "=" sign; any space will result in an error (try it and see). The second thing to notice is that while we could have done away with the quotes if we were defining a single word, they are necessary when the value of the environment variable is more than a single word (contains spaces or tabs).

   Note

For extremely detailed information on how quotes should be used in bash, you may want to look at the "QUOTING" section in the bash man page. The existence of special character sequences that get "expanded" (replaced) with other values does complicate how strings are handled in bash. We will just cover the most often-used quoting functionality in this series.

Thirdly, while we can normally use double quotes instead of single quotes, doing so in the above example would have caused an error. Why? Because using single quotes disables a bash feature called expansion, where special characters and sequences of characters are replaced with values. For example, the "!" character is the history expansion character, which bash normally replaces with a previously-typed command. (We won't be covering history expansion in this series of articles, because it is not frequently used in bash programming. For more information on it, see the "HISTORY EXPANSION" section in the bash man page.) While this macro-like functionality can come in handy, right now we want a literal exclamation point at the end of our environment variable, rather than a macro.
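To see the difference between the two kinds of quotes without involving history expansion, here is a minimal sketch that uses variable expansion instead. The variable names (name, single, double) are made up for this illustration and are not part of the original example:

```shell
# Hypothetical variable names, chosen only for this illustration.
name=world
single='Hello, $name'   # single quotes: $name is kept literally
double="Hello, $name"   # double quotes: $name is expanded
echo "$single"          # prints: Hello, $name
echo "$double"          # prints: Hello, world
```

Inside single quotes nothing is expanded, so they are the safe choice whenever you want the text exactly as typed.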

Now, let's take a look at how one actually uses environment variables. Here's an example:

$ echo $myvar
This is my environment variable!

By preceding the name of our environment variable with a $, we can cause bash to replace it with the value of myvar. In bash terminology, this is called "variable expansion". But, what if we try the following:

$ echo foo$myvarbar
foo

We wanted this to echo "fooThis is my environment variable!bar", but it didn't work. What went wrong? In a nutshell, bash's variable expansion facility got confused. It couldn't tell where the variable name ended, so it read the longest name it could, $myvarbar, which isn't set and therefore expands to nothing. How can we be more explicit and clearly tell bash what variable we are referring to? Try this:

$ echo foo${myvar}bar
fooThis is my environment variable!bar

As you can see, we can enclose the environment variable name in curly braces when it is not clearly separated from the surrounding text. While $myvar is faster to type and will work most of the time, ${myvar} can be parsed correctly in almost any situation. Other than that, they both do the same thing, and you will see both forms of variable expansion in the rest of this series. You'll want to remember to use the more explicit curly-brace form when your environment variable is not isolated from the surrounding text by whitespace (spaces or tabs).
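A common place where the curly braces matter is when building file names. The following is a small sketch under assumed names (base, base_old, wrong and right are hypothetical, picked only for this example):

```shell
# Hypothetical names for illustration only.
base=backup
unset base_old                  # make sure no variable called base_old exists
wrong="$base_old.tar"           # bash looks up a variable named base_old
right="${base}_old.tar"         # braces end the name right after "base"
echo "without braces: $wrong"   # prints: without braces: .tar
echo "with braces: $right"      # prints: with braces: backup_old.tar
```

Because an underscore is a legal character in a variable name, bash treats it as part of the name unless the braces say otherwise.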

Recall that we also mentioned that we can "export" variables. When we export an environment variable, it's automatically available in the environment of any subsequently-run script or executable. Shell scripts can "get to" the environment variable using that shell's built-in environment-variable support, while C programs can use the getenv() function call. Here's some example C code that you should type in and compile -- it'll allow us to understand environment variables from the perspective of C:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *myenvvar=getenv("EDITOR");
  printf("The editor environment variable is set to %s\n",myenvvar);
}

Save the above source into a file called myenv.c, and then compile it by issuing the command:

$ gcc myenv.c -o myenv

Now, there will be an executable program in your directory that, when run, will print the value of the EDITOR environment variable, if any. This is what happens when I run it on my machine:

$ ./myenv
The editor environment variable is set to (null)

Hmmm... because the EDITOR environment variable was not set to anything, the C program gets a null string. Let's try setting it to a specific value:

$ EDITOR=xemacs
$ ./myenv
The editor environment variable is set to (null)

While you might have expected myenv to print the value "xemacs", it didn't quite work, because we didn't export the EDITOR environment variable. This time, we'll get it working:

$ export EDITOR
$ ./myenv
The editor environment variable is set to xemacs

So, you have seen with your very own eyes that another process (in this case our example C program) cannot see the environment variable until it is exported. Incidentally, if you want, you can define and export an environment variable using one line, as follows:

$ export EDITOR=xemacs

It works identically to the two-line version. This would be a good time to show how to erase an environment variable by using unset:

$ unset EDITOR
$ ./myenv
The editor environment variable is set to (null)
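If you don't have a C compiler handy, roughly the same experiment can be run with a child shell standing in for the C program. This is only a sketch: the GREETING variable is a hypothetical name, and sh -c simply starts a new process that echoes whatever it finds in its own environment.

```shell
# GREETING is a made-up variable name used only for this demonstration.
unset GREETING
GREETING=hello
sh -c 'echo "before export: [$GREETING]"'   # prints: before export: []
export GREETING
sh -c 'echo "after export: [$GREETING]"'    # prints: after export: [hello]
unset GREETING
sh -c 'echo "after unset: [$GREETING]"'     # prints: after unset: []
```

The single quotes matter here: they stop our own shell from expanding $GREETING, so it is the child process that looks the variable up.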

Chopping strings overview

Chopping strings -- that is, splitting an original string into smaller, separate chunk(s) -- is one of those tasks that is performed daily by your average shell script. Many times, shell scripts need to take a fully-qualified path, and find the terminating file or directory. While it's possible (and fun!) to code this in bash, the standard basename UNIX executable performs this extremely well:

$ basename /usr/local/share/doc/foo/foo.txt
foo.txt
$ basename /usr/home/drobbins
drobbins

basename is quite a handy tool for chopping up strings. Its companion, called dirname, returns the "other" part of the path that basename throws away:

$ dirname /usr/local/share/doc/foo/foo.txt
/usr/local/share/doc/foo
$ dirname /usr/home/drobbins/
/usr/home
   Note

Both dirname and basename do not look at any files or directories on disk; they are purely string manipulation commands.

Command substitution

One very handy thing to know is how to create an environment variable that contains the result of an executable command. This is very easy to do:

$ MYDIR=$(dirname /usr/local/share/doc/foo/foo.txt)
$ echo $MYDIR
/usr/local/share/doc/foo

What we did above is called command substitution. Several things are worth noticing in this example. On the first line, we simply enclosed the command we wanted to execute with $( ).

Note that it is also possible to do the same thing using backquotes, the keyboard key that normally sits above the Tab key:

$ MYDIR=`dirname /usr/local/share/doc/foo/foo.txt`
$ echo $MYDIR
/usr/local/share/doc/foo

As you can see, bash provides multiple ways to perform exactly the same thing. Using command substitution, we can place any command or pipeline of commands in between ` ` or $( ) and assign it to an environment variable. Handy stuff! Here's an example of how to use a pipeline with command substitution:

$ MYFILES=$(ls /etc | grep pa)
$ echo $MYFILES
pam.d passwd

It's also worth pointing out that $( ) is generally preferred over ` ` in shell scripts because it is more universally supported across different shells, is easier to type and read, and is less complicated to use in a nested form, as follows:

$ MYFILES=$(ls $(dirname foo/bar/oni))
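Putting command substitution together with basename and dirname, a script can split a path into its parts and keep each one in a variable. Here is a short sketch; the variable names (path, dir, file, stem) are assumptions made for the example, and the optional second argument to basename strips a trailing suffix:

```shell
# Variable names below are hypothetical, chosen for this example.
path=/usr/local/share/doc/foo/foo.txt
dir=$(dirname "$path")            # directory part
file=$(basename "$path")          # last path component
stem=$(basename "$path" .txt)     # last component with the .txt suffix removed
echo "$dir"    # prints: /usr/local/share/doc/foo
echo "$file"   # prints: foo.txt
echo "$stem"   # prints: foo
```

Quoting "$path" inside the $( ) keeps the example working even if the path contains spaces.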

Chopping strings like a pro

While basename and dirname are great tools, there are times where we may need to perform more advanced string "chopping" operations than just standard pathname manipulations. When we need more punch, we can take advantage of bash's advanced built-in variable expansion functionality. We've already used the standard kind of variable expansion, which looks like this: ${MYVAR}. But bash can also perform some handy string chopping on its own. Take a look at these examples:

$ MYVAR=foodforthought.jpg
$ echo ${MYVAR##*fo}
rthought.jpg
$ echo ${MYVAR#*fo}
odforthought.jpg

In the first example, we typed ${MYVAR##*fo}. What exactly does this mean? Basically, inside the ${ }, we typed the name of the environment variable, two ##s, and a wildcard ("*fo"). Then, bash took MYVAR, found the longest substring from the beginning of the string "foodforthought.jpg" that matched the wildcard "*fo", and chopped it off the beginning of the string. That's a bit hard to grasp at first, so to get a feel for how this special "##" option works, let's step through how bash completed this expansion. First, it began searching for substrings at the beginning of "foodforthought.jpg" that matched the "*fo" wildcard. Here are the substrings that it checked:

f       
fo              MATCHES *fo
foo     
food
foodf           
foodfo          MATCHES *fo
foodfor
foodfort        
foodforth
foodfortho      
foodforthou
foodforthoug
foodforthought
foodforthought.j
foodforthought.jp
foodforthought.jpg

After searching the string for matches, you can see that bash found two. It selects the longest match, removes it from the beginning of the original string, and returns the result.

The second form of variable expansion shown above appears identical to the first, except it uses only one "#" -- and bash performs an almost identical process. It checks the same set of substrings as our first example did, except that bash removes the shortest match from our original string, and returns the result. So, as soon as it checks the "fo" substring, it removes "fo" from our string and returns "odforthought.jpg".

This may seem extremely cryptic, so I'll show you an easy way to remember this functionality. When searching for the longest match, use ## (because ## is longer than #). When searching for the shortest match, use #. See, not that hard to remember at all! Wait, how do you remember that we are supposed to use the '#' character to remove from the *beginning* of a string? Simple! You will notice that on a US keyboard, shift-4 is "$", which is the bash variable expansion character. On the keyboard, immediately to the left of "$" is "#". So, you can see that "#" is "at the beginning" of "$", and thus (according to our mnemonic), "#" removes characters from the beginning of the string. You may wonder how we remove characters from the end of the string. If you guessed that we use the character immediately to the right of "$" on the US keyboard ("%"), you're right! Here are some quick examples of how to chop off trailing portions of strings:

$ MYFOO="chickensoup.tar.gz"
$ echo ${MYFOO%%.*}
chickensoup
$ echo ${MYFOO%.*}
chickensoup.tar

As you can see, the % and %% variable expansion options work identically to # and ##, except they remove the matching wildcard from the end of the string. Note that you don't have to use the "*" character if you wish to remove a specific substring from the end:

$ MYFOOD="chickensoup"
$ echo ${MYFOOD%%soup}
chicken

In this example, it doesn't matter whether we use "%%" or "%", since only one match is possible. And remember, if you forget whether to use "#" or "%", look at the 3, 4, and 5 keys on your keyboard and figure it out.
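As a sketch of how these four operators combine in practice, the lines below take a path apart without calling basename or dirname at all. The path and the variable names (file, dir, ext, stem) are invented for this illustration:

```shell
# Hypothetical path and variable names, for illustration only.
path=/usr/local/share/doc/foo/archive.tar.gz
file=${path##*/}   # longest "*/" removed from the front
dir=${path%/*}     # shortest "/*" removed from the end
ext=${file##*.}    # longest "*." removed from the front
stem=${file%%.*}   # longest ".*" removed from the end
echo "$file"   # prints: archive.tar.gz
echo "$dir"    # prints: /usr/local/share/doc/foo
echo "$ext"    # prints: gz
echo "$stem"   # prints: archive
```

Since these expansions are handled by bash itself, no external program has to be started, which can be noticeably faster inside a loop.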

We can use another form of variable expansion to select a specific substring, based on a specific character offset and length. Try typing in the following lines under bash:

$ EXCLAIM=cowabunga
$ echo ${EXCLAIM:0:3}
cow
$ echo ${EXCLAIM:3:7}
abunga

This form of string chopping can come in quite handy; simply specify the character to start from and the length of the substring, all separated by colons.
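Offsets are counted from zero, which is easy to trip over. As a sketch, here is how a fixed-width date stamp could be split into fields; the stamp value and the variable names are made up for the example, and the last line relies on bash taking everything up to the end of the string when the length is left out:

```shell
#!/bin/bash
# Hypothetical date stamp in YYYYMMDD form; names are for illustration only.
stamp=20111225
year=${stamp:0:4}    # 4 characters starting at offset 0
month=${stamp:4:2}   # 2 characters starting at offset 4
day=${stamp:6:2}     # 2 characters starting at offset 6
rest=${stamp:4}      # no length given: everything from offset 4 onward
echo "$year-$month-$day"   # prints: 2011-12-25
echo "$rest"               # prints: 1225
```

This substring form is a bash feature rather than part of plain sh, so scripts that use it should start with #!/bin/bash.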

Applying string chopping

Now that we've learned all about chopping strings, let's write a simple little shell script. Our script will accept a single file as an argument, and will print out whether it appears to be a tarball. To determine if it is a tarball, it will look for the pattern ".tar" at the end of the file name. Here it is:

#!/bin/bash

if [ "${1##*.}" = "tar" ]
then
       echo This appears to be a tarball.
else
       echo At first glance, this does not appear to be a tarball.
fi

To run this script, enter it into a file called mytar.sh, and type chmod 755 mytar.sh to make it executable. Then, give it a try on a tarball, as follows:

$ ./mytar.sh thisfile.tar
This appears to be a tarball.
$ ./mytar.sh thatfile.gz
At first glance, this does not appear to be a tarball.

OK, it works, but it's not very functional. Before we make it more useful, let's take a look at the "if" statement used above. In it, we have a boolean expression enclosed in square brackets, which is how conditions like this one are written in bash. Inside the brackets, the "=" comparison operator checks for string equality. But what does the boolean expression actually test for? Let's take a look at the left side. According to what we've learned about string chopping, "${1##*.}" will remove the longest match of "*." from the beginning of the string contained in the environment variable "1", returning the result. This will cause everything after the last "." in the file name to be returned. Obviously, if the file name ends in ".tar", we will get "tar" as a result, and the condition will be true.

You may be wondering what the "1" environment variable is in the first place. Very simple -- $1 is the first command-line argument to the script, $2 is the second, etc. OK, now that we've reviewed the function, we can take our first look at "if" statements.
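If you'd like to experiment with $1 and $2 without writing a separate script file, the set builtin can fill them in for the current shell. This is just a sketch for playing around; the file names are the ones from the example run above, and ext1 and ext2 are hypothetical variable names:

```shell
# "set --" replaces the positional parameters of the current shell.
set -- thisfile.tar thatfile.gz
echo "first argument: $1"    # prints: first argument: thisfile.tar
echo "second argument: $2"   # prints: second argument: thatfile.gz
ext1=${1##*.}                # same chopping as in mytar.sh
ext2=${2##*.}
echo "extensions: $ext1 $ext2"   # prints: extensions: tar gz
```

The positional parameters can be chopped with # and % exactly like any named variable.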

If statements

Like most languages, bash has its own form of conditional. When using them, stick to the format above; that is, keep the "if" and the "then" on separate lines, and keep the "else" and the terminating and required "fi" in horizontal alignment with them. This makes the code easier to read and debug. In addition to the "if,else" form, there are several other forms of "if" statements:

if [ condition ]
then
        action
fi

This one performs an action only if condition is true, otherwise it performs no action and continues executing any lines following the "fi".

if [ condition ]
then 
        action
elif [ condition2 ]
then
        action2
.
.
.
elif [ condition3 ]
then
        action3
else
        actionx
fi

The above "elif" form will consecutively test each condition and execute the action corresponding to the first true condition. If none of the conditions are true, it will execute the "else" action, if one is present, and then continue executing lines following the entire "if,elif,else" statement.
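To make the "elif" form concrete, here is a small sketch that classifies a file name by its extension, using the string chopping from earlier. The file name and the variable names (file, ext, kind) are assumptions made for this example:

```shell
#!/bin/bash
# Hypothetical file name and variable names, for illustration only.
file=archive.gz
ext=${file##*.}      # everything after the last "."

if [ "$ext" = "tar" ]
then
        kind="tarball"
elif [ "$ext" = "gz" ]
then
        kind="gzip-compressed file"
else
        kind="unknown file type"
fi
echo "$file looks like a $kind"   # prints: archive.gz looks like a gzip-compressed file
```

Only the first true condition wins: the "tar" test fails, the "gz" test succeeds, and the "else" branch is never reached.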

Next time

Now that we've covered the most basic bash functionality, it's time to pick up the pace and get ready to write some real scripts. In the next article, I'll cover looping constructs, functions, namespace, and other essential topics. Then, we'll be ready to write some more complicated scripts. In the third article, we'll focus almost exclusively on very complex scripts and functions, as well as several bash script design options. See you then!

Resources