Difference between pages "Awk by Example, Part 3" and "Talk:User and Group Management"

From Funtoo
m (Reverted edits by 37.59.80.67 (talk) to last revision by Drobbins)
 
 
{{WikiArticle}}
{{fancynote|the Discussion page contains a much more ambitious proposal that I decided was too complex to tackle all at once. It is being split into bite-sized portions (Phases) on the main page.}}


== String functions and ... checkbooks? ==
== User and Group Dependencies ==


=== Formatting output ===
[http://www.exherbo.org/docs/exheres-for-smarties.html#repository_metadata Exheres] defines a dependency-based mechanism for ebuilds to specify the users and groups they need, and a dependency is the appropriate way to express this requirement. The syntax is <tt>user/foo</tt> to depend on user <tt>foo</tt> existing, and <tt>group/bar</tt> to depend on group <tt>bar</tt> existing. Dependencies can be build-time or run-time, as required.
While awk's print statement does do the job most of the time, sometimes more is needed. For those times, awk offers two good old friends called printf() and sprintf(). Yes, these functions, like so many other awk parts, are identical to their C counterparts. printf() will print a formatted string to stdout, while sprintf() returns a formatted string that can be assigned to a variable. If you're not familiar with printf() and sprintf(), an introductory C text will quickly get you up to speed on these two essential printing functions. You can view the printf() man page by typing "man 3 printf" on your Linux system.
 
Using dependencies for this purpose allows Portage to create these users and groups at exactly the right time -- prior to build or prior to install, as necessary, and will work just fine with binary packages, with some potential caveats (noted later in the document.) It also allows user and group creation to be affected by <tt>USE</tt> variable settings.
 
This document suggests following the Exheres syntax:


Here's some sample awk sprintf() and printf() code. As you can see in the following script, everything looks almost identical to C.
<pre>
<pre>
#!/usr/bin/awk -f
DEPEND="user/lighttpd group/web-server"
BEGIN {
RDEPEND="user/lighttpd group/web-server"
x=1
b="foo"
printf("%s got a %d on the last test\n","Jim",83)
myout=sprintf("%s-%d",b,x)
print myout
}
</pre>
</pre>
This code will print:
 
All this tells Portage is that "This ebuild needs a <tt>lighttpd</tt> user and <tt>web-server</tt> group." But it does not tell Portage what UID it should be, nor does it provide other necessary settings for the user. This data is defined within the Portage tree, and the mechanism for defining this data is described below.
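As a rough illustration of how such dependency strings might be separated from ordinary package atoms, here is a minimal Python sketch. The function name <tt>split_account_deps</tt> is hypothetical and not part of Portage, the package atom in the usage line is only an illustrative value, and the sketch ignores <tt>USE</tt> conditionals and version operators.

```python
def split_account_deps(depstring):
    """Split a whitespace-separated dependency string into
    (users, groups, packages). Hypothetical helper, not a Portage API."""
    users, groups, packages = [], [], []
    for atom in depstring.split():
        category, _, name = atom.partition("/")
        if category == "user" and name:
            users.append(name)
        elif category == "group" and name:
            groups.append(name)
        else:
            # anything else is treated as an ordinary package atom
            packages.append(atom)
    return users, groups, packages


users, groups, packages = split_account_deps(
    "user/lighttpd group/web-server dev-libs/openssl")
print(users, groups, packages)
# ['lighttpd'] ['web-server'] ['dev-libs/openssl']
```

Only the names are extracted here; the UID, shell and other settings would still come from the profile data described in the following sections.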
 
== Profile Settings ==
 
The user or group dependency will just tell Portage that this particular package requires a particular user or group, but any detailed information related to this user or group, such as suggested UID/GID, shell, etc, is stored in the Portage tree itself, and specifically in the Portage ''profile''. The mechanism for defining this information is described below:
 
=== Core Portage Trees ===
 
For "core" Portage trees (not overlays,) specific user and group settings are defined using Portage's ''cascading profile'' functionality. Portage would be enhanced to recognize <tt>accounts/users</tt> and <tt>accounts/groups</tt> directories inside profile directories. Users and groups would be defined in these directories, with one user or group per file, and the filename specifying the name of the user or group. Cascading functionality would be enabled so that the full set of user and group data could be a collection of all users and groups defined in parent profiles. This would provide a handy mechanism to share user and group definitions across different operating systems, while allowing for local variations when needed. It makes sense to leverage cascading profiles as much as possible.
 
=== Overlays ===
 
The approach described above does not work for overlays -- how are they to extend user and group settings automatically, as required by the ebuilds contained in the overlay?
 
The proposed solution is to allow overlays to add users and groups via the <tt>OVERLAY_DIR/profiles/accounts/users</tt> and <tt>OVERLAY_DIR/profiles/accounts/groups</tt> directories. These directories will ''always'' be searched for user and group data for all active overlays, and merged into the set defined by the profiles. This provides an automatic mechanism for overlays to inject the user and group data that they require, without requiring any manual configuration on the part of the Gentoo/Funtoo Linux user.
 
This way, Portage can have elegant overlay support inherent in the Exheres "global repository of user/group data" design, while still having an extensible mechanism to define users and groups using cascading profiles. In my opinion, this is the best of both worlds.
 
=== Account Resolution ===
 
See the following pseudo-code for how resolution of cascading profiles and overlays should work together to resolve user settings. One important thing to note is that user and group resolution cascades through the profiles to create a master list of users, groups and defaults. This master list is extended by any overlays that are active. Then, when user or group data is requested, the resolved user, group and defaults lists are used to generate the resultant data.
 
'''Users pseudo-code, with Groups being implemented identically:'''
 
<pre>
<pre>
Jim got a 83 on the last test
import glob
import os

class Profile:
foo-1
 
  def __init__(self,path):
    self.path = path
    self._processed_user_defaults = False
    self._required_user_fields = []
    self._alternate_user_fields = {}
    self.parents = []
    # sample code to recursively create Parent profiles:
    if os.path.exists("%s/parents" % self.path):
      with open("%s/parents" % self.path,"r") as a:
        for line in a:
          # strip the trailing newline before resolving the parent path
          self.parents.append(Profile(self.resolve_path(line.strip())))
 
  @property
  def users(self):
    """ returns a dictionary mapping user names to the files on disk defining each user (cascading) """
    users = {}
    # parents first, so that this profile overrides inherited definitions
    for parent in self.parents:
      users.update(parent.users)
    for userfile in glob.glob("%s/accounts/users/*" % self.path):
      # the file name (not its directory) is the user name
      users[os.path.basename(userfile)] = os.path.abspath(userfile)
    # active overlays extend the master list
    for overlay in self.overlays:
      users.update(overlay.users)
    return users
 
  def userData(self,user):
    """ returns a dictionary of key/value pairs defining the variables for specified user. Note:
        * alternative key names are mapped to primary key names
        * an exception is thrown if required fields are missing
    """
    out = {}
    required = []
    alternatives = {}
    if user in self.users:
      out = grabFile(self.defaults["user"])
      if "required" in out:
        for req_key in out["required"].split(','):
          alts = req_key.split('|')
          required.append(alts[0])
          for alt_key in alts[1:]:
            alternatives[alt_key] = alts[0]
      # note, the next lines require a grabFile() implementation that supports alternatives, and
      # will use this dict to map any alternative names to the primary name in the return data:
      user_data = grabFile(self.users[user],alternatives=alternatives)
      if "parent" in user_data:
        out.update(grabFile(self.defaults[user_data["parent"]],alternatives=alternatives))
      # the user's own settings override any defaults
      out.update(user_data)
    for req_key in required:
      # required fields must be present and non-empty
      if not out.get(req_key):
        raise RequiredKeyError(user,req_key)
    return out
 
  @property
  def defaults(self):
    """ returns a dictionary mapping defaults names to the files on disk defining each default (cascading) """
    defaults = {}
    for parent in self.parents:
      defaults.update(parent.defaults)
    for defaultsfile in glob.glob("%s/accounts/defaults/*" % self.path):
      # the file name (not its directory) is the defaults name
      defaults[os.path.basename(defaultsfile)] = os.path.abspath(defaultsfile)
    for overlay in self.overlays:
      defaults.update(overlay.defaults)
    return defaults
 
profile = Profile("/etc/make.profile")
my_user = profile.userData("nginx")
print(my_user["desc"])
 
</pre>
</pre>
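The pseudo-code above relies on a <tt>grabFile()</tt> helper that understands alternative key names but does not define it. The following is one possible sketch of such a helper, assuming the <tt>make.conf</tt>-style key=value format described in this document. The name <tt>grab_file</tt> and its signature are assumptions for illustration, not an existing Portage function.

```python
import os
import shlex
import tempfile


def grab_file(path, alternatives=None):
    """Parse a make.conf-style key=value file into a dict.

    alternatives maps alternate key names to primary names,
    for example {"gecos": "desc"}. Hypothetical helper.
    """
    alternatives = alternatives or {}
    data = {}
    with open(path) as f:
        for line in f:
            # shlex removes quotes and '#' comments; each token is key=value
            for token in shlex.split(line, comments=True):
                key, sep, value = token.partition("=")
                if not sep:
                    raise ValueError("expected key=value, got %r" % token)
                data[alternatives.get(key, key)] = value
    return data


# Usage: write a small sample file and parse it.
with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as tmp:
    tmp.write('shell=/bin/bash\ngecos="The cool account"  # alternate name\n')
parsed = grab_file(tmp.name, alternatives={"gecos": "desc"})
os.unlink(tmp.name)
print(parsed)
# {'shell': '/bin/bash', 'desc': 'The cool account'}
```

An unquoted value containing whitespace is rejected with a <tt>ValueError</tt>, which matches the requirement that such values be quoted.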


=== String functions ===
=== User and Group Data Format ===
Awk has a plethora of string functions, and that's a good thing. In awk, you really need string functions, since you can't treat a string as an array of characters as you can in other languages like C, C++, and Python. For example, if you execute the following code:
 
==== Users ====
 
In a given profile directory, <tt>accounts/users/'''myuser'''</tt> will define settings for a user with the name of <tt>myuser</tt>. The file format used to define users is very similar to and compatible with Exheres, using standard <tt>make.conf</tt>-style key=value syntax, with quoting required for values with whitespace. The field names in the table below are suggested for the initial users implementation. Note that this file format is extensible -- Portage must not complain about any additional fields in the users, groups or defaults files that are not listed in the table. This allows these formats to be easily extended for alternate operating systems or other distributions without requiring patches to Portage.
 
{| {{table}}
! Name
! Alternate Name
! Description
! Example
! Notes
|-
|<tt>shell</tt>
|N/A
|login shell
|<tt>/bin/bash</tt>
|
|-
|<tt>home</tt>
|N/A
|home directory
|<tt>/dev/null</tt>
|
|-
|<tt>group</tt>
|<tt>primary_group</tt>
|primary group
|<tt>wheel</tt>
|
|-
|<tt>extra_groups</tt>
|N/A
|other group memberships
|<tt>"audio,cdrom"</tt>
|''comma-delimited list''
|-
|<tt>uid</tt>
|<tt>preferred_uid</tt>
|preferred user ID (not guaranteed)
|<tt>37</tt>
|Will be bound by <tt>SYS_UID_MIN</tt> and <tt>SYS_UID_MAX</tt> defined in <tt>/etc/login.defs</tt>?
|-
|<tt>desc</tt>
|<tt>gecos</tt>
|Description/GECOS field
|<tt>"An account for fun"</tt>
|
|-
|<tt>parent</tt>
|N/A
|parent default file
|<tt>user-server</tt>
|
|}
 
Example file <tt>accounts/users/foo</tt>:
 
<pre>
<pre>
mystring="How are you doing today?"
shell=/bin/bash
print mystring[3]
home=/dev/null
group=foo
extra_groups="foo,bar,oni"
uid=37
desc="The cool account"
</pre>
</pre>
You'll receive an error that looks something like this:
<pre>
awk: string.gawk:59: fatal: attempt to use scalar as array
</pre>
Oh, well. While not as convenient as Python's sequence types, awk's string functions get the job done. Let's take a look at them.


First, we have the basic length() function, which returns the length of a string. Here's how to use it:
==== Groups ====
<pre>
 
print length(mystring)
* <tt>accounts/groups/'''mygroup'''</tt> will define settings for a group with the name of <tt>mygroup</tt>.
</pre>
 
This code will print the value:
==== Defaults ====
<pre>
 
24
The UID/GID management framework supports explicitly defining default values for all users and groups, or for a subset of them, and child profiles can override these defaults. Defaults also provide a mechanism for a profile to specify which fields are required. This allows alternate platforms to have different required values, and lets different Gentoo-based distributions set their own policies on required fields, so that policy is defined per distribution rather than being hard-coded into Portage itself.
</pre>
 
OK, let's keep going. The next string function is called index(), and will return the position of the first occurrence of a substring in another string, or 0 if the substring isn't found. Using mystring, we can call it this way:
Defaults can be defined inside the <tt>accounts/defaults</tt> directory inside each profile directory. The file <tt>accounts/defaults/user</tt>, if it exists, will be used to define any default settings for user accounts. The file <tt>accounts/defaults/group</tt>, if it exists, will be used to define any default settings for group accounts. These files are typically defined ''in one location'' for an entire set of cascading profiles, such as <tt>profiles/base</tt>.  
<pre>
 
print index(mystring,"you")
Defaults files consist of key=value pairs, identical to user and group files. Note that the <tt>parent</tt> keyword is not valid in defaults files. A new keyword <tt>required</tt> specifies the required fields for any child users or groups, and may only be specified in the master defaults file 'user' or 'group':
</pre>
 
Awk prints:
{| {{table}}
<pre>
! Name
9
! Description
</pre>
! Example
We move on to two more easy functions, tolower() and toupper(). As you might guess, these functions will return the string with all characters converted to lowercase or uppercase respectively. Notice that tolower() and toupper() return the new string, and don't modify the original. This code:
! Required
<pre>
! Default
print tolower(mystring)
! Notes
print toupper(mystring)
|-
print mystring
|<tt>required</tt>
</pre>
|Required fields
....will produce this output:
|<tt>"shell,home,desc<nowiki>|</nowiki>gecos"</tt>
<pre>
|No
how are you doing today?
|''None''
HOW ARE YOU DOING TODAY?
|''comma-delimited list'', with "<tt><nowiki>|</nowiki></tt>" used to specify alternate names
How are you doing today?
|}
</pre>
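To make the <tt>required</tt> syntax concrete, here is a small Python sketch that turns a value such as <tt>"shell,home,desc|gecos"</tt> into a list of primary field names plus a mapping from alternate names to primary names. The function name <tt>parse_required</tt> is hypothetical; the first name in each <tt>|</tt>-separated entry is assumed to be the primary one.

```python
def parse_required(spec):
    """Parse a 'required' value into (required, alternatives).

    required     -- list of primary field names
    alternatives -- dict mapping each alternate name to its primary name
    """
    required = []
    alternatives = {}
    for entry in spec.split(","):
        names = [name.strip() for name in entry.split("|")]
        if not names[0]:
            continue  # skip empty entries such as a trailing comma
        required.append(names[0])
        for alt in names[1:]:
            alternatives[alt] = names[0]
    return required, alternatives


required, alternatives = parse_required("shell,home,desc|gecos")
print(required)      # ['shell', 'home', 'desc']
print(alternatives)  # {'gecos': 'desc'}
```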
 
So far so good, but how exactly do we select a substring or even a single character from a string? That's where substr() comes in. Here's how to call substr():
==== Alternate Defaults ====
<pre>
 
mysub=substr(mystring,startpos,maxlen)
In addition, other files in <tt>defaults</tt> can be created, and these files may be used to specify alternate default settings for users and groups, which can be overridden by child profiles. For example, an <tt>accounts/users/foo</tt> file that contains a <tt>parent=user-server</tt> would use the file <tt>accounts/defaults/user-server</tt> for its inherited default settings. The suggested convention for <tt>defaults</tt> values is to prefix user defaults with "<tt>user-</tt>" and group defaults with "<tt>group-</tt>", but this convention must not be enforced by Portage.
</pre>
 
mystring should be either a string variable or a literal string from which you'd like to extract a substring. startpos should be set to the starting character position, and maxlen should contain the maximum length of the string you'd like to extract. Notice that I said maximum length; if length(mystring) is shorter than startpos+maxlen, your result will be truncated. substr() won't modify the original string, but returns the substring instead. Here's an example:
Any defaults files can be overridden by child profiles, which will result in the respective default settings changing for all users and groups that use those defaults.
<pre>
 
print substr(mystring,9,3)
==== Defaults Parsing Rules ====
</pre>
 
Awk will print:
Note that all alternate defaults files (such as <tt>user-server</tt>) always inherit (and optionally override) the global defaults defined in <tt>user</tt> and <tt>group</tt>. This means that a <tt>required</tt> setting defined in <tt>user</tt> will be inherited by <tt>user-server</tt> automatically. This allows the <tt>required</tt> field for users to be set globally in <tt>user</tt>, and makes it possible to override it easily, by simply providing a new <tt>user</tt> file in a child profile.
<pre>
 
you
* A default setting defined in <tt>user</tt> or <tt>group</tt> can be ''unset'' by setting it to a value of <tt>""</tt>.
</pre>
* Non-required fields that have not been explicitly defined have a default value of <tt>""</tt> (the empty string).
If you regularly program in a language that uses array indices to access parts of a string (and who doesn't), make a mental note that substr() is your awk substitute. You'll need to use it to extract single characters and substrings; because awk is a string-based language, you'll be using it often.
* Required fields that are unset or have a value of <tt>""</tt> should not be allowed and should be flagged as invalid by Portage.
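The parsing rules above can be sketched in Python as a simple layering of dictionaries: global defaults first, then any alternate defaults, then the account's own settings. This is only an illustration under the assumption that each file has already been parsed into a dict; <tt>resolve_settings</tt> and <tt>RequiredKeyError</tt> are hypothetical names, and the sample values are made up.

```python
class RequiredKeyError(Exception):
    """Raised when a required field is unset or empty."""


def resolve_settings(global_defaults, alternate_defaults, account, required):
    """Layer settings: global defaults < alternate defaults < account."""
    settings = dict(global_defaults)
    settings.update(alternate_defaults)
    settings.update(account)
    for key in required:
        # a missing key and an explicit "" are both treated as unset
        if settings.get(key, "") == "":
            raise RequiredKeyError(key)
    return settings


global_defaults = {"shell": "/bin/false", "home": "/dev/null"}
server_defaults = {"shell": "/bin/bash"}  # e.g. defaults/user-server
account = {"desc": "The cool account", "uid": "37"}

resolved = resolve_settings(global_defaults, server_defaults, account,
                            required=["shell", "home", "desc"])
print(resolved["shell"], resolved["home"])  # /bin/bash /dev/null

try:
    # setting home to "" unsets it, which is invalid for a required field
    resolve_settings(global_defaults, server_defaults,
                     {"desc": "x", "home": ""}, required=["home"])
except RequiredKeyError as err:
    print("missing required field:", err)
```

Callers that read a non-required field would use something like <tt>settings.get(key, "")</tt> to get the empty-string default.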
 
=== User and Group Creation ===
 
The commands that Portage uses to create users and groups need to be customizable, as they vary by operating system.
 
Here are some possible mechanisms to implement this functionality, listed in order of personal preference:
 
# Add a <tt>plugins</tt> directory to profiles and create <tt>user-add</tt> and <tt>group-add</tt> scripts within these directories. This allows the <tt>user-add</tt> and <tt>group-add</tt> scripts to be different between MacOS X and Linux, for example, while allowing common platforms to re-use existing scripts. Users could override the user-creation behavior by creating an <tt>/etc/portage/plugins/user-add</tt> script.
# Add <tt>virtual/user-manager</tt> to every system profile which would install <tt>user-add</tt> and <tt>group-add</tt> commands to a Portage plug-in directory. These commands would be used for creating all users and groups on the system, would have a defined command-line API, and could vary based on OS by tweaking the virtual in the system profile.
# Add internal logic to Portage for adding groups and users to various operating systems. I think this solution would be sub-optimal as it is less "tweakable". User and group creation is something that can be useful to tweak in various circumstances, especially by power users.
 
=== Migration ===
 
What remains to be defined is how to transition from <tt>enewgroup</tt> and <tt>enewuser</tt> that are currently being called from <tt>pkg_setup</tt>. The new implementation should be backwards-compatible with the old system to ease transition.
 
Options:
 
# call <tt>pkg_setup</tt> during dependency generation and use <tt>enewgroup</tt> and <tt>enewuser</tt> wrappers to inject dependency info into the metadata, and emit a deprecation warning. Pass only the user/group name to the new system, which would provide its own UID/GID info. This may not be feasible.
# brute-force - grep the ebuild for legacy commands during metadata generation. Integrate new-style dependencies into metadata. This is possibly the least elegant solution but may be the simplest approach.
# fallback - tweak the legacy commands to call the new framework. This means that older ebuilds would not be able to have their users and groups created at the same time as new-style ebuilds (dependency fulfillment time.) However, this may be the most elegant solution and also the least hackish.
 
The last option seems best.
 
=== Architecture ===


Now, we move on to some meatier functions, the first of which is called match(). match() is a lot like index(), except instead of searching for a substring like index() does, it searches for a regular expression. The match() function will return the starting position of the match, or zero if no match is found. In addition, match() will set two variables called RSTART and RLENGTH. RSTART contains the return value (the location of the first match), and RLENGTH specifies its span in characters (or -1 if no match was found). Using RSTART, RLENGTH, substr(), and a small loop, you can easily iterate through every match in your string. Here's an example match() call:
Here are the various architectural layers of the implementation:
<pre>
print match(mystring,/you/), RSTART, RLENGTH
</pre>
Awk will print:
<pre>
9 9 3
</pre>


=== String substitution ===
# Portage internals to handle "user/" and "group/" as special words. Would be treated almost identically to ebuilds up until actual merge time. Version specifiers, as well as USE flags, would not be allowed.
Now, we're going to look at a couple of string substitution functions, sub() and gsub(). These guys differ slightly from the functions we've looked at so far in that they actually modify the original string. Here's a template that shows how to call sub():
# Python-based code to parse user and group data in the profiles, and determine proper UID/GID to use on the system. This is the parsing and policy framework, and can be controlled by variables defined in <tt>make.conf</tt>/<tt>make.defaults</tt>. This would all be written in Python and integrated into the Portage core.
<pre>
## "Core" Portage trees would use cascading profiles to define users and groups. This would allow variations based on architecture (Portage on MacOS X vs. Linux, for example.)
sub(regexp,replstring,mystring)
## Overlays would use <tt>OVERLAY_DIR/profiles/accounts/users</tt> and <tt>OVERLAY_DIR/profiles/accounts/groups</tt> to define user and group information required for the overlay. This way, overlays could extend users and groups.
</pre>
# Python-based backwards-compatibility code (implementation to be determined)
When you call sub(), it'll find the first sequence of characters in mystring that matches regexp, and it'll replace that sequence with replstring. sub() and gsub() have identical arguments; the only way they differ is that sub() will replace the first regexp match (if any), and gsub() will perform a global replace, swapping out all matches in the string. Here's an example sub() and gsub() call:
# Profile-based plugin architecture, again python-based.
<pre>
# <tt>user-add</tt> and <tt>group-add</tt> scripts, implemented as stand-alone executables (likely written as shell scripts). This is the only part not in Python, and these scripts do not make any high-level policy decisions. They simply create the user or group and report success or failure.
sub(/o/,"O",mystring)
print mystring
mystring="How are you doing today?"
gsub(/o/,"O",mystring)
print mystring
</pre>
We had to reset mystring to its original value because the first sub() call modified mystring directly. When executed, this code will cause awk to output:
<pre>
HOw are you doing today?
HOw are yOu dOing tOday?
</pre>
Of course, more complex regular expressions are possible. I'll leave it up to you to test out some complicated regexps.


We wrap up our string function coverage by introducing you to a function called split(). split()'s job is to "chop up" a string and place the various parts into an integer-indexed array. Here's an example split() call:
=== Possible Changes and Unresolved Issues ===
<pre>
numelements=split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec",mymonths,",")
</pre>
When calling split(), the first argument contains the literal string or string variable to be chopped. In the second argument, you should specify the name of the array that split() will stuff the chopped parts into. In the third argument, specify the separator that will be used to chop the string up. When split() returns, it'll return the number of string elements that were split. split() assigns each one to an array index starting with one, so the following code:
<pre>
print mymonths[1],mymonths[numelements]
</pre>
....will print:
<pre>
Jan Dec
</pre>


=== Special string forms ===
==== Disable User/Group Creation ====
A quick note -- when calling length(), sub(), or gsub(), you can drop the last argument and awk will apply the function call to $0 (the entire current line). To print the length of each line in a file, use this awk script:
<pre>
{
    print length()
}
</pre>


=== Financial fun ===
<tt>FEATURES="-auto-accounts"</tt> (<tt>auto-accounts</tt> would be enabled by default)
A few weeks ago, I decided to write my own checkbook balancing program in awk. I decided that I'd like to have a simple tab-delimited text file into which I can enter my most recent deposits and withdrawals. The idea was to hand this data to an awk script that would automatically add up all the amounts and tell me my balance. Here's how I decided to record all my transactions into my "ASCII checkbook":
<pre>
23 Aug 2000    food    -   -   Y    Jimmy's Buffet    30.25
</pre>
Every field in this file is separated by one or more tabs. After the date (field 1, $1), there are two fields called "expense category" and "income category". When I'm entering an expense like on the above line, I put a four-letter nickname in the exp field, and a "-" (blank entry) in the inc field. This signifies that this particular item is a "food expense" :) Here's what a deposit looks like:
<pre>
23 Aug 2000    -   inco    -    Y    Boss Man        2001.00
</pre>
In this case, I put a "-" (blank) in the exp category, and put "inco" in the inc category. "inco" is my nickname for generic (paycheck-style) income. Using category nicknames allows me to generate a breakdown of my income and expenditures by category. As far as the rest of the records, all the other fields are fairly self-explanatory. The cleared? field ("Y" or "N") records whether the transaction has been posted to my account; beyond that, there's a transaction description, and a positive dollar amount.


The algorithm used to compute the current balance isn't too hard. Awk simply needs to read in each line, one by one. If an expense category is listed but there is no income category (denoted by "-"), then this item is a debit. If an income category is listed, but no expense category (denoted by "-") is present, then the dollar amount is a credit. And, if there is both an expense and income category listed, then this amount is a "category transfer"; that is, the dollar amount will be subtracted from the expense category and added to the income category. Again, all these categories are virtual, but are very useful for tracking income and expenditures, as well as for budgeting.
This is a change from GLEP 27 to get rid of the ugly "no" prefix and to follow the naming conventions of existing <tt>FEATURES</tt> settings.


=== The code ===
With <tt>auto-accounts</tt> disabled, Portage will do an initial check using libc (respecting <tt>/etc/nsswitch.conf</tt>) to see if all depended-upon users and groups exist. If they exist, the user/group dependency will be satisfied and <tt>ebuild</tt> can continue. If the dependencies are not satisfied, then the ebuild will abort with unsatisfied dependencies and display the users and groups that need to be created, and what their associated settings should be.
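The existence check described for the <tt>auto-accounts</tt>-disabled case could look roughly like the Python sketch below. It uses the standard <tt>pwd</tt> and <tt>grp</tt> modules, which query the system account databases through libc. The function name <tt>missing_accounts</tt> and the account names in the usage line are assumptions for illustration.

```python
import grp
import pwd


def missing_accounts(users, groups):
    """Return (missing_users, missing_groups) for the given names."""
    missing_users = []
    for name in users:
        try:
            pwd.getpwnam(name)  # raises KeyError if the user does not exist
        except KeyError:
            missing_users.append(name)
    missing_groups = []
    for name in groups:
        try:
            grp.getgrnam(name)  # raises KeyError if the group does not exist
        except KeyError:
            missing_groups.append(name)
    return missing_users, missing_groups


missing_users, missing_groups = missing_accounts(
    ["zz-no-such-user-31337"], ["zz-no-such-group-31337"])
if missing_users or missing_groups:
    print("unsatisfied user dependencies:", missing_users)
    print("unsatisfied group dependencies:", missing_groups)
```

A real implementation would also print the settings each missing account should have, taken from the resolved profile data.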
Time to look at the code. We'll start off with the first line, the BEGIN block and a function definition:
<pre>
#!/usr/bin/awk -f
BEGIN {
    FS="\t+"
    months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
}


function monthdigit(mymonth) {
==== Allow User/Group Names to Be Specified At Build Time ====
    return (index(months,mymonth)+3)/4
}
</pre>
Adding the first "#!..." line to any awk script will allow it to be directly executed from the shell, provided that you "chmod +x myscript" first. The remaining lines define our BEGIN block, which gets executed before awk starts processing our checkbook file. We set FS (the field separator) to "\t+", which tells awk that the fields will be separated by one or more tabs. In addition, we define a string called months that's used by our monthdigit() function, which appears next.


The last three lines show you how to define your own awk function. The format is simple -- type "function", then the function name, and then the parameters separated by commas, inside parentheses. After this, a "{ }" code block contains the code that you'd like this function to execute. All functions can access global variables (like our months variable). In addition, awk provides a "return" statement that allows the function to return a value, and operates similarly to the "return" found in C, Python, and other languages. This particular function converts a month name in a 3-letter string format into its numeric equivalent. For example, this:
Some users may want an <tt>nginx</tt> user, while others may want a generic <tt>www</tt> user to be used.
<pre>
print monthdigit("Mar")
</pre>
....will print this:
<pre>
3
</pre>
Now, let's move on to some more functions.


=== Financial functions ===
TBD.
Here are three more functions that perform the bookkeeping for us. Our main code block, which we'll see soon, will process each line of the checkbook file sequentially, calling one of these functions so that the appropriate transactions are recorded in an awk array. There are three basic kinds of transactions, credit (doincome), debit (doexpense) and transfer (dotransfer). You'll notice that all three functions accept one argument, called mybalance. mybalance is a placeholder for a two-dimensional array, which we'll pass in as an argument. Up until now, we haven't dealt with two-dimensional arrays; however, as you can see below, the syntax is quite simple. Just separate each dimension with a comma, and you're in business.


We'll record information into "mybalance" as follows. The first dimension of the array ranges from 0 to 12, and specifies the month, or zero for the entire year. Our second dimension is a four-letter category, like "food" or "inco"; this is the actual category we're dealing with. So, to find the entire year's balance for the food category, you'd look in mybalance[0,"food"]. To find June's income, you'd look in mybalance[6,"inco"].
==== Not Elegant for Specific Users/Groups ====
<pre>       
function doincome(mybalance) {
    mybalance[curmonth,$3] += amount
    mybalance[0,$3] += amount       
}


function doexpense(mybalance) {
This implementation looks cool but is potentially annoying for specific users and groups. For example, for an <tt>nginx</tt> ebuild that needs an <tt>nginx</tt> user, it would need to be added to the system profile. We probably need to implement ebuild-local user/groups as well.
    mybalance[curmonth,$2] -= amount
    mybalance[0,$2] -= amount       
}


function dotransfer(mybalance) {
==== Specify Required Users and Groups for Profile ====
    mybalance[0,$2] -= amount
    mybalance[curmonth,$2] -= amount
    mybalance[0,$3] += amount
    mybalance[curmonth,$3] += amount
}
</pre>
When doincome() or any of the other functions are called, we record the transaction in two places -- mybalance[0,category] and mybalance[curmonth, category], the entire year's category balance and the current month's category balance, respectively. This allows us to easily generate either an annual or monthly breakdown of income/expenditures later on.


If you look at these functions, you'll notice that the array referenced by mybalance is passed in by reference. In addition, we also refer to several global variables: curmonth, which holds the numeric value of the month of the current record, $2 (the expense category), $3 (the income category), and amount ($7, the dollar amount). When doincome() and friends are called, all these variables have already been set correctly for the current record (line) being processed.
Some users and groups '''must''' be part of the system and should be in the system set. It would be nice to move some of this out of baselayout and into the profiles directly. Maybe a good solution is to have <tt>baselayout</tt> <tt>RDEPEND</tt> on these users and groups.


=== The main block ===
TBD.
Here's the main code block that contains the code that parses each line of input data. Remember, because we have set FS correctly, we can refer to the first field as $1, the second field as $2, etc. When doincome() and friends are called, the functions can access the current values of curmonth, $2, $3 and amount from inside the function. Take a look at the code and meet me on the other side for an explanation.
<pre>
{
    curmonth=monthdigit(substr($1,4,3))
    amount=$7
     
    #record all the categories encountered
    if ( $2 != "-" )
        globcat[$2]="yes"
    if ( $3 != "-" )
        globcat[$3]="yes"


    #tally up the transaction properly
==== Dependency Prefix ====
    if ( $2 == "-" ) {
        if ( $3 == "-" ) {
            print "Error: inc and exp fields are both blank!"
            exit 1
        } else {
            #this is income
            doincome(balance)
            if ( $5 == "Y" )
                doincome(balance2)
        }
    } else if ( $3 == "-" ) {
        #this is an expense
        doexpense(balance)
        if ( $5 == "Y" )
            doexpense(balance2)
    } else {
        #this is a transfer
        dotransfer(balance)
        if ( $5 == "Y" )
            dotransfer(balance2)
    }                       
}
</pre>
In the main block, the first two lines set curmonth to an integer between 1 and 12, and set amount to field 7 (to make the code easier to understand). Then, we have four interesting lines, where we write values into an array called globcat. globcat, or the global categories array, is used to record all those categories encountered in the file -- "inco", "misc", "food", "util", etc. For example, if $2 == "inco", we set globcat["inco"] to "yes". Later on, we can iterate through our list of categories with a simple "for (x in globcat)" loop.


On the next twenty or so lines, we analyze fields $2 and $3, and record the transaction appropriately. If $2=="-" and $3!="-", we have some income, so we call doincome(). If the situation is reversed, we call doexpense(); and if both $2 and $3 contain categories, we call dotransfer(). Each time, we pass the "balance" array to these functions so that the appropriate data is recorded there.


You'll also notice several lines that say "if ( $5 == "Y" ), record that same transaction in balance2". What exactly are we doing here? You'll recall that $5 contains either a "Y" or a "N", and records whether the transaction has been posted to the account. Because we record the transaction to balance2 only if the transaction has been posted, balance2 will contain the actual account balance, while "balance" will contain all transactions, whether they have been posted or not. You can use balance2 to verify your data entry (since it should match with your current account balance according to your bank), and use "balance" to make sure that you don't overdraw your account (since it will take into account any checks you have written that have not yet been cashed).


=== Generating the report ===
After the main block repeatedly processes each input record, we now have a fairly comprehensive record of debits and credits broken down by category and by month. Now, all we need to do is define an END block that will generate a report, in this case a modest one:
<pre>
END {
    bal=0
    bal2=0       
    for (x in globcat) {
        bal=bal+balance[0,x]
        bal2=bal2+balance2[0,x]   
    }
    printf("Your available funds: %10.2f\n", bal)
    printf("Your account balance: %10.2f\n", bal2)       
}
</pre>
This report prints out a summary that looks something like this:
<pre>
Your available funds:    1174.22
Your account balance:    2399.33
</pre>
In our END block, we used the "for (x in globcat)" construct to iterate through every category, tallying up a master balance based on all the transactions recorded. We actually tally up two balances, one for available funds, and another for the account balance. To execute the program and process your own financial goodies that you've entered into a file called '''mycheckbook.txt''', put all the above code into a text file called '''balance''' and do <span style="color:green;">"chmod +x balance"</span>, and then type <span style="color:green;">"./balance mycheckbook.txt"</span>. The balance script will then add up all your transactions and print out a two-line balance summary for you.


=== Upgrades ===
I use a more advanced version of this program to manage my personal and business finances. My version (which I couldn't include here due to space limitations) prints out a monthly breakdown of income and expenses, including annual totals, net income and a bunch of other stuff. Even better, it outputs the data in HTML format, so that I can view it in a Web browser :) If you find this program useful, I encourage you to add these features to this script. You won't need to configure it to record any additional information; all the information you need is already in balance and balance2. Just upgrade the END block, and you're in business!


I hope you've enjoyed this series. For more information on awk, check out the resources listed below.


== Resources ==
* Read Daniel's other awk articles on Funtoo: Awk By Example, [[Awk by example, Part1|Part 1]] and [[Awk by example, Part2|Part 2]].
* If you'd like a good old-fashioned book, [http://www.oreilly.com/catalog/sed2/ O'Reilly's sed & awk, 2nd Edition] is a wonderful choice.
* Be sure to check out the [http://www.faqs.org/faqs/computer-lang/awk/faq/ comp.lang.awk FAQ]. It also contains lots of additional awk links.
* Patrick Hartigan's [http://sparky.rice.edu/~hartigan/awk.html awk tutorial] is packed with handy awk scripts.
* [http://www.tasoft.com/tawk.html Thompson's TAWK Compiler] compiles awk scripts into fast binary executables. Versions are available for Windows, OS/2, DOS, and UNIX.
* [http://www.gnu.org/software/gawk/manual/gawk.html The GNU Awk User's Guide] is available for online reference.


[[ Category:Linux Core Concepts ]]
[[Category:Articles]]
Revision as of 20:03, July 4, 2011

   Note

the Discussion page contains a much more ambitious proposal that I decided was too complex to tackle all at once. It is being split into bite-sized portions (Phases) on the main page.

User and Group Dependencies

Exheres defines a dependency-based mechanism for ebuilds to specify their user and group dependencies, which is an appropriate mechanism for specifying dependencies. The specific syntax used is user/foo to specify a dependency on user foo, and group/bar to specify a dependency on group bar existing. Dependencies can be build-time or run-time, as required.

Using dependencies for this purpose allows Portage to create these users and groups at exactly the right time -- prior to build or prior to install, as necessary, and will work just fine with binary packages, with some potential caveats (noted later in the document.) It also allows user and group creation to be affected by USE variable settings.

This document suggests following the Exheres syntax:

DEPEND="user/lighttpd group/web-server"
RDEPEND="user/lighttpd group/web-server"

All this tells Portage is that "This ebuild needs a lighttpd user and web-server group." But it does not tell Portage what UID it should be, nor does it provide other necessary settings for the user. This data is defined within the Portage tree, and the mechanism for defining this data is described below.
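
As a rough illustration of how such a dependency string could be separated into account and package dependencies, here is a minimal Python sketch. The function name split_account_deps is hypothetical and not part of Portage, and the sketch assumes plain whitespace-separated atoms with no USE conditionals.

```python
def split_account_deps(depstring):
    """Split a dependency string into user, group and package atoms.

    Hypothetical helper, not a Portage API: assumes whitespace-separated
    atoms and does not handle USE-conditional blocks.
    """
    users, groups, packages = [], [], []
    for atom in depstring.split():
        if atom.startswith("user/"):
            users.append(atom[len("user/"):])
        elif atom.startswith("group/"):
            groups.append(atom[len("group/"):])
        else:
            packages.append(atom)
    return users, groups, packages


users, groups, packages = split_account_deps(
    "user/lighttpd group/web-server dev-libs/openssl"
)
print(users, groups, packages)
```

Treating "user" and "group" as reserved category names is what makes a simple prefix test like this sufficient.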

Profile Settings

The user or group dependency will just tell Portage that this particular package requires a particular user or group, but any detailed information related to this user or group, such as suggested UID/GID, shell, etc, is stored in the Portage tree itself, and specifically in the Portage profile. The mechanism for defining this information is described below:

Core Portage Trees

For "core" Portage trees (not overlays,) specific user and group settings are defined using Portage's cascading profile functionality. Portage would be enhanced to recognize accounts/users and accounts/groups directories inside profile directories. Users and groups would be defined in these directories, with one user or group per file, and the filename specifying the name of the user or group. Cascading functionality would be enabled so that the full set of user and group data could be a collection of all users and groups defined in parent profiles. This would provide a handy mechanism to share user and group definitions across different operating systems, while allowing for local variations when needed. It makes sense to leverage cascading profiles as much as possible.

Overlays

The approach described above does not work for overlays -- how are they to extend user and group settings automatically, as required by the ebuilds contained in the overlay?

The proposed solution is to allow overlays to add users and groups via the OVERLAY_DIR/profiles/accounts/groups and OVERLAY_DIR/profiles/accounts/users directories. These directories will always be searched for user and group data for all active overlays, and merged into the set defined by the profiles. This provides an automatic mechanism for overlays to inject user and group data that they require, without requiring any manual configuration on behalf of the Gentoo/Funtoo Linux user.

This way, Portage can have elegant overlay support inherent in the Exheres "global repository of user/group data" design, while still having an extensible mechanism to define users and groups using cascading profiles. In my opinion, this is the best of both worlds.

Account Resolution

See the following pseudo-code for how resolution of cascading profiles and overlays should work together to resolve user settings. One important thing to note is that user and group resolution cascades through the profiles to create a master list of users, groups and defaults. This master list is extended by any overlays that are active. Then, when user or group data is requested, the resolved user, group and defaults lists are used to generate the resultant data.

Users pseudo-code, with Groups being implemented identically:

import glob
import os

class RequiredKeyError(KeyError):
  """ raised when a user definition lacks a required field """

def grabFile(path, alternatives=None):
  """ minimal stand-in parser for key=value files; any key listed in
      alternatives is mapped to its primary name in the returned data """
  alternatives = alternatives or {}
  data = {}
  with open(path, "r") as f:
    for line in f:
      key, sep, value = line.strip().partition("=")
      if sep and not key.startswith("#"):
        data[alternatives.get(key, key)] = value.strip('"')
  return data

class Profile:

  def __init__(self, path, overlays=()):
    self.path = path
    self._processed_user_defaults = False
    self._required_user_fields = []
    self._alternate_user_fields = {}
    # overlays are assumed to be objects that provide "users" and "defaults" dictionaries
    self.overlays = list(overlays)
    self.parents = []
    # sample code to recursively create Parent profiles:
    if os.path.exists("%s/parents" % self.path):
      with open("%s/parents" % self.path, "r") as a:
        for line in a:
          if line.strip():
            self.parents.append(Profile(self.resolve_path(line.strip())))

  def resolve_path(self, relpath):
    """ returns the path of a parent profile, relative to this profile's directory """
    return os.path.normpath(os.path.join(self.path, relpath))

  @property
  def users(self):
    """ returns a dictionary mapping user names to the files on disk defining each user (cascading) """
    users = {}
    for parent in self.parents:
      users.update(parent.users)
    for userfile in glob.glob(os.path.join(self.path, "accounts/users/*")):
      users[os.path.basename(userfile)] = os.path.abspath(userfile)
    for overlay in self.overlays:
      users.update(overlay.users)
    return users

  def userData(self, user):
    """ returns a dictionary of key/value pairs defining the variables for specified user. Note:
        * alternative key names are mapped to primary key names
        * an exception is thrown if required fields are missing
    """
    out = {}
    required = []
    alternatives = {}
    if user in self.users:
      defaults = self.defaults
      if "user" in defaults:
        out = grabFile(defaults["user"])
      if "required" in out:
        for req_key in out["required"].split(','):
          alts = req_key.split('|')
          required.append(alts[0])
          for alt_key in alts[1:]:
            alternatives[alt_key] = alts[0]
      # grabFile() uses this dict to map any alternative names to the primary name in the return data:
      user_data = grabFile(self.users[user], alternatives=alternatives)
      if "parent" in user_data:
        out.update(grabFile(defaults[user_data["parent"]], alternatives=alternatives))
      # settings from the user file override all inherited defaults
      out.update(user_data)
    for req_key in required:
      # a required field that is unset or set to "" is invalid
      if not out.get(req_key):
        raise RequiredKeyError(user, req_key)
    return out

  @property
  def defaults(self):
    """ returns a dictionary mapping defaults names to the files on disk defining each default (cascading) """
    defaults = {}
    for parent in self.parents:
      defaults.update(parent.defaults)
    for defaultsfile in glob.glob(os.path.join(self.path, "accounts/defaults/*")):
      defaults[os.path.basename(defaultsfile)] = os.path.abspath(defaultsfile)
    for overlay in self.overlays:
      defaults.update(overlay.defaults)
    return defaults

profile = Profile("/etc/make.profile")
my_user = profile.userData("nginx")
print(my_user.get("desc", ""))

User and Group Data Format

Users

In a given profile directory, accounts/users/myuser will define settings for a user with the name of myuser. The file format used to define users is very similar to and compatible with Exheres, using standard make.conf-style key=value syntax, with quoting required for values with whitespace. The following field names are suggested to be used for the initial users implementation. Note that this file format is extensible -- Portage must not complain about any additional fields in the users, groups or defaults files that are not listed below. This allows these formats to be easily extended for alternate operating systems or other distributions without requiring patches to Portage.

Name          Alternate Name  Description                         Example               Notes
shell         N/A             login shell                         /bin/bash
home          N/A             home directory                      /dev/null
group         primary_group   primary group                       wheel
extra_groups  N/A             other group memberships             "audio,cdrom"         comma-delimited list
uid           preferred_uid   preferred user ID (not guaranteed)  37                    Will be bound by SYS_UID_MIN and SYS_UID_MAX defined in /etc/login.defs?
desc          gecos           Description/GECOS field             "An account for fun"
parent        N/A             parent default file                 user-server

Example file accounts/users/foo:

shell=/bin/bash
home=/dev/null
group=foo
extra_groups="foo,bar,oni"
uid=37
desc="The cool account"
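
To make the file format concrete, the following is a minimal Python sketch of a parser for make.conf-style key=value files. The function name parse_account_file is hypothetical, and relying on the standard shlex module for quote handling is an assumption of this sketch rather than something the proposal specifies.

```python
import shlex


def parse_account_file(text):
    """Parse make.conf-style key=value lines into a dict (illustrative only)."""
    settings = {}
    for line in text.splitlines():
        # shlex handles the quoting; comments=True drops "# ..." remarks
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


example = '''
shell=/bin/bash
home=/dev/null
group=foo
uid=37
desc="The cool account"
'''
settings = parse_account_file(example)
print(settings["desc"])
```

All values are returned as strings, and unknown keys are kept rather than rejected, in line with the requirement that the format stay extensible.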

Groups

  • accounts/groups/mygroup will define settings for a group with the name of mygroup.

Defaults

The UID/GID management framework supports explicitly defined default values for all users and groups, or for a subset of users and groups, and child profiles can override these defaults. The same mechanism lets a profile specify which fields are required. Alternate platforms can therefore require different fields, and different Gentoo-based distributions can set their own policies on required fields, so that policy is defined per distribution rather than being hard-coded into Portage itself.

Defaults can be defined inside the accounts/defaults directory inside each profile directory. The file accounts/defaults/user, if it exists, will be used to define any default settings for user accounts. The file accounts/defaults/group, if it exists, will be used to define any default settings for group accounts. These files are typically defined in one location for an entire set of cascading profiles, such as profiles/base.

Defaults files consist of key=value pairs, identical to user and group files. Note that the parent keyword is not valid in defaults files. A new keyword required specifies the required fields for any child users or groups, and may only be specified in the master defaults file 'user' or 'group':

Name      Description      Example                  Required  Default  Notes
required  Required fields  "shell,home,desc|gecos"  No        None     comma-delimited list, with "|" used to specify alternate names

Alternate Defaults

In addition, other files in defaults can be created, and these files may be used to specify alternate default settings for users and groups, which can be overridden by child profiles. For example, an accounts/users/foo file that contains a parent=user-server would use the file accounts/defaults/user-server for its inherited default settings. The suggested convention for defaults values is to prefix user defaults with "user-" and group defaults with "group-", but this convention must not be enforced by Portage.

Any defaults files can be overridden by child profiles, which will result in the respective default settings changing for all users and groups that use those defaults.

Defaults Parsing Rules

Note that all alternate defaults files (such as user-server) always inherit (and optionally override) the global defaults defined in user and group. This means that a required setting defined in user will be inherited by user-server automatically. This allows the required field for users to be set globally in user, and makes it possible to override it easily, by simply providing a new user file in a child profile.

  • A default setting defined in user or group can be unset by setting it to a value of "".
  • Non-required fields that have not been explicitly defined have a default value of "" (the empty string).
  • Required fields that are unset or have a value of "" should not be allowed and should be flagged as invalid by Portage.
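
The parsing rules above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name resolve_account, the use of plain dictionaries for already-parsed files, and the choice of ValueError are all hypothetical and not part of the proposal.

```python
def resolve_account(global_defaults, alt_defaults, user_data):
    """Layer global defaults, alternate defaults and user settings.

    All three arguments are plain dicts of already-parsed key=value data;
    the function name and the ValueError are illustrative choices.
    """
    merged = dict(global_defaults)
    merged.update(alt_defaults)   # e.g. user-server overrides user
    merged.update(user_data)      # the user file wins over all defaults

    required, alternates = [], {}
    for spec in merged.pop("required", "").split(","):
        names = [n for n in spec.split("|") if n]
        if names:
            required.append(names[0])
            for alt in names[1:]:
                alternates[alt] = names[0]

    # map alternate key names (such as gecos) onto primary names (desc)
    for alt, primary in alternates.items():
        if alt in merged:
            merged.setdefault(primary, merged.pop(alt))

    # a value of "" counts as unset, so it cannot satisfy a required field
    missing = [key for key in required if not merged.get(key)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return merged


resolved = resolve_account(
    {"required": "shell,home,desc|gecos", "shell": "/sbin/nologin"},
    {"home": "/dev/null"},
    {"gecos": "web server"},
)
print(resolved)
```

Because the user file is applied last, setting a required field to "" there unsets an inherited default and is reported as an error.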

User and Group Creation

The commands that Portage actually uses to create users and groups need to be customizable, as they vary by operating system.

Here are some possible mechanisms to implement this functionality, listed in order of personal preference:

  1. Add a plugins directory to profiles and create user-add and group-add scripts within these directories. This allows the user-add and group-add scripts to be different between MacOS X and Linux, for example, while allowing common platforms to re-use existing scripts. Users could override the user-creation behavior by creating /etc/portage/plugins/user-add script.
  2. Add virtual/user-manager to every system profile which would install user-add and group-add commands to a Portage plug-in directory. These commands would be used for creating all users and groups on the system, would have a defined command-line API, and could vary based on OS by tweaking the virtual in the system profile.
  3. Add internal logic to Portage for adding groups and users to various operating systems. I think this solution would be sub-optimal as it is less "tweakable". User and group creation is something that can be useful to tweak in various circumstances, especially by power users.
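
For the first option, script lookup might work along these lines. This is a sketch only: the function name find_plugin and the plugins/ subdirectory layout inside each profile are assumptions, and the command-line interface of the scripts themselves is left undefined here, as it is in the proposal.

```python
import os


def find_plugin(name, profile_dirs, override_dir="/etc/portage/plugins"):
    """Return the path of the first executable plugin script found.

    The user's override directory is checked first, then the profile
    directories in the order given (most specific first). The layout
    "plugins/<name>" inside each profile is an assumption.
    """
    candidates = [os.path.join(override_dir, name)]
    candidates += [os.path.join(d, "plugins", name) for d in profile_dirs]
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# Prints a path if a script is present on this system, otherwise None.
print(find_plugin("user-add", ["/usr/portage/profiles/base"]))
```

Checking the override directory first is what lets a power user replace the user-creation behavior without touching the profile.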

Migration

What remains to be defined is how to transition from enewgroup and enewuser that are currently being called from pkg_setup. The new implementation should be backwards-compatible with the old system to ease transition.

Options:

  1. call pkg_setup during dependency generation and use enewgroup and enewuser wrappers to inject dependency info into the metadata, and emit a deprecation warning. Pass only the user/group name to the new system, which would provide its own UID/GID info. This may not be feasible.
  2. brute-force - grep the ebuild for legacy commands during metadata generation. Integrate new-style dependencies into metadata. This is possibly the least elegant solution but may be the simplest approach.
  3. fallback - tweak the legacy commands to call the new framework. This means that older ebuilds would not be able to have their users and groups created at the same time as new-style ebuilds (dependency fulfillment time.) However, this may be the most elegant solution and also the least hackish.

The last option seems best.

Architecture

Here are the various architectural layers of the implementation:

  1. Portage internals to handle "user/" and "group/" as special words. Would be treated almost identically to ebuilds up until actual merge time. Version specifiers, as well as USE flags, would not be allowed.
  2. Python-based code to parse user and group data in the profiles, and determine proper UID/GID to use on the system. This is the parsing and policy framework, and can be controlled by variables defined in make.conf/make.defaults. This would all be written in Python and integrated into the Portage core.
    1. "Core" Portage trees would use cascading profiles to define users and groups. This would allow variations based on architecture (Portage on MacOS X vs. Linux, for example.)
    2. Overlays would use OVERLAY_DIR/profiles/accounts/users and OVERLAY_DIR/profiles/accounts/groups to define user and group information required for the overlay. This way, overlays could extend users and groups.
  3. Python-based backwards-compatibility code (implementation to be determined)
  4. Profile-based plugin architecture, again python-based.
  5. user-add and group-add scripts, implemented as stand-alone executables (likely written as a shell script.) This is the only part not in python and these scripts do not do any kind of high-level policy decisions. They simply create the user or group and report success or failure.

Possible Changes and Unresolved Issues

Disable User/Group Creation

FEATURES="-auto-accounts" (auto-accounts would be enabled by default)

This is a change from GLEP 27 to get rid of ugly "no" prefix and to follow naming conventions for existing FEATURES settings.

With auto-accounts disabled, Portage will do an initial check using libc (respecting /etc/nsswitch.conf) to see if all depended-upon users and groups exist. If they exist, the user/group dependency will be satisfied and ebuild can continue. If the dependencies are not satisfied, then the ebuild will abort with unsatisfied dependencies and display the users and groups that need to be created, and what their associated settings should be.
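
The existence check described here can be approximated with Python's standard pwd and grp modules, which query accounts through the C library and so should honor /etc/nsswitch.conf. The function name missing_accounts is hypothetical; this is a sketch of the check, not Portage code.

```python
import grp
import pwd


def missing_accounts(users, groups):
    """Return (missing_users, missing_groups) as seen through libc."""
    missing_users = []
    for name in users:
        try:
            pwd.getpwnam(name)
        except KeyError:
            missing_users.append(name)
    missing_groups = []
    for name in groups:
        try:
            grp.getgrnam(name)
        except KeyError:
            missing_groups.append(name)
    return missing_users, missing_groups


# The result depends on the accounts present on the local system.
print(missing_accounts(["lighttpd"], ["web-server"]))
```

An ebuild's account dependencies would be satisfied only when both returned lists are empty.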

Allow User/Group Names to Be Specified At Build Time

Some users may want an nginx user, while others may want a generic www user to be used.

TBD.

Not Elegant for Specific Users/Groups

This implementation looks cool but is potentially annoying for specific users and groups. For example, for an nginx ebuild that needs an nginx user, it would need to be added to the system profile. We probably need to implement ebuild-local user/groups as well.

Specify Required Users and Groups for Profile

Some users and groups must be part of the system and should be in the system set. It would be nice to move some of this out of baselayout and into the profiles directly. Maybe a good solution is to have baselayout RDEPEND on these users and groups.

TBD.

Dependency Prefix

One possible area of improvement is with the user/ and group/ syntax itself, which could be changed slightly to indicate that we are depending on something other than a package. But this is not absolutely necessary and "user" and "group" could be treated as reserved names that cannot be used for categories, since they have a special meaning.

.tbz2 support

In general, the design proposed above will work well for binary packages, as long as the users and groups required by the .tbz2 can be found in the local Portage tree and overlays. If not, then Portage will not have any metadata relating to the user(s) or group(s) that need to be created for the .tbz2 and will not be able to create them, resulting in an install failure, which of course is not optimal.

Therefore, it may be necessary to embed user and group metadata within the .tbz2 and have Portage use this data only if local user/group metadata for the requested users and groups is not available. In addition, this user/group metadata may need to be cached persistently inside /var/db/pkg or another location to ensure that it is continually available to the Portage UID/GID code. This could add a bit more complexity to the implementation but should solve the .tbz2 failure problem. This would create three layers of user/group data:

  1. Core user/group metadata defined in /usr/portage.
  2. Overlay user/group metadata defined in OVERLAY_DIR/profiles/accounts/{users,groups}
  3. Package user/group metadata

Using pseudo-code, we could imagine resolution of user and group metadata at .tbz2 install time to look like this:

all_ug_metadata = profile_ug_metadata + overlay_ug_metadata
if (user_or_group in (all_ug_metadata)):
    return all_ug_metadata[user_or_group]
else:
    return binary_package_ug_metadata[user_or_group]
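
One way to express this layered lookup in runnable Python is with collections.ChainMap, which searches several dictionaries in order. The function name resolve_ug_metadata and the sample data are hypothetical; the ordering assumes that overlays extend the profile data and that the copy embedded in the .tbz2 is consulted only as a last resort.

```python
from collections import ChainMap


def resolve_ug_metadata(name, profile_md, overlay_md, package_md):
    """Look up account metadata, preferring local data over the .tbz2 copy."""
    # ChainMap searches its maps left to right: overlays extend (and may
    # override) the profile data; the package copy is only a fallback.
    layers = ChainMap(overlay_md, profile_md, package_md)
    return layers[name]


profile_md = {"nginx": {"uid": "82"}}
overlay_md = {"myapp": {"uid": "300"}}
package_md = {"nginx": {"uid": "999"}, "legacy": {"uid": "301"}}

print(resolve_ug_metadata("nginx", profile_md, overlay_md, package_md))
print(resolve_ug_metadata("legacy", profile_md, overlay_md, package_md))
```

A name found in none of the three layers raises KeyError, which corresponds to the install failure the text describes.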

Compatibility with other distributions

If our goal is to ensure a sane method of creating UID/GIDs in packages, we should also look at making them compatible with the wider world. The LSB (http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/usernames.html) specifies very lax standards for system accounts. There seem to be no hard standards for system/daemon UID/GIDs, and no one in the community I discussed this issue with showed any real desire to standardize. There is one important issue to note, and that is the lowest user account number.

  • Fedora/RHEL: Presently RHEL starts assigning UID/GIDs to users of the system at 500 and moves up. This will change (http://lists.fedoraproject.org/pipermail/devel/2011-May/151663.html) so that numbering starts at 1000.
  • Debian/Ubuntu: Presently Debian starts assigning UID/GIDs to users of the system at 1000 and moves up. This appears to be the standard that distributions are moving towards.
  • Gentoo/Funtoo: Presently Funtoo and Gentoo both match Debian. Once Fedora 16 and the subsequent RHEL release make the same change, this will be a standard across most major Linux distributions.
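
Since the starting point for regular user accounts differs between distributions, code that needs the boundary could read it from /etc/login.defs rather than hard-coding it. The sketch below parses login.defs-style text; the function names and the fallback value of 1000 are assumptions made for illustration.

```python
def read_login_defs(text):
    """Extract numeric settings such as UID_MIN from login.defs content."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def is_system_uid(uid, defs, fallback_min=1000):
    """True if uid lies below the first UID handed out to regular users."""
    return uid < defs.get("UID_MIN", fallback_min)


sample = """
# Min/max values for automatic uid selection in useradd
UID_MIN   1000
UID_MAX  60000
"""
defs = read_login_defs(sample)
print(defs["UID_MIN"], is_system_uid(37, defs))
```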