Difference between pages "Awk by Example, Part 2" and "Welcome"

Line 1: Line 1:
{{Article
+
<div class="container" style="font-family: Open Sans; font-size: 14px; line-height: 20px;"><div class="row"><div class="col-xs-12 col-md-8 col-lg-8">
|Author=Drobbins
+
{{Slideshow}}
|Previous in Series=Awk by Example, Part 1
+
</div><div class="col-xs-12 col-md-4 col-lg-4">
|Next in Series=Awk by Example, Part 3
+
'''Funtoo Linux''' is a Linux-based operating system that is a variant of [http://en.wikipedia.org/wiki/Gentoo_Linux Gentoo Linux], led by [[User:Drobbins|Daniel Robbins]] (the creator and former Chief Architect of Gentoo) who serves as benevolent dictator for life (BDFL) of the project. ''Funtoo Linux is optimized for the best possible performance, supporting Intel Core i7, AMD FX Processors, and others.''  [[Subarches|See what we support.]] See [[#Distinctives|Distinctives]], below, for more information about what makes us special.
}}
+
== Records, loops, and arrays ==
+
  
=== Multi-line records ===
+
'''Other Funtoo Projects include''':
Awk is an excellent tool for reading in and processing structured data, such as the system's /etc/passwd file. /etc/passwd is the UNIX user database, and is a colon-delimited text file, containing a lot of important information, including all existing user accounts and user IDs, among other things. In my previous article, I showed you how awk could easily parse this file. All we had to do was to set the FS (field separator) variable to ":".
+
* '''[[Keychain]]''', an SSH/GPG agent front-end.
 +
* '''[[Metro]]''', automated Funtoo build engine.
 +
* '''[[Linux_Fundamentals,_Part_1|Learn Linux]]'''! [[Awk_by_Example,_Part_1|Awk]], [[Bash_by_Example,_Part_1|Bash]], [[Sed_by_Example,_Part_1|Sed]]  and more.
  
By setting the FS variable correctly, awk can be configured to parse almost any kind of structured data, as long as there is one record per line. However, just setting FS won't do us any good if we want to parse a record that exists over multiple lines. In these situations, we also need to modify the RS record separator variable. The RS variable tells awk when the current record ends and a new record begins.
 
  
As an example, let's look at how we'd handle the task of processing an address list of Federal Witness Protection Program participants:
+
'''Ebuild pages recently updated:''' {{#ask: [[Category:Ebuilds]]
<pre>
+
| order=descending
Jimmy the Weasel
+
| sort=Modification date
100 Pleasant Drive
+
| format=list
San Francisco, CA 12345
+
| limit=10
 +
| searchlabel=
 +
}} [[Ebuilds|more...]]
  
Big Tony
+
'''Want to submit a screenshot? [http://forums.funtoo.org/index.php?/topic/180-screenshots/ See here.]'''
200 Incognito Ave.
+
</div></div><div class="row"><div class="col-xs-12">
Suburbia, WA 67890
+
{{Announce|[[Support Funtoo]] and help us grow! '''Donate $15 per month and get a free SSD-based [[Funtoo Hosting|Funtoo Virtual Container]].'''}}
</pre>
+
</div></div><div class="row"><div class="col-xs-12 col-md-4 col-lg-4">
Ideally, we'd like awk to recognize each 3-line address as an individual record, rather than as three separate records. It would make our code a lot simpler if awk would recognize the first line of the address as the first field ($1), the street address as the second field ($2), and the city, state, and zip code as field $3. The following code will do just what we want:
+
=== News ===
<pre>
+
{{NewsList|3}}
BEGIN {
+
[[News|View More News...]]
    FS="\n"
+
    RS=""
+
}
+
</pre>
+
Above, setting FS to "\n" tells awk that each field appears on its own line. By setting RS to "", we also tell awk that each address record is separated by a blank line. Once awk knows how the input is formatted, it can do all the parsing work for us, and the rest of the script is simple. Let's look at a complete script that will parse this address list and print out each address record on a single line, separating each field with a comma.
+
<pre>
+
BEGIN {
+
    FS="\n"
+
    RS=""
+
}
+
{ print $1 ", " $2 ", " $3 }
+
</pre>
+
If this script is saved as address.awk, and the address data is stored in a file called address.txt, you can execute this script by typing awk -f address.awk address.txt. This code produces the following output:
+
<pre>
+
Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345
+
Big Tony, 200 Incognito Ave., Suburbia, WA 67890
+
</pre>
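To try this without creating any files, here's a minimal sketch that pipes the sample addresses straight into awk from the shell. The function name count_lines is made up for illustration, and the sketch assumes a POSIX-style shell and awk. It prints the number of fields awk found in each record, followed by the first field:

```shell
# count_lines is a hypothetical helper name, not part of awk.
# With RS set to the empty string, awk treats blank-line-separated
# blocks as records; with FS set to a newline, each line of a block
# becomes one field.
count_lines() {
    awk 'BEGIN { FS = "\n"; RS = "" } { print NF ": " $1 }'
}

printf 'Jimmy the Weasel\n100 Pleasant Drive\nSan Francisco, CA 12345\n\nBig Tony\n200 Incognito Ave.\nSuburbia, WA 67890\n' | count_lines
```

Each address should be reported as a single record with three fields, which is a quick way to confirm that FS and RS are doing what we expect before writing the rest of the script.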
+
  
=== OFS and ORS ===
+
=== Expand the wiki! ===
In address.awk's print statement, you can see that awk concatenates (joins) strings that are placed next to each other on a line. We used this feature to insert a comma and a space (", ") between the three address fields that appeared on the line. While this method works, it's a bit ugly looking. Rather than inserting literal ", " strings between our fields, we can have awk do it for us by setting a special awk variable called OFS. Take a look at this code snippet.
+
<pre>
+
print "Hello", "there", "Jim!"
+
</pre>
+
  
The commas on this line are not part of the actual literal strings. Instead, they tell awk that "Hello", "there", and "Jim!" are separate fields, and that the OFS variable should be printed between each string. By default, awk produces the following output:
+
The [[:Help:Funtoo_Editing_Guidelines | How to 'wiki']] will help get you started on wiki editing. Have a look at [[Requested-Documents]] and [[:Category:Needs_Updates | pages that need to be updated.]]
<pre>
+
Hello there Jim!
+
</pre>
+
This shows us that by default, OFS is set to " ", a single space. However, we can easily redefine OFS so that awk will insert our favorite field separator. Here's a revised version of our original address.awk program that uses OFS to output those intermediate ", " strings:
+
<pre>
+
BEGIN {
+
    FS="\n"
+
    RS=""
+
    OFS=", "
+
}
+
{ print $1, $2, $3 }
+
</pre>
+
Awk also has a special variable called ORS, the "output record separator". By setting ORS, which defaults to a newline ("\n"), we can control the string that's automatically printed at the end of a print statement. The default ORS value causes awk to output each new print statement on a new line. If we wanted to make the output double-spaced, we would set ORS to "\n\n". Or, if we wanted records to be separated by a single space (and no newline), we would set ORS to " ".
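Here's a small sketch that sets both variables at once, so you can see which one ends up where. The helper name respace and the sample input are assumptions made for this example only:

```shell
# respace is a hypothetical helper name used only for this sketch.
# OFS is printed between the comma-separated arguments of print;
# ORS is printed once at the end of every print statement.
respace() {
    awk 'BEGIN { OFS = ", "; ORS = "\n\n" } { print $1, $2 }'
}

printf 'Hello there\nBig Tony\n' | respace
```

The two words on each input line come back joined by ", ", and every output line is followed by an extra blank line because ORS is "\n\n".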
+
  
=== Multi-line to tabbed ===
+
See [[:Category:Ebuilds|Ebuilds]] for a list of all ebuild pages, and [[Adding an Ebuild to the Wiki]] for information on how to add one.
Let's say that we wrote a script that converted our address list to a single-line per record, tab-delimited format for import into a spreadsheet. After using a slightly modified version of address.awk, it would become clear that our program only works for three-line addresses. If awk encountered the following address, the fourth line would be thrown away and not printed:
+
</div><div class="col-sm-12 col-xs-12 col-md-4 col-lg-4">
<pre>
+
=== Distinctives ===
Cousin Vinnie
+
Vinnie's Auto Shop
+
300 City Alley
+
Sosueme, OR 76543
+
</pre>
+
To handle situations like this, it would be good if our code took the number of fields per record into account, printing each one in order. Right now, the code only prints the first three fields of the address. Here's some code that does what we want:
+
<pre>
+
BEGIN {
+
    FS="\n"  
+
    RS=""
+
    ORS=""
+
}
+
+
{
+
    x=1
+
    while ( x<NF ) {
+
        print $x "\t"
+
        x++
+
    }
+
    print $NF "\n"
+
}
+
</pre>
+
First, we set the field separator FS to "\n" and the record separator RS to "" so that awk parses the multi-line addresses correctly, as before. Then, we set the output record separator ORS to "", which will cause the print statement to not output a newline at the end of each call. This means that if we want any text to start on a new line, we need to explicitly write print "\n".
+
  
In the main code block, we create a variable called x that holds the number of the current field that we're processing. Initially, it's set to 1. Then, we use a while loop (an awk looping construct identical to that found in the C language) to iterate through all but the last field, printing each field followed by a tab character. Finally, we print the last field and a literal newline; again, since ORS is set to "", print won't output newlines for us. Program output looks like this, which is exactly what we wanted:
+
Funtoo Linux is a meta-distribution, which means it is built (fully automatically) with the functionality and optimizations that ''you'' want, not what some distro maintainer thought was best for you. Packages are installed directly from source code, thanks to the [http://en.wikipedia.org/wiki/Portage_(software) Portage ports system], inspired by the FreeBSD ports system, written in Python and with full advanced package management functionality.  
<pre>
+
Jimmy the Weasel        100 Pleasant Drive      San Francisco, CA 12345
+
Big Tony        200 Incognito Ave.      Suburbia, WA 67890
+
Cousin Vinnie  Vinnie's Auto Shop      300 City Alley  Sosueme, OR 76543
+
</pre>
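As an aside, here's a sketch of a different technique from the explicit loop above. In POSIX awk, assigning to any field, even assigning a field to itself, makes awk rebuild $0 with OFS between the fields, so a record with any number of lines can be joined in one step. The helper name to_tabs is hypothetical:

```shell
# to_tabs is a hypothetical helper name used only for this sketch.
# The assignment $1 = $1 changes nothing in the field itself, but it
# forces awk to rebuild $0 using OFS (a tab here) as the separator.
to_tabs() {
    awk 'BEGIN { FS = "\n"; RS = ""; OFS = "\t" } { $1 = $1; print }'
}

printf "Cousin Vinnie\nVinnie's Auto Shop\n300 City Alley\nSosueme, OR 76543\n" | to_tabs
```

The trade-off is readability: the loop spells out what happens to each field, while this version relies on a side effect of field assignment that not every reader will know.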
+
  
=== Looping constructs ===
+
''Benefits for desktops'': leaner, optimized, faster system. ''Additional benefits for servers'': enable only what you actually need to reduce attack surface, thus improving security.
We've already seen awk's while loop construct, which is identical to its C counterpart. Awk also has a "do...while" loop that evaluates the condition at the end of the code block, rather than at the beginning like a standard while loop. It's similar to "repeat...until" loops that can be found in other languages. Here's an example:
+
<pre>
+
{
+
    count=1
+
    do {
+
        print "I get printed at least once no matter what"
+
    } while ( count != 1 )
+
}
+
</pre>
+
Because the condition is evaluated after the code block, a "do...while" loop, unlike a normal while loop, will always execute at least once. On the other hand, a normal while loop will never execute if its condition is false when the loop is first encountered.
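To see the difference side by side, here's a minimal sketch in which both loops are given a condition that is false from the start. The function name loop_demo and the variable n are invented for this example:

```shell
# loop_demo is a hypothetical helper name used only for this sketch.
loop_demo() {
    awk 'BEGIN {
        n = 0
        # The condition is checked first, so this body never runs.
        while (n > 0) {
            print "while body ran"
            n--
        }
        # The body runs once before the condition is checked.
        do {
            print "do body ran"
        } while (n > 0)
    }'
}

loop_demo
```

Only the do...while body produces a line of output.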
+
  
=== for loops ===
+
We use [http://en.wikipedia.org/wiki/Git_(software) Git] for all our development, and we also use Git to deliver our ports tree to you.
Awk allows you to create for loops, which, like while loops, are identical to their C counterparts:
+
<pre>
+
for ( initial assignment; comparison; increment ) {
+
    code block
+
}
+
</pre>
+
Here's a quick example:
+
<pre>
+
for ( x = 1; x <= 4; x++ ) {
+
    print "iteration",x
+
}
+
</pre>
+
This snippet will print:
+
<pre>
+
iteration 1
+
iteration 2
+
iteration 3
+
iteration 4
+
</pre>
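A for loop doesn't have to count upward. As a small sketch, here's one that walks the fields of each input line from last to first and prints them in reverse order. The helper name reverse_fields and the variable line are assumptions made for illustration:

```shell
# reverse_fields is a hypothetical helper name used only for this sketch.
reverse_fields() {
    awk '{
        # Start with the last field, then append the others in reverse.
        line = $NF
        for (i = NF - 1; i >= 1; i--) {
            line = line " " $i
        }
        print line
    }'
}

printf 'one two three\n' | reverse_fields
```

All three parts of the for statement are ordinary expressions, so the loop variable can start at NF and be decremented just as easily as it can start at 1 and be incremented.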
+
  
=== Break and continue ===
+
In contrast to Gentoo Linux, we offer a number of innovations, including our extensive use of git, [[Funtoo 1.0 Profile|our profile system]], [[Package:Boot-Update|boot-update]] boot management tool, our incredibly flexible [[Funtoo Linux Networking|template-based networking scripts]], [[Metro Quick Start Tutorial|Metro]] distribution build system, support of Debian, RHEL and other kernels, [[Creating_Python-related_Ebuilds|enhanced Python support]], Portage mini-manifests, user-centric distribution model, and a large number of community infrastructure improvements.
Again, just like C, awk provides break and continue statements. These statements provide better control over awk's various looping constructs. Here's a code snippet that desperately needs a break statement:
+
</div><div class="col-sm-12 col-xs-12 col-md-4 col-lg-4">
<pre>
+
=== Getting Started ===
while (1) {
+
    print "forever and ever..."
+
}
+
</pre>
+
Because 1 is always true, this while loop runs forever. Here's a loop that only executes ten times:
+
<pre>
+
x=1
+
while(1) {
+
    print "iteration",x
+
    if ( x == 10 ) {
+
        break
+
    }
+
    x++
+
}
+
</pre>
+
Here, the break statement is used to "break out" of the innermost loop. "break" causes the loop to immediately terminate and execution to continue at the line after the loop's code block.
+
  
The continue statement complements break, and works like this:
+
'''[[Funtoo Linux Installation|Install Funtoo Linux]]''' and get involved in our user community. Get to know fellow users on our '''[http://forums.funtoo.org forums]'''. Funtoo Linux has a very active [http://en.wikipedia.org/wiki/IRC IRC] community on [http://freenode.net Freenode] (in the <code>#funtoo</code> channel) and you are encouraged to hang out with us.
<pre>
+
x=1
+
while (1) {
+
    if ( x == 4 ) {
+
        x++
+
        continue
+
    }
+
    print "iteration",x
+
    if ( x > 20 ) {
+
        break
+
    }
+
    x++
+
}
+
</pre>
+
This code will print "iteration 1" through "iteration 21", except for "iteration 4". If x equals 4, x is incremented and the continue statement is called, which immediately causes awk to start the next loop iteration without executing the rest of the code block. The continue statement works for every kind of awk iterative loop, just as break does. When used in the body of a for loop, continue still causes the loop's increment expression to run before the next iteration. Here's an equivalent for loop:
+
<pre>
+
for ( x=1; x<=21; x++ ) {
+
    if ( x == 4 ) {
+
        continue
+
    }
+
    print "iteration",x
+
}
+
</pre>
+
It wasn't necessary to increment x just before calling continue as it was in our while loop, since the for loop increments x automatically.
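Here's a sketch that uses both statements in one loop over the fields of a line. The marker words "skip" and "stop" and the helper name sum_until_stop are made up for this example; they aren't awk keywords:

```shell
# sum_until_stop is a hypothetical helper name used only for this sketch.
sum_until_stop() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "skip") {
                continue    # ignore this field, move on to the next one
            }
            if ($i == "stop") {
                break       # leave the loop, ignore the rest of the line
            }
            total += $i
        }
        print total
    }'
}

printf '1 2 skip 3 stop 100\n' | sum_until_stop
```

For the sample line, the fields 1, 2, and 3 are added up, "skip" is passed over, and everything from "stop" onward is ignored.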
+
  
=== Arrays ===
+
'''[[Reporting Bugs|We welcome bug reports and suggestions]]'''.  Please report bugs to our '''[http://bugs.funtoo.org bug tracker]'''. We take all bugs seriously, and all work performed is tracked on our bug tracker, for purposes of transparency.
You'll be pleased to know that awk has arrays. However, under awk, it's customary to start array indices at 1, rather than 0:
+
<pre>
+
myarray[1]="jim"
+
myarray[2]=456
+
</pre>
+
When awk encounters the first assignment, myarray is created and the element myarray[1] is set to "jim". After the second assignment is evaluated, the array has two elements.
+
  
Once an array is defined, awk offers a handy mechanism to iterate over its elements, as follows:
+
'''{{CreateAccount}}''', which allows you to log in to the wiki, [http://forums.funtoo.org forums] and [https://bugs.funtoo.org bug tracker]. See the [[Funtoo Authentication FAQ|Auth FAQ]] for more info about account creation.
<pre>
+
for ( x in myarray ) {
+
    print myarray[x]
+
}
+
</pre>
+
This code will print out every element in the array myarray. When you use this special "in" form of a for loop, awk will assign every existing index of myarray to x (the loop control variable) in turn, executing the loop's code block once after each assignment. While this is a very handy awk feature, it does have one drawback -- when awk cycles through the array indices, it doesn't follow any particular order. That means that there's no way for us to know whether the output of the code above will be:
+
<pre>
+
jim
+
456
+
</pre>
+
or
+
<pre>
+
456
+
jim
+
</pre>
+
To loosely paraphrase Forrest Gump, iterating over the contents of an array is like a box of chocolates -- you never know what you're going to get. This has something to do with the "stringiness" of awk arrays, which we'll now take a look at.
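When the indices are known to run from 1 to some count, one way around the ordering problem is to skip the "in" form and count through the indices yourself. Here's a minimal sketch that uses awk's standard split() function to fill an array and then walks it in index order; the names ordered_demo and names are invented for the example:

```shell
# ordered_demo is a hypothetical helper name used only for this sketch.
ordered_demo() {
    awk 'BEGIN {
        # split() stores the pieces in names[1], names[2], ... and
        # returns how many pieces it found.
        n = split("jim 456 bob", names, " ")
        for (i = 1; i <= n; i++) {
            print i, names[i]
        }
    }'
}

ordered_demo
```

This only helps for arrays with consecutive numeric indices, which is one reason the convention of starting at 1 is worth following.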
+
  
=== Array index stringiness ===
+
'''See our [[Funtoo Linux FAQ|FAQ]] for answers to common questions.'''
[[Awk by example, Part1 |In my previous article]], I showed you that awk actually stores numeric values in a string format. While awk performs the necessary conversions to make this work, it does open the door for some odd-looking code:
+
<pre>
+
a="1"
+
b="2"
+
c=a+b+3
+
</pre>
+
After this code executes, c is equal to 6. Since awk is "stringy", adding strings "1" and "2" is functionally no different than adding the numbers 1 and 2. In both cases, awk will successfully perform the math. Awk's "stringy" nature is pretty intriguing -- you may wonder what happens if we use string indexes for arrays. For instance, take the following code:
+
<pre>
+
myarr["1"]="Mr. Whipple"
+
print myarr["1"]
+
</pre>
+
As you might expect, this code will print "Mr. Whipple". But how about if we drop the quotes around the second "1" index?
+
<pre>
+
myarr["1"]="Mr. Whipple"
+
print myarr[1]
+
</pre>
+
Guessing the result of this code snippet is a bit more difficult. Does awk consider myarr["1"] and myarr[1] to be two separate elements of the array, or do they refer to the same element? The answer is that they refer to the same element, and awk will print "Mr. Whipple", just as in the first code snippet. Although it may seem strange, behind the scenes awk has been using string indexes for its arrays all this time!
+
  
After learning this strange fact, some of us may be tempted to execute some wacky code that looks like this:
+
Other resources include [http://larrythecow.org larrythecow.org], the Gentoo blog aggregator, [http://kernel-seeds.org kernel-seeds.org], and [http://git.funtoo.org git.funtoo.org], our cgit repository browser.
<pre>
+
</div></div></div>
myarr["name"]="Mr. Whipple"
+
print myarr["name"]
+
</pre>
+
Not only does this code not raise an error, but it's functionally identical to our previous examples, and will print "Mr. Whipple" just as before! As you can see, awk doesn't limit us to using pure integer indexes; we can use string indexes if we want to, without creating any problems. Whenever we use non-integer array indices like myarr["name"], we're using associative arrays. Technically, awk isn't doing anything different behind the scenes than when we use an integer index (since even if you use an "integer" index, awk still treats it as a string). However, you should still call 'em associative arrays -- it sounds cool and will impress your boss. The stringy index thing will be our little secret. ;)
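String indexes are what make the classic awk counting idiom possible. The sketch below tallies how often each value appears in the first field of its input. The helper name count_first_field is hypothetical, and because the "in" loop visits indices in no particular order, the output is piped through the standard sort utility to make it predictable:

```shell
# count_first_field is a hypothetical helper name used only for this sketch.
# count[$1]++ uses the text of the first field as the array index.
count_first_field() {
    awk '{ count[$1]++ } END { for (key in count) print key, count[key] }' | sort
}

printf 'bash\nzsh\nbash\n' | count_first_field
```

No element has to be created ahead of time: the first reference to count["bash"] starts it off as empty, which awk treats as 0 when it's incremented.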
+
  
=== Array tools ===
+
__NOTITLE__
When it comes to arrays, awk gives us a lot of flexibility. We can use string indexes, and we aren't required to have a continuous numeric sequence of indices (for example, we can define myarr[1] and myarr[1000], but leave all other elements undefined). While all this can be very helpful, in some circumstances it can create confusion. Fortunately, awk offers a couple of handy features to help make arrays more manageable.
+
__NOEDITSECTION__
 +
{{#subobject:|slideIndex=0|slideCaption=
 +
<h4>h3nnn4n</h4>
  
First, we can delete array elements. If you want to delete element 1 of your array fooarray, type:
+
Awesome WM / Conky / screenfetch
<pre>
+
|slideImage=File:H3nnn4n.jpg}}
delete fooarray[1]
+
{{#subobject:|slideIndex=1|slideCaption=
</pre>
+
<h4>Help us document the Gentoo Ecosystem!</h4>
And, if you want to see if a particular array element exists, you can use the special "in" boolean operator as follows:
+
From Enoch to Gentoo to Funtoo to ChromeOS, and beyond...
<pre>
+
|slideImage=File:Ecosystem-snapshot.jpg|slideLink=Gentoo Ecosystem}}
if ( 1 in fooarray ) {
+
{{#subobject:|slideIndex=2|slideCaption=
    print "Ayep!  It's there."
+
<h4>brushdemon</h4>
} else {
+
    print "Nope!  Can't find it."
+
}
+
</pre>
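One reason to prefer "in" over simply looking at the element is that, in awk, merely referencing an element that doesn't exist creates it with an empty value. Here's a small sketch of that behavior; the array name seen and the helper name membership_demo are assumptions made for illustration:

```shell
# membership_demo is a hypothetical helper name used only for this sketch.
membership_demo() {
    awk 'BEGIN {
        seen["a"] = 1
        # The in operator tests for existence without creating anything.
        if ("b" in seen) { print "b present" } else { print "b absent" }
        # Reading seen["b"] creates the element as a side effect.
        if (seen["b"] == "") { print "b looks empty" }
        if ("b" in seen) { print "b present now" }
        # delete removes the element again.
        delete seen["b"]
        if (!("b" in seen)) { print "b deleted" }
    }'
}

membership_demo
```

The second test changes the array even though it looks like a harmless comparison, so "in" is the safer choice when all you want to know is whether an index exists.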
+
  
=== Next time ===
+
OpenBox / screenfetch
We've covered a lot of ground in this article. Next time, I'll round out your awk knowledge by showing you how to use awk's math and string functions and how to create your own functions. I'll also walk you through the creation of a checkbook balancing program. Until then, I encourage you to write some of your own awk programs, and to check out the following resources.
+
|slideImage=File:brushdemon.jpg}}
 +
{{#subobject:|slideIndex=3|slideCaption=
 +
<h4>drobbins</h4>
  
== Resources ==
+
[[GNOME First Steps|GNOME 3.14]] / [[Funtoo_Linux_FAQ#Do_you_support_systemd.3F|without systemd]] / Badgers optional
* Read Daniel's other awk articles on Funtoo: Awk By Example, [[Awk by example, Part1|Part 1]] and [[Awk by example, Part3|Part 3]].
+
|slideImage=File:gnome3122.jpg|slideLink=GNOME First Steps}}
* If you'd like a good old-fashioned book, [http://www.oreilly.com/catalog/sed2/ O'Reilly's sed & awk, 2nd Edition] is a wonderful choice.
+
* Be sure to check out the [http://www.faqs.org/faqs/computer-lang/awk/faq/ comp.lang.awk FAQ]. It also contains lots of additional awk links.
+
* Patrick Hartigan's [http://sparky.rice.edu/~hartigan/awk.html awk tutorial] is packed with handy awk scripts.
+
* [http://www.tasoft.com/tawk.html Thompson's TAWK Compiler] compiles awk scripts into fast binary executables. Versions are available for Windows, OS/2, DOS, and UNIX.
+
* [http://www.gnu.org/software/gawk/manual/gawk.html The GNU Awk User's Guide] is available for online reference.
+
  
[[Category:Linux Core Concepts]]
+
{{#subobject:|slideIndex=4|slideCaption=
[[Category:Articles]]
+
<h4>spectromas</h4>
{{ArticleFooter}}
+
 
 +
[[Package:Awesome_(Window_Manager)|Awesome WM]]
 +
|slideImage=File:awesome.jpg|slideLink=Package:Awesome (Window Manager)}}
 +
 
 +
{{#seo:
 +
|title=Funtoo Linux
 +
|keywords=funtoo,linux,gentoo,Daniel Robbins
 +
|description=Funtoo Linux is a Gentoo-based OS that uses a git-based Portage tree. Run by Daniel Robbins, creator of Gentoo.
 +
}}

Revision as of 04:54, January 2, 2015

Funtoo Linux is a Linux-based operating system that is a variant of Gentoo Linux, led by Daniel Robbins (the creator and former Chief Architect of Gentoo) who serves as benevolent dictator for life (BDFL) of the project. Funtoo Linux is optimized for the best possible performance, supporting Intel Core i7, AMD FX Processors, and others. See what we support. See Distinctives, below, for more information about what makes us special.

Other Funtoo Projects include:

* Keychain, an SSH/GPG agent front-end.
* Metro, automated Funtoo build engine.
* Learn Linux! Awk, Bash, Sed and more.


Ebuild pages recently updated: Lightdm, Qjackctl, Firewalld, Odhcploc, Dhcpcd, Quodlibet, NetworkManager, Lilo, Darktable, Zoneminder more...

Want to submit a screenshot? See here.

Support Funtoo and help us grow! Donate $15 per month and get a free SSD-based Funtoo Virtual Container.

News

Better Experiences: Ego and Vim

Info on Funtoo's new personality tool called 'ego', and user-focused updates to vim's defaults.
27 April 2015 by Drobbins

How We're Keeping You At the Center of the Funtoo Universe

Read about recent developments that keep you, our users, at the forefront of our focus as Funtoo moves forward.
10 April 2015 by Drobbins

New OpenGL management in Funtoo

Funtoo is switching to an improved system for managing multiple OpenGL providers (Mesa/Xorg, AMD and NVIDIA). The update may involve blockers and file collisions.
30 March 2015 by Mgorny

View More News...

Expand the wiki!

The How to 'wiki' will help get you started on wiki editing. Have a look at Requested-Documents and pages that need to be updated.

See Ebuilds for a list of all ebuild pages, and Adding an Ebuild to the Wiki for information on how to add one.

Distinctives

Funtoo Linux is a meta-distribution, which means it is built (fully automatically) with the functionality and optimizations that you want, not what some distro maintainer thought was best for you. Packages are installed directly from source code, thanks to the Portage ports system, inspired by the FreeBSD ports system, written in Python and with full advanced package management functionality.

Benefits for desktops: leaner, optimized, faster system. Additional benefits for servers: enable only what you actually need to reduce attack surface, thus improving security.

We use Git for all our development, and we also use Git to deliver our ports tree to you.

In contrast to Gentoo Linux, we offer a number of innovations, including our extensive use of git, our profile system, boot-update boot management tool, our incredibly flexible template-based networking scripts, Metro distribution build system, support of Debian, RHEL and other kernels, enhanced Python support, Portage mini-manifests, user-centric distribution model, and a large number of community infrastructure improvements.

Getting Started

Install Funtoo Linux and get involved in our user community. Get to know fellow users on our forums. Funtoo Linux has a very active IRC community on Freenode (in the #funtoo channel) and you are encouraged to hang out with us.

We welcome bug reports and suggestions. Please report bugs to our bug tracker. We take all bugs seriously, and all work performed is tracked on our bug tracker, for purposes of transparency.

Create a Funtoo account, which allows you to log in to the wiki, forums and bug tracker. See the Auth FAQ for more info about account creation.

See our FAQ for answers to common questions.

Other resources include larrythecow.org, the Gentoo blog aggregator, kernel-seeds.org, and git.funtoo.org, our cgit repository browser.