Today I Learned

awk – Introduction – Part 2

Yesterday I introduced awk and showed how it could be used to parse and edit structured text files. If you read that, you’ll remember we had a text file called programming_languages.txt that looked like this:

1) Python   Guido    1990
2) Ruby     Yukihiro 1995
3) Erlang   Joe      1986
4) PHP      Rasmus   1995
5) COBOL    Grace    1959
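If you want to follow along, one way to recreate the sample file is with a shell here-document. This is just a convenience sketch; typing the file into any text editor works equally well.

```shell
# Recreate the sample file used throughout this article.
# Quoting 'EOF' stops the shell from expanding anything inside the text.
cat > programming_languages.txt <<'EOF'
1) Python   Guido    1990
2) Ruby     Yukihiro 1995
3) Erlang   Joe      1986
4) PHP      Rasmus   1995
5) COBOL    Grace    1959
EOF
```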

We finished by running a single awk command on it to output the 2nd and 4th fields of each line:

$ awk '{print $2 "\t" $4}' programming_languages.txt
Python	1990
Ruby	1995
Erlang	1986
PHP	1995
COBOL	1959
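The "\t" in that command is just a string: awk joins expressions written side by side with nothing in between. Here is a small sketch comparing this with the comma form of print, which inserts awk's output field separator OFS (a single space by default). Two sample lines are fed in on standard input so the snippet runs without the file.

```shell
# Two lines of the sample data, passed to awk on standard input.
data='1) Python   Guido    1990
2) Ruby     Yukihiro 1995'

# Side-by-side expressions are concatenated: prints Python1990 and Ruby1995.
printf '%s\n' "$data" | awk '{ print $2 $4 }'

# A comma inserts OFS between the values: prints Python 1990 and Ruby 1995.
printf '%s\n' "$data" | awk '{ print $2, $4 }'

# Setting OFS to a tab with -v gives the same output as $2 "\t" $4.
printf '%s\n' "$data" | awk -v OFS='\t' '{ print $2, $4 }'
```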

Let’s make things a little trickier by adding BEGIN and END blocks.

BEGIN

The BEGIN block is executed once, before the file is parsed line by line, so it’s ideal for creating a header or title for our output.

$ awk 'BEGIN {print "Programming Languages\n---------------------"}
> {print $2 "\t" $4}' programming_languages.txt
Programming Languages
---------------------
Python	1990
Ruby	1995
Erlang	1986
PHP	1995
COBOL	1959

Here we start with BEGIN followed by a command between curly braces. It prints a title, then a newline, then a row of dashes. The second block, {print $2 "\t" $4}, then runs once for every line of the file, just as before.
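To see that BEGIN runs only once, and before any input is read, here is a small sketch using two made-up input lines and awk's built-in NR variable (the number of lines read so far). The second command shows that a program made up of only a BEGIN block doesn't read any input, so it needs no file name.

```shell
# BEGIN prints once; the main block then runs for each input line.
printf 'a\nb\n' | awk 'BEGIN { print "start" } { print NR ": " $0 }'
# Output:
# start
# 1: a
# 2: b

# With only a BEGIN block, awk exits without reading any input.
awk 'BEGIN { print "Programming Languages\n---------------------" }'
```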

END

END works in a very similar way, except that its command runs once after the last line has been processed:

$ awk 'BEGIN {print "Programming Languages\n---------------------"}
> {print $2 "\t" $4}
> END {print "---------------------\nEnd of File"}' \
> programming_languages.txt
Programming Languages
---------------------
Python	1990
Ruby	1995
Erlang	1986
PHP	1995
COBOL	1959
---------------------
End of File
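END is also a natural place for a summary. As a sketch, awk’s built-in NR variable holds the total number of lines read by the time END runs, so it can be used for a count in the footer. The sample data is supplied with a here-document so the snippet runs on its own.

```shell
# Print each language name, then a footer with the total number of lines.
awk '{ print $2 }
END { print "---------------------\n" NR " languages" }' <<'EOF'
1) Python   Guido    1990
2) Ruby     Yukihiro 1995
3) Erlang   Joe      1986
4) PHP      Rasmus   1995
5) COBOL    Grace    1959
EOF
```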

Field Separators

By default awk uses whitespace as the field separator. If your file uses something different, you can specify it with an option. For example, take this pipe-separated version of the file above, saved as programming_languages_2.txt:

1)|Python|Guido|1990
2)|Ruby|Yukihiro|1995
3)|Erlang|Joe|1986
4)|PHP|Rasmus|1995
5)|COBOL|Grace|1959

Just add the -F"|" option:

$ awk -F"|" '{print $2 "\t" $4}' programming_languages_2.txt
Python	1990
Ruby	1995
Erlang	1986
PHP	1995
COBOL	1959
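The -F option sets awk’s FS (field separator) variable. A sketch of the equivalent program that sets FS in a BEGIN block instead, which keeps the separator inside the awk program itself. Two sample lines are piped in so the snippet is self-contained.

```shell
# BEGIN runs before the first line is split into fields,
# so setting FS here has the same effect as -F"|".
printf '%s\n' '1)|Python|Guido|1990' '2)|Ruby|Yukihiro|1995' |
  awk 'BEGIN { FS = "|" } { print $2 "\t" $4 }'
```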

Conditional Operators

As I said in part 1 of this article, awk is a programming language, so you can use conditionals such as if statements. For example, to output only the programming languages released in or after 1990:

$ awk '{if ($4 >= 1990) print $2 "\t" $4}' programming_languages.txt
Python	1990
Ruby	1995
PHP	1995

Note the addition of “if ($4 >= 1990)”, which reads “if field 4 is greater than or equal to 1990, then print”.
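awk also lets you write the condition as a pattern in front of the action, with no if at all; the action then runs only for lines where the pattern is true. A sketch of the same filter in that style, with the sample data piped in so it runs on its own:

```shell
# Pattern-action form: print only lines whose 4th field is 1990 or later.
printf '%s\n' \
  '1) Python   Guido    1990' \
  '2) Ruby     Yukihiro 1995' \
  '3) Erlang   Joe      1986' \
  '4) PHP      Rasmus   1995' \
  '5) COBOL    Grace    1959' |
  awk '$4 >= 1990 { print $2 "\t" $4 }'
```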

More Complicated Commands

As the commands get more complicated, they become tricky to keep track of on the command line. Instead, you can store them in a file and pass that file to awk with the -f option. For example, if you saved your commands in a file called ‘my_commands’, you could run it like this:

$ awk -F"|" -f my_commands programming_languages_2.txt
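As a sketch, a my_commands file holding the BEGIN, main and END blocks from earlier might look like the one below; the exact contents are my assumption, and awk treats text after # as a comment. The snippet writes that file, recreates the pipe-separated sample file, and runs the two together.

```shell
# Write the awk program to a file named my_commands.
cat > my_commands <<'EOF'
# Header, printed once before any input is read
BEGIN { print "Programming Languages\n---------------------" }
# Main block, run for every line
{ print $2 "\t" $4 }
# Footer, printed once after the last line
END { print "---------------------\nEnd of File" }
EOF

# Recreate the pipe-separated sample file.
printf '%s\n' '1)|Python|Guido|1990' '2)|Ruby|Yukihiro|1995' \
  '3)|Erlang|Joe|1986' '4)|PHP|Rasmus|1995' '5)|COBOL|Grace|1959' \
  > programming_languages_2.txt

awk -F"|" -f my_commands programming_languages_2.txt
```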

Conclusion

There’s so much to awk that I couldn’t possibly include it all in an article like this. Hopefully I’ve given you enough of a taster that you’ll recognise how useful it can be.