Yesterday I introduced awk and showed how it could be used to parse and edit structured text files. If you read that, you will remember we had a text file called programming_languages.txt that looked like this:
1) Python Guido 1990
2) Ruby Yukihiro 1995
3) Erlang Joe 1986
4) PHP Rasmus 1995
5) COBOL Grace 1959
We finished by running a single awk command on it to output the 2nd and 4th fields of each line, separated by a tab.
$ awk '{print $2 "\t" $4}' programming_languages.txt
Python 1990
Ruby 1995
Erlang 1986
PHP 1995
COBOL 1959
Let’s make things a little trickier by adding a BEGIN and END command.
BEGIN
The BEGIN command is executed once before the file is parsed line by line, so it’s ideal for creating a header or title for our output.
$ awk 'BEGIN {print "Programming Languages\n---------------------"}
> {print $2 "\t" $4}' programming_languages.txt
Programming Languages
---------------------
Python 1990
Ruby 1995
Erlang 1986
PHP 1995
COBOL 1959
Here you can see we start with a BEGIN statement followed by a command between curly braces. It says to print a title, then a new line, then a row of dashes.
END
END works in a very similar way, except that its command is executed once after the last line has been processed:
$ awk 'BEGIN {print "Programming Languages\n---------------------"}
> {print $2 "\t" $4}
> END {print "---------------------\nEnd of File"}' programming_languages.txt
Programming Languages
---------------------
Python 1990
Ruby 1995
Erlang 1986
PHP 1995
COBOL 1959
---------------------
End of File
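END is also a natural place for totals. As a minimal sketch (this goes slightly beyond the example above), awk's built-in NR variable holds the number of lines read so far, so printing it in END gives a count. The here-document only recreates the sample file so the snippet runs on its own, and the variable name summary is my own choice.

```shell
# Recreate the sample file so this sketch is self-contained.
cat > programming_languages.txt <<'EOF'
1) Python Guido 1990
2) Ruby Yukihiro 1995
3) Erlang Joe 1986
4) PHP Rasmus 1995
5) COBOL Grace 1959
EOF

# BEGIN runs once before the first line, END once after the last.
# NR is awk's built-in count of lines (records) read so far.
summary=$(awk 'BEGIN {print "Programming Languages"}
{print $2 "\t" $4}
END {print "Total: " NR}' programming_languages.txt)

printf '%s\n' "$summary"
```

The last line printed should be the total, because END only runs once every line has been handled.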
Field Separators
By default awk uses white space as the field separator. If your file uses something different, you can specify it as an option. For example, take this pipe-separated version of the file above, saved as programming_languages_2.txt:
1)|Python|Guido|1990
2)|Ruby|Yukihiro|1995
3)|Erlang|Joe|1986
4)|PHP|Rasmus|1995
5)|COBOL|Grace|1959
Just add the -F"|" option:
$ awk -F"|" '{print $2 "\t" $4}' programming_languages_2.txt
Python 1990
Ruby 1995
Erlang 1986
PHP 1995
COBOL 1959
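The separator can also be set from inside the awk program. The sketch below assumes the same pipe-separated file and assigns the built-in FS variable in a BEGIN block, which should behave the same as passing -F on the command line. The variable names with_flag and with_fs are just mine, for comparing the two.

```shell
# Recreate the pipe-separated sample file so this sketch is self-contained.
cat > programming_languages_2.txt <<'EOF'
1)|Python|Guido|1990
2)|Ruby|Yukihiro|1995
3)|Erlang|Joe|1986
4)|PHP|Rasmus|1995
5)|COBOL|Grace|1959
EOF

# Option 1: pass the separator on the command line with -F.
with_flag=$(awk -F'|' '{print $2 "\t" $4}' programming_languages_2.txt)

# Option 2: set the built-in FS variable before any line is read.
with_fs=$(awk 'BEGIN {FS = "|"} {print $2 "\t" $4}' programming_languages_2.txt)

printf '%s\n' "$with_fs"
```

Setting FS in BEGIN is handy once the commands live in a file, because the separator then travels with the program.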
Conditional Operators
As I said in part 1 of this article, awk is a programming language, so you can use conditional operators. For example, to output only programming languages released in or after 1990:
$ awk '{if ($4 >= 1990) print $2 "\t" $4}' programming_languages.txt
Python 1990
Ruby 1995
PHP 1995
See the addition of “if ($4 >= 1990)”, which means “if field 4 is greater than or equal to 1990, then print”.
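awk also lets you put the condition in front of the curly braces instead of using an if inside them. As a rough sketch, the two commands below should select the same lines from the sample file. The variable names with_if and with_pattern are my own.

```shell
# Recreate the sample file so this sketch is self-contained.
cat > programming_languages.txt <<'EOF'
1) Python Guido 1990
2) Ruby Yukihiro 1995
3) Erlang Joe 1986
4) PHP Rasmus 1995
5) COBOL Grace 1959
EOF

# The form used in the article: an if statement inside the braces.
with_if=$(awk '{if ($4 >= 1990) print $2 "\t" $4}' programming_languages.txt)

# The pattern form: the braces only run for lines where the condition holds.
with_pattern=$(awk '$4 >= 1990 {print $2 "\t" $4}' programming_languages.txt)

printf '%s\n' "$with_pattern"
```

Which form to use is mostly a matter of taste. The pattern form is shorter, while if is easier to extend with an else.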
More Complicated Commands
As the commands get more complicated, it can be tricky to keep track of them while writing them on the command line. Instead, you can store the commands in a file and pass that file to awk with the -f option. For example, if you added your commands to a file called ‘my_commands’, you could execute it like this:
$ awk -F"|" -f my_commands programming_languages_2.txt
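I haven't shown what goes in my_commands, so here is one possible version as a sketch. The contents are hypothetical: they simply combine the BEGIN, END and if examples from this article, one per line, with no surrounding quotes needed because the shell never sees them.

```shell
# Recreate the pipe-separated sample file so this sketch is self-contained.
cat > programming_languages_2.txt <<'EOF'
1)|Python|Guido|1990
2)|Ruby|Yukihiro|1995
3)|Erlang|Joe|1986
4)|PHP|Rasmus|1995
5)|COBOL|Grace|1959
EOF

# Hypothetical contents for my_commands, built from the earlier examples.
# The quoted 'EOF' stops the shell from touching $2, $4 or the backslashes.
cat > my_commands <<'EOF'
BEGIN {print "Programming Languages\n---------------------"}
{if ($4 >= 1990) print $2 "\t" $4}
END {print "---------------------\nEnd of File"}
EOF

# -F sets the field separator, -f names the file holding the awk commands.
report=$(awk -F"|" -f my_commands programming_languages_2.txt)

printf '%s\n' "$report"
```

Keeping the program in a file also means you can edit and rerun it without retyping or worrying about shell quoting.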
Conclusion
There’s so much to awk that I couldn’t possibly include it all in an article like this. Hopefully I’ve given you enough of a taster that you’ll recognise how useful it could be.