Yesterday I introduced awk and showed how it could be used to parse and edit structured text files. If you read that you will remember we had a text file called programming_languages.txt and it looked like this:
    1) Python Guido 1990
    2) Ruby Yukihiro 1995
    3) Erlang Joe 1986
    4) PHP Rasmus 1995
    5) COBOL Grace 1959
We finished by running a single awk command on it to output the 2nd and 4th fields of each line.
    $ awk '{print $2 "\t" $4}' programming_languages.txt
    Python  1990
    Ruby    1995
    Erlang  1986
    PHP     1995
    COBOL   1959
Let’s make things a little trickier by adding BEGIN and END blocks.
BEGIN
The BEGIN block is executed once, before the file is parsed line by line, so it’s ideal for creating a header or title for our output.
    $ awk 'BEGIN {print "Programming Languages\n---------------------" } {print $2 "\t" $4}' programming_languages.txt
    Programming Languages
    ---------------------
    Python  1990
    Ruby    1995
    Erlang  1986
    PHP     1995
    COBOL   1959
Here you can see we start with a BEGIN statement followed by a command between curly braces. It says to print a title, then a new line, then a load of dashes.
END
END works in a very similar way, except that its command runs once after the last line has been processed:
    $ awk 'BEGIN {print "Programming Languages\n---------------------" }
    > {print $2 "\t" $4}
    > END {print "---------------------\nEnd of File" }' \
    > programming_languages.txt
    Programming Languages
    ---------------------
    Python  1990
    Ruby    1995
    Erlang  1986
    PHP     1995
    COBOL   1959
    ---------------------
    End of File
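END is also a handy place to print something that is only known once every line has been read, such as a count. Here is a minimal sketch of that idea; it recreates the sample file so it runs on its own, and the footer wording is my own invention. It relies on NR, awk's built-in count of the records read so far.

```shell
# Recreate the sample file so the example is self-contained.
cat > programming_languages.txt <<'EOF'
1) Python Guido 1990
2) Ruby Yukihiro 1995
3) Erlang Joe 1986
4) PHP Rasmus 1995
5) COBOL Grace 1959
EOF

# Print each language and year, then a footer with the total.
# Inside END, NR holds the number of lines that were read.
awk '{ print $2 "\t" $4 }
     END { print "---------------------\n" NR " languages listed" }' programming_languages.txt
```

With the five-line file above, the footer should read "5 languages listed".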
Field Separators
By default awk uses white space as the field separator. If your file uses something different, you can specify it as an option. Take, for example, this pipe-separated version of the file above, saved as programming_languages_2.txt:
    1)|Python|Guido|1990
    2)|Ruby|Yukihiro|1995
    3)|Erlang|Joe|1986
    4)|PHP|Rasmus|1995
    5)|COBOL|Grace|1959
Just add the -F "|" option:
    $ awk -F "|" '{print $2 "\t" $4}' programming_languages_2.txt
    Python  1990
    Ruby    1995
    Erlang  1986
    PHP     1995
    COBOL   1959
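The separator can also be set from inside the awk program itself, which some people find easier to read. The sketch below is one way to do that, assuming the same pipe-separated file (recreated here so the snippet runs on its own). It assigns the built-in FS (input field separator) and OFS (output field separator) variables in a BEGIN block.

```shell
# Recreate the pipe-separated sample file so the example is self-contained.
cat > programming_languages_2.txt <<'EOF'
1)|Python|Guido|1990
2)|Ruby|Yukihiro|1995
3)|Erlang|Joe|1986
4)|PHP|Rasmus|1995
5)|COBOL|Grace|1959
EOF

# FS splits each input line on "|".
# OFS is what print puts between comma-separated arguments, here a tab.
awk 'BEGIN { FS = "|"; OFS = "\t" } { print $2, $4 }' programming_languages_2.txt
```

This should print the same tab-separated name and year pairs as the -F "|" version.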
Conditional Operators
As I said in part 1 of this article, awk is a programming language, so you can use conditional operators. For example, to output only the programming languages released in or after 1990:
    $ awk '{if ($4 >= 1990) print $2 "\t" $4}' programming_languages.txt
    Python  1990
    Ruby    1995
    PHP     1995
See the addition of “if ($4 >= 1990)”, which means “if field 4 is greater than or equal to 1990, then print”.
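awk also lets you put a condition in front of the curly braces as a pattern, so the command only runs for lines that match. This is a short sketch of that alternative form, again recreating the sample file so it runs on its own.

```shell
# Recreate the sample file so the example is self-contained.
cat > programming_languages.txt <<'EOF'
1) Python Guido 1990
2) Ruby Yukihiro 1995
3) Erlang Joe 1986
4) PHP Rasmus 1995
5) COBOL Grace 1959
EOF

# The condition before the braces acts as a filter:
# the print only runs on lines where field 4 is at least 1990.
awk '$4 >= 1990 { print $2 "\t" $4 }' programming_languages.txt
```

It should list Python, Ruby and PHP, just like the "if" version; which form you use is mostly a matter of taste.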
More Complicated Commands
As the commands get more complicated, it can be tricky to keep track of them whilst trying to write them on the command line. What you can do instead is store the commands in a file and pass that file to awk with the -f option. For example, if you added your commands to a file called ‘my_commands’, you could execute it like this:
    $ awk -F "|" -f my_commands programming_languages_2.txt
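To give an idea of what such a file might contain, here is a sketch that combines the pieces from this article. The contents of my_commands are purely hypothetical, and the data file is recreated so the snippet runs on its own.

```shell
# Recreate the pipe-separated sample file so the example is self-contained.
cat > programming_languages_2.txt <<'EOF'
1)|Python|Guido|1990
2)|Ruby|Yukihiro|1995
3)|Erlang|Joe|1986
4)|PHP|Rasmus|1995
5)|COBOL|Grace|1959
EOF

# Hypothetical contents for my_commands: a header, a filtered body and a footer.
# The quoted 'EOF' stops the shell from expanding $2 and $4.
cat > my_commands <<'EOF'
# Lines starting with # are comments inside an awk program file.
BEGIN { print "Programming Languages\n---------------------" }
{ if ($4 >= 1990) print $2 "\t" $4 }
END { print "---------------------\nEnd of File" }
EOF

awk -F "|" -f my_commands programming_languages_2.txt
```

Note that the file holds only the awk program, with no surrounding single quotes, because the shell never sees it.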
Conclusion
There’s so much to awk that I couldn’t possibly include it all in an article like this. Hopefully I’ve given you enough of a taster that you’ll recognise how useful it could be.