Today I Learned

awk – Introduction – Part 1

I’ve been using Linux for 10 years and I’ve avoided looking at awk. Like most people I tend to reach for the tool that feels most familiar. If I want a quick script for processing a text file I’ll go to Python. Today I decided to bite the bullet and take a look at what I was missing out on.

What is awk?

awk isn’t just a Linux command; it is in fact a programming language. It’s primarily used for text and string manipulation within shell scripts. It’s particularly useful when the text can be viewed as having records and fields. For example, consider the following text file called languages.txt. It lists programming languages, their creators and the year they were released.

1) Python   Guido    1990
2) Ruby     Yukihiro 1995
3) Erlang   Joe      1986
4) PHP      Rasmus   1995
5) COBOL    Grace    1959

In awk speak, each line is a ‘record’ and each column is a ‘field’.

By default awk will split each record into fields using whitespace, although you can specify your own delimiter if required. The awk command looks like this:

awk options program file
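As a brief aside, the custom delimiter mentioned above is set with one such option, -F. The following is a minimal sketch, assuming a hypothetical comma-separated copy of the data (the file contents and the temporary file are made up for illustration). $2 refers to the second field, which is covered further down.

```shell
# Hypothetical comma-separated data, written to a temp file for illustration
csv=$(mktemp)
cat > "$csv" <<'EOF'
Python,Guido,1990
Ruby,Yukihiro,1995
EOF

# -F sets the field separator: fields are split on commas instead of whitespace
creators=$(awk -F',' '{print $2}' "$csv")
echo "$creators"

rm -f "$csv"
```

With this input the command prints Guido and Yukihiro, one per line.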

We can ignore the ‘options’ for now so we can jump straight into the ‘program’.

The Program

The Flow

An awk program has three main parts, BEGIN, the body and END, and executes as follows:

  • Execute the BEGIN commands.
  • Loop through each line in the file and execute the body commands on each.
  • Once all lines have been processed, execute the END commands.
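The three steps above can be sketched as a single command. This is only an illustration: the temporary file recreates the sample data from this post, and count is a variable name I picked for the example, not something awk defines.

```shell
# Recreate the sample data from this post in a temp file
file=$(mktemp)
cat > "$file" <<'EOF'
1) Python   Guido    1990
2) Ruby     Yukihiro 1995
3) Erlang   Joe      1986
4) PHP      Rasmus   1995
5) COBOL    Grace    1959
EOF

# BEGIN runs once before any input is read, the middle block runs once
# per line, and END runs once after the last line.
# count is an ordinary awk variable (it starts out empty, which counts as 0).
report=$(awk 'BEGIN { print "start" }
                    { count++ }
              END   { print count " lines read" }' "$file")
echo "$report"

rm -f "$file"
```

This should print "start" followed by "5 lines read", showing that BEGIN ran before the first line and END after the last.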

Assigned Variables

As each line is read, awk assigns each field to a variable in numerical order.

  • $0 – the full line
  • $1 – the first field
  • $2 – the second field, and so on

So in the first line of our example above, awk would split it as follows:

  • $0 – 1) Python   Guido    1990
  • $1 – 1)
  • $2 – Python
  • $3 – Guido
  • $4 – 1990
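One way to check the split yourself is to feed that first line to awk and print each variable in brackets. This is a small sketch; the line and fields shell variables are names chosen just for this example.

```shell
# First line of the sample data from this post
line='1) Python   Guido    1990'

# Print the whole record, then each whitespace-separated field, in brackets.
# Inside the awk program, "$0 = [" is a plain string; the bare $0 is the variable.
fields=$(printf '%s\n' "$line" | awk '{ print "$0 = [" $0 "]"
                                        print "$1 = [" $1 "]"
                                        print "$2 = [" $2 "]"
                                        print "$3 = [" $3 "]"
                                        print "$4 = [" $4 "]" }')
echo "$fields"
```

Note that $1 is the "1)" prefix, which is why the language name ends up in $2 and the year in $4.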

So a very simple awk program to print all the programming languages and the year they were released would look like this:

$ awk '{print $2 "\t" $4}' languages.txt
Python	1990
Ruby	1995
Erlang	1986
PHP	1995
COBOL	1959

This is saying: “for each line of the file, print field 2 then a tab then field 4.”

Conclusion

Today I’ve introduced awk and its basic structure, and shown a basic example. In my next post I’ll show some other features which may prove useful.