Use python in shell, as if it were awk

My first technical article was about how to use awk. For some reason, I like learning bash even when it doesn't always make sense. I posted that article on Hacker News, and somebody suggested using Perl one-liners. I thought that was a good idea, but with Python rather than Perl.

After starting to work on it, I realized that Python is not well suited to command-line one-liners. As soon as you need an indented block (like an if statement), there is no way around a line break. Even so, I think it is a good exercise. Awk also lets you provide the script in a file, so it might not be a big deal in the end.

Basic usage

The first thing you need is to run Python with some input as code. This is easily done with the -c option:

python -c 'print(1 + 1)'

The most common usage of awk is to print a few fields out of a file, for example:

$ cat file.in
Name, Population(2020), Population(2010), Land area(sq mi),
Los Angeles, 3898747, 3792621, 469.49
San Diego, 1386932, 1307402, 325.88
San Jose, 1013240,945942, 178.26
San Francisco, 873965, 805235, 46.91

$ awk -F, '{print $1 $3}' file.in
Name Population(2010)
Los Angeles 3792621
San Diego 1307402
San Jose945942
San Francisco 805235

A similar command in Python, here printing the first and fourth fields, looks like:

$ python -c 'for line in open("file.in"): fields=line.split(","); print(fields[0].strip(), fields[3].strip())'
Name Land area(sq mi)
Los Angeles 469.49
San Diego 325.88
San Jose 178.26
San Francisco 46.91

The Python version is more complex, and because it is longer it is more tedious to type when it is not part of a script. Another inconvenience is that each field needs to be stripped. We also used ";" to put multiple statements on the same line.
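The stripping can be avoided with the standard csv module, whose skipinitialspace option drops the blanks that follow each delimiter. Here is a minimal sketch, assuming the same layout as file.in; the helper name pick_columns and the inline sample are only for illustration:

```python
import csv
import io

# Inline copy of two rows from file.in so the sketch runs on its own.
SAMPLE = "Los Angeles, 3898747, 3792621, 469.49\nSan Jose, 1013240,945942, 178.26\n"


def pick_columns(lines, indexes, delimiter=","):
    """Yield the selected columns of each row (0-based, like fields[])."""
    # skipinitialspace=True ignores whitespace right after each delimiter.
    for row in csv.reader(lines, delimiter=delimiter, skipinitialspace=True):
        yield [row[i] for i in indexes]


for name, area in pick_columns(io.StringIO(SAMPLE), [0, 3]):
    print(name, area)
```

Passing open("file.in") instead of the StringIO object should work the same way, since csv.reader accepts any iterable of lines.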

BEGIN and END

Awk has BEGIN and END actions, which run before the first line and after the last one. These are very useful when you need sums or averages.

$ seq 99 101 >numbers.txt
$ awk '{sum=sum+$1} END {print "Sum: "sum" Avg: "sum/NR}' numbers.txt
Sum: 300 Avg: 100

The Python version looks like:

$ python -c 'sum=0; NR=0
for line in open("numbers.txt"): sum+=int(line); NR+=1
print("Sum:",sum," Avg:",sum/NR)'
Sum: 300  Avg: 100.0

The awk version fits on one line, while the Python version needs multiple lines and explicit variable initialization. At this point, I started to think that, even with these multi-line commands, it would be better to have a small utility script that deals with the parsing and initializes the standard awk variables.
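Awk's implicit per-line action plus an END block maps onto a plain loop followed by a final statement. Here is a rough sketch, assuming one integer per line; sum_and_avg is a made-up helper name, and total is used instead of sum to avoid shadowing the built-in:

```python
def sum_and_avg(lines):
    """Return (sum, average) of the integers found one per line."""
    total = 0  # awk initializes this implicitly
    nr = 0     # record counter, like awk's NR
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        total += int(line)
        nr += 1
    # This part plays the role of awk's END block.
    avg = total / nr if nr else 0.0
    return total, avg


print(sum_and_avg(["99\n", "100\n", "101\n"]))  # (300, 100.0)
```

Any iterable of lines works, so sum_and_avg(open("numbers.txt")) or sum_and_avg(sys.stdin) would be the file and pipe variants.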

import sys

def arg_param_value(i):
    if i < len(sys.argv):
        return sys.argv[i]
    else:
        return ""


def parse_params():
    i = 1
    while i < len(sys.argv):
        arg = sys.argv[i]
        if arg.startswith('-'):
            i += 1
            args[arg] = arg_param_value(i)
        else:
            inputs.append(sys.argv[i])
        i += 1


def code():
    if '-f' in args:
        with open(args['-f'], "r") as f:
            return f.read()  # exec() needs a single string, not a list of lines
    else:
        return inputs.pop(0)  # the first positional argument is the program text


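def process_file():
    global NR  # total record count across all input files
    FNR = 0    # record count within the current file
    with open(FILENAME, "r") as file:
        for line in file:
            line = line.strip()
            if len(line) == 0:
                continue  # skip empty lines
            if FS == '':
                fields = line.split()  # no -F given: split on whitespace
            else:
                fields = line.split(FS)
            NF = len(fields)
            FNR += 1
            NR += 1
            exec(code)  # the snippet can read line, fields, NF, FNR and NR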

args = {}
inputs = []

parse_params()
code = code()
NR = 0
FS = ''
if '-F' in args: FS = args['-F']

for FILENAME in inputs:
    process_file()

And I run it like this:

$ python main.py -F ";" "print(fields[2], fields[3])" in1.csv in2.csv
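The script relies on exec seeing the local variables of process_file. An alternative worth considering is to compile the snippet once and pass the awk-style variables in an explicit dictionary. The sketch below only illustrates that idea; run_snippet and its parameters are hypothetical and not part of the script above:

```python
def run_snippet(source, lines, fs=None):
    """Run `source` once per non-empty line with awk-like variables set."""
    program = compile(source, "<snippet>", "exec")  # parse only once
    env = {"NR": 0}  # namespace shared by every execution of the snippet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(fs)  # fs=None splits on any whitespace
        env["NR"] += 1
        env.update(line=line, fields=fields, NF=len(fields))
        exec(program, env)
    return env


env = run_snippet("print(NR, fields[1])", ["a;b;c\n", "d;e;f\n"], fs=";")
```

Because the same dictionary is reused for every line, a variable assigned by the snippet on one line is still there on the next, which is roughly how awk variables behave.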

Of course, awk has more functionality. I will slowly add it to my script, and hopefully I can turn it into something usable. There won't be $0, $1, $2 ... $n; instead, I will use line for $0 and fields[] for $1, $2, and all the others.
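The mapping is shifted by one, which is easy to get wrong when translating an awk command. Here is a small sketch of the correspondence; the dollar helper is hypothetical and only illustrates the index shift:

```python
def dollar(n, line, fields):
    """Return what awk's $n would be, given the script's line and fields."""
    if n == 0:
        return line           # $0 is the whole record
    if 1 <= n <= len(fields):
        return fields[n - 1]  # $1 is fields[0], $2 is fields[1], ...
    return ""                 # awk gives an empty string past $NF


line = "San Diego;1386932;325.88"
fields = line.split(";")
print(dollar(1, line, fields), dollar(3, line, fields))  # San Diego 325.88
```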

After doing all this, I think Python is useful for these kinds of tasks, especially if you already use it elsewhere. The main advantage is that you don't have to learn awk if you already know Python. A more complete version is on GitHub: https://github.com/liviusd/pawk

Please let me know what you think about it!