Use Python in the shell, as if it were awk
My first technical article was about how to use awk. For some reason, I like learning bash even if it doesn't always make sense. I posted the article on Hacker News and somebody suggested using Perl one-liners. I thought that was a good idea, but with Python rather than Perl.
After starting to work on it, I realized that Python is not well suited to command-line one-liners. As soon as you need an indented block (like if), there is no way around a line break. Even so, I think it will be a good exercise. Awk also lets you provide the script in a file, so it might not be a big deal in the end.
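To make the limitation concrete, here is a small sketch that uses the built-in compile() to check which one-liners Python accepts. The helper name is_valid_one_liner is made up for this illustration. It shows that a compound statement such as if cannot follow a ";", while a conditional expression can cover simple cases.

```python
def is_valid_one_liner(source):
    """Return True if `source` compiles as Python code (hypothetical helper)."""
    try:
        compile(source, "<one-liner>", "exec")
        return True
    except SyntaxError:
        return False

# Simple statements can be chained with ";".
print(is_valid_one_liner("x = 1; print(x)"))
# A compound statement (if, for, while) cannot follow ";".
print(is_valid_one_liner("x = 1; if x: print(x)"))
# A conditional expression is one way around it for simple cases.
print(is_valid_one_liner("x = 1; print(x) if x else None"))
```

The three calls print True, False and True. Anything more involved than a conditional expression or a comprehension still needs a real line break.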
Basic usage
The first thing you need is to run Python with some input as code. This is easily done with the -c option:
python -c 'print(1 + 1)'
The most common usage of awk is to print a few fields out of a file, for example:
$ cat file.in
Name, Population(2020), Population(2020), Land area(sq mi),
Los Angeles, 3898747, 3792621, 469.49
San Diego, 1386932, 1307402, 325.88
San Jose, 1013240,945942, 178.26
San Francisco, 873965, 805235, 46.91
$ awk -F, '{print $1 $3}' file.in
Name Population(2020)
Los Angeles 3792621
San Diego 1307402
San Jose945942
San Francisco 805235
The same thing in Python will look like:
$ python -c 'for line in open("file.in"): fields=line.split(","); print(fields[0].strip(), fields[2].strip())'
Name Population(2020)
Los Angeles 3792621
San Diego 1307402
San Jose 945942
San Francisco 805235
The Python version is a bit more complex, and since it is longer it takes more effort to type when it is not part of a script. Another inconvenience is that each field needs to be stripped. We also used ";" to put multiple statements on the same line.
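One way to hide the stripping is a tiny helper. This is only a sketch; split_fields is a name invented here, not something from awk or from the standard library.

```python
def split_fields(line, sep=","):
    """Split a line on `sep` and strip whitespace around every field."""
    return [field.strip() for field in line.split(sep)]

# One row from file.in, including the trailing newline a file would give us.
row = split_fields("San Jose, 1013240,945942, 178.26\n")
print(row[0], row[2])  # prints: San Jose 945942
```

With a helper like this the per-field strip() calls disappear from the one-liner, at the cost of keeping the helper somewhere importable.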
BEGIN and END
Awk has BEGIN and END actions. These are very useful if you need to compute sums or averages.
$ seq 99 101 >numbers.txt
$ awk '{sum=sum+$1} END {print "Sum: "sum" Avg: "sum/NR}' numbers.txt
Sum: 300 Avg: 100
The Python version looks like this:
$ python -c 'sum=0; NR=0
for line in open("numbers.txt"): sum+=int(line); NR+=1
print("Sum:", sum, "Avg:", sum/NR)'
Sum: 300 Avg: 100.0
The awk version fits on one line, while the Python version needs multiple lines and explicit variable initialization. At this point, I started to think that, even with multi-line commands, it would be better to have a small utility script that deals with the parsing and initializes the standard awk variables.
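The correspondence between awk's phases and plain Python can be sketched as a function. The name sum_and_average and the in-memory list of lines are assumptions made for the example; a real run would iterate over an open file instead.

```python
def sum_and_average(lines):
    # BEGIN: initialize the accumulators before any input is read.
    total = 0
    nr = 0
    # Main action: runs once per input line, like an awk '{...}' block.
    for line in lines:
        total += int(line)  # int() tolerates the trailing newline
        nr += 1
    # END: runs once after the last line (fails on empty input, nr == 0).
    return total, total / nr

total, avg = sum_and_average(["99\n", "100\n", "101\n"])
print("Sum:", total, "Avg:", avg)  # prints: Sum: 300 Avg: 100.0
```

Awk initializes sum and NR implicitly; in Python the BEGIN part has to be written out, which is a large part of why the one-liner grows.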
import sys

def arg_param_value(i):
    # Value that follows an option such as -F or -f ("" if it is missing)
    if i < len(sys.argv):
        return sys.argv[i]
    else:
        return ""

def parse_params():
    # Options (-X value) go into args; everything else goes into inputs
    i = 1
    while i < len(sys.argv):
        arg = sys.argv[i]
        if arg.startswith('-'):
            i += 1
            args[arg] = arg_param_value(i)
        else:
            inputs.append(sys.argv[i])
        i += 1

def code():
    # The program comes from the -f file or from the first positional argument
    if '-f' in args:
        with open(args['-f'], "r") as f:
            return f.read()  # exec() needs a string, not a list of lines
    else:
        return inputs.pop(0)  # removes first element from the list

def process_file():
    global NR
    FNR = 0
    with open(FILENAME, "r") as file:
        for line in file:
            line = line.strip()
            if len(line) == 0:
                continue
            if FS == '':
                fields = line.split()
            else:
                fields = line.split(FS)
            NF = len(fields)
            FNR += 1
            NR += 1
            exec(code)  # the snippet can read line, fields, NF, FNR and NR

args = {}
inputs = []
parse_params()
code = code()
NR = 0
FS = ''
if '-F' in args: FS = args['-F']
for FILENAME in inputs:
    process_file()
And I run it like this:
$ python main.py -F ";" "print(fields[2], fields[3])" in1.csv in2.csv
Of course, awk has more functionality. I will slowly add it to my script and, hopefully, turn it into something usable. There won't be a $0, $1, $2 ... $n; instead, I use line for $0 and fields[] for $1, $2 and all the others (fields is zero-indexed, so $1 is fields[0]).
After doing all this, I think Python is useful for these kinds of tasks, especially if you already use it in other places. The main advantage is that you don't have to learn awk if you already know Python. A more complete version is on GitHub: https://github.com/liviusd/pawk
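As an illustration of how line, fields, NF and NR can be handed to a user snippet, here is a minimal sketch that passes an explicit namespace dictionary to exec(). The function run_snippet and its parameters are hypothetical and are not taken from the script or from the GitHub repository; it works on a string instead of files to stay self-contained.

```python
def run_snippet(snippet, text, fs=None):
    """Run `snippet` once per non-empty line of `text`, awk style."""
    compiled = compile(snippet, "<snippet>", "exec")  # compile once, run many
    namespace = {"NR": 0}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(fs)  # fs=None splits on runs of whitespace
        namespace["NR"] += 1
        # line plays the role of $0, fields[0] of $1, and so on.
        namespace.update(line=line, fields=fields, NF=len(fields))
        exec(compiled, namespace)
    return namespace

# Prints "1 a 3" and then "2 d 2"; the blank line is skipped.
result = run_snippet("print(NR, fields[0], NF)", "a b c\n\nd e\n")
```

Because the snippet runs with the dictionary as its global namespace, any variable it assigns (a running total, for example) is still there on the next line, which is close to how awk variables behave.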
Please let me know what you think about it!