The AWK State Machine Parser Pattern
I run the Ganglia Monitoring System on the home LAN to keep an eye on
temperatures, loads, disk usage, network throughput and such on the gateway
and, more recently, my desktop workstation as well. (The primary reason is
that I have noticed some instability, and I would like to know if it’s related
to temperature.) Since temperature is not reported by default, let’s say we
want to parse the output of lm_sensors and report it to the monitoring system
by using the gmetric command. (We’ll parse the default format, which is only
loosely structured. There is a -u flag that you should use to get
machine-friendly output, but we’ll pretend that doesn’t exist for the purpose
of this article.)
So this is what we have to extract relevant numbers from.
radeon-pci-0100
Adapter: PCI adapter
temp1:        +50.5°C  (crit = +120.0°C, hyst = +90.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +36.8°C  (high = +70.0°C)
                       (crit = +80.0°C, hyst = +78.0°C)

f71889ed-isa-0480
Adapter: ISA adapter
+3.3V:        +3.23 V
in1:          +1.07 V  (max = +2.04 V)
in2:          +1.09 V
in3:          +0.89 V
in4:          +0.58 V
in5:          +1.23 V
in6:          +1.53 V
3VSB:         +3.25 V
Vbat:         +3.31 V
fan1:        3978 RPM
fan2:           0 RPM  ALARM
fan3:           0 RPM  ALARM
temp1:        +31.0°C  (high = +85.0°C, hyst = +81.0°C)
                       (crit = +80.0°C, hyst = +76.0°C)  sensor = transistor
temp2:        +43.0°C  (high = +85.0°C, hyst = +77.0°C)
                       (crit = +100.0°C, hyst = +92.0°C)  sensor = thermistor
temp3:        +38.0°C  (high = +70.0°C, hyst = +68.0°C)
                       (crit = +85.0°C, hyst = +83.0°C)  sensor = transistor
It’s divided into sections, but not in a way that awk understands right away. As you may have suspected from the title, the key is to realise that whenever you encounter a section heading (or something else that indicates you’ve left the previous section), you make a note of that in the program state. Every later line is then interpreted in light of that state. In my script, that is expressed as
#!/usr/bin/awk -f
# Note: the three-argument form of match() is a gawk extension,
# so awk must be GNU awk for this script to work.

BEGIN {
    # Which section do we start the script in?
    unit = "none";
}

/^radeon-pci-/ {
    # Encountered the GPU section header
    unit = "GPU";
}

/^k10temp-pci-/ {
    # Encountered the CPU section header
    unit = "CPU";
}

/^f71889ed-isa-/ {
    # I don't know what this is, actually, so
    # I'm just going to call them SYS_1 etc.
    unit = "SYS";
}

/^temp[0-9]:/ {
    # Yay, a temperature reading!
    # Which number does this reading have?
    number = substr($1, 5, 1);

    # Which temperature is given?
    # (The plus sign is escaped so it is matched literally.)
    match($2, /\+([0-9.]+)°C/, matches);
    temp = substr(
        $2,
        matches[1, "start"],
        matches[1, "length"]
    );

    # Store the results in an array
    temps[unit "_" number] = temp;
}

END {
    # Call gmetric for all temperatures found
    for (temp in temps) {
        system(
            "gmetric -t uint16 -u Celsius"
            " -n " temp
            " -v " temps[temp]
        );
    }
}
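To see the state tracking in isolation, here is a cut-down sketch that runs against a few lines of the sample output above and prints the metric names instead of calling gmetric. It is not the script above: it sticks to POSIX awk (no three-argument match), and relies on awk’s string-to-number conversion to drop the °C suffix. The function name parse_sensors and the trimmed sample are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: reads sensors-style text on stdin and prints
# one NAME=VALUE line per temperature reading.
parse_sensors() {
    awk '
        # Section headers only update the state variable.
        /^radeon-pci-/   { unit = "GPU" }
        /^k10temp-pci-/  { unit = "CPU" }
        /^f71889ed-isa-/ { unit = "SYS" }

        # Data lines are interpreted in light of the current state.
        /^temp[0-9]:/ {
            number = substr($1, 5, 1)
            # "+50.5°C" + 0 yields 50.5: awk converts the leading numeric prefix.
            print unit "_" number "=" ($2 + 0)
        }
    '
}

# Trimmed sample, taken from the sensors output shown earlier.
sample='radeon-pci-0100
Adapter: PCI adapter
temp1:        +50.5°C  (crit = +120.0°C, hyst = +90.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +36.8°C  (high = +70.0°C)'

printf '%s\n' "$sample" | parse_sensors
```

Run as is, this should print GPU_1=50.5 followed by CPU_1=36.8. Both readings come from lines that start with temp1, and only the remembered section header tells them apart.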
The general pattern, as you probably realise, is a bunch of small pattern-action statements that look like this (text enclosed in angle brackets represents metasyntactic variables, i.e. placeholders in this generic example):
/〈regex for section header〉/ {
〈state variable〉 = 〈section identifier〉
}
and then one or more pattern-action statements that actually extract data that look something like
/〈regex indicating data to parse〉/ {
    switch (〈state variable〉) {
    case 〈section A〉:
        〈action〉
        break
    case 〈section B〉:
        〈different action〉
        break
    }
}
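As a concrete instance of that second template, here is a sketch in which the same kind of temp line is handled differently depending on the section it appears in. The naming scheme (bare GPU and CPU, numbered SYS_n) and the function name name_temps are assumptions for illustration and are not taken from my script. Since switch is a gawk extension, this version uses an if/else chain so that it should run on any POSIX awk.

```sh
#!/bin/sh
# Hypothetical helper: names each temperature reading according to
# the section it was found in.
name_temps() {
    awk '
        /^radeon-pci-/   { unit = "GPU" }
        /^k10temp-pci-/  { unit = "CPU" }
        /^f71889ed-isa-/ { unit = "SYS" }

        /^temp[0-9]:/ {
            if (unit == "GPU" || unit == "CPU") {
                # These sections have a single reading: no index needed.
                name = unit
            } else if (unit == "SYS") {
                # This section has several readings, so keep the index.
                name = unit "_" substr($1, 5, 1)
            } else {
                # Reading outside any known section: skip it.
                next
            }
            print name, $2 + 0
        }
    '
}

printf '%s\n' \
    'k10temp-pci-00c3' \
    'temp1:        +36.8°C  (high = +70.0°C)' \
    'f71889ed-isa-0480' \
    'temp1:        +31.0°C  (high = +85.0°C, hyst = +81.0°C)' \
    'temp2:        +43.0°C  (high = +85.0°C, hyst = +77.0°C)' |
    name_temps
```

This should print CPU 36.8, SYS_1 31 and SYS_2 43, one per line. The final else branch is the part worth copying: it makes explicit what happens to data that shows up before any section header has been seen.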
Once you’re fairly comfortable with awk, using it for these things is so convenient that I highly recommend spending some time to get familiar with it. It’s a tiny language that can be learnt in a day or so, but surprisingly useful.
In Perl (update)
Since I’ve started using Perl more for the things I used to do in awk, here’s the corresponding script in Perl also.
#!/usr/bin/perl -ln

$unit = do {
    if (/^radeon-pci-/) { "GPU" }
    elsif (/^k10temp-pci-/) { "CPU" }
    elsif (/^f71889ed-isa-/) { "SYS" }
    else { $unit }
};

/temp([0-9]): *\+([0-9.]+)°C/
    and $temps{"${unit}_${1}"} = $2;

END {
    system("gmetric -t uint16 -u Celsius -n $_ -v $temps{$_}")
        for keys %temps;
}
It’s a bit more concise, mainly for three reasons:
- It lets us fold together all those assignments to the unit variable into one do expression.
- Perl’s built-in regex language features turn the five-line central awk block into just one line.
- Perl is flexible with control flow, allowing and as a primitive if statement, as well as in-line for loops.