The AWK State Machine Parser Pattern
I run the Ganglia Monitoring System on the home LAN to keep an eye on
temperatures, loads, disk usage, network throughput and such on the gateway
and, more recently, my desktop workstation as well. (The primary reason is
that I have noticed some instability, and I would like to know if it’s related
to temperature.) Since temperature is not reported by default, let’s say we
want to parse the output of lm_sensors and report it to the monitoring system
by using the gmetric command. (We are using the default format, which is only
loosely structured. There is a -u flag that you should use to get
machine-friendly output, but we’ll pretend that doesn’t exist for the purpose
of this article.)
So this is what we have to extract relevant numbers from.
radeon-pci-0100
Adapter: PCI adapter
temp1: +50.5°C (crit = +120.0°C,
hyst = +90.0°C)
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +36.8°C (high = +70.0°C)
(crit = +80.0°C,
hyst = +78.0°C)
f71889ed-isa-0480
Adapter: ISA adapter
+3.3V: +3.23 V
in1: +1.07 V (max = +2.04 V)
in2: +1.09 V
in3: +0.89 V
in4: +0.58 V
in5: +1.23 V
in6: +1.53 V
3VSB: +3.25 V
Vbat: +3.31 V
fan1: 3978 RPM
fan2: 0 RPM ALARM
fan3: 0 RPM ALARM
temp1: +31.0°C (high = +85.0°C,
hyst = +81.0°C)
(crit = +80.0°C,
hyst = +76.0°C)
sensor = transistor
temp2: +43.0°C (high = +85.0°C,
hyst = +77.0°C)
(crit = +100.0°C,
hyst = +92.0°C)
sensor = thermistor
temp3: +38.0°C (high = +70.0°C,
hyst = +68.0°C)
(crit = +85.0°C,
hyst = +83.0°C)
sensor = transistor
It’s divided into sections, but not in a way that awk understands right away. As you may have suspected from the title, the key is to realise that whenever you encounter a section heading (or something else that indicates you’ve left the previous section), you make a note of that in the program state. In my script, that is expressed as
#!/usr/bin/awk -f
# Note: the three-argument match() used below is a gawk extension.

BEGIN {
    # Which section do we start the script in?
    unit = "none";
}

/^radeon-pci-/ {
    # Encountered the GPU section header
    unit = "GPU";
}

/^k10temp-pci-/ {
    # Encountered the CPU section header
    unit = "CPU";
}

/^f71889ed-isa-/ {
    # I don't know what this is, actually, so
    # I'm just going to call them SYS_1 etc.
    unit = "SYS";
}

/^temp[0-9]:/ {
    # Yay, a temperature reading!

    # Which number does this reading have?
    number = substr($1, 5, 1);

    # Which temperature is given? (The plus sign must be escaped.)
    match($2, /\+([0-9.]+)°C/, matches);
    temp = substr($2, matches[1, "start"], matches[1, "length"]);

    # Store the results in an array
    temps[unit "_" number] = temp;
}

END {
    # Call gmetric for all temperatures found
    for (temp in temps) {
        system("gmetric -t uint16 -u Celsius -n " temp " -v " temps[temp]);
    }
}
The general pattern, as you probably realise, is a bunch of small pattern-action statements (where text enclosed in angle brackets represents metasyntactic variables, i.e. placeholders in this generic example) that look like
/〈regex for section header〉/ {
〈state variable〉 = 〈section identifier〉
}
and then one or more pattern-action statements that actually extract data that look something like
/〈regex indicating data to parse〉/ {
    switch (〈state variable〉) {
    case 〈section A〉:
        〈action〉;
        break;
    case 〈section B〉:
        〈different action〉;
        break;
    }
}
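For readers who want to try the same pattern outside awk, here is a minimal sketch of it in Python. It is a transliteration for illustration, not part of the original script. The names SECTION_HEADERS, TEMP_LINE and parse_temperatures are made up for this example, and the readings are returned in a dictionary rather than passed to gmetric.

```python
import re

# Hypothetical mapping from section-header prefix to a short unit name,
# mirroring the three headers in the sample sensors output.
SECTION_HEADERS = {
    "radeon-pci-": "GPU",
    "k10temp-pci-": "CPU",
    "f71889ed-isa-": "SYS",
}

# A data line such as "temp1:  +50.5°C (crit = ...": capture number and value.
TEMP_LINE = re.compile(r"^temp([0-9]):\s*\+([0-9.]+)°C")


def parse_temperatures(lines):
    """Return {"<unit>_<number>": temperature} for every tempN line."""
    unit = "none"  # state variable: which section are we currently in?
    temps = {}
    for line in lines:
        # Section headers do nothing except update the state.
        for prefix, name in SECTION_HEADERS.items():
            if line.startswith(prefix):
                unit = name
                break
        else:
            # Not a header: interpret the line in light of the current state.
            m = TEMP_LINE.match(line)
            if m:
                temps[f"{unit}_{m.group(1)}"] = float(m.group(2))
    return temps


sample = """\
radeon-pci-0100
Adapter: PCI adapter
temp1: +50.5°C (crit = +120.0°C,
hyst = +90.0°C)
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +36.8°C (high = +70.0°C)
""".splitlines()

print(parse_temperatures(sample))  # {'GPU_1': 50.5, 'CPU_1': 36.8}
```

As in the awk version, the header rules and the data rule never need to know about each other. The only thing they share is the state variable.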
Once you’re fairly comfortable with awk, using it for these things is so convenient that I highly recommend spending some time to get familiar with it. It’s a tiny language that can be learnt in a day or so, but surprisingly useful.
In Perl (update)
Since I’ve started using Perl more for the things I used to do in awk, here’s the corresponding script in Perl.
#!/usr/bin/perl -ln

$unit = do {
    if (/^radeon-pci-/) { "GPU" }
    elsif (/^k10temp-pci-/) { "CPU" }
    elsif (/^f71889ed-isa-/) { "SYS" }
    else { $unit }
};

/temp([0-9]): *\+([0-9.]+)°C/
    and $temps{"${unit}_${1}"} = $2;

END {
    system("gmetric -t uint16 -u Celsius -n $_ -v $temps{$_}")
        for keys %temps;
}
It’s a bit more concise, mainly for three reasons:
- It lets us fold together all those assignments to the unit variable into one “do” expression.
- Perl’s built-in regex language features turn the five-line central awk block into just one line.
- Perl is flexible with control flow, allowing “and” as a primitive “if” statement, as well as in-line for loops.