Entropic Thoughts

The AWK State Machine Parser Pattern


I run the Ganglia Monitoring System on the home LAN to keep an eye on temperatures, loads, disk usage, network throughput and such on the gateway and, more recently, on my desktop workstation as well.[1] Since temperature is not reported by default, let’s say we want to parse the output[2] of lm_sensors and report it to the monitoring system using the gmetric command.

[1] The primary reason is that I have noticed some instability, and I would like to know if it’s related to temperature.

[2] Using the default format, which is only loosely structured. There is a -u flag that you should use to get machine-friendly output, but we’ll pretend that doesn’t exist for the purpose of this article.

So this is what we have to extract the relevant numbers from.

radeon-pci-0100
Adapter: PCI adapter
temp1:        +50.5°C  (crit = +120.0°C,
                        hyst = +90.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +36.8°C  (high = +70.0°C)
                       (crit = +80.0°C,
                        hyst = +78.0°C)

f71889ed-isa-0480
Adapter: ISA adapter
+3.3V:        +3.23 V
in1:          +1.07 V  (max =  +2.04 V)
in2:          +1.09 V
in3:          +0.89 V
in4:          +0.58 V
in5:          +1.23 V
in6:          +1.53 V
3VSB:         +3.25 V
Vbat:         +3.31 V
fan1:        3978 RPM
fan2:           0 RPM  ALARM
fan3:           0 RPM  ALARM
temp1:        +31.0°C  (high = +85.0°C,
                        hyst = +81.0°C)
                       (crit = +80.0°C,
                        hyst = +76.0°C)
                        sensor = transistor
temp2:        +43.0°C  (high = +85.0°C,
                        hyst = +77.0°C)
                       (crit = +100.0°C,
                        hyst = +92.0°C)
                        sensor = thermistor
temp3:        +38.0°C  (high = +70.0°C,
                        hyst = +68.0°C)
                       (crit = +85.0°C,
                        hyst = +83.0°C)
                        sensor = transistor

It’s divided into sections, but not in a way that awk understands right away. As you may have suspected from the title, the key is to realise that whenever you encounter a section heading (or something else that indicates you’ve left the previous section), you make a note of that in the program state. So in my script, that is expressed as

#!/usr/bin/awk -f

BEGIN {
    # Which section do we start the script in?
    unit = "none";
}

/^radeon-pci-/ {
    # Encountered the GPU section header
    unit = "GPU";
}

/^k10temp-pci-/ {
    # Encountered the CPU section header
    unit = "CPU";
}

/^f71889ed-isa-/ {
    # I don't know what this is, actually, so
    # I'm just going to call them SYS_1 etc.
    unit = "SYS";
}

/^temp[0-9]:/ {
    # Yay, a temperature reading!
    # Which number does this reading have?
    number = substr($1, 5, 1);

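    # Which temperature is given? (The three-argument match() is a
    # gawk extension; the leading + is escaped so it matches literally.)
    match($2, /\+([0-9.]+)°C/, matches);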
    temp = substr(
        $2,
        matches[1, "start"],
        matches[1, "length"]
    );

    # Store the results in an array
    temps[unit "_" number] = temp;
}

END {
    # Call gmetric for all temperatures found
    for (temp in temps) {
        system(
            "gmetric -t uint16 -u Celsius"
            " -n " temp " -v " temps[temp]
        );
    }
}
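To try the parser without a Ganglia installation, one option is a dry run that prints the gmetric commands instead of executing them. The sketch below wraps a trimmed copy of the script in a shell function (the name sensors_to_gmetric is made up for this example) and, as a portability assumption, swaps gawk’s three-argument match() for the POSIX match() with its RSTART and RLENGTH variables, so it should run under any POSIX awk. The output is sorted because awk’s for (… in …) loop visits array elements in no particular order.

```shell
#!/bin/sh
# Dry run of the state-machine parser: print the gmetric commands
# instead of running them through system().
# Portability assumption: uses POSIX match() with RSTART/RLENGTH,
# not the gawk-only three-argument form, so any POSIX awk should do.

sensors_to_gmetric() {
    awk '
    BEGIN            { unit = "none" }     # state before any header
    /^radeon-pci-/   { unit = "GPU" }      # section headers update
    /^k10temp-pci-/  { unit = "CPU" }      # the state variable
    /^f71889ed-isa-/ { unit = "SYS" }

    /^temp[0-9]:/ {
        number = substr($1, 5, 1)          # "temp1:" -> "1"
        # $2 looks like "+50.5°C"; keep the digits after the plus sign
        if (match($2, /\+[0-9.]+/))
            temps[unit "_" number] = substr($2, RSTART + 1, RLENGTH - 1)
    }

    END {
        for (t in temps)
            print "gmetric -t uint16 -u Celsius -n " t " -v " temps[t]
    }' | sort    # for-in order is unspecified, so sort for stable output
}

# Example: feed it the first two sections of the sample output.
sensors_to_gmetric <<'EOF'
radeon-pci-0100
Adapter: PCI adapter
temp1:        +50.5°C  (crit = +120.0°C,
                        hyst = +90.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +36.8°C  (high = +70.0°C)
                       (crit = +80.0°C,
                        hyst = +78.0°C)
EOF
```

For this input it prints the CPU_1 command with value 36.8 followed by the GPU_1 command with value 50.5. Once the printed commands look right, piping them into sh would run them for real.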

The general pattern, as you probably realise, is a bunch of small pattern-action statements that look like this (where text enclosed in angle brackets represents metasyntactic variables, i.e. placeholders in this generic example)

/〈regex for section header〉/ {
    〈state variable〉 = 〈section identifier〉
}

and then one or more pattern-action statements that do the actual data extraction, looking something like

/〈regex indicating data to parse〉/ {
    switch (〈state variable〉) {
    case 〈section A〉:
        〈action〉;
        break;
    case 〈section B〉:
        〈different action〉;
        break;
    }
}
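As a concrete instance of the generic pattern, the sketch below uses a blank line as the “something else” that ends a section, and extracts different things depending on the section: the temperature in the CPU section, fan speeds in the SYS section. The function name extract_readings and the per-section choices are illustrative assumptions. Since switch is a gawk extension (and, as in C, its case bodies fall through to the next case unless they end in break), this version dispatches on the state variable with an if/else chain instead, which should work in any POSIX awk.

```shell
#!/bin/sh
# Generic section/dispatch pattern in POSIX awk. The function name and
# the choice of what to extract per section are illustrative only.

extract_readings() {
    awk '
    BEGIN            { unit = "none" }
    /^k10temp-pci-/  { unit = "CPU" }
    /^f71889ed-isa-/ { unit = "SYS" }
    /^$/             { unit = "none" }     # a blank line ends any section

    # Data lines start with a label followed by a colon, e.g. "fan1:"
    /^[A-Za-z0-9+]+:/ {
        name = substr($1, 1, length($1) - 1)   # drop the trailing colon
        # Dispatch on the state variable (an if/else chain in place of
        # the gawk-only switch statement)
        if (unit == "CPU") {
            # CPU section: only the temperature, without sign and unit
            if (name ~ /^temp/ && match($2, /[0-9.]+/))
                print unit, name, substr($2, RSTART, RLENGTH)
        } else if (unit == "SYS") {
            # SYS section: fan speeds, whose RPM value is the second field
            if (name ~ /^fan/)
                print unit, name, $2
        }
        # Anything read while unit == "none" is ignored
    }'
}

extract_readings <<'EOF'
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +36.8°C  (high = +70.0°C)
                       (crit = +80.0°C,
                        hyst = +78.0°C)

f71889ed-isa-0480
Adapter: ISA adapter
fan1:        3978 RPM
fan2:           0 RPM  ALARM
temp1:        +31.0°C  (high = +85.0°C,
EOF
```

This prints CPU temp1 36.8, then SYS fan1 3978 and SYS fan2 0; the “Adapter:” lines and the SYS temperature match the data pattern but fall through every branch of the dispatch.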

Once you’re fairly comfortable with awk, using it for these things is very convenient, so I highly recommend spending some time getting familiar with it. It’s a tiny language that can be learnt in a day or so, yet it is surprisingly useful.

In Perl (update)

Since I’ve started using Perl more for the things I used to do in awk, here’s the corresponding script in Perl as well.

#!/usr/bin/perl -ln

$unit = do {
    if    (/^radeon-pci-/)   { "GPU" }
    elsif (/^k10temp-pci-/)  { "CPU" }
    elsif (/^f71889ed-isa-/) { "SYS" }
    else                     { $unit }
};

/temp([0-9]): *\+([0-9.]+)°C/
    and $temps{"${unit}_${1}"} = $2;

END {
    system("gmetric -t uint16 -u Celsius -n $_ -v $temps{$_}")
        for keys %temps;
}

It’s a bit more concise, mainly for three reasons:

  • It lets us fold all those assignments to the unit variable into a single do expression.
  • Perl’s built-in regex features, with capture groups available directly as $1 and $2, turn the match() and substr() calls of the central awk block into a single statement.
  • Perl is flexible with control flow, allowing “and” to serve as a primitive if statement and “for” to be written as an in-line loop modifier.

Referencing This Article