What the Meteorologist: Announcing wxpull
“I get your point about probabilistic weather forecasts,” my wife says. “I agree they are better. But those plots are horrible to read and the ux of the ecmwf website is not great. Why don’t you create a better frontend for it?”
“I have looked into it before,” I say, “but their api costs hundreds of thousands to license.”
She goes quiet for a few moments. “Aren’t they funded by government money? They must have some open data.”
“I don’t think so,” I said, but I couldn’t let it go. I discovered a few days later that they do have some open data. In particular, the data set called wmo Recommended looks like it does come with a little ensemble data. (There is also a wmo Core subset, but that is too small to be usable for private purposes. I’m sure it’s useful for people modeling the climate because it contains essentials such as geopotential height which nobody cares about when deciding whether to bring an umbrella or put on sunscreen.)
Ensemble forecasts
In case you’re not familiar with how this works: ecmwf has a computer program that simulates weather. To get a weather forecast, they put current measurements from around the globe into this model and then run it forward, and write down the results, like temperature, air pressure, wave height, etc. This produces a point forecast: their best guess for what the weather will be, given what it is today.
But weather is a famously chaotic system, meaning small changes in current measurements can result in widely different outcomes later. To get a sense for how large the uncertainty in outcomes is, the ecmwf also fudges the current measurements randomly a little and then runs the model again, getting a different outcome. They do this around fifty times. This is called the ensemble forecast.
The point forecast (best guess) for a location at 12 o’clock three days from now might be 14 °C with no rain. But in maybe eight of the simulations with adjusted parameters, the forecast is 10 °C with 4 mm/h of rain at the same time, and so on. The ensemble forecast then would be a temperature of 10–17 °C, a 37 % probability of a light drizzle and 16 % probability of rain.
The ensemble forecast is much more nuanced and more useful for planning than the point forecast. It tells us the likely span of weathers, how confident we can be about the weather, and it’s also less likely to be wrong. (The point forecast is almost guaranteed to be wrong, given how difficult weather is to predict.)
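To make the idea concrete, here is a minimal Python sketch of how a set of ensemble members can be boiled down to a temperature span and precipitation probabilities. The member values, the function name `summarise`, and the drizzle and rain thresholds are all invented for illustration; this is not how ecmwf computes its products.

```python
def summarise(members, drizzle_mm_h=0.1, rain_mm_h=1.0):
    """Reduce (temperature, precipitation rate) members to a span and probabilities."""
    temps = [t for t, _ in members]
    rates = [r for _, r in members]
    n = len(members)
    return {
        "temp_span": (min(temps), max(temps)),
        # Share of members with at least a drizzle, as a whole percentage.
        "p_drizzle": round(100 * sum(r >= drizzle_mm_h for r in rates) / n),
        # Share of members with proper rain, as a whole percentage.
        "p_rain": round(100 * sum(r >= rain_mm_h for r in rates) / n),
    }

# Invented ensemble of 51 members: (temperature in °C, precipitation in mm/h).
members = (
    [(10.0, 4.0)] * 8                                   # cold and rainy
    + [(12.0, 0.3)] * 11                                # light drizzle
    + [(14.0 + 3.0 * i / 31, 0.0) for i in range(32)]   # dry, 14 to 17 °C
)
summary = summarise(members)
print(summary)
```

With these made-up members, eight rainy runs out of 51 come out as roughly a 16 % probability of rain, which is the kind of counting the ensemble forecast is built on.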
Early trials of wxpull
The current version of this service only produces reports for Stockholm, but feel free to ask for other locations if you would like to use it. A report looks like
wxpull 20250929
      00     06     12     18        p>1  p>10
----------------------------------------------
29 M  82 84  85     81 87  83          0     0
30 T  82 83  83     84 86  83         11     0
01 W  81 82         84 86             42     0
02 T  79 81         83 86              9     0
03 F  79 83         83 87             15     0
04 S  81 84         83 88             70     9
05 S  82 88         82 87             85    16
06 M  78 83                           68     6
07 T                                  66    14
08 W                                  47     8
09 T
The width is set to fit into my phone’s web browser in portrait mode.
The first column is the day of month, followed by a letter indicating weekday. Then there are four columns: one for midnight, one for six in the morning, one for midday, and one for six in the evening. These are utc times – that’s not a big difference for Sweden, but might be for other locations. Under these columns are temperatures. Two numbers means a confidence interval based on ensemble data, and one number means a point forecast. (The open data only makes ensemble uncertainty information available for every twelfth hour, while the point forecast data is available for every sixth hour for the first two days.)
The temperatures are specified in kelvin, but with the leading digit removed. Thus if they read 84, that means 284 K, or 11 °C. If they read 59, that means 259 K, or −14 °C. If they read 04, that means 304 K, or 31 °C. I’m not fully convinced kelvin are the right choice for a weather report, but I’m willing to give it a shot. It is really nice to be able to unambiguously print temperatures in two-digit columns.
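The two-digit scheme can be sketched in a few lines of Python. The function names and the assumed plausible range of 230 K to 329 K (which is what makes the decoding unambiguous here) are my own illustration, not necessarily what wxpull does.

```python
def encode_kelvin(kelvin):
    """Round to whole kelvin and keep only the last two digits, zero-padded."""
    return f"{round(kelvin) % 100:02d}"

def decode_kelvin(two_digits, lowest=230):
    """Recover whole kelvin, assuming the true value lies in [lowest, lowest + 100)."""
    base = lowest - lowest % 100      # 200 when lowest is 230
    candidate = base + int(two_digits)
    if candidate < lowest:            # e.g. "04" must be 304 K, not 204 K
        candidate += 100
    return candidate

def to_celsius(kelvin):
    return kelvin - 273.15

for cell in ("84", "59", "04"):
    kelvin = decode_kelvin(cell)
    print(cell, kelvin, round(to_celsius(kelvin)))
```

Any window of 100 K works; the lower bound only needs to sit below the coldest temperature one expects to print.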
Following that are two columns for precipitation, one for any precipitation at all (totaling greater than 1 mm over the day) and one for significant rainfall (totaling greater than 10 mm over the day). These numbers are probabilities given as percentages, i.e. if the first precipitation column says 50, that means some rain is just as likely as none.
Finding the forecast data
Making this required a lot of sleuthing to get the right data out of ecmwf. Their api is clearly meant for other meteorologists, not hackers.
If we go to the wmo Recommended page and click “access the free version” we are taken to a bunch of directories that look like they are named after timestamps. We click into the latest one created at midnight (based on intuition: I had a feeling we would get the most different types of data from midnight-based forecasts, and that turns out to be true), and we are met with a wall of filenames like
A_HDXA92ECMW240000_C_ECMF_20250924000000_an_d_925hPa_global_0p5deg_grib2.bin
A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin
A_HEXQ99ECEP240000_C_ECMF_20250924000000_144h_ep_tpg100_global_0p5deg_grib2.bin
A_HHXE50ECED240000_C_ECMF_20250924000000_24h_es_gh_500hPa_global_0p5deg_grib2.bin
A_HHXN25ECMF240000_C_ECMF_20250924000000_108h_gh_250hPa_global_0p5deg_grib2.bin
A_HIXG88ECMF240000_C_ECMF_20250924000000_36h_mwp_global_0p5deg_grib2.bin
A_HJXD88ECMF240000_C_ECMF_20250924000000_18h_swh_global_0p5deg_grib2.bin
Going back to the table on the wmo Recommended page, we can find among “probabilities” a row that says “total precipitation of at least 1 mm”. That sounds like a useful forecast to start with, and as a bonus, its short name is tpg1, a fairly distinct sequence of characters, meaning we are more likely to find it among the wall of filenames.
Indeed, we find the following file:
A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin
To understand what goes on here, we need to pay attention to the note on the wmo Recommended page that says “File names follow the wmo standards.” I had to go partway down that rabbit hole; whether you like it or not, you’re coming down with me, but you’re getting the whirlwind edition.
A quick web search takes us to the wmo File Name Recommended Practice Documentation, where we learn that the first character of the filename tells us how to interpret the next group of characters. It says that a leading A means
the next field will be decoded as a standard Abbreviated Heading
Searching the web for wmo abbreviated headings leads to a page for the wmo Communication Header: Telecommunications Abbreviated Heading Symbolic Structure Explained. (At first I was confused by the presence of the word telecommunications in there but I guess it’s because bulletins were sent over pre-internet information channels.) Here we learn how to interpret the next field in the filename, the part where it says HEXE01ECEP240000. We split it up into seven subfields: H-E-X-E-01-ECEP-240000.
- The leading H can be looked up in Table A of wmo Manual 386, and indicates the file contains grid point information. This is the same for all forecasts published in the wmo Recommended data set.
- The next character (here E) indicates the type of forecast according to Table B2. E means precipitation, but the wmo Recommended data set also includes D (thickness, which seems to be used for the divergence forecasts), H (height, in this case geopotential height), I (unassigned), J (wave height), L (unassigned), N (radiation, in this case skin temperature), P (pressure), R (humidity), T (temperature), U (eastward wind component), V (northward wind component), W (wind), and X (unassigned). ecmwf uses the unassigned identifiers for wave period data. You can read more about divergence here, if nothing else to see the oddly formatted university notes that contain an all-caps “FINAL SPECIAL NOTE!!! IF WINDS WERE PURELY GEOSTROPHIC EVERYWHERE THERE WOULD NOT BE WEATHER!!!”
- Then comes a character that is always X in these forecasts, and the meaning of that is not given by any wmo table I have found.
- After that is a letter signifying how far out the forecast is for, according to Table C4. As a fun surprise, the tpg1 files specifically do not follow this convention, and seem to pick this character arbitrarily. Our code will have to have a separate conversion table for constructing tpg1 filenames, while using the standard for all other forecasts, including tpg10.
- After that are two numbers, in this case 01. These indicate the height over sea level the forecast is made for, in tens of hPa of air pressure. Except HAHA not quite. The numbers 65, 73, and 81 refer to actual distance-over-sea-level altitudes, rather than pressure altitudes. There are also special numbers like 98 which means “air properties at ground level”, 88 which means “ground properties”, and 96 which means “whichever altitude has the strongest winds.” For tpg forecasts, this number is not an altitude at all. Instead, it indicates the precipitation threshold, i.e. 01 for 1 mm, 25 for 25 mm, etc.
- This is followed by four characters identifying “the processing centre that generated the bulletin.” The data set has ECMW, ECEP, ECED, ECEM, ECMH and maybe more. These are all variants of ecmwf, except they further indicate specific models or model measurements. For example, ECMH seems to be used for models of geopotential height, ECEP for probability outputs from ensemble models, and ECEM and ECED for means and standard deviations from ensemble models.
- Then comes the date stamp of the bulletin, given as meteorologists are wont to do: day of month, hour, minute. (Meteorologists never need to know the month or the year of a bulletin because weather forecasts have a very short shelf-life anyway, so a day-of-month number can only mean the nearest day with that number.)
We now know how to interpret HEXE01ECEP240000: it’s ostensibly a probabilistic (ECEP) 24-hour (E) precipitation (E) forecast of exceeding 1 mm accumulated precipitation (01), made at midnight on the 24th (240000).
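The seven-way split of the abbreviated heading can be sketched with a regular expression. The field names below are labels I made up from the description above, not official wmo terminology, and the pattern only covers headings shaped like the ones seen in this data set.

```python
import re
from typing import NamedTuple

class Heading(NamedTuple):
    data_type: str        # H = grid point information
    parameter: str        # e.g. E = precipitation, T = temperature
    fixed_x: str          # always X in these files; meaning unknown
    lead_time: str        # letter code for how far out the forecast is
    level: str            # two digits: level, or threshold for tpg files
    centre: str           # e.g. ECEP, ECEM, ECED
    day_hour_minute: str  # day of month, hour, minute

HEADING = re.compile(r"^([A-Z])([A-Z])([A-Z])([A-Z])(\d{2})([A-Z]{4})(\d{6})$")

def parse_heading(text):
    """Split an abbreviated heading into its seven subfields."""
    match = HEADING.match(text)
    if match is None:
        raise ValueError(f"not a recognised abbreviated heading: {text!r}")
    return Heading(*match.groups())

heading = parse_heading("HEXE01ECEP240000")
print(heading)
```

Translating the single letters into meanings would need lookup tables on top of this, including the separate one for the lead-time letter in tpg1 files.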
Following that product identifier in the file name comes some uninteresting stuff: information about the originator (C_ECMF) and a timestamp again, this time including year, month, and seconds.
This is where wmo guidance stops, and the “freeform” part of the filename starts. The example file we were looking at had a freeform field of 24h_ep_tpg1_global_0p5deg_grib2. This seems to follow a structure of
- How long out the forecast is for, in this case 24 hours (24h).
- What type of forecast it is, in this case the ensemble probability (ep) of precipitation of at least 1 mm (tpg1).
- The rest is the same for all files: they’re covering the entire world (global), at 0.5 degree resolution (0p5deg), and the data is stored in the grib2 format.
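The freeform structure can be picked apart the same way. This is a sketch that only handles names shaped like the examples listed earlier; the function name is hypothetical, and the reading of es as ensemble standard deviation, and of a missing product code as a point forecast, is inferred from those examples rather than from any documentation.

```python
from datetime import datetime

# Product codes seen in the example filenames; the meanings are inferred.
ENSEMBLE_PRODUCTS = {"ep": "probability", "em": "mean", "es": "standard deviation"}

def parse_filename(name):
    """Split a wmo Recommended filename into reference time and freeform fields."""
    parts = name.rsplit(".", 1)[0].split("_")
    heading, stamp = parts[1], parts[4]
    freeform = parts[5:-3]                 # drop global, 0p5deg, grib2
    step, rest = freeform[0], freeform[1:]
    product = "point"
    if rest[0] in ENSEMBLE_PRODUCTS:
        product = ENSEMBLE_PRODUCTS[rest[0]]
        rest = rest[1:]
    return {
        "heading": heading,
        "reference_time": datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        # Steps look like "24h"; anything else (such as "an") is left as None.
        "step_hours": int(step[:-1]) if step.endswith("h") else None,
        "product": product,
        "parameter": rest[0],
        "level": rest[1] if len(rest) > 1 else None,
    }

info = parse_filename(
    "A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin"
)
print(info)
```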
We’ll download this file; it seems useful. It would let us answer what the probability is that it will rain on us – any amount of rain at all – in the next 24 hours after the forecast is made.
Reading the precipitation forecast
I had not encountered the grib2 format before, but it is popular for this kind of data. A brief web search reveals the wgrib2 command-line tool might be helpful. Let’s see, is that in nixpkgs?… Nope. But we can add it. It’s a straightforward CMake build process with no required dependencies.
We do that and it builds, but when we try to use it with the file we got from ecmwf, an error is vomited:
*** FATAL ERROR: packing type 40 not supported ***
Our llm helpfully tells us this means wgrib2 was not compiled with support for jpeg2000 compression. (Hey! I remember reading about jpeg2000 being the cool new thing for the web in a computer magazine many, many years ago. This is the first time I’ve seen it in the wild.) All of the documentation for compiling wgrib2 with jp2 support is outdated, so we need to read the CMakeLists to figure it out. Seems like jp2 support is not built into wgrib2 these days, but supplied by nceplibs-g2c. I guess we’re packaging that for nixpkgs too.
The good thing about doing this through nixpkgs is that five years from now when we want to read grib2 data again, we don’t have to solve these build problems again – the Nix build declaratively documents all the steps we need to take to get from nothing to working application.
Once done, wgrib2 can read the precipitation data we downloaded.
$ wgrib2 A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin
1:0:d=2025092400:TPRATE:surface:0-1 day acc@(fcst,dt=12 hour),missing=0:prob >1:prob fcst 0/0:probability forecast
Uh … huh. I can guess what some of this means. TPRATE might mean total precipitation rate. surface might mean it’s a forecast pertaining to what goes on at ground level. 0-1 day acc means it’s the amount that has accumulated between 0 and 1 days after the reference time, seen as d=2025092400.
We can further query this file with wgrib2. For example, what’s the time at which this forecast can be verified?
$ wgrib2 -vt A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin
1:0:vt=2025092500
We can also get a rough idea of the data contained inside.
$ wgrib2 -stats A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin
1:0:ndata=259920:undef=0:mean=39.0755:min=0:max=100:cos_wt_mean=42.6476
I guess this means there are 259920 grid points covering the world, the lowest probability of rain for any grid point is 0 %, the highest is 100 %. Interestingly, the mean is around 40 %. I suppose that means that on September 24th, if we were to be teleported to a random location on earth, it’s more likely to be dry than rainy. (At first I wanted to say “random grid point” under the assumption that the grid is not uniform – not that I’m in any way qualified, but I have read that one fluid dynamics book which said (non-uniform) grid design is one of the most high-leverage activities in creating a simulation – but further interaction has me believing that whatever non-uniform grid they use for simulation has had its results quantised into a uniform grid for these files.)
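The ndata figure is consistent with a regular half-degree grid, which a quick Python check shows. As far as I understand, cos_wt_mean is the mean weighted by the cosine of latitude, which compensates for grid cells shrinking toward the poles, so it is the better figure for the “random location on earth” question. The function names and the toy two-row grid below are mine, made up for illustration.

```python
import math

def grid_size(step_deg=0.5):
    """Number of longitudes and latitudes on a regular global grid, poles included."""
    n_lon = round(360 / step_deg)
    n_lat = round(180 / step_deg) + 1
    return n_lon, n_lat

def cos_weighted_mean(rows, lats):
    """Mean of grid values where rows[i] lies along latitude lats[i] (degrees)."""
    total = 0.0
    weight_sum = 0.0
    for lat, row in zip(lats, rows):
        weight = math.cos(math.radians(lat))
        total += weight * sum(row)
        weight_sum += weight * len(row)
    return total / weight_sum

n_lon, n_lat = grid_size()
print(n_lon, n_lat, n_lon * n_lat)

# Toy grid: dry along the equator, 100 % chance of rain along 60° N.
toy = cos_weighted_mean([[0, 0], [100, 100]], lats=[0, 60])
print(toy)
```

In the toy grid the plain mean would be 50, but the row at 60° N covers half as much ground per grid point, so the weighted mean drops to about 33.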
We can also find the forecast for a specific grid point. Let’s see what things are like in my city:
$ wgrib2 -lon 18.07 59.33 A_HEXE01ECEP240000_C_ECMF_20250924000000_24h_ep_tpg1_global_0p5deg_grib2.bin
1:0:lon=18.000000,lat=59.500000,val=0
Since the data only has half-degree resolution, wgrib2 returns the closest forecast information, which is for 59°30′ N, 18°00′ E. Zero percent chance of rain!
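For scripting around wgrib2, the two pieces needed are a way to predict which grid point a coordinate snaps to and a way to read the value out of the output line. This is a sketch with hypothetical helper names; the pattern is written against the output shown above and may not cover every form wgrib2 can print.

```python
import re

def nearest_grid_point(lon, lat, step=0.5):
    """Snap a coordinate to the closest point on a regular grid."""
    return round(lon / step) * step, round(lat / step) * step

NUMBER = r"(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
LON_OUTPUT = re.compile(rf"lon={NUMBER},lat={NUMBER},val={NUMBER}")

def parse_lon_output(line):
    """Extract (lon, lat, value) from a line printed by `wgrib2 -lon`."""
    match = LON_OUTPUT.search(line)
    if match is None:
        raise ValueError(f"unexpected wgrib2 output: {line!r}")
    lon, lat, value = (float(group) for group in match.groups())
    return lon, lat, value

print(nearest_grid_point(18.07, 59.33))
print(parse_lon_output("1:0:lon=18.000000,lat=59.500000,val=0"))
```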
We can download forecasts of the probabilities of exceeding 1 mm, 5 mm, 10 mm, 20 mm, 25 mm, 50 mm and 100 mm, and these forecasts are available in 12-hour increments into the future (though each is a 24-hour aggregated precipitation forecast). We could sit down and start sketching out a more useful interface against this.
Reading the temperature forecast
But we don’t just want precipitation amounts. We are also interested in temperatures. According to the table at the wmo Recommended data set page, the open data includes probabilistic temperature information too. Under “ens Mean and Standard deviation” we have a row for “Temperature”. Using our knowledge of file names, we’ll look for a file with a name containing _24h_em_t_, as this is a 24-hours-out ensemble mean temperature forecast.
There are both 850 hPa and 250 hPa versions of these forecasts. We’ll download the 850 hPa one as that is what’s closest to surface pressure altitude. The file name we find starts with A_HTXE85ECEM, further confirming we have found the right one. What does the data in this look like?
$ wgrib2 -stats A_HTXE85ECEM240000_C_ECMF_20250924000000_24h_em_t_850hPa_global_0p5deg_grib2.bin
1:0:ndata=259920:undef=0:mean=276.258:min=231.576:max=303.976:cos_wt_mean=282.445
Ah, of course, temperatures in kelvin. Converted to normal human degrees of temperature, that would be a range of −42 °C to +31 °C. What is it at my location?
$ wgrib2 -lon 18.07 59.33 A_HTXE85ECEM240000_C_ECMF_20250924000000_24h_em_t_850hPa_global_0p5deg_grib2.bin
1:0:lon=18.000000,lat=59.500000,val=274.976
That’s just under 2 °C. But if we cross-reference with the corresponding meteogram, we see that the 2 m (“surface”) temperature at that time will be somewhere around 8 °C.
The problem is we’ve downloaded the forecast for a pressure altitude of 850 hPa, which corresponds to a distance altitude of like 2,000 m. (Here’s how you know: take the base-10 log of the pressure altitude, which in this case is around 2.9. Subtract this from three, leaving 0.1. Multiply this with 20,000 m. That’s the formula for approximating a distance altitude from pressure altitude: \(20{,}000 \times (3 - \log_{10} p)\), with \(p\) in hPa.) We might think we can save ourselves by extrapolation: at these altitudes, air temperature drops linearly with distance from the ground. Maybe we can extrapolate the 850 hPa temperature down to sea level. Unfortunately, the lapse rate is normally given as 6.5 °C per 1000 metres, which would give us a forecast of 15 °C at sea level. That’s also nowhere near close enough to 8 °C to be useful. (In fact, it seems the difference between the 850 hPa temperature and the ground level temperature is small during night and higher during day, averaging out to 6.5 °C.)
[Figure: difference between the 850 hPa temperature and the ground level temperature, with every second point indicating a forecast of night temperatures.]
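The altitude rule of thumb and the lapse-rate extrapolation are easy to try out in Python. The function names are mine, and the formula is only the rough approximation described above, not a proper standard-atmosphere calculation.

```python
import math

def approx_altitude_m(pressure_hpa):
    """Rule of thumb: 20,000 m times (3 minus the base-10 log of the pressure in hPa)."""
    return 20_000 * (3 - math.log10(pressure_hpa))

def extrapolate_to_sea_level(temp_c, altitude_m, lapse_c_per_km=6.5):
    """Add the standard lapse rate for every kilometre descended."""
    return temp_c + lapse_c_per_km * altitude_m / 1000

temp_850_c = 274.976 - 273.15   # the 850 hPa ensemble mean from above, in °C

# Using the rounded altitude of 2,000 m from the text.
print(round(extrapolate_to_sea_level(temp_850_c, 2000), 1))

# Using the formula without rounding the logarithm.
altitude = approx_altitude_m(850)
print(round(altitude), round(extrapolate_to_sea_level(temp_850_c, altitude), 1))
```

Without rounding the logarithm to 2.9, the same formula lands closer to 1,400 m than 2,000 m, and the extrapolated temperature drops to around 11 °C. That is nearer the meteogram’s 8 °C, but still several degrees off.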
That’s a shame. Temperatures at a pressure altitude of 850 hPa might be useful for weather modeling, but not useful to someone who just wants a probabilistic weather forecast.
Licensing requirements
Okay, but is there any chance of obtaining an API licence for the data we want? I’m a citizen of one of the funding countries of ecmwf, after all. The licensing information is confusingly structured on their web page, but the easiest-to-read page I’ve found can be summarised as “no”. Free access to the data is available to people who
fulfil national governmental obligations related to the protection of life and property and use the data only for research projects and educational use.
That’s not a description of the personal dashboard I imagine. Maybe we can sort of shoehorn this in as research? It is possible to get a time-limited, but renewable, licence for research. Unfortunately, a research project
must not provide operational services that could be used for operational or commercial purposes by yourself or any other third-party.
Well, obviously a weather dashboard could be used for operational or commercial services by myself or a third party, so that licence is out of the question.
Dreams = crushed. Thanks, ecmwf.
Alternative solutions for temperature
We could still try to do this. We have a point forecast for the temperature at ground level: this should be something like HTXE98, and files with that name exist. This gives a temperature of 6.6 °C for the reference time we’ve used as an example, and that is probably correct. What we need is an idea of the level of uncertainty around this estimation.
Recalling that the ensemble forecast is made by running the point forecast fifty times with fudged parameters, we might have the idea of deriving our own ensemble forecast from successive point forecasts. Two point forecasts after each other will be run with slightly different parameters, after all. We’ll have much fewer than fifty, but we could get 19 or so.
That sounds like it would work, but it would be complicated. We could also just use the standard deviation from the 850 hPa forecast and assume the uncertainty around the ground level forecast is similar. I don’t know if that’s the case, but it looks reasonably similar for the few data points I tried to verify manually.
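As a sketch of that shortcut: take the ground-level point forecast, widen it by the 850 hPa ensemble standard deviation, and print the result as a two-digit kelvin cell. The function names and the `width` multiplier are hypothetical; how many standard deviations the interval should span is a choice this sketch does not settle.

```python
def two_digit(kelvin):
    """Whole kelvin with the leading digit removed, zero-padded."""
    return f"{round(kelvin) % 100:02d}"

def temperature_cell(point_k, spread_k=None, width=1.0):
    """Format one report cell: an interval if a spread is known, else the point value."""
    if spread_k is None:
        return two_digit(point_k)
    low = point_k - width * spread_k
    high = point_k + width * spread_k
    return f"{two_digit(low)} {two_digit(high)}"

# Ground-level point forecast of 283 K with a borrowed 850 hPa spread of 1 K.
print(temperature_cell(283.0, 1.0))
# No ensemble data for this hour, so only the point forecast is shown.
print(temperature_cell(285.2))
```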
This is how we end up with the current wxpull report.
wxpull 20250929
      00     06     12     18        p>1  p>10
----------------------------------------------
29 M  82 84  85     81 87  83          0     0
30 T  82 83  83     84 86  83         11     0
01 W  81 82         84 86             42     0
02 T  79 81         83 86              9     0
03 F  79 83         83 87             15     0
04 S  81 84         83 88             70     9
05 S  82 88         82 87             85    16
06 M  78 83                           68     6
07 T                                  66    14
08 W                                  47     8
09 T
At this stage, I will try using this for a while to better learn how it can be improved. Some things I have thought about:
- Try to use the lapse rate (how much temperature increases between 850 hPa and surface) in the forecast to extrapolate the 850 hPa forecast down to ground level for the last three days. The important thing here is to add in additional uncertainty from the extrapolation.
- Interpolate uncertainty for 06 and 18 hours, so we get a confidence interval for all temperature forecasts.
- Find better ways to present the data that still give a quick overview and fit on a phone screen.
- Set up a script that watches the nginx logs for 404s on the wx url and automatically generates reports for those locations.
- Generate alternative reports that are wider but show temperatures in degrees Celsius.
- Convert the time of day to local time, rather than utc. (This is trickier than just arithmetic, because it will change the day some forecasts are for.)
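The day-shifting problem with local time shows up even in a tiny Python example. This sketch uses fixed utc offsets to stay self-contained; a real version would need a time zone database (such as the standard library’s zoneinfo module) to handle daylight saving time, and the function name is made up.

```python
from datetime import datetime, timedelta, timezone

def to_local(valid_utc, utc_offset_hours):
    """Shift a utc forecast time into a zone with a fixed offset from utc."""
    return valid_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))

# The 00 column for Tuesday the 30th in the report above.
midnight = datetime(2025, 9, 30, 0, 0, tzinfo=timezone.utc)

stockholm = to_local(midnight, 2)      # utc+2 during summer time
los_angeles = to_local(midnight, -7)   # utc-7 during summer time

print(stockholm.strftime("%d %a %H:%M"))
print(los_angeles.strftime("%d %a %H:%M"))
```

For Stockholm the forecast stays on the 30th, just two hours later. For Los Angeles the same forecast belongs to the afternoon of the 29th, so it would have to move up a row in the report.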
I just don’t know yet. We’ll see!