From...
http://www.anenglishmanscastle.com/HARRY_READ_ME.txt
...I find these interesting.
22. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software
suites - let's have a go at producing CRU TS 3.0! Failing to do that would be the
definitive failure of the entire project.
…………
Essentially, two-thirds of the stations have no normals! Of course, this still leaves us with
a lot more stations than we had for tmean (goodnorm reported 3316 saved, 1749 deleted - about
35% lost), though still far behind precipitation (goodnorm reported 7910 saved, 8027 deleted -
about 50% lost).
I suspect the high percentage lost reflects the influx of modern Australian data. Indeed, nearly
3,000 of the 3,500-odd stations with missing WMO codes were excluded by this operation. This means
that, for tmn.0702091139.dtb, 1240 Australian stations were lost, leaving only 278.
This is just silly. I can't dump these stations; they are needed to potentially match with the
bulletin stations. I am now going to try the following (a sketch of step 1 appears after the list):
1. Attempt to pair bulletin stations with existing in the tmin database. Mark pairings in the
database headers and in a new 'Australian Mappings' file. Program auminmatch.for.
2. Run an enhanced filtertmm to synchronise the tmin and tmax databases, but prioritising the
'paired' stations from step 1 (so they are not lost). Mark the same pairings in the tmax
headers too, and update the 'Australian Mappings' file.
3. Add the bulletins to the databases.
…………
An interesting aside.. David was looking at the v3.00 precip to help National Geographic with
an enquiry. I produced a second 'station' file with the 'honest' counts (see above) and he used
that to mask out cells with a 0 count (ie that only had indirect data from 'nearby' stations).
There were some odd results, with certain months having data and others missing. After
considerable debate and investigation, it was understood that anomdtb calculates normals on a
monthly basis. So, where there are 7 or 8 missing values in each month (1961-1990), a station
may end up contributing only in certain months of the year, throughout its entire run! This was
noticed in the Seychelles, where only October has real data (the remaining months being relaxed
to the climatology but excluded by David using the 'tight' station mask). There is no easy
solution, because essentially it's an honest result: only October has sufficient values to form
a normal, so only October gets anomalised. It's an unfortunate coincidence that it's the only
station in the cell, though it's not the only such case. A 'solution' could be for anomdtb to get a bit
more involved in the gridding, to check that if a cell only has one station (for one or more
years) then it's all-or-nothing. Maybe if only one month has a normal then it's dumped and the
whole reverts to climatology. Maybe if 4 or more months have normals.. maybe if >0 months have
normals and the rest can be brought in with a minor relaxation of the '75% rule'.. who knows.
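The arithmetic behind 'certain months' is worth spelling out. Each calendar month has at most 30
values in 1961-1990, and a 75% completeness rule needs at least 23 of them (0.75 x 30 = 22.5,
rounded up). A station missing 7 values in a month still has 23 and gets a normal; missing 8
leaves 22 and no normal, so '7 or 8 missing values' sits exactly on the threshold. A minimal
sketch in Python, assuming that rounding (the log doesn't show anomdtb's exact test):

import math

NORMAL_YEARS = 30        # 1961-1990
COMPLETENESS = 0.75      # the '75% rule'
MIN_VALUES = math.ceil(NORMAL_YEARS * COMPLETENESS)  # 23

def monthly_normal(values):
    """Mean for one calendar month over 1961-1990, or None if too few
    non-missing values survive the completeness rule."""
    present = [v for v in values if v is not None]
    if len(present) < MIN_VALUES:
        return None      # this month reverts to climatology
    return sum(present) / len(present)

# 7 missing -> 23 present -> normal computed; 8 missing -> 22 -> None.
assert monthly_normal([10.0] * 23 + [None] * 7) is not None
assert monthly_normal([10.0] * 22 + [None] * 8) is None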
…………
Got all that fixed. Then onto the excessions Tim found - quite a lot that really should
have triggered the 3/4 sd cutoff in anomauto.for. Wrote 'retrace.for', a proglet I've
been looking for an excuse to write. It takes a country or individual cell, along with
dates and a run ID, and performs a reverse trace from final output files to database. It's
not complete yet but it already gives extremely helpful information - I was able to look
at the first problem (Guatemala in Autumn 1995 has a massive spike) and find that a
station in Mexico has a temperature of 78 degrees in November 1995! This gave a local
anomaly of 53.23 (which would have been 'lost' amongst the rest of Mexico as Tim just
did country averages) and an anomaly in Guatemala of 24.08, which gave us the spike.
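For context, the numbers fit: 78 minus a November normal of around 24.8 gives the quoted local
anomaly of 53.23, which any sd-based cutoff should have caught. A minimal sketch of such a
check, with hypothetical climatology values (the log doesn't give the station's real normal or
sd):

def excession(value, normal, sd, n_sd=4.0):
    """Return the anomaly if it exceeds n_sd standard deviations of the
    station's 1961-1990 climatology, else None (value accepted)."""
    anomaly = value - normal
    return anomaly if abs(anomaly) > n_sd * sd else None

# Hypothetical Mexican station: November normal 24.77 C, sd 1.5 C.
# The rogue 78-degree reading is over 35 sd out - an obvious excession.
print(round(excession(78.0, normal=24.77, sd=1.5), 2))  # 53.23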
…………
Had to briefly divert to trick makegridsauto into thinking it was in the middle of a full 1901-2006
update, to get CLD NetCDF files produced for the whole period to June '06. Kept some important users
in Bristol happy.
So, back to VAP. Tried dividing the incoming TMP & DTR binaries by 1000! Still no joy. Then had the
bright idea of imposing a threshold on the 3.00 vap in the Matlab program. The result was that
quite a lot of data was lost from 3.00, but what remained was a very good match for the 2.10 data
(on which the thresholds were based).
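A minimal sketch of that thresholding idea, assuming the bounds were simply taken from the
range of the 2.10 VAP fields (the log doesn't say how they were derived, nor show the Matlab
program; this is Python with hypothetical values):

import numpy as np

def threshold_vap(vap300, vap210_min, vap210_max):
    """Drop (set to NaN) any 3.00 VAP value outside bounds derived from
    the 2.10 data. Loses data, but what survives should match 2.10."""
    out = vap300.astype(float)
    out[(out < vap210_min) | (out > vap210_max)] = np.nan
    return out

print(threshold_vap(np.array([1.2, 250.0, 3.5, -7.0]), 0.0, 80.0))
# -> [1.2 nan 3.5 nan]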
I think I've got it! Hey - I might be home by 11. I got quick_interp_tdm2 to dump a min/max
for the synthetic grids. Guess what? Our old friend 32767 is here again, otherwise known as big-endian
trauma. And sure enough, the 0.5 and 2.5 binary normals (which I inherited; I've never produced
them) both need to be opened for reading with:
openr,lun,fname,/swap_if_big_endian
..so I added that as an argument to rdbin, and used it wherever rdbin is called to open these normals.
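For the record, 32767 is 0x7FFF, the int16 maximum: byte-swapped reads scatter ordinary values
out towards such extremes, which is why the min/max dump exposed the problem instantly. A
minimal Python sketch of the same fix rdbin got (the file layout is hypothetical; IDL's
/swap_if_big_endian does this natively):

import numpy as np

def read_int16_grid(fname, file_big_endian=True):
    """Read a flat binary file of 16-bit integers, honouring the file's
    byte order regardless of the host's (cf. /swap_if_big_endian)."""
    dtype = ">i2" if file_big_endian else "<i2"
    return np.fromfile(fname, dtype=dtype)

# What goes wrong without the swap: the same bytes, read both ways.
vals = np.array([100, 250], dtype=">i2")  # as written big-endian on disk
print(vals.view("<i2"))                   # [25600 -1536] - garbage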
…………
So, to station counts. These will have to mirror section 3 above. Coverage of secondary parameters is
particularly difficult - what is the best approach? To include synthetic coverage, when it's only at
2.5-degree?
No. I'm going to back my previous decision - all station count files reflect actual obs for that
parameter only. So for secondaries, you get actual obs of that parameter (ie naff all for FRS). You
get the info about synthetics that enables you to use the relevant primary counts if you want to. Of
course, I'm going to have to provide a combined TMP and DTR station count to satisfy VAP & FRS users.
The problem is that the synthetics are incorporated at 2.5-degrees, NO IDEA why, so saying they affect
particular 0.5-degree cells is harder than it should be. So we'll just gloss over that entirely ;0)
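Concretely, one 2.5-degree cell sits over a 5 x 5 block of 0.5-degree cells, so attributing a
synthetic to particular fine cells means fanning each coarse index out to 25 fine ones. A
minimal sketch of that index mapping, assuming both grids share an origin and ordering (the log
doesn't specify the conventions):

FACTOR = 5  # 2.5 degrees / 0.5 degrees

def fine_cells(coarse_row, coarse_col):
    """All 0.5-degree (row, col) cells covered by one 2.5-degree cell."""
    return [(coarse_row * FACTOR + dr, coarse_col * FACTOR + dc)
            for dr in range(FACTOR) for dc in range(FACTOR)]

print(len(fine_cells(10, 20)))  # 25 fine cells per coarse cell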
ARGH. Just went back to check on synthetic production. Apparently - I have no memory of this at all -
we're not doing observed rain days! It's all synthetic from 1990 onwards. So I'm going to need
conditionals in the update program to handle that. And separate gridding before 1989. And what TF
happens to station counts?
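A sketch of the kind of conditional the update program would need, splitting on the 1989/1990
boundary just described; treating the station count as zero in synthetic years follows the
'actual obs only' decision above, and every name here is hypothetical:

def rain_day_grid(year, grid_from_obs, synthesise_from_precip):
    """Observed rain-day gridding only exists up to 1989; from 1990 on
    the field is wholly synthetic, so no stations contribute directly."""
    if year <= 1989:
        grid = grid_from_obs(year)           # separate pre-1990 gridding
        counts = grid["station_counts"]      # honest counts exist here
    else:
        grid = synthesise_from_precip(year)  # all synthetic from 1990 on
        counts = 0                           # no direct obs: count is zero
    return grid, counts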
OH FUCK THIS. It's Sunday evening, I've worked all weekend, and just when I thought it was done I'm
hitting yet another problem that's based on the hopeless state of our databases. There is no uniform
data integrity, it's just a catalogue of issues that continues to grow as they're found.
…………
...and this one is outstanding!
- Bishop Hill blog - Climate cuttings 33
I like this thread!
