News & Commentary: 2009-01-03
Standard Deviation in Means taken from Chaotic Data
Climate studies are all the rage these days, and as the Global Warming issue starts
to gain political momentum, a larger number of scientists from broader disciplines are
taking a skeptical look at the climate scientists' findings. It's hard to believe that
the world really will end when so many people have cried wolf so many times.
In most scientific and engineering disciplines, being skeptical is a normal everyday
activity, a way of life really. In the Global Warming debate, being called a skeptic
is a derogatory term, designed to put people in their place and cow them into humility
before the True Way of the True Believers. In God we trust, anyone else better provide
convincing (and auditable) evidence.
What does this have to do with standard deviation and means? Well, there are a few
(not entirely unreasonable) assumptions here:
- Assume that the weather is a chaotic system.
- Assume that all measurements are weather measurements (one way or another).
- Climate can only be detected by taking large numbers of weather measurements and
doing some sort of average (not necessarily a simple mean).
- As a starting point let's further presume that the average is a simple arithmetic
mean taken over some finite stretch of time.
Attractor of Edward Norton Lorenz
For a chaotic system, we might as well go back to where it all began with
the Lorenz Attractor, which is relatively easy to generate on a computer and is
a well understood example of a chaotic system.
- lorenz_2009-01-02.c -- get your source code here,
it contains definitions of the differential equations that need to be solved and
a Jacobian matrix generated from those equations. The equations are parametric with
default values for the "classic" Lorenz attractor.
- lorenz_basic.tsv.gz -- results from a basic run
(no averaging, just the straight-forward time series).
- run_2009-01-02.pl -- Perl script that runs the simulation
with various sizes of time window over which averages are taken. The standard deviation of
those averages is calculated for each window size (note that the total length of the run gets
larger with the window, so the SD is always calculated from the same number of input values).
- results_2009-01-02.tsv -- results from the averaging run
giving a relationship between size of window and SD for each variable (note that the size
of the window is in number of samples, each sample is a time step of 0.005)
The compilation was done by linking to the GNU Scientific Library
which provides the ODE solver (otherwise the above source code would be many times more complex).
The CentOS package "gsl-1.5" was used for this test.
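For readers who want to experiment without compiling against GSL, here is a minimal Python sketch of the same system. It assumes the standard form of the Lorenz equations with the classic parameters (sigma = 10, rho = 28, beta = 8/3) and the 0.005 time step mentioned above. The function names are illustrative, and the fixed-step fourth-order Runge-Kutta integrator is a stand-in for the GSL solver, so this is not a copy of the C source.

```python
# Sketch only: fixed-step RK4 stands in for the GSL ODE solver used by the
# original C source. All names here are illustrative.

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz parameters
DT = 0.005                                 # time step quoted in the text


def lorenz(y, sigma=SIGMA, rho=RHO, beta=BETA):
    """Right-hand side of the Lorenz equations for state (y[0], y[1], y[2])."""
    return (
        sigma * (y[1] - y[0]),
        y[0] * (rho - y[2]) - y[1],
        y[0] * y[1] - beta * y[2],
    )


def jacobian(y, sigma=SIGMA, rho=RHO, beta=BETA):
    """Partial derivatives of lorenz() with respect to each state variable."""
    return (
        (-sigma, sigma, 0.0),
        (rho - y[2], -1.0, -y[0]),
        (y[1], y[0], -beta),
    )


def rk4_step(y, dt=DT, **params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz(y, **params)
    k2 = lorenz(shifted(y, k1, dt / 2), **params)
    k3 = lorenz(shifted(y, k2, dt / 2), **params)
    k4 = lorenz(shifted(y, k3, dt), **params)
    return tuple(
        yi + dt / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )


def simulate(n_steps, y0=(1.0, 1.0, 1.0), dt=DT, **params):
    """Return the list of states visited, starting from y0."""
    states = [y0]
    for _ in range(n_steps):
        states.append(rk4_step(states[-1], dt, **params))
    return states


trajectory = simulate(2000)  # 2000 steps of 0.005 = 10 time units
```

The Jacobian is not needed by the explicit RK4 step used here. It is included only because the original source provides one for the solver.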
Basic Output and 2D Projection of the Attractor
The familiar "owl face" of the attractor, plotted. OK, it turns up in a lot of places
but at least we can say with confidence that this software is producing expected results
and most likely the equations have been correctly coded.
Three Variables Plotted Against Time
These are included to give a feel for the typical appearance of
the chaotic time series. Note that the blue data looks more like
a periodic function (although not exactly periodic, the period is approximately
1 time unit), while the red and black series look more chaotic. The red and
black series are similar to each other, but quite distinct in appearance
from the blue. Although this somewhat handwaving
explanation is not sufficient for any sort of mathematical analysis,
an important difference in behaviour can be seen below.
Standard Deviation of the Three Variables, as Averaging is Applied
Here's the important result: how the standard deviation changes as the average
is applied over larger and larger timescales. Intuitively, it would seem reasonable
that sufficient averaging must reduce the standard deviation, and this is indeed
what happens. A reasonable guess is that the rule for random variables
might also apply here (the SD of the mean is proportional to the reciprocal of the square
root of the number of independent random samples taken).
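As a baseline for comparison, the random-variable rule can be checked with a short Python sketch. It assumes independent Gaussian samples rather than Lorenz output, and it mimics the procedure described for the Perl script by keeping the number of averages fixed while the window grows. The function name and the choice of 400 windows are illustrative assumptions.

```python
import math
import random


def sd_of_window_means(series, window, n_windows):
    """Split series into n_windows blocks of `window` samples, average each
    block, and return the sample standard deviation of those averages."""
    means = [
        sum(series[k * window:(k + 1) * window]) / window
        for k in range(n_windows)
    ]
    centre = sum(means) / n_windows
    return math.sqrt(sum((m - centre) ** 2 for m in means) / (n_windows - 1))


rng = random.Random(1)   # fixed seed so repeat runs give the same numbers
N_WINDOWS = 400          # same number of averages for every window size

sd_by_window = {}
for window in (1, 4, 16, 64):
    # The run gets longer as the window grows, so the SD is always
    # estimated from the same number of window means.
    series = [rng.gauss(0.0, 1.0) for _ in range(window * N_WINDOWS)]
    sd_by_window[window] = sd_of_window_means(series, window, N_WINDOWS)

for window, sd in sd_by_window.items():
    # For independent samples the SD should be close to 1 / sqrt(window).
    print(window, round(sd, 3), round(1 / math.sqrt(window), 3))
```

Each fourfold increase in the window should roughly halve the SD, which is the behaviour the green dashed curve in the plot represents.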
The legend is as follows:
- Red thick solid -- y[0], as per red time series above.
- Black thick solid -- y[1], as per black time series above.
- Blue thick solid -- y[2], as per blue time series above.
- Green thin dashed -- traditional SD formula for mean of random variable: 7 / sqrt( t )
- Purple thin dashed -- curve fit to give reasonable asymptotes for Blue: 7.442565 / ( 1 + ( 2.419795 * t ) ^ 2 ) ^ 0.5
- Purple thin dashed -- similar curve fit for Red and Black: 7.731371 / ( 1 + ( 1.269642 * t ) ^ 2 ) ^ 0.25
Note that "t" in this context is the size of the time window over which the averages are taken,
not instantaneous time as per the ODE simulation. In the context of the source code above, time
steps were 0.005 time units, so a time window of 1 time unit was a mean of 200 samples.
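The three dashed curves in the legend are simple enough to evaluate directly. The sketch below copies the coefficients from the legend; the function names are illustrative. It also shows the conversion from window size in time units to number of samples.

```python
import math

DT = 0.005  # simulation time step, in time units


def samples_in_window(t, dt=DT):
    """Number of samples averaged for a window of t time units."""
    return round(t / dt)


def sd_random_rule(t):
    """Green curve: SD rule for the mean of independent random samples."""
    return 7.0 / math.sqrt(t)


def sd_fit_blue(t):
    """Purple curve fitted to the blue series; falls off as 1/t for large t."""
    return 7.442565 / (1.0 + (2.419795 * t) ** 2) ** 0.5


def sd_fit_red_black(t):
    """Purple curve fitted to red and black; falls off as 1/sqrt(t) for large t."""
    return 7.731371 / (1.0 + (1.269642 * t) ** 2) ** 0.25


for t in (0.1, 1.0, 10.0, 100.0):
    print(t, samples_in_window(t), sd_random_rule(t),
          sd_fit_blue(t), sd_fit_red_black(t))
```

For small windows both fitted curves flatten out at their leading coefficient. For large windows the exponent decides the roll-off: 0.5 gives a 1/t tail and 0.25 gives a 1/sqrt(t) tail.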
Discussion and Conclusions
Although the system is not exactly periodic, it does regularly show quasi-periodic behaviour
with a time period of the order of 1 time unit. For all variables in the system, taking
averages over windows smaller than one time unit has no substantial effect on the standard deviation.
Since the ODEs defining the system are smooth at the microscopic scale (i.e. fully
differentiable, no discontinuities, etc.), taking microscopic averages over time windows
substantially shorter than a system cycle period does not gather any additional information,
regardless of how many samples are taken in this time window. This is quite an intuitive
result.
Some variables within a chaotic system are more chaotic than others. Thus, the blue series
can achieve a very small standard deviation relatively easily, and the roll-off is asymptotically
in inverse proportion to the size of the time window used for averaging (as one would expect from a periodic
function). In addition, time windows that are clean multiples of the typical period will be more
efficient than other time windows, thus the roll-off shows a "staircase" like effect which becomes
less significant for larger time windows (again, to be expected for a quasi-periodic function).
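The periodic case is easy to check in isolation. The sketch below uses a pure sine wave with a period of 1 time unit as a stand-in for the blue series (an assumption for illustration, since the real series is only quasi-periodic) and measures the SD of the window mean across many different starting phases.

```python
import math
import statistics


def window_mean_sd(window, period=1.0, dt=0.005, n_starts=200):
    """SD of the mean of a sine wave over `window` time units, taken
    across n_starts evenly spaced starting phases."""
    n = round(window / dt)
    means = []
    for k in range(n_starts):
        start = k * period / n_starts
        total = sum(
            math.sin(2 * math.pi * (start + i * dt) / period)
            for i in range(n)
        )
        means.append(total / n)
    return statistics.pstdev(means)


sd_by_window = {w: window_mean_sd(w) for w in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)}
for w, sd in sd_by_window.items():
    print(w, sd)
```

Windows that are whole multiples of the period give an SD of essentially zero, while the half-way windows give values that shrink in inverse proportion to the window size. A quasi-periodic series blurs those zeros into steps, which is one plausible reading of the staircase in the plot.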
Other variables in the system (the red and black series) behave like random variables when
averaged over sufficiently large timescales. The standard deviation of the mean asymptotically
follows the reciprocal of the square root of the number of full periods over which that variable
is averaged. Note that this rule only properly applies (in this experiment) for time windows
larger than 10 time units. These types of variables do not have particular time windows that
are more efficient than other time windows (no "staircase" effect is visible).
Applying this result to climate change, we can conclude that it is possible to achieve
stable measurements with low standard deviation by taking averages over large timescales
(thus extracting some sort of climate status information such as global temperature trends).
However, the danger in this approach is that not all variables behave the same way when
averaged, and in some cases unexpectedly large time windows are required before the chaotic
variable follows the expected rules for independent random statistical samples.
Further Consideration
Is there a mathematically rigorous test that can separate variables with a rapidly converging
standard deviation (such as the blue above) from those that behave like random samples (or worse)
and converge only slowly? It would be necessary to have a test that requires only a limited section
of the time series (the graph above required rather massive chunks of the time series to generate,
making it a useless methodology for any real-world measurements). From gut instinct, you can look
at the different time series graphs and see some fundamental difference in the structure, but this is
not rigorous.
Global Warming and Climate Change References
This is the best one to start with: the history of chaos in climate change science
at the American Institute of Physics, which presents reasonably neutral coverage of the
problems of taking long term measurements from a climate system.
Wired Science has a layman's description of climate and chaos
by Michael Tobis with a lovely colour graphic of the Lorenz Attractor (better than my plot above, but exactly the same shape so that's encouraging).
The key result is this:
I have shown that the long term aggregate behavior of a system can be known
(the shape of the two loops in the far future) even if the long term dynamic prediction
(where on the loops the dot will be at some time in the far future) cannot.
A paper by Roger A. Pielke and Xubin Zeng called Long-Term Variability of Climate
uses a different set of three dimensional chaotic equations (also equations from Lorenz) and performs
a power spectral analysis (using FFT) to detect energy in the ultra-low frequency part
of the spectrum. They also try repeating the experiment with multiple different initial
values and try with and without a periodic "forcing function". The conclusions drawn are
not clear-cut but the most important thing to take home is that ultra-low frequency spectral
energy does exist in some sets of chaotic equations (and this particular set is plausibly
explained as a model for wind and thermal energy transfer, so it isn't completely unrealistic
to believe that weather models behave in a qualitatively similar manner).
TODO: Repeat my experiment using the same chaos equations used by Pielke, Zeng, and Lorenz (1984, 1990).
James Annan and William Connolley wrote
this article, called "Chaos and Climate"
that concludes there is no problem taking long term averages of weather measurements with the
key conclusion being:
We can demonstrate this sort of climate response clearly in the Lorenz model,
or any more complex climate model.
Perturbing the initial conditions gives a completely different trajectory (weather),
but this averages out over time, and the statistics of different long-term runs are indistinguishable.
However, a steady perturbation to the system can generate a significant change to the long-term statistics.
Certainly, this conclusion is supported by my experiment above... but only for sufficiently large
values of "long-term" and James Annan is a bit vague on the relationship between timescale and
the precision to which a climate parameter can be determined.
There is also the brief description of an experiment that would be worth repeating for independent verification:
Here is some output from a run of the Lorenz model in which a change was applied half way through.
At time t=0, the parameter "r" (which relates to an idealised thermal forcing) is changed from 26 to 28.
When viewed in close-up detail, the trajectory looks qualitatively similar before and after the change,
but in fact the long-term statistics such as the mean value of z, and its 95% range, are changed.
In this simple model, the steady pertubation changes the climate in a highly linear manner -
increasing r again to 30 would add the same change on top of that shown for 26 to 28, and r=27 would sit half-way between the cases shown.
The interesting thing about this test is that the "z" value James was taking statistics
for is also the blue line above (i.e. the one that converges fastest with averaging).
Full verification would require checking the statistics for the other values, which
converge much more slowly.
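A rough way to start repeating that experiment is sketched below in Python. It is a simplification under stated assumptions: instead of changing "r" half way through a single run, it performs a separate run for each value of r, discards an initial transient, and averages z over 100 time units. The run lengths, starting point and fixed-step RK4 integrator are illustrative choices, not taken from the quoted article.

```python
def lorenz(y, r, sigma=10.0, b=8.0 / 3.0):
    """Lorenz equations with the thermal forcing parameter r exposed."""
    return (
        sigma * (y[1] - y[0]),
        y[0] * (r - y[2]) - y[1],
        y[0] * y[1] - b * y[2],
    )


def rk4_step(y, r, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(v + h * s for v, s in zip(base, slope))

    k1 = lorenz(y, r)
    k2 = lorenz(shifted(y, k1, dt / 2), r)
    k3 = lorenz(shifted(y, k2, dt / 2), r)
    k4 = lorenz(shifted(y, k3, dt), r)
    return tuple(
        yi + dt / 6 * (a + 2 * c + 2 * d + e)
        for yi, a, c, d, e in zip(y, k1, k2, k3, k4)
    )


def mean_z(r, t_total=100.0, t_discard=10.0, dt=0.005, y0=(1.0, 1.0, 1.0)):
    """Average of z over t_total time units, after discarding a transient."""
    n_discard = round(t_discard / dt)
    n_total = round(t_total / dt)
    y = y0
    total = 0.0
    for step in range(n_discard + n_total):
        y = rk4_step(y, r, dt)
        if step >= n_discard:
            total += y[2]
    return total / n_total


mean_z_by_r = {r: mean_z(r) for r in (26.0, 27.0, 28.0)}
for r, value in mean_z_by_r.items():
    print(r, value)
```

The same loop could accumulate y[0] and y[1] instead of y[2] to see how well the other two variables settle over the same run length.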
This list is a collection of various interesting bits and pieces, only vaguely related.
- joannenova.com.au -- A skeptic, JoNova
wrote a Skeptics Handbook and is working to keep religion and science separate
(possibly with a somewhat anti-religious stance which may or may not be helpful,
depending on your philosophy).
- www.climateaudit.org -- More skeptics,
this is the site of Steve McIntyre who regularly demands a high standard of proof from global warming claims
(e.g. transparent source data, independently verifiable measures of statistical significance,
very good explanation for cherry-picked data, etc).
- www.realclimate.org -- The cautious believers,
lots of genuine science here, but definitely from a believer's point of view. They do allow
dissenting discussion so this is probably where the valid scientific ideas circulate.
- scienceblogs.com/deltoid -- Tim Lambert's articles and
comments, definite believer territory but nevertheless reasonably rigorous in
chasing up scientific evidence and in knocking down the obvious strawmen put up by populist media.
Tim thinks the Australian is mostly bullshit and I could comfortably agree with him on that score.
- www.ipcc.ch -- The United Nations anointed committee
who publish the big reports with lots of references. Somewhat on the alarmist side, but
then how else to get juicy research grants?
- climateprogress.org -- The true believers,
unabashed political agendas and who can stand in the way of progress? Don't expect much
science here, but sometimes you get links to science.
- www.climate.org -- There is no discussion, we know
the answer and we will tell you. You better believe it. This sort of stuff is pure propaganda
trying to gain legitimacy by describing itself as science.
Edward Lorenz References
Would be silly not to include some of these (but easy to find these days using the power of WWW searching).
This work is licensed under a Creative Commons License.