News & Commentary: 2009-01-03

Standard Deviation in Means taken from Chaotic Data

Climate studies are all the rage these days, and as the Global Warming issue gains political momentum, a larger number of scientists from broader disciplines are taking a skeptical look at the climate scientists' findings. It's hard to believe that the world really will end when so many people have cried wolf so many times.

In most scientific and engineering disciplines, being skeptical is a normal everyday activity, a way of life really. In the Global Warming debate, "skeptic" is a derogatory term, designed to put people in their place and cow them into humility before the True Way of the True Believers. In God we trust; anyone else had better provide convincing (and auditable) evidence.

What does this have to do with standard deviation and means? Well, there are a few (not entirely unreasonable) assumptions here:

Attractor of Edward Norton Lorenz

For a chaotic system, we might as well go back to where it all began with the Lorenz Attractor, which is relatively easy to generate on a computer and is a well-understood example of a chaotic system. The compilation was done by linking to the GNU Scientific Library, which provides the ODE solver (otherwise the above source code would be many times more complex). The CentOS package "gsl-1.5" was used for this test.
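For readers without GSL to hand, here is a minimal Python sketch of the same kind of simulation. It is not the program used for the plots on this page: it swaps the GSL solver for a hand-written fixed-step fourth-order Runge-Kutta integrator, and it assumes the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and a starting point of (1, 1, 1), none of which are stated in the text. Only the step size of 0.005 time units comes from this article.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fixed step of classical fourth-order Runge-Kutta."""
    def shifted(k, scale):
        # State displaced along the slope k by the given step fraction.
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(n_steps, dt=0.005, start=(1.0, 1.0, 1.0)):
    """Return a list of (x, y, z) samples, one per time step."""
    state = start
    samples = []
    for _ in range(n_steps):
        state = rk4_step(state, dt)
        samples.append(state)
    return samples


if __name__ == "__main__":
    # Print every 200th sample, i.e. one line per time unit.
    for i, (x, y, z) in enumerate(trajectory(2000)):
        if i % 200 == 0:
            print(f"{i * 0.005:6.2f} {x:9.4f} {y:9.4f} {z:9.4f}")
```

A fixed step is the simplest choice for a sketch; an adaptive solver such as the one GSL provides would control the integration error more carefully.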

Basic Output and 2D Projection of the Attractor

The familiar "owl face" of the attractor, plotted. OK, it turns up in a lot of places, but at least we can say with confidence that this software is producing the expected results and that the equations have most likely been coded correctly.

Three Variables Plotted Against Time

These are included to give a feel for the typical "look and feel" of the chaotic time series. Note that the blue data looks more like a periodic function (not exactly periodic, but with a period of approximately 1 time unit), while the red and black series look more chaotic and, although similar to each other, have a quite distinct appearance compared with the blue. This somewhat handwaving explanation is not sufficient for any sort of mathematical analysis, but an important difference in behaviour can be seen below.

Standard Deviation of the Three Variables, as Averaging is Applied

Here's the important result: how the standard deviation changes as the average is applied over larger and larger timescales. Intuitively, it would seem reasonable that sufficient averaging must reduce the standard deviation, and this is indeed what happens. It might be a reasonable guess that the rule applying to random variables would apply here (the SD of the mean is proportional to the reciprocal of the square root of the number of independent random samples taken).

The legend is as follows: Note that "t" in this context is the size of the time window over which the averages are taken, not instantaneous time as per the ODE simulation. In the context of the source code above, time steps were 0.005 time units, so a time window of 1 time unit was a mean of 200 samples.
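The windowed statistic can be sketched in a few lines of Python. This is an illustration rather than the code behind the plot: it assumes consecutive non-overlapping windows and the population standard deviation, and the text does not say which windowing or which SD estimator was actually used. The function names are made up for this example. With 0.005 time units per sample, a window of 200 samples corresponds to t = 1.

```python
from statistics import fmean, pstdev


def window_means(series, window):
    """Means of consecutive non-overlapping windows; a trailing partial window is dropped."""
    n_windows = len(series) // window
    return [
        fmean(series[i * window:(i + 1) * window])
        for i in range(n_windows)
    ]


def sd_of_window_means(series, window):
    """Population standard deviation of the window means (estimator choice is an assumption)."""
    return pstdev(window_means(series, window))


if __name__ == "__main__":
    # Toy series with an exact period of 2 samples, standing in for real simulation output.
    toy = [1.0, 3.0] * 8
    for window in (1, 2, 3, 4):
        print(window, sd_of_window_means(toy, window))
```

On the toy series, windows that are whole multiples of the period average the oscillation away completely, while a window of 3 samples does not.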

Discussion and Conclusions

Although the system is not exactly periodic, it does regularly show quasi-periodic behaviour with a period of the order of 1 time unit. For all variables in the system, taking averages over windows smaller than one time unit has no substantial effect on the standard deviation. Since the ODEs defining the system are smooth at the microscopic scale (i.e. fully differentiable, no discontinuities, etc.), taking microscopic averages over time windows substantially shorter than a system cycle period does not gather any additional information, regardless of how many samples are taken in this time window. This is quite an intuitive result.

Some variables within a chaotic system are more chaotic than others. Thus, the blue series can achieve a very small standard deviation relatively easily, and the roll-off is asymptotically in inverse proportion to the size of the time window used for averaging (as one would expect from a periodic function). In addition, time windows that are clean multiples of the typical period will be more efficient than other time windows, so the roll-off shows a "staircase"-like effect which becomes less significant for larger time windows (again, to be expected for a quasi-periodic function).

Other variables in the system (the red and black series) behave like random variables when averaged over sufficiently large timescales. The standard deviation of the mean asymptotically follows the reciprocal of the square root of the number of full periods over which that variable is averaged. Note that this rule only properly applies (in this experiment) for time windows larger than 10 time units. These types of variables do not have particular time windows that are more efficient than other time windows (no "staircase" effect is visible).
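One rough way to tell the two kinds of roll-off apart is to fit a straight line to log(SD) against log(window size): a slope near -0.5 matches the random-sample rule, while a slope near -1 matches the periodic case. The sketch below is only an illustration of that idea using an ordinary least-squares fit. The function name is made up for this example, and the demonstration data are synthetic power laws, not the measurements from this experiment.

```python
import math


def loglog_slope(windows, sds):
    """Least-squares slope of log(sd) against log(window)."""
    xs = [math.log(w) for w in windows]
    ys = [math.log(s) for s in sds]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Synthetic roll-off curves, not measured data.
    windows = [10, 20, 40, 80, 160]
    random_like = [w ** -0.5 for w in windows]   # SD falling as 1/sqrt(window)
    periodic_like = [1.0 / w for w in windows]   # SD falling as 1/window
    print(loglog_slope(windows, random_like))
    print(loglog_slope(windows, periodic_like))
```

With real data, the fit should be restricted to windows large enough for the asymptotic behaviour to have set in (larger than about 10 time units for the red and black series here).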

Applying this result to climate change, we can conclude that it is possible to achieve stable measurements with low standard deviation by taking averages over large timescales (thus extracting some sort of climate status information such as global temperature trends). However, the danger in this approach is that not all variables behave the same way when averaged, and in some cases unexpectedly large time windows are required before the chaotic variable follows the expected rules for independent random statistical samples.

Further Consideration

Is there a mathematically rigorous test that can separate variables with a rapidly converging standard deviation (such as the blue above) from those that behave like random samples (or worse) and converge only slowly? It would be necessary to have a test that requires only a limited section of the time series (the graph above required rather massive chunks of the time series to generate, making it a useless methodology for any real-world measurements). From gut instinct, you can look at the different time series graphs and see some fundamental difference in the structure, but this is not rigorous.

Global Warming and Climate Change References

This is the best one to start with: the history of chaos in climate change science at the American Institute of Physics presents reasonably neutral coverage of the problems of taking long-term measurements from a climate system.

Wired Science has a layman's description of climate and chaos by Michael Tobis with a lovely colour graphic of the Lorenz Attractor (better than my plot above, but exactly the same shape so that's encouraging). The key result is this:

I have shown that the long term aggregate behavior of a system can be known (the shape of the two loops in the far future) even if the long term dynamic prediction (where on the loops the dot will be at some time in the far future) cannot.

A paper by Roger A. Pielke and Xubin Zeng called Long-Term Variability of Climate uses a different set of three-dimensional chaotic equations (also from Lorenz) and performs a power spectral analysis (using FFT) to detect energy in the ultra-low frequency part of the spectrum. They also try repeating the experiment with multiple different initial values, and with and without a periodic "forcing function". The conclusions drawn are not clear-cut, but the most important thing to take home is that ultra-low frequency spectral energy does exist in some sets of chaotic equations (and this particular set is plausibly explained as a model for wind and thermal energy transfer, so it isn't completely unrealistic to believe that weather models behave in a qualitatively similar manner).
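To give a feel for what "energy at a given frequency" means, here is a small Python sketch of a periodogram estimate. It is not the method from the paper: instead of an FFT it evaluates the discrete Fourier sum directly at one chosen frequency, which is slower but needs nothing beyond the standard library. The function name and the test signal are made up for this example.

```python
import cmath
import math


def power_at(series, dt, freq):
    """Periodogram estimate of the power in `series` at `freq` (cycles per time unit)."""
    n = len(series)
    mean = sum(series) / n
    # Direct evaluation of the discrete Fourier sum at a single frequency (not an FFT).
    total = sum(
        (value - mean) * cmath.exp(-2j * math.pi * freq * k * dt)
        for k, value in enumerate(series)
    )
    return abs(total) ** 2 / n


if __name__ == "__main__":
    # Synthetic test signal: a pure sine wave with 1 cycle per time unit, 10 time units long.
    dt = 0.01
    signal = [math.sin(2 * math.pi * k * dt) for k in range(1000)]
    for freq in (0.1, 0.5, 1.0, 2.0):
        print(freq, power_at(signal, dt, freq))
```

Looking for ultra-low frequency energy in a chaotic series would mean evaluating this at frequencies far below one cycle per system period, which in turn needs a series many periods long.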

TODO: Repeat my experiment using the same chaos equations used by Pielke, Zeng, and Lorenz (1984, 1990).

James Annan and William Connolley wrote this article, called "Chaos and Climate", which concludes that there is no problem taking long-term averages of weather measurements. The key conclusion is:

We can demonstrate this sort of climate response clearly in the Lorenz model, or any more complex climate model. Perturbing the initial conditions gives a completely different trajectory (weather), but this averages out over time, and the statistics of different long-term runs are indistinguishable. However, a steady perturbation to the system can generate a significant change to the long-term statistics.

Certainly, this conclusion is supported by my experiment above... but only for sufficiently large values of "long-term", and James Annan is a bit vague on the relationship between timescale and the precision to which a climate parameter can be determined. There is also a brief description of an experiment that would be worth repeating for independent verification:

Here is some output from a run of the Lorenz model in which a change was applied half way through. At time t=0, the parameter "r" (which relates to an idealised thermal forcing) is changed from 26 to 28. When viewed in close-up detail, the trajectory looks qualitatively similar before and after the change, but in fact the long-term statistics such as the mean value of z, and its 95% range, are changed. In this simple model, the steady pertubation changes the climate in a highly linear manner - increasing r again to 30 would add the same change on top of that shown for 26 to 28, and r=27 would sit half-way between the cases shown.

The interesting thing about this test is that the "z" value James was taking statistics for is also the blue line above (i.e. the one that converges fastest with averaging). Full verification would require checking the statistics for the other values, which converge much more slowly.
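A first step toward repeating that experiment might look like the Python sketch below. It is not the code from the article and it is simplified: rather than switching "r" half way through a single run, it makes two separate runs with r = 26 and r = 28 and compares the long-run means of all three variables. The fixed-step fourth-order Runge-Kutta integrator, the values sigma = 10 and beta = 8/3, the starting point (1, 1, 1), the run length and the discarded start-up period are all assumptions made for this example.

```python
def lorenz_rhs(x, y, z, rho, sigma=10.0, beta=8.0 / 3.0):
    """Lorenz equations; rho is the parameter the quoted article calls "r"."""
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z


def rk4_step(x, y, z, rho, dt):
    """One fixed step of classical fourth-order Runge-Kutta."""
    k1 = lorenz_rhs(x, y, z, rho)
    k2 = lorenz_rhs(x + dt / 2 * k1[0], y + dt / 2 * k1[1], z + dt / 2 * k1[2], rho)
    k3 = lorenz_rhs(x + dt / 2 * k2[0], y + dt / 2 * k2[1], z + dt / 2 * k2[2], rho)
    k4 = lorenz_rhs(x + dt * k3[0], y + dt * k3[1], z + dt * k3[2], rho)
    return (
        x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        y + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
        z + dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]),
    )


def long_run_means(rho, n_steps=40000, n_discard=2000, dt=0.005):
    """Means of x, y and z over n_steps samples, after discarding a start-up period."""
    x, y, z = 1.0, 1.0, 1.0  # arbitrary starting point (assumption)
    sums = [0.0, 0.0, 0.0]
    for step in range(n_discard + n_steps):
        x, y, z = rk4_step(x, y, z, rho, dt)
        if step >= n_discard:
            sums[0] += x
            sums[1] += y
            sums[2] += z
    return tuple(s / n_steps for s in sums)


if __name__ == "__main__":
    for rho in (26.0, 28.0):
        mean_x, mean_y, mean_z = long_run_means(rho)
        print(f"r={rho}: mean x={mean_x:.3f}  mean y={mean_y:.3f}  mean z={mean_z:.3f}")
```

Printing the x and y means alongside z is the point of the exercise: those are the variables for which the averaging converges slowly, so their means should be expected to wander far more from run to run than the z mean does.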

This list is various interesting bits and pieces, only vaguely related.

Edward Lorenz References

Would be silly not to include some of these (but easy to find these days using the power of WWW searching).

This work is licensed under a Creative Commons License.
