
Histograms with Overlays

No statistician is happy without a tool for drawing a histogram. It conveys information on location and scatter, and its shape can also reveal hidden structure in the data. First we need a tool to create a frequency table. A simple utility function then turns that table into the plotting points needed to draw the histogram.

      .5 1 Freqtable 1 2 3 2 3 2
1 1
2 3
3 2

In effect, this says: produce a frequency table of the data in the right argument, using classes of width 1 with 0.5 as one of the class boundaries. The first column of the result holds the class mid-points (1, 2 and 3 here) and the second the frequencies. To draw the histogram, the function ch.Step provides the facility we need, for equal and unequal class intervals alike.
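As a rough guide to what Freqtable computes, here is a Python sketch. The name `freqtable` and its argument order are hypothetical, not taken from the original workspace. The sketch assumes classes closed on the left, [lower, upper), and includes any empty classes between the lowest and highest occupied ones so that the bars are contiguous; the text does not say how the APL version handles either point.

```python
import math

def freqtable(boundary, width, data):
    """Hypothetical equivalent of `boundary width Freqtable data`.

    Classes have equal `width`, and `boundary` is one class boundary.
    Returns (midpoint, count) pairs from the lowest to the highest
    occupied class, including empty classes in between.
    Assumption: classes are [lower, upper).
    """
    if not data:
        return []
    # Index of the class containing each value, counted from `boundary`
    idx = [math.floor((x - boundary) / width) for x in data]
    lo, hi = min(idx), max(idx)
    counts = [0] * (hi - lo + 1)
    for i in idx:
        counts[i - lo] += 1
    # Class k runs from boundary + k*width to boundary + (k+1)*width
    return [(boundary + (k + 0.5) * width, c)
            for k, c in zip(range(lo, hi + 1), counts)]

print(freqtable(0.5, 1, [1, 2, 3, 2, 3, 2]))
# [(1.0, 1), (2.0, 3), (3.0, 2)]
```

The result matches the APL session above: mid-points 1, 2, 3 with frequencies 1, 3, 2.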

      ch.Set 'style' 'xyplot,lines,nomarkers,norisers'
      ch.Set 'pattern' 0
      ch.Step (.5 1.5 2.5 3.5)(1 3 2)
      View PG←ch.Close
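The call above passes ch.Step the four class boundaries and the three frequencies. To make the geometry concrete, the Python sketch below computes the vertices of a histogram outline from those same two vectors. The function `step_outline` is hypothetical, and we do not claim this is how ch.Step works internally.

```python
def step_outline(boundaries, freqs):
    """Vertices of a histogram outline as a single step line.

    `boundaries` has one more element than `freqs`; bar i spans
    boundaries[i] to boundaries[i+1] with height freqs[i].
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than frequency")
    points = [(boundaries[0], 0)]              # start on the x-axis
    for i, f in enumerate(freqs):
        points.append((boundaries[i], f))      # vertical move to the new height
        points.append((boundaries[i + 1], f))  # across the top of bar i
    points.append((boundaries[-1], 0))         # drop back to the axis
    return points

print(step_outline([0.5, 1.5, 2.5, 3.5], [1, 3, 2]))
# [(0.5, 0), (0.5, 1), (1.5, 1), (1.5, 3), (2.5, 3), (2.5, 2), (3.5, 2), (3.5, 0)]
```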

It is useful to have a utility function, Hist, which checks whether its right argument is a two-element nested vector (unequal class widths) or a two-column matrix such as the result of Freqtable (equal class widths), and then executes the appropriate ch.Step command.
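The check that Hist performs can be sketched in Python as follows. The name `hist_classes` is hypothetical, and instead of drawing it returns the (boundaries, frequencies) pair that would be handed to a step plot. Python has no direct analogue of an APL nested vector versus a matrix, so the sketch assumes the nested form is recognised by its shape: exactly one more boundary than frequencies. For the matrix form it assumes rows of (mid-point, frequency) with equally spaced mid-points, and recovers the boundaries by going half a class width either side.

```python
def hist_classes(arg):
    """Hypothetical sketch of Hist's dispatch; returns (boundaries, freqs).

    arg is either a pair (boundaries, freqs) for unequal class widths,
    or a list of (midpoint, freq) rows for equal class widths.
    """
    # Nested form: boundaries has exactly one more element than freqs
    if len(arg) == 2 and len(arg[0]) == len(arg[1]) + 1:
        return list(arg[0]), list(arg[1])
    # Two-column form: infer the common class width from the mid-points
    mids = [row[0] for row in arg]
    freqs = [row[1] for row in arg]
    if len(mids) < 2:
        raise ValueError("need at least two classes to infer the class width")
    width = mids[1] - mids[0]
    if any(abs((mids[i + 1] - mids[i]) - width) > 1e-9 * abs(width)
           for i in range(len(mids) - 1)):
        raise ValueError("mid-points are not equally spaced")
    boundaries = [mids[0] - width / 2] + [m + width / 2 for m in mids]
    return boundaries, freqs

print(hist_classes([(1, 1), (2, 3), (3, 2)]))
# ([0.5, 1.5, 2.5, 3.5], [1, 3, 2])
```

Either form ends up as the same pair of vectors, which is why one ch.Step call serves both cases.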

For a more interesting example, let’s look at the demand for sliced bread, using classes of width 200 with 1000 as a class boundary:

      ch.Set 'style' 'xyplot,lines,nomarkers'
      Hist 1000 200 Freqtable demand[;1]
      View PG←ch.Close

The histogram clearly shows structure in the data. Of course, we already know that the data form a time series, and that the small values come from the demand at the weekend.

Statisticians like to compare the observed histogram with a model, often the Normal or Gaussian distribution. It would clearly make no sense to fit a single bell-shaped curve to the bread data, so we will manufacture some coin-tossing data (yes, you knew it had to come sooner or later!).

First the data: the number of heads in 25 tosses of a fair coin, repeated 100 times.

      ntoss←+/¯1+?100 25⍴2
      ch.Set 'style' 'xyplot,lines,nomarkers'
      Hist .5 1 Freqtable ntoss
      View PG←ch.Close
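The one-liner that creates ntoss builds a 100 × 25 matrix of 2s, replaces each element with a random 1 or 2 (with the default index origin ⎕IO=1), subtracts 1 to get 0 for a tail and 1 for a head, and sums each row. A Python sketch of the same simulation follows; the function name `coin_tosses` and its `seed` parameter are our own, added so that runs can be reproduced.

```python
import random

def coin_tosses(n_reps=100, n_tosses=25, seed=None):
    """Heads in n_tosses of a fair coin, repeated n_reps times.

    Mirrors ntoss←+/¯1+?100 25⍴2: each toss is 0 (tail) or 1 (head),
    and the tosses in each repetition are summed.
    """
    rng = random.Random(seed)
    return [sum(rng.randint(0, 1) for _ in range(n_tosses))
            for _ in range(n_reps)]

ntoss = coin_tosses(seed=1)
print(len(ntoss), min(ntoss), max(ntoss))
```

Each value is a draw from a binomial distribution with 25 trials and probability 1/2, which is why a normal curve is a sensible model here.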

Now to add the normal curve. The function Normden takes a right argument specifying the mean and standard deviation, whilst its optional left argument (Normden is ambivalent) specifies the area under the normal curve. For 25 tosses of a fair coin the theoretical mean is 25 × 0.5 = 12.5 and the standard deviation is √(25 × 0.5 × 0.5) = 2.5; alternatively, we could calculate the sample values from the data. The total area of the histogram is 100 (a total frequency of 100 multiplied by the class width of 1), so we can produce the overlay:

      ch.Set 'style' 'xyplot,lines,nomarkers'
      Hist .5 1 Freqtable ntoss
      ch.Plot 100 Normden 12.5 2.5
      View PG←ch.Close
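The text describes Normden only by its arguments, so the Python sketch below is an assumption about what such a function might return, not its actual code. The name `normden`, the default area of 1, the x range of mean ± 4 standard deviations and the 101 plotting points are all our choices. The essential step is the scaling: a probability density has total area 1, so multiplying it by the histogram's area (total frequency × class width) puts the curve on the same vertical scale as the bars.

```python
import math

def normden(mean, sd, area=1.0, n_points=101, width=4.0):
    """Hypothetical sketch: (x, y) points on a normal curve of total `area`.

    Assumptions: default area 1, x from mean - width*sd to mean + width*sd,
    n_points equally spaced points.
    """
    lo, hi = mean - width * sd, mean + width * sd
    step = (hi - lo) / (n_points - 1)
    points = []
    for i in range(n_points):
        x = lo + i * step
        z = (x - mean) / sd
        # Normal density, scaled so the whole curve has the requested area
        y = area * math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
        points.append((x, y))
    return points

# Binomial(25, 1/2): mean n*p, standard deviation sqrt(n*p*(1-p))
n, p = 25, 0.5
mean, sd = n * p, math.sqrt(n * p * (1 - p))
curve = normden(mean, sd, area=100 * 1)  # 100 observations x class width 1
print(mean, sd)
# 12.5 2.5
```

The peak of the scaled curve, 100 / (2.5 × √(2π)) ≈ 16, is therefore directly comparable with the tallest bars of the histogram.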

To complete the picture, add a title, axis captions and a key:

      ch.Set 'style' 'xyplot,lines,nomarkers,boxed'
      ch.Set 'head' 'Binomial Distribution with Normal Overlay'
      ch.Set('xcap' 'Number of heads')('ycap' 'Frequency')
      ch.Set 'key' 'Histogram,Normal Density'
      Hist .5 1 Freqtable ntoss
      ch.Plot 100 Normden 12.5 2.5
      View PG←ch.Close

Summary
It often pays to write your own simple utilities that group a frequently used set of RainPro calls, or that organise the data into a form suitable for the plotting calls.


© Copyright Alan Sykes and Adrian Smith 1999