
Histograms with Overlays

No statistician is happy without a tool for drawing a histogram. A histogram conveys information on location and scatter, and its shape can also reveal hidden structure in the data. First we need a tool to create a frequency table; a simple utility function then produces the plotting points needed to draw the histogram.

      .5 1 Freqtable 1 2 3 2 3 2
1 1
2 3
3 2

In effect, this call says: produce a frequency table of the data in the right argument, using classes of width 1 positioned so that 0.5 is a class boundary. The first column of the result holds the class mid-points and the second the frequencies. The RainPro function chStep then draws the histogram from the class boundaries and the frequencies, whether or not the class intervals are equal.

      chSet 'style' 'xyplot,lines,nomarkers,norisers'
      chSet 'pattern' 0
      chStep (.5 1.5 2.5 3.5)(1 3 2)
      psView PG←chClose
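
Freqtable is used above but not listed. As a rough guide to what such a function has to do, here is a minimal sketch, assuming Dyalog APL dfns and index origin 1. FreqSketch is a hypothetical name chosen for illustration and is not the authors' code.

```apl
⍝ Sketch only: FreqSketch is a hypothetical stand-in for Freqtable.
⍝ Assumes Dyalog APL dfns and ⎕IO←1.
FreqSketch←{
    (b w)←⍺                  ⍝ a class boundary and the class width
    k←⌊(⍵-b)÷w               ⍝ class number of each observation
    c←(⌊/k)+¯1+⍳1+(⌈/k)-⌊/k  ⍝ every class from lowest to highest
    f←+/c∘.=k                ⍝ frequency in each class (empty classes get 0)
    (b+w×c+0.5),[1.5]f       ⍝ column 1: mid-points, column 2: frequencies
}
```

Applied as .5 1 FreqSketch 1 2 3 2 3 2, this should reproduce the three-row table shown earlier. An observation that falls exactly on a class boundary is counted in the upper class, which is a design choice of the sketch rather than something stated in the text.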

It is useful to have a utility function, Hist, which checks whether its right argument is a two-element nested vector of boundaries and frequencies (unequal class widths) or a two-column matrix as returned by Freqtable (equal class widths), and executes the appropriate chStep command.
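
Hist is not listed here either. The check it performs might look something like the sketch below, which assumes Dyalog APL dfns, index origin 1, and a matrix whose first column holds equally spaced class mid-points. HistArg is a hypothetical helper: it only builds the argument for chStep and leaves the drawing to the caller.

```apl
⍝ Sketch only: HistArg is a hypothetical helper, not the authors' Hist.
⍝ Assumes Dyalog APL dfns and ⎕IO←1.
HistArg←{
    2≠≢⍴⍵:⍵                  ⍝ nested vector: already (boundaries)(frequencies)
    m←⍵[;1]                  ⍝ matrix: class mid-points, equally spaced
    w←-/m[2 1]               ⍝ class width (needs at least two classes)
    ((m[1]-w÷2)+w×0,⍳≢m)(⍵[;2])
}
```

For the three-row table shown earlier, HistArg returns (.5 1.5 2.5 3.5)(1 3 2), the same argument that was passed to chStep by hand, so chStep HistArg applied to a frequency table would stand in for the Hist call.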

For a more interesting example, let’s look at a histogram of the demand for sliced bread:

      chSet 'style' 'xyplot,lines,nomarkers'
      Hist 1000 200 Freqtable demand[;1]
      psView PG←chClose

The histogram shows clearly that there is structure in the data. Of course, we already know that the data have a time-series structure, and that the small values come from the demand at the weekend.

Statisticians like to compare the observed histogram with a model histogram, often the Normal or Gaussian distribution. Clearly it would not make sense to fit a single bell-shaped curve to the bread data, so we will manufacture some coin-tossing data instead. Yes, you knew it had to come sooner or later!

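First the data: the number of heads in 25 tosses of a fair coin, repeated 100 times. With index origin 1, ? applied to a 100-by-25 array of 2s gives random 1s and 2s; subtracting 1 turns them into 0s and 1s, and summing along each row counts the heads.

      ntoss←+/¯1+?100 25⍴2
      chSet 'style' 'xyplot,lines,nomarkers'
      Hist .5 1 Freqtable ntoss
      psView PG←chClose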

Now to add the normal curve. The function Normden takes a right argument specifying the mean and standard deviation, whilst the optional left argument (Normden is ambivalent) specifies the required area under the normal curve. The theoretical mean and standard deviation are 12.5 and 2.5 respectively (25×0.5, and the square root of 25×0.5×0.5); alternatively, we can calculate the sample values from the data. The total area of the histogram is 100 (the total frequency of 100 times the class width of 1), so we can produce the overlay:

      chSet 'style' 'xyplot,lines,nomarkers'
      Hist .5 1 Freqtable ntoss
      chPlot 100 Normden 12.5 2.5
      psView PG←chClose
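
Normden is used above but not listed. The calculation behind it can be sketched as follows, assuming Dyalog APL dfns and index origin 1. NormSketch is a hypothetical name, the choice of 81 points spanning four standard deviations either side of the mean is arbitrary, and the two-column result assumes that chPlot accepts a matrix of x and y values, which is not confirmed here.

```apl
⍝ Sketch only: NormSketch is a hypothetical stand-in for Normden.
⍝ Assumes Dyalog APL dfns and ⎕IO←1.
NormSketch←{
    ⍺←1                          ⍝ area under the curve; 1 if no left argument
    (m s)←⍵                      ⍝ mean and standard deviation
    z←(¯41+⍳81)÷10               ⍝ 81 standardised points from ¯4 to 4
    d←(*-0.5×z*2)÷s×(○2)*0.5     ⍝ normal density at the points m+s×z
    (m+s×z),[1.5]⍺×d             ⍝ column 1: x values, column 2: scaled density
}
```

With a left argument of 100 and a right argument of 12.5 2.5, the scaled curve peaks at about 16 (100 divided by 2.5 times the square root of 2π), which is on the same scale as the frequencies in the coin-tossing histogram.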

To complete the picture, add a title, axis captions and a key:

      chSet 'style' 'xyplot,lines,nomarkers,boxed'
      chSet 'head' 'Binomial Distribution with Normal Overlay'
      chSet('xcap' 'Number of heads')('ycap' 'Frequency')
      chSet 'key' 'Histogram,Normal Density'
      Hist .5 1 Freqtable ntoss
      chPlot 100 Normden 12.5 2.5
      psView PG←chClose

Summary
It often pays to add your own simple utilities to group a well-used set of RainPro calls, or to organise the data in a form suitable for the plotting calls.


© Copyright Alan Sykes and Adrian Smith 1999