No statistician is happy without a tool for drawing a histogram. A histogram conveys information on location and scatter, and its shape can also reveal hidden structure in the data. First we need a tool to create a frequency table; a simple utility function then produces the plotting points needed to draw the histogram.
.5 1 Freqtable 1 2 3 2 3 2 1 1 2 3 3 2
In effect, this call says: produce a frequency table of the data in the right argument, using classes of width 1 with 0.5 as one of the class boundaries. To draw the result, the function ch.Step provides the required facility, both for histograms with equal class intervals and otherwise.
ch.Set 'style' 'xyplot,lines,nomarkers,norisers'
ch.Set 'pattern' 0
ch.Step (.5 1.5 2.5 3.5)(1 3 2)
View PG←ch.Close
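The source of Freqtable is not shown, so the following is only a minimal sketch of the counting step, written in Python to spell out the arithmetic. The name `freq_table` and the row layout (lower class edge, count) are assumptions made for illustration; the real utility may organise its result differently.

```python
import math
from collections import Counter

def freq_table(boundary, width, data):
    """Count observations in equal-width classes, one of whose edges is `boundary`.

    Hypothetical helper: returns (lower_edge, count) rows spanning the data.
    """
    # Class index of each value, measured from the reference boundary.
    idx = [math.floor((x - boundary) / width) for x in data]
    counts = Counter(idx)
    lo, hi = min(idx), max(idx)
    # Keep empty classes between the extremes so the histogram has no gaps.
    return [(boundary + k * width, counts.get(k, 0)) for k in range(lo, hi + 1)]

table = freq_table(0.5, 1, [1, 2, 3, 2, 3, 2, 1, 1, 2, 3, 3, 2])
for lower, count in table:
    print(lower, count)
# prints:
# 0.5 3
# 1.5 5
# 2.5 4
```

A value lying exactly on a boundary is placed in the class above it here; that convention is a choice made in this sketch, not something the text specifies.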
It is useful to have a utility program that checks whether its right argument is a two-element nested vector (unequal class widths) or a two-column matrix (equal class widths) and then executes the appropriate ch.Step command. This is the job of the function Hist.
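How Hist reshapes its argument is not shown. As a rough illustration of the equal-width case, the sketch below turns rows of (lower edge, count) into the boundaries-and-heights pair used in the ch.Step example earlier. The function name `step_pair`, the row layout and the explicit `width` parameter are all assumptions, and Python is used only to make the bookkeeping explicit.

```python
def step_pair(table, width):
    """Turn (lower_edge, count) rows into (boundaries, heights).

    Hypothetical helper, not part of RainPro.
    """
    if not table:
        raise ValueError("frequency table is empty")
    lowers = [lower for lower, _ in table]
    heights = [count for _, count in table]
    # n classes need n + 1 boundaries: every lower edge plus the final upper edge.
    boundaries = lowers + [lowers[-1] + width]
    return boundaries, heights

boundaries, heights = step_pair([(0.5, 1), (1.5, 3), (2.5, 2)], 1)
print(boundaries)  # [0.5, 1.5, 2.5, 3.5]
print(heights)     # [1, 3, 2]
```

With unequal class widths the boundaries cannot be derived from one width, which is presumably why that case is passed as an explicit two-element nested vector.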
For a more interesting histogram, let's look at the demand for sliced bread:
ch.Set 'style' 'xyplot,lines,nomarkers'
Hist 1000 200 Freqtable demand[;1]
View PG←ch.Close
The histogram shows clearly that there is structure in the data. Of course, we already know that the data have a time-series structure, and that the small values come from the demand at the weekend.
Statisticians like to compare the observed histogram with a model histogram, often the Normal (Gaussian) distribution. Clearly it would not make sense to fit one bell-shaped curve to the bread data, so we will manufacture some coin-tossing data (yes, you knew it had to come sooner or later!).
First the data: the number of heads in 25 tosses of a fair coin, repeated 100 times.
ntoss←+/¯1+?100 25⍴2
ch.Set 'style' 'xyplot,lines,nomarkers'
Hist .5 1 Freqtable ntoss
View PG←ch.Close
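The expression that builds ntoss fills a 100-by-25 matrix with random integers drawn from 1 and 2 (assuming the usual index origin of 1), subtracts 1 to get 0 for tails and 1 for heads, and sums along each row. For readers who want the same experiment outside APL, here is a minimal Python sketch; the function name `heads_counts` and the `seed` parameter are illustrative assumptions.

```python
import random

def heads_counts(repeats=100, tosses=25, seed=None):
    """Number of heads in `tosses` fair coin tosses, repeated `repeats` times."""
    rng = random.Random(seed)
    # Each toss is 0 (tails) or 1 (heads); summing one row counts the heads.
    return [sum(rng.randint(0, 1) for _ in range(tosses)) for _ in range(repeats)]

ntoss = heads_counts(seed=1)
print(len(ntoss), min(ntoss), max(ntoss))
```

Fixing the seed makes the run repeatable; the counts will not match those from the APL session, which uses its own random number generator.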
Now to add the normal curve. The function Normden takes a right argument specifying the mean and standard deviation, whilst the optional left argument (Normden is ambivalent) specifies the required area under the normal curve. The theoretical mean and standard deviation are 12.5 and 2.5 respectively (np = 25 × 0.5 and √(npq) = √6.25); alternatively we can calculate the sample values from the data. The total area of the histogram is 100 (the total frequency of 100 times the class width of 1), so we can produce the overlay:
ch.Set 'style' 'xyplot,lines,nomarkers'
Hist .5 1 Freqtable ntoss
ch.Plot 100 Normden 12.5 2.5
View PG←ch.Close
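The numbers passed to Normden in the overlay above can be checked by hand. The Python sketch below derives the mean and standard deviation from the binomial parameters and builds a normal density scaled so that the area under it matches the histogram's area of 100. The function `normal_curve`, its `points` and `span` parameters, and the choice to plot over four standard deviations either side of the mean are assumptions; Normden itself may choose its plotting points differently.

```python
import math

def normal_curve(area, mean, sd, points=101, span=4.0):
    """(x, y) pairs for a normal density scaled to enclose roughly `area`."""
    xs = [mean + sd * span * (2 * i / (points - 1) - 1) for i in range(points)]
    scale = area / (sd * math.sqrt(2 * math.pi))
    return [(x, scale * math.exp(-0.5 * ((x - mean) / sd) ** 2)) for x in xs]

n, p = 25, 0.5
mean = n * p                     # 12.5
sd = math.sqrt(n * p * (1 - p))  # 2.5

curve = normal_curve(100, mean, sd)
peak = max(y for _, y in curve)

# Trapezoidal estimate of the area under the plotted curve.
approx_area = sum((x1 - x0) * (y0 + y1) / 2
                  for (x0, y0), (x1, y1) in zip(curve, curve[1:]))
print(round(peak, 2), round(approx_area, 1))
```

The peak height is 100 ÷ (2.5 × √(2π)), a little under 16, so the tallest bars of the histogram should sit close to that value if the coin-tossing data behave as expected.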
To complete the picture, add a title, axis captions and a key:
ch.Set 'style' 'xyplot,lines,nomarkers,boxed'
ch.Set 'head' 'Binomial Distribution with Normal Overlay'
ch.Set('xcap' 'Number of heads')('ycap' 'Frequency')
ch.Set 'style' 'xyplot,lines,nomarkers,boxed'
ch.Set 'key' 'Histogram,Normal Density'
Hist .5 1 Freqtable ntoss
ch.Plot 100 Normden 12.5 2.5
View PG←ch.Close
Summary
It often pays to add your own simple utilities to group a well-used set of RainPro calls, or to organise the data in a form suitable for the plotting calls.