The idea of the boxplot is due to Tukey. He wanted a simple pictorial representation of one or more data sets that highlights (and contrasts) location and spread. For each data set you draw a box in two halves: the top half represents the 25% of the data between the median and the upper quartile, and the bottom half the 25% between the lower quartile and the median. To each box you can add whiskers to indicate the range of the data, and, if you want to be really sophisticated, you can mark extreme values.
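The five values that a box and its whiskers encode can be computed directly from a data vector. The following is a minimal sketch, assuming Dyalog APL with index origin 1 and taking the quartiles as Tukey's hinges (the median of each half, with the middle value shared when the count is odd). The names `med` and `five` are illustrative and not part of Rain, and other quartile conventions give slightly different values.

```apl
⎕IO←1
⍝ Median: mean of the one or two middle values of the sorted data
med←{s←⍵[⍋⍵] ⋄ i←0.5×1+≢s ⋄ 0.5×s[⌈i]+s[⌊i]}
⍝ Five-number summary: min, lower hinge, median, upper hinge, max
five←{s←⍵[⍋⍵] ⋄ h←⌈0.5×≢s ⋄ (⌊/s),(med h↑s),(med s),(med(-h)↑s),⌈/s}
five 7 1 3 9 5          ⍝ 1 3 5 7 9
```

Each result is laid out in the order min, lq, med, uq, max, so one row per data set gives a table of the kind used in the worked example.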
Suppose that we have the following table:
  min   lq  med   uq  max
   20   35   40   60   80
    1   30   50   80   90
   10   30   35   40   95
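then we can produce the boxplot in four operations: draw the lower half of each box (lq to med), draw the upper half (med to uq), add the bottom whiskers (min to lq), and add the top whiskers (uq to max).
The first two operations use ch.Bar, having set the option for bars to be floating. The last two use ch.Vline, taking as its right argument a two-column matrix, each of whose rows gives the two y-values of one whisker.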
However, we need to remember two important points. First, when we use ch.Bar, the x-axis is sorted out for us automatically. Second, we cannot leave Rain to set the y-axis range: whichever graphics command we use first sets the range from its own data only, which will not include the values required by the later commands. So the first step is to set the y-range.
      ch.Set 'yrange' 0 100
      ch.Set 'style' 'fbar'
      ch.Bar 3 2⍴35 40 30 50 30 35
Add the companion bars, which form the upper half of each box (median to upper quartile):
      ch.Bar 3 2⍴40 60 50 80 35 40
Add the bottom whiskers:
      ch.Vline 3 2⍴20 35 1 30 10 30
Add the top whiskers:
      ch.Vline 3 2⍴60 80 80 90 40 95
Your boxplot is now almost complete, lacking only a heading and a label for each box. With these added, the full sequence is:
      ch.Set 'head' 'Trial Boxplot'
      ch.Set 'yrange' 0 100
      ch.Set 'style' 'fbar'
      ch.Set 'xlab' 'Box 1,Box 2,Box 3'
      ch.Bar 3 2⍴35 40 30 50 30 35
      ch.Bar 3 2⍴40 60 50 80 35 40
      ch.Vline 3 2⍴20 35 1 30 10 30
      ch.Vline 3 2⍴60 80 80 90 40 95
      View PG←ch.Close
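The four matrices typed in above are just pairs of adjacent columns of the summary table, so they can be sliced from it rather than entered by hand. The following is a minimal sketch, assuming index origin 1; `summary` and the other variable names are illustrative and not part of Rain.

```apl
⎕IO←1
⍝ One row per data set; columns are min lq med uq max
summary←3 5⍴20 35 40 60 80 1 30 50 80 90 10 30 35 40 95
lowbox←summary[;2 3]       ⍝ lq,med  : same as 3 2⍴35 40 30 50 30 35
highbox←summary[;3 4]      ⍝ med,uq  : same as 3 2⍴40 60 50 80 35 40
lowwhisk←summary[;1 2]     ⍝ min,lq  : same as 3 2⍴20 35 1 30 10 30
highwhisk←summary[;4 5]    ⍝ uq,max  : same as 3 2⍴60 80 80 90 40 95
⍝ Smallest and largest value anywhere in the table
range←(⌊/,summary),⌈/,summary   ⍝ 1 95
```

These variables could then stand in for the literal right arguments of ch.Bar and ch.Vline. The computed range is the tightest one that holds every value; the walkthrough rounds it out to 0 100.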
It is useful to be able to produce simple boxplots for a number of data sets or groups of data sets. The function ch.Boxplot does just this, and takes care of the key and the labelling for you. To demonstrate, here is a boxplot in which the groups correspond to the farms and the two boxes within each group correspond to the years. We are examining how the yields from the different varieties behave over the six farms and the two years.
      ch.Set'head' 'Boxplot of Barley Data'
      ch.Set('ggap' 3)
      ch.Set'lfont' 'ti,6,neutral'
      ch.Boxplot barley[;1 4 3]
      View PG←ch.Close
The second line has been inserted to widen the gaps between the farms to 3 times the box width. The next line reduces the font size of the tick mark labels so that the farm names can be accommodated (otherwise every other farm label is missed out).
This simple boxplot also reveals the mistake alluded to earlier!