
How to Construct a Boxplot

The boxplot is due to Tukey, who wanted a simple pictorial representation of one or more data sets that highlights (and contrasts) location and spread. For each data set you draw a box in two halves: the top half represents the 25% of the data between the median and the upper quartile, and the bottom half the 25% between the lower quartile and the median. To each box you can add whiskers to indicate the range of the data, and, if you want to be really sophisticated, you can mark extreme values.

Suppose that we have the following table:

min  lq  med  uq  max 
  20  35   40  60   80 
   1  30   50  80   90 
  10  30   35  40   95 

then we can produce the boxplot in four operations.

The first two operations use ch.Bar, having set the bar style to ‘floating’. The last two operations use ch.Vline, taking as the right argument a matrix each of whose rows gives the two y-values of one whisker.

However, we need to remember two important points. First, when we use ch.Bar the x-axis is sorted out for us automatically. Second, we cannot let RAIN choose the y-axis range: whichever graphics command we issue first sets the range from its own data only, so the range will not include the values needed by the later commands. So the first step is to set the y-range.

      ch.Set 'yrange' 0 100
      ch.Set 'style' 'fbar'
      ch.Bar 3 2⍴35 40 30 50 30 35

Add the companion bars for the top halves of the boxes:

      ch.Bar 3 2⍴40 60 50 80 35 40

Add the bottom whiskers:

      ch.Vline 3 2⍴20 35 1 30 10 30

Add the top whiskers:

      ch.Vline 3 2⍴60 80 80 90 40 95

and your boxplot is almost complete, save for a heading and a label for each box. The complete sequence is:

      ch.Set 'head' 'Trial Boxplot'
      ch.Set 'yrange' 0 100
      ch.Set 'style' 'fbar'
      ch.Set 'xlab' 'Box 1,Box 2,Box 3'
      ch.Bar 3 2⍴35 40 30 50 30 35
      ch.Bar 3 2⍴40 60 50 80 35 40
      ch.Vline 3 2⍴20 35 1 30 10 30
      ch.Vline 3 2⍴60 80 80 90 40 95
      View PG←ch.Close
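
The four matrices typed in the listing above are simply pairs of adjacent columns of the table, so they need not be entered by hand. The following is a minimal sketch, assuming the table is held in a variable called tab (a name invented here for illustration) and using only the ch calls that already appear above. Rounding the y-range outwards to multiples of 10 is also an assumption of this sketch; for this table it happens to reproduce the 0 100 used earlier.

```apl
      ⍝ tab is a hypothetical name for the table: columns are min lq med uq max
      tab←3 5⍴20 35 40 60 80 1 30 50 80 90 10 30 35 40 95
      ⍝ smallest and largest values, rounded outwards to multiples of 10
      yr←10×(⌊(⌊/,tab)÷10),⌈(⌈/,tab)÷10      ⍝ 0 100 for this table
      ch.Set (⊂'yrange'),yr                   ⍝ same argument as 'yrange' 0 100
      ch.Set 'style' 'fbar'
      ch.Bar tab[;2 3]                        ⍝ bottom halves: lq to median
      ch.Bar tab[;3 4]                        ⍝ top halves: median to uq
      ch.Vline tab[;1 2]                      ⍝ bottom whiskers: min to lq
      ch.Vline tab[;4 5]                      ⍝ top whiskers: uq to max
```

Each indexed expression yields the same 3-row, 2-column matrix as its hand-typed counterpart; for example, tab[;2 3] matches 3 2⍴35 40 30 50 30 35. The heading, the labels and the final View line are unchanged from the listing above.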

It is useful to be able to produce simple boxplots for a number of data sets or groups of data sets. The function ch.Boxplot does just this, and takes care of the key and the labelling. To demonstrate, here is a boxplot in which the groups correspond to the farms and the two boxes within each group correspond to the years; we are examining how the yields of the different varieties behave over the six farms and the two years.

      ch.Set'head' 'Boxplot of Barley Data'
      ch.Set('ggap' 3)                     
      ch.Set'lfont' 'ti,6,neutral'         
      ch.Boxplot barley[;1 4 3]            
      View PG←ch.Close

The second line has been inserted to widen the gaps between the farms to 3 times the box width. The next line reduces the font size of the tick mark labels so that the farm names can be accommodated (otherwise every other farm label is missed out).


This simple boxplot also reveals the mistake alluded to earlier!



© Copyright Alan Sykes and Adrian Smith 1999