The idea of a boxplot is due to Tukey. He wanted a simple pictorial representation of one or more data sets, highlighting (and contrasting) location and spread. The basic idea is that for each data set you draw a box in two halves: the top half represents the 25% of the data between the median and the upper quartile, and the bottom half represents the 25% between the lower quartile and the median. From each box you can extend whiskers to indicate the range of the data. And if you want to be really sophisticated, you can mark extreme values individually.
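The five values each box needs (minimum, lower quartile, median, upper quartile, maximum) can be computed from raw data in a few lines. The following is only a minimal sketch: Summary is a hypothetical helper, not part of Rain, and it assumes index origin 1 and linear interpolation between order statistics, which is one of several quartile conventions and not necessarily the one chBoxplot uses.

```apl
⍝ Summary is a hypothetical helper (not supplied with Rain); assumes ⎕IO←1
Summary←{
    s←⍵[⍋⍵]                          ⍝ sort the data ascending
    p←1+(¯1+≢s)×0 0.25 0.5 0.75 1    ⍝ fractional positions of min lq med uq max
    lo←⌊p ⋄ hi←⌈p                    ⍝ neighbouring whole positions
    s[lo]+(p-lo)×s[hi]-s[lo]         ⍝ interpolate linearly between neighbours
}

Summary 10 20 30 40                  ⍝ gives 10 17.5 25 32.5 40
```

Applying such a function to each data set gives one row of a summary table of the kind used in the rest of this section.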
Suppose that we have the following table:
min  lq  med  uq  max
 20  35   40  60   80
  1  30   50  80   90
 10  30   35  40   95
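then we can produce the boxplot in four operations: draw the lower half of each box (lower quartile to median), draw the upper half (median to upper quartile), add the bottom whiskers (minimum to lower quartile) and add the top whiskers (upper quartile to maximum).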
The first two operations use chBar, having set the option for bars to be floating. The last two use chVline, taking as its right argument a two-column matrix in which each row gives the two y-values for one whisker.
However, we need to remember two important points. First, when we use chBar, the x-axis is sorted out for us automatically. Second, we cannot let Rain choose the y-axis range: whichever graphics command we use first, it would set the range to fit only that command's values and not the others still to come. So the first step is to set the y-range.
chSet 'yrange' 0 100
chSet 'style' 'fbar'
chBar 3 2⍴35 40 30 50 30 35
Add the upper half of each box:
chBar 3 2⍴40 60 50 80 35 40
Add the bottom whiskers:
chVline 3 2⍴20 35 1 30 10 30
Add the top whiskers:
chVline 3 2⍴60 80 80 90 40 95
and your boxplot is almost complete, save for a heading and a label for each box. The complete sequence is:
chSet 'head' 'Trial Boxplot'
chSet 'yrange' 0 100
chSet 'style' 'fbar'
chSet 'xlab' 'Box 1,Box 2,Box 3'
chBar 3 2⍴35 40 30 50 30 35
chBar 3 2⍴40 60 50 80 35 40
chVline 3 2⍴20 35 1 30 10 30
chVline 3 2⍴60 80 80 90 40 95
View PG←chClose
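The four matrices passed to chBar and chVline in the listing above are simply adjacent column pairs of the summary table. As a minimal sketch, assuming the table is held in a variable (tbl is a hypothetical name, with one row per box and columns min lq med uq max) and index origin 1, the four drawing calls could be written against the table rather than as literal numbers:

```apl
⍝ tbl is a hypothetical name; rows are boxes, columns are min lq med uq max
⍝ assumes ⎕IO←1
tbl←3 5⍴20 35 40 60 80 1 30 50 80 90 10 30 35 40 95

chBar tbl[;2 3]      ⍝ lower half of each box: lq to med
chBar tbl[;3 4]      ⍝ upper half of each box: med to uq
chVline tbl[;1 2]    ⍝ bottom whiskers: min to lq
chVline tbl[;4 5]    ⍝ top whiskers: uq to max
```

Each indexed expression yields the same 3-by-2 matrix as the corresponding literal, for example tbl[;2 3] matches 3 2⍴35 40 30 50 30 35, so the chSet calls and the closing line stay exactly as before.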
This technique gives you total flexibility, and you can easily adapt it to show outliers, overlay extra information and so on. However, if you are happy to let Rain do more of the work, you can have the quartiles computed for you by the supplied utility chBoxplot, as follows:
chSet'head' 'Boxplot of Barley Data - Varieties'
chSet 'xs' 'angled'
chBoxplot barley[;1 2]
View PG←chClose
This takes one of three possible slices through the barley data; try it with columns 1,3 and 1,4 also! Notice the use of angled X-labels to allow space for the rather long variety names.
Data with multiple category axes
It is useful to be able to produce simple boxplots for a number of data sets or groups of data sets. The function chBoxplot does just this for you and takes care of the key and the labelling. To demonstrate this, here is a boxplot in which the groups correspond to the farms and the two boxes within each group correspond to the years. We are examining how the yields from the different varieties behave over the six farms and the two years.
chSet'head' 'Boxplot of Barley Data by Farm & Year'
chSet('ggap' 3)('xs' 'between,labmid,grid')
chSet'lfont' 'ti,6,neutral'
chBoxplot barley[;1 4 3]
View PG←chClose
The second line has been inserted to widen the gaps between the farms to 3 times the box width, move the ticks between the groups and add vertical gridlines for clarity. The next line reduces the font size of the tick mark labels so that the farm names can be accommodated horizontally (otherwise every other farm label is missed out).
This simple boxplot also reveals the mistake alluded to in the Trellis example!