Visualising public health data

Data on population health is often presented as bar charts (a single measure across many geographical units) or as line charts (a single measure over time, across one or more areal units). These charts are usually produced in Microsoft Excel. That may serve the purpose, but Excel is designed more for business use, and there are better ways of making the data come alive. The free statistical software package R, with its many thousands of add-on ‘packages’, is relatively little known in public health circles. Its graphics package ggplot2 in particular, though not as straightforward to learn as Excel, opens the door to some amazing ways to visualise data.

Let’s take a simple example to contrast the two approaches. The Office for National Statistics publishes data on excess winter deaths for each year from 1991, for England as a whole and for each local authority area within it. The data looks like this:

[Image: snapshot of the ONS excess winter deaths data]

The aim is to show a time-series chart with the year along the x-axis and a) the excess winter deaths index (EWDI) for Walsall plus its 95% confidence intervals, and b) the same data for England together with its confidence intervals.

In Excel the only option is to add each of these as a separate data series and colour them to distinguish one from another. I may not be an expert in Excel charting, but the best I could come up with was this:

[Image: Excel chart of the excess winter deaths index for Walsall and England]

I find it altogether rather muddled and confusing. In any case, I chose not to show the confidence interval for England: it is very narrow, so the EWDI value on its own is a good enough guide. The CI for Walsall is wide and is worth showing.

A better alternative is to use R and the ggplot2 package, which gives this rather more elegant graphic, with the confidence intervals shown as shaded ribbons. I wondered whether to show the actual EWDI values for Walsall as a line running through the middle of the confidence band; after trying both options, the chart with the line left in looked rather prettier.

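[Image: ggplot2 chart of the excess winter deaths index for Walsall and England, with confidence intervals shown as ribbons]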

I think this chart looks much more attractive, is clearer and instantly brings out the key message that Walsall’s excess winter deaths index generally tracks the England figure and is not systematically higher.

The code for the above chart is as follows:

library(ggplot2)
ewd <- read.csv("ewd.csv")
# columns used: Year, England_LCL, England_UCL, Walsall_Index, Walsall_LCL, Walsall_UCL
p <- ggplot(data = ewd, aes(x = Year))
# title of the y axis; the limits start the graph from zero
p <- p + scale_y_continuous(name = "Excess Winter Deaths Index",
                            limits = c(0, 45))
# the default x-axis ticks don't look good, so set breaks and labels by hand
p <- p + scale_x_continuous(breaks = c(1991, 1994, 1997, 2000, 2003, 2006, 2009, 2012),
                            labels = c("91", "94", "97", "2000", "03", "06", "09", "12"))
# draw the ribbons first so that the Walsall line sits on top of them
p <- p + geom_ribbon(aes(ymin = England_LCL, ymax = England_UCL,
                         fill = "England CI"), alpha = 0.5)
p <- p + geom_ribbon(aes(ymin = Walsall_LCL, ymax = Walsall_UCL,
                         fill = "Walsall CI"), alpha = 0.2)
p <- p + geom_line(aes(y = Walsall_Index, colour = "Walsall EWDI"))
# this and the next line are needed to generate the legend
p <- p + scale_colour_manual(name = "", values = c("Walsall EWDI" = "darkblue"))
p <- p + scale_fill_manual(name = "", values = c("England CI" = "darkgreen",
                                                 "Walsall CI" = "blue"))
p <- p + ggtitle("Excess Winter Deaths Index, Walsall and England, 1991 to 2013")
# place the legend inside the plot area; ggplot2 3.5.0 and later prefer
# legend.position = "inside" together with legend.position.inside = c(0.7, 0.8)
p <- p + theme(legend.position = c(0.7, 0.8))
p
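If you would rather let ggplot2 build the legend straight from the data instead of naming each series by hand, one option (not the approach used above) is to reshape the table into "long" format first, with one row per year and area. The sketch below assumes ewd.csv has the same column names as in the code above (Year, England_Index, England_LCL, England_UCL and the Walsall equivalents); the helper functions ewd_to_long and plot_ewd_long are hypothetical names chosen for illustration. Unlike the chart above, it also draws the England index as a line.

```r
library(ggplot2)

# Hypothetical helper: reshape the wide table (one row per year, separate
# Index/LCL/UCL columns for each area) into a long table with one row per
# year and area. Assumes columns named <Area>_Index, <Area>_LCL, <Area>_UCL.
ewd_to_long <- function(ewd, areas = c("England", "Walsall")) {
  pieces <- lapply(areas, function(area) {
    data.frame(
      Year  = ewd$Year,
      Area  = area,
      Index = ewd[[paste0(area, "_Index")]],
      LCL   = ewd[[paste0(area, "_LCL")]],
      UCL   = ewd[[paste0(area, "_UCL")]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Hypothetical helper: one ribbon layer and one line layer. Mapping Area to
# both fill and colour means the legend entries come from the data itself.
plot_ewd_long <- function(ewd_long) {
  ggplot(ewd_long, aes(x = Year, group = Area)) +
    geom_ribbon(aes(ymin = LCL, ymax = UCL, fill = Area), alpha = 0.3) +
    geom_line(aes(y = Index, colour = Area)) +
    scale_fill_manual(values = c(England = "darkgreen", Walsall = "blue")) +
    scale_colour_manual(values = c(England = "darkgreen", Walsall = "darkblue")) +
    scale_y_continuous(name = "Excess Winter Deaths Index", limits = c(0, 45)) +
    ggtitle("Excess Winter Deaths Index, Walsall and England, 1991 to 2013")
}

# Usage, assuming ewd.csv is laid out as described above:
# ewd <- read.csv("ewd.csv")
# print(plot_ewd_long(ewd_to_long(ewd)))
```

Because the fill and colour scales share the same title ("Area") and the same labels, ggplot2 will usually merge them into a single legend, which avoids the manual legend workaround in the original code. Adding a third area is then just a matter of extending the `areas` argument and the two `values` vectors.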

 

 

 

 
