NHS Trusts – learning league table

Data visualisation using R

An example using NHS data

Most data charts in the NHS use Excel. This may be adequate for most purposes. But there are times when the richness of the data may not be fully exploited by the standard data types available in Excel.

Other statistical and data science tools such as Python, and R provide a richer set of chart types and techniques to turn data into exciting images.

Let me illustrate this with an example.

Ive taken a data set published on 9 March 2016 by Monitor and NHS Trust Development Authority. (Note: These two bodies came together as NHS Improvement effective 1st April 2016.

Download the league table of all 230 Trusts as a pdf here and the same data as an excel file here

Understanding the data

Each Trust is placed in a category according to an assessment of its systems and capability for learning from mistakes. The 4 categories are:

  • outstanding
  • Good
  • Needs improvement, and
  • Poor

The excel data file also provides a great deal of detail on the 2015 staff survey. I have confined myself to the composite measure of the Staff survey. This varies from 3.29 to 4.0 with a median of 3.64 and follows a roughly normal distribtion with rather flat tails at both ends.

The question

  • Do Foundation Trusts (FTs) differ from Non-FTs?
  • Does the staff survey composite score (SSCS) vary according to the learning from mistakes category (LFMC)

A closer look at the data

This is what the data looks like.. Ive shown a few rows from the top of the league table and a couple from the bottom.

No. Code Trust Category SSCS Rank
1 RTF Northumbria Healthcare NHS Foundation Trust Good 3.93 1
2 RPG Oxleas NHS Foundation Trust Good 3.90 2
3 RPY The Royal Marsden NHS Foundation Trust Good 3.90 3
….
….
225 RBS Alder Hey Children’s NHS Foundation Trust Significant Concerns 3.38 225
230 RXC East Sussex Healthcare NHS Trust Significant Concerns 3.29 230

The code in R

d <- read.csv("NHS_Trusts_Learning_league.csv") # reads the data into an object d

levels(d$Category) <- c("Outstanding","Good","Significant Concerns","Poor")
        # this resets the levels of the category variable in the explicit order.
        # R's default woiuld have been to assign levels alphabetically
## note that the original data set does not label each Trust as an FT or a non FT
## the following code does this job

d$FT.Status <- "NHS Trust" # creates a new column and sets all the rows 
                           # to 'NHS Trust'
Is.Foundation <- grepl("Foundation", d$Trust) # grepl is a text pattern matching 
                                              # function that creates a logical 
                                              # vector of length = no of rows 
                                              # in the data table
d$FT.Status <- ifelse(Is.Foundation, "Foundation Trust", "NHS Trust") 
            # this line of code assigns "Foundation Trust" to rows for which 
            # the logical vector has a 'TRUE'

We are now ready to create the plot with the following code. I assume the R package ‘ggplot2’ is installed and loaded for the session

p <- ggplot(data=d, aes(x=Category, y=Staff.survey.measure))
p + geom_jitter(aes(colour=FT.Status), width=0.4) + 
    scale_colour_manual(values = c("red","blue"), guide_legend(title = "")) +
    ggtitle("Staff survey results, FT status correlate poorly with \n
             levels of openness and transparency") +
    xlab("Openness and transparency category \n 
          https://www.gov.uk/government/publications/
                  learning-from-mistakes-league") +
    ylab(" Staff Survey measure") 

The chart

Trust_Openness_3
Staff survey summary score by level of openness to learning from mistakes; FTs and non FTs

That is a pretty impressive looking chart!!

Conclusions

This is not based on statistical tests but at  first glance it would appear that it would be fraught to predict the level of openness to learning from the staff survey summary score. There is also little or no separation between FTs and non-FTs. Surprisingly, the Trusts with a ‘Good’ rating on learning from mistakes have a higher staff score than ‘Outstanding’ Trusts. Likewise, Trusts with a ‘Poor’ score have a higher staff survey score than those rated ‘Significant concerns’.

 

 

Visualising public health data

Data on population health is often presented as bar charts (single measure across many geographical units) or as line charts ( a single measure over time and across 1 or more areal units) .  These charts are usually produced in Microsoft Excel. This might serve the purpose but Excel is designed more for business use and there are better ways of making the data come alive. The free statistical software package R and its many thousands of add-on ‘packages’  is relatively unknown in public health circles. The graphical packlage GGplot2 is particular, though not as straightforward to learn as Excel, opens the door to some amazing ways to visualise date.

Let’s take a simple example to contrast the two approaches. Data on excess winter deaths is available from the Office for National Statistics for each year from 1991 and for England as a whole and each local authority area within it. The data looks like this:

ewddatasnap

The aim is to show a time-series style chart with the year along the X-axis and a) the excess winter deaths index for Walsall plus its associated 95% confidence intervals, and b) the same data fro England together with its confidence intervals.

In Excel  the only option is to add each of these as a data series and colour them suitably to distinguish one from the other. I may not be an expert in Excel charting buit the best I could come up with was this:

ewd I find it altogether rather muddled and confusing. In any case I elected not to show the confidence interval for England, since it is very narrow and therefore the actual EWDI value is good enough. The CI for Walsall is wide and is worth showing .

A better alternative is to use R and the GGplot2 package, which gives me this rather more elegant graphic where the confidence intervals are shown as ribbons. I wondered whether to show the actual EWDI values for Walsall as a line that sits in the middle of the confidence band and after trying both options the chart with it left in looked rather prettier.

ewd_ggplot

I think this chart looks much more attractive, is clearer and instantly brings out the key message that Walsall’s excess winter deaths index generally tracks the England figure and is not systematically higher.

The code for the above chart is as follows:

ewd <- read.csv("ewd.csv")
library(ggplot2)
p <- ggplot(data=ewd, aes(x=Year, y=England_Index))
p <- p+scale_y_continuous(name="Excess Winter Deaths Index",
 limits = c(0,45)) # title of y axis and strats the graph from zero
p <- p + scale_x_continuous(breaks = c(1991, 1994,1997, 2000, 2003, 2006, 2009, 2012),
 labels = c("91", "94", "97", "2000", "03", "06", "09", "12"))
 # the default x-axis ticks dont look good 
p <- p + geom_line(aes(y=Walsall_Index, colour="Walsall EWDI"))
p <- p + geom_ribbon(aes(x=Year, 
 ymin=England_LCL, 
 ymax=England_UCL, fill="England CI"), alpha=0.5)
p <- p + geom_ribbon(aes(x=Year, 
 ymin=Walsall_LCL, 
 ymax=Walsall_UCL, fill="Walsall CI"), alpha=0.2)
p <- p +scale_color_manual(name="", 
 values = c("Walsall EWDI"="dark Blue",
 "England CI"="dark green",
 "Walsall CI" = "blue") )
 # this and the next lines are neded to generate the legend
p <- p + scale_fill_manual(name="", values = c("England CI"="dark green",
 "Walsall CI" = "blue" ))
p <- p + ggtitle("Excess Winter deaths Index, Walsall and England, 1991 to 2013")
p <- p + theme(legend.position=c(0.7,0.8))
p

 

 

 

 

INR/CNY currency movements – The long view

19 Oct 2015

Introduction

There’s always been interest in how the world’s 2 largest emerging economies – India and China – are faring, both in absolute terms and relative to each other. One simple measure is exhange rate movements.

Detail

The Bank of England publishes daily spot prices of all the world’s currencies against each of the 3 major reserve currencies, the US Dollar, Sterling and the Euro.

R provides the means to download these data directly from the Bank’s website and process them further. This report is produced in RMarkdown with R code using the packages ‘GGplot2’ and ‘gridExtra’

It extracts daily spot prices of the Indian Rupee and the Chinese RMB against Sterling and the US Dollar from 1st Jan 2006 to the latest date and presents the data as 4 time series plots.

Summary

On 16 Oct 2015 the Indian Rupee traded at 64.7 to the US dollar, and 99.93 to the GB Pound

On 16 Oct 2015 the Chinese Yuan traded at 6.35 to the US dollar, and 9.81 to the GB Pound.

The chart

currency

Note: this note and the accompanying chart are an example of how RStudio’s knitr package is used to produce ready to publish reports straight from R code without the need for cut and paste.

NHS Emergency department activity by hour and day of week

Introduction

This is a report on NHS Emergency Department attendances.
The question is this: How does the hourly attendance figures vary by hour and day of week?

Data sources

The health and Social care Information centre publishes lots of data in various formats. Data on ED attendance by hour and day of week (aggregate for all hospitals with an A&E Unit in England) for the years 2011-12 and 2012-13 were downloaded from the HSCIC website.

The weekly pattern

Using R’s base plotting 2011-12 data

ae1

Using GGplot2 with faceting

ae2

Conclusion

There is a clear ebb and flow pattern in the hourly number of cases that are seen in Accident and Emergency Departments in England. regardless of week day or weekend the peak hourly attendance occurs at around 10 or 11 am, with the trough in the early hours of the morning between 4 and 6 am.

Whether or not this fluctuating pattern is justified by clinical need is another question.