Data visualisation using R
An example using NHS data
Most data charts in the NHS are produced in Excel. This is adequate for most purposes, but there are times when the richness of the data cannot be fully exploited by the standard chart types available in Excel.
Other statistical and data science tools, such as Python and R, provide a richer set of chart types and techniques to turn data into exciting images.
Let me illustrate this with an example.
I've taken a data set published on 9 March 2016 by Monitor and the NHS Trust Development Authority. (Note: these two bodies came together as NHS Improvement with effect from 1 April 2016.)
Understanding the data
Each Trust is placed in a category according to an assessment of its systems and capability for learning from mistakes. The 4 categories are:
- Needs improvement, and
The Excel data file also provides a great deal of detail on the 2015 staff survey. I have confined myself to the composite measure of the staff survey. This varies from 3.29 to 4.0 with a median of 3.64 and follows a roughly normal distribution with rather flat tails at both ends.
- Do Foundation Trusts (FTs) differ from Non-FTs?
- Does the staff survey composite score (SSCS) vary according to the learning from mistakes category (LFMC)?
A closer look at the data
This is what the data looks like. I've shown a few rows from the top of the league table and a couple from the bottom.
| 1 | RTF | Northumbria Healthcare NHS Foundation Trust | Good | 3.93 | 1 |
| 2 | RPG | Oxleas NHS Foundation Trust | Good | 3.90 | 2 |
| 3 | RPY | The Royal Marsden NHS Foundation Trust | Good | 3.90 | 3 |
| 225 | RBS | Alder Hey Children’s NHS Foundation Trust | Significant Concerns | 3.38 | 225 |
| 230 | RXC | East Sussex Healthcare NHS Trust | Significant Concerns | 3.29 | 230 |
The code in R
    # Read the data into a data frame d
    d <- read.csv("NHS_Trusts_Learning_league.csv")

    # Put the category levels in an explicit order; R's default would have been
    # alphabetical. factor(levels = ...) reorders the levels without renaming
    # them, whereas assigning to levels() would relabel the existing
    # (alphabetical) levels and so mislabel the rows.
    # This assumes the Category column contains exactly these four labels.
    d$Category <- factor(d$Category,
                         levels = c("Outstanding", "Good",
                                    "Significant Concerns", "Poor"))

    # Note that the original data set does not label each Trust as an FT or a
    # non-FT; the following code does this job.
    # grepl is a text pattern matching function that returns a logical vector
    # with one element per row of the data table.
    Is.Foundation <- grepl("Foundation", d$Trust)

    # Assign "Foundation Trust" to rows for which the logical vector is TRUE,
    # and "NHS Trust" to all other rows.
    d$FT.Status <- ifelse(Is.Foundation, "Foundation Trust", "NHS Trust")
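The two data preparation steps can be tried out without the CSV file. The following is a minimal sketch that uses only the five rows shown in the table above; the data frame name `toy` and the vectors `wanted`, `ordered_cat` and `swapped` are made up for illustration. It also shows why the category order is best set with `factor(levels = ...)` rather than by assigning to `levels()`.

```r
# Toy stand-in for the league table: five rows copied from the table above.
# 'toy' is an illustrative name, not part of the original analysis.
toy <- data.frame(
  Trust = c("Northumbria Healthcare NHS Foundation Trust",
            "Oxleas NHS Foundation Trust",
            "The Royal Marsden NHS Foundation Trust",
            "Alder Hey Children's NHS Foundation Trust",
            "East Sussex Healthcare NHS Trust"),
  Category = c("Good", "Good", "Good",
               "Significant Concerns", "Significant Concerns"),
  stringsAsFactors = FALSE
)

# Reorder the levels explicitly; the values in each row are left unchanged.
wanted <- c("Outstanding", "Good", "Significant Concerns", "Poor")
ordered_cat <- factor(toy$Category, levels = wanted)
levels(ordered_cat)

# Pitfall: assigning to levels() renames the existing alphabetical levels,
# so the row that said "Good" now says "Outstanding" and vice versa.
swapped <- factor(c("Good", "Outstanding"))   # levels: "Good", "Outstanding"
levels(swapped) <- c("Outstanding", "Good")
as.character(swapped)                         # "Outstanding" "Good"

# Flag Foundation Trusts from the Trust name and count each group.
toy$FT.Status <- ifelse(grepl("Foundation", toy$Trust),
                        "Foundation Trust", "NHS Trust")
table(toy$FT.Status)
```

Running `table(d$Category, useNA = "ifany")` on the full data after the conversion is a quick way to confirm that no label failed to match and turned into `NA`.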
We are now ready to create the plot with the following code. I assume the R package ‘ggplot2’ is installed and loaded for the session.
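    library(ggplot2)  # harmless if the package is already loaded

    # Base plot: category on the x axis, staff survey composite score on the y axis
    p <- ggplot(data = d, aes(x = Category, y = Staff.survey.measure))

    # Jittered points coloured by FT status, with an untitled legend
    p + geom_jitter(aes(colour = FT.Status), width = 0.4) +
      scale_colour_manual(values = c("red", "blue"),
                          guide = guide_legend(title = "")) +
      ggtitle("Staff survey results, FT status correlate poorly with \n levels of openness and transparency") +
      xlab("Openness and transparency category \n https://www.gov.uk/government/publications/learning-from-mistakes-league") +
      ylab("Staff Survey measure")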
That is a pretty impressive-looking chart!
This is not based on statistical tests, but at first glance it would be risky to predict a Trust's level of openness to learning from its staff survey summary score. There is also little or no separation between FTs and non-FTs. Surprisingly, the Trusts with a ‘Good’ rating on learning from mistakes appear to have a higher staff survey score than ‘Outstanding’ Trusts. Likewise, Trusts with a ‘Poor’ rating appear to have a higher staff survey score than those rated ‘Significant Concerns’.
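The visual impression could be checked numerically. The following is a rough sketch, not part of the original analysis: it computes the median score per category and runs a Kruskal-Wallis rank test, which is one reasonable choice for comparing a score across several groups. It uses only the five rows shown in the table earlier (the name `toy` is illustrative), so the output means nothing in itself; on the full data set one would substitute `d` for `toy`.

```r
# Toy stand-in: the five rows shown in the table, not the full data set.
toy <- data.frame(
  Trust = c("Northumbria Healthcare NHS Foundation Trust",
            "Oxleas NHS Foundation Trust",
            "The Royal Marsden NHS Foundation Trust",
            "Alder Hey Children's NHS Foundation Trust",
            "East Sussex Healthcare NHS Trust"),
  Category = factor(c("Good", "Good", "Good",
                      "Significant Concerns", "Significant Concerns")),
  Staff.survey.measure = c(3.93, 3.90, 3.90, 3.38, 3.29)
)

# Median staff survey score within each category
medians <- aggregate(Staff.survey.measure ~ Category, data = toy, FUN = median)
print(medians)

# Kruskal-Wallis test: do the score distributions differ between categories?
kw <- kruskal.test(Staff.survey.measure ~ Category, data = toy)
print(kw)
```

The same `aggregate()` call with `FT.Status` in place of `Category` would give the corresponding comparison between FTs and non-FTs.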