The Times Higher Education recently published the results of its annual Student Experience Survey. More than 13,000 undergraduates rated their universities under 21 headings on a seven-point scale. The points were given weightings, tallied up and then averaged out so that a league table of universities could be produced. Top of the poll was Loughborough University, for the fifth year in a row.
As a statistician, my interest is in whether league tables such as these have any real meaning. I have a number of concerns.
Who is doing the rating? A sample of 13,000 seems like a lot, but it is spread among 113 universities. That's around 115 undergraduates per university, though the actual numbers varied from 30 to 257. A typical university has around 10,000 to 15,000 undergraduates, so each university's score is based on a tiny sample. Opinion polls, by contrast, are typically based on about 1,000 people. In those circumstances, and assuming that the people were selected at random from the entire population and asked a yes/no question, there is a margin of error of about 3%. These samples are nowhere near that large, so the margin of error will be very much greater. Worse, the sample in this case is not random (it comprises voluntary responses from those invited to participate), and the questions were not yes/no but 21 ratings on a subjective seven-point scale. And what of those voluntary responses? What kind of person doesn't respond to such a survey? What kind of person does? Some universities strongly encourage students to respond to the survey: why would they do that?
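That 3% figure comes from the standard margin-of-error formula for a proportion. As a rough sketch (assuming simple random sampling, a yes/no question and a worst-case 50/50 split, none of which applies to this survey), the margin shrinks only with the square root of the sample size:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a yes/no proportion
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 gives roughly the familiar 3%...
print(round(margin_of_error(1000) * 100, 1))  # 3.1
# ...a per-university sample of 115 is about three times worse...
print(round(margin_of_error(115) * 100, 1))   # 9.1
# ...and the smallest sample of 30 worse still.
print(round(margin_of_error(30) * 100, 1))    # 17.9
```

Even under these generous assumptions, the per-university samples here carry uncertainties of roughly nine to eighteen percentage points.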
How do you rate something on a seven-point scale? What is your frame of reference? How can you ensure that you are consistent in your ratings across the different headings? How can you be sure that your understanding of the points on the scale is consistent with that of other people? When I taught in the US, I had to grade every student on a 12-point scale. One year a student told me that if I graded him A– in my class he would get admission to his favoured university. I had provisionally graded him B+. But I had no confidence that I had any understanding of the difference between those two grades. I gave him the A–.
Physical scientists use scales that are very carefully defined and calibrated: there is very precise agreement on what one metre is, for example. And length is a ratio scale: double the value and you have double the length. But social scientists don't use scales like this. On a ten-point scale, I'd say my happiness today is a five. But what does that mean? If tomorrow my happiness is ten, does that mean I'm twice as happy? If you are feeling five-happy today are you just as happy as I am? If I'm feeling five-happy this time next week will I be just as happy as I am today?
(Happiness may not be easy to measure but beauty is. Helen of Troy was said to have beauty sufficient to launch a thousand ships. If I flatter myself, my own beauty might be sufficient to launch a single, albeit very small, ship, so my beauty is one millihelen.)
How do you combine 21 scores? If I think my university library is worth six out of seven and the university canteen is worth five, does that mean that the two together are worth 11? Scores such as these are called ordinal, which essentially means that you can put them in an order. If I think my library is worth six points but your library is worth five, we are entitled to conclude that I think my library is better than yours. (Though what basis I have for that judgement, and whether anyone would agree, is entirely moot.) The problem with ordinal data is that you cannot meaningfully add them up. If you booked three nights at a four-star hotel but arrived to find they'd overbooked, would you be happy to be given six nights in a two-star hotel? After all, 3 × 4 = 6 × 2. Obviously this is nonsense, yet the Student Experience Survey adds up the scores under the various headings. In fact it weights the scores, doubling those for headings that are considered more important. Such judgements are, however, necessarily arbitrary.
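To see how much the choice of weights matters, here is a toy sketch (the universities, headings and scores are all invented for illustration): doubling one heading rather than another reverses the ranking outright.

```python
# Invented scores for two hypothetical universities on two headings (out of 7).
scores = {
    "University A": {"library": 6, "canteen": 3},
    "University B": {"library": 4, "canteen": 5},
}

def total(uni, weights):
    """Weighted sum of a university's heading scores."""
    return sum(weights[h] * s for h, s in scores[uni].items())

# Double the library, as the survey doubles headings deemed important...
w_library = {"library": 2, "canteen": 1}
# ...or double the canteen instead: an equally arbitrary choice.
w_canteen = {"library": 1, "canteen": 2}

for w in (w_library, w_canteen):
    ranking = sorted(scores, key=lambda u: total(u, w), reverse=True)
    print(ranking[0], "comes top under weights", w)
```

Under the first weighting University A comes top (15 points to 13); under the second, University B does (14 to 12). Nothing about the universities changed, only the arbitrary weights.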
How do you compare totals? The final results of the Student Experience Survey are quoted to two decimal places, presumably so that universities can be separated and ranked. If you look at the data you'll see that there would be a lot of ties if the scores were quoted to only one decimal place. Yet this is an extraordinary level of precision for data that is so subjective. If I can't be sure that even one of my 21 scores has the same meaning as one of yours, how can we combine those scores and report a result to two decimal places? In fact, there was remarkably little difference in the final scores. The top 50 universities were separated by less than 10 points: a one-point improvement would see the University of Plymouth jump 10 places up the table to replace the University of Edinburgh at 29th.
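Two decimal places suggest a precision that samples of this size cannot support. A quick simulation (with invented numbers: seven-point ratings drawn around the same underlying mean, 115 respondents per university, results rescaled to a 100-point style) shows universities of identical underlying quality routinely ending up a point or more apart:

```python
import random

random.seed(1)

def sampled_score(true_mean=5.4, sd=1.2, n=115):
    """Mean of n seven-point ratings drawn around the same true mean,
    clipped to the 1-7 scale and rescaled to 0-100 with two decimals."""
    ratings = [min(7.0, max(1.0, random.gauss(true_mean, sd))) for _ in range(n)]
    return round(sum(ratings) / n / 7 * 100, 2)

# Five pairs of universities with identical underlying quality:
for _ in range(5):
    a, b = sampled_score(), sampled_score()
    print(a, b, "gap:", round(abs(a - b), 2))
```

With a standard deviation of 1.2 rating points, the standard error of each mean is about 0.11 points, roughly 1.6 points on the 100-point scale. That sampling noise dwarfs the two-decimal precision, and is large compared with the gaps that separate dozens of table places.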
So does the survey tell us anything? The Times Higher Education defends the survey in a number of ways, not least by pointing to the relative consistency of table positions from year to year. It must mean something that Loughborough has come out top five years in a row. But what can we conclude from the fact that the University of York came 32nd with a score of 77.63 whereas York St John University came 54th with a score of 75.12? Or that the former jumped 15 places up the table this year compared with last while the latter jumped 18? (Maybe York itself has become a more conducive place to study?)
Ironically, this week's issue of the Times Higher Education reports the publication of a paper critical of the grade-point average system used by many universities. Dr Soh Kay Cheng's critique of the system of grading students is not a million miles away from my own of the ranking of universities themselves.