Much of the recent international discussion regarding the measurement of learning outcomes globally has been driven by the need to monitor Sustainable Development Goal 4 – ‘to ensure inclusive and equitable quality education for all’. Such learning assessments, as will be shown in the next GEM Report due out in October of this year, are one of many mechanisms being used to hold different actors to account when progress towards SDG 4 stalls. However, against the backdrop of increased threats to aid funding in countries such as the UK, and the prevalent use of ‘payment by results’ in development programming (such as the £344 million Girls’ Education Challenge Fund), the stakes involved in measuring learning outcomes are being raised.
The need to measure learning outcomes well in development programming is rarely seen as contentious until we start to drill down into the practicalities. In practice, value for money concerns, the need for rapid data to inform policy and a simple lack of technical know-how often result in unreliable or invalid learning measures.
It is clear that large-scale assessments cost money, and equally clear that the limited international development resources of governments and donors demand a focus on value for money.
However, value for money does not mean simply choosing the cheapest option: it should weigh longer-term sustainable benefits, not just short-term tangible resources. At the same time, assessments and evaluations must give policy makers the evidence to make practicable decisions about the next programme before the assessment data becomes outdated or irrelevant.
What are the risks of choosing poor quality instruments, and what factors should be considered when deciding which learning assessments to pursue?
- Developing robust learning measures is a complex task
While it is relatively simple to measure something in the physical world, such as height or weight, measuring learning requires a more complex and nuanced approach. (It has even been suggested that learning cannot be quantified at all.) Not only is learning less observable and quantifiable, it is much less consistent: if you were to measure your height twice, you would be far more likely to get the same result than if you took two tests of the same difficulty on different days. Standardising test conditions, administration and marking – and knowing what to test and why – requires expertise.
- Low quality measures yield little information
An assessment that is too easy or too difficult for the vast majority of students yields little information, defeating the purpose of the assessment. If most students score very low, the results only show what students don’t know, not what they do know. If all students achieve perfect or near-perfect scores, we cannot ascertain the upper limit of what they know. This is the equivalent of using a set of scales that only measures up to ten stone to monitor adult weight in the UK: such an approach would tell you very little about the vast majority of the population.
- What we test becomes what is taught
Teachers and students tend to see the contents of assessments as reflecting what students should learn and what teachers should teach. Sonali Nag’s recent review highlights how assessment informs teaching practice: ‘teachers who assess well and use test information well, teach better’. Poor quality assessments can test students on things they don’t need to know, leading to confusion and re-orientating what is taught in the classroom away from what is most beneficial for students.
- Value for money
While developing quality assessments requires an investment in technical expertise, the most expensive aspect of measuring learning is actually administration – and this cost does not usually vary with the quality of the assessment. There is a strong value for money argument, then, to invest in developing quality assessments so that these fixed costs yield as much accurate information as possible.