Summative Testing to Retrieval Practice: clarifying terms and outlining classroom benefits

The words ‘assess’, ‘test’, ‘exam’ and ‘quiz’ often have negative connotations for us as teachers and most certainly for our pupils. They can be stressful and anxiety-provoking, and can feel judgmental.

However, in my last post, I ended with this:

…Ebbinghaus (1885/1964) claimed that we can ‘interrupt’ forgetting by recalling information over time, spacing our retrieval out at regular intervals. In more recent years, Bjork’s (1992) research has shown that forgetting and then retrieving information can aid our memory:

“Forgetting, rather than undoing learning, creates the opportunity to reach additional levels of learning.”[i]

This puts the strategy of getting our students to ‘test’ the information they can get out of their long-term memories in a place that is far from a hindrance to learning, even if they fail. In fact, it seems that testing is essential to students’ remembering for later use and application. So, even if tests are anxiety provoking, research suggests they are effective teaching tools (Roediger et al. 2010).

So this is the paradox: testing provokes anxiety in pupils and steals learning time from teachers, yet the research says that testing is a highly effective strategy. Can we square this circle? I believe so. It begins with being clear about terms and arriving at an alternative one.

The aims of testing: summative and formative

Let’s begin with being clear about what is meant by ‘tests’ and ‘testing’ and the two aims of the process. I’ll begin with a definition:

“Test: A procedure intended to establish the quality, performance, or reliability of something.”

This isn’t particularly controversial. It is widely accepted in education that the word ‘test’ refers to the processes used to understand how much pupils have learnt and how deep that learning is.

Moving forward from this definition, there are two very different purposes, or uses, of testing (Wiliam & Black, 1996). The first aim is summative. We might set a test to give a grade, a judgment, or a final remark or report on how well a student has done. This sort of ‘test’ has a lingering sense of conclusiveness. Good examples are the standardised exams we might give our pupils at the end of a term, or at the end of key stages, such as the Year 2 and Year 6 SATs. Work by both Ofsted and educationalists elsewhere has shown time and again that this sort of assessment may be necessary for external stakeholders and school leaders, but that too much of it might be a bad thing for our students (Speilman, 2017; Christodoulou, 2017). This is not the sort of testing I think we ought to do more of in our classrooms, nor is it the focus of this post. So, to summarise what I’m saying about using tests with summative purposes: they are important at the end of a course or key stage, but too many of them in the classroom might cause issues for our learners (for details of this, see Speilman, 2017).

The alternative is to use tests for formative purposes. By doing this, teachers can use tests to make sense of what their students have remembered, what has been forgotten and how they, as the educator, can do something about it. These sorts of tests are developmental and have the potential to shape what happens in the learning process. Tests with these purposes provide teachers with valuable information about where to go next with their students. Using tests formatively doesn’t give grades and final judgments; it gives guidance for the ways teachers can respond to the needs of their learners.

This might mean that we repurpose old SATs tests so that they serve formative purposes. I think this must be done with real care (as I mention in this post). The original intention of SATs, GCSE and A Level tests is to provide a summative judgment across an entire course. They are designed to give a snapshot of the achievement of pupils across a key stage. They are not designed to test every curriculum objective that has been covered since Year 3 or Year 7. For this reason, as I have experienced, several questions ‘smuggle in’ a whole host of objectives to provide a better summative judgment. Such tests provide very limited formative information for teachers if they are not used correctly.

Some have claimed that formative tests of this kind don’t cultivate a deep understanding of what is being learnt, and that they embody superficial rote learning. However, Roediger et al. (2011) point to work conducted by Butler (2010) which shows that, if used properly, retrieval practice strategies in the classroom can contribute to deep learning that can be used in other contexts.

This is the purpose of forgetting that Bjork was referring to, and that I discussed in more detail in my last post: by forgetting and getting it wrong on a formative test, more learning gains can eventually be achieved. Pupil learning becomes deeper. Making these tests ‘low-stakes’, with no final grade, takes the pressure off students to perform and gives them the incentive to learn the things they got wrong on the test. This gives teachers the opportunity to use ‘tests’ to improve their teaching, and students the opportunity to improve their learning.

So far, I have hopefully given a clearer definition of the aims of testing and explained how one purpose (summative) is necessary, while the other (formative) can be a highly effective teaching tool. I have then discussed some of the caveats and criticisms that have been levelled.

What are the benefits of formative testing in the classroom?

Roediger et al. (2011) produced a helpful table summarising the research on the benefits of the ‘testing effect’ when tests are used with formative aims. For more details on this, see here.

[Image: table from Roediger et al. (2011) summarising the benefits of testing]

And… to retrieval practice (finally)

At this point, to be clearer, it’s worth changing the terminology of formative, low-stakes testing strategies in classrooms, just as the research has done. A more precise way to articulate what I’ve referred to above is ‘retrieval practice’. This makes it a whole lot easier to explain. It removes the negative connotations associated with testing and orientates the approach towards practising, improving… and learning.

This is because the purpose of testing our pupils in a low-stakes, quick and easy way is to help them recall their knowledge and make it easier for them to use, not to put them under pressure.

How then should we teach using retrieval practice in our classrooms?

My next post will outline five flexible strategies for using retrieval practice in the classroom.


Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum.

Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1118–1133.

Christodoulou, D. (2017) Making Good Progress? Oxford: OUP

Ebbinghaus, H. (1885). Über das Gedächtnis. Leipzig: Duncker & Humblot.

Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.

Speilman, A. (2017), ‘Amanda Speilman’s speech at the Festival of Education’, transcript, DfE, viewed 9 December 2017, of-education.

Wiliam, D. and Black, P. (1996) Meanings and Consequences: a basis for distinguishing formative and summative functions of assessment?, British Educational Research Journal, 22(5), pp.537-548


[i] I got this quote from this post on the Institute of Teaching website.

3 thoughts on “Summative Testing to Retrieval Practice: clarifying terms and outlining classroom benefits”

  1. There was a time when tests were a normal part of teaching and learning, but Plowden pretty much put an end to that. By the time ‘Inside the Black Box’ came along, many (if not most) teachers really had very little idea how much of their lessons were learned, let alone retained. Sadly, AfL focused largely on higher-order skills, which left the bottom half floundering, but fortunately we are now much more aware that children can’t skip the lower levels of Bloom’s Pyramid. You are to be congratulated for finding Roediger’s ‘ten benefits of testing’–I’ve used it for INSET, but this is the first time I’ve ever seen it in a blog.

    However, I’d like to propose an 11th benefit, one which will appear counter-intuitive to most teachers: once they get used to them, kids love knowledge tests. Even the least able pupils are highly motivated by them. Things have moved on slightly since we published this paper just under two years ago, but you may find it of interest:

