Rubrics for Integrated Assessment
The single biggest professional mistake I’ve ever made was to get my master’s degree in TESOL at The New School. Like, by a long shot. You see, prior to grad school I was, bar none, without qualification, the best English teacher the world had ever seen. I knew everything there was to know about everything there was to know about. It wasn’t until I started my master’s program that I started to learn about all these things I didn’t know. Volume upon pricey volume of things I didn’t know. Scott Thornbury was even so kind as to compile and alphabetize all the things I don’t know. And one of the subjects I came to realize I knew the least about was assessment.
Problems with Conventional Assessment
When first The New School began to rob me of my expertise, I was teaching at an IEP in Boston. There, our placement test was a brief interview with one of a few senior teachers who were considered “qualified.” They’d rattle through a script of questions chosen to elicit a variety of grammatical structures. As students answered, the teacher would compare the language of their answers to that used by students in our various levels, and they’d place the incoming student accordingly. To progress to the next level, students would take a series of achievement tests comprising multiple choice and true/false items designed in-house, by the teachers.
The two test types we employed illustrate some common problems with the conventional assessments that teachers so often use. In the case of the placement interview, there were major issues with reliability. There were no safeguards in place to ensure that each scorer was scoring the same way as the other scorers, or that scores weren’t being influenced by factors like unconscious biases or the state of the scorer’s digestion. And indeed, a couple of years on, we found ourselves complaining, “They just don’t make Level 2s like they used to no more.”
The issue with our achievement tests was one of validity, that is, whether we were testing what we wanted to be testing. What we wanted to know was whether our students could produce the target language forms. But in order to learn that, we presented them with multiple-choice questions and asked them to circle a letter. Doesn’t exactly sequitur? I agree. If you want to know whether students can do one thing, you don’t ask them to do something altogether different.
Scoring rubrics can address many issues of both reliability and validity. Rubrics consist of a series of scores, each of which corresponds to a description, in concrete, observable terms, of what a performance at a given level should look like. Thus scores are tethered to something independently observable and not just subject to the whims of the scorer, ensuring a degree of reliability. More importantly, we can use rubrics to score actual language performances, which brings us one step closer to ensuring validity than we get with most selected response techniques. Reliability and validity are complex notions, and I’m simplifying quite a bit for the sake of bloggability, but you get the idea.
Another benefit of rubrics that I think receives too little attention is the non-binary nature of the scores they produce. With a selected response question, students are either correct or incorrect. They get a point or they don’t. No gray area. But does that reflect the nature of language and what we know about how it is acquired? Do our students go from absolutely not knowing a form to absolutely knowing it? No, there are incremental improvements, which a binary scoring system often fails to measure.
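To see what a binary score can hide, here is a minimal sketch in Python. The numbers are invented for illustration (one hypothetical student, one form, four weeks, a 1-to-4 scale), and the rule that a selected response item comes out "correct" only at the top level is an assumption, not a claim about any real test.

```python
# Hypothetical rubric levels (1-4) for one student on one form across four weeks.
weekly_rubric_levels = [1, 2, 3, 3]

# Assumption: a binary item is marked correct only once the form is fully mastered.
TOP_LEVEL = 4
weekly_binary_scores = [1 if level == TOP_LEVEL else 0 for level in weekly_rubric_levels]

# Gain from the first week to the last under each scoring system.
rubric_gain = weekly_rubric_levels[-1] - weekly_rubric_levels[0]
binary_gain = weekly_binary_scores[-1] - weekly_binary_scores[0]

print(weekly_binary_scores)      # [0, 0, 0, 0]
print(rubric_gain, binary_gain)  # 2 0
```

Under these assumptions the rubric registers two levels of improvement, while the binary scores never move.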
Moreover, compared to many conventional assessment techniques, scoring rubrics are more versatile and easier to design well. I would also argue that an imperfectly designed rubric is more effective than an imperfectly written selected response item.
Types of Rubrics
There are two common types of scoring rubrics: analytic and holistic. Basically, an analytic rubric takes a student response and breaks out the target characteristics—or criteria—on which the student is assigned a score, or level. Here’s an example of an analytic writing rubric I have used:
A holistic rubric combines all these characteristics into a more, well, holistic description of a performance. Thus, only one score is given on a holistic rubric. ETS uses holistic rubrics to score the writing sections of the TOEFL.
As you’d guess, there are benefits and drawbacks to each type. Analytic rubrics are more precise and provide more information: we can home in on individual strengths and weaknesses. They also help your students see exactly which features you’re looking for. But they’re time-consuming to administer. The primary benefit of holistic scoring is efficiency, but it comes at the cost of a more detailed picture of students’ strengths and weaknesses.
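The structural difference between the two types can be sketched as data. This is only an illustration in Python: the criteria, the three-level scales, and every descriptor below are hypothetical placeholders, not taken from any published rubric. The point is simply that an analytic rubric returns one score per criterion, while a holistic rubric returns a single score.

```python
# Hypothetical analytic rubric: each criterion has its own scale of descriptors.
analytic_rubric = {
    "organization": {
        1: "Ideas are hard to follow",
        2: "Ideas are mostly in a sensible order",
        3: "Ideas flow logically throughout",
    },
    "grammar": {
        1: "Errors often obscure meaning",
        2: "Errors occasionally obscure meaning",
        3: "Errors do not obscure meaning",
    },
}

# Hypothetical holistic rubric: one scale describing the whole performance.
holistic_rubric = {
    1: "Hard to follow, with errors that often obscure meaning",
    2: "Mostly organized, with errors that occasionally obscure meaning",
    3: "Logically organized, with errors that do not obscure meaning",
}


def analytic_feedback(rubric, levels):
    """Return {criterion: (level, descriptor)}; every criterion needs a level."""
    missing = set(rubric) - set(levels)
    if missing:
        raise ValueError(f"No level given for: {sorted(missing)}")
    return {name: (levels[name], rubric[name][levels[name]]) for name in rubric}


def holistic_feedback(rubric, level):
    """Return a single (level, descriptor) pair for the whole performance."""
    return (level, rubric[level])


detailed = analytic_feedback(analytic_rubric, {"organization": 3, "grammar": 1})
overall = holistic_feedback(holistic_rubric, 2)
```

Here `detailed` holds two separate judgments a student can act on, and `overall` holds one. That is the trade between precision and speed in miniature.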
The choice between analytic and holistic is essentially one between speed and precision. But you need make that choice only if you need to assess each aspect of a performance. I actually propose a third type of rubric for quick but focused in-class assessment: single-criterion rubrics. I don’t know whether anyone else is using these, but I’ve used them quite a bit and found them very useful. Essentially we’re talking about one row of an analytic rubric, focused on a particular criterion of an utterance. Here’s an example of a single-criterion rubric and its descriptors:
Designing Single-Criterion Rubrics
So how do we make a good rubric? I’m going to quickly run through how to construct a solid single-criterion rubric. The same basic principles can be applied to constructing holistic and analytic ones.
- Select the criterion that you hope to assess. This could be anything from clarity of pronunciation in an oral presentation to communicativity in grammar in a brief essay. It should be aligned with your learning objectives and should lend itself to scaled scoring (as opposed to features that are simply correct or incorrect, such as selecting the right preposition).
- Determine what observable features distinguish different performance levels. The best way to come up with those observable features? Observe! For instance, if you’re assessing communicativity in use of the conditional, some speakers might not be able to communicate using the structure at all, others might be able to communicate a general idea that still leaves us with serious questions, others might be able to communicate in a way that leaves us fairly certain we’ve understood, and still others will have achieved native-like communication.
- Start assigning scores to different performance levels only once you’ve described your performance levels. Do not decide on 3 or 5 levels just because they’re nice round numbers.
- Keep your draft rubric around while students are performing relevant tasks. Informally compare their performances to your descriptors, and adjust as needed before using it for actual assessment.
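The steps above can be sketched in a few lines of Python. This is a rough illustration, not a tool I’m claiming anyone uses: the function names are made up, the descriptors paraphrase the conditional example from the second step, and starting the scale at 1 is an arbitrary assumption. What matters is the order of operations: the descriptors come first, and the scores are simply counted off from however many levels you ended up describing.

```python
def build_single_criterion_rubric(criterion, descriptors, lowest_score=1):
    """Number descriptors that are already written, ordered weakest to strongest."""
    if not descriptors:
        raise ValueError("Describe at least one performance level first.")
    levels = {lowest_score + i: text for i, text in enumerate(descriptors)}
    return {"criterion": criterion, "levels": levels}


def revise_descriptor(rubric, score, new_text):
    """Adjust one descriptor after informally trying the draft in class."""
    if score not in rubric["levels"]:
        raise KeyError(f"No level {score} in this rubric")
    rubric["levels"][score] = new_text


# Observed first, scored second: four levels because four were observed.
conditional_descriptors = [
    "Cannot communicate using the structure at all",
    "Communicates a general idea that still leaves serious questions",
    "Communicates in a way that leaves us fairly certain we have understood",
    "Achieves native-like communication",
]

rubric = build_single_criterion_rubric(
    "Communicativity in use of the conditional", conditional_descriptors
)

# Hypothetical tweak made after watching students perform the task.
revise_descriptor(rubric, 2, "Communicates a general idea, but key details are unclear")
```

Because the scale is derived from the descriptors, adding or removing a level later means editing the list, not forcing performances into a number chosen in advance.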
This is really just a quick primer on rubric development. For a much more complete guide to developing analytic rubrics, check out this site. I hope that by this point you’ve got some idea of why rubrics are preferable to conventional assessment and how to get started using them. If you feel like you know less now than you did when you started, perfect!
Rob Sheppard is director of adult education at Quincy Asian Resources, a member of the community advisory council at First Literacy, a regular blogger for TESOL International, and a graduate of The New School’s MA TESOL program.