2 Assessment Concerns

Last night I was at a district meeting on Communicating Student Learning. There are a few different CSL projects going on in our school district and these meetings are good places to share our individual school experiences and collaborate on new ideas.

At one point in the meeting, two concerns about proficiency-based assessment and reporting came up. I wanted to write about them because I see these two issues raised quite often with regard to assessment and Standards Based Grading (SBG), and they are great questions. The first concern was how teachers can decide whether a student is ready to progress to the next grade level when using proficiency scales. The second was whether we have research evidence to show that proficiency scales improve student learning.

I’ll start by addressing the second concern. In many education circles we are seeing an increased interest in making evidence-based decisions. I welcome this interest, and it’s something that I’ve carried with me from my previous career in the medical device industry. Investigating the research around teaching methods is all about questioning our confidence in how successful a particular intervention is. However, asking if proficiency scales improve student outcomes is a red herring, because the underlying assumption is that we use proficiency scales for the specific purpose of improving learning. Suppose a school district decided to change from white paper to a brown-ish recycled paper. We wouldn’t ask if this is helping student learning, because that isn’t the goal of the change. I feel that the primary reason for using a proficiency scale is to report how a student is performing on a task, in contrast to an averaged 100-point scale, which may report something very different.

While I believe we are mostly interested in using a proficiency scale to communicate what a student has done, there is more to it than being a one-way communication tool to parents. Proficiency scales are tightly tied to criteria-based formative assessment. All formative assessment schemes involve proficiency scales at some level. We also know that formative assessment is one of the best and most cost-effective ways of increasing student learning. In this regard, I feel confident in saying that as long as proficiency scales are part of a thoughtful formative assessment methodology, student learning will improve.

Proficiency scales are often used and required in formative assessment, and formative assessment improves learning. Therefore proficiency scales often play a role in improving student learning.

As for the first concern mentioned above, it’s common for people to ask how we decide to pass or fail a student if we are using proficiency scales. The implication in this question is that a 100-point percentage system is very clear because everyone knows that 50% is a pass. You get 50% or more, you pass the course.

Before I describe how proficiency scales can be used to determine a passing grade, I first would like to point out the absurdity of the 50% pass. I’ll start with an anecdote. Generally speaking, I don’t think I’ve ever seen a student with 50% who can really do anything correctly in a course. The 50% student pretty much doesn’t get the course skills or understanding. This is especially true for students whose 50% is based purely on assessed work, not on late marks or other behavioral reasons. I suppose if a teacher deducts points or percentages from a grade for behavioral reasons, a student could have less than 50% but still be skilled in some learning objectives.

In terms of accuracy, no one can grade classroom work to within 1%, especially if we want the grade to reflect student knowledge, skill and understanding. There are many reasons that we don’t have this kind of accuracy. The day of the test or assessment might be a bad day for a kid who is sick or sleep deprived, caught between fighting parents, anxious, preparing for an intense out-of-school competition, etc. The assessment itself might be invalid because the questions and tasks don’t fairly represent what the students were explicitly taught. A test that involves a broad range of topics can have test items that are biased in favour of a few topics, and therefore different students will get different grades depending on which particular topics they are better at. This last point is one reason why I dislike teacher-developed “final exams” that are meant to cover 9 months of content and understanding in a one-hour test. I believe that many tests used in the classroom are simply not robust. Assessment accuracy depends on a balance between robustness (did one student do well because the test happened to be weighted on the one topic they knew really well?) and sensitivity (does the assessment actually allow us to make good inferences about how much the student knows?). Test robustness and sensitivity require comprehensive test items that cover all topics at varying degrees of difficulty.

It is very difficult to write a valid assessment whether we are using a 100 point scale or proficiency scale. What we need to do is recognize the faults in our assessments and not attribute a perception of accuracy that is not there.

The last reason we should be wary of the 100-point scale is the idea that a 1% change is the difference between pass and fail, and the difficulty of reconciling that with what a student has to do to make a 1% change. For example, suppose a student has 49% and they just have to hand in one homework set to get the additional 1%. It’s likely that the student’s understanding and knowledge will not have changed from before handing in the homework set to after handing it in. If their knowledge and understanding haven’t changed, why would their pass/fail status change? A teacher could instead have a student re-do an assessment in order for the grade to improve, but the assessment would have to be carefully crafted so that an improved grade relies on the student improving on something they weren’t good at previously. Even if this is accomplished, there is still the problem of accuracy in the assessments. If we can’t measure better than +/- 5%, how can we say that 49% is a fail but 50% is a pass?
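To make the arithmetic of that last question concrete, here is a minimal Python sketch. It assumes the hypothetical +/- 5 point measurement error from the paragraph above; the function names `score_interval` and `distinguishable` are mine, made up for illustration, and the overlap check is a deliberately crude rule of thumb rather than a proper statistical test.

```python
def score_interval(score, margin=5.0):
    """Range of plausible 'true' scores, clipped to the 0-100 scale.

    The +/- 5 point margin is the hypothetical figure from the text,
    not a measured value.
    """
    return max(0.0, score - margin), min(100.0, score + margin)


def distinguishable(score_a, score_b, margin=5.0):
    """True only if the two plausible ranges do not overlap at all."""
    low_a, high_a = score_interval(score_a, margin)
    low_b, high_b = score_interval(score_b, margin)
    return high_a < low_b or high_b < low_a


print(score_interval(49))        # (44.0, 54.0)
print(distinguishable(49, 50))   # False
print(distinguishable(40, 60))   # True
```

Under this assumption, a recorded 49% could stand for anything from 44% to 54%, so the grade book alone can’t tell the “failing” 49% student apart from the “passing” 50% student.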

The pitfalls of the 100 point scale don’t speak to the concern on passing and failing using proficiency scales, but it can be useful to compare the two. Let’s compare examples of clearly defining what constitutes a “pass” for a course.

100-point scale	Proficiency scale
You must get an average of 50% or greater	You must be “proficient” in at least 50% of your learning objectives, and “beginning” in no more than 20% of your learning objectives.

The 100-point scale describes what a student gets whereas the proficiency scale describes what a student can do. The proficiency scale example gives a better description of what constitutes a pass.
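The proficiency-scale rule in the comparison above is simple enough to write down as a short Python sketch. This is only one possible reading of the rule: the function name `passes`, the made-up objective names, and the middle level “developing” are my assumptions for illustration (the rule itself only names “proficient” and “beginning”).

```python
from collections import Counter


def passes(levels, min_proficient=0.5, max_beginning=0.2):
    """Apply the example pass rule to one student's portfolio.

    levels maps each learning objective to a level string. Only
    "proficient" and "beginning" matter to the rule; any other level
    (such as the hypothetical "developing") counts toward neither.
    """
    if not levels:
        return False  # no evidence, so no basis for a pass
    counts = Counter(levels.values())
    total = len(levels)
    enough_proficient = counts["proficient"] / total >= min_proficient
    few_beginning = counts["beginning"] / total <= max_beginning
    return enough_proficient and few_beginning


# Hypothetical portfolios with five learning objectives each.
student_a = {"graphing": "proficient", "fractions": "proficient",
             "ratios": "proficient", "area": "developing",
             "equations": "developing"}
student_b = {"graphing": "proficient", "fractions": "proficient",
             "ratios": "proficient", "area": "beginning",
             "equations": "beginning"}

print(passes(student_a))  # True
print(passes(student_b))  # False
```

Both students are proficient in three of five objectives, but the second is still “beginning” in 40% of them, so the rule flags exactly which part of the portfolio is holding the pass back.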

There are many different ways to describe a pass using proficiency scales; the above example is just one. In my experience, when a student has a portfolio of work assessed using proficiency scales, it has always been very clear whether the student should pass the course. In the time leading up to a final report, suggested actions can be given to a student so they will clearly know what is required to pass. “Sally-Jane needs to improve on the learning objectives X, Y and Z in order to successfully complete the course.” In this example, the student is given clear and specific guidance on what needs to be done, and the action directly relates to knowledge and understanding of the course’s curricular goals.

Whether we are talking about representation of accuracy, communication or improved student learning, the use of proficiency scales is an improvement on using a 100 point scale. I would be interested in hearing any counter arguments to this in the comments below.