(Continued from Part-3)
Finding 9: machine scoring shows a bias against second-language writers (Chen & Cheng, 2008) and minority writers such as Hispanics and African Americans (Elliot, Deess, Rudniy, & Joshi, 2012)
Report-1: The best part about this report is the stance it takes. It immediately got me interested in it.
“Given the fact that many AWE programs have already been in use and involve multiple stakeholders, a blanket rejection of these products may not be a viable, practical stand.
A more pressing question, accordingly, is probably not whether AWE should be used but how this new technology can be used to achieve more desirable learning outcomes while avoiding potential harms that may result from limitations inherent in the technology.”
- This has been exactly my problem with the statements on the humanreaders.org website.
The study primarily supports the idea of integrating human and machine assessment, taking advantage of the strengths of both, as mentioned below:
“The AWE implementation was viewed comparatively more favorably when the program was used to facilitate students’ early drafting and revising process, and when the teacher made a policy of asking students to meet a preliminary required standard and subsequently provided human feedback. The integration of automated assessment and human assessment for formative learning offers three advantages….”
At least I did not find a direct mention of a “bias” against second-language writers in this report! We need to stretch our imagination a bit to reach that conclusion!
Report-2: – The second report was already mentioned in Finding 7. Like before, I did not find these results directly relevant to this “finding”. However, I see the point in raising this issue. But isn't this just like some American coming along and correcting Indian English? 😛 So, this kind of “bias” can exist in humans as well. What really is a way to handle this, manually or automatically? This does not make a case in favour of human assessment (IMHO).
– I actually did not go through these references beyond their intro and conclusion sections, as I felt that the “finding” is too much of a blanket statement to be connected to these results. Skimming through the two freely accessible reports among these confirmed my suspicion. These reports focus more on critically analysing automated systems and suggesting ways to improve them and combine them with other things… rather than saying “machine scores predict future academic success abysmally”.
Part-2 of these findings was focused on the statement: “machine scoring does not measure, and therefore does not promote, authentic acts of writing”
While I intend to stop here (partly because it's so time-consuming to go through so many reports and partly because of a growing feeling of irritation with these claims), some of these Part-2 findings made me think, some made me smile, and some made me fire a question back…
Those that made me think:
“students who know that they are writing only for a machine may be tempted to turn their writing into a game, trying to fool the machine into producing a higher score, which is easily done”
- This is something that recurs in my thoughts each time I think of automated assessment, and I know it is not super-difficult to fool the machine in some cases.
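To make the "fooling" point concrete, here is a toy sketch of my own. It is NOT how any real AWE product works: the function name `naive_score`, the weights and the thresholds are all made up for illustration. It only assumes a scorer that looks at surface features (essay length and the share of long words), which is the kind of scorer that padding and fancy vocabulary can game.

```python
import re


def naive_score(essay: str) -> float:
    """Hypothetical toy scorer: rewards only length and long words.

    This is an illustrative assumption, not any vendor's algorithm.
    """
    words = re.findall(r"[A-Za-z']+", essay)
    if not words:
        return 0.0
    # Up to 3 points for sheer length (assumed cap at 150 words).
    length_points = min(len(words) / 50, 3.0)
    # Up to 3 points for the share of words with 8 or more letters.
    long_word_share = sum(len(w) >= 8 for w in words) / len(words)
    vocab_points = min(long_word_share * 10, 3.0)
    return round(length_points + vocab_points, 2)


# A short, clear answer versus the same empty sentence repeated 20 times.
clear = "Cats sleep a lot. They hunt at night and rest in the day."
padded = " ".join(
    ["Notwithstanding multitudinous considerations, felines "
     "perpetually demonstrate extraordinary somnolence."] * 20
)

if __name__ == "__main__":
    print("clear :", naive_score(clear))
    print("padded:", naive_score(padded))
```

The padded text says nothing new after its first sentence, yet a scorer like this rates it far above the clear one. Real systems use many more features, so take this only as an illustration of the worry, not as evidence about any particular product.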
“as a result, the machine grading of high-stakes writing assessments seriously degrades instruction in writing (Perelman, 2012a), since teachers have strong incentives to train students in the writing of long verbose prose, the memorization of lists of lengthy and rarely used words, the fabrication rather than the researching of supporting information, in short, to dumb down student writing.”
– This part is actually the main issue. Rather than just making blanket claims and pushing everything aside, humanreaders.org or any such initiative should understand the inevitability of automated assessment and focus on how to combine it better with human evaluation and other means!
Those that rather made me smile, although I can’t brush aside these things as impossible:
“students are subjected to a high-stakes response to their writing by a device that, in fact, cannot read, as even testing firms admit.”
“in machine-scored testing, often students falsely assume that their writing samples will be read by humans with a human’s insightful understanding”
“teachers are coerced into teaching the writing traits that they know the machine will count .. .. and into not teaching the major traits of successful writing.. .. ”
– No, I don’t have any specific comments on these. But in parts this is very imaginative… and in parts it is not entirely impossible.
Those that gave new questions:
“conversely, students who knowingly write for a machine are placed in a bind since they cannot know what qualities of writing the machine will react to positively or negatively, the specific algorithms being closely guarded secrets of the testing firms (Frank, 1992; Rubin & O’Looney, 1990)—a bind made worse when their essay will be rated by both a human and a machine”
- As if we know what humans expect! I thought I wrote a very good essay on Sri Sri for the question “my favourite poet” in my SSC exam… and I got lower marks (compared to my past performances) in Telugu, of all things. Apparently, the teachers in school and the external examiners did not think alike! (Or probably the examiner was a Sri Sri hater!)
“machines also cannot measure authentic audience awareness”
– Who can? Can humans do that with fellow humans? I don’t think so. I know there are people who think I am dumb. There are also those who think I am smart. There are also those who think I am mediocre. Who is right?
Conclusion of this series:
Although I did not do a real research-like reading and this is not some peer-reviewed article series, I spent some time doing this and it has been an enriching experience in terms of the insights it provided on the field of automated assessment and its criticisms.
I learnt two things:
* Automated assessment is necessary and cannot be avoided in the future (among several reasons, because of the sheer number of students compared to the number of trained evaluators)
* Overcoming the flaws of automated assessment and finding efficient ways to combine it with trained human evaluators is more important, realistic and challenging than just branding everything as rubbish.
Although the above two are rather obvious, the humanreaders.org “findings” and the way such “findings” immediately grab media attention convinced me of them more than before! 🙂