Reflecting Allowed

Maha Bali’s blog about education

On Noticing Absence in Algorithms part 2



This post comes minutes after the previous one on noticing absence. After I published that one, I realized I had a couple more things to say, but I think they deserve their own post. I may or may not finish it right away.

This one is about computer algorithms of the kind we talk about a lot in edtech. I am reading Weapons of Math Destruction by Cathy O’Neil (I was heading off to continue reading it when I had this brainstorm).

Let me repeat this piece of information: I studied neural networks as an undergraduate, and my computer science graduation thesis was a neural network (with population-oriented simulated annealing as a genetic algorithm to optimize it, but never mind). I say this because the algorithms that are “black boxes” to us are an evolution of a concept I knew well almost 15 years ago.

Here are my thoughts on absence:

  1. Yes. The programmer who wrote the code doesn’t actually know how the algorithm processes the information, because the programmer doesn’t tell the system what to do with the data. Instead, the system learns from the “training set” data until it’s good enough to work with “real” data and get sufficiently accurate results. It’s complicated to understand and explain, but believe me on this one: they really don’t know. Someone could probably trace it, but for the most part it wouldn’t be humanly comprehensible. I know there’s more to it now than what I studied 15 years ago (fuzzy logic and so on), but the basics are the same, because I keep finding them in everything I read.
  2. How on earth did they teach us this stuff without once having a discussion about its ethics? I wonder if they do that now, or whether computer scientists are meant to be asocial, amoral human beings. All the ethics we were taught was about intellectual property. Seriously.
  3. What matters about bias in algorithms comes down to three things, imho:
  • The training data you give it can have bias or absence. For example, if you are trying to predict breast cancer survival and you give it data only from white women in Wisconsin, it will probably not work if breast cancer behaves differently for non-white women or outside Wisconsin. Moreover, if you give it human decisions to treat as the right answers, and those decisions are biased, it will learn that bias. An example I heard recently related to hairstyles: googling something like appropriate vs. inappropriate hairstyles produced racially biased results, apparently because Google learned from clicks which results people tended to prefer.
  • Programmers have some choice over which dimensions of the data to feed a system to learn from. Take this scenario from my head. If Amazon (which in reality, by the way, seems to have a straightforward, non-black-box, stupid algorithm that looks transparent to me) wanted to create a smart algorithm to give you recommendations, it could choose whether or not to take into account other books you have bought, who else bought those books, and what else they bought (it already does this). But it also has the choice to include reviews you wrote of books you DIDN’T buy through Amazon (it currently does not, I believe). It could focus on the price range you usually buy in (and keep recommending cheaper books if you tend to buy them; it may already be doing that). It could record dates, in which case it might recognize that your Christmas shopping patterns differ from usual. It could record your postcode and recognize when an author is visiting your city for a book signing. Whatever. In any case, it may miss an important factor, or include an insignificant one. And feeding those into neural networks may or may not cause the network to pick up false correlations. I remember a teacher who once found that students’ clicker numbers correlated with grades, or something like that. I said that the students in the front row probably had similar clicker numbers (because the university distributed them to that class together) and were good students.
  • Programmers have a choice over how to optimize the network, when to tweak it, and how much human agency to let in. Cathy O’Neil’s book and Facebook’s algorithm both made me realize how often we seem to ask people to conform to the algorithm rather than rely on people’s judgment and agency. And that, for me, is really the worst thing there is.
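To make the first point and the training-data point concrete, here is a toy sketch in Python. It trains a single artificial neuron (a perceptron, the simplest building block of a neural network) on invented data. Nobody writes the decision rule into the code; the weights come from the training set. The two groups, the 0 to 10 "score" and the thresholds are all hypothetical, chosen only so the outcome rule differs between groups. Group B is simply absent from the training data.

```python
# Toy illustration: a single artificial neuron (perceptron) learns a rule
# from labelled examples; the programmer never writes the rule itself.
# All data below is invented (hypothetical groups and scores).


def predict(w, b, x):
    """Return 1 if the neuron 'fires' for input x, else 0."""
    return 1 if w * x + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Classic perceptron learning rule: nudge the weights after each mistake."""
    w, b = 0, 0
    for _ in range(epochs):
        for x, y in samples:
            error = y - predict(w, b, x)  # -1, 0 or +1
            w += lr * error * x
            b += lr * error
    return w, b


def accuracy(w, b, samples):
    """Fraction of samples the learned neuron classifies correctly."""
    hits = sum(predict(w, b, x) == y for x, y in samples)
    return hits / len(samples)


# Hypothetical: the outcome is positive above 5 in group A, above 2 in group B.
group_a = [(x, int(x > 5)) for x in range(11)]
group_b = [(x, int(x > 2)) for x in range(11)]

# The training set contains ONLY group A; group B is absent.
w, b = train_perceptron(group_a)

print("learned weights:", w, b)
print("accuracy on group A:", accuracy(w, b, group_a))
print("accuracy on group B:", round(accuracy(w, b, group_b), 2))
```

When I trace it by hand, it settles on weights 2 and -11, so the neuron fires for scores of 6 and above. That is perfect on group A but wrong for every group B member scoring 3 to 5 (about 73% accuracy). Nothing in the code is "biased"; the absence is entirely in the data.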

At OEB16 (I watched some recordings) there were a couple of sessions on AI: a debate on whether artificial intelligence could replace teachers, and a keynote talk, or something like it, by a German speaker, Tarek Besold.

Donald Clark listed the features of a good tutor, then ticked off how each could be done via AI. Tarek did something similar with creativity and, again, teaching.

Neither of them accounted for things like human connection, empathy, feelings, and social interaction as important aspects of the teacher-student relationship. Tarek didn’t account for enjoyment or self-fulfillment in creativity.

When we focus on the cognitive, measurable aspects of teaching and creativity, we make it easier to imagine them as replaceable by machines. When we remind ourselves that humanity has many more dimensions than the cognitive, that there are emotional and social dimensions, and that there is agency (as Nell Watson said in that AI debate), that is when the conversation becomes meaningful, and the answer has to be no. Machines won’t give you hugs and lift you up when you’re down. A human behind a machine might, though.

I remember the excitement a few years ago over automatic grading software (especially with MOOCs). The problem for me isn’t whether the tools are accurate or resemble any teacher’s actual grading, but:

  1. The inhumanity of asking students to write for a machine audience! If what we want is to machine-grade students’ writing, then we aren’t asking them to write anything meaningful in the first place. It shows an absence of purpose, or of understanding of the real goal of a writing assignment.
  2. The inhumanity of forgetting the emotional aspects of giving feedback, and the impact of those conversations on students’ confidence, motivation, and wellbeing.
  3. The often-ignored fact that a teacher needs to grade hundreds or thousands of a given assignment for the machine to learn their approach. First of all, I almost never assign the same assignment twice. I tweak it, and often change it drastically, each semester, for good reason. Even if I had 1,000 students this semester and that was enough training data, it wouldn’t work next semester. Even if I didn’t change the assignment, I might evolve the way I grade. I actually LEARN from reading student assignments, and I learn how to teach my students better. Machines and analytics may give me partial perspectives on that, but very partial ones, predictable according to what someone else told the algorithm to look for and look at, and what some training data told it was the right way to weigh that data. I would rather trust my own judgment.
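Here is a toy Python sketch of the third point. Real automated essay scoring systems are far more elaborate; this swaps in the simplest learned grader I can think of, a nearest-neighbour lookup that gives a new essay the grade of the most similar essay the teacher already graded, measuring similarity by word overlap (Jaccard similarity). The essays, grades and the `min_similarity` cutoff are all hypothetical. Everything the "model" knows comes from one batch of graded answers to one assignment.

```python
# Toy nearest-neighbour "grader" (hypothetical, for illustration only):
# it copies the grade of the most similar essay a teacher already graded.


def words(text):
    """Lower-cased set of words in a text (a very crude representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Shared words divided by all distinct words across both texts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbour_grade(essay, graded, min_similarity=0.1):
    """Return (grade, similarity) of the closest previously graded essay.

    grade is None when nothing in the training data is similar enough,
    i.e. the grader has no relevant examples to learn from.
    """
    essay_words = words(essay)
    best_grade, best_sim = None, 0.0
    for text, grade in graded:
        sim = jaccard(essay_words, words(text))
        if sim > best_sim:
            best_grade, best_sim = grade, sim
    if best_sim < min_similarity:
        return None, best_sim
    return best_grade, best_sim


# Hypothetical "training data": last semester's graded answers to one prompt.
last_semester = [
    ("plants use sunlight to make sugar from water and carbon dioxide", "A"),
    ("plants eat soil to grow", "C"),
]

same_assignment = "plants use sunlight water and carbon dioxide to make sugar"
new_assignment = "the printing press changed how ideas spread in europe"

print(nearest_neighbour_grade(same_assignment, last_semester))
print(nearest_neighbour_grade(new_assignment, last_semester))
```

The answer to the old prompt gets an "A" because it closely echoes a graded essay; the answer to a changed assignment shares no words with anything graded, so the sketch returns `None`. It has nothing to go on until someone grades a whole new batch.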

I need to stop and sleep now 🙂 or read a little of that book that got me on this track!


  1. “How on earth did they teach us this stuff and not once had a discussion about ethics of it? I wonder if they do that now, or if computer scientists are meant to be asocial amoral human beings? All the ethics we were taught was about intellectual property. Seriously. ”

    There is some more movement in this direction now. For a while, the ACM has included ethics as an area of study in its guidelines for the CS curriculum. Here we have just introduced a new course in Ethics and Professional Practice in Computing, which I hope to dig into next year. Someone who sits at the intersection of the social and the technical is in a reasonable position to bring a more diverse perspective to the subject. Often tech people do not see through or past the tech.
