October 31, 2018

What Do Recruitment Algorithms Measure?

By Juanjo Cardona

6 min

Machine-Learning | Multi-Industry | Recruitment

The revelation early this month that Amazon had scrapped its A.I. recruitment software due to its gender bias has placed the spotlight on what those algorithms are made of and what it is they actually measure.

“We become what we behold. We shape our tools and then our tools shape us”

John Culkin, professor of communication, Fordham University

 

Early in October, it was reported that Amazon had discontinued its artificial intelligence (A.I.) recruitment tool because it showed gender bias when screening candidates for technical roles. According to members of the team working on the algorithm, the system taught itself that male candidates were preferable.

Although Amazon had been developing software to review job applicants since 2014, the algorithm was still at an experimental stage as late as 2018. It assigned job candidates a score ranging from one to five stars – much like the system the online retailer offers shoppers to review products on its platform – but, because it was trained on applications submitted over a period of 10 years, it reproduced the same gender biases that predominated when humans did the screening.

It is to Amazon's credit, though, that this A.I. tool is no longer in use. On the other hand, does this mean the company is back to its old ways? If so, its recruitment process might remain as biased as it ever was.

Algorithms are ubiquitous, permeating many spheres of our daily lives. They define our credit-worthiness, whom we should date, what news is on the menu for us, what movies to watch, what books to read, where to invest our savings, which college to attend, and the premium on our house, car or health insurance. And, as in the case of Amazon and many others, they also play a role in whether we get a job.

The widespread use of algorithms isn’t inherently good or bad. We could even get tangled in a more philosophical discussion about what good or bad means (a subject beyond the scope of this article). However, we can agree that feeding an algorithm data from a flawed system will only perpetuate, with the efficiency and scale only algorithms are capable of, the underlying flaws of that system.

 

Algorithm Dissection

In a nutshell, an algorithm is a step-by-step process that indicates how to combine certain inputs (e.g., a baby crying, a lullaby, a cradle, a pacifier, and a mushy pillow) to reach a certain output (e.g., stopping the baby from crying). In this example, the algorithm would look for the combination of the available inputs that optimizes for the desired output.
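To make this concrete, here is a minimal, purely illustrative sketch in Python: it simply tries every combination of the available inputs and keeps the one that best achieves the desired output. The scoring function is invented for the sake of the example.

```python
from itertools import combinations

inputs = ["lullaby", "cradle", "pacifier", "mushy pillow"]

def crying_level(actions):
    # Hypothetical scoring: pretend some combinations soothe better than others.
    level = 10
    if "pacifier" in actions:
        level -= 4
    if "lullaby" in actions and "cradle" in actions:
        level -= 5
    return level

# Exhaustively search every combination of inputs and keep the one
# that minimizes the crying level (the "desired output").
best = min(
    (combo for r in range(len(inputs) + 1) for combo in combinations(inputs, r)),
    key=crying_level,
)
print("Best combination of inputs:", best)
```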

Even a baby would understand the above example (no pun intended). However, algorithms can become far more intricate, usually hand in hand with the complexity of the problems they try to solve: how to optimize a company’s supply chain, how to prevent cancerous cells from spreading, or how to estimate box office numbers for the next Star Wars installment. Algorithms take the form of abstract mathematical models that attempt to capture the nuances and complex relationships underlying the input variables.

Basically, algorithms crawl datasets (that is, they look at the past) to identify patterns that help achieve a certain goal in the future: hitting a measure of success, or solving a problem.

To illustrate this, let’s use an A.I. recruitment tool from a hypothetical organization as an example. The team working on the machine-learning algorithm might define success as a function of whether past hires stayed around for a certain amount of time, were positively reviewed by their managers and peers in regular assessments, and eventually got a promotion. I am neither a data scientist nor a software engineer, nor do I need to be to see that these look like reasonable metrics to define success for our recruitment tool. What could possibly go wrong?
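As a hedged sketch of what such a definition of success might look like in code – the file name, column names, and the use of pandas and scikit-learn are all assumptions made for illustration, not a description of any real system:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical HR dataset; file and columns are invented.
hires = pd.read_csv("past_hires.csv")

# "Success" as described above: stayed long enough, reviewed well, promoted.
hires["success"] = (
    (hires["tenure_months"] >= 24)
    & (hires["avg_review_score"] >= 4.0)
    & (hires["promoted"] == 1)
).astype(int)

# Whatever sits on the application form becomes the model's input -- and any
# historical bias hidden in those columns comes along for the ride.
# (Features are assumed to be already numerically encoded.)
features = hires[["university_tier", "years_experience", "referral", "gender_flag"]]

model = LogisticRegression(max_iter=1000).fit(features, hires["success"])
print(dict(zip(features.columns, model.coef_[0])))
```

Notice that nothing in this sketch is malicious: the bias enters simply because both the label and the features are drawn from the organization’s own past.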

However, these metrics – although straightforward in appearance – conceal intricate nuances. What if the corporate culture of this organization has historically been male-oriented? What if employees have traditionally been sourced from 985 universities? And what if the people who made it to the top were die-hard fans of chocolate ice cream?

Then there are the evaluations themselves: what proportion of the final scores in past performance appraisals could have been skewed by factors that had nothing to do with on-the-job performance? Let’s consider the halo effect: because my co-worker doesn’t like chocolate ice cream, and I absolutely love it, I cannot stand him/her. And, because it is only human, I project this emotion and dislike everything he/she does. Conversely, the “similar to me” bias might grant higher review scores to co-workers who objectively did not fare very well. We tell ourselves “Yes, I know, he/she needs to improve on that”, because consciously or unconsciously we feel that he/she is one of us and deserves a second chance.

 

“What does gender, university, or preference for chocolate ice cream tell us about a particular individual’s ability to perform well in a job? Barely anything at all. Correlation does not mean causality”

 

How do we discount the biases baked into those evaluations? This is a very complex task: it involves diving into the inner workings of people’s minds, and even then, those biases are extremely difficult to account for and code into software. Similarly, how do we discount a preference for candidates of one gender over another? How do we level the playing field for candidates who did not study at a specific pool of universities, or simply don’t happen to like chocolate ice cream? One possible way could be to feed back to the algorithm how candidates who left the company, or who did not make it past the screening, fared in other professional endeavors. That might reveal flaws in the internal recruitment process if a significant portion of those who were turned away proved to be high performers elsewhere. Unfortunately, that data is hard to come by – and yet this is exactly the sort of feedback it would take to improve the model.
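If such follow-up data could ever be obtained, a first sanity check might look something like the minimal sketch below; the file name, the 0/1 column and the threshold are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical follow-up data on candidates the screen rejected.
rejected = pd.read_csv("rejected_candidates.csv")

# Assumed 0/1 flag from follow-up research: did the rejected candidate
# go on to perform well somewhere else?
false_negative_rate = rejected["external_high_performer"].mean()

print(f"{false_negative_rate:.0%} of rejected candidates became high performers elsewhere")
if false_negative_rate > 0.30:  # arbitrary illustrative threshold
    print("The screening model may be optimizing for the wrong signals.")
```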

Because the algorithm measures success while leaving out complex but relevant information, what we are left with is a model that simplifies reality. Think about it: what does gender, university, or a preference for chocolate ice cream tell us about a particular individual’s ability to perform well in a job? Barely anything at all. Correlation does not mean causality. Birds don’t fly because they have feathers. Feathers help keep their bodies warm during flight: from an evolutionary perspective, it helps a lot not to freeze to death while flying. But flight itself comes from the difference in air pressure between the top and the bottom of the wings, which creates the force that lifts birds into the air.
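A toy simulation makes the point: when the historical process favors one group, any irrelevant trait that happens to be more common in that group will show up as correlated with “success”. All the numbers below are synthetic and chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# 1 = the group the historical process happened to favor (purely synthetic).
favored_group = rng.integers(0, 2, n)

# An irrelevant trait that is merely more common in the favored group.
likes_chocolate = (rng.random(n) < np.where(favored_group == 1, 0.8, 0.2)).astype(int)

# Past "success" depends on the biased process, not on ice cream preference.
success = (rng.random(n) < np.where(favored_group == 1, 0.6, 0.3)).astype(int)

corr = np.corrcoef(likes_chocolate, success)[0, 1]
print(f"Correlation between liking chocolate ice cream and 'success': {corr:.2f}")
```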

You might argue that, if an organization decides that the candidate who best fits its culture is a male who went to a specific set of universities and happens to love chocolate ice cream, then so be it. Leaving aside that some of these criteria fall squarely outside the boundaries of the law, let’s assume, if only for a moment, that we could accept this argument. The problem is that this is not what the algorithm was set up for. Those were not meant to be the vectors that would translate into success in the initial model. Remember, the algorithm was configured to look for prospective job applicants with a higher likelihood of staying longer at the company, getting a promotion, and passing assessments from managers with flying colors.

The great irony is that there is no one to blame: no software engineer sat down to explicitly code for chocolate ice cream, and no secret management meeting took place in an obscure room to devise a convoluted model with a hidden agenda. As former Google advertising strategist James Williams very elegantly puts it, “at ‘fault’ are more often the emergent dynamics of complex multiagent systems rather than the internal decision-making of a single individual” (1).

Conclusion

Algorithms are here to stay. Again, this is neither intrinsically positive nor negative; there is no technological determinism here. However, it is important to know what the algorithms are looking for. What is the purpose of the algorithm? How was that purpose defined and coded into software? Replacing realities that are complex and nuanced with models that simplify them can have pervasive effects: prejudices and biases may be embedded in the code, and the speed and scale that algorithms provide will then amplify and propagate those biases.

These digital gatekeepers can also be gamed by tech-savvy individuals, creating a vicious circle. The algorithm uses proxies to define a metric of success, and those proxies are but a poor simplification of reality. A portion of job applicants possess the skills, or have the necessary connections, to out-game the algorithm – much in the same way search engine optimization looks to place content on the first page of search results rather than focusing on the value provided to the user. The algorithm then catches up with the tactics and tools tech-savvy applicants use and tweaks its model with further abstractions that detach it even more from reality. This downward spiral imposes a cat-and-mouse dynamic that creates a divide between those who have a greater likelihood of getting a job and those who don’t, based on the achievement of some measure of success that by now is completely detached from the original goal the algorithm was designed for.

Algorithms should support positive transformation within organizations, rather than encapsulate in software whatever biases pre-existed. Transparency and accountability should extend to algorithms in the digital realm, the same way they are expected from individuals, organizations, and institutions down here below the clouds.

 

 

(1) James Williams, Stand Out of Our Light: Freedom and Resistance in the Attention Economy, Cambridge University Press, May 2018, p. 102.


 

Juanjo Cardona

Editor at ChinaHRnews.com

L: English, Spanish

T: +86 21 6010 5000

E: j.cardona@directhr.cn