From music suggestions to help with medical diagnoses, population surveillance, university selection and professional recruitment, algorithms are everywhere, and transform our everyday lives. Sometimes, they lead us astray. At fault are the statistical, economic and cognitive biases inherent to the very nature of the current algorithms, which are supplied with massive data that may be incomplete or incorrect. However, there are solutions for reducing and correcting these biases. Stéphan Clémençon and David Bounie, Télécom ParisTech researchers in machine learning and economics, respectively, recently published a report on the current approaches and those which are under exploration.
Ethics and equity in algorithms are increasingly important issues for the scientific community. Algorithms are supplied with the data we give them, including texts, images, videos and sounds, and they learn from these data through training. Their decisions are therefore based on subjective criteria: ours, and those of the data supplied. Some biases can thus be learned and accentuated by automated learning. This results in the algorithm deviating from what should be a neutral result, leading to potential discrimination based on origin, gender, age, financial situation, etc. In their report “Algorithms: bias, discrimination and equity”, a cross-disciplinary team[1] of researchers at Télécom ParisTech and the University of Paris Nanterre investigated these biases. They asked the following basic questions: Why are algorithms likely to be distorted? Can these biases be avoided? If yes, how can we minimize them?
The authors of the report are categorical: algorithms are not neutral. On the one hand, they are designed by humans. On the other hand, “these biases partly occur because the learning data lacks representativity” explains David Bounie, researcher in economics at Télécom ParisTech and co-author of the report. For example, the recruitment algorithm of the giant Amazon was heavily criticized in 2015 for having discriminated against female applicants. At fault was an imbalance in the pre-existing historical data. The people recruited in the previous ten years were primarily men. The algorithm had therefore been trained on a gender-biased learning corpus. As the saying goes, “garbage in, garbage out”. In other words, if the input data is of poor quality, the output will be poor too.
Stéphan Clémençon is a researcher in machine learning at Télécom ParisTech and co-author of the report. For him, “this is one of the growing accusations made against artificial intelligence: the absence of control over the data acquisition process.” For the researchers, one way of introducing equity into algorithms is to contradict them. An analogy can be drawn with surveys: “In surveys, we ensure that the data are representative by using a controlled sample based on the known distribution of the general population” says Stéphan Clémençon.
Using statistics to make up for missing data
From employability to criminality or solvency, learning algorithms have a growing impact on decisions and human lives. These biases could be overcome by calculating the probability that an individual with certain characteristics is included in the sample. “We essentially need to understand why some groups of people are under-represented in the database” the researchers explain. Coming back to the example of Amazon, the algorithm favored applications from men because the people recruited over the last ten years were primarily men. This bias could have been avoided by realizing that the likelihood of finding a woman in the data sample used was significantly lower than the proportion of women in the population.
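The reweighting idea described above can be sketched in a few lines. This is only an illustration, not the method of the report: the 80/20 hiring history, the 50/50 population shares and the function names are all assumptions made up for the example. Each record is weighted by the ratio between its group's share in the population and its share in the sample, so that under-represented groups count for more.

```python
from collections import Counter

def inclusion_weights(sample_groups, population_shares):
    """Weight each record by (population share / sample share) of its group."""
    n = len(sample_groups)
    sample_shares = {g: c / n for g, c in Counter(sample_groups).items()}
    return [population_shares[g] / sample_shares[g] for g in sample_groups]

def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical hiring history: 80 men and 20 women (illustrative numbers).
groups = ["M"] * 80 + ["F"] * 20
is_woman = [0] * 80 + [1] * 20

# Assumed reference distribution in the population of interest.
population_shares = {"M": 0.5, "F": 0.5}

weights = inclusion_weights(groups, population_shares)
raw_share = sum(is_woman) / len(is_woman)            # share of women as recorded
corrected_share = weighted_mean(is_woman, weights)   # share after reweighting
```

With these made-up figures, women make up a fifth of the raw sample but half of the reweighted one. The same weights could in principle be passed to a learning procedure that accepts per-record weights.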
“When this probability is not known, we need to be able to explain why an individual is in the database or not, according to additional characteristics” adds Stéphan Clémençon. For example, when assessing banking risk, algorithms use data on the people eligible for a loan at a particular bank to determine the borrower’s risk category. These algorithms do not look at applications by people who were refused a loan, who have not needed to borrow money or who obtained a loan from another bank. In particular, young people under 35 years old are systematically assessed as carrying a higher level of risk than their elders. Identifying these associated criteria would make it possible to correct the biases.
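When the probability of being in the database is unknown, one possible approach is to estimate it from an additional characteristic and then weight each observed record by the inverse of that estimate. The sketch below applies this to a made-up loan example. The age bands, the counts, and the assumption that the bank also keeps a list of everyone who applied are hypothetical, and the correction only holds if, within an age band, being accepted is unrelated to the risk of default.

```python
def estimate_propensity(applicants):
    """Estimate P(in database | age band) as accepted / applied in each band."""
    applied, accepted = {}, {}
    for band, in_database, _ in applicants:
        applied[band] = applied.get(band, 0) + 1
        accepted[band] = accepted.get(band, 0) + (1 if in_database else 0)
    return {band: accepted[band] / applied[band] for band in applied}

def naive_rate(applicants):
    """Default rate computed only on the people present in the database."""
    outcomes = [d for _, in_database, d in applicants if in_database]
    return sum(outcomes) / len(outcomes)

def ipw_rate(applicants, propensity):
    """Default rate where each observed record is weighted by 1 / propensity."""
    num = den = 0.0
    for band, in_database, defaulted in applicants:
        if not in_database:
            continue  # outcome unknown for people who never got the loan
        weight = 1.0 / propensity[band]
        num += weight * defaulted
        den += weight
    return num / den

# Hypothetical records: (age band, in database?, defaulted? or None if unknown).
applicants = (
    [("under_35", True, 1)] * 4 + [("under_35", True, 0)] * 36
    + [("under_35", False, None)] * 60
    + [("35_plus", True, 1)] * 4 + [("35_plus", True, 0)] * 76
    + [("35_plus", False, None)] * 20
)

propensity = estimate_propensity(applicants)
naive = naive_rate(applicants)
corrected = ipw_rate(applicants, propensity)
```

In this toy data set, under-35s are accepted half as often as their elders, so the raw database under-represents them; the weighted estimate gives them back their share of the applicant pool.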
Controlling data also means looking at what researchers call “time drift”. By analyzing data over very short periods of time, an algorithm may not account for certain characteristics of the phenomenon being studied. It may also miss long-term trends. By limiting the duration of the study, it will not pick up on seasonal effects or breaks. However, some data must be analyzed on the fly as they are collected. In this case, when the time scale cannot be extended, it is essential to integrate equations describing potential developments in the phenomena analyzed, to compensate for the lack of data.
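A small simulation shows how a short observation window can mislead. The series below is entirely synthetic: a constant level of 100 plus a twelve-month seasonal cycle, with the amplitude and function names chosen only for illustration. An average over the last three months captures one phase of the season, while an average over a full cycle recovers the underlying level.

```python
import math

def monthly_series(n_months, base=100.0, amplitude=20.0, period=12):
    """Synthetic monthly values: a constant level plus a seasonal sine wave."""
    return [
        base + amplitude * math.sin(2 * math.pi * month / period)
        for month in range(n_months)
    ]

def window_mean(series, window):
    """Average of the most recent `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

series = monthly_series(36)                  # three years of monthly data
short_mean = window_mean(series, 3)          # last quarter only
full_cycle_mean = window_mean(series, 12)    # one complete seasonal cycle
```

Here the three-month average falls well below the true level because it only sees the seasonal trough. When the window cannot be extended, a model of the seasonal pattern would have to supply what the data cannot.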
The difficult issue of equity in algorithms
Other than the possibility of using statistics, researchers are also looking at developing algorithmic equity. This means developing algorithms which meet equity criteria according to attributes protected under law such as ethnicity, gender or sexual orientation. As with statistical solutions, this means integrating constraints into the learning program. For example, it is possible to require that the probability of a particular algorithmic result be equal for all individuals belonging to a particular group. It is also possible to integrate independence between the result and a type of data, such as gender, income level, geographical location, etc.
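One common way to make the independence criterion concrete is to compare the rate of favorable decisions across groups, often called demographic parity. The sketch below only measures the gap; it does not enforce the constraint during learning. The decisions, group labels and function names are invented for the example.

```python
def positive_rate_by_group(decisions, groups):
    """Share of favorable decisions (1) within each group."""
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + decision
    return {group: positives[group] / totals[group] for group in totals}

def demographic_parity_gap(decisions, groups):
    """Largest difference in favorable-decision rates between any two groups."""
    rates = positive_rate_by_group(decisions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical outputs of a decision algorithm for two groups of four people.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = positive_rate_by_group(decisions, groups)
gap = demographic_parity_gap(decisions, groups)
```

A gap of zero would mean that the decision is statistically independent of group membership in this sample. A constrained learning program would try to keep this gap below a chosen tolerance while remaining as accurate as possible.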
But which equity rules should be adopted? For the controversial Parcoursup algorithm for higher education applications, several incompatibilities were raised. “Take the example of individual equity and group equity. If we consider only the criterion of individual equity, each student should have an equal chance at success. But this is incompatible with the criterion of group equity, which stipulates that admission rates should be equal for certain protected attributes, such as gender” says David Bounie. In other words, we cannot give an equal chance to all individuals regardless of their gender and, at the same time, apply criteria of gender equity. This example illustrates a concept familiar to researchers: the rules of equity contradict each other and are not universal. They depend on ethical and political values that are specific to individuals and societies.
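The tension between the two criteria can be seen on a toy example. The scores, the threshold and the function names below are hypothetical, and reading individual equity as "one rule applied to everyone" is only one possible interpretation. Applying a single threshold gives unequal admission rates between groups, while equalizing admission rates requires a different cut-off per group.

```python
def admit_with_threshold(scores, threshold):
    """Same rule for everyone: admit if the score reaches the threshold."""
    return [score >= threshold for score in scores]

def admit_top_fraction(scores, fraction):
    """Admit the top `fraction` of a group; returns (decisions, cut-off used)."""
    k = round(len(scores) * fraction)
    if k == 0:
        return [False] * len(scores), float("inf")
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [score >= cutoff for score in scores], cutoff

def rate(decisions):
    """Share of admitted candidates."""
    return sum(decisions) / len(decisions)

# Hypothetical scores for two groups of applicants (no tied scores).
scores_a = [90, 80, 70, 60]
scores_b = [85, 65, 55, 45]

# One threshold for all: admission rates differ between the groups.
single_a = admit_with_threshold(scores_a, 65)
single_b = admit_with_threshold(scores_b, 65)

# Equal admission rate per group: the cut-offs differ between the groups.
equal_a, cutoff_a = admit_top_fraction(scores_a, 0.5)
equal_b, cutoff_b = admit_top_fraction(scores_b, 0.5)
```

With the single threshold, three quarters of group A are admitted against half of group B. With equal rates, the cut-off is 80 for group A and 65 for group B, so a group A applicant scoring 70 is refused while a group B applicant scoring 65 is admitted. Neither rule satisfies both criteria at once on these numbers.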
There are complex, considerable challenges facing social acceptance of algorithms and AI. But it is essential to be able to look back through the algorithm’s decision chain in order to explain its results. “While this is perhaps not so important for film or music recommendations, it is an entirely different story for biometrics or medicine. Medical experts must be able to understand the results of an algorithm and refute them where necessary” says David Bounie. This has raised hopes of transparency in recent years, but is no more than wishful thinking. “The idea is to make algorithms public or restrict them in order to audit them for any potential difficulties” the researchers explain. However, these recommendations are likely to come up against trade secret and personal data ownership laws. Algorithms, like their data sets, remain fairly inaccessible. However, the need for transparency is fundamentally linked with that of responsibility. Algorithms amplify the biases that already exist in our societies. New approaches are required in order to track, identify and moderate them.
[1] The report (in French) Algorithms: bias, discrimination and equity was written by Patrice Bertail (University of Paris Nanterre), David Bounie, Stéphan Clémençon and Patrick Waelbroeck (Télécom ParisTech), with the support of Fondation Abeona.
Article written for I’MTech by Anne-Sophie Boutaud