Our environment plays a significant role in the prejudices we possess - our upbringing, the people we interact with, and the nature of what we learn from various media. One might assume then that Artificial Intelligence is impartial. After all, AI can bypass the unconscious biases to which human beings are so often prone. However, software needs data, and all data is inherently biased. That's ridiculous, you may protest, but it's true. Data is bound by restrictions, whether imposed on purpose or in ignorance. Thus, before we attempt to resolve discrimination or prejudice in Artificial Intelligence, it serves us well to understand the reasons behind it. So,
What makes bias inevitable?
A classic example of AI bias is the candidate screening and selection algorithm that Amazon developed some years back. Though it was initially expected to help in the company's quest for diversity and inclusivity, it did just the opposite. The main reason was that the data fed to the system came from the profiles of previously successful candidates over ten years, a significant part of which offered little variety in terms of gender, race, or any other subset. A system that learns to choose from such data inherits its bias.
Bias also nearly always exists by definition, mainly because the meanings of fairness, impartiality, and equality are themselves contested. Should you judge based on opportunity or outcome? (At this point, we're transitioning into ethical and political dilemmas that make the boundaries of our technological questions less obvious.)
Let me give an example. Say you have two chocolates and two kids in front of you. Let's make it more difficult - both of them like chocolates. One of them, however, hasn't eaten anything for almost a day. At this point, would you be biased if you were to give both chocolates to the starving child? Would it be "fair" if both of them got a chocolate apiece? Or would it be discriminatory? Now, obviously, this is an overly simple example. Start adding race, religion, sex, culture, language, nationality, residence, history, appearance, and personality, and it's a mess. Artificial intelligence should theoretically be immune to these faults, but that's not the case.
AI is almost always based on machine learning systems, which means that data analysis and categorization take priority over manual programming. Of course, the former is far more efficient, and we cannot claim that the latter is any better technologically. Manual programming also, ironically, goes against the fundamental use of AI, i.e. to reduce human effort.
How do we prevent this bias or discrimination?
There are three main stages in the mitigation or correction of AI discrimination:
1) The Training/Learning Period:
Possibly the best window of opportunity for biases to creep in, this period has to be carefully scrutinized to ensure that counterfactual fairness exists. For those who don't know what that means, it basically implies that for equivalent profiles, the outcome should remain the same even for someone of a completely different demographic or subset. For example, would an Indian female with the same level of competency and expertise as a Caucasian male have the same probability of getting hired by the same company? To understand this concept, we'll have to forgo thinking about reservations, affirmative action, and the like for the moment. When the training material is purely data, companies and developers can analyze the AI to check whether it is impartial across demographic subsets.
2) The Processing/Testing Period:
This period is the most important for understanding which subsets cause the demographic imbalance that the AI learns from. At this point, companies and developers focus on eliminating biases such as confirmation bias or funding bias, which are types of observer bias. Fundamentally, observer bias, as its name suggests, arises when the individual assigns undue significance to certain factors affecting the result. Though the effect is perhaps unintentional, the software developers or data scientists are at fault here, either because of a preconceived notion about the outcome (confirmation bias) or because an external sponsor clouds their objectivity (funding bias).
3) The Post-Processing/Evaluation Period:
This period is markedly significant for two reasons. Firstly, this is the period in which outcomes are evaluated before, during and even after release. Therefore, data scientists can understand and rectify errors in the results that stem from prejudiced software. Secondly, consumer interaction helps highlight any discrimination or marginalization that the AI exhibits.
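The counterfactual fairness idea from the training period can be approximated with a simple attribute-flip test: change only the protected attribute of a profile and see whether the decision changes. The sketch below is illustrative only. The `Candidate` fields, the two toy models, and the threshold of 8.0 are all hypothetical, and a plain flip test is a simplification of counterfactual fairness because it cannot detect bias that enters through other, correlated fields.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    # Hypothetical profile fields, chosen only for illustration.
    years_experience: int
    skill_score: float
    gender: str  # the protected attribute in this sketch


def counterfactual_violations(
    model: Callable[[Candidate], bool],
    candidates: Sequence[Candidate],
    attribute: str,
    values: Sequence[str],
) -> List[Candidate]:
    """Return the candidates whose decision changes when only `attribute` changes."""
    violations = []
    for candidate in candidates:
        # Evaluate the same profile once per possible value of the attribute.
        outcomes = {model(replace(candidate, **{attribute: v})) for v in values}
        if len(outcomes) > 1:
            violations.append(candidate)
    return violations


def biased_model(c: Candidate) -> bool:
    # Toy model that (wrongly) gives one gender a bonus.
    bonus = 1.0 if c.gender == "male" else 0.0
    return c.years_experience + c.skill_score + bonus >= 8.0


def neutral_model(c: Candidate) -> bool:
    # Toy model that ignores the protected attribute entirely.
    return c.years_experience + c.skill_score >= 8.0


candidates = [
    Candidate(5, 2.5, "female"),
    Candidate(3, 2.0, "male"),
    Candidate(6, 4.0, "female"),
]

flagged = counterfactual_violations(biased_model, candidates, "gender", ["female", "male"])
clean = counterfactual_violations(neutral_model, candidates, "gender", ["female", "male"])
print(len(flagged), len(clean))
```

Only borderline profiles are flagged by the biased toy model, which is why such a test is best run over many profiles rather than a handful.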
It's easy to think that getting rid of the problematic categories that induce bias is the best solution. For instance, let us suppose that the gender option gets removed from a company's job selection process. There should be no way that gender can influence the outcome now, should there? Unfortunately, that's not the case. If the AI is capable enough to be used in recruitment, it can predict gender accurately from other clues within the dataset; the gender field is merely the explicit statement of what those clues already imply. As in the case of the Amazon example and criminal justice systems, the training material has to be taken care of to prevent bias. There are also a few other ways to deal with AI discrimination:
1) A balanced or significantly varied training dataset is quite helpful in mitigating the risk of any pre-existing bias.
2) Cross-validation, in the sense of independent review by someone who is either not directly involved with the software development or not affected by the outcome, is a powerful tool for ensuring that lapses of judgment by data scientists, or observer bias, do not take place.
3) We can analyze the accuracy of outcomes for different demographic groups to see if any subset undergoes unfair treatment.
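The third point, comparing outcomes across demographic groups, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `records` below are made-up data, the group labels "A" and "B" are placeholders, and `per_group_report` is a hypothetical helper rather than part of any library. It reports accuracy and selection rate per group so that a gap between subsets becomes visible.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_group_report(
    records: Iterable[Tuple[str, bool, bool]],
) -> Dict[str, Dict[str, float]]:
    """Compute accuracy and selection rate for each group.

    Each record is (group, predicted, actual), where `predicted` is the
    model's decision and `actual` is the outcome it should have produced.
    """
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "selected": 0})
    for group, predicted, actual in records:
        s = stats[group]
        s["n"] += 1
        s["correct"] += int(predicted == actual)
        s["selected"] += int(predicted)
    return {
        group: {
            "accuracy": s["correct"] / s["n"],
            "selection_rate": s["selected"] / s["n"],
        }
        for group, s in stats.items()
    }


# Made-up evaluation data for two placeholder groups.
records = [
    ("A", True, True), ("A", True, False), ("A", False, False), ("A", True, True),
    ("B", False, True), ("B", False, False), ("B", True, True), ("B", False, True),
]

report = per_group_report(records)
rates = [r["selection_rate"] for r in report.values()]
# Ratio of the lowest to the highest selection rate; 1.0 means equal rates.
selection_ratio = min(rates) / max(rates)

for group, metrics in report.items():
    print(group, metrics)
print("selection ratio:", round(selection_ratio, 2))
```

A low ratio or a large accuracy gap does not prove discrimination on its own, but it marks the subsets that deserve a closer look.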
Finally, fairness itself needs to be understood, and that is something beyond our ethical capacity to resolve, at least right now. As we rest the thought processes and judgments that were for so long ours alone on new shoulders, it's easy to be frightened of the path ahead. But the future need not be fully figured out, lest we lose our amazement in exploration.