Mikaela Pisani - How Women Need To Be Involved In Data Science To Prevent Bias In Algorithms

Automatic Summary

Understanding Bias in Data Science and the Need for Greater Diversity

Welcome! In today's post, we delve into the significant issue of bias in data science and explore the need for greater gender diversity on data science teams. Our guide is Mikaela Pisani, head of the data science team at Rootstrap, a Uruguay-based custom software development agency. Mikaela is also the Managing Director of Girls in Tech Uruguay, a non-profit organization that aims to nurture and encourage women in the tech field in Uruguay.

Data Science, Machine Learning, and Bias

Data science is an amalgamation of various skills. Understanding the business and the problem at hand is as crucial as knowing how to interpret the outputs of different algorithms. Within data science, machine learning enables us to learn patterns from data and apply that knowledge to make predictions, detect opportunities or anomalies, or identify groups.
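To make the "learning patterns from data to predict" idea concrete, here is a minimal sketch of one of the simplest possible learners, a nearest-centroid classifier, written in plain Python. All data, labels, and names are invented for illustration; real projects would use a library such as scikit-learn.

```python
def train_centroids(samples, labels):
    """Learn a pattern from labeled data: the mean feature vector
    (centroid) of each label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Predict by assigning x to the label whose centroid is closest
    (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Toy example: two clusters of 2-D points with made-up labels.
X = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [4.8, 5.2]]
y = ["low", "low", "high", "high"]
model = train_centroids(X, y)
print(predict(model, [1.1, 1.0]))  # a point near the "low" cluster
```

The same structure (fit on labeled data, then predict on new points) underlies far more sophisticated models, and, crucially, any bias in `X` and `y` is what the model will faithfully learn.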

This process, however, can encounter bias. Bias here refers to errors in the assumptions made about the data. It's vital to note that both the data and the humans who develop the algorithms can be biased. Although removing bias entirely is impracticable, it is feasible to reduce it through careful consideration.

Recognizing Bias in Data Science

  • Algorithm bias: This is a type of bias that can appear in relation to specific data groups, often minority groups.
  • Gender and race bias: These biases can manifest in different forms. Examples can be seen in Google Translate's gendered translations and in OpenAI's GPT-3 language model.

Clearly, managing bias in data science matters because the applications of this technology govern decisions in several sectors, from hiring and criminal justice to healthcare.

Factors Influencing Bias

The two main bias-inducing culprits are biased data and human bias. Data might be incomplete, possess missing information, or be imbalanced, causing one population group to be overrepresented at the expense of another group. Similarly, human bias influences the way data is chosen, algorithms are developed, and the interpretation of algorithm outcomes.
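A quick way to surface the imbalance problem described above is to count how each group is represented before training anything. The sketch below uses a hypothetical gender column and an arbitrary 80% threshold (both are assumptions made for this example) to flag any group that dominates the dataset.

```python
from collections import Counter

def imbalance_report(labels, threshold=0.8):
    """Report each group's share of the dataset and flag groups whose
    share exceeds `threshold` (an arbitrary cut-off for this sketch)."""
    counts = Counter(labels)
    total = len(labels)
    shares = {g: n / total for g, n in counts.items()}
    dominant = [g for g, s in shares.items() if s >= threshold]
    return shares, dominant

# Hypothetical dataset where one gender is heavily overrepresented.
genders = ["male"] * 90 + ["female"] * 10
shares, dominant = imbalance_report(genders)
print(shares)    # {'male': 0.9, 'female': 0.1}
print(dominant)  # ['male']
```

A report like this is only a first step: knowing a group is underrepresented still leaves the harder work of collecting more data or rebalancing the training set.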

Different types of cognitive bias that individuals must be aware of include:

  1. Availability bias: We tend to overestimate things that spring readily to mind.
  2. Anchoring and adjustment: We often benchmark and compare to a reference point.
  3. Representativeness: We rely on stereotypes.

Women are underrepresented both in data and on data science teams: they make up only 22% of the field and only 13% of research-paper participants, accentuating the gender bias.

Reducing Bias

Although there is no definitive blueprint for mitigating bias, several strategies can be pieced together:

  • Understand and validate the data source.
  • Get more comprehensive data and balance your datasets.
  • Be aware of personal biases within your team.
  • Analyze your results based on categories and test with real people.
  • Create diverse teams to reduce potential bias.
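The per-category analysis suggested above can be sketched as a small helper that breaks accuracy down by subgroup. The labels, predictions, and group names below are invented, purely to show how a model that looks acceptable overall can fail completely for a minority group.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Break accuracy down by subgroup, so a model that is accurate
    overall but poor on a minority group is not missed."""
    per_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        correct, total = per_group.get(g, (0, 0))
        per_group[g] = (correct + (t == p), total + 1)
    return {g: c / n for g, (c, n) in per_group.items()}

# Hypothetical results: 80% accurate overall, but 0% for group "B".
y_true = ["yes"] * 8 + ["no"] * 2
y_pred = ["yes"] * 10
groups = ["A"] * 8 + ["B"] * 2
print(accuracy_by_group(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.0}
```

Reporting only the aggregate 80% would hide the fact that every member of group "B" is misclassified, which is exactly the failure mode the checklist warns about.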

Diversity in teams is tremendously beneficial: it paves the way to more innovative, higher-performing, and more robust solutions.

By encouraging and integrating more women into data science teams, we can reduce bias, develop better solutions, and create a more inclusive tech community.

Feel welcome to ask questions and explore more about data science and bias. Remember, technology needs you just as you are! Let's collaborate to create more inclusive code and more successful solutions.

Thank you for your attention. Stay tuned for more enlightening discussions!


Video Transcription

Welcome, everybody. I am Mikaela Pisani, from Uruguay. I work at Rootstrap as the head of the data science team. Rootstrap is a custom software development agency that combines creativity, technical expertise, and process-driven development to build products. I am also Managing Director of Girls in Tech Uruguay, a non-profit organization focused on encouraging women in tech in Uruguay; we work with children and teenagers. Today I'm here to talk about bias in data science and why we need more women on data science teams to prevent bias. Nowadays, more and more, we are trusting algorithms to automate our decisions, and that is something we need to be careful about because of the bias in those algorithms. Have you ever wondered whether these decisions might affect a certain group of people? Have you ever wondered about the errors an algorithm makes? During this talk, we will look at that. Let's take an example: a recent case in Google Translate, where we translate a paragraph from a language that doesn't have gender-specific pronouns.

We translate it to English, and the translator does its best to guess the associated gender. Here we can see how confusing the result can be. This doesn't mean that Google is wrong; I don't think so. The issue is that Google is doing its best to guess the gender based on probability: what is more probable? And the data says that, historically, men were more frequently the ones who studied and were described as clever, while women were more often associated with cleaning the house and taking care of the children. It's the historical data that shows that. So in this talk we will define what data science is, what bias in data science is, and what factors influence bias. Then, knowing those factors, what can we do to avoid bias, and especially, why can team diversity reduce it? So, what is data science? Data science is a mix of skills. It's not only about algorithms: you need to understand the business and the problem you are trying to solve, and you need statistical skills to understand the algorithms and know how to configure them.

Then you need analytical skills to be able to communicate and analyze the outputs of the algorithms. And what is machine learning? In data science, we use machine learning to learn patterns from the data that let us predict, detect opportunities or anomalies in the data, or find groups. So now that we know what data science and machine learning are, what is bias? Bias is about errors in the assumptions we make about the data. The algorithms learn from data; the data can be biased, but the humans who develop the algorithms can also be biased. Bias is something we cannot completely remove, but we can take certain considerations into account to reduce it. So what is algorithm bias? It's a term that refers to errors in algorithms related to certain groups in the data, generally minority groups. That bias is present because minority groups are the ones not well represented in the data. Bias can take different forms, like gender bias, race bias, et cetera.

Now we are going to look at different examples to make sense of this issue and how harmful it can be to certain groups of people. In this example, again in Google Translate, I invite you to open Google Translate and copy this phrase. We are going to translate from English to Bengali, which doesn't have gender-specific pronouns in the language. If we translate it back to English, we can see that the gender has changed. So what is happening here? The translator is trying its best to guess the gender, as we said before; the historical data says it's more likely that a woman is the one who cooks and a man is the one who builds the house. Google is fighting against bias and trying to solve this problem. In this example we can see how Google resolves it for one sentence: when it needs to guess the gender, instead of picking only the most likely option as in the previous example, it provides both options, feminine and masculine, for the translation of the sentence.

Another example of bias is in GPT-3, a language model that generates human-like text. In its paper, OpenAI says that this model has issues related to bias, and they provide this example: the model was asked to complete the sentence "He/She was very...", and we can see the most frequent adjectives in the output. For women, the adjectives were more related to "beautiful" or "gorgeous", while for men they were more diverse. Another bias this model presents is shown in this figure: the model was asked to complete a sentence about Muslims, and we can see how harmful bias can be. Bias is not only present in text; it's also present in images. In this example, we have two big companies with bias in images: the first was Twitter's image-cropping algorithm, which was biased against Black people, and Google Photos had bias against a certain group of people in its image-tagging algorithm. So why is it important to take this into account? Because nowadays we are making decisions with algorithms, and this can be harmful for certain groups of people. Machines are deciding whether or not you are a candidate for a job offer.

They decide whether you are likely to commit another crime or not in a trial, or, in healthcare, whether you receive a medical treatment or not. These are real examples where bias appears. Now that we know what bias is and why it's important to consider it, let's look at the factors that generate bias. First of all, the data is biased. We only train our model with a set of data, and this data might be incomplete and not reflect the whole reality; it might have missing information, or it might be imbalanced. Imbalanced means that we have more information for one group of people than for another; for gender, for example, we may have more information about men than about women. Then there is human bias: the humans who choose the data, develop the algorithms, and draw conclusions from the outputs are also biased. We need to take this into account when we are the ones developing these algorithms or choosing the people who will develop them. There are different types of cognitive bias; these are the most common.

Availability bias is about the fact that we tend to overestimate what comes readily to mind. Anchoring and adjustment means we always compare to a reference point. And representativeness is about stereotypes: we already have images in our head about things. That's OK, it's how our brain works, but we need to be aware of it. For example, if I ask you to think about a programmer, you will probably picture a man behind a computer with glasses. That is our stereotype of a programmer. It doesn't mean it's wrong; what is wrong is to think that all programmers look like that. The issue is when the stereotype becomes the single story to tell. So, women are underrepresented in data and also on data science teams, so we are missing their point of view; we are missing half the population's point of view. Let's look at some numbers. In the field, only 22% are women; only 13% of the researchers who participate in papers are women, and 18% of those who participate in conferences. These numbers confirm that we really are very few women in the field. So what about just removing gender from the data? Wouldn't that be the solution? Too simple, and unfortunately it doesn't work like that. Gender is present in all the other attributes; it is hidden in the data.
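The point that gender stays "hidden" in other attributes can be illustrated with a toy correlation check: even after the gender column is dropped, a proxy feature can still track it closely. The feature (hours of unpaid care work), the encoding, and all numbers below are made up purely for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical records: the gender column is dropped before training,
# but a remaining feature still correlates almost perfectly with it.
gender = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = woman, 0 = man (toy encoding)
care_hours = [30, 28, 25, 27, 5, 6, 4, 7]  # invented proxy feature

print(round(pearson(gender, care_hours), 2))  # 0.99
```

With a correlation this strong, a model trained on `care_hours` can effectively reconstruct gender, which is why deleting the column alone does not remove the bias.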

So the algorithm might learn certain patterns in the data related to gender, and removing the gender column doesn't guarantee that our algorithms won't suffer from bias. So how can we reduce bias? There isn't any rule to follow, but there are some things we can take into account. First of all, understanding the provenance of the data is very important, so validate the source of the data. Then get more data, to have a more complete set, and balance your data: make sure that you have enough information for each category. Be aware that you are biased and your team is biased, so validate your assumptions against the data; don't assume something because it seems obvious, just go to the data and validate it. And analyze your results.

Look at the output of the algorithm not only as an overall error but also per category. Don't be satisfied when you get good accuracy or a good error rate on your test data; instead, go outside and test with real people. Be careful how you present your conclusions, and use diverse teams to reduce bias. So why diverse teams? Diversity in teams, in any area, is beneficial because people can collaborate, bring different points of view, and complement each other. This leads to higher performance, more innovative solutions, and more robust solutions.

With diversity we can reduce our stereotypes; if we don't have diversity in teams, we are missing opportunities. So it's also good to have more diverse teams. And why in data science? Because data science is in every area; in every field there is more and more data. So whether you are in tech or not, whether you are a woman or not, it's very important to take bias in data into account so you can create more successful solutions. Our mission is to create more inclusive code, so who codes matters. Getting more women onto the teams will reduce bias. Tech needs you just as you are. I invite you to get interested in what data science and bias mean. I will leave some recommended talks about this issue in my slides, and I am open to any questions. Thank you very much for listening.