Artificial Intelligence (AI) increasingly influences people’s opinions and behavior in daily life. Gender bias in AI, especially in Machine Translation, has been a growing concern, since the over-representation of men in the design of these technologies might gradually undermine decades of progress toward gender equality. Google Translate, a widely used translation tool, has shown a strong tendency towards male defaults when translating between gender-neutral languages and English, in particular for fields typically associated with an unbalanced gender distribution or with stereotypes, such as STEM (Science, Technology, Engineering and Mathematics) jobs, as well as for personality traits and looks.
The main goal of my project is to investigate gender bias in state-of-the-art models for translating between a language with gender-neutral pronouns (Vietnamese) and English. I am developing a collection of test sentences to probe a translation model for gender bias. First, I will use this test set to evaluate Google Translate and a pretrained machine translation model (Helsinki-NLP). Then, I will use a state-of-the-art neural network approach to train my own English-Vietnamese translation model (PhoBERT) on a standard translation dataset, and I will evaluate this model using my test sentences. I will assess the gender bias in the translation dataset and apply a bias mitigation technique: for each sentence containing gendered words, a copy is added in which the gender has been swapped from male to female and vice versa. I will then retrain my model and evaluate it again. Finally, I will compare Google Translate, Helsinki-NLP, and the two versions of the PhoBERT model (before and after swapping gendered words) to see whether gender bias appears in the English-Vietnamese translation task.
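The gender-swapping augmentation described above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the word list `SWAP_PAIRS`, the one-way pronoun mappings, and the function names `swap_gender` and `augment` are assumptions made for illustration, and the sketch only handles the English side of a corpus. In a parallel dataset, the Vietnamese side of each pair would need a corresponding treatment.

```python
import re

# Hypothetical, deliberately small list of gendered word pairs.
# A real list would be much larger and curated by hand.
SWAP_PAIRS = [
    ("he", "she"), ("himself", "herself"), ("man", "woman"),
    ("men", "women"), ("father", "mother"), ("boy", "girl"),
    ("husband", "wife"),
]

# Build a lookup that works in both directions.
SWAP = {}
for male, female in SWAP_PAIRS:
    SWAP[male] = female
    SWAP[female] = male

# Possessive/object pronouns are ambiguous in English ("her" can mean
# "him" or "his"), so these one-way mappings are a simplification.
SWAP.update({"his": "her", "him": "her", "her": "his"})

# Match whole words only, regardless of case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SWAP) + r")\b",
    re.IGNORECASE,
)


def swap_gender(sentence):
    """Return the sentence with every listed gendered word swapped."""
    def replace(match):
        word = match.group(0)
        swapped = SWAP[word.lower()]
        # Keep an initial capital so sentence-initial words stay capitalized.
        return swapped.capitalize() if word[0].isupper() else swapped
    return PATTERN.sub(replace, sentence)


def augment(sentences):
    """Keep every sentence and add a swapped copy where the swap changes it."""
    augmented = []
    for sentence in sentences:
        augmented.append(sentence)
        swapped = swap_gender(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented
```

For example, `augment(["He is a nurse.", "The sky is blue."])` keeps both original sentences and adds "She is a nurse." after the first one. Sentences without any listed gendered word are left without a copy, so the dataset grows only where a swap is possible.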
This project aims to demonstrate the need to augment current statistical translation tools with debiasing techniques. There is also a need to look further into using a bigger dataset with fewer stereotypes, which can be hard to achieve since a language dataset always reflects its country’s social context.