User:PAC2/Gender diversity
Gender diversity in Wikipedia articles is a project to measure gender diversity within Wikipedia articles.
Gender bias is a major challenge for the Wikipedia community. Humaniki provides a global overview of the gender imbalance in Wikipedia projects[1]. The share of biographies by gender is a very useful statistic, but we need to go further: what if women are simply not cited in a general article?
My idea is very simple: take the list of articles linked from an article (its blue links), keep those that are about people, get each person's gender from sex or gender (P21), and compute gender statistics for the whole article.
Interpretation may be difficult, since no one knows what the fair share of women in an article is. However, when the share of women is very low, we may have forgotten some women, and it is worth looking at the article to see whether the gender imbalance can be reduced.
Contributions are welcome. Leave a message on the talk page if you want to suggest any improvement.

Insights

- Gender diversity in academic disciplines in Wikipedia in French
- Gender diversity in academic disciplines in fr.wikipedia.org (September 2021) with visualisation and analysis
- Gender diversity in professions in Wikipedia in French
- Gender diversity in academic disciplines in Wikipedia in English
- Gender diversity in Wikipedia articles: Focus on social sciences in Wikipedia in English
- Gender diversity in Wikipedia articles: Focus on computer sciences in Wikipedia in English
Tools
- JavaScript tool to measure gender diversity in a Wikipedia article
- Template:Gender diversity index: a Wikimedia template on fr.wikipedia.org.
- Chouette: a script for the mobile interface of fr.wikipedia.org which adds a link to the gender diversity query at the bottom of the page.
- Rock your sidebox: a script for the desktop interface of fr.wikipedia.org which adds a link to the gender diversity query in the side panel.
Discussions
- January 2021: Mesurer la diversité de genre (Measuring gender diversity), Les sans pagEs
- December 2021: Gender diversity in Wikipedia articles: another way to look at gender bias, Women in Red
- December 26, 2021: Mesurer la diversité de genre au sein des articles Wikipédia (Measuring gender diversity within Wikipedia articles)
- May 2022: Measuring gender diversity at the article level, The Signpost
Methodology
Measuring gender diversity using the Wikimedia API and SPARQL queries
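In this section we explore several SPARQL queries that measure gender diversity. Each query calls the MediaWiki API from within SPARQL to list the pages linked from an article.
These queries can be run directly in a Jupyter notebook using PAWS.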
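As an illustration of running such a query from a notebook, here is a minimal Python sketch that uses only the standard library. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql; the function names and the User-Agent string are hypothetical and are not part of the original project.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (an assumption of this sketch).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical User-Agent; replace it with your own contact details.
            "User-Agent": "gender-diversity-sketch/0.1 (example)",
        },
    )


def parse_bindings(payload: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query` with the text of any query from this section should return one dictionary per result row, with every value as a string.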
List of all links with gender
SELECT ?item ?itemLabel ?gender ?genderLabel
WHERE {
  # Ask the MediaWiki API for every page linked from the article
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" ;
                    wikibase:api "Generator" ;
                    mwapi:generator "links" ;
                    mwapi:titles "Economics" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  FILTER(BOUND(?item))
  # Keep only humans (Q5) that have a sex or gender (P21) statement
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 ?gender .
  ?item rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "en")
  ?gender rdfs:label ?genderLabel .
  FILTER(LANG(?genderLabel) = "en")
}
ORDER BY ?gender
Simple count
SELECT ?gender ?genderLabel (COUNT(*) AS ?count)
WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" ;
                    wikibase:api "Generator" ;
                    mwapi:generator "links" ;
                    mwapi:titles "Economics" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  FILTER(BOUND(?item))
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 ?gender .
  ?gender rdfs:label ?genderLabel .
  FILTER(LANG(?genderLabel) = "en")
}
GROUP BY ?gender ?genderLabel
ORDER BY DESC(?count)
Shares by gender
This query computes the share of women, men, intersex people and non-binary people among the people linked from the article "Economics". I group "transgender male" with "male" and "transgender female" with "female".
Caveat: the ROUND function rounds to the nearest integer, not to a decimal place.
SELECT
  (SUM(?female) AS ?count_females)
  (SUM(?male) AS ?count_males)
  (SUM(?nonbinary) AS ?count_nonbinary)
  (SUM(?intersexual) AS ?count_intersexual)
  (COUNT(*) AS ?count)
  (ROUND(100 * ?count_females / ?count) AS ?share_females)
  (ROUND(100 * ?count_males / ?count) AS ?share_males)
  (ROUND(100 * ?count_nonbinary / ?count) AS ?share_nonbinary)
  (ROUND(100 * ?count_intersexual / ?count) AS ?share_intersexual)
WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" ;
                    wikibase:api "Generator" ;
                    mwapi:generator "links" ;
                    mwapi:titles "Economics" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  FILTER(BOUND(?item))
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 ?gender .
  BIND(IF(?gender IN (wd:Q6581097, wd:Q2449503), 1, 0) AS ?male)
  BIND(IF(?gender IN (wd:Q6581072, wd:Q1052281), 1, 0) AS ?female)
  BIND(IF(?gender = wd:Q48270, 1, 0) AS ?nonbinary)
  BIND(IF(?gender = wd:Q1097630, 1, 0) AS ?intersexual)
}
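Because ROUND in the query above only returns whole percentages, one possible workaround is to compute the shares on the client side. The following Python sketch reuses the QID grouping from the query and keeps one decimal place; the function and variable names are hypothetical, and the input is assumed to be a list of bare QIDs taken from the ?gender column.

```python
from collections import Counter

# QIDs copied from the query above; transgender men and women are grouped
# with "male" and "female", as the query does.
CATEGORY_BY_QID = {
    "Q6581097": "male",
    "Q2449503": "male",
    "Q6581072": "female",
    "Q1052281": "female",
    "Q48270": "nonbinary",
    "Q1097630": "intersex",
}


def gender_shares(gender_qids, ndigits=1):
    """Return the percentage of each category among the given P21 values.

    Values outside the mapping are counted as "other", so they stay in the
    denominator just as COUNT(*) keeps them in the query.
    """
    counts = Counter(CATEGORY_BY_QID.get(qid, "other") for qid in gender_qids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, ndigits) for cat, n in counts.items()}
```

The `ndigits` parameter controls the precision, so `ndigits=0` gives whole percentages comparable to those of the query.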
Comparing different articles
# This query takes a list of Wikipedia articles, looks up the gender of every person linked from each article,
# and computes the share of men, women, non-binary and intersex people.
# The goal is to measure gender diversity inside Wikipedia articles.
# Feedback and comments are welcome on my talk page, User:PAC2.
SELECT ?article
  (SUM(?female) AS ?count_females)
  (SUM(?male) AS ?count_males)
  (SUM(?nonbinary) AS ?count_nonbinary)
  (SUM(?intersexual) AS ?count_intersexual)
  (COUNT(*) AS ?count)
  (ROUND(100 * ?count_females / ?count) AS ?share_females)
  (ROUND(100 * ?count_males / ?count) AS ?share_males)
  (ROUND(100 * ?count_nonbinary / ?count) AS ?share_nonbinary)
  (ROUND(100 * ?count_intersexual / ?count) AS ?share_intersexual)
WHERE {
  VALUES ?article {
    "Anthropology"
    "Philosophy"
    "Economics"
    "Sociology"
    "Demography"
  }
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" ;
                    wikibase:api "Generator" ;
                    mwapi:generator "links" ;
                    mwapi:titles ?article .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  FILTER(BOUND(?item))
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 ?gender .
  BIND(IF(?gender IN (wd:Q6581097, wd:Q2449503), 1, 0) AS ?male)
  BIND(IF(?gender IN (wd:Q6581072, wd:Q1052281), 1, 0) AS ?female)
  BIND(IF(?gender = wd:Q48270, 1, 0) AS ?nonbinary)
  BIND(IF(?gender = wd:Q1097630, 1, 0) AS ?intersexual)
}
GROUP BY ?article
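To compare a different set of articles, only the VALUES block of the query above needs to change. As a rough sketch, the block could be generated from a Python list of titles; the helper name `values_clause` is hypothetical, and the escaping shown covers only backslashes and double quotes.

```python
def values_clause(variable, titles):
    """Render a SPARQL VALUES block binding `variable` to each article title."""

    def quote(title):
        # Escape backslashes and double quotes so the title stays one literal.
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    body = "\n".join(f"  {quote(title)}" for title in titles)
    return f"VALUES ?{variable} {{\n{body}\n}}"


# Example: the first three articles compared in the query above.
print(values_clause("article", ["Anthropology", "Philosophy", "Economics"]))
```

The returned text can be pasted in place of the VALUES block, or substituted into the query string before it is sent.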
Related work
Isaac (WMF) has developed something similar with a user script (w:User:Isaac (WMF)/link gender, w:User:Isaac (WMF)/link gender.js) which calls an API (https://article-gender-data.wmcloud.org/api/v1/outlinks-details).
Gendered News is a research project measuring gender diversity in French newspapers. It uses first names to estimate the probability that a person is male or female and returns gender counts at the article level.
OpenSexism has created the Wednesday Index, a Twitter thread that measures gender diversity in 26 Wikipedia articles every Wednesday[3]. OpenSexism has also published an article about the Wednesday Index[4].
Dsp13 has created a new wiki page to improve gendered citation statistics in Wikipedia: w:User:Dsp13/Gendered citation bias.
“If you start at any given article on Wikipedia, you're much less likely to eventually reach an article about a woman artist than you are about a male artist – and this was true for women across the board.[5]”
References
1. https://whgi.wmflabs.org/
2. https://public.paws.wmcloud.org/User:PAC2/Gender%20diversity%20in%20articles%20about%20occupations%20in%20Wikipedia%20in%20French.ipynb
3. https://twitter.com/OpenSexism/status/1458841564818513926?t=ikCL4Vj_kI2m4M2UrkymAg&s=19
4. https://link.medium.com/64GXFQdPfsb
5. https://www.asc.upenn.edu/news-events/news/bridging-wikipedias-gender-gap-one-article-time
See also
- Biais de genre (gender bias), Wikiconference 2021