Wikimedia projects have had challenges modeling gender. Wikidata has the particular challenge of modeling gender as structured data. This documentation page is WikiProject LGBT's guidance on the topic.
Current best guidance
- As of August 2019, no one in the Wikidata network has identified any authority more knowledgeable and insightful on the topic of modeling gender as structured data than the Wikidata community. Many people in the Wikidata community intuitively grasp the complexities and implications of this challenge. Conversations in the Wikimedia network have not turned up an academic article, professor, advocacy organization, community organization, or commentator who articulates what a large number of Wikidata contributors already understand clearly.
- Since there is no external authority with the answers, the Wikidata community has to originate its own recommendations and guidance.
- The issue is complicated and various community groups have their own strong opinions. If anyone claims to have all the answers, or to speak for a certain authoritative organization, community, or demographic, then invite them to either share their published guidance or come to Wikidata talk pages to share their knowledge.
- The discourse on this subject is outlined here first. When someone publishes more papers, share them.
- This is Wikidata, so experiment with different models and try to document why each is useful. Experiment even if it seems wrong. Many people hesitate to model gender because it seems challenging or because their attempt might be incorrect, but even incomplete or incorrect attempts are useful for discussion, especially when documented and shared.
- Assume good faith and friendly collaboration...
- ...and everyone should follow the meta:friend space policy and the en:Wikipedia:English Wikipedia non-discrimination policy
Why modeling gender matters
Modeling gender matters for several reasons:
- Wikidata is currently the world's authority on the gender of individuals
- Practically all Internet users who seek gender information will receive and consume Wikidata content
- Wikidata has an extraordinary position of influence and popularity
- Wikipedia has seen years of advocacy to develop and promote Wikimedia content related to gender issues. That work is only possible when Wikidata has data about the gender of the people profiled in Wikimedia projects. Programs reliant on gender data include the following
- Gender is the most popular personal yet public-seeming detail that is a challenge to model in Wikidata. If we develop the discourse and guidelines for modeling gender, then we also gain insight into modeling the other traits which we protect in the en:Wikipedia:English Wikipedia non-discrimination policy
- Modeling gender in Wikidata happens at scale now anyway
- Avoiding, ignoring, or denying this issue is not productive because Wikidata does gender modeling anyway at scale, globally, for every language and culture with more data and distribution than any other resource
- There is a status quo and either we develop that into a discourse or it proceeds organically
What's important in a model
- It should be possible to enter complex gender-identities into Wikidata
- Everybody should feel comfortable with the way their gender is modeled in Wikidata
- The way we model gender should be in line with our general data standards and the way our semantics work.
- For statistical purposes, data-consumers want to be able to see statistics about our coverage in specific areas as they relate to the gender of the people involved. A data-consumer who uses a simple data-model (male/female/other) should get valid answers. The same goes for data-consumers who want to automatically generate text based on Wikidata and care about the correct grammatical gender that they should use.
- Given that grammatical gender is very important in many non-English languages, we should make it easy to enter basic gender information even if we don't know much about the subject.
- Data-consumers like infoboxes that ask Wikidata for a truthy value should get a value that's not misleading (a truthy value returns the highest-ranked statement and strips qualifiers away).
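
The truthy-value behavior mentioned in the last point can be sketched in a few lines of Python. This is only an illustration, not Wikidata's actual implementation: the `Statement` class, the `truthy_values` function, and the sample values are hypothetical names and invented data. The sketch assumes the usual rank rule, where preferred-rank statements win if any exist, normal-rank statements are used otherwise, and deprecated statements are never returned.

```python
from dataclasses import dataclass, field

# The three statement ranks used in Wikidata.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


@dataclass
class Statement:
    """Hypothetical stand-in for one gender statement on an item."""
    value: str
    rank: str = NORMAL
    qualifiers: dict = field(default_factory=dict)


def truthy_values(statements):
    """Return the bare values of the best-ranked statements.

    Qualifiers are dropped, which is what a truthy lookup does.
    """
    preferred = [s.value for s in statements if s.rank == PREFERRED]
    if preferred:
        return preferred
    # No preferred statement: fall back to every normal-rank statement.
    return [s.value for s in statements if s.rank == NORMAL]


# Invented example data: two statements distinguished only by qualifiers.
unranked = [
    Statement("male", qualifiers={"end time": "2015"}),
    Statement("non-binary", qualifiers={"start time": "2015"}),
]

# Same data, but the current statement is marked as preferred.
ranked = [
    Statement("male", qualifiers={"end time": "2015"}),
    Statement("non-binary", rank=PREFERRED, qualifiers={"start time": "2015"}),
]

print(truthy_values(unranked))
print(truthy_values(ranked))
```

In the unranked case the consumer receives both values with the dates stripped away and cannot tell which one is current, which is the kind of misleading result the point above warns about. Under this sketch's assumptions, setting a preferred rank is one way a model can keep the truthy value unambiguous.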