Wikidata:WikiProject Source MetaData/Researchers in Switzerland

This is a side project financed by WMCH during the activities of Wiki Science Competition 2021.

Its goal is to speed up the creation of items related to post-docs, qualified technicians, researchers and professors active in the universities, laboratories, hospitals and research centers of Switzerland (and Liechtenstein). Only people active in technical and scientific fields are considered for this project, although boundaries can be "gray" sometimes.

This is a mass creation effort, so the goal is not to create complete and refined items but to make thousands of them available and clear enough to be used later in the normal workflow of Wikidata (basic information on affiliation, gender, birth date, 1-2 external IDs). Items that already exist are improved, but the main goal is to "fill the gap". Similarly, connecting researcher items to their publications is not a goal, but the new items will make this task much simpler for future users.

It can be considered the second attempt at comprehensive coverage of the researchers of one country, after the example of Italy.

Steps

  • 2021-11-08: source Starting Uni Liechtenstein - Institut für Architektur und Raumentwicklung ✓ Done tweet
  • 2021-11-12: source Starting CIES-UniNE ✓ Done tweet
  • 2021-11-12: Starting KSSG  Doing…
  • 2021-11-13: source Starting EPFL  Doing…
  • 2021-11-13: source Uni Liechtenstein - Institut für Wirtschaftsinformatik ✓ Done tweet
  • 2021-11-13: Starting FIBL Switzerland  Doing…
  • 2021-11-14: civil servants of canton Jura (cantonal library, Jurassica) ✓ Done
  • 2021-11-14: source Musée cantonal de géologie-UNIL ✓ Done tweet
  • 2021-11-14: source Société Vaudoise des Sciences Naturelles ✓ Done tweet
  • 2021-11-14: source civil servants of canton Valais (SEN, SCAV...) ✓ Done
  • 2021-11-15: source UNINE Faculté des Sciences  Doing…
  • 2021-11-16: source Uni Sankt Gallen ICS-HSG Institute of Computer Science ✓ Done tweet
  • 2021-11-17: source CJBG ✓ Done tweet
  • 2021-11-17: source Vetsuisse/Tierspital Uni Zuerich  Doing…
  • 2021-11-18: Vetsuisse Uni Bern  Doing…
  • 2021-11-18: source BCPM Berne  Doing…
  • 2021-11-18: civil servants of canton Neuchatel ✓ Done
  • 2021-11-18: source IRB-USI  Doing…
  • 2021-11-19: source Platform Geosciences SCNAT  Doing…
  • 2021-11-19: UniGe  Doing…
  • 2021-11-20: ISPM UniBe  Doing…
  • 2021-11-20: source UZH Physik-Institut  Doing…
  • 2021-11-24: BEJUNE  Doing…
  • 2021-11-25: EMPA  Doing…
  • 2021-11-25: Ecotoxcentre  Doing…
  • 2021-11-26: EAWAG  Doing…
  • 2022-11-16: source IRSOL via OpenRefine ✓ Done tweet
  • 2022-11-17: source members of SCRS via OpenRefine ✓ Done tweet
  • 2022-11-17: source academic staff IDSIA via OpenRefine ✓ Done tweet
  • 2022-11-21: source USI-IPH via OpenRefine ✓ Done tweet

Statistics

Comparison

Background level and distribution of items at the beginning and during the project

  • beginning: 2021-11-10
      People with affiliation: 1114980 in total in the world
      People with sex, given name, surname, at least 1 ID and affiliation to an institution in Switzerland or Liechtenstein: 6235 (M: 5303, F: 930, X: 2)
      People with sex, at least 2 IDs and affiliation to an institution in Switzerland or Liechtenstein: 10012 (M: 8253, F: 1757, X: 2)
  • update: 2021-11-26
      People with affiliation: 1117581 in total in the world
      People with sex, given name, surname, at least 1 ID and affiliation to an institution in Switzerland or Liechtenstein: 6347 (M: 5390, F: 955, X: 2)
      People with sex, at least 2 IDs and affiliation to an institution in Switzerland or Liechtenstein: 10282 (M: 8450, F: 1830, X: 2)
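
The change between the two snapshots can be derived directly from the figures above. The following is a minimal Python sketch, not part of the project tooling: the dictionary layout and the names `SNAPSHOTS`, `growth` and `female_share` are assumptions made for illustration, and the numbers are copied from the stricter criterion (sex, given name, surname, at least 1 ID).

```python
# Counts copied from the two snapshots above (stricter criterion:
# sex, given name, surname, at least 1 ID, Swiss/Liechtenstein affiliation).
SNAPSHOTS = {
    "2021-11-10": {"M": 5303, "F": 930, "X": 2},
    "2021-11-26": {"M": 5390, "F": 955, "X": 2},
}


def growth(before, after):
    """Return the absolute change per gender and in total."""
    delta = {key: after[key] - before[key] for key in before}
    delta["total"] = sum(after.values()) - sum(before.values())
    return delta


def female_share(counts):
    """Share of items marked F among all items of a snapshot, in percent."""
    return 100.0 * counts["F"] / sum(counts.values())


change = growth(SNAPSHOTS["2021-11-10"], SNAPSHOTS["2021-11-26"])
print(change)
for date, counts in SNAPSHOTS.items():
    print(date, round(female_share(counts), 1))
```

The same two helpers could be applied to the looser criterion (sex, at least 2 IDs) by swapping in its counts.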

Project metrics

People with affiliation (P108) related to Switzerland or Liechtenstein

Date        Total  M     F     X  Unassigned  Created by us[1]
2021-11-10  17401  9123  2128  2  6148        3
2021-11-26  17535  9324  2207  2  6002        148+5-3-1=149
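
The rows of this table can be cross-checked with a few lines of Python. This is only an illustrative sketch: the tuple layout and the names `ROWS` and `row_is_consistent` are assumptions, and "unassigned" is read here as items without a sex or gender statement, which the table does not state explicitly.

```python
# Rows copied from the project metrics table:
# (date, total, M, F, X, unassigned)
ROWS = [
    ("2021-11-10", 17401, 9123, 2128, 2, 6148),
    ("2021-11-26", 17535, 9324, 2207, 2, 6002),
]


def row_is_consistent(row):
    """True when M + F + X + unassigned adds up to the reported total."""
    _date, total, m, f, x, unassigned = row
    return m + f + x + unassigned == total


for row in ROWS:
    print(row[0], "consistent" if row_is_consistent(row) else "MISMATCH")

# Net change between the two dates.
total_change = ROWS[1][1] - ROWS[0][1]
unassigned_change = ROWS[1][5] - ROWS[0][5]
# The "created by us" cell for 2021-11-26, evaluated as written in the table.
created_by_us = 148 + 5 - 3 - 1
print(total_change, unassigned_change, created_by_us)
```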

Other related edits

Items of institutions created during the project

Items of institutions improved during the project

The improvements are mostly aliases added to ease future reconciliation. Alternative names are extracted from online sources and IDs.

Some similar names refined or created on the way

Mistakes found in external archives

Some similar names refined or created on the way

Future work

The following lists might help improve the usability of the data (in-depth quality and minimal confusion).

Messy general situations with common names

  • Lisa Zimmermann
  • Jochen Müller example
  • Kimberly Garcia example
  • Laurent Gautier example example
  • Andres Gomez
  • Philippe Clerc example
  • Sanjiv Jha example
  • Simone Meyer example
  • Christian Hildebrand
  • Stephan Huber example
  • Christoph Renner
  • Ulrich Krieger [1]
  • Rudolf Schmitt example example
  • Sebastian Wolf
  • Philippe Michel
  • Katharina Müller
  • Yuta Takahashi
  • David Roland example
  • Maria Grazia Rossi
  • Eszter Simon example
  • Martin Schmid
  • Stefan Graf
  • Oliver Steiner example
  • Marc Suter example
  • Timothy Alexander example
  • Elisabetta Rossi [2]
  • Philipp Bachmann example
  • Alexander Schenck example
  • Christoph Sommer [3]
  • Bernd Zimmermann [4]
  • Oskar Steiner
  • Ulrich Hamann [5]
  • Denis Jordan [6]
  • Konrad Schindler
  • David Huber
  • Joseph Cornelius
  • Fabio Rinaldi
  • Alessandro Giusti
  • Anna Messina

People to be created to avoid similar names

Currently only two people in total.

Missing given names

  • Najla
  • Dimche
  • Urs-Beat
  • Rolphe
  • Anne-Linda

Missing family names

  • Hemati
  • Vachtsevanou
  • Raetzo
  • Hosi
  • Maskarinec
  • Leontsinis
  • Cabalzar
  • Buetler/Bütler
  • Zurwerra
  • Bonesana
  • Danani
  • Sapozhnik
  • Stenflo
  • Cannelle
  • Hajnsek
  • Gressin
  • Derboni
  • Zaffalon
  • Sharygina
  • Calciolari
  • Bezani
  • Morese
  • Fiordelli

Interesting IDs to be proposed


Comments

  • The problem of similar names is much stronger in the German-speaking world than in other areas, and it will take a while to clean up. The core groups in Swiss academia carry German, French, Italian and English names (Swiss academia is international and Western-leaning, so Anglophones are the main group not tied to a Swiss national language). Other groups are present and can be tricky (e.g. Spanish names), but statistically these are the four main groups to address.
Anglophones often appear in external archives that are more precise about homonyms and the use of middle names, probably because of recurring needs in the past; many international databases have already faced the issue and deal with it reasonably well.
Francophones rely mostly on centralized French archives that are reasonably focused, with limited mistakes. Some information may be missing, but there are no major errors.
People with Italian names benefit from a large community of volunteers active in Italy: information in external databases might be problematic, but the Wikidata items are quite good and those volunteers are a driving force for improvement.
The main obstacle to an efficient use of Wikidata items of researchers in Switzerland is therefore probably the maintenance of common, recurring German names. The lack of care in certain "areas" of Germany and Austria is not helping at the moment, and if Switzerland wants to use the database, some additional maintenance on the other side of the border might be required, at least at the beginning. For the social sciences and humanities, the problem is probably bigger.
  • We tried to focus on the technical and scientific fields. So far, particle physics, architecture, informatics and computer science, and geology are the areas with the most gaps to fill. Some medical researchers who stopped publishing in the early 2000s may also be missing from Wikidata despite having external IDs. That is why we started with those areas, which require more careful manual insertion. Biologists and chemists, for example, can be checked quickly with a bulk OpenRefine import.
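
One way to surface candidate homonyms before a manual check is to group labels under a normalized key, so that spellings such as Buetler/Bütler fall together. The sketch below is an assumption about how this could be done, not the procedure used in the project: the function names and the item identifiers (`item-1` and so on) are hypothetical. The key is deliberately coarse, so a match only flags items for human review and does not prove they describe the same person.

```python
import unicodedata
from collections import defaultdict

# German umlauts are commonly written in two ways (e.g. Buetler/Bütler).
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def name_key(label):
    """Normalize a person label into a coarse comparison key."""
    text = label.casefold()  # also turns "ß" into "ss"
    for char, replacement in UMLAUTS.items():
        text = text.replace(char, replacement)
    # Drop remaining diacritics (é -> e) by removing combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(text.split())


def homonym_groups(items):
    """Group (item_id, label) pairs whose labels share the same key."""
    groups = defaultdict(list)
    for item_id, label in items:
        groups[name_key(label)].append(item_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical identifiers; the labels are illustrative spellings only.
sample = [
    ("item-1", "Katharina Müller"),
    ("item-2", "Katharina Mueller"),
    ("item-3", "Stefan Graf"),
]
print(homonym_groups(sample))
```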

Future developments

Based on the IDs to be created, future literacy events can be organized in cooperation with Swiss universities and learned institutions, if they are interested. The goal is to keep the database up to date, make the ID coverage of existing items more robust, and fill gaps outside the technical and scientific fields.

Queries

#title:List of researchers with affiliation to institutions in Switzerland or Liechtenstein
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P108 [ wdt:P17 ?country ] .
  VALUES ?country { wd:Q39 wd:Q347 } .
}
#title:List of researchers with fundamental data and affiliation to institutions in Switzerland or Liechtenstein
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P21 [ ] .
  ?person wdt:P735 [ ] .
  ?person wdt:P734 [ ] .
  ?person wdt:P108 [ wdt:P17 ?country ] . VALUES ?country { wd:Q39 wd:Q347 } .
  ?person wikibase:identifiers ?n . FILTER(?n > 0)
}
#title:List of researchers with fewer fundamental data and affiliation to institutions in Switzerland or Liechtenstein
SELECT DISTINCT ?person
WHERE {
  ?person wdt:P21 [ ] .
  ?person wdt:P108 [ wdt:P17 ?country ] . VALUES ?country { wd:Q39 wd:Q347 } .
  ?person wikibase:identifiers ?n . FILTER(?n > 1)
}
#title:Number of researchers with affiliation to institutions in Switzerland or Liechtenstein by gender
SELECT ?gender ?genderLabel (COUNT(DISTINCT ?person) AS ?number)
WHERE {
  ?person wdt:P108 [ wdt:P17 ?country ] .
  ?person wdt:P21 ?gender .
  VALUES ?country { wd:Q39 wd:Q347 } .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?gender ?genderLabel
#title:Number of researchers with fundamental data and affiliation to institutions in Switzerland or Liechtenstein by gender
SELECT ?gender ?genderLabel (COUNT(DISTINCT ?person) AS ?number)
WHERE {
  ?person wdt:P21 ?gender .
  ?person wdt:P735 [ ] .
  ?person wdt:P734 [ ] .
  ?person wdt:P108 [ wdt:P17 ?country ] . VALUES ?country { wd:Q39 wd:Q347 } .
  ?person wikibase:identifiers ?n . FILTER(?n > 0)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?gender ?genderLabel
#title:Number of researchers with fewer fundamental data and affiliation to institutions in Switzerland or Liechtenstein by gender
SELECT ?gender ?genderLabel (COUNT(DISTINCT ?person) AS ?number)
WHERE {
  ?person wdt:P21 ?gender .
  ?person wdt:P108 [ wdt:P17 ?country ] . VALUES ?country { wd:Q39 wd:Q347 } .
  ?person wikibase:identifiers ?n . FILTER(?n > 1)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?gender ?genderLabel
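
The list queries above differ only in which statements they require and in the identifier threshold, so they can be assembled from parameters. The following Python sketch is an illustration under stated assumptions: `build_query` and `request_url` are hypothetical helper names, and unlike the first query above the sketch always requires a sex or gender statement (P21). It only builds the query text and a GET URL for the public endpoint; it does not send the request.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(countries=("Q39", "Q347"), min_ids=0, require_names=False):
    """Assemble a SPARQL list query in the style of the queries above."""
    lines = ["SELECT DISTINCT ?person", "WHERE {", "  ?person wdt:P21 [ ] ."]
    if require_names:
        # Given name (P735) and family name (P734) must both be present.
        lines += ["  ?person wdt:P735 [ ] .", "  ?person wdt:P734 [ ] ."]
    values = " ".join(f"wd:{qid}" for qid in countries)
    lines.append("  ?person wdt:P108 [ wdt:P17 ?country ] .")
    lines.append(f"  VALUES ?country {{ {values} }} .")
    lines.append(
        f"  ?person wikibase:identifiers ?n . FILTER(?n > {int(min_ids)})"
    )
    lines.append("}")
    return "\n".join(lines)


def request_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_query(min_ids=0, require_names=True)
print(query)
print(request_url(query))
```

Calling `build_query(min_ids=1)` yields the looser variant (sex and at least 2 IDs) without the name requirements.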

References

  1. metrics and some edits on another account