Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2023-01-10


Date of call

2023-01-10

Presenters/Topic

Egon Willighagen on SARS-CoV-2 queries

Presentation material

Relevant webpages: SARS-CoV-2 queries

Notes

Presentation

  • Wikidata Queries around the SARS-CoV-2 virus and pandemic by Egon Willighagen (Maastricht University)
    • At the start of the pandemic in March 2020, he realized that the pandemic needed serious research and wondered how his expertise could help.
  • What do we know about this virus and how can we support this now?
    • WikiPathways is a tool that could be used, but there was still a need to track and link to the literature.
    • Started writing a number of SPARQL queries to obtain an overview of what we know about this virus.
    • Started an eBook collecting a number of queries which served as an index for Egon.
    • Needed something where the information was dynamic, which is another reason to use SPARQL queries.
    • TOC
      • Chapter 1: Introduction
      • Chapter 2: Viruses
      • Chapter 3: COVID-19
      • Chapter 4: The Pandemic [progression in countries]
      • Chapter 5: SARS-CoV-2
      • Chapter 6: Genomes, Genes, and Proteins
      • Chapter 7: The Human
      • Chapter 8: Towards a solution
      • Chapter 9: Literature
  • Scholia also does similar tasks by providing information through queries (Scholia won a mention in Wikimedia’s 2022 Coolest Tools Awards: https://diff.wikimedia.org/2022/12/23/2022-coolest-tool-awards-thank-you-for-the-tools/ ).
  • Showed a screenshot from today of recently published works on the topic.
  • In Scholia, queries run dynamically in the browser. In contrast, the eBook is updated via a tool that he often ran daily during the height of the pandemic.
  • Figured out how to add new knowledge to Wikidata, such as proteins, and added identifiers to these. More information can be found in the article: “A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses” in BMC Biology (which also discusses shape expressions).
  • Within the SPARQL queries, he started ordering results to aid comprehensibility.
  • One school of thought is that you should hide the semantic side (such as the syntax of the queries), but he prefers to expose it in order to encourage further learning.
  • Displayed a curl command: readers actually get a command they can copy and paste.
  • Uses Markdown in GitHub to easily refer to queries.
  • Since 2009, he has been developing a concept loosely termed a “maintainable book”.
    • He uses Groovy code to make sure relevant sections are updated.
  • Citations: uses the Wikidata Q numbers of articles, with code written against the Citation.js library.
  • Pre-processing: has a lot of Groovy scripts for Java.
  • Because Wikidata is multilingual, labels can be shown in a given language (such as Japanese).
  • Keeping the book synchronized between languages can be a challenge.
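As an illustration of the points above (ordered results, labels in a chosen language, and a copy/paste curl command), here is a minimal Python sketch. It is not taken from the presentation or the eBook. The function names are hypothetical, and the identifiers used (P921 for main subject, P577 for publication date, Q82069695 for SARS-CoV-2) are assumptions that should be checked against Wikidata before use.

```python
import shlex
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(topic_qid: str, lang: str = "en", limit: int = 10) -> str:
    """Build a SPARQL query for works about a topic, newest first.

    Assumed properties: P921 (main subject), P577 (publication date).
    """
    return f"""
SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P921 wd:{topic_qid} ;
        wdt:P577 ?date .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang},en" . }}
}}
ORDER BY DESC(?date)
LIMIT {limit}
""".strip()


def curl_command(query: str) -> str:
    """Return a shell-safe curl command that would run the query."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    return "curl " + shlex.quote(url)


if __name__ == "__main__":
    # Q82069695 is assumed to be the Wikidata item for SARS-CoV-2.
    query = build_query("Q82069695", lang="ja", limit=5)
    print(query)
    print(curl_command(query))
```

The `ORDER BY` clause puts the most recent works first, and the label service asks for Japanese labels with English as a fallback. Nothing is sent over the network here; the script only prints the query and the command.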

Q&A

Q: Has the volunteer community stayed strong? Do you have all the data maintenance and quality needed?

A: He hasn’t been as active lately and felt that he was pretty much done with it after last summer. It is more difficult to contribute due to work obligations. What happened in the north of Italy was a wake-up call that COVID was not localized to one region of the world. He has the impression that the project has also slowed down for others, but he still sees activity in the queries, such as new articles.

Q: In terms of Open Science community, can you give a sketch about how that works? Such as: are there important institutions generating a lot of the data?

A: The COVID-19 data portal (https://www.ebi.ac.uk/training/online/courses/covid-19-data-portal/) is interesting. The data portal combined with patient data received a lot of attention.

Q: Can you share information about WikiPathways?

A: He proceeded to point out relevant points in the diagram found on this page: https://www.wikipathways.org/index.php/Pathway:WP4868

  • It provides a lot of information about what our immune system does in response to a virus (he mentioned the example of cytokines that overloaded the systems of some victims).
  • Protein identifiers can be accessed in Wikidata and refer to other databases.
  • More pathways at http://covid.wikipathways.org/
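To illustrate how protein identifiers in Wikidata can point to other databases, here is a minimal Python sketch. It was not shown in the call. The function names are hypothetical, the properties P703 (found in taxon) and P352 (UniProt protein ID) are assumptions to verify on Wikidata, and the sample response holds placeholder values rather than real data.

```python
from typing import Dict


def protein_id_query(taxon_qid: str) -> str:
    """Build a SPARQL query listing proteins of a taxon with an external ID.

    Assumed properties: P703 (found in taxon), P352 (UniProt protein ID).
    """
    return f"""
SELECT ?protein ?uniprot WHERE {{
  ?protein wdt:P703 wd:{taxon_qid} ;
           wdt:P352 ?uniprot .
}}
""".strip()


def extract_ids(results: dict) -> Dict[str, str]:
    """Map each Wikidata item ID to its external identifier.

    Expects the standard SPARQL JSON results layout
    (results -> bindings -> variable -> value).
    """
    mapping = {}
    for row in results["results"]["bindings"]:
        # The item arrives as a full entity URI; keep only the last segment.
        qid = row["protein"]["value"].rsplit("/", 1)[-1]
        mapping[qid] = row["uniprot"]["value"]
    return mapping


# Placeholder response in SPARQL JSON results format (not real data).
sample = {
    "results": {
        "bindings": [
            {
                "protein": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER"},
                "uniprot": {"type": "literal", "value": "ID_PLACEHOLDER"},
            }
        ]
    }
}

if __name__ == "__main__":
    print(protein_id_query("Q82069695"))  # QID assumed to be SARS-CoV-2
    print(extract_ids(sample))
```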

Q: What are some directions and features you were hoping to work on?

A: He still has a lot of questions around COVID-19 and mentioned, for example, a section on long COVID.

  • The idea of the translations was to share information, but there are only 3 ½ translations: more would be preferable.
  • Learned a lot about how to respond quickly to a new pandemic and gather information efficiently. What are the lessons learned to be more prepared for next time?

Everyone agreed that the nature of the crisis revealed that research should be more open. But a lot of research is still too closed, not only on COVID but also on other diseases such as malaria. Society would benefit more from open science.