Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2020-08-11


Call details

  • Date: 2020-08-11
  • Topic: Scholia
  • Presenters: Daniel Mietchen and Lane Rasberry

Presentation material

Notes

Overview

  • Scholarly profiling system
  • Uses Wikidata to show what Wikidata holds about a given topic or person
  • Can find the papers that an author has written
  • Pre-populates scholarly profiles based on Wikidata queries
  • Complements Wikipedia and can be used independently to browse academic literature
  • Only presents data loaded into Wikidata
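
Because Scholia pre-populates profiles from Wikidata queries, it can help to see what such a query looks like. The following is a rough sketch, not Scholia's actual code: it assumes the Wikidata properties P50 (author) and P577 (publication date) and the public Wikidata Query Service endpoint, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed; not taken from the notes).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid: str) -> str:
    """Build a SPARQL query for works whose author (P50) is the given item."""
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""
SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?date)
"""


def run_query(query: str) -> list[dict]:
    """Send a query to the Wikidata Query Service and return the result rows."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "scholia-notes-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(works_by_author_query("Q42"))` needs network access; `Q42` stands in for any author's item ID.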

WikiCite

  • Subset of Wikidata that deals with source data
    • Zika corpus
    • Profiling papers is challenging when there is a huge volume of them
    • Curation workflow for papers
      • Topic tagging
        • Different communities are interested in different subsets of tags, e.g., maternal health, business, or education
        • Visualize topic tags
          • Publications per year, to find when a topic became popular
          • Locations mentioned in paper
          • Locations of institutions associated with authors
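
Topic tags in Wikidata are usually "main subject" (P921) statements. As a minimal sketch, assuming that property plus publication dates from P577 returned as `xsd:dateTime` strings such as `2016-02-01T00:00:00Z`, a publications-per-year chart could be fed by something like the following. The function names are hypothetical and this is not Scholia's implementation.

```python
from collections import Counter


def works_on_topic_query(topic_qid: str) -> str:
    """SPARQL for works tagged with a topic via main subject (P921), with dates (P577)."""
    return f"""
SELECT ?work ?date WHERE {{
  ?work wdt:P921 wd:{topic_qid} ;
        wdt:P577 ?date .
}}
"""


def publications_per_year(dates: list[str]) -> dict[int, int]:
    """Count publication dates per year, oldest year first.

    Assumes query-service-style dateTime strings with a four-digit year,
    e.g. "2016-02-01T00:00:00Z"; negative or five-digit years are not handled.
    """
    counts = Counter(int(date[:4]) for date in dates)
    return dict(sorted(counts.items()))
```

A sharp rise in the yearly counts is one way to spot when a topic, such as Zika, became popular in the literature.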


Why Scholia matters

  • Uses and frees open data
  • Free and accessible on the web
  • Develops alongside the wiki platform
  • Anyone who edits the wiki can contribute
  • Strong ethical foundation
  • Radically inclusionary and diverse
  • Scholia provides a useful service
    1. Orientation to academic literature
    2. For researchers and lay audiences
    3. Largest free and open FAIR option
    4. Native multilingual support
    5. Integration with Wikipedia?
  • Compared to Google Scholar
    • Not open data; no data export
    • Doesn’t encourage curation
  • Compared to Elsevier
    • Content can be exported, but not put on the open web
    • Encourages curation, but the labor goes toward Elsevier’s own product
  • Wikipedia
    • Designed for the end user, in people’s best interest
    • No need to worry about competitors or about preventing reuse of data
    • Fundamental access to content should not be a marketplace
      • Curation is where a marketplace should come in

Data and curation process

  • Much curation happens at the Wikidata level
  • Wikidata does not (yet) have the entirety of academic source metadata
  • Scholia can build author networks wherever there is a free and open identifier
  • Connecting the authors of papers to where they got their degrees, to demonstrate the impact of universities
    • Awards that they have won
  • Curation workflows
    • Topic tagging
    • Author disambiguation
      • Anyone can disambiguate as they see fit
      • Asking for a list of faculty from every university in the world
    • Ontology development
      • WikiProject Lighthouses as an example of how to run a project on Wikidata
      • Documentation for developing Wikidata items in one area (e.g., sports) can be applied to other domains (e.g., clinical trials)
    • Subject affiliation
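
An author network of the kind mentioned above can be derived from (work, author) pairs, for example the rows of a query on P50 (author). The sketch below counts co-authorship links; the input format, IDs, and function name are hypothetical illustrations rather than Scholia's own code.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(authorships: list[tuple[str, str]]) -> Counter:
    """Count co-authorship links from (work_id, author_id) pairs.

    Each unordered pair of authors gets +1 for every work they share.
    Single-author works contribute no edges.
    """
    # Group authors by work; a set drops an author listed twice on one work.
    authors_by_work: dict[str, set[str]] = {}
    for work, author in authorships:
        authors_by_work.setdefault(work, set()).add(author)

    edges: Counter = Counter()
    for authors in authors_by_work.values():
        # Sorting gives each pair a stable (a, b) key regardless of input order.
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1
    return edges
```

Edge weights then give the strength of each collaboration, which is what a network visualization would draw as line thickness.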

Questions/Discussion

  • No workflows are entirely automatic
  • Scholia can be tried on other kinds of groups, e.g., Swedish parliamentarians, by fine-tuning the author profile for that group
  • Scholia is not aware of full text, only metadata; it can include annotations and links to supplementary data
  • Scholia hides the complexity of creating SPARQL queries
    • Front end to Wikidata
    • Could use this concept for other areas
  • Documentation?
    • Kind of neglected
  • Are there documented workflows for repositories, e.g., DSpace?
    • Would welcome these
  • Could Scholia query info about books cited in Wikipedia?
    • Book info would need to be in Wikidata, which currently does not have much
    • Patent corpus citing Wikipedia articles
      • Could be modeled in Wikidata
  • Is there any objection to the Internet Archive adding info to Wikidata about all the books we have linked to from Wikipedia articles?
    • Yes, interested
    • Would you be loading info about the books or the volume scans?
      • Would try to load info for the specific edition referenced
      • Have added links to 200,000 books so far; will load a Wikibase with those 200,000 books
      • Would like to load up to 11 million
      • Will link to corresponding item in Wikidata
      • Daniel: Start with a few and see how it goes. Scale up and then see if anyone complains
  • Harvesting from individual repositories?
    • Harvesting from aggregators is difficult
    • Depends on the content license
    • Do harvest from PubMed Central
  • Is there a way to represent theses and dissertations and their authors in Wikidata?
    • Yes
    • Can be used to seed future articles
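
The point that Scholia acts as a front end hiding SPARQL can be illustrated by mapping each page type to a query template filled in with an item ID. The page types, templates, and function name below are hypothetical and are not Scholia's own templates; the properties used are P50 (author), P921 (main subject), and P69 (educated at).

```python
from string import Template

# Hypothetical page types mapped to SPARQL templates; $qid is the item being shown.
QUERY_TEMPLATES = {
    # Works whose author (P50) is the item
    "author": Template("SELECT ?work WHERE { ?work wdt:P50 wd:$qid . }"),
    # Works whose main subject (P921) is the item
    "topic": Template("SELECT ?work WHERE { ?work wdt:P921 wd:$qid . }"),
    # People educated at (P69) the item
    "institution": Template("SELECT ?person WHERE { ?person wdt:P69 wd:$qid . }"),
}


def query_for_page(page_type: str, qid: str) -> str:
    """Return the SPARQL query a front end would run for one page."""
    if page_type not in QUERY_TEMPLATES:
        raise ValueError(f"unknown page type: {page_type!r}")
    return QUERY_TEMPLATES[page_type].substitute(qid=qid)
```

The user only picks a page and an item. Adding a new area, such as theses or patents, would mean adding templates rather than asking users to write SPARQL.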

Showcase

  • Other insights come out when data is loaded
  • Example: VanderBot by Vanderbilt University (see article link in announcements)
  • Privacy legislation may complicate matters for living scholars, especially in Europe
  • Can also include awards
    • Get award lists from the institutions that grant the awards
    • Can show who attended a university and became an office holder, actor, etc.
  • Gender distribution can be shown where those tags are used
    • Social implications are not fully understood; Scholia is being cautious on the whole
    • Some users have requested removal of the gender tag from their article
  • Imported ClinicalTrials.gov into Wikidata
  • Questions
    • Is data cleanup at ORCID necessary?
      • Make use of ORCID where useful, but that is not often the case, and it has lots of errors and duplicates
      • Can use LC Authority Names and VIAF for disambiguation
      • How to do disambiguation: the Author Disambiguator tool
        • Go to a profile and add /missing to its web address; you’ll be given links to places where you can contribute
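
Some ORCID errors can be caught mechanically. The last character of an ORCID iD is an ISO 7064 MOD 11-2 check digit, so a mistyped or malformed iD can be rejected before it is used for disambiguation. A checksum only catches malformed identifiers, not duplicate profiles or wrong attributions. A minimal sketch, with hypothetical function names:

```python
import re

# ORCID iD layout: four hyphen-separated groups of four; the final character may be X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as "X".
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its final check character is correct."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` returns `True` for the sample iD used in ORCID's documentation, while changing its last digit to 8 makes it return `False`.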