Wikidata:WikiProject University of Notre Dame Libraries/researcherprofilesproject

Documentation for a project to develop business cases for representing the University of Notre Dame's research units and researchers in Wikidata. Related entities carry the statement "on focus list of Wikimedia project" (P5008) with the value Q100997992.


Sarah Kasten, Research & Development Analyst, Hesburgh Libraries, University of Notre Dame. User: Sarkasten

Project Goals

Goals for this project include:

  • Develop familiarity with existing efforts to represent academic institutions, researchers and scholarly metadata in Wikidata
  • Understand processes for creating and enhancing Wikidata entries, from potential sources of data to efficient methods of editing Wikidata
  • Consider the effort it would take to operationalize participation in Wikidata, identify potential use cases and services that Wikidata can facilitate, and assess whether Wikidata is the most effective source for this data infrastructure
  • Explore new kinds of entities and relationships that can be expressed through Wikidata that are not typically represented in the scholarly metadata ecosystem but could support new services

The Business Case for Wikidata and Libraries

Data Infrastructure for Library Services

What new library services could a data infrastructure for information about our institution and its researchers help deliver? Given the effort required to post data to Wikidata, whether manually, by batch load, or through automation, can we justify directing resources here rather than to other priority areas?

Semantic Citations for Institutional Websites

Problem statement: Faculty and research lab web pages routinely present research publications as flat HTML lists, presumably maintained by hand by staff or assistants within departments and therefore frequently out of date. These lists carry no semantic markup that could help build out the knowledge graph of scholarly output or improve SEO for researcher-maintained web pages. They also tend to include only scholarly citations, leaving out related "popular" presentations of the research in news media and journalism, which are sometimes mentioned elsewhere on researchers' personal sites. Finally, the research lab is an important unit for discovering scientific work, but no data infrastructure ties individuals and research output to this organizational structure.

Wikidata-based solution: Once an organization's research output and faculty are curated in Wikidata, we would need to develop a process to query Wikidata, export the data, and convert it to HTML with semantic markup that can be embedded in faculty web pages.
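
As a rough illustration of the conversion step, the sketch below renders a publication list as HTML followed by an embedded schema.org JSON-LD block. The record fields (`title`, `authors`, `date`, `qid`) and both function names are assumptions made for this example, not part of any existing tooling. The sample record is a placeholder that points at the Wikidata Sandbox item.

```python
import html
import json


def to_jsonld(pub):
    """Map one exported record to a schema.org ScholarlyArticle dictionary."""
    item = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": pub["title"],
        "author": [{"@type": "Person", "name": name} for name in pub["authors"]],
    }
    if pub.get("date"):
        item["datePublished"] = pub["date"]
    if pub.get("qid"):
        # Link the citation back to its Wikidata item.
        item["sameAs"] = "http://www.wikidata.org/entity/" + pub["qid"]
    return item


def render_publication_list(pubs):
    """Return an HTML list of citations followed by a JSON-LD script block."""
    lines = ["<ul>"]
    for pub in pubs:
        authors = html.escape(", ".join(pub["authors"]))
        title = html.escape(pub["title"])
        lines.append(f"  <li>{authors}. {title}.</li>")
    lines.append("</ul>")
    payload = json.dumps([to_jsonld(p) for p in pubs], indent=2)
    # Escape "</" so a title cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    lines.append('<script type="application/ld+json">')
    lines.append(payload)
    lines.append("</script>")
    return "\n".join(lines)


# Placeholder record (hypothetical); real input would come from a Wikidata export.
sample = [{"title": "Example article", "authors": ["A. Researcher"],
           "date": "2020-01-01", "qid": "Q4115189"}]
print(render_publication_list(sample))
```

Emitting JSON-LD alongside the visible list keeps the markup separate from page styling, which may make it easier to drop into departmental templates that the library does not control.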

Research Analytics to support Research Administration

Problem statement: Can high-quality data from the multi-faceted WikiCite initiative be harnessed to benefit a locally built research analytics service?

Wikidata-based solution:

  • Create entries for institutional org units
  • Create or enhance records for individual researchers
  • Use the Author Disambiguator Tool to resolve author name string statements in scholarly article entries
  • Develop a process for exporting data and integrating it with the local system?
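
The export step in the list above could be sketched as follows, assuming the Wikidata Query Service is the export route and CSV is what the local system ingests (both are assumptions). The first function builds a SPARQL query for works whose authors carry the project's P5008 focus-list statement. The second flattens a result in the standard SPARQL JSON format into CSV text. The function names and column choices are illustrative only.

```python
import csv
import io
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def focus_list_works_query(focus_qid="Q100997992", limit=500):
    """Build a SPARQL query for works whose authors are on a focus list."""
    if not QID_PATTERN.fullmatch(focus_qid):
        raise ValueError(f"not a Wikidata item ID: {focus_qid!r}")
    return f"""SELECT ?work ?workLabel ?author ?authorLabel WHERE {{
  ?author wdt:P5008 wd:{focus_qid} .   # on focus list of Wikimedia project
  ?work wdt:P50 ?author .              # author (item-valued statement)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}"""


def bindings_to_csv(result, columns):
    """Flatten a SPARQL JSON result into CSV text; missing values become empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in result["results"]["bindings"]:
        writer.writerow([row.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()


# Hand-written stand-in for a query service response (not real data).
sample_result = {"results": {"bindings": [
    {"work": {"type": "uri", "value": "http://www.wikidata.org/entity/Q4115189"},
     "workLabel": {"type": "literal", "value": "Example work"}},
]}}
print(focus_list_works_query(limit=10))
print(bindings_to_csv(sample_result, ["work", "workLabel", "author"]))
```

Because the query matches only item-valued author (P50) statements, works that still carry author name strings will not appear until they have been resolved with the Author Disambiguator Tool.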

Research Analytics to support Library Collection Strategy

Problem statement: Data on institutional publishing can inform a library's approach to negotiating licenses for e-journal content. As more libraries pursue transformative agreements, understanding open access publishing trends in particular is useful as well.

Wikidata-based solution: In addition to loading scholarly article citations into Wikidata and managing author and organizational entries there, the following would need attention:

  • Ensure that academic journal entries in Wikidata carry publisher statements, so that scholarly output can be organized in a way that is useful for publisher negotiations
  • Ensure that academic journal entries in Wikidata carry main subject statements, so that scholarly output and publishing patterns can be correlated with subject-based collection funds
  • Record article-level data on whether an article is open access
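
One way to find where the first two items need work is a query for journals that lack a given statement. The sketch below builds such a query for journals in which focus-list authors have published. It assumes works link to journals through "published in" (P1433) and to authors through P50. The function name is hypothetical.

```python
import re

ITEM_PATTERN = re.compile(r"Q[1-9][0-9]*")
PROPERTY_PATTERN = re.compile(r"P[1-9][0-9]*")


def journals_missing_statement_query(property_id, focus_qid="Q100997992"):
    """Build a SPARQL query listing journals that lack any value for a property."""
    if not PROPERTY_PATTERN.fullmatch(property_id):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    if not ITEM_PATTERN.fullmatch(focus_qid):
        raise ValueError(f"not a Wikidata item ID: {focus_qid!r}")
    return f"""SELECT DISTINCT ?journal ?journalLabel WHERE {{
  ?author wdt:P5008 wd:{focus_qid} .   # on focus list of Wikimedia project
  ?work wdt:P50 ?author ;              # author
        wdt:P1433 ?journal .           # published in
  FILTER NOT EXISTS {{ ?journal wdt:{property_id} ?value . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


# P123 is "publisher"; P921 is "main subject".
print(journals_missing_statement_query("P123"))
```

Running it once with `P123` and once with `P921` would give two worklists of journal entries to enhance before any publisher-level or subject-level analysis is attempted.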

Feed Research Citations into Library Web Services

Problem statement: As part of an effort to build a user-centric library website, with personalization options available within the "My Account" functionality, could a feed of relevant research citations be a useful research discovery mechanism and further encourage use of the library website's features?

Wikidata-based solution:

Linked Open Data Facilitating SEO

Problem statement:

Wikidata-based solution:

Data Models

Excellent work modeling academic institutions and individual researchers has been done through Stanford University Library and WikiProject:Universities. Please check out their work! The sections below refer to areas where new entities or relationships could be created in Wikidata to represent additional ways that research is organized administratively and socially.

Interdisciplinary Research Institutes & Centers within Universities

Example: Mike and Josie Harper Cancer Research Institute

Cross-Institutional Organizations

Example: Yellowstone to Yukon Conservation Initiative

Example: Long Term Ecological Research Network

Research Labs

Example: Archie Lab

Collaborations and Relationships among Researchers

  • Author Disambiguator Tool & Scholia Workflow
    • The institutional focus of this work blurs quickly when trying to organize it by institutional affiliation
    • Scholia missing pages help identify areas where the knowledge graph around an individual researcher needs work. Example: Jason S. McLachlan (ND Biological Sciences)
  • Student/Advisor Fields in Wikidata items
  • Relationships best represented through Queries
    • Frequent Collaborators
    • Colleagues, classmates

Notable depictions of research and researchers in popular works

Workflow for Data Curation, Entry and Management

See: [Results and Notes from using Scholia and the Author Disambiguator Tool] related to Harper Cancer Research Institute Affiliates and coverage of their scholarly works in Wikidata.

Faculty/Researcher Profiles

  • Manual adds and enhancements of researcher profiles
    • P5008 tag
    • Listeria lists
    • De-duplication
  • Less manual options
    • OpenRefine integration
    • QuickStatements
  • Considerations around existing loads of data from multiple sources (ORCID, VIAF, award-granting bodies, etc.)
  • Author Disambiguator Tool workflow for author name string to author property conversion on article records
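
For the QuickStatements option above, a batch can be generated from a spreadsheet or script. The sketch below emits tab-separated QuickStatements (version 1) commands: one function tags existing items with the P5008 focus-list statement, the other drafts a new researcher item. The function names are hypothetical, and the choice of statements for a new item (label, instance of human, focus list, optional ORCID iD) is an assumption, not a project standard.

```python
def focus_list_commands(qids, focus_qid="Q100997992"):
    """One command per existing item: add P5008 pointing at the focus list."""
    return "\n".join(f"{qid}\tP5008\t{focus_qid}" for qid in qids)


def new_researcher_commands(label, orcid=None, focus_qid="Q100997992"):
    """Commands that create an item for a person and tag it for the project."""
    if '"' in label:
        raise ValueError("label must not contain double quotes")
    lines = [
        "CREATE",
        f'LAST\tLen\t"{label}"',       # English label
        "LAST\tP31\tQ5",               # instance of: human
        f"LAST\tP5008\t{focus_qid}",   # on focus list of Wikimedia project
    ]
    if orcid:
        lines.append(f'LAST\tP496\t"{orcid}"')  # ORCID iD
    return "\n".join(lines)


# Q4115189 is the Wikidata Sandbox item, used here as a stand-in.
print(focus_list_commands(["Q4115189"]))
print(new_researcher_commands("Jane Example"))
```

Any `CREATE` batch should follow the de-duplication step, since QuickStatements will create a new item without checking whether the person already has one.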

Organizational Units

  • A bot added ND departments from ISNI in Fall 2020. Do those entries need enhancement?
  • Research Institutes
  • Research Labs

Curating Scholarly Articles in Wikidata

  • The current state of article data loads
  • Workflow Approaches
    • Curating citations based on subject area or topic (based on keyword or controlled vocabularies in source databases).
    • Curating citations based on author affiliation data in source database.
  • Workflow tools
    • Zotero
    • CSV/OpenRefine
    • QuickStatements
    • Bots
  • Author Disambiguator Tool
  • Mapping controlled vocabularies from outside sources to Wikidata

Lists of University Entities in Wikidata

Employees
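
An employee list of this kind could be produced with a query on the employer property (P108). The sketch below builds one. The organization's item ID is left as a parameter rather than hard-coded, and the function name and optional ORCID iD column are assumptions made for the example.

```python
import re


def employees_query(org_qid):
    """Build a SPARQL query listing people whose employer is the given item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", org_qid):
        raise ValueError(f"not a Wikidata item ID: {org_qid!r}")
    return f"""SELECT ?person ?personLabel ?orcid WHERE {{
  ?person wdt:P108 wd:{org_qid} .            # employer
  OPTIONAL {{ ?person wdt:P496 ?orcid . }}   # ORCID iD, when recorded
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?personLabel"""


# Q4115189 is the Wikidata Sandbox item, standing in for the university's item.
print(employees_query("Q4115189"))
```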

"Parent Organization" and "Part of" search