Wikidata:WikidataCon 2017/Submissions/From unstructured to structured data using machine learning

This is an open submission for WikidataCon 2017 that has not yet been reviewed by the members of the Program Committee.

Submission no. 25
Title of the submission
From unstructured to structured data using machine learning

Author(s) of the submission
Daniel Ecer
E-mail address
d.ecer@elifesciences.org
Country of origin
United Kingdom
Affiliation, if any (organisation, company etc.)
eLife Sciences Publications Ltd

Type of session
Talk
Length of session
45min
Ideal number of attendees
50-100

Abstract

A huge amount of data exists in unstructured form, designed for human consumption rather than machine readability. In science, for example, historical research is archived as PDF, and the latest findings (preprints) are often distributed in the same format. Converting these unstructured PDFs to a structured, machine-readable format would substantially improve the reusability of this research.

eLife is a non-profit organisation that invests heavily in software development and collaboration to help realise the potential for improving the digital communication of new research. With this goal in mind, we recently started a project to convert unstructured PDFs to structured XML, with the aim of building on and improving the accuracy of existing conversion tools. The output could feed into Wikidata.

In this talk, we will give an overview of the existing tools for unlocking data within unstructured scientific manuscripts and describe their limitations. We will then outline our promising computer vision approach to improving conversion accuracy, before calling for contributions to and collaboration on this open-source project.

(This talk could be adapted to a different format if preferred.)

What will attendees take away from this session?
  1. Challenges of extraction from unstructured data
  2. Overview of existing tools
  3. How computer vision can help
  4. How you can contribute
Slides or further information

https://github.com/elifesciences/sciencebeam

Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest.

  1. ...