Wikidata:WikiProject PersonalData/Jupyter wikidata

Context

  • Wikidata is all about federated production of data.
  • Jupyter has progressively focused on federated production of data analysis pipelines.

In the context of research:

This project aims to work on a better integration between the two.

Past work

Unsuccessful "Wikidata for research" grant application

In 2014-2015, several members of the Wikidata community jointly applied to an EU infrastructure call. They published their proposal, whose Wikidata entry lists the authors, affiliations, and Wikipedia handles.

Successful OpenDreamKit application

On the same call, a group of mathematicians and computer scientists applied and were funded for the OpenDreamKit project. The project proposal is visible here. Work Package 6, in particular, concerns the structuring of data, knowledge and software through explicit semantics, for more efficient science. See the publication (arXiv), which also has a Wikidata entry: Interoperability in the OpenDreamKit Project: The Math-in-the-Middle Approach (Q57389301).

A substantial part of OpenDreamKit concerns Jupyter.

PAWS

Wikimedia itself has recognized the usefulness of Jupyter and launched PAWS. The motivation is described here.

See also:

OpenHumans

Tim Head of WildTreeTech has worked with OpenHumans to implement Personal Data Notebooks, which help individuals deploy Jupyter notebooks on their own data.

A growing collection of notebooks is here.

Possible future work

BOSSEE

As a follow-up to OpenDreamKit, a project proposal is currently being written for a new infrastructure call. The project is called "BOSSEE" and is centered around Jupyter. The brainstorm is here. At the moment BOSSEE involves mostly previous participants in OpenDreamKit, but also Tim Head of WildTreeTech.

PersonalData.IO

A lot of open science is concerned with reproducible data analysis pipelines. This often means that the analysis pipeline can be redeployed in completely different contexts. One of the participants in OpenDreamKit, Paul-Olivier Dehaye, has gone on to found a nonprofit, PersonalData.IO, focused on personal data empowerment. He is also on the board of MyData, a global organisation focused on the same topics (as is Mad Ball, who leads OpenHumans). He thinks that these analysis pipelines, and the associated infrastructure, can be useful for redistributing power in the personal data economy, and that Wikidata and Jupyter are part of a solution because they offer possibilities for federating data and analysis.

Ideas

Any step that would contribute to a more flexible integration of Wikidata and Jupyter.

Concretely:

  • congruent deployment of integrated Wikidata and Jupyter, à la PAWS, at all scales (Wikidata + Jupyter in the enterprise, and eventually for the individual)
  • modeling of a federated Wikidata for state in a Jupyter notebook (useful when the notebook is deployed through Binder, for instance)
  • modeling of a federated Jupyter for data processing operations as a Wikidata-formalized workflow
  • modular notebook generation based on data stored in Wikidata (first cell from here, second cell from there, etc.; for instance data cleaning, the core data use, and the data reshare)
  • modularized Binder along the same lines
  • possible integration of Solid with Jupyter (as an add-on)
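
As an illustration of the modular notebook generation idea above, here is a minimal Python sketch that assembles a notebook from separately sourced steps (data cleaning, core data use, data reshare). It is only a sketch under stated assumptions: the step records, their item identifiers (Q_PLACEHOLDER_1, etc.) and the cell sources are hypothetical and hard-coded, whereas in the idea above they would be looked up from Wikidata. The output follows the notebook JSON layout (nbformat 4, minor version 4) and uses only the standard library.

```python
import json

# Hypothetical step records. In the idea above, each record would be looked up
# from a Wikidata item; the item IDs and cell sources here are placeholders.
STEPS = [
    {
        "label": "data cleaning",
        "item": "Q_PLACEHOLDER_1",
        "source": 'raw_rows = [" a ", "", "b ", "a"]\n'
                  "rows = [r.strip() for r in raw_rows if r.strip()]",
    },
    {
        "label": "core data use",
        "item": "Q_PLACEHOLDER_2",
        "source": "counts = {r: rows.count(r) for r in set(rows)}",
    },
    {
        "label": "data reshare",
        "item": "Q_PLACEHOLDER_3",
        "source": "import json\nprint(json.dumps(counts, sort_keys=True))",
    },
]


def markdown_cell(text):
    """Return a markdown cell in the nbformat 4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return an unexecuted code cell in the nbformat 4 JSON layout."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def assemble_notebook(steps):
    """Build a notebook dict: one provenance heading plus one code cell per step."""
    cells = []
    for step in steps:
        cells.append(markdown_cell(f"## {step['label']} (from {step['item']})"))
        cells.append(code_cell(step["source"]))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = assemble_notebook(STEPS)
notebook_json = json.dumps(notebook, indent=1)
```

Writing notebook_json to a file with the .ipynb extension should give a notebook that Jupyter can open. Swapping a step then amounts to replacing one record in STEPS, which is the kind of modularity the idea describes.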