User:ProteinBoxBot/2016 CIViC sprint

Overall summary

This sprint is in preparation for the CIViC hackathon at NKI in Amsterdam on Dec 1st & 2nd. The aim is to create a data model and do a first import of CIViC data into Wikidata before the hackathon.

Participants

Gameplan

  1. Identify the data model needed to capture records from CIViC in Wikidata. Done
  2. Import all PubMed entries used as evidence in CIViC into Wikidata. Done
    1. using tools such as QuickStatements, the Wikipedia Tools add-on for Google Sheets, and/or SourceMD
    2. in close collaboration with the WikiCite community.
  3. Select 2 more showcase variants to manually model in Wikidata. Done
  4. Write/run bot
  5. Do a full import of CIViC data from November.
  6. Have fun at the hackathon
    1. query
    2. integrate with other linked data sources
      1. Wikidata
      2. Other nodes on the semantic web
    3. remodel
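Step 2 of the gameplan needs the list of distinct PubMed IDs cited as evidence before anything is fed to QuickStatements or SourceMD. A minimal sketch of that deduplication, assuming the evidence records are available as tab-separated text with a `pubmed_id` column (the column name comes from the Evidence Summaries table on this page; the file layout and the function name `unique_pubmed_ids` are assumptions for illustration):

```python
import csv
import io


def unique_pubmed_ids(tsv_text, column="pubmed_id"):
    """Return the distinct numeric PubMed IDs in a TSV text, sorted numerically."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ids = set()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value.isdigit():  # skip empty or malformed cells
            ids.add(value)
    return sorted(ids, key=int)


# Made-up rows, for illustration only (not real CIViC data).
sample = "evidence_id\tpubmed_id\n1\t1002\n2\t1001\n3\t1002\n4\t\n"
print(unique_pubmed_ids(sample))
```

The resulting list can then be checked against Wikidata to see which papers still lack an item.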

Showcase variants

First modeling conclusions

  • 3 models for 3 clinical uses: diagnostic, prognostic, and predictive. This gives 6 different claims for 6 different predictors, i.e. clinically relevant variants for prediction:
    • is a diagnosis predictor for disease presence
    • is a diagnosis predictor for disease absence
    • is a prognosis predictor for better outcome
    • is a prognosis predictor for worse outcome
    • is a treatment predictor for drug response
    • is a treatment predictor for drug resistance
  • Include the annotation on disease context using the Wikidata qualifier ‘medical condition treated (P2175)’
  • Represent dispute of a claim using PMID as source of evidence:
    • Use the WD property in the reference ‘stated in (P248)’ for supporting PMIDs
    • Use the WD qualifier ‘statement disputed by (P1310)’ for refuting PMIDs
  • Do not include the determination method for the evidence for now (if added later, it would be described with the WD property ‘determination method (P459)’)
  • Do not model multi-drug therapies as new items until such a therapy has stabilized into a named entity, like HAART therapy for HIV
  • Do not include the drug interaction type annotation for drug combinations for now, unless the drug combination is a stable, known treatment, so that no therapeutic context is added to ‘variant - [disease/drug]’ statements
  • Do not add mutational profile context for now (CIViC currently does not seem to have structured data for it)
  • Include the annotation on variant type (variant group not for now) and variant origin (the latter in the evidence annotation) using the Wikidata property ‘instance of (P31)’
  • Include the source of the claim using the WD property ‘reference URL (P854)’, with the URL of the variant summary as object. The summary contains all the claims and evidence for the variant, which avoids redundancy.
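The conclusions above can be sketched as a small data structure. This is only an illustration of where each piece of information would sit (main claim, qualifiers, references), not the Wikidata JSON format or the bot's code. The property IDs P2175, P248, P1310 and P854 are the ones named on this page; the class `PredictorStatement`, the predictor property `P_TREATMENT_RESPONSE`, the `Q_...` items and the example.org URL are hypothetical placeholders. Real Wikidata references are grouped into reference blocks, which this sketch flattens.

```python
from dataclasses import dataclass, field

# Property IDs named on this page.
MEDICAL_CONDITION_TREATED = "P2175"  # qualifier: disease context
STATED_IN = "P248"                   # reference: supporting papers
STATEMENT_DISPUTED_BY = "P1310"      # qualifier: refuting papers
REFERENCE_URL = "P854"               # reference: variant summary URL


@dataclass
class PredictorStatement:
    variant: str              # subject item (placeholder ID)
    predictor: str            # one of the six predictor claims (placeholder property)
    target: str               # drug or disease item (placeholder ID)
    disease_context: str      # disease item used as qualifier
    variant_summary_url: str  # URL of the variant summary
    supporting: list = field(default_factory=list)
    refuting: list = field(default_factory=list)

    def to_dict(self):
        """Lay the statement out as claim, qualifiers and references."""
        return {
            "subject": self.variant,
            "property": self.predictor,
            "value": self.target,
            "qualifiers": {
                MEDICAL_CONDITION_TREATED: [self.disease_context],
                STATEMENT_DISPUTED_BY: list(self.refuting),
            },
            "references": {
                STATED_IN: list(self.supporting),
                REFERENCE_URL: [self.variant_summary_url],
            },
        }


# Placeholder values only; none of these identifiers are real.
example = PredictorStatement(
    variant="Q_VARIANT",
    predictor="P_TREATMENT_RESPONSE",
    target="Q_DRUG",
    disease_context="Q_DISEASE",
    variant_summary_url="https://example.org/variant-summary",
    supporting=["Q_PAPER_A"],
    refuting=["Q_PAPER_B"],
)
print(example.to_dict())
```

Keeping refuting papers in a qualifier and supporting papers in the reference lets a single statement carry both sides of a disputed claim.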


List of properties used:

  • Existing properties used:
    • statement disputed by (P1310)
    • stated in (P248)
    • Reference URL (P854)
    • medical condition treated (P2175)

Apart from the model for the data and evidence representation, we will need to include:

  • Part of (for the gene association)
  • Chromosome
  • Genomic start
  • Genomic end
  • CIViC variant ID (P3329)
  • Instance of (variant type, e.g. missense, frameshift, etc.)
  • HGVS nomenclature (P3331)
  • Genomic assembly
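A possible way to assemble these statements for one variant item is sketched below. The tables further down this page note that chromosome, genomic start and genomic end each need a qualifier pointing to the genomic assembly, so the sketch attaches that qualifier to every coordinate. Only P3329 and P3331 are given on this page; the other properties are referred to by label, and the function name `variant_statements` and all values are placeholders.

```python
# Property IDs given on this page; other properties are keyed by label.
CIVIC_VARIANT_ID = "P3329"
HGVS_NOMENCLATURE = "P3331"


def variant_statements(civic_id, hgvs, chromosome, start, end, assembly):
    """Build statements for a variant; each coordinate carries the assembly qualifier."""

    def located(prop, value):
        return {
            "property": prop,
            "value": value,
            "qualifiers": {"genomic assembly": assembly},
        }

    return [
        {"property": CIVIC_VARIANT_ID, "value": str(civic_id)},
        {"property": HGVS_NOMENCLATURE, "value": hgvs},
        located("chromosome", chromosome),
        located("genomic start", str(start)),
        located("genomic end", str(end)),
    ]


# Placeholder values, not a real variant.
statements = variant_statements(
    0, "HGVS_PLACEHOLDER", "Q_CHROMOSOME", 100, 100, "Q_ASSEMBLY"
)
for statement in statements:
    print(statement)
```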


Pending discussions

Accepted properties

CIViC Properties

Gene Summaries

CIViC property || Wikidata property || Comments
gene_id || - || Needs to be proposed as
gene_civic_url || described at URL ||
name || Item label ||
entrez_id || NCBI gene ID ||
description || - || Wikidata can't store long descriptions

Variant Summaries

CIViC property || Wikidata property || Comments
variant_id || - || Needs to be proposed as CIViC id
variant_civic_url || described at URL ||
gene || part of || Points to an existing Wikidata item
entrez_id || NCBI gene ID ||
variant || Wikidata item ||
variant_type || instance of || manually added sequence ontology terms as Wikidata item
variant_groups || || possible at next model round?
chromosome || chromosome || needs a qualifier pointing to the genomic assembly
start || genomic start || needs a qualifier pointing to the genomic assembly
stop || genomic end || needs a qualifier pointing to the genomic assembly
reference_bases || - || Needs to be proposed and can point to A G T C or sequence of strings?
variant_bases || - || A G T C or sequence of strings?
representative_trans || || possibly at next model round?
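For the bot, the rows of this table that already have a Wikidata property can be kept as a lookup table, with everything else set aside for a later modeling round. A rough sketch, where the mapping is copied from the table above and the names `VARIANT_COLUMN_MAP` and `split_record` are made up for illustration:

```python
# CIViC column -> Wikidata property label, taken from the table above.
# Columns marked "-" or deferred to a later round are left out on purpose.
VARIANT_COLUMN_MAP = {
    "variant_civic_url": "described at URL",
    "gene": "part of",
    "entrez_id": "NCBI gene ID",
    "variant_type": "instance of",
    "chromosome": "chromosome",
    "start": "genomic start",
    "stop": "genomic end",
}


def split_record(record):
    """Split a CIViC record into mapped values and the names of unmapped columns."""
    mapped = {
        VARIANT_COLUMN_MAP[column]: value
        for column, value in record.items()
        if column in VARIANT_COLUMN_MAP
    }
    deferred = sorted(column for column in record if column not in VARIANT_COLUMN_MAP)
    return mapped, deferred


# Placeholder record, for illustration only.
mapped, deferred = split_record(
    {"gene": "Q_GENE", "start": "100", "variant_groups": "GROUP_PLACEHOLDER"}
)
print(mapped)
print(deferred)
```

Returning the unmapped column names makes it easy to see what is still waiting on a property proposal.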

Variant Group Summaries

CIViC property || Wikidata property || Comments
variant_group_id || - || next data model round?
variant_group_civic_url || - || next data model round?
variant_group || - || next data model round?
description || - || next data model round?

Evidence Summaries

CIViC property || Wikidata property || Comments
chromosome || chromosome || needs a qualifier pointing to the genomic assembly
chromosome2 || chromosome || needs a qualifier pointing to the genomic assembly
gene || Wikidata Item ||
entrez_id || NCBI gene ID ||
variant || Wikidata Item ||
disease || Wikidata Item ||
doid || Disease Ontology ID ||
drugs || Wikidata Item [3] ||
evidence_type || ||
evidence_direction || || See discussion
evidence_level || determination method ||
clinical_significance || ||
evidence_statement || ||
pubmed_id || Wikidata Item [4] || or discuss with WikiCite for the best approach
citation || - || redundant. Should be compiled based on the pubmed_id
rating || || isn't the evidence level enough? Possibly next round?
evidence_status || || not needed
evidence_id || - || If it is not an internal id, we should propose it
variant_id || || tbp
gene_id || - || If it is not an internal id, we should propose it
start || genomic start || needs a qualifier pointing to the genomic assembly
stop || genomic end || needs a qualifier pointing to the genomic assembly
reference_bases || - || Needs to be proposed and can point to A G T C or sequence of strings?
variant_bases || - || A G T C or sequence of strings?
representative_transcript || ||
start2 || genomic start || needs a qualifier pointing to the genomic assembly
stop2 || genomic end || needs a qualifier pointing to the genomic assembly
representative_transcript2 || ||
ensembl_version || ||
reference_build || genomic build ||
variant_summary || || text blob
variant_origin || instance of || Wikidata item: e.g. somatic mutation
evidence_civic_url || described at URL ||
variant_civic_url || described at URL ||
gene_civic_url || described at URL ||
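The evidence_type and clinical_significance columns are what would select one of the six predictor claims listed under the modeling conclusions. A hedged sketch of that lookup follows. The claim wording is taken from this page, but the CIViC value strings used as keys ("Predictive", "Sensitivity", and so on) are assumptions about the export and should be checked against the actual data. The function name `claim_for` is made up, and evidence_direction is deliberately ignored because it is still under discussion.

```python
# (evidence_type, clinical_significance) -> predictor claim.
# The key strings are assumed CIViC values, not confirmed on this page.
PREDICTOR_CLAIMS = {
    ("Diagnostic", "Positive"): "is a diagnosis predictor for disease presence",
    ("Diagnostic", "Negative"): "is a diagnosis predictor for disease absence",
    ("Prognostic", "Better Outcome"): "is a prognosis predictor for better outcome",
    ("Prognostic", "Poor Outcome"): "is a prognosis predictor for worse outcome",
    ("Predictive", "Sensitivity"): "is a treatment predictor for drug response",
    ("Predictive", "Resistance or Non-Response"): "is a treatment predictor for drug resistance",
}


def claim_for(evidence_type, clinical_significance):
    """Return the predictor claim for an evidence record, or None if it is not modeled."""
    key = (evidence_type.strip(), clinical_significance.strip())
    return PREDICTOR_CLAIMS.get(key)


print(claim_for("Predictive", "Sensitivity"))
print(claim_for("Predictive", "N/A"))
```

Records that return None would be skipped or logged rather than forced into one of the six claims.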

Discussion points

  • How to model statements with references that contradict a claim made.

Background info

Links

Showcase variants