User:ProteinBoxBot/2016 CIViC sprint
Overall summary
This sprint is in preparation for the CIViC hackathon at NKI in Amsterdam on Dec 1st and 2nd. The aim is to create a data model and do a first import of CIViC data into Wikidata before the hackathon.
Participants
Gameplan
- Identify the data model needed to capture records from CIViC in Wikidata (done)
- Import all PubMed entries used as evidence in CIViC into Wikidata (done)
  - using tools such as QuickStatements, the Wikipedia Tools add-on for Google Sheets, and/or SourceMD
  - in close collaboration with the WikiCite community
- Select 2 more showcase variants to model manually in Wikidata (done)
- Write and run the bot
- Do a full import of the CIViC data from November
- Have fun at the hackathons
  - query
  - integrate with other linked data sources
    - Wikidata
    - other nodes on the semantic web
  - remodel
Showcase variants
- V600E: Wikidata Wikipedia CIViC
- Rs1801133: Wikidata Wikipedia dbSNP
- NM_000492.3(CFTR):c.1521_1523delCTT (p.Phe508delPhe) [1] [2]
First modeling conclusions
- Three models for three clinical uses: diagnostic, prognostic, and predictive. This gives six different claims for six different predictors, i.e. clinically relevant variants for prediction:
  - is a diagnosis predictor for disease presence
  - is a diagnosis predictor for disease absence
  - is a prognosis predictor for better outcome
  - is a prognosis predictor for worse outcome
  - is a treatment predictor for drug response
  - is a treatment predictor for drug resistance
- Include the annotation on disease context using the Wikidata qualifier ‘medical condition treated (P2175)’.
- Represent dispute of a claim using PMIDs as the source of evidence:
  - Use the WD property ‘stated in (P248)’ in the reference for supporting PMIDs
  - Use the WD qualifier ‘statement disputed by (P1310)’ for refuting PMIDs
- Do not include the determination method for the evidence for now (if it is added later, it will be described with the WD property ‘determination method (P459)’).
- Do not create new items for multi-drug therapies until such a therapy has stabilized into a named entity, like HAART therapy for HIV.
- Do not include the drug interaction type annotation for drug combinations for now, unless the drug combination is a stable, known treatment. In other words, do not add therapeutic context to ‘variant - [disease/drug]’ statements.
- Do not add mutational profile context for now (CIViC does not currently seem to hold structured data on it).
- Include the annotation on variant type and on variant origin (the latter in the evidence annotation) using the Wikidata property ‘instance of (P31)’. Variant groups are left out for now.
- Include the source of the claim using the WD property ‘reference URL’, with the URL of the variant summary as object. The summary contains all the claims and evidence for the variant, which avoids redundancy.
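The conclusions above can be sketched as a small data structure. This is only an illustration in plain Python, not the bot's actual code: the six `P_TBD_...` identifiers are hypothetical placeholders (the predictor properties were still under discussion), the `Q_...` values and the example URL are made up, and attaching the reference URL to every supporting reference is an assumption about how the two reference properties would be combined.

```python
# Existing Wikidata properties named in the conclusions above.
MEDICAL_CONDITION_TREATED = "P2175"
STATEMENT_DISPUTED_BY = "P1310"
STATED_IN = "P248"
REFERENCE_URL = "P854"

# HYPOTHETICAL placeholders: the six predictor properties were still pending,
# so these are not real Wikidata property IDs.
PREDICTOR_PROPERTY = {
    "disease presence": "P_TBD_DIAGNOSIS_PRESENCE",
    "disease absence": "P_TBD_DIAGNOSIS_ABSENCE",
    "better outcome": "P_TBD_PROGNOSIS_BETTER",
    "worse outcome": "P_TBD_PROGNOSIS_WORSE",
    "drug response": "P_TBD_TREATMENT_RESPONSE",
    "drug resistance": "P_TBD_TREATMENT_RESISTANCE",
}


def build_statement(variant, predictor, value, disease,
                    supporting, refuting, summary_url):
    """Return a plain-dict sketch of one 'variant - [disease/drug]' statement."""
    if predictor not in PREDICTOR_PROPERTY:
        raise ValueError(f"unknown predictor kind: {predictor!r}")
    # Disease context and refuting papers are qualifiers on the statement.
    qualifiers = [(MEDICAL_CONDITION_TREATED, disease)]
    qualifiers += [(STATEMENT_DISPUTED_BY, paper) for paper in refuting]
    # Assumption: one reference per supporting paper, each also carrying
    # the URL of the CIViC variant summary.
    references = [
        [(STATED_IN, paper), (REFERENCE_URL, summary_url)]
        for paper in supporting
    ]
    return {
        "subject": variant,
        "property": PREDICTOR_PROPERTY[predictor],
        "value": value,
        "qualifiers": qualifiers,
        "references": references,
    }


# Made-up item identifiers, for illustration only.
example = build_statement(
    variant="Q_VARIANT",
    predictor="drug resistance",
    value="Q_DRUG",
    disease="Q_DISEASE",
    supporting=["Q_PAPER_A"],
    refuting=["Q_PAPER_B"],
    summary_url="https://example.org/variant-summary",
)
```

Keeping supporting papers in references and refuting papers in qualifiers mirrors the split proposed above, so a single statement can carry both kinds of evidence.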
List of properties used:
- Existing properties used:
- statement disputed by (P1310)
- stated in (P248)
- Reference URL (P854)
- medical condition treated (P2175)
Apart from the model for the data and evidence representation, we will need to include:
- Part of (for the gene association)
- Chromosome
- Genomic start
- Genomic end
- CIViC variant ID (P3329)
- Instance of (variant type, e.g. missense, frameshift, etc.)
- HGVS nomenclature (P3331)
- Genomic assembly
Pending discussions
- positive diagnostic predictor
- negative diagnostic predictor
- positive therapeutic predictor
- negative therapeutic predictor
- positive prognostic predictor
- negative prognostic predictor
Accepted properties
- Property:P3329 (Discussion: CIViC variant ID)
- Property:P3331 (Discussion: HGVS nomenclature)
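Once the two accepted properties are in use, imported variants could be listed with a SPARQL query. The sketch below only builds the query text and a request URL; it does not contact any server. The function names are hypothetical, and the choice of returned variables is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def civic_variant_query(limit=10):
    """Build a SPARQL query for items that carry a CIViC variant ID (P3329).

    The HGVS nomenclature (P3331) is optional, since not every item
    necessarily has it.
    """
    return (
        "SELECT ?item ?civicId ?hgvs WHERE {\n"
        "  ?item wdt:P3329 ?civicId .\n"
        "  OPTIONAL { ?item wdt:P3331 ?hgvs . }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(limit=10):
    """Return a GET URL for the query, asking for JSON results."""
    params = {"query": civic_variant_query(limit), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


print(civic_variant_query(5))
```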
CIViC Properties
Gene Summaries
CIViC property | Wikidata property | Comments |
---|---|---|
gene_id | - | Needs to be proposed as |
gene_civic_url | described at URL | |
name | Item label | |
entrez_id | NCBI gene ID | |
description | - | Wikidata can't store long descriptions |
Variant Summaries
CIViC property | Wikidata property | Comments |
---|---|---|
variant_id | - | Needs to be proposed as CIViC id |
variant_civic_url | described at URL | |
gene | part of | Points to an existing wikidata item |
entrez_id | NCBI gene ID | |
variant | Wikidata item | |
variant_type | instance of | manually added sequence ontology terms as wikidata Item |
variant_groups | possible at next model round? | |
chromosome | chromosome | needs a qualifier pointing to the genomic assembly |
start | genomic start | needs a qualifier pointing to the genomic assembly |
stop | genomic end | needs a qualifier pointing to the genomic assembly |
reference_bases | - | Needs to be proposed and can point to A G T C or sequence of strings? |
variant_bases | - | A G T C or sequence of strings? |
representative_trans | possibly at next model round? | |
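The variant mapping above could be applied to one CIViC record roughly as follows. This is a minimal sketch, not the bot's implementation: properties are referred to by the labels used in the table (only P3329 is given as an ID on this page), the function and field handling are hypothetical, and the sample record holds made-up values.

```python
# CIViC fields whose Wikidata statements need a genomic assembly qualifier,
# mapped to the Wikidata property labels from the table above.
ASSEMBLY_QUALIFIED = {
    "chromosome": "chromosome",
    "start": "genomic start",
    "stop": "genomic end",
}


def variant_to_claims(record, gene_item, assembly_item):
    """Turn one CIViC variant record (a dict) into a list of claim dicts."""
    claims = [
        {"property": "CIViC variant ID (P3329)",
         "value": str(record["variant_id"]), "qualifiers": {}},
        {"property": "described at URL",
         "value": record["variant_civic_url"], "qualifiers": {}},
        # 'gene' points to an existing Wikidata item, passed in by the caller.
        {"property": "part of", "value": gene_item, "qualifiers": {}},
    ]
    for civic_field, wd_label in ASSEMBLY_QUALIFIED.items():
        value = record.get(civic_field)
        if value in (None, ""):
            continue  # skip coordinates the record does not provide
        claims.append({
            "property": wd_label,
            "value": str(value),
            "qualifiers": {"genomic assembly": assembly_item},
        })
    return claims


# Made-up record and item identifiers, for illustration only.
sample = {
    "variant_id": 0,
    "variant_civic_url": "https://example.org/variant/0",
    "chromosome": "1",
    "start": 100,
    "stop": "",
}
claims = variant_to_claims(sample, gene_item="Q_GENE", assembly_item="Q_ASSEMBLY")
for claim in claims:
    print(claim)
```

Fields marked "-" or deferred to a later model round in the table (reference_bases, variant_bases, variant_groups) are left out on purpose.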
Variant Group Summaries
CIViC property | Wikidata property | Comments |
---|---|---|
variant_group_id | - | next data model round? |
variant_group_civic_url | - | next data model round? |
variant_group | - | next data model round? |
description | - | next data model round? |
Evidence Summaries
CIViC property | Wikidata property | Comments |
---|---|---|
chromosome | chromosome | needs a qualifier pointing to the genomic assembly |
chromosome2 | chromosome | needs a qualifier pointing to the genomic assembly |
gene | Wikidata Item | |
entrez_id | NCBI gene ID | |
variant | Wikidata Item | |
disease | Wikidata Item | |
doid | Disease Ontology ID | |
drugs | Wikidata Item | [3] |
evidence_type | | |
evidence_direction | See discussion | |
evidence_level | determination method | |
clinical_significance | | |
evidence_statement | | |
pubmed_id | Wikidata Item | [4] or discuss with WikiCite for the best approach |
citation | - | redundant. Should be compiled based on the pubmed_id |
rating | isn't the evidence level enough? Possibly next round? | |
evidence_status | not needed | |
evidence_id | - | If it is not an internal id, we should propose it |
variant_id | tbp | |
gene_id | - | If it is not an internal id, we should propose it |
start | genomic start | needs a qualifier pointing to the genomic assembly |
stop | genomic end | needs a qualifier pointing to the genomic assembly |
reference_bases | - | Needs to be proposed and can point to A G T C or sequence of strings? |
variant_bases | - | A G T C or sequence of strings? |
representative_transcript | | |
start2 | genomic start | needs a qualifier pointing to the genomic assembly |
stop2 | genomic end | needs a qualifier pointing to the genomic assembly |
representative_transcript2 | | |
ensembl_version | | |
reference_build | genomic build | |
variant_summary | text blob | |
variant_origin | instance of | Wikidata item: e.g. somatic mutation |
evidence_civic_url | described at URL | |
variant_civic_url | described at URL | |
gene_civic_url | described at URL |
Discussion points
- How to model statements with references that contradict a claim made.
Background info
Links
- Data model: [5]
- Drugs mappings [6]
- Model variant: V600E
- api: [7]
- Data: Gene summaries [8] Gene summaries (Google doc)
- Variant summaries: Source, Variant summaries (Google doc)
- Variant groups: Source
- Evidence: Source, Evidence summaries (Google doc)
- Data model discussions: [9] [10]
- Data model final discussion [11]