User:ProteinBoxBot/2015 clinvar sprint

Overall summary

In a short sprint we modeled a ClinVar entry, with the aim of importing ClinVar records as Wikidata entries to facilitate gene-variant-disease associations.

Participants

Wikidata template

-- geneX hasDeletionAssociation DiseaseY
                    --> Reference to Variant
                                             --> Variant causalVariantForDisease DiseaseY
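The template above can be read as a gene-level claim whose reference is itself a variant-level claim. A minimal Python sketch of that nesting follows; the `Statement` class, the `build_template` helper, and the `variantV` identifier are hypothetical names used only for illustration, and the property names are the placeholders from the template, not existing Wikidata properties.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """One subject-property-value claim, optionally backed by references."""
    subject: str
    prop: str
    value: str
    references: List["Statement"] = field(default_factory=list)


def build_template(gene: str, variant: str, disease: str) -> Statement:
    # Variant-level claim: the variant is causal for the disease.
    variant_claim = Statement(variant, "causalVariantForDisease", disease)
    # Gene-level claim, carrying the variant-level claim as its reference.
    return Statement(gene, "hasDeletionAssociation", disease,
                     references=[variant_claim])


# "variantV" is a placeholder; geneX and DiseaseY come from the template.
claim = build_template("geneX", "variantV", "DiseaseY")
```

Keeping the variant claim as a reference means the gene-disease association can always be traced back to the variant that supports it.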

Gameplan

  1. Inspect the example entry below to identify the data elements the bot should add
  2. Inspect the variant types defined by ClinVar and consider their relevance for our property structure
  3. List the required properties
  4. Write and run the bot

Candidate Properties

Potential new properties

  • ClinVar accession number

Discussion points

Choice for cause of (P1542)

We chose to model the variant-disease relationship with the "cause of" property (P1542), although we know it is too broad. However, when it is used in combination with "instance of" (P31) statements such as "instance of deletion mutation" (Q19888172 P31 Q656732) and "instance of genetic variation" (Q19888172 P31 Q349856), we are able to extract the variant-disease relationships.
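The filtering idea can be sketched in Python over plain (subject, property, value) triples: keep a "cause of" claim only when its subject is also an instance of one of the variant classes. The function name, the in-memory triple format, and the `DISEASE_Y`, `OTHER_ITEM`, and `SOMETHING_ELSE` identifiers are assumptions for illustration; reading Q19888172 as the variant item is also an assumption based on the text above.

```python
INSTANCE_OF = "P31"
CAUSE_OF = "P1542"
# Classes named in the discussion above.
VARIANT_CLASSES = {"Q656732", "Q349856"}


def variant_disease_pairs(triples):
    """Return sorted (variant, disease) pairs from (subject, property, value) triples."""
    # Subjects typed as one of the variant classes.
    variants = {s for s, p, v in triples
                if p == INSTANCE_OF and v in VARIANT_CLASSES}
    # "cause of" claims are kept only when the subject is a known variant.
    return sorted((s, v) for s, p, v in triples
                  if p == CAUSE_OF and s in variants)


triples = [
    ("Q19888172", "P31", "Q656732"),
    ("Q19888172", "P1542", "DISEASE_Y"),         # placeholder disease ID
    ("OTHER_ITEM", "P1542", "SOMETHING_ELSE"),   # non-variant "cause of" claim
]
pairs = variant_disease_pairs(triples)
```

The second "cause of" claim is dropped because its subject carries no variant type, which is how the broadness of the property is worked around.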

Dealing with changed proteins

Representing the changed proteins in Wikidata adds complexity to the process, so we would like to postpone that for now.

Background info

Links

Variant types defined in ClinVar

  • confers resistance
  • confers sensitivity
  • variant to named protein
  • variation in modifier gene to disease
  • variation to disease
  • variation to included disease

Example entry

     <Measure Type="Deletion" ID="22144">
       <Name>
         <ElementValue Type="Preferred">NM_000492.3(CFTR):c.1521_1523delCTT (p.Phe508delPhe)</ElementValue>
       </Name>
       <Name>
         <ElementValue Type="Alternate">deltaF508</ElementValue>
       </Name>
       <Name>
         <ElementValue Type="Alternate">DF508</ElementValue>
       </Name>
       <AttributeSet>
         <Attribute Type="HGVS, coding, RefSeq" Change="c.1521_1523delCTT">NM_000492.3:c.1521_1523delCTT</Attribute>
         <XRef ID="F508del" DB="CFTR2"/>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="HGVS, genomic, RefSeqGene" Change="g.98809_98811delCTT">NG_016465.3:g.98809_98811delCTT</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="HGVS, genomic, top level" Change="g.117559592_117559594delCTT" integerValue="38">NC_000007.14:g.117559592_117559594delCTT</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="HGVS, genomic, top level, previous" Change="g.117199646_117199648delCTT" integerValue="37">NC_000007.13:g.117199646_117199648delCTT</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="HGVS, previous" Change="g.84630_84632delCTT">NG_016465.1:g.84630_84632delCTT</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="HGVS, protein" Change="p.Phe508delPhe">NP_000483.3:p.Phe508delPhe</Attribute>
         <XRef Type="rs" ID="113993960 " DB="dbSNP"/>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="HGVS, protein, RefSeq" Change="p.Phe508del">NP_000483.3:p.Phe508del</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="Location">NM_000492.3:Ex11</Attribute>
         <XRef ID="CD890142" DB="HGMD"/>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="Location">NM_000492.3:EXON 11</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="MolecularConsequence">inframe_variant</Attribute>
         <XRef ID="SO:0001650" DB="Sequence Ontology"/>
         <XRef ID="NM_000492.3:c.1521_1523delCTT" DB="RefSeq"/>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="ProteinChange1LetterCode">F508del</Attribute>
         <XRef Type="Allelic variant" ID="602421.0001" DB="OMIM"/>
         <XRef ID="F508del" DB="CFTR2"/>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="ProteinChange1LetterCode">F508delF</Attribute>
       </AttributeSet>
       <AttributeSet>
         <Attribute Type="ProteinChange3LetterCode">PHE508DEL</Attribute>
         <XRef Type="Allelic variant" ID="602421.0001" DB="OMIM"/>
       </AttributeSet>
       <CytogeneticLocation>7q31.2</CytogeneticLocation>
       <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="7" Accession="NC_000007.13" start="1171.....>
       <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.26" AssemblyStatus="current" Chr="7" Accession="NC_000007.14" start="1175.......>
       <MeasureRelationship Type="variant in gene">
         <Name>
           <ElementValue Type="Preferred">cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7)</ElementValue>
         </Name>
         <Symbol>
           <ElementValue Type="Preferred">CFTR</ElementValue>
         </Symbol>
         <SequenceLocation Assembly="GRCh37" AssemblyAccessionVersion="GCF_000001405.25" AssemblyStatus="previous" Chr="7" Accession="NC_000007.13" start="....../>
         <SequenceLocation Assembly="GRCh38" AssemblyAccessionVersion="GCF_000001405.26" AssemblyStatus="current" Chr="7" Accession="NC_000007.14" start="....../>
         <XRef ID="1080" DB="Gene"/>
         <XRef Type="MIM" ID="602421" DB="OMIM"/>
       </MeasureRelationship>
       <Citation Type="practice guideline" Abbrev="ACMG, 2004">
         <ID Source="pmc">3110945</ID>
       </Citation>
       <Citation Type="practice guideline" Abbrev="ACMG/ACOG, 2001">
         <ID Source="PubMed">11280952</ID>
       </Citation>
       <Citation Type="practice guideline" Abbrev="CPIC, 2014">
         <ID Source="PubMed">24598717</ID>
       </Citation>
       <XRef Type="Allelic variant" ID="602421.0001" DB="OMIM"/>
       <XRef Type="rs" ID="113993960" DB="dbSNP"/>
     </Measure>
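
As a starting point for step 1 of the gameplan, the data elements in such an entry could be pulled out with Python's standard `xml.etree.ElementTree`. This is only a sketch: `SAMPLE` is a trimmed copy of the entry above, and `extract_fields` and its output keys are hypothetical names, not part of any existing bot.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example entry above; only a few elements are kept.
SAMPLE = """
<Measure Type="Deletion" ID="22144">
  <Name>
    <ElementValue Type="Preferred">NM_000492.3(CFTR):c.1521_1523delCTT (p.Phe508delPhe)</ElementValue>
  </Name>
  <AttributeSet>
    <Attribute Type="HGVS, coding, RefSeq" Change="c.1521_1523delCTT">NM_000492.3:c.1521_1523delCTT</Attribute>
  </AttributeSet>
  <MeasureRelationship Type="variant in gene">
    <Symbol>
      <ElementValue Type="Preferred">CFTR</ElementValue>
    </Symbol>
    <XRef ID="1080" DB="Gene"/>
  </MeasureRelationship>
  <XRef Type="rs" ID="113993960" DB="dbSNP"/>
</Measure>
"""


def extract_fields(xml_text):
    """Collect candidate data elements from one <Measure> element."""
    measure = ET.fromstring(xml_text.strip())
    # HGVS expressions, keyed by their Type attribute.
    hgvs = {a.get("Type"): a.text
            for a in measure.findall("AttributeSet/Attribute")
            if a.get("Type", "").startswith("HGVS")}
    # Direct-child XRefs only; XRefs nested in AttributeSet are not collected.
    xrefs = {x.get("DB"): x.get("ID").strip() for x in measure.findall("XRef")}
    return {
        "variant_type": measure.get("Type"),
        "measure_id": measure.get("ID"),
        "preferred_name": measure.findtext("Name/ElementValue[@Type='Preferred']"),
        "gene_symbol": measure.findtext(
            "MeasureRelationship/Symbol/ElementValue[@Type='Preferred']"),
        "gene_id": measure.find("MeasureRelationship/XRef[@DB='Gene']").get("ID"),
        "hgvs": hgvs,
        "xrefs": xrefs,
    }


fields = extract_fields(SAMPLE)
```

The `Type` attribute of `Measure` ("Deletion" here) is the value that would decide which "instance of" class the variant item receives.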