Wikidata:Property proposal/according to


stated in source according to

Originally proposed at Wikidata:Property proposal/Generic

Description: to be used together with P248 if the statement is taken from an aggregator rather than directly from the source
Data type: Item
Domain: references
Allowed values: subclasses of database (Q8513)
Example 1: stated in (P248): Doppler tomography of transiting exoplanets: a prograde, low-inclined orbit for the hot Jupiter CoRoT-11b (Q59246784); stated in source according to: Extrasolar Planets Encyclopaedia (Q1385430)
Example 2: stated in (P248): Transit timing observations from Kepler. IX. Catalog of the full long-cadence data set (Q59712406); stated in source according to: Exoplanet Archive (Q5420639)
Example 3: stated in (P248): Gaia Data Release 3 (Q66061041); stated in source according to: SIMBAD (Q654724)
Planned use: decorate stated in (P248) references and update statements that were corrected in the aggregator

Motivation

I mainly work with astronomical data. The field moves quickly, so data soon becomes outdated and needs to be updated regularly. Unfortunately, extracting information directly from the text of scientific articles (even using ChatGPT) still seems impractical, so one has to rely on astronomical databases. If we narrow the scope to, for example, exoplanets, there are 2.5 major databases: Extrasolar Planets Encyclopaedia (Q1385430), Exoplanet Archive (Q5420639) and (sometimes) SIMBAD (Q654724).

Any sufficiently large database contains errors. Consider, for example, the first statement from here: CoRoT-11 b (Q9184117), mass (P2067): 2.33 ± 0.27. It is currently supported by three references:

  1. CoRoT: Harvest of the exoplanet program (Q56168679): Gandolfi et al. 2013
  2. Doppler tomography of transiting exoplanets: a prograde, low-inclined orbit for the hot Jupiter CoRoT-11b (Q59246784): Gandolfi et al. 2012
  3. Extrasolar Planets Encyclopaedia (Q1385430): CoRoT-11b

Although they seem "equal", all three were in fact obtained by parsing the third reference, which cites the first two. The problem is that if you open the corresponding NASA Exoplanet Archive page, you will see that the estimate attributed to the second article (Gandolfi et al. 2012) is slightly higher: 2.49 ± 0.27 (check the table on page 3).

I can write a SPARQL query to identify problematic statements (the same subject and predicate, the same source, but different values):

SELECT * WITH { SELECT ?item ?source {
  VALUES ?item { wd:Q9184117 } # Limit to CoRoT-11b for demo purposes
  ?item p:P2067/prov:wasDerivedFrom/pr:P248 ?source
  MINUS { VALUES ?source { wd:Q1385430 wd:Q5420639 wd:Q654724 } } # Exclude aggregators
} GROUP BY ?item ?source HAVING(COUNT(*) > 1)} AS %Q {
  INCLUDE %Q
  ?item p:P2067[psv:P2067[wikibase:quantityAmount ?value; wikibase:quantityUpperBound ?upper]; prov:wasDerivedFrom/pr:P248 ?source]
}
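The same check can be sketched offline in Python to show what the proposed hint would add. This is only an illustration under stated assumptions: the records are hand-written tuples built from the values quoted above, and the `aggregator` field is a hypothetical stand-in for the proposed "according to" value, which does not exist in Wikidata references today.

```python
from collections import defaultdict

# Hypothetical flattened references: (item, stated_in, mass, aggregator).
# "aggregator" stands in for the proposed "according to" hint.
RECORDS = [
    ("Q9184117", "Q59246784", "2.33", "Q1385430"),  # via Extrasolar Planets Encyclopaedia
    ("Q9184117", "Q59246784", "2.49", "Q5420639"),  # via Exoplanet Archive
    ("Q9184117", "Q56168679", "2.33", "Q1385430"),  # via Extrasolar Planets Encyclopaedia
]


def find_conflicts(records):
    """Return {(item, source): {value: [aggregators]}} for sources backing more than one value."""
    grouped = defaultdict(lambda: defaultdict(list))
    for item, source, value, aggregator in records:
        grouped[(item, source)][value].append(aggregator)
    # Keep only the (item, source) pairs that are attached to several distinct values.
    return {key: dict(values) for key, values in grouped.items() if len(values) > 1}


conflicts = find_conflicts(RECORDS)
for (item, source), values in conflicts.items():
    print(item, source, values)
```

With the aggregator recorded, each conflicting value can be traced back to the database it was parsed from, which is the step that is not possible with stated in (P248) alone.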

Unfortunately, there is no easy way to identify which aggregator those statements were extracted from (e.g. see the references for that statement). That is why I want to be able to "decorate" a stated in (P248) reference with an additional "according to" hint. Ghuron (talk) 17:00, 7 November 2023 (UTC)

Discussion