Wikidata:Property proposal/risk factor


risk factor

Originally proposed at Wikidata:Property proposal/Natural science

   Done: risk factor (P5642) (Talk and documentation)
Description: This relation highlights which factors are associated with a high prevalence of a particular gene, disease or characteristic. These factors can be country of origin, country of citizenship, race, gender, occupation, anamnesis, etc. Further information can be found in
Data type: Item
Example 1: myocardial infarction (Q12152) → smoking (Q662860)
    has effect (P1542) mortality (Q1239812)
Example 2: hepatitis C (Q154869) → Egypt (Q79)
    criterion used (P1013) residence (Q699405)
    has effect (P1542) incidence (Q217690)
Example 3: myocardial infarction (Q12152) → male (Q6581097)
    criterion used (P1013) gender (Q48277)
    has effect (P1542) incidence (Q217690)
Example 4: lactic acidosis (Q1500373) → metformin (Q19484)
    criterion used (P1013) treatment (Q179661)
    has effect (P1542) incidence (Q217690)
Source: PubMed articles
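
As an illustration only, the four examples above can be written down as plain Python records, which makes it easy to see which qualifiers each statement carries. The class name `RiskFactorStatement` and the helper `missing_qualifiers` are hypothetical names chosen for this sketch; they are not part of Wikidata or of any tool mentioned in the proposal. The item and property IDs are copied from the examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class RiskFactorStatement:
    """One risk factor (P5642) claim plus its qualifiers (hypothetical container)."""
    subject: str                      # item the claim is about, e.g. "Q12152"
    risk_factor: str                  # value of P5642, e.g. "Q662860"
    qualifiers: Dict[str, str] = field(default_factory=dict)  # property ID -> item ID


# The four examples from the proposal, with the qualifiers listed there.
EXAMPLES = [
    RiskFactorStatement("Q12152", "Q662860", {"P1542": "Q1239812"}),
    RiskFactorStatement("Q154869", "Q79", {"P1013": "Q699405", "P1542": "Q217690"}),
    RiskFactorStatement("Q12152", "Q6581097", {"P1013": "Q48277", "P1542": "Q217690"}),
    RiskFactorStatement("Q1500373", "Q19484", {"P1013": "Q179661", "P1542": "Q217690"}),
]


def missing_qualifiers(
    stmt: RiskFactorStatement,
    expected: Sequence[str] = ("P1013", "P1542"),
) -> List[str]:
    """Return the expected qualifier properties that the statement does not carry."""
    return [prop for prop in expected if prop not in stmt.qualifiers]


if __name__ == "__main__":
    for number, stmt in enumerate(EXAMPLES, start=1):
        print(number, stmt.subject, stmt.risk_factor, missing_qualifiers(stmt))
```

Treating criterion used (P1013) and has effect (P1542) as the expected qualifiers is an assumption of this sketch; the proposal itself does not state which qualifiers are mandatory.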

@علاء, Ebrahim, *Youngjin, -revi, Addshore, Ajraddatz:

@Arkanosis, ChristianKl, Ladsgroup, Mahir256, Mbch331, Nikki:

@Okkn, Pamputt, Romaine, Sannita, Stryn:

Please create the property as the proposal is currently ready.


WikiProject Medicine has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.


Instead of merely 'high prevalence of a particular gene or disease', how about 'high prevalence of a particular gene or disease or characteristic'? MaynardClark (talk) 14:13, 1 August 2018 (UTC)
MaynardClark: Excellent idea. Fixed. --Csisc (talk) 12:48, 2 August 2018 (UTC)
This is an evidence-based resource. What is/are the (verifiable threshold) criteria for "high prevalence"? For the examples given, could such sourcing be provided? Soupvector (talk) 16:38, 1 August 2018 (UTC)
Soupvector: Whether a prevalence counts as high is relative. For one disease, 0.01 can be a high prevalence rate; for another, 0.5 can be a medium prevalence rate. That is why we will depend on explicit statements of high prevalence in the biomedical scientific literature for this purpose. For hepatitis C (Q154869) → Egypt (Q79), the used reference can be --Csisc (talk) 13:36, 2 August 2018 (UTC)
Comment What about a prevalence property - perhaps providing a list of countries and quantitative prevalence values as a Commons table? ArthurPSmith (talk) 20:30, 1 August 2018 (UTC)
And I see we already have prevalence (P1193) - this can be used now to express this with appropriate qualifiers for country etc. ArthurPSmith (talk) 20:31, 1 August 2018 (UTC)
ArthurPSmith: prevalence (P1193) is a quantitative property that gives the prevalence rate of a disease or a characteristic in general or in a given country. However, the proposed property is a qualitative one that returns the list of countries that are known to have a high prevalence rate of a particular disease, gene or characteristic. The two properties are therefore different. The latter is easier to identify automatically from the scientific literature. --Csisc (talk) 13:40, 2 August 2018 (UTC)
How so? If you have the data available for prevalence by country, sources and hard numbers and all, shouldn't it be possible to derive which countries have the highest prevalence from that? Wouldn't this be redundant data? --Yair rand (talk) 03:33, 7 August 2018 (UTC)
Yair rand: That is absolutely accurate. However, it is difficult to extract data about the prevalence of all diseases in all countries due to the lack of needed resources. That is why I proposed this new property, whose values can be easily extracted from medical bibliographic databases. I have already developed Python code for that. --Csisc (talk) 09:16, 7 August 2018 (UTC)
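
To make the redundancy question concrete, here is a minimal sketch of how a "high prevalence" list could be derived when quantitative prevalence (P1193) values are available for every region. It is not the Python code mentioned in the reply above, which is not shown in this thread. The function name, the rule "at least `ratio` times the median", and the sample numbers are all assumptions made up for illustration; the regions are placeholders, not real data.

```python
from statistics import median
from typing import Dict, List


def high_prevalence_regions(prevalence_by_region: Dict[str, float],
                            ratio: float = 2.0) -> List[str]:
    """Return regions whose prevalence is at least `ratio` times the median.

    The cutoff is relative to the other regions rather than absolute,
    because what counts as "high" differs from one disease to another.
    """
    if not prevalence_by_region:
        return []
    cutoff = ratio * median(prevalence_by_region.values())
    return sorted(region for region, value in prevalence_by_region.items()
                  if value >= cutoff)


# Placeholder values only; these are NOT real prevalence figures.
SAMPLE = {"Region A": 0.010, "Region B": 0.012, "Region C": 0.011, "Region D": 0.150}

if __name__ == "__main__":
    print(high_prevalence_regions(SAMPLE))
```

The sketch only works when a complete table of prevalence values exists, which is exactly the data that the reply above says is hard to obtain for all diseases and all countries.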

Comment We cannot always get precise prevalence (P1193), so this kind of property is very useful and meaningful to describe epidemiology of diseases and other medical entities. However, is there any reason why you only focus on countries, @Csisc? There are many other factors associated with diseases, such as race, gender, occupation, anamnesis, etc. It seems like it would be better to expand the scope of the value of this property to "any factors" that are not based on a clear cause-and-effect relation. If that is the case, I will positively support this proposal. --Okkn (talk) 02:23, 9 August 2018 (UTC)

Okkn: Excellent idea. Fixed. --Csisc (talk) 15:43, 10 August 2018 (UTC)
 Support Ok, “risk factor” seems good. --Okkn (talk) 02:28, 11 August 2018 (UTC)
Blue Rasberry
A risk factor is a characteristic you have that leaves you more exposed to a disease. In the example you gave, lactic acidosis (Q1500373) (effect) is an adverse effect (Q2047938) of metformin (Q19484) (drug). --Csisc (talk) 16:42, 11 August 2018 (UTC)
@Csisc: In this example, is taking metformin a risk factor for lactic acidosis? Not everyone experiences the adverse effect, but this risk factor property applies to all use of the drug, does it not? Blue Rasberry (talk) 17:05, 11 August 2018 (UTC)
Blue Rasberry: This is absolutely accurate. I added this as an example to the property proposal. --Csisc (talk) 18:36, 11 August 2018 (UTC)
 Support @Csisc: Wow, this can get complicated, but I might guess that perhaps 50% of all medical papers talk about this. Blue Rasberry (talk) 18:42, 11 August 2018 (UTC)
Maybe the criterion is treatment (Q179661)? Blue Rasberry (talk) 18:44, 11 August 2018 (UTC)
Blue Rasberry: Thank you for your support of the proposal and for your advice concerning the criterion; it is very useful. As for the complexity of extracting risk factors and adding them to Wikidata, I am well aware of it. However, I will find a simple method to make the work easier. You are welcome to join this effort if you like. --Csisc (talk) 19:11, 11 August 2018 (UTC)
  • Comment Somehow it seems odd to link it to citizenship. Shouldn't it be residence?
    --- Jura 15:12, 11 August 2018 (UTC)
Jura: Of course. However, I did not find "Country of residence" as a Wikidata property. --Csisc (talk) 16:46, 11 August 2018 (UTC)
Jura: Useful information. Fixed. Thank you. --Csisc (talk) 18:36, 11 August 2018 (UTC)
@Jura1, Csisc: criterion used (P1013) seems redundant in these cases to me. Is it really needed for these statements? --Okkn (talk) 03:14, 15 August 2018 (UTC)
Okkn: Excellent question. Risk factor is mostly a transitive relation. For example, if we say that a risk factor for hepatitis C is Egypt, most users will ask whether it is Egypt as a residence, as a country of birth or as a visited country. That is why we absolutely have to use criterion used (P1013). --Csisc (talk) 09:35, 15 August 2018 (UTC)
@Csisc: In many cases, we cannot distinguish an environmental factor (as a residence or a visited country) from a genetic factor (as a country of birth). Do your examples clearly refer to countries as residences? --Okkn (talk) 09:49, 15 August 2018 (UTC)
Okkn: Of course. For hepatitis C (Q154869) → Egypt (Q79) as a country of residence, you can see --Csisc (talk) 10:35, 15 August 2018 (UTC)
In addition, diseases have risk factors not only for their onset, but also for their prognosis. I think P1013 should be used to specify what the risk factor is for. --Okkn (talk) 09:59, 15 August 2018 (UTC)
Okkn: Of course. However, I think that has effect (P1542) is better for such situations. Fixed. --Csisc (talk) 10:35, 15 August 2018 (UTC)
  • The examples given do not ring true. "Residence in France is a risk factor for heart failure"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:30, 12 August 2018 (UTC)
    • Andy Mabbett: Fixed. Eliminated example. The example here is not important. The most important fact to consider is the property itself. --Csisc (talk) 10:04, 13 August 2018 (UTC)
      • @Pigsonthewing: A risk factor is not a cause. "Residence in France is a risk factor for heart failure" means that those who live in France have a higher prevalence of heart failure (probably because of genetic or environmental factors). It does not always mean that living in France leads to heart failure. @Csisc: Was the example really wrong? --Okkn (talk) 03:14, 15 August 2018 (UTC)
      • @Pigsonthewing, Okkn: After a review of the literature, I found that France as a country of residence is an old and controversial risk factor for myocardial infarction that is no longer valid. That is why I dropped it. --Csisc (talk) 10:19, 15 August 2018 (UTC)
      • My concern remains; equally with the other examples. Without additional detail, such as in the examples at en:Risk factor#Terms of description, they tell us nothing useful. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:40, 15 August 2018 (UTC)
Andy Mabbett: This is not accurate. Practitioners will not be interested in the relative risk or the hazard ratio of a risk factor, which may differ from one study to another. Their main concern is to have an exhaustive list of risk factors they can easily use for prevention. --Csisc (talk) 10:56, 15 August 2018 (UTC)
  • Comment, I think it's essential that every such statement is backed up by literature and I would like to see this enforced (via constraints, of course)... there is a lot of simple stuff here (not so controversial statements), but other statements will be time bound (changing health policies) or just outright controversial (think wine and anything that causes and prevents cancer). Can expected qualifiers please be incorporated into the proposal? --Egon Willighagen (talk) 05:45, 13 August 2018 (UTC)
    • Egon Willighagen: I am working on that. If you have ideas, you can be a co-author of my work. I will post the method of bibliometric-enhanced retrieval of risk factors after the acceptance of the proposal. --Csisc (talk) 10:04, 13 August 2018 (UTC)
  •  Support This is what it is called in most papers. Using the usual vocabulary will make Wikidata more intuitive. -- Thibdx (talk) 14:29, 13 August 2018 (UTC)
  • Comment This feature will only be as useful as it is feasible to populate reliably. I find the examples not to be compelling, the statement "Biomedical relations with this property can be easily retrieved with references using PubMed Entrez API" seems out of step with our usual approach to sourcing, and I feel that a clear and sustainable plan compliant with a standard like WP:MEDRS should be coupled to this in order for it to succeed. Soupvector (talk) 01:52, 15 August 2018 (UTC)
Soupvector: This is excellent and very useful feedback. After the property is created, I will hold several Skype meetings in which I show the method to be used for the automatic extraction of risk factors. I will soon announce them to interested users on the mailing lists of Wikidata and of Wiki Project Med. If you like, you can participate in one of them and share your opinions with me. --Csisc (talk) 10:56, 15 August 2018 (UTC)
I expressed a need for a process with clearly articulated standards of evidence, which (IMHO) is essential. I do see the value in having a database that could inform inferences about epidemiology - but our standards of evidence need to be clear and firm. Soupvector (talk) 21:55, 15 August 2018 (UTC)
Soupvector: I agree. What would you propose to address this? --Csisc (talk) 09:32, 16 August 2018 (UTC)
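
For readers unfamiliar with the PubMed Entrez API referred to in this thread, the sketch below builds a search URL for the NCBI E-utilities `esearch` endpoint. It is not the extraction method announced above, which is not described here. The function name `build_risk_factor_query` and the choice of search term (the disease label in the title or abstract, combined with the MeSH heading "risk factors") are assumptions for illustration. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_risk_factor_query(disease_label: str, max_results: int = 20) -> str:
    """Build an esearch URL for PubMed articles about risk factors of a disease.

    The search term is an assumed, deliberately simple query; a real workflow
    would need stricter filters and manual review of the returned articles.
    """
    term = f'"{disease_label}"[Title/Abstract] AND "risk factors"[MeSH Terms]'
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # query string in Entrez syntax
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # maximum number of PubMed IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_risk_factor_query("hepatitis C"))
```

A query like this only returns candidate PubMed IDs. Whether an article meets a sourcing standard such as WP:MEDRS would still have to be checked separately, which is the concern raised above.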