Shortcuts: WD:PP/GEN, WD:PP/Generic
Wikidata:Property proposal/Generic
See also
- Wikidata:Property proposal/Pending – properties which have been approved but which are on hold waiting for the appropriate datatype to be made available
- Wikidata:Properties for deletion – proposals for the deletion of properties
- Wikidata:External identifiers – statements to add when creating properties for external IDs
- Wikidata:Lexicographical data – information and discussion about lexicographic data on Wikidata
This page is for the proposal of new properties.
Before proposing a property
- Search if the property already exists.
- Search if the property has already been proposed.
- Check whether you can give the property a label and definition similar to an existing Wikipedia infobox parameter, or whether it can be matched to an infobox to or from which data can be transferred automatically.
- Select the right datatype for the property.
- Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
- Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.
Creating the property
- Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
- Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
- See property creation policy.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/09.
General
Publication type of scholarly article
Description | Publication type of scholarly article |
---|---|
Data type | Item |
Template parameter | Different from publication type as used for example in w:Template:Infobox short story |
Domain | Instances of scholarly article (Q13442814) and its subclasses. |
Allowed values | Permitted values typically should be potential subclasses of scholarly work (Q55915575). In practice there is diversity in instance of (P31) statements additional to scholarly article (Q13442814) items, which number in tens of millions, and some cleanup work is anticipated for both domain and range. |
Example 1 | Malaria and the microbiome: a systematic review (Q56383548) → systematic review (Q1504425) |
Example 2 | NIH Consensus conference. Gallstones and laparoscopic cholecystectomy (Q70552083) → NIH consensus development conference summary (Q27718083) |
Example 3 | Practice guidelines for the management of bacterial meningitis (Q33982444) → medical guideline (Q878041) |
Source | The new statements will initially be generated by rules from existing statements, as a follow-up to the WDQS split. |
Robot and gadget jobs | Bots will be used heavily to implement the migration from instance of (P31) statements. |
Wikidata project | Wikidata:WikiCite |
Motivation
Currently these publication types of articles are added as instance of (P31) statements, but better data modelling can follow from having a separate property. For example, on clinical trial (Q30612) under MeSH descriptor ID (P486) the publication type meaning is at present given preferred rank over the "clinical trials as topic" meaning. It would be better not to overload the item in this way, given the importance of clinical trials in medical research. We should have two items, one of which should only be used in "publication type of scholarly article" statements.
This idea was mentioned already several years ago. It comes up now because of the graph split treating the scholarly article items as a graph in their own right. See Wikidata talk:WikiCite#Community input into WDQS graph split: a publication type property proposal for a preliminary discussion. That thread links to a graph split page which goes into fuller details of the technical side. I've been asked by the developers working on the split to make this proposal. @Daniel Mietchen: @Bluerasberry: @Sj:
While the graph split will make SPARQL queries more complex, good can come of it if this proposed property is created, and some systematic work goes on to sort out the current overloading of dozens of items. Charles Matthews (talk) 10:35, 12 September 2024 (UTC)
Discussion
- Support Would most (all?) subclasses of scholarly article (Q13442814) be replaced by this new property then, and their instances updated to just be instances of scholarly article (Q13442814)? ArthurPSmith (talk) 13:23, 12 September 2024 (UTC)
- Not part of the original plan anyway, which was simply to create new triples from old, with new object items where, for example, "clinical trial" had become either an item for the real-life testing, or for a publication type. Charles Matthews (talk) 15:07, 12 September 2024 (UTC)
- Comment The existing set of rules has already raised some concerns and confusions at Wikidata_talk:SPARQL_query_service/WDQS_graph_split/Rules and I think this proposal is going to help to reduce these confusions/ambiguities. From a technical standpoint what is important is that this new property will help the system to determine if an item should be part of the scholarly_articles subgraph or not. My current understanding (but please let me know if I'm wrong) is that the fact that an entity has a non-deprecated statement with this new property will be sufficient to classify it as a scholarly article (it would not even have to look at the value of this property). From a practical point of view, assuming this proposal is accepted, we should update the WDQS software with this new rule before any migration is attempted. During the migration we might have to keep both types of rules (the one based on P31 and the one based on this property). DCausse (WMF) (talk) 07:05, 13 September 2024 (UTC)
- @DCausse (WMF): So there can be a case analysis with a few cases. An example that is clear is the case of multicenter study report (Q91901000), label "class of publication", and multicenter clinical trial (Q6934595). I have checked just now, and nothing that is instance of multicenter clinical trial (Q6934595) is also instance of scholarly article (Q13442814). On Study of GLS-5700 in Dengue Virus Seropositive Adults (Q26762063) there is another P31 statement, but for another type of trial. I have worked through the 28 hits for multicenter study report (Q91901000):
SELECT ?item ?itemLabel WHERE {?item wdt:P31 wd:Q91901000; wdt:P31 wd:Q13442814. SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } }
- I see a case where controlled clinical trial (Q70447452) is used instead of controlled clinical trial (Q58897597). So as an example for your question, a P31 statement with controlled clinical trial (Q58897597) ought to be enough to classify as a scholarly article. I find no hits like that, so perhaps all those items have already been split out. One hit for Role of lopinavir/ritonavir in the treatment of SARS: initial virological and clinical findings. (Q35536588) which is "wrong", twice. But really this can't be discussed fully here. Charles Matthews (talk) 10:23, 13 September 2024 (UTC)
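As an illustration of the classification rule DCausse describes above, here is a minimal Python sketch. It is not the WDQS implementation. The `Statement` class, the function name, and the placeholder `P_PUBLICATION_TYPE` (the proposed property has no ID yet) are all assumptions, and the existing P31 rule is simplified to a direct match on scholarly article (Q13442814), with deprecated statements ignored for both rules.

```python
from dataclasses import dataclass

SCHOLARLY_ARTICLE = "Q13442814"
INSTANCE_OF = "P31"
# Hypothetical placeholder: the proposed property has not been created yet.
PUBLICATION_TYPE = "P_PUBLICATION_TYPE"


@dataclass(frozen=True)
class Statement:
    prop: str
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def in_scholarly_subgraph(statements):
    """Return True if either rule places the item in the scholarly subgraph."""
    for s in statements:
        if s.rank == "deprecated":
            continue  # assumption: deprecated statements never count
        # Proposed rule: any non-deprecated statement with the new property
        # is sufficient; the value itself is not inspected.
        if s.prop == PUBLICATION_TYPE:
            return True
        # Simplified existing rule, kept alongside during the migration.
        if s.prop == INSTANCE_OF and s.value == SCHOLARLY_ARTICLE:
            return True
    return False


if __name__ == "__main__":
    item = [Statement(PUBLICATION_TYPE, "Q1504425")]  # systematic review
    print(in_scholarly_subgraph(item))
```

Keeping both checks in one function mirrors the suggestion that both kinds of rule may need to coexist while statements are being migrated.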
- Support (originally Disagree) So if I understand correctly, rather than using instance of (P31) or subclass of (P279), you want a dedicated property for scholarly works to subclassify by the idea of a "publication type", which upon your inspection seems to be quite varied, with thousands of "publication types". You didn't want to use subclass of (P279) to subclassify them because it would make querying a bit harder and less straightforward when dealing with the migration, and with scholarly works in general? So a dedicated property just for scholarly works to subclassify/subcategorize (without resorting to using subclass of (P279)) was justified, and hence this proposal. YES/NO? (after a clarifying reply, I can update my disagreement) --Thadguidry (talk) 00:09, 17 September 2024 (UTC)
- @Thadguidry: In the background here, we have the blind men and an elephant (Q1218005) issue applied to WikiCite (Q21831105). The graph split will have a major impact on the "WikiCite area", because the scholarly graph split out is the natural habitat for WikiCite on Wikidata: the big citation graph lives there. But people may talk past each other when they have different conceptions of WikiCite.
- So "quite varied" might be fair, but for me the list of publication types of interest is from Medical Subject Headings (MeSH). Those can be found with this query.
SELECT DISTINCT ?item ?itemLabel WHERE {?item wdt:P672 ?string. FILTER (STRSTARTS(?string, "V")) SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } }
- This is the list of terms that can be used on a PubMed page in the "Publication types" section, e.g. on https://pubmed.ncbi.nlm.nih.gov/24394640/ where the types are review article (Q7318358) and a term that comments on the grant support. It would be very good to import systematically those statements into Wikidata, and "subclass of scholarly article" isn't a good fit. Points about this: maybe we only import a subset. Maybe we want to use some other way of looking at publication types (Wikidata is agnostic about ontology, doesn't insist on limiting values), and someone has already mentioned a classification used on The Lens (Q7144471).
- But looking at MeSH with PubMed explains quite well the overloading issue we have, that is impeding a clean split. clinical trial (Q30612) is going to need to become two items, one with MeSH tree code (P672) value V03.175.250, and another with the other three values. The latter item would be fit to be used in main subject (P921) statements, for example where PubMed has a "Clinical Trials as Topic" term. The former item should equally be fit to be used in "publication type" statements. This would be a good resolution to an ambiguity issue we have here and now. Charles Matthews (talk) 08:48, 17 September 2024 (UTC)
- It sounds like there will eventually be overlap with general "publication types". But that's ok, because they can be multi-typed (multiple subclasses added to any particular "publication type of scholarly article"). Thanks, I think I understand better. To me, this feels like a first pass at fixing things, and perhaps this property will be slightly less useful in the future, but you've made a convincing argument that it's needed for now with respect to the migration. Changed my vote to support. --Thadguidry (talk) 12:27, 17 September 2024 (UTC)
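The split of clinical trial (Q30612) discussed above can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the "V" prefix test is borrowed from the STRSTARTS filter in the query above, and the three non-"V" codes are made-up placeholders, not the real MeSH tree code (P672) values on the item.

```python
def split_tree_codes(codes):
    """Partition MeSH tree codes into publication-type codes and the rest.

    Codes starting with "V" are treated as publication types, following
    the STRSTARTS(?string, "V") filter used in the query above.
    """
    publication_type = [c for c in codes if c.startswith("V")]
    topic = [c for c in codes if not c.startswith("V")]
    return publication_type, topic


if __name__ == "__main__":
    # "V03.175.250" comes from the discussion; the other three values are
    # hypothetical placeholders standing in for the item's remaining codes.
    codes = ["V03.175.250", "X00.000.001", "X00.000.002", "X00.000.003"]
    publication_type, topic = split_tree_codes(codes)
    print("publication type item:", publication_type)
    print("topic item:", topic)
```

Under this sketch, the first list would go to the item used in "publication type" statements and the second to the item used in main subject (P921) statements.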
- Support Contrary to popular belief, instance of (P31) is not intended to be a data dump. Specific qualities deserve specific properties, especially when we have millions of relevant items, as in this case. --Jklamo (talk) 13:27, 17 September 2024 (UTC)
- Support Makes sense. --Prototyperspective (talk) 14:28, 18 September 2024 (UTC)
- Support Thanks for taking the time to write this proposal--So9q (talk) 16:34, 20 September 2024 (UTC)
- Oppose I don't think it makes sense to have a publication type property specific to scholarly articles. Why doesn't it apply to other publications (which currently use genre (P136) for things like this)? I also think some of the more specific instance of (P31) values currently being used are redundant and should not be used on the items at all. I don't see any reason to add academic journal article (Q18918145) to something which we already know is an article published in a journal in the same way we don't add woman (Q467) to a human (Q5) who is female (Q6581072). - Nikki (talk) 09:27, 21 September 2024 (UTC)
- Good points that need to be addressed. If your proposed solution is to instead use Genre I don't think those are genres. And if not, please specify what your solution would be. Prototyperspective (talk) 13:35, 21 September 2024 (UTC)
- I can give a concise kind of reason: if you want to have database constraints that apply in this particular context. Certainly it doesn't make very much sense to have instance of (P31) subjected to database constraints, when it is universal. When you say "should not be used on the items at all" you are arguing for constraints, and the standard way to do that is with a definite property. Charles Matthews (talk) 19:33, 21 September 2024 (UTC)
- Makes sense. One thing is that I think it would generally be best if values in properties can be constrained depending on other values/properties of the item and think this is already done. Moreover, could you please explain why properties like language of work or name also show values other than in this case languages in the autocomplete box? Prototyperspective (talk) 10:07, 22 September 2024 (UTC)
- I don't think I want to talk here about details of constraints, because it is anyway going to be a community decision what is wanted. The general principle is to have constraints based on queries, so a list of constraint violations can be generated automatically. In this case it is worth emphasising (a) that there are tens of millions of items involved, and (b) preliminary checks on the instance of (P31) statements we are starting with show a complex situation. So I don't think we should approach this business with ad hoc ideas. We may end up with a package of constraints that is effective in keeping the data clean, but that would require some effort. Charles Matthews (talk) 11:08, 22 September 2024 (UTC)
characteristic of (aliases: quality of | property of | inheres in)
Description | (qualifier only) statement value is a characteristic, quality, property, or state of this item |
---|---|
Data type | Item |
Domain | quality (Q1207505), property (Q937228), state (Q3505845), relation (Q930933), type of property (Q96253971) |
Example 1 | battery management system (Q810938) measures (P2575) temperature (Q11466) |
Example 2 | terminal velocity (Q614981) has contributing factor (P1479) orientation (Q2235286) |
Example 3 | The Unconscious of a Conservative (Q52945586) main subject (P921) mental health (Q317309) |
Example 4 | tetrachromacy (Q94556) has characteristic (P1552) dimension (Q4440864) |
See also | of (P642), applies to part (P518), facet of (P1269), part of (P361), has characteristic (P1552) |
Motivation
This common relation is widely expressed with the massively overloaded (and to-be-deprecated) of (P642), and sometimes (erroneously) with applies to part (P518), facet of (P1269), part of (P361), and possibly a few other properties. Although it is semantically an inverse of has characteristic (P1552), constraining this property to the qualifier scope will prevent introduction of redundant inverses of has characteristic (P1552) statements. Swpb (talk) 18:03, 12 September 2024 (UTC)
Discussion
- Support -wd-Ryan (Talk/Edits) 21:28, 19 September 2024 (UTC)