Property talk:P10283


Documentation

OpenAlex ID
identifier for works, grants, authors, institutes, venues, concepts/subjects in OpenAlex
Applicable "stated in" value: OpenAlex (Q107507571)
Data type: External identifier
Domain: entity (Q35120)
Allowed values: [ACIVW][1-9]\d{3,9}
Example:
The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles (Q49510981) → W2741809807
Jason Priem (Q21678556) → A2208157607
Journal of the Welsh Bibliographical Society (Q6296212) → V87782555
University of North Carolina at Chapel Hill (Q192334) → I114027177
altmetrics (Q14565201) → C2778407487
Source: https://explore.openalex.org
Formatter URL: https://openalex.org/$1
See also: Microsoft Academic ID (P6366)
Lists
Proposal discussion: Wikidata:Property proposal/OpenAlex ID
Current uses
Total: 160,514
Main statement: 160,363 out of 102,567 (156% complete); >99.9% of uses
Qualifier: 5; <0.1% of uses
Reference: 146; <0.1% of uses
Format “[ACISVW][1-9]\d{3,9}”: value must be formatted using this pattern (PCRE syntax).
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10283#Format, SPARQL
Distinct values: this property likely contains a value that is different from all other items.
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10283#Unique value, SPARQL (every item), SPARQL (by value)
Single value: this property generally contains a single value.
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303). Known exceptions: stress (Q181767)
List of violations of this constraint: Database reports/Constraint violations/P10283#Single value, SPARQL
Allowed entity types are Wikibase item (Q29934200): the property may only be used on a certain entity type.
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10283#Entity types
Scope is as main value (Q54828448), as reference (Q54828450): the property must be used in the specified way only.
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10283#Scope, SPARQL
Conflicts with “instance of (P31): Wikimedia category (Q4167836), Wikimedia disambiguation page (Q4167410)”: this property must not be used with the listed properties and values.
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10283#Conflicts with P31, SPARQL
Type “entity (Q35120)”: item must contain property “instance of (P31), subclass of (P279)” with classes “entity (Q35120)” or their subclasses (defined using subclass of (P279)).
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10283#Type Q35120, SPARQL
check A for people
Initial A (author): such items would generally be people, i.e. instance of human (Q5).
Violations query: SELECT ?item ?value { ?item wdt:P10283 ?value . FILTER( REGEX( ?value, '^A') ) FILTER NOT EXISTS { ?item wdt:P31 wd:Q5 } } LIMIT 10
List of this constraint violations: Database reports/Complex constraint violations/P10283#check A for people
check A for P6366
?value would generally match P6366: an A-prefixed value should equal "A" followed by the item's Microsoft Academic ID (P6366).
Violations query: SELECT ?item ?value ?p6366 { ?item wdt:P10283 ?value ; wdt:P6366 ?p6366 FILTER( REGEX( ?value, '^A') ) FILTER( !REGEX( ?value, CONCAT('^A',?p6366,'$') ) ) } LIMIT 10
List of this constraint violations: Database reports/Complex constraint violations/P10283#check A for P6366
check V for ISSN
Initial V (venue) would generally have an ISSN.
Violations query: SELECT ?item ?value { ?item wdt:P10283 ?value . FILTER( REGEX( ?value, '^V') ) FILTER NOT EXISTS { ?item wdt:P236 [ ] } FILTER NOT EXISTS { ?item wdt:P7363 [ ] } } LIMIT 100
List of this constraint violations: Database reports/Complex constraint violations/P10283#check V for ISSN
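The format constraint and the meaning of each prefix letter can also be checked locally before adding or auditing values. Below is a minimal Python sketch. It assumes the [ACISVW] pattern from the format constraint above and the prefix meanings given on this page: A author, C concept, I institution, V venue (later S source), W work. The function names are hypothetical.

```python
import re
from typing import Optional

# Pattern from the format constraint on this page ([ACISVW] also admits S).
# [0-9] is used instead of \d so that only ASCII digits are accepted.
OPENALEX_ID = re.compile(r"[ACISVW][1-9][0-9]{3,9}")

# Prefix meanings as described on this page; V (venue) was later replaced by S.
PREFIX_KIND = {
    "A": "author",
    "C": "concept",
    "I": "institution",
    "S": "source",
    "V": "venue",
    "W": "work",
}


def is_valid_openalex_id(value: str) -> bool:
    """Match the whole value, as a Wikidata format constraint does."""
    return OPENALEX_ID.fullmatch(value) is not None


def prefix_kind(value: str) -> Optional[str]:
    """Entity kind encoded by the first letter, or None for invalid values."""
    return PREFIX_KIND[value[0]] if is_valid_openalex_id(value) else None
```

For example, `prefix_kind("I114027177")` returns `"institution"`. `is_valid_openalex_id("W0123456")` is False because the first digit must be 1-9. `fullmatch` is used so that a full URL such as https://openalex.org/W2741809807 is rejected rather than partially matched.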

Comparing to MAG

(From Wikidata:Property_proposal/OpenAlex_ID and Wikidata_talk:Property_proposal/OpenAlex_ID)

Egon: Microsoft Academic is abandoned (it actually points people to OpenAlex as the solution to use instead). There is no reason to compare OpenAlex with it.

Jura: We actually have plenty of MAG statements, and the data is still available.

Egon: you can only compare OpenAlex IDs with Wikidata, because Microsoft Academic no longer exists. This identifier does not replace, remove, or otherwise affect the MA identifier. OpenAlex IDs will show differences because of the curation work they did. Are you suggesting that Wikidata trumps an external database? Where you find differences, you will still have to discuss manually with the OpenAlex team what the ground truth is; past MA identifier work cannot resolve that situation.

Egon: about the equivalences and differences between MAG and OpenAlex IDs: because OpenAlex actually curates problems in the MAG data, and MAG is discontinued, I am still not sure how to handle the situation other than by talking with the OpenAlex project. Nevertheless, https://w.wiki/4jpF is the query to compare the two. Differences are to be expected (because of the curation), but if people want to list and study the changes, use that query. --Egon Willighagen (talk) 21:09, 25 January 2022 (UTC)

SELECT * WHERE {
  ?thing wdt:P10283 ?openalex ; wdt:P6366  ?mag .
  FILTER (str(?mag) != SUBSTR(str(?openalex),2))
}
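The same comparison can also be run offline on rows exported from the query above. Below is a minimal Python sketch. The row layout (item QID, OpenAlex ID, MAG ID) and the function name are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# (item QID, OpenAlex ID, MAG ID), e.g. rows exported from the query above.
Row = Tuple[str, str, str]


def mag_mismatches(rows: Iterable[Row]) -> List[Row]:
    """Return rows whose MAG ID differs from the OpenAlex ID without its letter.

    SPARQL's SUBSTR is 1-based, so SUBSTR(str(?openalex), 2) corresponds to
    openalex[1:] in Python.
    """
    return [(item, oa, mag) for item, oa, mag in rows if mag != oa[1:]]
```

As in the SPARQL version, a non-empty result is expected because of OpenAlex's curation. It lists candidates to study, not errors to fix automatically.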

Counts:

Migrating from MAG

As you see above, there are 273k MAG IDs that could be migrated to OpenAlex. I agree with Egon that OpenAlex IDs will diverge from MAG IDs in the future, but I think that right now it makes sense to do this migration.

I hoped it would be straightforward to populate OpenAlex IDs from existing MAG values by prepending a letter depending on the item type: A if human, I if organization, etc.
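Here is a minimal Python sketch of that idea. It assumes each item's classes have already been expanded along P31/P279* (for instance by SPARQL). The class QIDs are the ones used in the counting query on this page. The mapping and function names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Class QIDs from the counting query on this page; items' classes are assumed
# to be already expanded along P31/P279*.
CLASS_TO_LETTER: Dict[str, str] = {
    "Q5": "A",         # human
    "Q43229": "I",     # organization
    "Q5633421": "V",   # venue class used in the query
    "Q625994": "V",    # venue class used in the query
    "Q17537576": "W",  # creative work
}


def candidate_letters(classes: Iterable[str]) -> Set[str]:
    """OpenAlex prefix letters suggested by an item's (expanded) classes."""
    return {CLASS_TO_LETTER[c] for c in classes if c in CLASS_TO_LETTER}


def openalex_from_mag(mag: str, letter: str) -> str:
    """Prepend the prefix letter to a MAG ID."""
    return letter + mag
```

An item can yield zero, one or several candidate letters. Only the single-letter case maps directly to an ID, which is what the counts below examine.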

This query counts the type combinations of items with a MAG ID. It times out on the Wikidata Query Service, so I ran it on a local instance:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
select ?A ?I ?V ?W ?letters (count(*) as ?c) {
  ?x wdt:P6366 ?mag
  bind(exists {?x wdt:P31/wdt:P279* wd:Q5} as ?A)
  bind(exists {?x wdt:P31/wdt:P279* wd:Q43229} as ?I)
  bind(exists {?x wdt:P31/wdt:P279* wd:Q5633421}|| exists {?x wdt:P31/wdt:P279* wd:Q625994} as ?V)
  bind(exists {?x wdt:P31/wdt:P279* wd:Q17537576} as ?W)
  bind(if(?A,1,0) as ?cA)
  bind(if(?I,1,0) as ?cI)
  bind(if(?V,1,0) as ?cV)
  bind(if(?W,1,0) as ?cW)
  bind(?cA+?cI+?cV+?cW as ?letters)
} group by ?A ?I ?V ?W ?letters

No letter: Assume C

Ideally, we want "letters=1" in each row. But the majority (228k) have no letter:

A      I      V      W      letters  c (count)
false  false  false  false  0        228168
false  true   false  false  1        25907
false  true   false  true   2        1563
false  false  false  true   1        21080
true   false  false  false  1        2741
true   true   false  false  2        9
false  false  true   true   2        187
false  true   true   true   3        2
false  false  true   false  1        15
false  true   true   false  2        3
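The rows of this table can be tallied by number of matched letters. This is a small Python sketch over the counts transcribed from the table. The table covers 279,675 items in total: 228,168 (about 82%) have no letter, 49,743 have exactly one, and 1,764 have two or three.

```python
# Rows transcribed from the table above: (A, I, V, W, count).
ROWS = [
    (False, False, False, False, 228168),
    (False, True, False, False, 25907),
    (False, True, False, True, 1563),
    (False, False, False, True, 21080),
    (True, False, False, False, 2741),
    (True, True, False, False, 9),
    (False, False, True, True, 187),
    (False, True, True, True, 2),
    (False, False, True, False, 15),
    (False, True, True, False, 3),
]


def items_by_letter_count(rows):
    """Total items per number of matched letters (the "letters" column)."""
    totals = {}
    for a, i, v, w, count in rows:
        n = a + i + v + w  # booleans add as 0/1
        totals[n] = totals.get(n, 0) + count
    return totals


totals = items_by_letter_count(ROWS)
grand_total = sum(totals.values())
no_letter_share = totals[0] / grand_total
```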


So I ran this discovery query to find the actual list of types:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?typeLabel (count(*) as ?c) {
  ?x wdt:P6366 ?mag
  filter (not exists {?x wdt:P31/wdt:P279* wd:Q5}
    && not exists {?x wdt:P31/wdt:P279* wd:Q43229}
    && not exists {?x wdt:P31/wdt:P279* wd:Q5633421} && not exists {?x wdt:P31/wdt:P279* wd:Q625994} 
    && not exists {?x wdt:P31/wdt:P279* wd:Q17537576}
  )
  ?x wdt:P31 ?type.
  ?type rdfs:label ?typeLabel filter(lang(?typeLabel)="en")
} group by ?typeLabel order by desc(?c)

There are over 2k distinct types. I examined about 300 of them, and they are indeed various concepts (letter C).

  • So I think it's fair to assume "if not A, I, V, W then it's a Concept".
  • Most of these 228k are MAG Fields of Science imported by Nikola Tulechki.
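The resulting rule can be written as a small Python function. This is a sketch only, and the name is hypothetical. No matched letter defaults to C (concept), exactly one letter is used as-is, and more than one letter is returned as None so the item can be reviewed manually as one of the conflicts discussed on this page.

```python
from typing import Iterable, Optional


def choose_prefix(letters: Iterable[str]) -> Optional[str]:
    """No letter -> "C"; one distinct letter -> that letter; several -> None."""
    unique = set(letters)
    if not unique:
        return "C"
    if len(unique) == 1:
        return next(iter(unique))
    return None
```

Returning None instead of picking a letter avoids silently creating a wrong ID for the I & W or V & W cases.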

Conflicts Between Letters

Now I want to investigate the conflicts, where an item matches more than one letter:

  • 1563: I & W
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?x ?xLabel {
  ?x wdt:P6366 ?mag
  filter (exists {?x wdt:P31/wdt:P279* wd:Q43229}
         && exists {?x wdt:P31/wdt:P279* wd:Q17537576})
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  # ?x rdfs:label ?xLabel filter(lang(?xLabel)="en")
} limit 100
  • 187: V & W
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?x ?xLabel {
  ?x wdt:P6366 ?mag
  filter ((exists {?x wdt:P31/wdt:P279* wd:Q5633421} || exists {?x wdt:P31/wdt:P279* wd:Q625994})
         && exists {?x wdt:P31/wdt:P279* wd:Q17537576})
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  # ?x rdfs:label ?xLabel filter(lang(?xLabel)="en")
} limit 100

MAG ID Concepts overlap

As @Vladimir Alexiev: mentioned, most of the values of Microsoft Academic ID (P6366) correspond to concepts and in theory should exist as C-prefixed OpenAlex ID (P10283) values. I imported them a while ago based on the Wikipedia pages that MAG used as concepts.

I then ran a test: I took 1000 random values from Wikidata and queried the OpenAlex API to see whether they correspond to something on the OpenAlex side.

The result is here. Of 1000 values, only 400 match. Because I did not filter by type, some of the values are institutions and venues rather than concepts; many, however, have no corresponding object on the OpenAlex side.

The OpenAlex data also includes a Wikidata QID, so I checked whether they match. 2 of 1000 do not match, and one of them is RDF/XML (Q48940) (karmic coincidence? :), which on the OA side points to Extensible Markup Language (Q2115).
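Such a check could be reproduced with a short Python sketch using only the standard library. It assumes that the OpenAlex API accepts MAG-based concept lookups at `https://api.openalex.org/concepts/mag:<id>`, and that a concept record carries its Wikidata link in a `wikidata` field. Both points should be verified against the current OpenAlex API documentation. All function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumed endpoint shape for looking up an OpenAlex concept by MAG ID.
API = "https://api.openalex.org/concepts/mag:{}"


def fetch_concept(mag_id: str) -> Optional[dict]:
    """Return the OpenAlex record for a MAG concept ID, or None on HTTP 404."""
    try:
        with urllib.request.urlopen(API.format(mag_id), timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no corresponding object on the OpenAlex side
            return None
        raise


def qid_from_record(record: dict) -> Optional[str]:
    """Extract the bare QID from the record's (assumed) "wikidata" URL field."""
    url = record.get("wikidata")
    return url.rstrip("/").rsplit("/", 1)[-1] if url else None


def qid_matches(item_qid: str, record: Optional[dict]) -> Optional[bool]:
    """True/False if the record points back to item_qid; None if no record."""
    if record is None:
        return None
    return qid_from_record(record) == item_qid
```

For the RDF/XML case described above, a record whose `wikidata` field ends in Q2115 would make `qid_matches("Q48940", record)` return False.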

So based on:

  • this experiment;
  • the fact that OA data is CC0-licensed;
  • the fact that they keep the Wikidata QIDs;

I propose that we do not migrate the existing MAG IDs that correspond to concepts, but rather import them directly from the OpenAlex snapshot.

@Egon Willighagen, TiagoLubiana, DarTar, ArthurPSmith, Daniel Mietchen, Jura1: @Nikola Tulechki, Oa01, Vladimir Alexiev, AdrianoRutz, MasterRus21thCentury: Any help will be appreciated! --Vladimir Alexiev (talk) 08:45, 26 January 2022 (UTC)

  • Agreed! Please do not migrate the data. Per the above points, the curation done by the OpenAlex team would be undone. I have a data dump from the OpenAlex people, which was more practical than downloading all the data. If people want to help pull it in, just let me know. --Egon Willighagen (talk) 06:34, 31 January 2022 (UTC)
  • @Egon Willighagen: How many of their Concept records have Wikidata IDs? I just looked at some API responses, and it seems like an excellent API. --Vladimir Alexiev (talk) 07:13, 3 February 2022 (UTC)
    The data I got from the team, as an easy-to-use TSV file with mappings, has 65073 rows. When entering these mappings in Wikidata I found an error and reported it to the OpenAlex team, and they said that later this month or next month they will have a big upgrade of article-concept annotations. I am not sure whether that also includes many more concepts. Anyway, around 65k gives you some idea. --Egon Willighagen (talk) 11:19, 3 February 2022 (UTC)

Breaking change to author IDs

It looks like all author IDs will be dumped and recreated: https://groups.google.com/g/openalex-users/c/rDA7PWTarVQ?pli=1

Next month we’ll unveil a rewrite of our author disambiguation system. When we do, all old OpenAlex Author IDs will disappear and be replaced by new ones. [...] Because it’s such a huge improvement, we have to completely replace all the old IDs with new ones. For most users, no response is needed; you’ll just notice that author disambiguation is way more accurate. Yay! But if you’re saving and linking to specific author IDs (like https://openalex.org/A123) in ways that assume persistence, you’ll need to update those IDs, because they’ll stop working (specifically, they’ll return 404 errors).

There are currently ~400x author IDs, so not a major change, but worth noting. Andrew Gray (talk) 20:09, 20 June 2023 (UTC)

V changed to S

In April 2023, User:Waydze wrote: "the identifiers for the OpenAlex venues changed the format. Their first letter used to be V, now - S. The urls changed as well - it is https://explore.openalex.org/sources/S73261239 instead of https://explore.openalex.org/venues/V73261239."

Are there plans to address this in Wikidata? Apologies if this has been discussed elsewhere already and I missed it. Thanks. -- Oa01 (talk) 12:33, 31 January 2024 (UTC)

I noticed this change too. E.g. https://openalex.org/V205231332, which supposedly points to w:Astronomy & Astrophysics, is no longer a valid URL; instead https://openalex.org/sources/s205231332 and https://openalex.org/sources/S205231332 (lower- and upper-case 's') both seem to be valid. @Oa01: Since there's been no discussion for two months, if you think you can fix the current constraints, go ahead and do it. If you're worried about messing up, then make a proposal here, and if nobody reacts within a reasonable delay, then do it. You can revert it if you find you've messed up. You could ping people who were active in the proposal for creating this property. Waiting for "leaders" to get things done is generally ineffective. The only reason I'm not doing it is the lack of time to check things properly. Boud (talk) 19:26, 5 April 2024 (UTC)
Thanks for the info. Much appreciated. -- Oa01 (talk) 18:43, 9 April 2024 (UTC)
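If the stored V values are migrated, the rewrite could be as simple as swapping the letter. This is a minimal Python sketch. It assumes, as in the two examples quoted in this section (V73261239 → S73261239, V205231332 → S205231332), that only the prefix changes and the numeric part is kept. It also uses the https://openalex.org/sources/ URL form reported above as valid. The function names are hypothetical.

```python
import re

# Legacy venue IDs: V followed by the numeric part allowed by the format constraint.
_LEGACY_VENUE = re.compile(r"V([1-9][0-9]{3,9})")


def venue_to_source(value: str) -> str:
    """Rewrite a legacy V-prefixed venue ID as an S-prefixed source ID.

    Assumes only the letter changes; any other value is returned unchanged.
    """
    m = _LEGACY_VENUE.fullmatch(value)
    return "S" + m.group(1) if m else value


def source_url(value: str) -> str:
    """URL of a venue/source on openalex.org, using the /sources/ path."""
    sid = venue_to_source(value)
    if not sid.startswith("S"):
        raise ValueError(f"not a venue/source ID: {value!r}")
    return "https://openalex.org/sources/" + sid
```

Before running this as a bulk edit, a sample of converted IDs should be checked against OpenAlex, since the "same number" assumption rests on only a couple of observed cases.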