Property talk:P10283
Documentation
identifier for works, grants, authors, institutes, venues, concepts/subjects in OpenAlex
List of violations of this constraint: Database reports/Constraint violations/P10283#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P10283#Unique value, SPARQL (every item), SPARQL (by value)
List of violations of this constraint: Database reports/Constraint violations/P10283#Single value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P10283#Entity types
List of violations of this constraint: Database reports/Constraint violations/P10283#Scope, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P10283#Conflicts with P31, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P10283#Type Q35120, SPARQL
This property is being used by: Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Initial A (author) would generally be people (Help)
Violations query:
SELECT ?item ?value { ?item wdt:P10283 ?value . FILTER( REGEX( ?value, '^A') ) FILTER NOT EXISTS { ?item wdt:P31 wd:Q5 } } LIMIT 10
List of this constraint violations: Database reports/Complex constraint violations/P10283#check A for people
?value would generally match P6366 (Help)
Violations query:
SELECT ?item ?value ?p6366 { ?item wdt:P10283 ?value ; wdt:P6366 ?p6366 FILTER( REGEX( ?value, '^A') ) FILTER( !REGEX( ?value, CONCAT('^A',?p6366,'$') ) ) } LIMIT 10
List of this constraint violations: Database reports/Complex constraint violations/P10283#check A for P6366
Initial V (venue) would generally have an ISSN (Help)
Violations query:
SELECT ?item ?value { ?item wdt:P10283 ?value . FILTER( REGEX( ?value, '^V') ) FILTER NOT EXISTS { ?item wdt:P236 [ ] } FILTER NOT EXISTS { ?item wdt:P7363 [ ] } } LIMIT 100
List of this constraint violations: Database reports/Complex constraint violations/P10283#check V for ISSN
Comparing to MAG
(From Wikidata:Property_proposal/OpenAlex_ID and Wikidata_talk:Property_proposal/OpenAlex_ID)
Egon: Microsoft Academic is abandoned (it actually points people to OpenAlex as the solution to use instead). There is no reason to compare it with that.
Jura: We actually have plenty of MAG statements and the data is still available.
Egon: you can only compare OpenAlex ID with Wikidata, because Microsoft Academic no longer exists. This identifier does not replace, remove, or otherwise affect the MA identifier. OpenAlex IDs will show differences, because of the curation work they did. Are you suggesting that Wikidata trumps an external database? Where you find changes, you will still have to discuss manually with the OpenAlex team what the ground truth is; past MA identifier work cannot resolve that situation.
Egon: about the equivalences and differences between MAG and OpenAlex IDs. Because OpenAlex actually curates problems in the MAG data and MAG is discontinued, I am still not sure how to handle the situation other than talking with the OpenAlex project. Nevertheless, https://w.wiki/4jpF is the query to compare the two. Differences are to be expected (because of the curation) but if people want to list and study the changes, use that query. --Egon Willighagen (talk) 21:09, 25 January 2022 (UTC)
SELECT * WHERE {
?thing wdt:P10283 ?openalex ; wdt:P6366 ?mag .
FILTER (str(?mag) != SUBSTR(str(?openalex),2))
}
Counts:
- This returns just 61 differences
- There are 6755 pairs
- OpenAlex has 7479 values (so 724 without MAG)
- MAG has 279672 values
- All these are tiny percentages of the potential total, which is about 200M --Vladimir Alexiev (talk) 07:39, 26 January 2022 (UTC)
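The filter in the query above treats an OpenAlex ID as a one-letter prefix followed by the MAG number. A minimal Python sketch of the same comparison, assuming both IDs are available as plain strings; the function name `diverges_from_mag` and the sample IDs are hypothetical, not taken from Wikidata:

```python
def diverges_from_mag(openalex_id: str, mag_id: str) -> bool:
    """Mirror FILTER(str(?mag) != SUBSTR(str(?openalex), 2)).

    SPARQL's SUBSTR is 1-based, so SUBSTR(x, 2) drops the first character,
    which corresponds to the Python slice x[1:].
    """
    return mag_id != openalex_id[1:]


# Hypothetical (OpenAlex ID, MAG ID) pairs, for illustration only.
pairs = [("A123", "123"), ("I456", "456"), ("C789", "780")]
differences = [pair for pair in pairs if diverges_from_mag(*pair)]
print(differences)
```

Only the last pair is reported, because its numeric part no longer equals the MAG ID.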
Migrating from MAG
As you see above, there are 273k MAG IDs that could be migrated to OpenAlex. I agree with Egon that OpenAlex IDs will diverge from MAG IDs in the future, but I think that right now it makes sense to do this migration.
I hoped it would be straightforward to populate from existing MAG values by prepending a letter depending on the item type: if Human then A, if Organization then I, etc.
This query counts type combinations of items with MAG. It times out on WD, so I ran it on a local instance:
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
select ?A ?I ?V ?W ?letters (count(*) as ?c) {
?x wdt:P6366 ?mag
bind(exists {?x wdt:P31/wdt:P279* wd:Q5} as ?A)
bind(exists {?x wdt:P31/wdt:P279* wd:Q43229} as ?I)
bind(exists {?x wdt:P31/wdt:P279* wd:Q5633421}|| exists {?x wdt:P31/wdt:P279* wd:Q625994} as ?V)
bind(exists {?x wdt:P31/wdt:P279* wd:Q17537576} as ?W)
bind(if(?A,1,0) as ?cA)
bind(if(?I,1,0) as ?cI)
bind(if(?V,1,0) as ?cV)
bind(if(?W,1,0) as ?cW)
bind(?cA+?cI+?cV+?cW as ?letters)
} group by ?A ?I ?V ?W ?letters
No letter: Assume C
Ideally, we want "letters=1" in each row. But the majority (228k) have no letter:
A | I | V | W | letters | c |
false | false | false | false | 0 | 228168 |
false | true | false | false | 1 | 25907 |
false | true | false | true | 2 | 1563 |
false | false | false | true | 1 | 21080 |
true | false | false | false | 1 | 2741 |
true | true | false | false | 2 | 9 |
false | false | true | true | 2 | 187 |
false | true | true | true | 3 | 2 |
false | false | true | false | 1 | 15 |
false | true | true | false | 2 | 3 |
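The rows above can be tallied by number of letters to see how much of the data is unambiguous. A small Python sketch; the counts are copied from the table and the variable names are arbitrary:

```python
from collections import Counter

# (letters, count) for each row of the table above.
rows = [
    (0, 228168), (1, 25907), (2, 1563), (1, 21080), (1, 2741),
    (2, 9), (2, 187), (3, 2), (1, 15), (2, 3),
]

by_letters = Counter()
for letters, count in rows:
    by_letters[letters] += count

total = sum(by_letters.values())
unambiguous = by_letters[1]  # exactly one letter applies
conflicting = sum(c for n, c in by_letters.items() if n > 1)
print(total, by_letters[0], unambiguous, conflicting)
# prints: 279675 228168 49743 1764
```

On these figures, 49743 items map to exactly one letter and 1764 have conflicting types.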
So I ran this discovery query to find the actual list of types:
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?typeLabel (count(*) as ?c) {
?x wdt:P6366 ?mag
filter (not exists {?x wdt:P31/wdt:P279* wd:Q5}
&& not exists {?x wdt:P31/wdt:P279* wd:Q43229}
&& not exists {?x wdt:P31/wdt:P279* wd:Q5633421} && not exists {?x wdt:P31/wdt:P279* wd:Q625994}
&& not exists {?x wdt:P31/wdt:P279* wd:Q17537576}
)
?x wdt:P31 ?type.
?type rdfs:label ?typeLabel filter(lang(?typeLabel)="en")
} group by ?typeLabel order by desc(?c)
There are over 2k distinct types. I examined about 300 of them, and they are indeed various concepts (letter C).
- So I think it's fair to assume "if not A, I, V, W then it's a Concept".
- Most of these 228k are MAG Fields of Science imported by Nikola Tulechki.
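A sketch of the prefixing rule as described so far, assuming the four type flags have already been computed as in the counting query. The function names and the sample MAG ID are hypothetical, and items with more than one letter are deliberately left unresolved:

```python
from typing import Optional


def openalex_prefix(is_author: bool, is_institution: bool,
                    is_venue: bool, is_work: bool) -> Optional[str]:
    """Pick the OpenAlex letter for an item from its type flags.

    Returns None when more than one flag is set, since those conflicts
    need manual review rather than an automatic choice.
    """
    letters = [letter for letter, flag in
               (("A", is_author), ("I", is_institution),
                ("V", is_venue), ("W", is_work)) if flag]
    if not letters:
        return "C"  # assumption from the discussion: no letter means Concept
    if len(letters) == 1:
        return letters[0]
    return None


def candidate_openalex_id(mag_id: str, prefix: Optional[str]) -> Optional[str]:
    """Prepend the letter to a MAG ID, or return None for conflicts."""
    return None if prefix is None else prefix + mag_id


# Hypothetical MAG ID "123", for illustration only.
print(candidate_openalex_id("123", openalex_prefix(True, False, False, False)))
print(candidate_openalex_id("123", openalex_prefix(False, False, False, False)))
print(candidate_openalex_id("123", openalex_prefix(False, True, False, True)))
```

The three calls give an author ID, a concept ID by the fallback rule, and `None` for an I & W conflict. The result is only a candidate: it says nothing about whether OpenAlex still holds a record under that ID.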
Conflicts Between Letters
Now I want to investigate the conflicts where there are >1 letters:
- 1563: I & W
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?x ?xLabel {
?x wdt:P6366 ?mag
filter (exists {?x wdt:P31/wdt:P279* wd:Q43229}
&& exists {?x wdt:P31/wdt:P279* wd:Q17537576})
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
# ?x rdfs:label ?xLabel filter(lang(?xLabel)="en")
} limit 100
- 187: V & W
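PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?x ?xLabel {
?x wdt:P6366 ?mag
# V: either of the two venue types used in the counting query; W: work
filter ((exists {?x wdt:P31/wdt:P279* wd:Q5633421} || exists {?x wdt:P31/wdt:P279* wd:Q625994})
&& exists {?x wdt:P31/wdt:P279* wd:Q17537576})
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
# ?x rdfs:label ?xLabel filter(lang(?xLabel)="en")
} limit 100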
MAG ID Concepts overlap
As @Vladimir Alexiev: mentioned, most of the values of Microsoft Academic ID (P6366) correspond to concepts and in theory should exist as C-prefixed OpenAlex ID (P10283). I imported them a while ago based on the Wikipedia pages that MAG used as concepts.
Now, I ran a test on 1000 random values from Wikidata and queried the OpenAlex API in order to see if they correspond to something on the OpenAlex side.
The result is here. Of 1000 values, only 400 match. As I did not filter by type, some are not concepts but institutions and venues, but many have no corresponding object on the OpenAlex side.
Also the OpenAlex data has a Wikidata qid, so I checked if they match. 2/1000 do not match, and one of them is RDF/XML (Q48940) (karmic coincidence? :), which on the OA side points back to Extensible Markup Language (Q2115)
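The check described above can be sketched in Python roughly as follows. This is not the script that was actually run. The endpoint URL and the `wikidata` field name are assumptions about the OpenAlex API that should be verified against its documentation, and the function names are hypothetical:

```python
import json
import urllib.error
import urllib.request
from typing import Optional

API_ROOT = "https://api.openalex.org/concepts/"  # assumed endpoint


def fetch_concept(openalex_id: str) -> Optional[dict]:
    """Return the JSON record for a concept ID, or None if it is not found."""
    try:
        with urllib.request.urlopen(API_ROOT + openalex_id, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def compare_with_wikidata(expected_qid: str, record: Optional[dict]) -> str:
    """Classify one lookup as 'missing', 'match', or 'mismatch'."""
    if record is None:
        return "missing"
    # Assumption: the record carries its Wikidata link in a "wikidata" field,
    # as a URL ending in the QID.
    linked = (record.get("wikidata") or "").rstrip("/").rsplit("/", 1)[-1]
    return "match" if linked == expected_qid else "mismatch"


# The RDF/XML case from the text, with a hand-written record (no network call).
record = {"wikidata": "https://www.wikidata.org/wiki/Q2115"}
print(compare_with_wikidata("Q48940", record))
```

Keeping the network call separate from the comparison means the classification can be tested on saved responses, which matters when re-running the experiment on more than 1000 values.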
So based on:
- this experiment ;
- the fact that OA data is CC0 licenced ;
- the fact that they keep the wikidata qids ;
I propose that we do not migrate the existing MAG IDs that correspond to concepts, but rather import them directly from the OpenAlex snapshot.
@Egon Willighagen, TiagoLubiana, DarTar, ArthurPSmith, Daniel Mietchen, Jura1: @Nikola Tulechki, Oa01, Vladimir Alexiev, AdrianoRutz, MasterRus21thCentury: Any help will be appreciated! --Vladimir Alexiev (talk) 08:45, 26 January 2022 (UTC)
- Agreed! Please do not migrate the data. Per the above points, the curation done by the OpenAlex team would be undone. I have a data dump from the OpenAlex people, which was more practical than downloading all data. If people want to help pull it in, just let me know. --Egon Willighagen (talk) 06:34, 31 January 2022 (UTC)
- @Egon Willighagen: How many of their Concept records have WD? I just looked at some API responses, and it seems like a very excellent API. --Vladimir Alexiev (talk) 07:13, 3 February 2022 (UTC)
- The data I got from the team, an easy-to-use TSV file with mappings, has 65073 rows. When entering these mappings in Wikidata I found an error and reported it to the OpenAlex team; they said that later this month or next they will have a big upgrade of article-concept annotations. I am not sure whether that includes many more concepts too. Anyway, around 65k gives you some idea. --Egon Willighagen (talk) 11:19, 3 February 2022 (UTC)
Breaking change to author IDs
It looks like all author IDs will be dumped and recreated: https://groups.google.com/g/openalex-users/c/rDA7PWTarVQ?pli=1
- Next month we’ll unveil a rewrite of our author disambiguation system. When we do, all old OpenAlex Author IDs will disappear and be replaced by new ones. [...] Because it’s such a huge improvement, we have to completely replace all the old IDs with new ones. For most users, no response is needed; you’ll just notice that author disambiguation is way more accurate. Yay! But if you’re saving and linking to specific author IDs (like https://openalex.org/A123) in ways that assume persistence, you’ll need to update those IDs, because they’ll stop working (specifically, they’ll return 404 errors).
There are currently ~400x author IDs, so not a major change, but worth noting. Andrew Gray (talk) 20:09, 20 June 2023 (UTC)
In April 2023, User:Waydze wrote: "the identifiers for the OpenAlex venues changed the format. Their first letter used to be V, now - S. The urls changed as well - it is https://explore.openalex.org/sources/S73261239 instead of https://explore.openalex.org/venues/V73261239."
Are there plans to address this in Wikidata? Apologies if this has been discussed elsewhere already and I missed it. Thanks. -- Oa01 (talk) 12:33, 31 January 2024 (UTC)
- I noticed this change too. E.g. https://openalex.org/V205231332 that supposedly points to w:Astronomy & Astrophysics is no longer a valid URL; instead https://openalex.org/sources/s205231332 or https://openalex.org/sources/S205231332 (lower and upper case 's') both seem to be valid. @Oa01: Since there's been no discussion for two months, if you think you can fix the current constraints, go ahead and do it. If you're worried about messing up, then make a proposal here, and if nobody reacts within a reasonable delay, then do it. You can revert it if you find you've messed up. You could ping people who were active in the proposal for creating this property. Waiting for "leaders" to get things done is generally ineffective. The only reason I'm not doing it is the lack of time to check things properly. Boud (talk) 19:26, 5 April 2024 (UTC)
- Thanks for the info. Much appreciated. -- Oa01 (talk) 18:43, 9 April 2024 (UTC)
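If the change is handled by rewriting existing values, the mapping could look like the Python sketch below. It assumes the numeric part stays the same when V becomes S, as in the two examples quoted above; that should be confirmed with OpenAlex before any bulk edit. The function name is hypothetical:

```python
import re

# Assumption: a venue ID is the letter V followed by digits, as in the
# examples quoted above.
VENUE_ID = re.compile(r"[Vv](\d+)")


def venue_to_source_id(openalex_id: str) -> str:
    """Rewrite a V-prefixed venue ID to the S-prefixed source form.

    IDs that do not look like venue IDs are returned unchanged.
    """
    match = VENUE_ID.fullmatch(openalex_id)
    return "S" + match.group(1) if match else openalex_id


print(venue_to_source_id("V73261239"))  # S73261239
print(venue_to_source_id("A123"))       # A123
```

The V-specific complex constraint and any format constraint on the property would need a matching update, since they test for an initial V.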