User:Lucas Werkmeister (WMDE)/LOD Cloud
Warning: PLEASE DO NOT EDIT THIS PAGE; it just creates more work for me to verify your changes. Please notify me on the talk page instead.
We need to collect some information in order to get Wikidata into the Linked Open Data cloud diagram (Q43984865).
Number of triples
SELECT ?count (NOW() AS ?asOf) WITH {
SELECT (COUNT(*) AS ?count) WHERE {
?s ?p ?o.
}
} AS %count WHERE {
INCLUDE %count.
}
Result as of : 12526206009 (there’s some mismatch between the different backend servers, so let’s just say ~12.5 billion)
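To rerun the count outside the query service UI, the same query can be sent to the endpoint with curl, as the links script further down does. The rounding to "~12.5 billion" is plain division by 10^9. This is a minimal sketch: `count_triples` and `approx_billions` are hypothetical helper names, and it assumes the endpoint returns the count as a bare number in the TSV output (the links script relies on the same behaviour).

```shell
#!/usr/bin/env bash

# Hypothetical helper: send the triple-count query to the endpoint and print
# the first column (?count) of the single result row, skipping the TSV header.
# Assumes the count comes back as a bare number, not a quoted typed literal.
count_triples() {
  curl -s \
    -H 'Accept: text/tab-separated-values' \
    --data-urlencode query='
SELECT ?count (NOW() AS ?asOf) WITH {
  SELECT (COUNT(*) AS ?count) WHERE {
    ?s ?p ?o.
  }
} AS %count WHERE {
  INCLUDE %count.
}' \
    https://query.wikidata.org/sparql \
    | tail -n+2 \
    | cut -f1
}

# Hypothetical helper: express a raw count in billions, rounded to one decimal.
# LC_ALL=C keeps the decimal separator a dot regardless of the user's locale.
approx_billions() {
  LC_ALL=C awk -v n="$1" 'BEGIN { printf "~%.1f billion\n", n / 1e9 }'
}
```

For example, `approx_billions 12526206009` prints `~12.5 billion`; with network access, `approx_billions "$(count_triples)"` combines both steps.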
Links to other datasets
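We count as links all uses of external identifier properties that have a formatter URI for RDF resource (P1921) and whose associated item (the dataset item pointing to the property via Wikidata property (P1687)) has a Linked Open Data Cloud ID (P8605). The following script counts the statements, qualifiers and references using each of those properties, grouped by dataset: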
#!/usr/bin/env bash

# Header line in wikitext, stamped with the current UTC time
TZ=UTC printf 'Data as of {{ISOdate|%(%FT%R:00%Z)T}}: \n\n'

# Step 1: for each dataset item with a Linked Open Data Cloud ID (P8605), collect its
# external identifier properties (via P1687) that have a formatter URI for RDF resource
# (P1921), joined with "|". Each result row is "propertyIds<TAB>lodCloudId".
curl -s \
  -H 'Accept: text/tab-separated-values' \
  --data-urlencode query='
SELECT (GROUP_CONCAT(DISTINCT STRAFTER(STR(?property), STR(wd:)); separator = "|") AS ?propertyIds) ?lodCloudId WHERE {
  ?dataset wdt:P8605 ?lodCloudId;
           wdt:P1687 ?property.
  ?property wikibase:propertyType wikibase:ExternalId;
            wdt:P1921 ?rdfFormatterUri.
}
GROUP BY ?dataset ?lodCloudId
' \
  https://query.wikidata.org/sparql \
  | tail -n+2 \
  | while IFS=$'\t' read -r propertyIds_ lodCloudId; do
      # TSV results wrap string literals in double quotes; strip them
      lodCloudId=${lodCloudId#'"'}
      lodCloudId=${lodCloudId%'"'}
      propertyIds_=${propertyIds_#'"'}
      propertyIds_=${propertyIds_%'"'}
      IFS='|' read -ra propertyIds <<< "$propertyIds_"

      # Step 2: sum the uses of each property as statement value (ps:),
      # qualifier (pq:) and reference (pr:)
      total=0
      for propertyId in "${propertyIds[@]}"; do
        count=$(
          curl -s \
            -H 'Accept: text/tab-separated-values' \
            --data-urlencode query="
SELECT (?asStatement + ?asQualifier + ?asReference AS ?count)
WITH {
  SELECT (COUNT(*) AS ?asStatement) WHERE {
    ?subject ps:$propertyId [].
  }
} AS %asStatement
WITH {
  SELECT (COUNT(*) AS ?asQualifier) WHERE {
    ?subject pq:$propertyId [].
  }
} AS %asQualifier
WITH {
  SELECT (COUNT(*) AS ?asReference) WHERE {
    ?subject pr:$propertyId [].
  }
} AS %asReference
WHERE {
  INCLUDE %asStatement.
  INCLUDE %asQualifier.
  INCLUDE %asReference.
}
" \
            https://query.wikidata.org/sparql \
            | tail -n+2
        )
        ((total+=count))
      done
      printf '%s\t%s\t%s\n' "$propertyIds_" "$total" "$lodCloudId"
    done \
  | sort -t$'\t' -k2 -rn \
  | while IFS=$'\t' read -r propertyIds_ count lodCloudId; do
      # Step 3: render a wikitext definition-list entry, largest total first
      IFS='|' read -ra propertyIds <<< "$propertyIds_"
      printf '; %s ({{#commaseparatedlist:{{P|%s}}' "$lodCloudId" "${propertyIds[0]}"
      if ((${#propertyIds[@]} > 1)); then
        printf '|{{P|%s}}' "${propertyIds[@]:1}"
      fi
      printf '}}): %d\n' "$count"
    done
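The quote stripping and the wikitext rendering in the script above can be checked without a live endpoint. A minimal offline sketch that reuses the script's own expressions on a hypothetical sample row; `strip_quotes` and `render_entry` are illustrative names, not part of the original script, and the sample values come from the uniprot entry in the list below.

```shell
#!/usr/bin/env bash

# Remove one leading and one trailing double quote, as the script does for
# string literals in the TSV result.
strip_quotes() {
  local s=$1
  s=${s#'"'}
  s=${s%'"'}
  printf '%s' "$s"
}

# Render one dataset the same way as the script's final loop:
# "; ID ({{#commaseparatedlist:{{P|P1}}|{{P|P2}}}}): COUNT"
render_entry() {
  local propertyIds_=$1 count=$2 lodCloudId=$3
  local -a propertyIds
  IFS='|' read -ra propertyIds <<< "$propertyIds_"
  printf '; %s ({{#commaseparatedlist:{{P|%s}}' "$lodCloudId" "${propertyIds[0]}"
  if ((${#propertyIds[@]} > 1)); then
    # printf repeats its format once per remaining property ID
    printf '|{{P|%s}}' "${propertyIds[@]:1}"
  fi
  printf '}}): %d\n' "$count"
}

# Hypothetical first-query row: quoted property IDs, a tab, a quoted LOD Cloud ID
row=$'"P352|P4616"\t"uniprot"'
IFS=$'\t' read -r rawIds rawId <<< "$row"
render_entry "$(strip_quotes "$rawIds")" 2537481 "$(strip_quotes "$rawId")"
```

Running it prints `; uniprot ({{#commaseparatedlist:{{P|P352}}|{{P|P4616}}}}): 2537481`, which the wiki renders as the uniprot entry.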
Data as of 2021-04-30T12:04:00UTC:
- doi (DOI (P356)): 27367486
- viaf (VIAF ID (P214)): 6148897
- freebase (Freebase ID (P646)): 4397020
- geonames-semantic-web (GeoNames ID (P1566)): 3747452
- uniprotkb (UniProt protein ID (P352), UniProt journal ID (P4616)): 2537481
- uniprot (UniProt protein ID (P352), UniProt journal ID (P4616)): 2537481
- dnb-gemeinsame-normdatei (GND ID (P227)): 1918745
- lcsh (Library of Congress authority ID (P244)): 1338101
- data-bnf-fr (Bibliothèque nationale de France ID (P268)): 895058
- idreffr (IdRef ID (P269)): 528040
- oclc-fast (FAST ID (P2163)): 502221
- zitgist-musicbrainz (MusicBrainz release ID (P5813), MusicBrainz event ID (P6423), MusicBrainz area ID (P982), MusicBrainz instrument ID (P1330), MusicBrainz series ID (P1407), MusicBrainz recording ID (P4404), MusicBrainz artist ID (P434), MusicBrainz work ID (P435), MusicBrainz release group ID (P436), MusicBrainz label ID (P966), MusicBrainz place ID (P1004)): 490029
- open-library (Open Library ID (P648)): 271983
- libris (Libris-URI (P5587), SELIBR ID (P906), LIBRIS editions (P1182)): 258242
- datos-bne-es (National Library of Spain ID (P950)): 207362
- eunis (EUNIS ID for species (P6177)): 179098
- swedish-open-cultural-heritage (Swedish Open Cultural Heritage URI (P1260)): 164146
- europeana-sparql (Europeana entity (P7704)): 160460
- national-diet-library-authorities (NDL Authority ID (P349)): 124052
- chembl-rdf (ChEMBL ID (P592)): 100223
- bioportal-msh (MeSH term ID (P6680), MeSH concept ID (P6694), MeSH descriptor ID (P486), MeSH tree code (P672)): 77724
- bioportal-mesh-owl (MeSH term ID (P6680), MeSH concept ID (P6694), MeSH descriptor ID (P486), MeSH tree code (P672)): 77714
- babelnet (BabelNet ID (P2581)): 65942
- bag (BAG building ID (P5208), BAG residence ID (P981), BAG public space ID (P5207)): 61218
- bluk-bnb (BNB person ID (P5361)): 42519
- gemeenschappelijke-thesaurus-audiovisuele-archieven (GTAA ID (P1741)): 39556
- data-persee-fr (Persée author ID (P2732), Persée journal ID (P2733)): 34064
- getty-tgn (Getty Thesaurus of Geographic Names ID (P1667)): 29226
- rism (RISM ID (P5504)): 27329
- getty-aat (Art & Architecture Thesaurus ID (P1014)): 25080
- gutenberg (Project Gutenberg ebook ID (P2034), Project Gutenberg author ID (P1938)): 19945
- BVMC (BVMC person ID (P2799), BVMC work ID (P3976)): 16483
- yso (YSO ID (P2347)): 14180
- glottolog (Glottolog code (P1394)): 10800
- clld-glottolog (Glottolog code (P1394)): 10800
- linked-open-numbers (KIT Linked Open Numbers ID (P5176)): 10327
- hungarian-national-library-catalog (NSZL name authority ID (P3133)): 6825
- pleiades (Pleiades ID (P1584)): 5625
- sandrart-net (Sandrart.net person ID (P1422), Sandrart.net artwork ID (P4380)): 5172
- thesesfr (Theses.fr person ID (P4285)): 4370
- bbc-programmes (BBC programme ID (P827)): 4202
- eurovoc (EuroVoc ID (P5437)): 3068
- stw-thesaurus-for-economics (STW Thesaurus for Economics ID (P3911)): 2663
- nomisma_org (Nomisma ID (P2950)): 1801
- bioportal-snomedct (SNOMED CT ID (P5806)): 801
- iptc-newscodes (IPTC NewsCode (P5429)): 782
- bioportal-unitsontology (wurvoc.org measure ID (P3328)): 610
- cpv-2008 (Common Procurement Vocabulary (P5417), CPV Supplementary (P8984)): 422
- msc (Mathematics Subject Classification ID (P3285)): 210
- naics-2012 (NAICS code (P3224)): 45
- warsampo (FI WarSampo person ID (P3817)): 17
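The ordering of the list above comes from the script's `sort -t$'\t' -k2 -rn` step, and the `-n` flag matters: without it the totals would be compared as text. A small sketch using three rows taken from the list (the helper names are hypothetical; `LC_ALL=C` is added only to make the text comparison locale-independent):

```shell
#!/usr/bin/env bash

# Three "propertyIds<TAB>total<TAB>lodCloudId" rows, values copied from the list
sample_rows() {
  printf '%s\t%s\t%s\n' \
    'P1584' 5625 pleiades \
    'P3285' 210 msc \
    'P1014' 25080 getty-aat
}

# As in the script: numeric, descending, keyed from the second field
sort_numeric() { sample_rows | LC_ALL=C sort -t$'\t' -k2 -rn | cut -f3; }

# Same key without -n: totals compare character by character
sort_lexical() { sample_rows | LC_ALL=C sort -t$'\t' -k2 -r | cut -f3; }
```

`sort_numeric` yields getty-aat, pleiades, msc, while `sort_lexical` puts pleiades (5625) above getty-aat (25080), because the comparison is decided by the first differing character ("5" against "2").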
Additionally, the following two datasets are “special” and I haven’t yet figured out a better way to represent them:
- lexvo (ISO 639-3 code (P220)): 8300
- ocd (Italian Chamber of Deputies dati ID (P1341)): 12881