Wikidata:Property proposal/URL match pattern


URL match pattern

Status: Under discussion
Description: a regex pattern for URLs from which an external ID may be extracted
Data type: String
Domain: property
Example 1: IMDb ID (P345) → (one of multiple values) https:\/\/www\.imdb\.com\/(title|name|news)\/([a-z0-9]+)(\/.*)? <replacement value> \2
Example 2: PubMed ID (P698) → https:\/\/pubmed\.ncbi\.nlm\.nih\.gov\/(\d+)(-[^\/]*)?\/ <replacement value> \1
Example 3: ISNI (P213) → https?:\/\/www\.isni\.org\/(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})(| |%20)(\d{4}) <replacement value> \1 \3 \5 \7
Example 4: ZVG number (P679) → http:\/\/gestis-en\.itrust\.de\/nxt\/gateway\.dll\/gestis_en\/0+([1-9]\d+)\.xml.* <replacement value> \1
Example 5: CricketArchive player ID (P2698) → https:\/\/cricketarchive\.com\/Archive\/Players\/\d+\/\d+\/(\d+)\.html <replacement value> \1
Example 6: Fandom article ID (P6262) → https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*) <replacement value> \1:\3
Example 7: Geni.com profile ID (P2600) → https:\/\/www\.geni\.com\/(profile|people)\/[^\/]+\/(\d+)(#.*)? <replacement value> \2
See also: formatter URL (P1630)

URL match replacement value

Status: Under discussion
Description: (qualifier only) see above
Data type: String
Example 1: see above
Example 2: MISSING
Example 3: MISSING


Motivation

This will provide a way to extract the property and the ID from a given URL. A future tool or gadget may benefit from this. GZWDer (talk) 23:46, 26 February 2020 (UTC)
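
As a rough illustration of what such a tool might do, the sketch below tries a few of the proposed patterns against a URL and returns the first property that matches, together with the extracted ID. The patterns and replacement values are copied from Examples 2, 3 and 5 of the proposal. The `RULES` table, the `identify` function, the sample URLs, and the choice to require the pattern to match the whole URL are assumptions made for this sketch and are not part of the proposal.

```python
import re

# Hypothetical rule table: (property, URL match pattern, replacement value),
# copied from Examples 2, 3 and 5. A real tool would presumably read these
# from the property statements instead of hard-coding them.
RULES = [
    ("P698", r"https:\/\/pubmed\.ncbi\.nlm\.nih\.gov\/(\d+)(-[^\/]*)?\/", r"\1"),
    ("P213",
     r"https?:\/\/www\.isni\.org\/(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})",
     r"\1 \3 \5 \7"),
    ("P2698",
     r"https:\/\/cricketarchive\.com\/Archive\/Players\/\d+\/\d+\/(\d+)\.html",
     r"\1"),
]


def identify(url):
    """Return (property, ID) for the first rule matching url, else None."""
    for prop, pattern, replacement in RULES:
        # Assumption: the pattern must match the entire URL.
        match = re.fullmatch(pattern, url)
        if match:
            # expand() substitutes \1, \2, ... with the captured groups.
            return prop, match.expand(replacement)
    return None


# Made-up sample URL, used only to exercise the ISNI pattern.
print(identify("http://www.isni.org/0000%200001%202345%206789"))
```

Whether matching should be anchored to the whole URL, and how ties between several matching properties should be resolved, would be for the tool to decide.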

Discussion

Comment: Here's an example of how this would look on Fandom article ID (P6262):

URL match pattern: https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*)
    URL match replacement value: \1:\3

If a tool wanted to automatically generate a Fandom article ID (P6262) from the URL https://minecraft.fandom.com/wiki/Sheep, for example, it would match the regex specified on the property against that URL. There are three capturing groups in the regex. The first one is ([a-z0-9\.-]+) and matches "minecraft", the second one is (wikia|fandom) and matches "fandom", and the third one is (.*) and matches "Sheep". The URL match replacement value allows these capturing groups to be put together: \1:\3 turns into minecraft:Sheep, since \N is replaced with the value of the Nth capturing group. --SixTwoEight (talk) 01:52, 4 March 2020 (UTC)
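
The walkthrough above can be reproduced with a short sketch. The pattern and replacement value are the ones shown for Fandom article ID (P6262). The function name `extract_id` and the use of a whole-URL match are assumptions made for illustration, since the proposal does not say whether the pattern is anchored.

```python
import re

# Pattern and replacement value from the Fandom article ID (P6262) example.
PATTERN = r"https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*)"
REPLACEMENT = r"\1:\3"


def extract_id(url, pattern, replacement):
    """Return the ID built from the capturing groups, or None if no match."""
    # Assumption: the pattern has to match the entire URL.
    match = re.fullmatch(pattern, url)
    if match is None:
        return None
    # Each \N in the replacement is replaced by the Nth capturing group.
    return match.expand(replacement)


match = re.fullmatch(PATTERN, "https://minecraft.fandom.com/wiki/Sheep")
print(match.groups())  # ('minecraft', 'fandom', 'Sheep')
print(extract_id("https://minecraft.fandom.com/wiki/Sheep", PATTERN, REPLACEMENT))
# minecraft:Sheep
```

The second group (wikia|fandom) is captured but not used in the replacement value, which is why the result skips from \1 to \3.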