Wikidata:Property proposal/title match pattern

website title match pattern

Originally proposed at Wikidata:Property proposal/Authority control

Description: a regular expression extracting a probable label from the <title /> of a website
Data type: String
Domain: property
Allowed values: regular expression with a single capture group
Example 1: IMDb ID (P345)
  URL match pattern (P8966): ^https?:\/\/(?:(?:www|m)\.)?imdb\.com\/(?:(?:search\/)?title(?:\?companies=|\/)|name\/|event\/|news\/|company\/|list\/)(\w{2}\d+)
  title match pattern: ^(.*)\s-\sIMDb$
Example 2: Twitter (X) username (P2002)
  URL match pattern (P8966): ^https?:\/\/(?:mobile\.)?twitter\.com\/(?:intent.+screen_name=)?(?!home|hashtag|explore|settings)([0-9A-Za-z_]{1,15})
  title match pattern: ^(.+)\s\(@[^\)]+\)\s\/\sTwitter$
Example 3: MusicBrainz artist ID (P434)
  URL match pattern (P8966): ^https?:\/\/musicbrainz\.org\/artist\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})
  title match pattern: ^(.+)\s-\sMusicBrainz$

Motivation

Wikidata for Web (Q99894727) is a browser extension that recognises, from a page's URL, websites whose identifier is the value of an external-ID property on Wikidata. It does this using the URL match pattern (P8966) property.

It can also create external-ID statements and add them to a user-selected existing entity or to a new one.

The user, however, has to enter the label of an existing entity manually. Most websites already carry an appropriate label in their <title/>. It is, however, usually diluted with website-specific words that are most likely not part of the label.

The Twitter profile of Tim Berners-Lee, for example, has a title element that looks like this: <title>Tim Berners-Lee (@timberners_lee) / Twitter</title>

In order to find the Wikidata label, we only need whatever precedes the opening bracket. A regular expression to extract that string could be ^(.+)\s\(@[^\)]+\)\s\/\sTwitter
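
As a rough illustration of the idea above, here is a minimal Python sketch that applies the title patterns from the examples to a page title and returns the single capture group. It is not how Wikidata for Web is implemented. The function name extract_label, the dictionary keys, and the IMDb and MusicBrainz sample titles are made up for illustration. The Twitter pattern used is the end-anchored form from Example 2.

```python
import re
from typing import Optional

# Patterns copied from the examples in this proposal. The dictionary keys are
# illustrative labels only, not identifiers used by Wikidata or any tool.
TITLE_PATTERNS = {
    "twitter": r"^(.+)\s\(@[^\)]+\)\s\/\sTwitter$",
    "imdb": r"^(.*)\s-\sIMDb$",
    "musicbrainz": r"^(.+)\s-\sMusicBrainz$",
}


def extract_label(title: str, pattern: str) -> Optional[str]:
    """Return the first capture group if the title matches, otherwise None.

    Hypothetical helper: assumes the pattern has exactly one capture group,
    as required by the "Allowed values" of the proposed property.
    """
    match = re.search(pattern, title.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Title taken from the proposal text.
    print(extract_label("Tim Berners-Lee (@timberners_lee) / Twitter",
                        TITLE_PATTERNS["twitter"]))
    # Made-up sample titles, used only to exercise the other two patterns.
    print(extract_label("Some Film (2001) - IMDb", TITLE_PATTERNS["imdb"]))
    print(extract_label("Some Artist - MusicBrainz", TITLE_PATTERNS["musicbrainz"]))
```

A title that does not fit the pattern yields None, so a caller could fall back to asking the user for a label, as is the case today.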

This property is meant to be used as a qualifier for URL match pattern (P8966) (see examples). --Shisma (talk) 12:53, 18 June 2022 (UTC)

Discussion

  •  Comment: I'm not sure why you need this for an existing entry - they should already have a label? But I could see this being useful for new entities - is that what you meant here? ArthurPSmith (talk) 16:09, 20 June 2022 (UTC)
For existing items this would be merely a convenience feature. Often the title of the thing (the relevant part I'm trying to extract) is a 1:1 match to some Wikidata label or alias. I could also use this property to add subject named as (P1810) to each statement, or to add aliases. --Shisma (talk) 06:37, 21 June 2022 (UTC)