Property talk:P10999

From Wikidata

Documentation

website title extract pattern
a regular expression extracting a probable label from the title tag of a website
Has quality: case sensitive (Q257869)
Data type: String
Allowed values: .*(?!>\\)\((?!\?:).*|
Examples:
  • IMDb ID (P345): ^(.*)\s-\sIMDb$
  • X username (P2002): ^(.+)\s\(@[^\)]+\)\s\/\sTwitter$
  • MusicBrainz artist ID (P434): ^(.+)\s-\sMusicBrainz$
Formatter URL: https://regex101.com/?regex=$1
See also: URL match pattern (P8966), URL match replacement value (P8967)
Current uses
Total: 1,048
Qualifier: 1,047 (>99.9% of uses)
Reference: 1 (<0.1% of uses)
Scope is as qualifier (Q54828449): the property must be used in the specified way only
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10999#Scope, SPARQL
Format “.*(?!>\\)\((?!\?:).*|”: value must be formatted using this pattern (PCRE syntax).
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10999#Format, SPARQL
Allowed entity types are Wikibase property (Q29934218): the property may only be used on a certain entity type
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10999#Entity types
Single best value: this property generally contains a single value. If there are several, one should have preferred rank
Exceptions are possible as rare values may exist. Exceptions can be specified using exception to constraint (P2303).
List of violations of this constraint: Database reports/Constraint violations/P10999#single best value, SPARQL
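
The example patterns listed in the documentation above can be tried out with a few lines of Python. This is only a rough sketch: the titles are made up for illustration, the helper name extract_label is hypothetical, and Python's re module stands in for whatever regex engine a consuming tool actually uses.

```python
import re

# Patterns copied from the examples in the documentation above.
EXAMPLE_PATTERNS = {
    "P345": r"^(.*)\s-\sIMDb$",
    "P2002": r"^(.+)\s\(@[^\)]+\)\s\/\sTwitter$",
    "P434": r"^(.+)\s-\sMusicBrainz$",
}


def extract_label(pattern, title):
    """Return the first capture group if the title matches, else None.

    Hypothetical helper, not part of any Wikidata tool.
    """
    match = re.search(pattern, title)
    return match.group(1) if match else None


# Made-up titles, used only for illustration.
imdb_label = extract_label(EXAMPLE_PATTERNS["P345"], "Some Film (1999) - IMDb")
x_label = extract_label(EXAMPLE_PATTERNS["P2002"], "Example User (@example) / Twitter")
mb_label = extract_label(EXAMPLE_PATTERNS["P434"], "Example Band - MusicBrainz")
print(imdb_label, x_label, mb_label, sep="\n")
```

In each case only the first capture group is used as the probable label; the site suffix is discarded.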

no value

@Dhx1: How should I interpret <no value>? Does it mean:

  • the <title> is good as-it-is
  • the <title> is useless, ignore it

I would personally assume it means the latter, since the former can be expressed with ^(.+)$. But I really didn't think about it 😅 – Shisma (talk) 16:15, 27 August 2022 (UTC)

If a title didn't contain any suitable label information, I used <no value>. Dhx1 (talk) 19:53, 27 August 2022 (UTC)
OK, agreed – Shisma (talk) 09:44, 28 August 2022 (UTC)
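
A consumer could encode the reading agreed above roughly as follows. This is a sketch under assumptions: NO_VALUE and label_from_title are hypothetical names, with None standing in for Wikidata's <no value>. The pattern ^(.+)$ keeps the whole title, while <no value> means the title is ignored.

```python
import re

# Assumption: None stands in for Wikidata's <no value> in this sketch.
NO_VALUE = None


def label_from_title(title, pattern):
    """Hypothetical helper: apply a title extract pattern, honouring <no value>."""
    if pattern is NO_VALUE:
        # The title carries no suitable label information: ignore it.
        return None
    match = re.search(pattern, title)
    return match.group(1) if match else None


# ^(.+)$ expresses "the <title> is good as-it-is".
whole_title = label_from_title("Example Title", r"^(.+)$")
# <no value> expresses "the <title> is useless, ignore it".
ignored = label_from_title("Example Title", NO_VALUE)
print(whole_title, ignored)
```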

multiple capture groups

Titles of MyAnimeList character ID (P4085) pages contain more than one alias for the subject. Those could be captured with multiple groups. Loominade (talk) 11:19, 2 September 2022 (UTC)
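
As a rough illustration of the idea, every capture group could be treated as one alias. The title layout and the pattern below are invented for this sketch (they are not the real MyAnimeList title format), and extract_aliases is a hypothetical helper.

```python
import re


def extract_aliases(pattern, title):
    """Return the text of every capture group that matched, in order."""
    match = re.search(pattern, title)
    if match is None:
        return []
    # Groups that did not take part in the match are None and are skipped.
    return [group.strip() for group in match.groups() if group]


# Assumed layout: "<name> (<other name>) - ExampleSite". Purely illustrative.
HYPOTHETICAL_PATTERN = r"^(.+?)\s\((.+)\)\s-\sExampleSite$"

aliases = extract_aliases(HYPOTHETICAL_PATTERN, "Name One (Name Two) - ExampleSite")
print(aliases)
```

A consumer would still have to decide which group becomes the label and which become aliases; nothing on this page settles that.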

Matching Unicode characters

@Shisma: Would you be able to test the website title extract pattern (P10999) regular expression I've added for OpenStreetMap way ID (P10689)? The titles of pages at the formatter URL contain the Unicode characters \u202A and \u202C for left-to-right formatting, but I'm unsure whether using escaped \x.. non-printable characters will break Wikidata for Web (Q99894727). Another option is to use \u.... in the regular expression, but I think this would only work for PCRE users if PCRE was being used in Unicode mode. For JavaScript as implemented by Firefox, both \x.. and \u.... appear to be usable per [1]. Dhx1 (talk) 12:38, 13 September 2022 (UTC)

This works for me. But indeed it wouldn't work in PHP implementations. I don't know what to do in this case 🤷 –Shisma (talk) 16:13, 13 September 2022 (UTC)
If all else fails, we could also use ^(?:Way|Node): .(.+). \(.\d+.\) \| OpenStreetMap$. It is less precise, but the rest of the pattern is already very specific – Shisma (talk) 16:15, 13 September 2022 (UTC)
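
To compare the two approaches, here is a small Python sketch. The sample title is an assumption about where the \u202A and \u202C characters sit, and the explicit-escape pattern is my own guess at a precise variant, not the value stored on P10689. Python's re module accepts \uXXXX escapes inside patterns, so this says nothing about PCRE or PHP behaviour.

```python
import re

# Assumed title layout; the directional marks are placed where the
# fallback pattern above expects "any single character".
title = "Way: \u202aMain Street\u202c (\u202a12345\u202c) | OpenStreetMap"

# Variant with explicit \u escapes (an assumption, for illustration only).
EXPLICIT = r"^(?:Way|Node): \u202a(.+)\u202c \(\u202a\d+\u202c\) \| OpenStreetMap$"

# Engine-agnostic fallback quoted above: "." stands in for each mark.
FALLBACK = r"^(?:Way|Node): .(.+). \(.\d+.\) \| OpenStreetMap$"

explicit_label = re.search(EXPLICIT, title).group(1)
fallback_label = re.search(FALLBACK, title).group(1)
print(explicit_label, fallback_label)
```

The fallback avoids escape syntax altogether, which is why it should behave the same across engines, at the cost of accepting any character in those positions.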

Language

How can one specify that the extracted label is valid only for a given language?

For instance, here the extracted label should be valid only for the Simplified Chinese language, and here it should be valid only for the French language. Horcrux (talk) 13:37, 20 October 2022 (UTC)

@Horcrux: I'm facing the same issue with OpenHistoricalMap relation ID (P8424): the title format depends on the user's preferred language. For now, I've specified the same URL match pattern (P8966) multiple times, each time qualified by a different website title extract pattern (P10999) and language of work or name (P407), but that's bound to break something... Minh Nguyễn 💬 17:58, 29 October 2022 (UTC)
I've added language of work or name (P407) to the allowed qualifiers constraint (Q21510851) for URL match pattern (P8966). --Horcrux (talk) 09:34, 3 January 2023 (UTC)
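
A consumer of such language-qualified statements might pick the language from whichever pattern matches. The sketch below is only an assumption about how that could work: the patterns, the site name ExampleMap, and the function name are all hypothetical and are not taken from P8424.

```python
import re

# Hypothetical (language code, title extract pattern) pairs, mirroring one
# URL match pattern repeated with different P10999 and P407 qualifiers.
PATTERNS_BY_LANGUAGE = [
    ("en", r"^Relation: (.+) \| ExampleMap$"),
    ("fr", r"^Relation : (.+) \| ExampleMap$"),
]


def extract_label_with_language(title):
    """Return (language, label) for the first pattern that matches, else None."""
    for language, pattern in PATTERNS_BY_LANGUAGE:
        match = re.search(pattern, title)
        if match:
            return language, match.group(1)
    return None


french = extract_label_with_language("Relation : Paris | ExampleMap")
english = extract_label_with_language("Relation: London | ExampleMap")
print(french, english)
```

This only works when the title formats differ between languages; if two languages share the same format, the first pattern in the list wins, which may be the kind of breakage hinted at above.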