Wikidata:Property proposal/Author first names
first name(s) stated as
Originally proposed at Wikidata:Property proposal/Generic
Description | qualifier for string representation of given or secondary sorting portion of a name as represented in a bibliographic reference file (for example BibTeX) |
---|---|
Data type | String |
Template parameter | 'first-N' parameter in en:Template:Cite Q |
Domain | property |
Allowed values | any string that may appear in a name (including spaces and periods) |
Example 1 | The South Pole Telescope (Q55893751) author name string (P2093) → John Ruhl → 'John' |
Example 2 | The South Pole Telescope (Q55893751) author name string (P2093) → Peter A. R. Ade → 'Peter A. R.' |
Example 3 | Prostetic Rehabilitation of an Eye Globe: Case Report (Q89819389) author name string (P2093) → Clovis Lamartine de Moraes Melo Neto → 'Clovis Lamartine' |
Example 4 | Tear Strength Analysis of MDX4-4210 and A-2186 Silicones with Different Intrinsic Pigments Incorporated by Mechanical and Industrial Methods (Q92616544) author (P50) → Marcelo Coelho Goiato → 'Marcelo Coelho' |
Example 5 | An EAR-motif-containing ERF transcription factor affects herbivore-induced signaling, defense and resistance in rice. (Q52725820) author name string (P2093) → Yonggen Lou → 'Yonggen' (N.B. author's entry is Lu Yonggen (Q9116274) in Chinese name order) |
Source | Bibtex references |
Planned use | Implementation in Template:Cite Q |
Robot and gadget jobs | I will propose a task for Pi bot that will populate this |
See also | object named as (P1932) |
Single-value constraint | yes |
Motivation
We have been working on Template:Cite Q improvements over the last few months. One request has been particularly challenging: how do we go from author names in the 'First Last' format to 'Last, First'? This is particularly important so that we can match different citation styles in use in articles, which seems to be a blocking issue for using Cite Q more widely.
We currently store author names in object named as (P1932); however, it is impossible to automatically determine the first/last name parts of these strings. The good news is that this information is held in the BibTeX references for the publications, so we can import it from there, but we need a suitable property to import it into.
This would be set as a qualifier of author name string (P2093) and author (P50) (it is important that it is within the publication item due to technical limitations with fetching values from items linked by author (P50)). Only one qualifier would be used for each author; multiple first names would be contained in a single value.
Values would be imported by bot (I will propose a bot task to do this if this property is accepted). It is accompanied by a property proposal for the last name(s) (Wikidata:Property proposal/Author last names). It could either supplement or replace object named as (P1932) (I have no preference either way).
Thanks. Mike Peel (talk) 18:26, 28 December 2020 (UTC)
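As a rough illustration of the import step described above, here is a minimal Python sketch of how a bot might split a BibTeX `author` field into first and last name strings. It assumes the common "Last, First" form with authors joined by "and"; the function name `split_bibtex_authors` and the sample string are made up for illustration, and the "von Last, Jr, First" form and nested braces are not handled.

```python
import re


def split_bibtex_authors(author_field):
    """Split a BibTeX author field into (first, last) tuples.

    Assumption: each author is written as "Last, First". Names without
    a comma are returned as (None, whole_name) because the boundary
    between first and last name cannot be determined reliably.
    """
    results = []
    # BibTeX separates authors with the word "and" surrounded by whitespace.
    for raw in re.split(r"\s+and\s+", author_field.strip()):
        # Drop protective braces and turn non-breaking ties (~) into spaces.
        name = raw.replace("{", "").replace("}", "").replace("~", " ").strip()
        if "," in name:
            # The first comma marks the boundary between last and first name.
            last, first = name.split(",", 1)
            results.append((first.strip(), last.strip()))
        else:
            # Ambiguous: leave the first name unset rather than guess.
            results.append((None, name))
    return results


if __name__ == "__main__":
    sample = "{Ruhl}, John and {Ade}, Peter A.~R."
    for first, last in split_bibtex_authors(sample):
        print(first, "|", last)
```

The first element of each tuple is what this property would hold; the second corresponds to the companion last name proposal.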
Discussion
- I would rather add "First names string" property, so it may also be used in Wikidata items directly when given name (P735) doesn't have a corresponding Wikidata item. Adamant.pwn (talk) 18:53, 28 December 2020 (UTC)
- P. S. it's kind of inconvenient to discuss this and Wikidata:Property proposal/Author last names separately, is there anything that can be done about it? Adamant.pwn (talk) 18:58, 28 December 2020 (UTC)
- I wasn't sure how to propose two properties at once, I'm happy for them to be merged if that's possible. Thanks. Mike Peel (talk) 19:05, 28 December 2020 (UTC)
- I think we should have "situation-dependent name string" for human (Q5). For example, we need to write:
- "Lionel Messi" (first name + last name) to Lionel Messi (Q615) Template:Infobox football biography (Q5616966) header but "Messi" (last name) to Template:FC Barcelona squad (Q6584713);
- "Cristiano Ronaldo" (two first names) to Cristiano Ronaldo (Q11571) Template:Infobox football biography (Q5616966) header but "Ronaldo" (one of two first names) to Template:Juventus F.C. squad (Q8487403);
- "Pelé" (nickname) to Pelé (Q12897) Template:Infobox football biography (Q5616966) header and "Pelé" (same nickname) to Template:Brazil squad 1958 FIFA World Cup (Q6395321). Сидик из ПТУ (talk) 20:15, 28 December 2020 (UTC)
- Support Will be useful for author name string (P2093) case. Сидик из ПТУ (talk) 20:18, 28 December 2020 (UTC)
- Support NMaia (talk) 12:38, 29 December 2020 (UTC)
- Oppose given name (P735) and family name (P734) should be used to get the data about the names. ChristianKl ❪✉❫ 22:17, 29 December 2020 (UTC)
- It doesn't work for some languages. For example, we have three Russian words for Michael (Q4927524) (Майкл / Михаэль / Микаэль), since in Russian, pronunciation is taken into account when translating names (Майкл for English, Михаэль for German). You can also see from the interwikis in which other languages the names of Michael Schumacher (Q9671) and Michael Owen (Q128829) are spelled differently. There are similar examples with Russian names in English and German: Vladimir (Q2253934) (Vladimir / Wladimir), Sergey (Q12902079) (Sergey / Sergei / Sergej), Dmitry (Q19002866) (Dmitri / Dmitrii / Dmitriy / Dimtri / Dimitry / Demitri / Dmitrij). Сидик из ПТУ (talk) 08:27, 30 December 2020 (UTC)
- @ChristianKl: Nope, that won't work. They have to be within the article item to be retrieved, since there is a limit of 400 Wikidata items that can be loaded per page. They also need to contain the full string - e.g., for the case of multiple first names you can't list the order with series ordinal (P1545) as you can't have qualifiers of qualifiers. Thanks. Mike Peel (talk) 12:41, 31 December 2020 (UTC)
- Comment Do we have a database of "bibtex references for the publications" somewhere? This seems highly unlikely. I don't see how this property would ever be widely enough supported to be generally useful. Maybe for some limited subset of published works, but as a general solution it seems quite infeasible. If you look for example at PubMed records, they list author names as strings in a rather wide variety of manners: "First Last", "F. Last", "Last F", "Last, First", etc. (and with middle names and multi-component last names it gets much worse), and that's a database that has at least some level of curation. It's just very hard to handle the huge variety of source materials that are out there in a reliably consistent fashion. I agree that something like Сидик из ПТУ's "situation-dependent name string" might be a better step here. ArthurPSmith (talk) 20:21, 30 December 2020 (UTC)
- @ArthurPSmith: At least in astronomy there is Astrophysics Data System (Q752099), which provides Bibtex references like [1]. Probably it is going to be slightly different between different databases, so it's going to be tricky to do this for all article items immediately, but it should be possible in the longer term, even if we have to come up with some interface that lets humans select which format has been used for each item and then a script can fill it in based on that. Thanks. Mike Peel (talk) 12:41, 31 December 2020 (UTC)
- If you try adding a citation via Citoid starting from the document's url, for example, you will almost invariably get first name, last name parsed accurately. The information is out there, and if Citoid can retrieve it, then there's no reason why a bot can't do the same. PubMed throws away author information because it's an aggregator service and it's simpler to do that for its purposes, which is why it's not a good starting point for creating citations. --RexxS (talk) 16:39, 7 January 2021 (UTC)
- Support While being here, I would also suggest separate properties for middle names, as well as for pre- and postfixes. Easier to introduce them now while the WD model is still very much work in progress than at a later point in time... Regarding the property names, I suggest to avoid the words "first" and "last" in them because they imply a certain Western naming notation and even there depend on context. While still not perfect, the "given name"/"surname" pair is semantically better and suits more cases without implying a particular display order, see f.e.
- (matthiaspaul) --92.209.72.111 20:11, 7 January 2021 (UTC)
- The idea with these properties is that middle/pre/post-fixes are stored within the 'first' and 'last' name strings, which is how it is commonly done in bibtex for references, and in reference templates on-wiki (e.g., Citation expects firstN/lastN parameters). Splitting them out into different properties here would add more complexity than I think we need, and it can be done in Lua if needed. Naming them as 'given name'/'surname' would also invite more complexity than is needed (e.g., second surnames). Let's keep things as simple as possible given the situation please. Thanks. Mike Peel (talk) 17:19, 8 January 2021 (UTC)
- Support Lastname, Firstname has been the major barrier to the use of Cite Q. The use of this would start to override the objections. MargaretRDonald (talk) 21:32, 29 April 2021 (UTC)
- Comment Adjusted description to match updated description for last name proposal. Note wikidata usage instructions should indicate: Only use as qualifiers of author name string (P2093) and author (P50) ArthurPSmith (talk) 18:10, 2 May 2021 (UTC)
- Done — Martin (MSGJ · talk) 20:22, 28 June 2021 (UTC)
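For readers following the firstN/lastN point raised in the discussion, the sketch below shows one way the stored strings could be turned into citation template parameters. It is only an illustration in Python: the dictionary keys `name`, `first` and `last` are hypothetical stand-ins for the author statement and its two qualifiers, and the fallback to an `authorN` parameter when the qualifiers are missing is an assumption, not something decided above.

```python
def cite_params(authors):
    """Build firstN/lastN parameters from a list of author records.

    Each record is a dict with a "name" key (the full author string) and,
    optionally, "first" and "last" keys holding the qualifier values.
    The dict layout is hypothetical and chosen only for this example.
    """
    params = {}
    for n, author in enumerate(authors, start=1):
        first = author.get("first")
        last = author.get("last")
        if first and last:
            # Both parts are known, so 'Last, First' styles can be rendered.
            params[f"first{n}"] = first
            params[f"last{n}"] = last
        else:
            # Assumed fallback: pass the unsplit string through unchanged.
            params[f"author{n}"] = author["name"]
    return params


if __name__ == "__main__":
    records = [
        {"name": "Peter A. R. Ade", "first": "Peter A. R.", "last": "Ade"},
        {"name": "John Ruhl"},
    ]
    print(cite_params(records))
```

Keeping middle names and initials inside the single `first` string, as proposed, means no further splitting is needed at this stage.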