Wikidata:Property proposal/Author last names
last name(s) stated as
Originally proposed at Wikidata:Property proposal/Generic
Description | qualifier to provide string representation of family or primary sorting portion of name as represented in a bibliographic reference file (for example BibTeX) |
---|---|
Data type | String |
Template parameter | 'last-N' parameter in en:Template:Cite Q |
Domain | work (Q386724) |
Allowed values | any string that may appear in a name (including spaces and periods) |
Example 1 | The South Pole Telescope (Q55893751) author name string (P2093) → John Ruhl → 'Ruhl' |
Example 2 | The South Pole Telescope (Q55893751) author name string (P2093) → Peter A. R. Ade → 'Ade' |
Example 3 | Prostetic Rehabilitation of an Eye Globe: Case Report (Q89819389) author name string (P2093) → Clovis Lamartine de Moraes Melo Neto → 'de Moraes Melo Neto' |
Example 4 | Tear Strength Analysis of MDX4-4210 and A-2186 Silicones with Different Intrinsic Pigments Incorporated by Mechanical and Industrial Methods (Q92616544) author (P50) → Marcelo Coelho Goiato → 'Goiato' |
Example 5 | An EAR-motif-containing ERF transcription factor affects herbivore-induced signaling, defense and resistance in rice. (Q52725820) author name string (P2093) → Yonggen Lou → 'Lou' (N.B. author's entry is Lu Yonggen (Q9116274) in Chinese name order) |
Source | BibTeX references |
Planned use | Implementation in Template:Cite Q |
Robot and gadget jobs | I will propose a task for Pi bot that will populate this |
See also | object named as (P1932) |
Single-value constraint | yes |
Motivation
We have been working on Template:Cite Q improvements over the last few months. One request has been particularly challenging: how do we go from author names in the 'First Last' format to 'Last, First'? This is particularly important so that we can match different citation styles in use in articles, which seems to be a blocking issue for using Cite Q more widely.
We currently store author names in object named as (P1932); however, it is impossible to automatically determine the first/last name parts of these strings. The good news is that this information is held in the BibTeX references for the publications, so we can import it from there, but we need a suitable property to import it to.
This would be set as a qualifier of author name string (P2093) and author (P50) (it is important that it is within the publication item due to technical limitations with fetching values from items linked by author (P50)). Only one qualifier would be used for each author; multiple surnames would be contained in a single value.
Values would be imported by bot (I will propose a bot task to do this if this property is accepted). It is accompanied by a property proposal for the first name(s) (Wikidata:Property proposal/Author first names). It could either supplement or replace object named as (P1932) (I have no preference either way).
Thanks. Mike Peel (talk) 18:26, 28 December 2020 (UTC)
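As a rough illustration of the problem described above, the following Python sketch contrasts a naive split on the last space with the use of a stored last-name string. The function names are hypothetical, the sample names come from the examples in the proposal table, and deriving the first-name part by trimming the stored last name off the full string is an assumption made only for this sketch (the companion first-names property would normally supply it).

```python
def naive_split(name):
    """Guess the split by treating the final space-separated token as the last name."""
    first, _, last = name.rpartition(" ")
    return first, last


def last_first(name, last_stated_as):
    """Build 'Last, First' from the full string and a stored last-name value.

    Assumption for this sketch: the stored last name is a suffix of the full
    string, and whatever precedes it is the first-name part.
    """
    if not name.endswith(last_stated_as):
        # The stored value does not match the string; leave the name untouched.
        return name
    first = name[: len(name) - len(last_stated_as)].strip()
    return f"{last_stated_as}, {first}" if first else last_stated_as


full = "Clovis Lamartine de Moraes Melo Neto"
guessed_first, guessed_last = naive_split(full)      # guesses 'Neto' as the last name
formatted = last_first(full, "de Moraes Melo Neto")  # uses the stored value instead
simple = last_first("Peter A. R. Ade", "Ade")
```

The naive guess only keeps the final token, while the stored value preserves the full multi-part surname, which is the case the proposed qualifier is meant to cover.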
Discussion
- I would rather add "Last names string" property, so it may also be used in Wikidata items directly when family name (P734) doesn't have a corresponding Wikidata item. Adamant.pwn (talk) 18:53, 28 December 2020 (UTC)
- I think we should have "situation-dependent name string" for human (Q5). For example, we need to write:
- "Lionel Messi" (first name + last name) to Lionel Messi (Q615) Template:Infobox football biography (Q5616966) header but "Messi" (last name) to Template:FC Barcelona squad (Q6584713);
- "Cristiano Ronaldo" (two first names) to Cristiano Ronaldo (Q11571) Template:Infobox football biography (Q5616966) header but "Ronaldo" (one of two first names) to Template:Juventus F.C. squad (Q8487403);
- "Pelé" (nickname) to Pelé (Q12897) Template:Infobox football biography (Q5616966) header and "Pelé" (same nickname) to Template:Brazil squad 1958 FIFA World Cup (Q6395321). Сидик из ПТУ (talk) 20:15, 28 December 2020 (UTC)
- @Adamant.pwn: the solution when Wikidata hasn't already the corresponding Wikidata item for a name is easy. Create it. ChristianKl ❪✉❫ 10:31, 2 January 2021 (UTC)
- Support Will be useful for author name string (P2093) case. Сидик из ПТУ (talk) 20:05, 28 December 2020 (UTC)
- Support NMaia (talk) 12:38, 29 December 2020 (UTC)
- Question Can't that info be obtained from given name (P735) and family name (P734), and other properties like patronym or matronym (P5056) and second family name in Spanish name (P1950)? --Tinker Bell ★ ♥ 01:57, 30 December 2020 (UTC)
- It doesn't work for some languages. For example, we have three Russian words for Michael (Q4927524) (Майкл / Михаэль / Микаэль) since in Russian, pronunciation is taken into account when translating names (Майкл for English, Михаэль for German). You can also see from the interwikis in which other languages the names of Michael Schumacher (Q9671) and Michael Owen (Q128829) are spelled differently. Сидик из ПТУ (talk) 08:08, 30 December 2020 (UTC)
- @Сидик из ПТУ: The current proposal looks to me like it intends to store names written in one alphabet and not multiple ones. If there's an intended distinction between alphabets, the datatype of string instead of monolingual string (with datatypes like mul-lat) would be more appropriate. ChristianKl ❪✉❫ 10:16, 2 January 2021 (UTC)
- Your approach with given name (P735) won't work anyway for authors like Alekseĭ Glandin (Q94406076) (there are 3 variants of his name here). For persons with an item we must have the analogue of short name (P1813) for all languages, so I don't see anything incredible in something like this being needed for author name string (P2093). Сидик из ПТУ (talk) 10:12, 4 January 2021 (UTC)
- And in this proposal we are talking about cases when we need to make something like "Ccc, A. B.", "Bbb Ccc, A." or "Ccc, Aaa Bbb" from "Aaa Bbb Ccc" string of author name string (P2093), depending on whether "Bbb" is a first or a last name. Сидик из ПТУ (talk) 11:00, 4 January 2021 (UTC)
- @Tinker Bell: Possibly that could be used as an input to populate the proposed property, but they can't be used instead of the property for several reasons. First, the information has to be in the item for the article, not in subarticles, due to the 400 item limit when loading Wikidata items in Lua. Second, the author name strings won't always have matching items, particularly as they have to combine multiple last names, and they may differ between items for the same person as object named as (P1932) does. Thanks. Mike Peel (talk) 12:34, 31 December 2020 (UTC)
- We generally follow in Wikidata the principle that minimizing stored data is more important than minimizing page rendering time. I opened a post at https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#Is_the_400_Wikidata_items_per_page_limit_a_good_idea? about it (and the answer should likely be awaited before deciding on this proposal). ChristianKl ❪✉❫ 10:32, 2 January 2021 (UTC)
- @ChristianKl: That doesn't make sense when using Wikidata information in the Wikimedia projects. I already get complaints on Commons about the speed of the infobox there, and the more items you have to load, the slower things get. Thanks. Mike Peel (talk) 11:31, 2 January 2021 (UTC)
- It's a bit unclear to me why this can't be cached. ChristianKl ❪✉❫ 12:21, 2 January 2021 (UTC)
- It doesn't matter why it's unclear to you. The reality is that it isn't cached and that's the situation we are working with. At present, we can serve up to 400 citations in a page if we fetch the information from Wikidata. If we have to look up information from each author's entry on Wikidata for each source as well, that will rapidly shrink the number of citations we can serve per page, making Wikidata unsuitable as a repository for citations. That's a huge waste of the potential usage for 25,000,0000+ entries on Wikidata, for no good reason that I can ascertain. --RexxS (talk) 16:32, 7 January 2021 (UTC)
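The arithmetic behind this concern can be sketched in Python. The 400-entity budget is the figure quoted in this discussion; the counting model (one load for the cited work plus one for each author item that has to be opened, with nothing shared between citations) is a simplifying assumption, and max_citations is a hypothetical helper, not part of any existing module.

```python
def max_citations(entity_limit=400, author_items_per_citation=0):
    """Upper bound on citations per page under the assumed counting model.

    Each citation is assumed to cost one entity load for the work itself plus
    one load per author item that must be opened to read the name parts.
    """
    cost_per_citation = 1 + author_items_per_citation
    return entity_limit // cost_per_citation


qualifier_only = max_citations(400, 0)   # names read from qualifiers on the work item
three_authors = max_citations(400, 3)    # three author items opened per citation
nine_authors = max_citations(400, 9)     # nine author items opened per citation
```

Under these assumptions the budget falls from 400 citations to 100 with three linked authors per work and to 40 with nine, which is why the proposal keeps the name parts on the publication item.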
- In how many items are you planning to import this via bot? Only those used with citeQ or all tens of millions of academic papers? In what alphabets will data be added? ChristianKl ❪✉❫ 10:29, 2 January 2021 (UTC)
- @ChristianKl: Ultimately, all of them. Existing Cite Q uses might be prioritized, but the template should work with any paper item. It should probably be stored in the alphabet used in the paper, but mostly my work would focus on Latin-script languages anyway. Thanks. Mike Peel (talk) 11:31, 2 January 2021 (UTC)
- Comment A more informative name for the property might be "last name stated as", making clear the analogy with object named as (P1932) as to how the property should be used. Jheald (talk) 17:22, 4 January 2021 (UTC)
- @Jheald: Agreed, adopted. Thanks. Mike Peel (talk) 21:14, 5 January 2021 (UTC)
- Comment I appreciate Mike's efforts in this regard, but I fear this is not going to resolve the problem, only kick it further down the road. We have names with tussenvoegsels, singular names, and others that don't fit the "first last" pattern. I also agree with Jheald, regarding "stated as". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:06, 5 January 2021 (UTC)
- @Pigsonthewing: Yes, and this proposal should be able to cope with this? Or should we just give up on Cite Q development? Thanks. Mike Peel (talk) 21:14, 5 January 2021 (UTC)
- False dichotomy. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:52, 7 January 2021 (UTC)
- If a source has each editor name with a qualifier "Author last names" and "Author first names", we can use them to generate a citation using firstN and lastN. If the author has a mononym then we can use lastN or fall back to authorN. Having both properties makes the information set larger, while preserving the existing subset, and that's no barrier to coding. As for a "false dichotomy", I'm somewhat in disagreement. I doubt that much more meaningful development can be done on CiteQ until we have a reliable means of generating first and last names. --RexxS (talk) 16:24, 7 January 2021 (UTC)
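The fallback described here could look roughly like the following Python sketch. The helper name and its arguments are hypothetical; the keys mirror the firstN, lastN and authorN citation parameters mentioned in the comment, and the real template logic (written in Lua) may well differ.

```python
def author_params(index, name_string, last=None, first=None):
    """Choose citation parameters for the author at position `index`.

    Preference order, following the comment above:
      1. lastN + firstN when both qualifier values are present;
      2. lastN alone for a mononym stored as a last name;
      3. authorN with the unsplit name string when no qualifier is present.
    """
    if last and first:
        return {f"last{index}": last, f"first{index}": first}
    if last:
        return {f"last{index}": last}
    return {f"author{index}": name_string}


split_name = author_params(1, "Peter A. R. Ade", last="Ade", first="Peter A. R.")
mononym = author_params(2, "Pelé", last="Pelé")
unsplit = author_params(3, "John Ruhl")
```

Because the unsplit string remains available as the last resort, items that never receive the qualifiers would still render as they do today.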
- Support While being here, I would also suggest separate properties for middle names, as well as for pre- and postfixes. Easier to introduce them now while the WD model is still very much work in progress than at a later point in time... Regarding the property names, I suggest to avoid the words "first" and "last" in them because they imply a certain Western naming notation and even there depend on context. While still not perfect, the "given name"/"surname" name pair is semantically better and suits more cases without implying a particular display order, see f.e.
- (matthiaspaul) --92.209.72.111 20:09, 7 January 2021 (UTC)
- The idea with these properties is that middle/pre/post-fixes are stored within the 'first' and 'last' name strings, which is how it is commonly done in BibTeX for references, and in reference templates on-wiki (e.g., Citation expects firstN/lastN parameters). Splitting them out into different properties here would add more complexity than I think we need, and it can be done in Lua if needed. Naming them as 'given name'/'surname' would also invite more complexity than is needed (e.g., second surnames). Let's keep things as simple as possible given the situation please. Thanks. Mike Peel (talk) 17:19, 8 January 2021 (UTC)
- @Mike Peel: Not every localized version of en:Template:Citation works the same way. For example, editors are advised to pass |author= rather than |first= and |last= into vi:Bản mẫu:Chú thích in the case of a Vietnamese or Chinese name. – Minh Nguyễn 💬 08:51, 15 January 2021 (UTC)
- Comment There are already a bunch of name-related properties, not just given and family name, and it isn't always obvious how to arrange these properties either in running text or bibliographically. For example, consider the names in Wikidata:Property proposal/Vietnamese middle name. While I'm encouraged that this proposal calls for the property to be used only as a qualifier on author name string (P2093), I'm concerned that it doesn't explicitly order the name parts in the case of author (P50). Consumers like en:Template:Cite Q would benefit from a more explicit "bibliographic name" and/or "sorting name" qualifier. (Its Vietnamese translation would benefit greatly because it wouldn't need to guess whether to display the name as "Family, Given Middle" or "Family Middle Given" based on the author's ethnicity or the original language of the non-anglicized form of their first name.) A "bibliographic name" property wouldn't be mutually exclusive of the property proposed here, but it seems necessary for achieving what seems to be the goal behind the proposal. – Minh Nguyễn 💬 08:51, 15 January 2021 (UTC)
- Support, an important property for people.--Arbnos (talk) 17:56, 21 February 2021 (UTC)
- Support Lastname, Firstname has been the major barrier to the use of cite Q (a template which is quite fantastic). MargaretRDonald (talk) 21:31, 29 April 2021 (UTC)
- @Mike Peel: Can you please respond to the suggestion of a "bibliographic name"/"sorting name" property from Minh Nguyễn above? That seems like an ideal solution to me that handles all potential problems. ArthurPSmith (talk) 17:01, 30 April 2021 (UTC)
- @ArthurPSmith: The convention is always to split into last/first if there is a split for references, which is why I've proposed these properties in that form. If you want a "sorting name" then you could create it by combining the properties. Any chance of creating these properties soon, please? I have some free time this weekend that I could use to start populating them. Thanks. Mike Peel (talk) 19:31, 30 April 2021 (UTC)
- I'm afraid I don't see a complete consensus here; this is not ready. If the source material for this is always going to be BibTeX references, then perhaps the property name should include that? (edit:) Or at least the description, which is currently woefully inadequate. Otherwise I simply find this proposal will be confusing. ArthurPSmith (talk) 19:37, 30 April 2021 (UTC)
- @ArthurPSmith: I'm not sure I understand; there are plenty of support votes here? It's a long-solved problem in academia, but it's not specific to BibTeX: just look at the reference section of any paper. The natural thing to do is to split it into last/first parts, as strings, which is what is proposed. Thanks. Mike Peel (talk) 19:42, 30 April 2021 (UTC)
- There are many name systems (some mentioned in this discussion - for more see this page) where "last/first parts" is not a natural subdivision of a person's name. You need to be much more specific about what the purpose and use of this property is. Right now the label and description are far too vague and it will be misused/misunderstood. ArthurPSmith (talk) 19:54, 30 April 2021 (UTC)
- @ArthurPSmith: Then don't use these properties for those cases? Although I can't see what rule in that link this proposal breaks. I can't see how to simultaneously make this proposal broader to handle more cases and more specific/less vague? Mike Peel (talk) 20:00, 30 April 2021 (UTC)
- @Mike Peel: I have updated the description to cover what I think you are trying to do here. Do you agree with this? Note we also need to have Wikidata usage instructions provided to specify what you had as allowed values: "Use only as qualifier for author name string (P2093) or author (P50)" ArthurPSmith (talk) 20:45, 30 April 2021 (UTC)
- @ArthurPSmith: That looks OK to me, thanks! Mike Peel (talk) 09:07, 1 May 2021 (UTC)
- @Mike Peel: Done — Martin (MSGJ · talk) 20:35, 28 June 2021 (UTC)