Wikidata talk:WikiProject Books


On this page, old discussions are archived. See: 2013, 2014, 2015, 2016, 2017.


misuse of NNL work ID (P3959)

In every instance where I have come across NNL work ID (P3959) thus far, it has been misused. This property is supposed to link from work data items to work identifiers, but what I'm finding is that it links from the work data item to copy records of Hebrew translations, which should never happen. --EncycloPetey (talk) 19:22, 5 November 2017 (UTC)

Make it clearer in the description? Maybe look to add or suggest a constraint that flags such use as incorrect.  — billinghurst sDrewth 23:43, 6 November 2017 (UTC)
I also noticed what you describe, EncycloPetey, but since I don't read Hebrew, I did not dare remove them. Do you think I should?
In fact, do you know if there are real work IDs in the NNL catalog, or only edition IDs (as in many libraries)? If there are not, maybe rephrase the property as "ID for an edition" (in all languages)? --Hsarrazin (talk) 10:53, 7 November 2017 (UTC)
If you follow the links, you get the library record, which includes the standard library catalog headings in English. Without a single exception, every record I've come across includes a "...--Translations into Hebrew" header, which means it is not an authority record for the original work. I honestly don't know if the NNL catalog has any work IDs or authority records, but I certainly haven't seen any linked. So the way in which the property is being used does not match its description in Wikidata at all. --EncycloPetey (talk) 13:47, 7 November 2017 (UTC)
Mmmmh. No, in fact, I get the standard library catalog heading in Hebrew (e.g. http://aleph.nli.org.il/F/?func=direct&con_lng=heb&local_base=nnl01&doc_number=001251501 from Don Quixote (Q480)'s first NNL link). And when I click on the "English" button to go to the English interface, I get an English page that invites me to make a search, not the English version of the record. But what I can clearly see is that there is a place set, and a date... which means it is an edition ^^
Almost all NNL work ID (P3959) data were added by a single user, @זאב קטן: (3614/3743 according to NavelGazer data), which means that if we can agree with them, it would be much easier...
Is there someone on the Books project who understands Hebrew, who could search the database and maybe help us here? --Hsarrazin (talk) 14:04, 7 November 2017 (UTC)
Look about halfway down the listings you mentioned (try the second, third, etc items). You'll see catalog information in English such as "Fiction--Translations into Hebrew". I just looked at the listings on Don Quixote (Q480) that you mentioned and the problem is clearly visible to me. I'm not sure why you're not seeing it, and no, you don't have to click on the "English" button to see the cataloging information. The items are standard in English, and are placed among the Hebrew catalog information. --EncycloPetey (talk) 14:54, 7 November 2017 (UTC)
@Nahum, Dovi: Are you or any of your heWS colleagues able to shed light on this topic?  — billinghurst sDrewth 22:01, 7 November 2017 (UTC)
I don't think the NNL has "works" as such in its database, only records of editions. I don't know what "NNL work ID" should link to. I believe the NNL ID property should link to the edition(s) that exist in the library, because that is the place where the information about the work is stored at this library. This at least has been my personal experience with their database.--Nahum (talk) 22:22, 7 November 2017 (UTC)
Then it sounds (and looks to me) as though what the NNL database has are records of its specific holdings, which would mean specific copies of books rather than works or editions. If that is indeed what it lists, then we have no framework yet for including them on Wikidata. You could argue that the listings could also be treated as editions (or translations), but those would require listing as separate data items rather than inclusion on the data item for the work. --EncycloPetey (talk) 04:08, 8 November 2017 (UTC)
Hello! I'm sorry, I only happened across this just now. I'm bilingual English/Hebrew, and very active in cataloging books with these systems. I'm also doing this in cooperation with the Israel National Library cataloging staff. So I'll be glad to try to clarify anything, as well as I can.
For now, I'll just say that National Library of Israel ID (P949) is parallel to VIAF ID (P214), while NNL work ID (P3959) is parallel to Library of Congress Control Number (LCCN) (bibliographic) (P1144). "Authority records" for works are at National Library of Israel ID (P949). -- Shilonite (talk) 11:22, 13 December 2017 (UTC)
Every use of NNL work ID (P3959) I've come across is using it as parallel to VIAF ID (P214), which (as you say) is incorrect. The name is also confusing, since "work" has a specific meaning in WikiProject_Books --EncycloPetey (talk) 14:20, 15 December 2017 (UTC)
I'm sorry I didn't get back to you till now (I'm not very well...). I understand what you are saying, but please, so that we are both 'on the same page', try to examine the cataloging that I have done, and see if it agrees with your method.
If it does, and I understand that we are both working by the same method, then I will go back and correct whatever you find erroneous. If not, please point out the differences to me.
For example, what's your opinion about this: https://www.wikidata.org/w/index.php?title=Q20278655&diff=next&oldid=600585790 ... and in general, about what I've done with that book? I'm asking in order to learn whether I've perhaps misunderstood.
Thank you, Shilonite (talk) 11:42, 4 January 2018 (UTC)

Distinguishing the types of digital text

In book scholarship there are crucial differences between types of editions, serving different scholarly purposes. See for example these course notes. To take a practical example, a Project Gutenberg edition of a book gives you the electronic text but does not show you what the pages looked like, whereas Archive.org or Google Books usually provide visual scans of the book (plus uncorrected OCR). Which you want to use depends on whether your interest is in readable text or in the exact layout and typography (or handwriting) of the original. Full work available at (P953) is not specific enough to tell the user what the link will provide, so we need to express these different types. One list given to me by an academic has these core types:

These could be used in instance of (P31) statements, but my interest is in using them to distinguish the different digital versions of a text in the Wikidata entry for an edition. It seems like the most appropriate way to do this is with the qualifier object has role (P3831). See Political Disquisitions (1775 edition) (Q42788256) for an example, where I represent that all 3 volumes of this edition of this book are available from the Oxford Text Archive in a detailed text edition while page scans of individual volumes are available from other sources. I'm feeling my way here, so I welcome other perspectives and improvements. MartinPoulter (talk) 15:37, 7 November 2017 (UTC)

@MartinPoulter: So that I don't presume or misread ... can you say more about how you see these being used? Are you looking to use these as qualifiers for full work, or are you looking to use these as the base instance of (P31)? In the notes you link above, the classifications cover various types of new editions and secondary, or maybe tertiary, reproductions.

Then I suppose I am trying to figure out how and where reproductions like enWS and Gutenberg's works would fall. I have treated them as belonging to the edition published at the time, though they are really becoming editions in their own right. Does that then mean we shouldn't link them to those editions, as they are their own derivatives? [We so need professional guidance]  — billinghurst sDrewth 22:36, 7 November 2017 (UTC)

Hmm, now that you talk of "digital text": how are we seeing that as different from version, edition, or translation (Q3331189)?  — billinghurst sDrewth 22:41, 7 November 2017 (UTC)
@billinghurst: For my purposes, I'm looking to apply facsimile (Q194070) and Q42794047 as qualifiers for full work available at (P953), though I could imagine annotated edition (Q4769619) and eclectic edition (Q42793760) being in use for instance of (P31) (and possibly there could be exceptions both ways). I'm not proposing to create separate entries for, say, the Oxford Text Archive transcription of an edition. As I understand it, that properly belongs as a full work available at (P953) property in the entry for that edition. I'm looking to tag the full work available at (P953) property in a way that tells the user whether they will get the faithful text of the book or faithful scans of the book. Maybe the "edition" terminology is confusing when used this way, but this is the terminology of academic bibliography. Hope this clarifies. MartinPoulter (talk) 20:48, 14 November 2017 (UTC)
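The qualifier approach described above can be sketched in Python. This is only an illustration, assuming a simplified in-memory layout for statements: the class name `FullWorkLink`, the helper `links_with_role`, and the example.org URLs are hypothetical and do not come from Wikidata or from the item discussed. The property and item IDs are the ones quoted in this thread.

```python
from dataclasses import dataclass

FULL_WORK_AVAILABLE_AT = "P953"  # full work available at
OBJECT_HAS_ROLE = "P3831"        # object has role (used as a qualifier)
FACSIMILE = "Q194070"            # facsimile
OTHER_ROLE = "Q42794047"         # second role item named in the thread (label not shown there)


@dataclass(frozen=True)
class FullWorkLink:
    """One P953 value on an edition item, tagged with a P3831 qualifier."""
    url: str
    role: str

    def to_statement(self):
        # Simplified statement layout (an assumption, not the real Wikibase JSON).
        return {
            "property": FULL_WORK_AVAILABLE_AT,
            "value": self.url,
            "qualifiers": {OBJECT_HAS_ROLE: [self.role]},
        }


def links_with_role(links, role):
    """Return the URLs whose 'object has role' qualifier equals `role`."""
    return [link.url for link in links if link.role == role]


# Placeholder links for a single edition item.
edition_links = [
    FullWorkLink("https://example.org/transcription", OTHER_ROLE),
    FullWorkLink("https://example.org/scan-vol-1", FACSIMILE),
    FullWorkLink("https://example.org/scan-vol-2", FACSIMILE),
]

print(links_with_role(edition_links, FACSIMILE))
```

With the qualifier in place, a reader or a query can pick the page scans or the transcription without a separate item for each digital version.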

Are we conflating editions and translations; or are we missing translations as their own works?

To me when we have the creative work (Q17537576) or the variations that we use, we then have edition or translation of (P629). Are we right to combine editions and translations? If we have a translation by an author it gets its own copyright for the translator, and to me that makes it its own creative work (Q17537576) of which then there can be their own editions. I know for the work of Anton Chekhov (Q5685) that enWS has the same works by different translators, so at the work level, are we right to group all the Russian language editions, and the variety of different language translations under the one Property:P629?  — billinghurst sDrewth 22:51, 7 November 2017 (UTC)

technically, a translation is not a creative work (Q17537576), it is a derivative work (Q836950). --Hsarrazin (talk) 19:15, 8 November 2017 (UTC)
What is the gain? We can always try to make the system more complex, but for what purpose? And when I read about the difficulties some contributors have in creating work and edition items, I think the creation of an additional work item for translations will just be a nightmare (one item for the translation as edition, one item for the translated work and one item for the original work). Snipre (talk) 01:04, 8 November 2017 (UTC)
The gain is clarity, consistency, and a disentanglement of conflated concepts that are actually very different. An editor of an edition does a very different job from a translator, and the results of the two processes require different kinds of information. Stuffing them together into a single block has resulted in all sorts of editing headaches. --EncycloPetey (talk) 01:27, 8 November 2017 (UTC)
I understand the conundrum; I have hesitated to post over this matter for months. I am trying to have an open conversation so that, if we decide to do nothing, we do so with our eyes wide open, rather than having to unpick a situation with "Why didn't I say something earlier". Throughout these conversations, we keep running into the issue of the conceptual idea (creative work/work/...) versus the manifestation/output (edition/book/...). We will continue to be caught by this until we do a far better job of explaining the matter.

At the Wikisources we are governed by public domain/free licences, we list at the conceptual level, and reproduce at the manifestation(s). So when we have translations we need to explain the concept of dual licences. At this point in time we manage all the data and manually apply licences, though that is not the best way to undertake the curation, especially when it is common data across WP/WS/Commons. Ultimately we should be able to suitably licence translations according to the concept of author and translator irrespective of edition, and one day it will be fed from Wikidata. [And I am probably doing a shithouse job of describing this, as I need a whiteboard and a marker and to draw pictures, supported by hand-waving, rather than explanation.]  — billinghurst sDrewth 03:54, 8 November 2017 (UTC)

@EncycloPetey: Please provide an example how the distinction of editions in original language and translations will help to clarify the situation: just write the relations between the editions in original languages, the translations and the corresponding work items. I did the job with the current system and I will be happy to compare with your simplified system. Snipre (talk) 17:11, 8 November 2017 (UTC)
I don't understand the symbolic language you have used to describe the relations. Please convert your model into prose or some other understandable form, if you would like me to assess it. --EncycloPetey (talk) 17:14, 8 November 2017 (UTC)
@EncycloPetey: You don't need to understand my model; you need to describe your own model for once, using words, graphics or whatever is relevant. But try for once to put your ideas on paper and SHOW where the simplicity is. Snipre (talk) 20:39, 27 November 2017 (UTC)
@billinghurst: OK, you indicate the possible gain, even if I don't see clearly what prevents you now from doing what you want to do, but I would like to see the cost of that model modification in terms of item relations: can you show us how we would have to link the different edition, translation and work items with your model? We don't need discussions, we need diagrams to be able to validate a model, and that's what is missing now. An ontology follows mathematical rules, so discussions are useless: tables, diagrams, systematic descriptions of relations, that's what is important.
You mention some automatic addition of license values to items, this implies bots so you should convert your idea in some programming language. Snipre (talk) 17:11, 8 November 2017 (UTC)
The problem of translations is one of the reasons why FRBR uses 3 levels to describe books (+1 for exemplars, which is not our problem): there is a need for an intermediate state between the original work and the edition... but the modelling on Wikidata seems really difficult, and I'm not sure that linking editions to the original work through a "translation" level would allow the retrieval of info like "date of creation of the original work" from the edition item. :/
Moreover, as billinghurst says, it's already a very difficult task to explain on Wikisource how the 2-level model works… if we have to apply a 3-level model, it will be a nightmare.
And even if it could probably be achieved for books (with a lot of difficulties), it would be absolute hell for poems and short texts... --Hsarrazin (talk) 19:15, 8 November 2017 (UTC)

First hack at some cases

  • A1Y1: a translation of work A1 of author X1 by translator Y1 ... language detail
  • A1Y2: a translation of work A1 of author X1 by translator Y2 ... language detail
  • A2Y1: a translation of work A2 of author X2 by translator Y1 ... language detail
  • A2Y3: a translation of work A2 of author X2 by translator Y3 ... language detail

Manifestations of these cases each roll into the edition model thereafter; they are just editions (and editions of the translation).

So A1 has editions in the same language or translations into other languages. A1 does not have editions in other languages except via the translations.

the why

So we need a means to identify the one translation of a work, then the variety of places that it appears. Please feel perfectly entitled to update this for clarity. If I can get time at a whiteboard, then I will.  — billinghurst sDrewth 21:22, 8 November 2017 (UTC)
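The cases above can be sketched as a small three-level model in Python. This is a rough illustration, not a proposal for actual Wikidata properties: the class names, the `of` field, and the edition identifiers such as `A1Y1-ed1` are hypothetical, while A1, X1, Y1 and Y2 are the labels used in the list.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Work:
    id: str
    author: str


@dataclass(frozen=True)
class Translation:
    id: str
    work: Work
    translator: str


@dataclass(frozen=True)
class Edition:
    id: str
    of: Union[Work, Translation]  # an edition of the work itself or of one translation


def original_work(edition):
    """Walk up from an edition to the original work, through a translation if any."""
    parent = edition.of
    return parent.work if isinstance(parent, Translation) else parent


def editions_of_translation(translation, editions):
    """List the places where one specific translation appears."""
    return [e.id for e in editions if e.of == translation]


a1 = Work("A1", author="X1")
a1y1 = Translation("A1Y1", work=a1, translator="Y1")
a1y2 = Translation("A1Y2", work=a1, translator="Y2")

editions = [
    Edition("A1-ed1", a1),      # same-language edition of A1
    Edition("A1Y1-ed1", a1y1),  # two editions of the same translation
    Edition("A1Y1-ed2", a1y1),
    Edition("A1Y2-ed1", a1y2),
]

print(editions_of_translation(a1y1, editions))
print(original_work(editions[2]).author)
```

In this sketch, data about the original work stays reachable from any edition by following at most two links, which is the retrieval question raised earlier in the thread.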

  • Oppose having a new distinct property for translation without at least one good reason; I don't see the problem with the current uses of edition or translation of (P629). Moreover, as @Hsarrazin: pointed out, there are some over-simplifications in the initial statements: a translation doesn't really have its « own copyright » (see derivative work (Q836950)) and, on the other hand, an edition can also be considered a derivative work (Q836950) with protection of its own. More importantly, FRBR doesn't care about copyright to distinguish the levels, nor should we. Cdlt, VIGNERON (talk) 09:12, 9 November 2017 (UTC)
    Comment I understand that a translation is a derivative work; even so, it does have its own copyright as a creative work. Many pages around explain this, e.g. http://bookwormtranslations.com/copyright-law-and-translation-what-you-need-to-know/  — billinghurst sDrewth 11:16, 9 November 2017 (UTC)
    Well, as always with laws, it's complicated; translations don't really have their « own copyright » but they have « some copyright of their own » (as such, the translator is not the sole author of the translation but just the co-author with the author of the original work, and depending on the country the translator can have fewer rights over the translation than the original author).
    But anyway, I don't see why and how copyright intervenes here. Translations are very specific editions, but they are still editions (and there are editions far stranger than translations: should we have a different property when an editor transforms a poem in verse into a poem in prose? and vice versa? or when other significant changes are made to the original work? In some extreme cases, it is better just to consider that the modifications are so important that this is an entirely new work, for instance Q548338 with Iliad (Q8275)).
    Cdlt, VIGNERON (talk) 12:03, 9 November 2017 (UTC)
A translation can definitely be a new work ("FRBR-style"), because, as @Nonoranonqui: patiently explained to me, the fundamental discriminator between works is "authorial responsibility"... So a translation is both a new work and based on (derived from, a translation of) another one. But we probably don't need a new property: we can create an item for a translation, and use
  1. edition or translation of (P629)
  2. translator (P655)
  3. based on (P144)
If I'm not mistaken, these 3 properties give us what we need for understanding the relationship between a book and its translation. A query could look at authors, languages and whatnot to understand everything. I'm not a very good wikidatian, but if properties are simple and clear, it is better for everyone: we still have queries for complex relations between items. 80.181.62.189 16:46, 13 November 2017 (UTC)
Except it doesn't. Where a translation has multiple editions of its own, this model fails or is corrupted.  — billinghurst sDrewth 21:22, 13 November 2017 (UTC)
I fear that this part of the problem has no solution, from a theoretical point of view. A good translation is both a work and an edition, even for librarians. It's like the wave-particle issue in physics: it's both, depending on how you look at it and what your needs are. Wikidata works with items, which should be "unique". But books don't work that way. So we have to deal with the ambiguity of what we need. I suggest everyone read this very good free book from @Kcoyle:, she's a great librarian and information professional, and also she's one of us ;-) Aubrey (talk) 09:50, 17 November 2017 (UTC)
@Aubrey: Wrong: there is no obvious solution, but we can define a solution with some advantages and disadvantages. We just need a logical solution which can be handled by any programming language, like SPARQL. Snipre (talk) 15:15, 29 November 2017 (UTC)
@billinghurst: I think we need to distinguish 2 different problems:
Take the case
E1, an edition of work W1 with author X1 in language L1
E2, an edition of work W2 with author X2 in language L1
T1, a translation of edition E1 by translator X3 in language L2
T2, a translation of edition E2 by translator X4 in language L2
If an editor decides to create a new book containing T1 and T2 as
E3, an edition of work W3 by editor X5 containing T1 and T2
There is no problem to create new items for E3 and W3 if we consider that collecting different works is a kind of new work. The Wikidata model is able to handle that situation.
The second problem is to link E1 and W1 to T1 and T2.
To be correct, the information about the fact that T1 and T2 are parts of E1/W1 has to be integrated in W1 and not in E1. Then we have to answer the question: can we accept the following relations
* W1 has part T1
* W1 has part T2 ? Snipre (talk) 23:55, 18 November 2017 (UTC)
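The first problem above can be sketched as a small relation graph in Python. This is only an illustration under stated assumptions: the edge labels ("edition of", "translation of", "contains") are descriptive strings rather than Wikidata properties, and the function name `underlying_works` is hypothetical. The second problem, whether "W1 has part T1" is acceptable, is deliberately left out.

```python
# Relations from the case: (subject, label, object).
RELATIONS = [
    ("E1", "edition of", "W1"),
    ("E2", "edition of", "W2"),
    ("T1", "translation of", "E1"),
    ("T2", "translation of", "E2"),
    ("E3", "edition of", "W3"),
    ("E3", "contains", "T1"),
    ("E3", "contains", "T2"),
]


def underlying_works(item, relations=RELATIONS):
    """Follow every outgoing relation from `item` and return the end points.

    End points are items with no outgoing relation; in this case, the works.
    """
    outgoing = {}
    for subject, _label, obj in relations:
        outgoing.setdefault(subject, []).append(obj)

    found, seen, stack = set(), set(), [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        targets = outgoing.get(current, [])
        if not targets:
            found.add(current)  # nothing further to follow: treat as a work
        stack.extend(targets)
    return found


print(sorted(underlying_works("T1")))
print(sorted(underlying_works("E3")))
```

Here the collected edition E3 resolves both to its own work W3 and, through the translations it contains, to the original works W1 and W2.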
I still don't understand the distinction between edition and translation, and even less the need for a distinction.
I'm not sure I understand the case you present here either; do you have a concrete example? For "W1 has part T1, T2", there are already some cases, see this query. Is that what you were thinking about?
Cdlt, VIGNERON (talk) 08:53, 22 November 2017 (UTC)
The use of based on (P144) to link a translation to the document used as the source text for the translation is not the best choice: some books, like this one, are a translation of this one, which is based on the game Mass Effect (Q275960). So based on (P144) could be used twice on the same item, once for the translation relation and once for the topic relation. Better to avoid that situation. Snipre (talk) 15:15, 29 November 2017 (UTC)

Decameron

Does anyone know of a Linked Open Data dataset for the stories in Boccaccio's Decameron? We have articles on a couple of stories (Q18600581, Q26710491) but no structure for the days and the stories for each day that I can find. It would be nice not to have to do this from scratch. - PKM (talk) 20:38, 13 November 2017 (UTC)

  • Agree. Somehow I was exhausted after I 1. ;) I got better with QuickStatements in the meantime, so we could try together.
    --- Jura 20:44, 13 November 2017 (UTC)
To start: brigata (Q43256358), days.
--- Jura 16:07, 17 November 2017 (UTC)
Oh excellent! I can add some references to these. - PKM (talk) 21:04, 19 November 2017 (UTC)
@Jura1: Wow, I have realized just how much deep structure you built here! I am stunned.
I have added novella (Q43334491): short prose tale popular in Renaissance Italy, progenitor of the short story <different from> the modern genre novella (Q149537): written, fictional, prose narrative normally longer than a short story but shorter than a novel, and made novella in the Decameron (Q43303440) a subclass. Much more work to do as time permits. Onward! - PKM (talk) 21:54, 19 November 2017 (UTC)
@PKM: I started a list at Decameron editions and translations and included what I found at enwiki/wikisource. Maybe it's possible to give it a reasonable coverage.
--- Jura 12:29, 24 November 2017 (UTC)
@Jura1: thank you for this page and thank you for creating items about editions and translations. As you've seen, I've made some corrections to fit the model of WikiProject Books; you reverted me, but I see no reason not to use the model of WikiProject Books (especially as there is another discussion, which is leaning more toward keeping the current model). Cdlt, VIGNERON (talk) 14:16, 24 November 2017 (UTC)
It seems consistent with the current model, except maybe that the manuscripts should use "exemplar of" and not "edition of". I don't mind if you change that. I noticed that some of the items used the wrong "translation" item, thus the constraint violations. It's fixed now.
--- Jura 14:20, 24 November 2017 (UTC)
I see many points not respecting the model: there was an edition of an edition (but edition or translation of (P629) is not transitive; corrected now), there was a wrong instance of (P31) (thank you for fixing it), there are still several constraint violations (for manuscripts but not only; identifiers too, e.g. something is wrong on Q16438#P1256), and in the end there is a lot of missing information and some wrong information (like Q16438#P577: the property should be inception (P571), and the values should be better indicated, more precise, and referenced with a better source, like the entry in the Treccani). Cdlt, VIGNERON (talk) 14:37, 24 November 2017 (UTC)
Q16438 isn't even on the list. The source at enwiki I was mentioning is at w:The_Decameron#Translations_into_English. It should be possible to find the same information in Wikidata. Other languages have similar lists.
--- Jura 14:54, 24 November 2017 (UTC)
Q16438 is the work; it's not on the list, but it's the most important item for this list.
And please, learn how to use edition or translation of (P629) and has edition (P747) as it was intended (between a work and an edition, never between two editions ; more information on Wikidata:WikiProject Books and on en:Functional Requirements for Bibliographic Records).
Cdlt, VIGNERON (talk) 15:39, 24 November 2017 (UTC)
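The rule stated above (edition or translation of (P629) links an edition to a work, never to another edition) can be checked mechanically. The following Python sketch assumes a simplified dictionary layout for items and uses placeholder item IDs (`WORK`, `ED1`, `ED2`); the function name is hypothetical, and using book (Q571) as the type of the work item is only an assumption for the example.

```python
INSTANCE_OF = "P31"
EDITION_OR_TRANSLATION_OF = "P629"
EDITION_CLASS = "Q3331189"  # version, edition, or translation
BOOK_CLASS = "Q571"         # book (assumed here as the type of the work item)


def edition_of_edition_links(items):
    """Return (source, target) pairs where P629 points at another edition.

    `items` maps an item ID to a dict of property ID -> list of values.
    """
    violations = []
    for item_id, claims in items.items():
        for target in claims.get(EDITION_OR_TRANSLATION_OF, []):
            target_types = items.get(target, {}).get(INSTANCE_OF, [])
            if EDITION_CLASS in target_types:
                violations.append((item_id, target))
    return violations


# Placeholder items: ED1 is linked correctly, ED2 is chained onto ED1.
items = {
    "WORK": {INSTANCE_OF: [BOOK_CLASS]},
    "ED1": {INSTANCE_OF: [EDITION_CLASS], EDITION_OR_TRANSLATION_OF: ["WORK"]},
    "ED2": {INSTANCE_OF: [EDITION_CLASS], EDITION_OR_TRANSLATION_OF: ["ED1"]},
}

print(edition_of_edition_links(items))
```

A fix for each reported pair would be to point the chained edition directly at the work item instead.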
No problem. I thought you were trying to present some argument and reference about the items on the list you were breaking. Yes, I think we all agree that Wikidata isn't complete yet and you are obviously invited to contribute. A list of French translations could be interesting ..
--- Jura 15:56, 24 November 2017 (UTC)

Additional properties

Why aren't these properties used at all: country of origin (P495) and after a work by (P1877)? --Infovarius (talk) 10:35, 16 November 2017 (UTC)

Hi, Infovarius
AFAIK, after a work by (P1877) is more for artworks (like an etching after a work by (P1877) an original painting)... for books, I'd probably use based on (P144) or inspired by (P941) - these should be applied on the work item, of course, not on the edition.
As for country of origin (P495), what is the point of giving a country of origin? The work has an author and a language; the country in which the author lived at the time is not necessarily the origin of the work. See Voltaire (Q9068)'s works, written in French, but written in Prussia and published in Prussia (because of French censorship)... should they have Kingdom of Prussia (Q27306) as country of origin (P495)? This seems rather inadequate.
on version, edition, or translation (Q3331189) items there is already place of publication (P291) - why would you add country of origin (P495) ? --Hsarrazin (talk) 11:17, 16 November 2017 (UTC)
after a work by (P1877) has a different sense from based on (P144) or inspired by (P941) - it has value "person" not "work".
I understand difficulties like those with Voltaire (Q9068). But what if everything is unambiguous: the work was created and first published in one country, which is also the author's country of citizenship. Why not mark this in the work item? --Infovarius (talk) 16:45, 17 November 2017 (UTC)
Have you referred to the creation proposal Wikidata:Property_proposal/Archive/31#P1877 ?  — billinghurst sDrewth 06:27, 18 November 2017 (UTC)
@Infovarius: do you have an example for after a work by (P1877)? Based on (P144) and inspired by (P941) seem more than enough to me in all cases I can think of (if it is really after *a* work by a person, it seems more accurate to link directly to this work instead of the person; plus, see the hijacking of after a work by (P1877), which was not at all intended to be used in that way :/ ).
For country of origin (P495), I don't see the need: there are plenty of ways to find where a book comes from (directly with a property like place of publication (P291), which is far more intuitive and easier to reference, or through the author(s)'s data). Is there a case where the value in country of origin (P495) would be different from the value in place of publication (P291)?
Cdlt, VIGNERON (talk) 10:05, 22 November 2017 (UTC)

I was checking: of 94,017 items with instance of (P31) = book (Q571), 34,025 (36 %) have a country of origin (P495). Maybe it should be accepted on works, as place of publication (P291) is only for editions. It's redundant (which is bad in itself) but it would be easier to do queries and other stuff (like using the redundancy to check the consistency). Cdlt, VIGNERON (talk) 14:43, 24 November 2017 (UTC)

Language property

I am completely confused about which property (language of work or name (P407) or original language of work (P364)) should be used for books, works and films, and which is deprecated and will be deleted. User:Pasleim deletes P407 statements, sometimes deletes P364; User:VIGNERON deletes P364. Can you come to an agreement and explain it to others? --Infovarius (talk) 20:35, 23 November 2017 (UTC)

Original language of work (P364) is deprecated and in the process of deletion (for several months now; it's even written in the original language of work (P364) description), as it was meaningless most of the time (for multiple reasons, but chiefly thanks to the FRBR model). For information, I deleted original language of work (P364) only on items about 'edition' *and* when there was already a language of work or name (P407) with the exact same value (about ~200 items IIRC). So globally, never use original language of work (P364) and always use language of work or name (P407). The first removal you cite was an obvious mistake. Cdlt, VIGNERON (talk) 21:12, 23 November 2017 (UTC)
Consensus was reached on WD:PFD to merge original language of work (P364) into language of work or name (P407). However, members of the WikiProject Movies insist on keeping both properties for movies. If you think this is confusing, your comment is highly appreciated on Wikidata:Properties for deletion#Closure of stale thread. --Pasleim (talk) 08:30, 24 November 2017 (UTC)
@Jura1: what are you talking about? The plan is quite clear and logical, see Wikidata:WikiProject Books. And AFAIK, information is not lost (at least not by me; I checked that the information was already there before deleting the deprecated property). Cdlt, VIGNERON (talk) 09:17, 24 November 2017 (UTC)
For items like Les Débuts littéraires de Thingum Bob (Q17352560), there were at least two clues that it is an edition: 1. it is not in a language spoken by the author and 2. it links to Wikisource. I improved the items (which weren't following the plan at all, so it is illogical to use this item as an example of an alleged failure of the plan); I think it's clear now.
Cdlt, VIGNERON (talk) 09:17, 24 November 2017 (UTC)
We were looking for a conversion plan. Not that it matters now, we already lost the information in relation to books.
https://www.wikidata.org/w/index.php?title=Q17352560&oldid=563333657 was correct when it was created/edited, but the change of the property on other items made us lose the information that it was just the language of the edition. The same probably applies to all similar items. You will probably need to find a new source to rebuild the information.
--- Jura 09:27, 24 November 2017 (UTC)
https://www.wikidata.org/w/index.php?title=Q17352560&oldid=563333657 was wrong since the beginning. It had a sitelink to Wikisource but was instance of a work. --Pasleim (talk) 09:45, 24 November 2017 (UTC)
Somehow I got the impression the contributor who made it is an expert in the field. So if the approach isn't clear to them, it's unlikely to scale well.
--- Jura 09:54, 24 November 2017 (UTC)
@Jura1: again: what on Earth are you talking about? This edit is clearly and entirely correct; what is wrong with it? (Besides the obvious missing properties on the same item, but that's beside the point: the item is better after this addition. And why are you even mentioning it? It's very loosely related to the problem here.) What information is lost exactly? For the conversion plan, it's quite easy: delete all original language of work (P364) and replace them by language of work or name (P407) with the same value (and as a bonus: check the instance of (P31) and other properties like edition or translation of (P629) and country calling code (P474)). Cdlt, VIGNERON (talk) 10:11, 24 November 2017 (UTC)
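The conversion described above can be sketched in Python. This is a cautious illustration, not the script anyone in this thread actually ran: the function name `convert_language_claims` and the dictionary layout for claims are assumptions. It moves P364 into P407 when there is no conflict and, as an added safety step that is not part of the plan as stated, flags items where the two properties disagree instead of changing them.

```python
ORIGINAL_LANGUAGE = "P364"  # original language of work (deprecated)
LANGUAGE = "P407"           # language of work or name


def convert_language_claims(claims):
    """Return (new_claims, needs_review) for one item.

    `claims` maps a property ID to a list of item IDs. The input is not modified.
    """
    original = claims.get(ORIGINAL_LANGUAGE, [])
    current = claims.get(LANGUAGE, [])

    if not original:
        return dict(claims), False  # nothing to convert

    if current and set(original) != set(current):
        # The two properties disagree: leave the item alone and flag it for a human.
        return dict(claims), True

    converted = {p: list(v) for p, v in claims.items() if p != ORIGINAL_LANGUAGE}
    converted[LANGUAGE] = list(current) if current else list(original)
    return converted, False


# Q150 is French and Q1860 is English.
print(convert_language_claims({"P364": ["Q150"], "P407": ["Q150"]}))
print(convert_language_claims({"P364": ["Q1860"]}))
print(convert_language_claims({"P364": ["Q1860"], "P407": ["Q150"]}))
```

The flagged case is the one where dropping P364 automatically could discard a distinction between original language and edition language, so it is left for manual checking together with the instance of (P31) value.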
I think you are confusing things. Pasleim is stating that the item was wrong to begin with. At least you seem to be satisfied with the approach that seems to be applied for books.
--- Jura 10:16, 24 November 2017 (UTC)
I don't think I'm confusing things, but clearly I'm confused by you. The item Q17352560 was wrong in the beginning as it was empty and missing a lot of properties, and the instance of (P31) was too general (but reminder: it was created back in 2014). In 2017, @Hsarrazin: added a language of work or name (P407) and it was a good thing. The only « mistake » (but can we really call it that?) is that she didn't add other properties or correct the P31, but the edit in itself was good. In the end, none of that really matters, as original language of work (P364) is not at all involved here.
Can we move on and use a more relevant example? For instance Q19157120 and the P364 deletion I made two days ago. Is there anything you consider lost here? And why? (I don't see any loss, but maybe I'm missing something.) If not, do you have an explicit example?
Cdlt, VIGNERON (talk) 10:25, 24 November 2017 (UTC)
Apparently Pasleim and you disagree on https://www.wikidata.org/w/index.php?title=Q17352560&oldid=563333657 . I'm not sure what I can add to help you with this. I'm not aware that an edit is or can be considered incorrect if one doesn't add more statements.
--- Jura 10:31, 24 November 2017 (UTC)
I actually fully agree with VIGNERON. P31 needed to be corrected which wasn't done till today, but this correction should have happened independently of the language property merge. --Pasleim (talk) 10:37, 24 November 2017 (UTC)
Well, P31 on https://www.wikidata.org/w/index.php?title=Q17352560&oldid=563333657 being too general and it being wrong aren't really the same thing. It's not independent of the language properties (original language of work and language of edition) because the use of P407 made it clear that it may not have been the original language of the written work. Once all written works just use language of edition, it's no longer clear.
--- Jura 10:43, 24 November 2017 (UTC)
It's maybe not entirely independent (everything is connected in this Universe) and indeed on this particular example it had to be corrected/completed, but now it seems good to me for this item. Do you have other examples where it's unclear or where allegedly « information was lost »? PS: language of work or name (P407) is *not* « language of edition ». Cdlt, VIGNERON (talk) 10:53, 24 November 2017 (UTC)
Ideally, the conversion plan would have taken care of such problems. As Pasleim is doing it, maybe he can detail.
--- Jura 11:00, 24 November 2017 (UTC)
I'm not the only user who developed a conversion plan and I'm not the only user doing the conversion (btw, I'm not even a member of this WikiProject).
I think you had too high expectations of the conversion. language of work or name (P407) was not and is not "language of edition". Whether an item is an edition or a work is defined by P31, P279, sitelinks and external identifiers. If these values were set wrong, they are still wrong now after conversion, but it didn't lead to any information loss. --Pasleim (talk) 11:22, 24 November 2017 (UTC)
So if P364 is going to be deprecated, I don't understand why User:Pasleim is massively deleting P407 in favor of P364? --Infovarius (talk) 15:27, 24 November 2017 (UTC)
The current rules of WikiProject Movies say to use P364 for movies, therefore I remove P407 in cases where it is redundant to P364. Whether P364 is going to be deprecated depends on whether or not users accept community consensus. --Pasleim (talk) 18:38, 24 November 2017 (UTC)
We reached consensus to deprecate P364, but we never reached a consensus about the manner in which it would be deprecated, or how the data currently in the property would be handled. The most obvious problem is that the property is explicitly for language of a work, and not for language of editions, nor does the process of deprecation attend to the issue of marking source languages for translations or editions of translations, which is not always the same as the language of the ultimate source (=work). Nor does it solve the problem that works themselves do not have a language; only individual editions / copies will have a language. A "work" refers to the creative piece independently of any specific copy. --EncycloPetey (talk) 05:09, 25 November 2017 (UTC)
Why do you think that a work doesn't have a language?? Usually literary works are created in one and only one language, which is the language of the work. And editions in other languages are just translations from the original language (a translation itself can be regarded as the creation of a new creative work, in a different language). Infovarius (talk) 19:51, 29 November 2017 (UTC)
Sorry ? of course all works (textual works obviously) must have a language. But it is language of work or name (P407) not original language of work (P364). --Hsarrazin (talk) 20:00, 29 November 2017 (UTC)
@Infovarius: But the work data item is for the work as a whole, meaning every edition and not any specific edition. A work can appear in any language to which it is edited or translated. We have chosen to eliminate the "original language" property, and now have no means of indicating the original language unless there is a "first edition". This itself is a problematic issue, and some works have no known first edition, and some have a first serialized edition that predates the first bound (book) edition, etc.
We also have no property for marking "language of edition" or "language of translation". We only have a property for "language of work". --EncycloPetey (talk) 02:06, 30 November 2017 (UTC)
Maybe I am not understanding, but if we are talking about the "work" the language P407 is used, and it replaces P364. Editions have P407, and have no requirement for original language as you refer back to the work.  — billinghurst sDrewth 08:50, 30 November 2017 (UTC)
Billinghurst: Why would an edition be marked with language of work or name (P407), since that property is explicitly for the language of the work? Editions are not works.
Also, how do we mark the source language for translations and for editions of translations? We currently have no logical means of doing that. Yes, there can be pointers back to a "work", but for translations we cannot agree on whether the translation is an "edition" or is a "work" and needs its own edition data items.
And, yes, current practice puts language of work or name (P407) on works, but that makes no logical sense. Language is not a property of a work; it is a property of an edition. The language can differ in various translations/editions, so it is not a property native to the work. An item's properties must be invariant, or they are not properties of that data item. "Author" is a property of a work, because a work will always have that author, and this is why we do not replicate the author information on all the data items for the editions. But the date of publication varies with every edition, so we do not put "date of publication" on the work item, but rather on the individual items for each edition. The work instead gets a "date of first publication", or no date at all. The "language" property is in the same category as "date"; it varies with editions/translations, and is not inherent to the work. Yes, a work has an original language of composition, but we've decided to eliminate that property. --EncycloPetey (talk) 14:35, 30 November 2017 (UTC)
language of work or name (P407) on a work is the language in which it was originally composed by the author... How can you say that it makes no sense to put a language on a work... the language is intrinsic to the work... this way, when an edition is in the same language, it means it was not translated, whereas when it is different it means it is a translation... Work notice at BnF (for ex.)
What characterizes a work is:
  1. an author,
  2. a title (sometimes conventional),
  3. a language,
  4. a date of creation.
Without a language, how can you say that Shakespeare wrote in English, Molière in French or Goethe in German ? --Hsarrazin (talk) 15:14, 30 November 2017 (UTC)
Pretty much my point of view. I would even take it a step further and say that all language belongs on a work, not on edition. Though to do that I have to go back to my argument that each translation is a work too. Any edition of a work, or of a translation, has to be in the same language of its respective parent.  — billinghurst sDrewth 17:02, 30 November 2017 (UTC)
I may be wrong but I think we are mixing very different definitions and senses of the word work here. The sense of work in FRBR (which I will write as work_frbr) is very narrow. Editions (and by extension translations, which are expressions_frbr that we defined to be equivalent to edition_wikidata) are not work_frbr but they are work. When P407 says work, I believe this is lato sensu, not stricto sensu. billinghurst: I hear your argument but I feel this is unnecessary, or at least I don't see the need (and meanwhile, I see a lot of potential trouble, especially as languages are not always clearly delimited: one can argue that Shakespeare and Molière were not writing in English or French but in Early Modern English (Q1472196) and Classical French (Q3100376)). Cdlt, VIGNERON (talk) 17:20, 30 November 2017 (UTC)
Just to gum things up a bit more, the newest version of FRBR, which calls itself the Library Reference Model - LRM adds a new attribute "LRM-E2-A2 Representative expression attribute" - the "representative expression" being the hedge term for "original work." This came out of some (admittedly limited) studies that showed that users of bibliographic data often had a special "feel" for all editions being in relation to the original. This hasn't been entirely incorporated into library cataloging, but the interesting thing to me is how it coincides with how works are treated in Wikipedia entries, often with a fair number of data elements relating to the original. The "LRM" hedges on this because libraries are often cataloging works where either the original is unknown or the cataloger is just not reasonably going to have the time to do the research on it. Rather than put this at a work level the LRM lets you set one of the expressions as the "representative" one. It looks to me like WD could tag one edition as the "original", but it would probably also be useful to carry the original language in all editions. That data is available in the library record format for all published language works.
Note, I do not deny that letting people set one edition as the original could be dangerous, but it also could be very useful for many works. Kcoyle (talk) 19:09, 1 December 2018 (UTC)

Ancient Greek works[edit]

@billinghurst, Hsarrazin, VIGNERON, Snipre: There are currently 13 items left which use both P364 and P407.

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P364 []; wdt:P407 [] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

EncycloPetey and I disagree on how to apply the guidelines on the front page of this WikiProject. Can one of you help us with these items? --Pasleim (talk) 09:34, 30 November 2017 (UTC)

@EncycloPetey, Pasleim: what is the disagreement exactly? Didn't we all agree that original language of work (P364) is deprecated and shouldn't be used? (and if it weren't used for movies too, the property would probably be already deleted for months)
This is maybe beside the point, but I've looked at the results and there is something wrong: The Comedies of Aristophanes (Q21286489) is indicated as a work in instance of (P31), but much of the data indicates this is in fact an edition (the 1853 edition by Hickie, according to the link to WS and the publication date (P577)). All the items seem to be in the same situation.
Cdlt, VIGNERON (talk) 11:22, 30 November 2017 (UTC)
IMO, original language of work (P364) should be removed from The Comedies of Aristophanes (Q21286489) but EncycloPetey reverted that change 4 times during the last months [1] [2] [3] [4]. I also think that it should be marked as edition or translation but also that edit was reverted by them [5]. --Pasleim (talk) 12:28, 30 November 2017 (UTC)
@VIGNERON: If you still don't understand the disagreement after months and months of discussion, then explaining it to you all over again isn't going to do any good, is it? Please look at all the previous discussion. There is a lot of it already.
Re: The Comedies of Aristophanes (Q21286489). If you believe this is an edition, then what is it an edition of? What is the work? --EncycloPetey (talk) 14:37, 30 November 2017 (UTC)
@EncycloPetey: I saw the discussions and was wondering if there was a new argument because I thought (wrongly apparently) we were over this, almost all original language of work (P364) has been removed, your 13 texts are the only ones left.
True, the case of an anthology is a bit strange, but since you use properties for editions, it seems better to say it's an edition (or a subclass of edition). It should be checked, but an item for s:en:Comedies of Aristophanes would be good for the work, wouldn't it? Conversely, I can ask you almost the same question: « If you believe this is a work, then what is its edition ? ». And on other items like The House of Atreus (Q30349006) you add instance of (P31) = translation (Q7553), which is weird as translation (Q7553) is an action; you probably mean translation (Q39811647), which is a subclass of version, edition, or translation (Q3331189). Cdlt, VIGNERON (talk) 14:56, 30 November 2017 (UTC)
Why would you think it was resolved? No consensus on what action to take was ever reached. We agreed to deprecate the one property, but never agreed on how to go about that or how to preserve the information it indicates.
Re: the anthology: huh? No, s:en:Comedies of Aristophanes would not be good for the work. That's a disambiguation page for multiple works that bear the same title. There is no work for this to come from. This is, as you say, an anthology composed entirely of components that are editions, but itself has no parent work to come from. As I keep saying, the data model we are using is both flawed and incomplete, so an appeal to that flawed and incomplete model is merely circular reasoning. --EncycloPetey (talk) 16:18, 30 November 2017 (UTC)
I thought it was resolved since all original language of work (P364) has been removed without trouble (except these 13 items). Is there anyone else except you who disagree?
Why not use s:en:Comedies of Aristophanes (and replace the s:en:disambig with s:en:Template:Versions, as it is closer to the second)? If we take the criteria given by Hsarrazin, all these anthologies have the same author, the same title, the same language, and the same date of creation. It seems to fit the bill. True, the content is different for each edition, but a work has no content so it doesn't really matter (and other properties are already here to make that explicit). Another solution is to create a specific work for each different version (a bit overkill, but it works too).
FRBR may not be perfect and there will always be tricky cases, but this is the best system I know. Plus, it's quite well documented and this project agreed to use FRBR. Do you know a better cataloguing system?
Cdlt, VIGNERON (talk) 17:04, 30 November 2017 (UTC)
@EncycloPetey: "...the data model we are using is both flawed and incomplete..." This is perhaps true but where are your contributions to solve that ? Perhaps it is time to act as a contributor and to propose a better model. Can we hope once to see contribution to a model ? Snipre (talk) 21:17, 30 November 2017 (UTC)
@Snipre: So, you agree that the data model is both flawed and incomplete, and are willing to explore change? This is the first indication that you or anyone else has made that change might be possible. If we can get more of the community to agree to this, then we will be able to proceed. Up to now, everyone has been pushing to spread the flaws. --EncycloPetey (talk) 21:21, 30 November 2017 (UTC)
@EncycloPetey: Read again what I said: "This is perhaps true that the data model we are using is both flawed and incomplete". But as you never show how to complete or improve the current model, how can I judge what is missing ? Until you propose something, and this is something I have been asking you to do for several weeks, the current model is the best we have and we have to use it in order to be coherent inside WD. I prefer to have a bad solution than only criticisms saying we can do better. Snipre (talk) 21:30, 30 November 2017 (UTC)
Since you're not willing to move forward or admit change, it's disingenuous to criticize others for not proposing solutions. If you're not going to implement the ideas of others, there's no reason to propose those ideas. --EncycloPetey (talk) 21:34, 30 November 2017 (UTC)
@EncycloPetey: Where do you read that I am not ready to change my mind ? Please link to one of my comments saying that. You are not being logical, because you asked people to change their mind BEFORE showing them any reason to do it. We have a model which needs to be extended, but we have something real. And you, what do you have ? You never presented anything, so for me you have nothing to propose. WD is not a poker game, so put your cards on the table or leave the table. Snipre (talk) 21:54, 3 December 2017 (UTC)
  • Comment I agree with Pasleim on these. We can call the original language by reference to the parent work rather than trying to replicate it in every edition. For that set of works, that smattering of the instance of (P31) it is just getting ugly ... book/translation/edition. Compilations don't fit the model; I wonder how we go with a "Greatest Speeches of ..." compilation, it is going to shred your models. /me throws his hands into the air, and leaves it to the experts. I think I will stick with doing editions.  — billinghurst sDrewth 17:20, 30 November 2017 (UTC)
  • The correct way to handle a case like The Comedies of Aristophanes (Q21286489) is to consider it as a normal book: we need a work item and an edition item. The work item for this book will contain the links to the work items of the works composing the book.
So having this case:
  • A1: a work of author X1 in language L1 represented in WD by QAAA
  • A2: a work of author X2 in language L2 represented in WD by QBBB
  • D1: a combined edition of A1 and A2 by translator X3 in language L3
To represent D1 in wikidata we need 2 items:
QXXX, Work item for D1 with the following statements:
QYYY, Edition item for D1 with the following statements:
Do you agree with the proposed model ? Snipre (talk) 21:17, 30 November 2017 (UTC)
@billinghurst, Hsarrazin, VIGNERON, Snipre, EncycloPetey, Pasleim: With my proposition we solve the question of original language of work (P364). Snipre (talk) 21:20, 30 November 2017 (UTC)
With which proposition is that? I see nothing that solves the questions currently under consideration. What we need are two new properties: (1) language of composition (or first performance, or first publication), and (2) language of edition (which might be adaptable from "language of work"). And for translations we need some third item to indicate language of source text, and/or the identity of the source text, from which the translation was prepared. --EncycloPetey (talk) 21:24, 30 November 2017 (UTC)
@EncycloPetey:Can't you use your capacity to infer ? This is one principle of database: use relations to deduce information not written. Example, if I said that all dogs are mammals and Floppy is a dog, can't you deduce that Floppy is a mammal even if I don't say it ?
So if you don't have the language of the original text in an item defining a translation, go to the item of the corresponding original text. The original language of D1 is the language of A1 and A2, so I have to extract the language value from QAAA and QBBB.
If I want to know the gender of the author of The Knights (Q1215817), why do I have to look in item Aristophanes (Q43353) and not in The Knights (Q1215817) ? Snipre (talk) 21:54, 30 November 2017 (UTC)
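The inference described above, reading the original language from the work item rather than storing it on the edition, might look like the query below. It is a sketch under assumptions: it runs on the Wikidata Query Service, it follows a single edition or translation of (P629) hop, and it assumes both the edition and the work carry language of work or name (P407). The `LIMIT` is arbitrary.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

# Editions whose own language differs from the language of the work they
# point to, i.e. candidates for "this edition is a translation".
SELECT ?edition ?editionLabel ?editionLangLabel ?work ?workLabel ?originalLangLabel WHERE {
  ?edition wdt:P629 ?work ;          # edition or translation of
           wdt:P407 ?editionLang .   # language stated on the edition
  ?work    wdt:P407 ?originalLang .  # language stated on the work
  FILTER(?editionLang != ?originalLang)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

The query only works when the P629 link exists and the work item has a language. It also cannot tell whether the translation was made directly from that language or from an intermediate text.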
I'm sorry that you still can't understand the problem after all the discussion we've been through. Your analogy is flawed for all the reasons we've discussed elsewhere. If all dogs are mammals then that is an invariant quality. It does not change. Likewise, the gender of an author does not (usually) change, so there is no need to replicate it or mark it elsewhere because it is an invariant quality. But language of a work varies and does change, and it is context dependent upon the particular edition, translation, or performance of that work, so it is not an invariant property. Likewise "date" of a work depends upon the specific edition, translation, or performance. You cannot deduce anything when the values are inconstant.
And we've already been through the problem of identifying "original" texts. There is no means of marking that reliably. A translation of a text might be made from the "original", or it might be made from a derivative text in another language. --EncycloPetey (talk) 01:13, 1 December 2017 (UTC)
Please (again) don't mix work and work_frbr: work may have several languages (and even that is a bit dubious to me) but work_frbr clearly always has only one language (the one inside the head of the author). For translation (Q7553) vs. Q23808533, this is a different and separate matter (already discussed in multiple sections of this page). Whether an edition of Hamlet in German has been translated from one of the first editions in Early Modern English or from a different edition in English or French doesn't change the fact that Shakespeare was thinking his work_frbr in Early Modern English or that the first editions were in that language; this is clearly invariant. Cdlt, VIGNERON (talk) 14:27, 1 December 2017 (UTC)
Edit: my mistake, in FRBR, work_frbr has no language; this is a property of expression_frbr only. Cdlt, VIGNERON (talk) 14:48, 1 December 2017 (UTC)
@Snipre: it seems good to me and it seems to be more or less what the FRBR recommends : FRBR 2008 (look on pages 30, 67 and passim, could you take a look and confirm if it fits or not?). Cdlt, VIGNERON (talk) 14:27, 1 December 2017 (UTC)
  • Comment If P407 by itself isn't sufficient to express the information, it probably needs qualifiers. If cases are rare, this might scale. If these are frequent, a solution with a dedicated property might be needed. I don't see how it helps us determine the meaning of a statement if we just say that one solution is "correct" or "what mother recommends implicitly". Once we have a solution, one can try to determine if it can be interpreted in this or that scheme.
    --- Jura 14:33, 1 December 2017 (UTC)
    @Jura1: the solution chosen is quite simple: P407 is the language of the item, if P407 is used on an item with P31 = work (or subclass of), then this is the language of the work, if P407 is used on an item with P31 = edition, then this is the language of the edition (and if P407 is used on something else, then look at the P31). We can use qualifier to make it more explicit and duplicate the P31 but honestly, you just have to look at the P31 to already infer a clear answer. Cdlt, VIGNERON (talk) 14:48, 1 December 2017 (UTC)
    • In this case, there are several languages associated with the item. It is up to us to find a solution to qualify the statements correctly.
      --- Jura 14:55, 1 December 2017 (UTC)
      • @Jura1: (if this case is The Comedies of Aristophanes (Q21286489)) there shouldn't be, as already said this item is mixing the work (in Ancient Greek) and edition level (in English). The solution is to do as usual (as Snipre put it « consider it as a normal book ») one item for the work, one for the edition. Cdlt, VIGNERON (talk) 15:13, 1 December 2017 (UTC)
        • @VIGNERON: So how many data items will be required to set up the book currently at The Comedies of Aristophanes (Q21286489)? If we do it your way (as I understand it), there will be 27 data items, or maybe more. That's just for the one book that exists in a single edition on a single Wikisource. --EncycloPetey (talk) 21:07, 1 December 2017 (UTC)
          • @EncycloPetey: I would say only 2: one for the 'work' (to create), one for the 'edition' (The Comedies of Aristophanes (Q21286489), which already exists, with some changes on instance of (P31), and edition or translation of (P629) moved to has part (P527) in the new 'work' item so that P629 on the edition can be linked to the new 'work' item). We already do this kind of split for usual books, so why not do it for anthologies? After all, anthologies are books (and yes, it requires a bigger number of items, but we put less data on each item, so all in all and in the long run this is clearly better). @Snipre: do you confirm? Cdlt, VIGNERON (talk) 22:45, 1 December 2017 (UTC)
            • @VIGNERON: So no data items for each of the two volumes? No data items for each of the plays included in the anthology (both the work (translation) and the edition (in this anthology))? Why would you not include those? --EncycloPetey (talk) 23:07, 1 December 2017 (UTC)
              • @EncycloPetey: oh yes, you're right, a work and an edition item for each play in the anthology. I didn't look at the plays in detail but it seems to be already done: The Acharnians (Q1059987) is the work and The Acharnians (Q19077417) an edition in English (at least you have the works). So with 11 plays, it's 22 items, plus 2 for the ensemble; not sure about the volumes (I would say no, but I will have to see the previous discussions). And if you want to count everything, you need an item for the place of edition(s), for the editor(s), for the translator(s), etc., and a lot of items for the characters in the plays too ;) Cdlt, VIGNERON (talk) 10:58, 2 December 2017 (UTC)
              • I feel this is somewhat problematic when we try to discuss this and people just keep repeating things they already wrote and don't actually look at items.
                --- Jura 11:53, 2 December 2017 (UTC)
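The rule stated in this thread, that language of work or name (P407) is read as the language of the edition or of the work depending on instance of (P31), could be applied mechanically as sketched below. The assumptions are mine: the query targets the Wikidata Query Service, uses version, edition, or translation (Q3331189) as the edition class, and takes The Comedies of Aristophanes (Q21286489) purely as the example item from the discussion. The label strings are illustrative, not an agreed vocabulary.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

# For each language statement on the item, report how it would be read
# given the item's P31 class.
SELECT ?item ?classLabel ?langLabel ?reading WHERE {
  VALUES ?item { wd:Q21286489 }   # example item from the discussion
  ?item wdt:P407 ?lang ;
        wdt:P31  ?class .
  # True when the class is Q3331189 or one of its subclasses.
  BIND(EXISTS { ?class wdt:P279* wd:Q3331189 } AS ?isEdition)
  BIND(IF(?isEdition,
          "language of the edition",
          "language of the work (or check P31)") AS ?reading)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

An item with several P31 values or several P407 values yields one row per combination, which makes a mixed work/edition item visible as conflicting readings.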
@VIGNERON: That's what I have been saying for several months: a compilation of works is a new work. But here I see a potential problem for some particular cases: if I have a work for one original text, I don't have a work for a corresponding translated edition of that work, so if someone decides to publish a new book containing the original text and the translated text, then the proposed model requires a work item for the translation. Does it mean we need a work item for all translations? Perhaps not, but we have to find a solution for this case. For books, this case is rare, but for poems it is more frequent. Snipre (talk) 02:30, 3 December 2017 (UTC)
@Snipre: It's not uncommon for texts at all, and is common far beyond poetry. It applies to drama, correspondence, essays, and most of all it applies to a high proportion of classical literature (Greek, Latin, Chinese, etc.) where parallel texts are common and also anthologies of translations are common. --EncycloPetey (talk) 21:22, 3 December 2017 (UTC)
@EncycloPetey: And ? Do you have a solution or a proposition ? Snipre (talk) 21:44, 3 December 2017 (UTC)
A large part of our problem, in a nutshell, is that we are limited to a binary system of [ "work" or "edition/translation" ]. Translations are neither wholly one or the other, yet they do have editions. So, we need a third option of "translation" that effectively lies in between the levels of "work" and "edition". That doesn't solve all the issues, but would be a positive step if we could implement it. --EncycloPetey (talk) 21:51, 3 December 2017 (UTC)
@EncycloPetey: Good. This is a first step. Can you please provide then the relations between the work, the edition in original language and your new class translation in order to see how complex the model is. Do we have all properties or do we need to create some new ones ? Snipre (talk) 22:06, 3 December 2017 (UTC)
I'm not sure what you're asking or what you're driving at. I know "relation" in the mathematical sense and the biological sense, but think it must have a slightly different meaning the way you are using it. How do you expect a response to be framed? --EncycloPetey (talk) 22:13, 3 December 2017 (UTC)
We have 3 classes (work, edition, translation), so we need at least 3 relations, and possibly 3 others if we want to have reverse properties. Snipre (talk) 23:41, 3 December 2017 (UTC)
The approach at Wikidata:Lists/Decameron editions and translations works out quite well. We just need to find a good way to express what language something was translated from.
--- Jura 07:48, 4 December 2017 (UTC)
With your approach one can easily determine the language something was translated from by following the edition or translation of (P629) chain. The concern is that you are using edition or translation of (P629) to link both edition with translation and translation with original work. But maybe widening the scope of P629/P747 is more comprehensible than creating a handful of new properties. --Pasleim (talk) 11:23, 4 December 2017 (UTC)
Agree for this sample, but it's more complicated for EncycloPetey's. I noticed that there was some inconsistency in the labels of P629/P747. Maybe "translation" should be included in both properties and all languages.
--- Jura 16:45, 4 December 2017 (UTC)
RE Pasleim: "With your approach one can easily determine the language something was translated from by following the edition or translation of (P629) chain." But that won't always work. Assuming that the chain exists, and is complete, and isn't confounded by more than one layer of translation/edition (not all these conditions are always met), all the end of the chain may tell you is the language of an ultimate work, not necessarily the language from which the translation was made. The English book The Waning of the Middle Ages is ultimately a translation of a Dutch work, but the translation was made from an unpublished French translation that was radically different from the original Dutch. I also have a book I'm working with on Wikisource where the original language of composition was German, but the English translation was published first because of the death of the author before the German could be published. So the language of composition and the language of first publication are not the same. Our methods for indicating basic information like author, date, and language are too simplistic to cope with a lot of the data we need to record. --EncycloPetey (talk) 17:06, 4 December 2017 (UTC)
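Following the edition or translation of (P629) chain, as discussed above, can be sketched with a property path. The assumptions are mine: the query runs on the Wikidata Query Service, and The Acharnians (Q19077417), the English edition mentioned earlier on this page, is used only as a starting point; whether it actually carries a P629 statement has not been verified here.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

# Every item reachable from the starting edition through one or more
# P629 links, with the language stated on each of them (if any).
SELECT ?start ?ancestor ?ancestorLabel ?ancestorLangLabel WHERE {
  VALUES ?start { wd:Q19077417 }   # example starting edition
  ?start wdt:P629+ ?ancestor .     # one or more "edition or translation of" hops
  OPTIONAL { ?ancestor wdt:P407 ?ancestorLang . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

A property path returns the reachable items without their order in the chain, so the result does not show which item is the immediate source. As the objection above points out, even a complete chain records what an item is an edition or translation of, not necessarily the text the translator actually worked from.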
────────────────────────────────────────────────────────────────────────────────────────────────────
A main principle of database design is to avoid duplicate data. You can find a lot of literature on the web explaining why duplicate data are bad in a database. If we agree to aim for a good database design, we store the language of the original work only once, namely on the item about the original work. The same goes for the author. The question left to answer is then how to link editions/derived works/translations with the item about the original work. We currently have edition or translation of (P629)/has edition (P747) and has part (P527)/part of (P361) and published in (P1433). If you think this is not sufficient, please make proposals for new properties. --Pasleim (talk) 17:38, 4 December 2017 (UTC)
WMF DE seems to be for duplicates (at least Wikibase supports symmetric constraints and explicitly doesn't develop better alternatives). WMF gives grants for triplicate schemes .. So I think with an occasional supplementary statement we are still much closer to the ideal.
--- Jura 18:50, 4 December 2017 (UTC)

Edition of an edition[edit]

Aubrey
Viswaprabha (talk)
Micru
Tpt
EugeneZelenko
User:Jarekt
Maximilianklein (talk)
Don-kun
VIGNERON (talk)
Jane023 (talk) 08:21, 30 May 2013 (UTC)
Alexander Doria (talk)
Ruud 23:15, 24 June 2013 (UTC)
Kolja21
arashtitan
Jayanta Nath
Yann (talk)
John Vandenberg (talk) 09:14, 30 November 2013 (UTC)
JakobVoss
Danmichaelo (talk) 19:30, 16 February 2014 (UTC)
Ravi (talk)
Mvolz (talk) 08:21, 20 July 2014 (UTC)
Hsarrazin (talk) 07:56, 9 August 2014 (UTC)
Accurimbono
Mushroom
PKM (talk) 19:58, 10 October 2014 (UTC)
Revi 16:54, 29 November 2014 (UTC)
Giftzwerg 88 (talk) 23:36, 1 January 2015 (UTC)
Almondega (talk) 00:17, 5 August 2015 (UTC)
maxlath
Jura to help sort out issues with other projects
Epìdosis
Skim (talk) 13:52, 24 June 2016 (UTC)
Marchitelli (talk) 12:29, 5 August 2016 (UTC)
BrillLyle (talk) 15:33, 26 August 2016 (UTC)
Alexmar983 (talk) 23:53, 28 August 2016 (UTC)
Finn Årup Nielsen (fnielsen) (talk) 10:44, 29 August 2016 (UTC)
Chiara (talk) 14:15, 29 August 2016 (UTC)
Thibaut120094 (talk) 20:31, 14 September 2016 (UTC)
Ivanhercaz | Discusión 15:30, 31 October 2016 (UTC)
YULdigitalpreservation (talk) 17:35, 10 November 2016 (UTC)
User:Jc3s5h
PatHadley (talk) 21:51, 15 December 2016 (UTC)
Erica (ohmyerica) (talk) 19:26, 1 January 2017 (UTC)
User:Timmy_Finnegan
Mauricio V. Genta (talk) 05:38, 12 March 2017 (UTC)
Sam Wilson 09:24, 24 May 2017 (UTC)
Sic19 (talk) 22:25, 12 July 2017 (UTC)
Andreasmperu
MartinPoulter (talk) 09:21, 20 July 2017 (UTC)
Thelmadatter (talk) 01:11, 13 September 2017 (UTC)
Zeroth (talk) 15:01, 16 September 2017 (UTC)
Emeritus
Ankry
Beat Estermann (talk) 20:07, 12 November 2017 (UTC)
Shilonite - specialize in cataloging Jewish & Hebrew books
Elena moz
Oa01 (talk) 10:52, 3 February 2018 (UTC)
Maria zaos (talk) 11:39, 25 March 2018 (UTC)
Wikidelo (talk) 13:07, 15 April 2018 (UTC)
Mfchris84 (talk) 10:08, 27 April 2018 (UTC)
Mlemusrojas (talk) 3:36, 30 April 2018 (UTC)
salgo60 Salgo60 (talk) 12:42, 8 May 2018 (UTC)
Dick Bos (talk) 14:35, 16 May 2018 (UTC)
Marco Chemello (BEIC) (talk) 07:26, 30 May 2018 (UTC)
Harshrathod50
 徵國單  (討論 🀄) (方孔錢 💴) 14:35, 20 July 2018 (UTC)
Alicia Fagerving (WMSE)
Louize5 (talk) 20:05, 11 September 2018 (UTC)
Viztor (talk) 05:48, 6 November 2018 (UTC)
RaymondYee (talk) 21:12, 29 November 2018 (UTC)
Merrilee (talk) 22:14, 29 November 2018 (UTC)
Kcoyle (talk) 22:17, 29 November 2018 (UTC)
JohnMarkOckerbloom (talk) 22:58, 29 November 2018 (UTC)

Helmoony (talk) 19:49, 8 December 2018 (UTC) Notified participants of WikiProject Books

Hi,

It seems obvious and trivial to me, and it is documented on the property page, on the main project page Wikidata:WikiProject Books, and in the FRBR, that an « edition of an edition » is not possible and doesn't even make sense (an edition is by definition the thing edited from a work). So, logically, I corrected it on Декамерон (Q43475477), but Jura1 reverted me and is asking for « references ».

For me it's as obvious as « the sky is blue » or « water is wet »; I don't know what more to explain... Any ideas, remarks, etc.?

Cdlt, VIGNERON (talk) 16:01, 24 November 2017 (UTC)

PS: to be sure, I checked again in the FRBR, it's clearly stated « Translations from one language to another, musical transcriptions and arrangements, and dubbed or subtitled versions of a film are also considered simply as different expressions of the same original work. » (FRBR, pages 17-18)

Q43475477 is an edition of a 19th-century translation: Q43169039.
Similar to Q43517456 which is an 1860 edition of the 15th century translation Q43516994.
Maybe the Commons sitelinks shouldn't be on these items.
The objective is to provide a full list of translations at Wikidata:Lists/Decameron editions and translations
similar to w:The_Decameron#Translations_into_English.
--- Jura 16:14, 24 November 2017 (UTC)
I don't speak Russian well enough to know whether the edition Q43475477 is based on the edition Q43169039 (BTW, they're both editions, as translations are editions). But in any case, the property to indicate this information is based on (P144), not has edition (P747)/edition or translation of (P629). For the list, it is easier to create the exact same list when all editions are linked to the same work (the SPARQL query would be shorter with just P629 rather than P629+, which doesn't really make sense as P629 is not transitive).
Cdlt, VIGNERON (talk) 16:20, 24 November 2017 (UTC)
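
To illustrate the difference between P629 and P629+ mentioned above, here is a minimal sketch. It assumes the Wikidata Query Service (with its predefined prefixes) and assumes that Q43169039 carries an edition or translation of (P629) statement pointing to the work; neither is confirmed in this discussion.

```sparql
# Sketch only: list everything that reaches the same work as Q43169039.
# wdt:P629+ follows one or more P629 links, so it also returns
# "editions of editions"; plain wdt:P629 would return direct editions only.
SELECT DISTINCT ?work ?edition ?editionLabel WHERE {
  wd:Q43169039 wdt:P629 ?work .
  ?edition wdt:P629+ ?work .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

If every edition were linked directly to the work, replacing `wdt:P629+` with `wdt:P629` would return the same list.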
  • "the property to indicate this information is based on (P144)" what leads you to this conclusion? Is this something you just made up now or is it documented somewhere at Wikidata?
    --- Jura 00:20, 26 November 2017 (UTC)
The way the FRBR Group 1 classes have been implemented on Wikidata does not allow one to express that two editions (frbr:Manifestation) are embodiments of the same frbr:Expression. based on (P144) is not adequate to express this, as it could also be used to express that an adaptation is based on a particular translation. I'm not sure to what extent it is necessary in Jura1's example and use case to actually be able to express such subtleties. I can however understand that some confusion may arise if one has the distinction between frbr:Expression and frbr:Manifestation in mind. --Beat Estermann (talk) 00:25, 26 November 2017 (UTC)
It seems that the labels/definitions of edition or translation of (P629) aren't the same in all languages. At some point, "translation" was added to English ([6]) and some other languages. The approach chosen for Decameron seems consistent with current constraints.
--- Jura 13:29, 26 November 2017 (UTC)
« edition of » and « edition or translation of » are the same thing, as translations are editions (at least until now in this project and in FRBR); the extra word in the label is just a way to make this more explicit for users. Formally, there is no constraint right now that forbids an edition of an edition, but there should be, as (I feel) this is not at all in the spirit of this project, where 'edition of' is supposed to link only the Work and Edition levels, not items within the Edition level. Cdlt, VIGNERON (talk) 23:03, 28 November 2017 (UTC)
Comment: This seems to be what I have been addressing at #Are we conflating editions and translations; or are we missing translations as their own works?. The generic "translation" can be at the work level or at the edition level, like the misuse of "book", which can relate to the creative work or to a specific edition of the work. The difference in the jargon is not important to most people.  — billinghurst sDrewth 23:04, 26 November 2017 (UTC)
@billinghurst: exactly, but even if we choose to consider editions and translations to be different things (which I think is a bad and unnecessary idea, as most databases and references consider translations to be editions), then we would need a new property 'translation of', and in this case we wouldn't have editions of editions, right? Cdlt, VIGNERON (talk) 23:03, 28 November 2017 (UTC)

For information, right now there are 341 results for edition of edition:

SELECT ?item1 ?item1Label ?item2 ?item2Label ?item3 ?item3Label WHERE {
  ?item1 wdt:P629 ?item2 .
  ?item2 wdt:P629 ?item3 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Some of them seem to violate multiple constraints (including a lot of manuscripts, which should probably use exemplar of (P1574) instead). What should we do with these items?

Cdlt, VIGNERON (talk) 23:03, 28 November 2017 (UTC)
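
To help decide what to do with these items, the chains found by the query above could be grouped by the class of the first item. This is only a sketch, assuming the Wikidata Query Service and assuming the items carry instance of (P31) statements; items without P31 would not appear.

```sparql
# Sketch only: group the "edition of an edition" chains by the class
# of the first item, to see which kinds of items are involved.
SELECT ?class ?classLabel (COUNT(DISTINCT ?item1) AS ?count) WHERE {
  ?item1 wdt:P629 ?item2 .
  ?item2 wdt:P629 ?item3 .
  ?item1 wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?class ?classLabel
ORDER BY DESC(?count)
```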

@VIGNERON: For exemplar we need to use exemplar of (P1574) and to link the exemplar to the edition or to the work if no edition exists. Snipre (talk) 13:10, 29 November 2017 (UTC)
The main problem comes from this kind of item, Septuagint manuscript (Q7452368): this is typically a Wikipedia structure which doesn't correspond to the Wikidata model and creates an additional layer in the instance/subclass classification without having any meaning in the FRBR classification.
The second problem is items such as Septuagint (Q29334) or Vulgate (Q131175), which are defined as works but described as translations of the Bible. We need to find a solution for these items. Snipre (talk) 14:30, 29 November 2017 (UTC)
The Septuagint and Vulgate are effectively anthologies of translations. The Vulgate and Septuagint both have the same collection of translated texts, but "The Bible" can vary in what it contains depending upon the form of Christianity (Coptic, Ethiopian, Orthodox, Protestant) or Hebrew (which will not contain the New Testament). So "The Bible" is not a fixed text nor a definite anthology. --EncycloPetey (talk) 21:31, 30 November 2017 (UTC)

frbr:Expression

Hi,

I'm presently working on the ingest of a pilot dataset of performing arts productions. The Expressions that are to be described in the context of the performing arts are not necessarily editions (often, they have not been published, but we do know who the translator or the adapter was, we know their language, etc.). I would therefore suggest creating a separate class "Expression", corresponding to frbr:Expression, and slightly modifying the description of version, edition, or translation (Q3331189) on the Books project page: in fact, version, edition, or translation (Q3331189) seems to correspond first and foremost to frbr:Manifestation.

When describing editions from the perspective of physical artefacts, as is most common in the library world, the approach that was used so far, employing the classes version, edition, or translation (Q3331189) and creative work (Q17537576), would be maintained as is. However, when describing expressions from the perspective of their content, as is the case in the theatrical databases I'm working with, a more refined data model could be used which distinguishes between the four FRBR Group 1 classes.

I've described the rationale in more detail here and am looking forward to your comments. --Beat Estermann (talk) 00:05, 26 November 2017 (UTC)

@Beat Estermann:
On this page it's indicated: « Not to complicate too much, we didn't use the FRBR terms "expression" or "manifestation", as the boundary between the definitions it's not easy to grasp. So we used "edition" instead, collapsing those 2 FRBR layers in 1 (other conceptual frameworks similar to FRBR (like Bibframe) collapse those 2 layers too). Thus the double layer work - edition has been used for creating Book properties. »
I don't know if our "edition" level is closer to the "expression" or to the "manifestation" FRBR level (if I had to say, I would have said "expression", but it's true that our edition level is, wrongly, seen more as physical than intellectual), and I'm not even sure the question makes sense, as it's both by design.
For the creation of a new class for "expression", it's a good idea (at least we would be exactly aligned with FRBR), but I don't know how to make it usable for everyone (most of the problems on this project arise because the simplified model is already too complicated, even if meanwhile some people want to add a fifth level for translation..., so I'm not sure we can deal with one more level).
I've read your text quickly and some things seem a bit strange, but it sounds good overall; I'll try to read it more thoroughly soon.
Cdlt, VIGNERON (talk) 16:18, 28 November 2017 (UTC)

Berg Encyclopedia of World Dress and Fashion

Berg Encyclopedia of World Dress and Fashion (Q4891400) is a 10-volume encyclopedia published in hardcover, ebook, and online. Each volume has its own ISBNs, DOI, editors and subject area (e.g. African dress). Should I make a work item for each volume or can they all be editions of one work item Berg Encyclopedia of World Dress and Fashion (Q4891400) annotated as to volume number, editors, and subject area? (see contents) - PKM (talk) 23:55, 12 December 2017 (UTC)

After much thought, I am going to add a single edition for the online Encyclopedia and just include the volume information and reference URLs in individual references until I see if this source is useful enough to create items for each volume. - PKM (talk) 20:08, 14 December 2017 (UTC)
I have now added Berg Encyclopedia of World Dress and Fashion (Q55816150) specifically for the online edition with <has part> items for each of the 10 volumes. I'll be going through and updating my <stated in> references over time. - PKM (talk) 23:50, 29 July 2018 (UTC)
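
A minimal sketch for listing the volumes attached to the online edition, assuming the Wikidata Query Service and assuming the volumes are linked from Q55816150 with has part (P527) as described above:

```sparql
# Sketch only: list the items linked from the online edition via has part (P527).
SELECT ?volume ?volumeLabel WHERE {
  wd:Q55816150 wdt:P527 ?volume .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```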

Proposing change to qualifiers: remove P248, add P805

Following a discussion in Wikidata:Project chat about the constraint that stated in (P248) is only to be used for references, I will replace it with the identified preferred property, statement is subject of (P805). I will look to set up a references section on the same page and have P248 entered there.  — billinghurst sDrewth 04:50, 14 December 2017 (UTC)

I don’t get why you need a qualifier and not a reference. The examples do not help. author  TomT0m / talk page 16:42, 11 January 2018 (UTC)
@TomT0m: Where we are using described by source (P1343) it is not a reference, it is a qualifier to the work, as such it is a statement of origin. There has been widespread use of P248 in this situation, and this is a constraint violation, see Property:P248. So this is to offer a contextually corrected property to use for P1343.  — billinghurst sDrewth 04:55, 13 January 2018 (UTC)
See Charles Dickens (Q5686) for some examples of difference. If you have the "show constraint violations" gadget operating, you will see the highlights.  — billinghurst sDrewth 04:58, 13 January 2018 (UTC)
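
To gauge how many statements are affected, the following sketch lists described by source (P1343) statements that still carry stated in (P248) as a qualifier. It assumes the Wikidata Query Service, where the p:, ps: and pq: prefixes for statements and qualifiers are predefined; the LIMIT is arbitrary.

```sparql
# Sketch only: find P1343 statements that use P248 as a qualifier
# (the usage that triggers the constraint violation discussed above).
SELECT ?item ?itemLabel ?source ?sourceLabel ?qualifierValue WHERE {
  ?item p:P1343 ?statement .
  ?statement ps:P1343 ?source ;
             pq:P248 ?qualifierValue .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```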

NB: I changed behavior of s:ru:Модуль:Другие источники to use P805 instead of P248 (diff). -- Sergey kudryavtsev (talk) 06:50, 13 January 2018 (UTC)

NB2: w:ru:Модуль:External links uses P805 too.

@billinghurst, TomT0m: Can you run a bot to replace P248 with P805? -- Sergey kudryavtsev (talk) 07:02, 13 January 2018 (UTC)

Well outside my skill set. I have placed this request for a bot to be run. There is a discussion section there if there is any comment to be made about the requested replacement.  — billinghurst sDrewth 10:53, 13 January 2018 (UTC)
Under way with user:PLbot undertaking.  — billinghurst sDrewth 13:21, 24 January 2018 (UTC)

Pseudonyms

Currently it seems we are assuming that pseudonyms do not have their own items. That is not the case in external databases that have a proper identifier (the equivalent of a QID here) for pseudonyms. This raises questions among our users (see Talk:Q7245 or Topic:U5ied71lz96i7r8m). I think we should think about this. Any previous discussion about this? Any known documentation? author  TomT0m / talk page 17:07, 11 January 2018 (UTC)

There are definitely cases where pseudonyms have items, though the article needs to be about the pseudonym, not about the individual, i.e. there is an article at WP because the pseudonym itself is notable. I have seen this more in cases of collective pseudonym (Q16017119).  — billinghurst sDrewth 05:02, 13 January 2018 (UTC)
Here is a related question: is there a case where a Wikimedia project has two different pages, one for the person and one for the pseudonym (and is it a common practice on a Wikimedia project)? I don't know of any, but if there is, Wikidata would have to deal with it. Cdlt, VIGNERON (talk) 11:14, 13 January 2018 (UTC)
@VIGNERON: check for instance of (P31) -> pseudonym (Q61002) and see what is there. I know that there are articles for collective pseudonyms.  — billinghurst sDrewth 12:52, 13 January 2018 (UTC)
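
The check suggested above could look like the following sketch, assuming the Wikidata Query Service. Including collective pseudonym (Q16017119) alongside pseudonym (Q61002) is an addition made here for illustration.

```sparql
# Sketch only: list items that are an instance of pseudonym (Q61002)
# or collective pseudonym (Q16017119).
SELECT ?item ?itemLabel ?class ?classLabel WHERE {
  VALUES ?class { wd:Q61002 wd:Q16017119 }
  ?item wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```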
My unknown case is when we know a text is signed by a pseudonym but we know nothing about who the author actually is. Is the proposed model currently
< text > author (P50) < unknown value >
credited as < string pseudonym >
?
This raises another question: a pseudonym is supposed to be its own identifier. What happens if several authors use the same pseudonym at some point in time, and we don’t know who one or two of them actually are, but we are fairly sure the author is not the same? In other words, there are two « personas », each with its own author, sharing the same signature? Are there anonymous authors of whom we only know their pseudonyms? This implies that, if an anonymous author has several pseudonyms and we create a « human » item for each, one person can have several « human » Wikidata items. This is not the case if we choose to have « persona » items. If we have « persona » items, we can also refer to a pseudonym without using its signature string. We can have several « personas » for one pseudonym string. author  TomT0m / talk page 12:27, 13 January 2018 (UTC)
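
A sketch of how texts following the model above could be found, assuming the Wikidata Query Service and assuming that "credited as" corresponds to the named as (P1810) qualifier, as the reply below treats it. `wikibase:isSomeValue` is the query service's function for testing "unknown value" statements.

```sparql
# Sketch only: texts whose author (P50) is "unknown value"
# and whose author statement carries a named as (P1810) qualifier.
SELECT ?text ?textLabel ?name WHERE {
  ?text p:P50 ?statement .
  ?statement ps:P50 ?author ;
             pq:P1810 ?name .
  FILTER(wikibase:isSomeValue(?author))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```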
If there is no article/item for an author, just a pseudonym, and an unknown one, you probably should just consider using author name string (P2093). I see little point generating items for people who are basically anonymous.  — billinghurst sDrewth 12:53, 13 January 2018 (UTC)
Good questions.
The precise meaning and use of named as (P1810) is far from clear (there is a constraint, used as qualifier constraint (Q21510863), but the given example is a direct property :/ I will raise this point on the talk page, but there are other unclear points, among others: is it limited to people or not?). Nonetheless, your model seems good. Just one detail: it's not always an unknown value; it can be used for known values too, for alternative names which act the same way as pseudonyms:
< some old edition of the 'Sonnets' > author (P50) < William Shakespeare (Q692) >
named as (P1810) < Shake-speares >
(and with statement is subject of (P805) = spelling of William Shakespeare's name (Q7575898)). author name string (P2093) is a good solution too (but it depends on the context).
At least, one point is sure : anonymous (no name) and pseudonymous (some name) are mutually exclusive. It's either one or the other.
Cdlt, VIGNERON (talk) 12:56, 13 January 2018 (UTC)
@VIGNERON: I can also see it being used as qualifier to a reference, see Chet Baker (Q2274) which is causing constraints issues too. From my reading of the English description, it is used for proper nouns, rather than people.

Re your Shakespeare example, does it not come under my earlier explanation? I would have said that would just be the addition of the pseudonym property item added to Shakespeare, and then on the work, use author -> Shakespeare, then qualify with "named as" -> given pseudonym/alternate spelling/whichever  — billinghurst sDrewth 15:20, 13 January 2018 (UTC)

@billinghurst: indeed, I see that this point is already discussed on Property talk:P1810.
Maybe, but I'm not sure I understand which « earlier explanation » you are talking about.
To get back to the original question, some databases have several identifiers but some have only one (BnF has only one for Samuel Clemens/Mark Twain). Cdlt, VIGNERON (talk) 15:52, 13 January 2018 (UTC)
Once more we’re discussing a global issue (pseudonyms) while looking only at the small picture. This tends to spread discussions everywhere :( This amounts to questioning Wikidata's objective on this. I tend to think we’re one place where we can add information that is not held by other databases. Wikidata has a large scope and tends to be inclusive. I think, as a consequence, we should allow it to hold information about personas. author  TomT0m / talk page 16:04, 13 January 2018 (UTC)
@billinghurst: I wonder if the lack of a « persona » concept in this model makes it rather hard to treat cases in a generic way. There are a lot of properties and ways to use them. It is hard to take into account all the possible cases and not forget something. If a writer likes to play with the histories of their identities and invents false biographies for them (see https://en.wikipedia.org/wiki/Romain_Gary for example, who let his cousin play the role of one pseudonym for the press), it is hard to model any of this. If we consider « Emile Ajar » a fictional character, then we can have an item for it and link it to the item of Gary’s cousin. Authors have also been known to change pseudonyms depending on the field of work, e.g. Special:EntityPage/Q309240, who signed « Moebius » only for his science fiction work. We can’t really link the pseudonym with science fiction properly if we don’t have an item for Moebius. As a qualifier for the pseudonym, maybe… but that’s a limited approach. Also, a single persona may have several signature strings. The « persona item » model allows us to treat all kinds of corner cases elegantly, and seems to me easier to query while being more flexible. I think we should have « persona » items and a property to link them to their puppeteers. author  TomT0m / talk page 15:56, 13 January 2018 (UTC)
(ec) I said above:
There are definitely cases where pseudonyms have items, though the article needs to be about the pseudonym, not about the individual, i.e. there is an article at WP because the pseudonym itself is notable. I have seen this more in cases of collective pseudonym (Q16017119).
So no items for pseudonyms unless there is a Wikidata item that says "this is a pseudonym" and is not about the person for whom it was a pseudonym.

So, for where there are multiple authority controls they are usually both entered against the person and each is qualified with "named as." If there is more than one BnF, then it will have corresponding multiple VIAFs, and it is my understanding that this will put the duplicates into a queue to be considered for merging.  — billinghurst sDrewth 16:02, 13 January 2018 (UTC)

@TomT0m: You can list multiple pseudonyms against one author. The task is to link a work to the author, irrespective of the name used, where the additional names are qualified.  — billinghurst sDrewth 16:05, 13 January 2018 (UTC)
If someone is creating false biographies for a pseudonym, then that sounds like it reaches into one of those where an article is being written about the pseudonym, and it does get its own item.  — billinghurst sDrewth 16:07, 13 January 2018 (UTC)
Then remember that I discussed collective pseudonym (Q16017119) so Ellery Queen (Q586362) and Michael Field (Q839369) have articles and have multiple people involved.  — billinghurst sDrewth 16:09, 13 January 2018 (UTC)

How to include books in a practical manner

I have read the documentation and there is only one concern that I have. It is wonderful as a database but it fails me in several ways. I want to add all the books of all the authors we know. The objective is to provide information about books that are available for reading.

When I read about the database model, I find that there is nothing practical in there. The notion that LUA should be the glue to bind it all is not even an excuse. There are a few scenarios that I want an effective answer for.

  • I want all Wikisource books to be effectively registered so that we know what books are available for reading in what language. I really want us to advertise those books, I want them to be read.
  • I want us to import all books from the Open Library that have an author we have an identifier to the Open Library for. I do not mind to restrict it at first to include only the books with ebooks. To be truthful, I also want to include the books the Biodiversity Heritage Library has at the Internet Archive. For them we have to import many more authors .. but it is an option to treat them like we do scientific publications where authors are only added at a later date.

Now, about database design: it is one thing to subscribe to what libraries do; it makes sense when we accomplish things in this way. My challenge is how we can effectively register books and find an audience for these books. Thanks, GerardM (talk) 19:03, 25 January 2018 (UTC)

For Wikisource texts, work is under way by frwikisource and Tpt to allow a fairly automatic import of texts as editions, and to ease the creation of work items. But it is not complete yet. You can read what's been done so far here (sorry, it's in French). --Hsarrazin (talk) 19:21, 25 January 2018 (UTC)
That is cool, even important. It is obvious that without data we cannot do much. But how is this going to enable more readers? How will this be a template for all the other Wikisources? How about all the other issues that I raise? To paraphrase a Wendy's advert: Where is the beef? Thanks, GerardM (talk) 19:53, 25 January 2018 (UTC)
My two cents: Wikisource is still in the initial stages of adding to WD, and only the French and English Wikisources are really large enough and varied enough to be doing much. Many other Wikisources are small, poorly staffed, and have little oversight to maintain consistent formatting and data. Even on the English Wikisource, we face the issue that many older works and editions are so poorly curated, that they practically have to be done over again from scratch.
We've managed to do a decent job of adding authors and author data, but works, editions, and translations still have many challenges to overcome. I have requested a customizable tool for the addition of Wikisource works, but such tools seem to take low priority with the developers, who favor Wikipedia-tools because of the much larger participation. --EncycloPetey (talk) 00:59, 9 February 2018 (UTC)

Works

What is the current best practice on instance of (P31) for non-fiction works? And if book (Q571) is not the correct P31 for works - and I am sure it's not - could we please change the example on the project page? - PKM (talk) 19:47, 8 February 2018 (UTC)

I would say it depends on the item being added. For entries in the 1911 Encyclopædia Britannica, it's common to use encyclopedic article (Q17329259). There are also options for textbook (Q83790), academic journal article (Q18918145), etc. The use of book (Q571) is simply the most generic sort of example, and sometimes the only meaningful option. --EncycloPetey (talk) 00:54, 9 February 2018 (UTC)
I would say: in theory, all documents, fiction or non-fiction, should follow the FRBR. In practice, since almost all non-fiction has only one edition (in the FRBR sense) per work (in the FRBR sense), there is no real need to use the FRBR and most Wikidatians only create one item (which ideally is more or less wrong but pragmatically is more or less right). But if you follow the FRBR, I see no reason not to use book (Q571) for the FRBR work or, as @EncycloPetey: said, any subclass of it; for instance, a general and obvious choice is non-fiction book (Q20540385). Do you have a specific work in mind? Cdlt, VIGNERON (talk) 07:41, 9 February 2018 (UTC)
Mmm we’re actually not really « using FRBR ». You mean « create a work item » ? author  TomT0m / talk page 12:22, 9 February 2018 (UTC)
@TomT0m: mmm too, the first sentence of the first section on Wikidata:WikiProject Books is literally « We used the Functional Requirements for Bibliographic Records (FRBR) model ». We adapted it (like everybody; nobody uses exactly the FRBR, and even the FRBR has been adapted several times since 1997), but adaptation is still usage. Anyway, that doesn't matter that much; as PKM was speaking of « non-fiction works », I guessed (maybe wrongly) that she was indeed talking about creating a work item. Cdlt, VIGNERON (talk) 13:16, 9 February 2018 (UTC)
I’m of the opinion that, if we create a single item, it’s maybe best to create the work one anyway. author  TomT0m / talk page 14:12, 9 February 2018 (UTC)
@TomT0m: well yes, I think I get your idea but if you have only one item, you're outside the FRBR and work/edition separation, the item is neither and for the constraints you have to be both. Cdlt, VIGNERON (talk) 14:51, 9 February 2018 (UTC)
I don’t understand. FRBR describe a model, it does not require us to have items for every part of it ? Or does it ? author  TomT0m / talk page 15:15, 9 February 2018 (UTC)
Nobody is coming putting a knife under wikidatian's throat to create both a item about the work and one about the edition Face-wink.svg. But logically, we're are creating item about works and editions. One is less meaningful and useful without the other. Cdlt, VIGNERON (talk) 15:21, 9 February 2018 (UTC)
@VIGNERON: The problem with using non-fiction book (Q20540385) for instance of (P31) is that it's not simply a form ("instance") but a form/genre combination item. That is, "book" is a form but "non-fiction" is a genre. So I wouldn't use that value at all. I would also point out that many non-fiction works have gone through multiple editions. I have books on my shelf about anatomy, botany, Greek theatre, and Latin grammar, as well as dictionaries, encyclopedias, biographies, writing guides, and statistical reference works which have all gone through multiple editions. --EncycloPetey (talk) 16:41, 9 February 2018 (UTC)
@EncycloPetey: There is nothing in instance of (P31) or in Help:BMP that says it classifies works of art by form. It’s a generic property that can handle classification by genre as well; it classifies by many criteria (and that is its strength: no need to reinvent the wheel to classify things). As both genre classes and form classes are subclasses of « work », it follows that there is no problem in creating a subclass of both non-fiction and book. Although we don’t have to: using only instance of (P31), we could just as well add statements with the two values. It seems practical, however, to create such classes for common combinations. author  TomT0m / talk page 17:38, 9 February 2018 (UTC)
@TomT0m: Yet we have no guiding philosophy or principle on this matter. I would argue that instance of (P31) should be limited to a form or structure, and leave the genre (for fiction works) to its own separate statement, and likewise the main subject (for non-fiction works) should be kept separate from the "instance of" statement to the greatest extent possible. --EncycloPetey (talk) 17:46, 9 February 2018 (UTC)
@EncycloPetey: I’d argue that other ontology projects have handled taxonomies with a small number of properties (two, actually: one to link instances to their class, and one for the subclass relationship) and several class trees, instead of creating one property for each taxonomy, with great success. See for example https://en.wikipedia.org/wiki/OBO_Foundry (and they have many class trees). Following their path would probably help interoperability if we share common principles with them. And we will have a hierarchy of artistic genres anyway (if not several, as there may be several ways to classify genres), so having a specific property to deal with them is not much help in my opinion. author  TomT0m / talk page 18:02, 9 February 2018 (UTC)
@TomT0m: Unfortunately, that is an encyclopedic categorization primarily for a single scientific subject field, and for a project like Wikisource, the structure quickly collapses. On Wikisource, we have followed the classification principles of the w:Library of Congress Classification. --EncycloPetey (talk) 18:26, 9 February 2018 (UTC)
@EncycloPetey: This is quite a large field, with many subfields and subontologies that are designed to work well together, which is not easy, as there are many, many ways to model things such that the models cannot easily be combined with each other. Definitely comparable with Wikisource in complexity, if not waaay more complex. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2814061/ for example. From a quick look at Library_of_Congress_Classification, it’s actually a topic classification among many others, like for example the ACM one; there is no objective reason to align with only one. As Wikidata is inclusive, this highlights the fact that we will indeed have to deal with several classification systems. Plus, Wikidata has a very rich system of items to precisely describe the topics of a book and the relations between them, so it might be more efficient to use the « topic » property for the precise subject and to use the knowledge-field items themselves to find close topics than to use a rigid topic classification tree designed for the pre-digital library era. author  TomT0m / talk page 19:01, 9 February 2018 (UTC)
And as for the collapsing, I don’t know what you are referring to precisely, but if you’re referring to the category system and its loops, instance of (P31) and subclass of (P279) are in no way comparable to it. Sure, there are problems in our class tree, but projects like OBO give very good principles to avoid them by design. We try to ensure that a class is never a subclass of itself through a chain of « subclass of » statements, for example, as by definition that would mean that all the classes on that path are equal. Classification (should) obey strong, well-established principles like the https://en.wikipedia.org/wiki/Type%E2%80%93token_distinction, while categories are … way less structured. author  TomT0m / talk page 19:01, 9 February 2018 (UTC)
@PKM: could you give some context or example so we can see clearer here. Cdlt, VIGNERON (talk) 14:51, 9 February 2018 (UTC)
Sure! In general, I always make both work and edition items for my references, since (1) I am not always using the most recent edition of physical books and (2) I frequently use Oxford reference works which have separate editions for the online edition which will have a different date and ISBN than the physical publication. Also, I make heavy use of "main subject" and "genre" on these.
The work I was struggling with most recently was Patterns of Fashion 4 (Q48046762), a work on costume history, (edition: Patterns of Fashion 4 (Q48047403)). I used "book" here because I'm not sure what is better.
Another example is The Concise Oxford Companion to English Literature (Q47463825) (editions: The Concise Oxford Companion to English Literature (Q47463849) online 3rd, The Concise Oxford Companion to English Literature (Q47463828) online 4th). I used "creative work" here, again with some uncertainty. I have used "reference work" for items identified as "dictionary of..." or "encyclopedia of..." in the past, but I'm not sure that's right for a general history of something. - PKM (talk) 19:24, 9 February 2018 (UTC)

can someone help me create a property for books?[edit]

hello! I've never created a new property. Is someone available to help? thanks! שילוני (talk) 14:03, 12 February 2018 (UTC)

Some tricky properties of a book[edit]

I'm clearly out of my depth here. It would be appreciated if someone else can take on cleaning up Q1366818 (the book Escape to Life) and then ping me to look at how it would be done correctly.

BEGIN: copied from Wikidata:Project chat.

Q1366818 (the book Escape to Life) presents an interesting situation on several counts. I'm wondering what, if anything, of the following we can somehow convey.

  • The book was originally published in 1939 by Houghton Mifflin. We have an existing entity Q390074 for present-day publisher Houghton Mifflin Harcourt, but not for this predecessor. It would be inaccurate to say that the book was published by Houghton Mifflin Harcourt; what should we do?
  • Klaus and Erika Mann originally wrote the book in German, but it was first published in English translation. A German edition did not come out until 1991. Is there any way to convey that the book was written in German, but first published in English? Is there any way to indicate the first German edition as being just that?

Jmabel (talk) 05:37, 16 February 2018 (UTC)

@Jmabel: - There's quite a lot of prior art at Wikidata:WikiProject Books; they seem to list the pertinent statements for the Work, and for the Edition, as far as i can see from a quick glance. (Sorry I'm pointing you elsewhere rather than answering in detail.) hth --Tagishsimon (talk) 09:02, 16 February 2018 (UTC)
Interesting. As a relatively casual user of Wikidata, how would I be likely to have found that page, other than coming here to ask? - Jmabel (talk) 16:13, 16 February 2018 (UTC)
@Tagishsimon: Even after reading that page, I don't see answers to either of the questions I asked above. Did you read the page and see answers to my questions? Or was this just "there's a lot of stuff about books at Wikidata:WikiProject Books, your questions might be answered there"? I think someone more expert than I on Wikidata would do well to see if this can be expressed with current properties (and if so I'd be interested in learning how). In particular, I'm guessing that for Houghton Mifflin there is some way to do this with custom properties, but I haven't been able to work out how to create one of those. - Jmabel (talk) 16:27, 16 February 2018 (UTC)
@Jmabel:
In fact, Escape to Life (Q1366818) is flawed because it is defined as a work (Q7725634) but contains publication information. If you have read the Wikidata:WikiProject Books page, you have seen that works and version, edition, or translation (Q3331189) are two different types. The work item should contain only information about the authors, the original language (German), the original (German) title, the genre, and links to edition items.
Information about editions, both in English and German, must each go into a version, edition, or translation (Q3331189) item, which would then have all the properties about the publisher, the year of publication, the title of that publication, etc., like a traditional library catalog. Each edition must have its own item, and it would be preferable to add a library ID for the edition (LoC, for example) to be able to differentiate editions and to have a reference.
Then, each edition is linked to the work item through edition or translation of (P629), and in the work item you may link to the publications through has edition (P747). You can then indicate on the English edition that it was the first edition, as I did with Escape to Life (Q48914392). The same should be done for the first German edition, for which I have no information at all.
This may seem a little complicated, but it is the only way to manage data about the work and data about the different editions without mixing them up.
If you need help, you may seek it on the discussion page of the project.
As for your question about the publisher: on Houghton Mifflin Harcourt (Q390074), I see it was created in 1880, so it is the right publisher. Publishers often change their names over time, and the name is written differently on many books, yet it is still the same publisher... If it is the exact name in use in 1939 that bothers you, you can add a stated as (P1932) qualifier to record the exact name of the publisher at the time of publication. :) --Hsarrazin (talk) 16:50, 16 February 2018 (UTC)
Houghton Mifflin Harcourt doesn't seem to me like just a "change of name" of Houghton Mifflin. It represents a merger with the historically equally important Harcourt Brace Jovanovich (previously Harcourt Brace, then Harcourt, Brace, and World, then Harcourt Brace Jovanovich). Aside: there used to be a joke in the publishing industry that the name was changed because Jovanovich thought he was more important than the world.
I'm clearly out of my depth here. I'll bring it to Wikidata talk:WikiProject Books. - Jmabel (talk) 17:04, 16 February 2018 (UTC)

END: copied from Wikidata:Project chat. - Jmabel (talk) 17:06, 16 February 2018 (UTC)

@Jmabel: I totally agree with Hsarrazin: you should add one item for each edition; it's the easiest and simplest way to go. Cdlt, VIGNERON (talk) 08:35, 23 February 2018 (UTC)
@Hsarrazin, VIGNERON: So no item at all for the book as a work, just for editions? Because that is not at all the way that, for example, Hamlet (Q41567) is handled. - Jmabel (talk) 16:51, 23 February 2018 (UTC)
Also, I still see no way to express that the work was written in German, but first published in English translation. - Jmabel (talk) 16:52, 23 February 2018 (UTC)
this is deduced from the fact that the work item's language is German, while the first edition's language is English. --Hsarrazin (talk) 17:04, 23 February 2018 (UTC)
@Jmabel: you obviously need to keep the current item (Escape to Life (Q1366818)) about the work, but you also need items for the editions (ideally for all the editions). Reminder: a work is an intangible object; it is *never* published. What is published is de facto an edition. When you say "the work is published in English", it's in fact "the work has an edition in English". If you have several items, then it's easy to say "give me the date and language of the first (or all, or the last) edition(s) of this work". Cdlt, VIGNERON (talk) 17:24, 23 February 2018 (UTC)
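To illustrate the point about separate items, here is a minimal sketch in Python (not Wikidata's actual data model or API). The class names, field names and labels are hypothetical; only the languages and years come from the discussion above. It shows how "written in German, first published in English" can be deduced from a work record plus its edition records instead of being stored as a separate statement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edition:
    label: str                        # hypothetical label, for readability only
    language: str                     # language of this edition
    year: int                         # publication year of this edition
    publisher: Optional[str] = None   # publisher name as stated on the edition


@dataclass
class Work:
    label: str
    language: str                     # original language of the work
    editions: List[Edition] = field(default_factory=list)  # plays the role of "has edition"

    def first_edition(self) -> Optional[Edition]:
        # The earliest edition by year, or None if no edition is recorded.
        return min(self.editions, key=lambda e: e.year, default=None)


work = Work("Escape to Life", "German")
work.editions.append(Edition("Escape to Life (German edition)", "German", 1991))
work.editions.append(
    Edition("Escape to Life (English edition)", "English", 1939, "Houghton Mifflin")
)

first = work.first_edition()
# True when the earliest edition is in a different language than the work itself.
first_published_in_translation = first is not None and first.language != work.language
```

Nothing about "first published in translation" is stored on the work record itself; it falls out of comparing the work's language with the language of its earliest edition.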
One solution:
one item Qxx0 for the work, with language in German
one item Qxx1 for the manuscript, with language in German
one item Qxx2 for the first edition, with language in English, translated from Qxx1, edition of Qxx0
one item Qxx3 for the second edition, with language in German, edition of Qxx0
For the publisher problem, two cases:
1) Houghton Mifflin bought Harcourt Brace Jovanovich and changed its name at the same time. In that case, one item is sufficient, with two significant events: one for the acquisition and one for the name change.
2) Houghton Mifflin merged with Harcourt Brace Jovanovich into a new entity called Houghton Mifflin Harcourt, and in that case a new item is necessary for Houghton Mifflin Harcourt. Snipre (talk) 22:47, 23 February 2018 (UTC)
@Jmabel: Snipre (talk) 23:08, 23 February 2018 (UTC)
The history of Houghton Mifflin & Harcourt is even more complicated: Reed Elsevier had bought Harcourt, turned it into a couple of divisions of Reed Elsevier while keeping the names, then eventually sold those divisions (and also, I believe, some things that were never part of Harcourt) to Houghton Mifflin, which changed its name to Houghton Mifflin Harcourt at the time of the acquisition. So I guess it's more like your case 1, though I doubt we have entities in Wikidata that describe exactly what Houghton Mifflin acquired. - Jmabel (talk) 00:14, 24 February 2018 (UTC)

Collection[edit]

Is there any way to include an edition in a collection of books? I mean, if I want to say some French edition belongs to Le Livre de poche (Q1629027), I can't use collection (P195) without triggering constraint issues, since this property is intended just for paintings, sculptures, etc. I don't know how to handle it -- maybe "part of", "series" or anything else. Any ideas? Thanks. Wikidelo (talk) 14:54, 18 April 2018 (UTC)

You should definitely not use collection (P195), as this is used to link the item to a collection assembled by a collector or collecting organization. Your example rather fits the definition of schema:Series; series (Q20937557) has <equivalent class> schema:Series. However, schema:Series is defined as a sub-class of Creative Work, while series (Q20937557) is not. Instead, Wikidata uses a qualifier to indicate the type of items of a series, e.g. Welsh Triads (Q2542444) (manuscripts), Zanja de Alsina (Q301895) (fortifications), Triumph Tiger (Q3539718) (motorcycles). In addition, several sub-classes of series (Q20937557) have been defined - you may explore them using the Wikidata Ontology Explorer. Some of them relate to creative works. Maybe this would require some tidying up. --Beat Estermann (talk) 06:09, 19 April 2018 (UTC)
Maybe the following queries answer your question better. You may just replace the "P1433" in the second query to output the same list for the other properties. --Beat Estermann (talk) 06:51, 19 April 2018 (UTC)
#List of properties linking an item to a book series, ordered by frequency of use
SELECT ?property ?propertyLabel ?count WITH {
  # Inner query: count the distinct items that each property links to a book series
  SELECT ?property (COUNT(DISTINCT ?item) AS ?count) WHERE {
    ?bookseries wdt:P31/wdt:P279* wd:Q277759.   # instance of book series or a subclass of it
    ?item ?wdt ?bookseries.
    ?property a wikibase:Property;
              wikibase:directClaim ?wdt.
    FILTER(?property != wd:P31)                 # skip "instance of" itself
  }
  GROUP BY ?property
  ORDER BY DESC(?count)
  LIMIT 10
} AS %results WHERE {
  INCLUDE %results.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?count)

# List of item / book series pairs linked with the property P1433 (published in)
SELECT ?item ?itemLabel ?bookseries ?bookseriesLabel 
WHERE
{
  ?bookseries wdt:P31/wdt:P279* wd:Q277759.
  ?item wdt:P1433 ?bookseries.
   
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

@Beat Estermann: But a collection like Le Livre de poche (Q1629027) can be considered a collection of written books assembled by an editor. This is similar to a collection of paintings assembled by a collector. Written works are creative works, so I don't really see why we should do a difference. Snipre (talk) 07:40, 19 April 2018 (UTC)
In fact, you're talking about a series which is called "collection". Let's take the example of stamps - you may have distinct series of stamps issued by the postal service which may respond to specific design criteria, and you may have collectors building collections of stamps according to their own criteria. I believe the case of Livre de poche is more like the case of the postal service. And yes, I think it makes sense to keep the two cases apart in the data model. Cheers, Beat Estermann (talk) 07:48, 19 April 2018 (UTC)
@Beat Estermann: Thank you very much for the examples. My skills with SPARQL are just slightly better than my personal best at pole vaulting (4 inches). Anyway, I see your point. I've been looking at several wikis and it seems the concept of "collection" is handled in two different ways: for French, Italian, Spanish, and a few more wikis, the idea of "collection" is more or less consistent; on enwiki, however, there's no main category for "collection" (Penguin Classics, Everyman's Library, Folio...), just "series". Maybe I'm too picky, but for me, "series" should be Zuckerman Bound (Q17054053) and all the Zuckerman books, or Harry Potter (Q8337), but Pocket Penguins (Q25037402) or Colección Austral (Q5776710) are quite another thing. I've also seen that there is little or no consistency in how those items are populated in Wikidata. A lot of them don't even have a single statement, some are qualified as editorial collection (Q20655472), which is also weird because it's an instance of "art collection" and "catalog", and some are instances of "imprint" (Penguin Classics (Q11281443)). Messy. I agree with @Snipre:, "why we should do a difference"? But I also agree with @Beat Estermann:: maybe mixing art collections with paperback editions is not a good idea for the data model. Wikidelo (talk) 21:47, 19 April 2018 (UTC)
@Infovarius: It seems a good candidate. I'll try it. Thanks! Wikidelo (talk) 14:29, 20 April 2018 (UTC)
I don't think published in (P1433) is a good solution; the definition implies that it should be used for example for papers published in a scientific journal or for individual stories published in a larger work. In fact, many of the hits the above query produces point to scientific "book series" which are treated like journals (they also get a journal identifier). I would rather use series (P179) instead; and possibly use different classes for different types of "series"/"collections". --Beat Estermann (talk) 17:38, 20 April 2018 (UTC)
@Beat Estermann, Wikidelo: Please provide an objective criterion for excluding groups of books from collection. Please explain what a collection of paintings and a collection of sculptures have in common that a collection of books does not share.
series (P179) is not appropriate, as series (P179) implies an order, a sequence: each element of a series has to be connected to one or two other members of the series using qualifiers like followed by (P156) or follows (P155). A collection is a set of items grouped according to an arbitrary criterion.
@Jura1: Do you have some elements to distinguish series (P179) from collection (P195) or do we have to merge both properties ? Snipre (talk) 23:20, 20 April 2018 (UTC)
@Snipre: I think I provided the distinction above - a "collection" has been collected by someone according to some criteria and is different from a "book series" (definition: "a set of independent books in a common format or under a common title or supervised by a common general editor", Oxford Dictionary). I am totally aware of the fact that the latter definition coincides with one of the definitions for "collection"@fr ("Série d'ouvrages, de publications ayant une unité", Le Petit Robert). – "collection"@en does not have this meaning; and as the definition given in Le Petit Robert implies, the property <"series"@en; "série"@fr> should be absolutely fine in this context - even from a francophone point of view. This said, we still may want to distinguish different types of series/collections at the level of the class definitions. The following distinguishing criteria come to mind (I don't have time right now to do a thorough terminological research across several languages, but we should maybe do that at some point - create a table with multi-lingual definitions and alignment with the respective WD class):
  • Have similar items been collected ex post or were they designed to be similar from the beginning?
  • Do the items have one common originator (publishing house, editor, etc.) or do they have different originators?
  • Can the items be brought in a natural order (are they even numbered)?
  • Does the order of the series apply to its content or rather only to its publication date or similar?
  • Is the series complete (we know all the items) or open (further items may be added)?
So much for now... --Beat Estermann (talk) 07:27, 21 April 2018 (UTC)
@Beat Estermann: I don't know if it answers all your questions, but here's an example at LibraryThing (note that the covers are randomly displayed; they aren't the actual covers of these particular editions). Wikidelo (talk) 11:47, 21 April 2018 (UTC)
@Jura1, Beat Estermann, Snipre: I guess series (P179), catalog (P972) or collection (P195) could be tweaked to somehow solve the issue, but I can't figure out which one is the best candidate or in which way it could distort the main purpose of each property. I agree with all of your points of view but, as a non-librarian and a noob, I can't tell which one is best. I think the LibraryThing approach is quite interesting. They use "Publisher series" for the "collection" concept I was talking about in the beginning. The closest we have here is editorial collection (Q20655472). So, we can broaden editorial collection (Q20655472) to include "Publisher series" or we can create a new "Publisher series" item. Then we can consistently make Penguin Classics (Q11281443), Pocket Penguins (Q25037402), Le Livre de poche (Q1629027), etc., instances of editorial collection (Q20655472) or of the new "Publisher series" item. Wikidelo (talk) 11:34, 21 April 2018 (UTC)
@Jura1: I think that should be only applied to version, edition, or translation (Q3331189), otherwise we could end up with a mess, like book (Q571) with one or multiple ISBN. Wikidelo (talk) 11:34, 21 April 2018 (UTC)

Edition item properties[edit]

I've found a couple of properties that could be useful for Wikidata:WikiProject_Books#Edition_item_properties:

What do you think? Wikidelo (talk) 20:07, 21 April 2018 (UTC)

yep, seems interesting :
typeface/font used (P2739) would be mostly interesting on ancient books, where editions can be discriminated through font (I worked on a 1502 Venetian edition (Alde) this week, and can very well see where it would be useful). --Hsarrazin (talk) 09:25, 19 May 2018 (UTC)

Some questions, based on my first items for works / editions[edit]

Hi everybody. I've just created my first couple of sets of items for editions and works, at England Delineated (Q52228403) + editions, and A Topographical Dictionary of Wales (Q52240439) + editions, and I wondered if somebody could sanity-check what I've done, to see if there's anything I've done that's not right, or could be done better.

I've got quite a few editions/works that I'm about to organise some images for on Commons, that I'm intending to create Wikidata items for at the same time, so it would be good to know whether I've got things basically right.

The item-sets above I've hand-created as sort of a target to aim for. For the items going forward, I hope to be working much more from existing metadata, doing much less by hand, so they probably won't end up being as complete. But I thought if I do a couple by hand, then I could see where the different fields would fit, when or if I have them.

A few questions occurred to me along the way (below), that didn't seem entirely explained in the existing guidance at Wikidata:WikiProject_Books#Bibliographic_properties.

Sorry if it is rather a lot of questions, but it was my first time making works / editions items, so I was very much feeling my way, and hoping I was getting things right (or at least not too wrong!) So many many thanks in advance for any thoughts or comments, to set me on the right way. Regards, Jheald (talk) 23:59, 28 April 2018 (UTC)

Titles and subtitles of works/editions[edit]

Since nineteenth-century book titles can be quite long, what should go where? Are title (P1476) and subtitle (P1680) the right properties to use? Where should a title be split between them? Should one try to get everything into title (P1476) if one can, eg as done at England Delineated (Q52228403), or should I have tried to split this title?
With A Topographical Dictionary of Wales (Q52240439) I ran into the apparent problem that string fields can only be 400 characters long, leading to a rather unnatural split between the two parts of the title. Is there a better way round this? (eg is there any alternative property that ought to be used instead? Are there some properties that Wikidata allows to have longer strings?)
In practice this probably won't be a problem, because long titles in my metadata look as if they have already been truncated with ellipsis (...); so I shall probably go with just dumping the whole of that title into title (P1476). But it would be good to know what the 'right' thing to do is considered to be.
Query results for longest book titles tinyurl.com/ycnqu9s3, showing a few other examples
Usually there is a short 'convenience' title, which is typically what has been used for the item label -- should we have a property for this? If these Wikidata entries were being used to power reference citations, how would one most normally expect the title to be stated in such a reference? Are we storing the information to generate that?
MARC 245 divides a title into: 'title' ($a) and 'remainder of title' ($b), stating that
"In records formulated according to ISBD principles, subfield $a includes all the information up to and including the first mark of ISBD punctuation (e.g., an equal sign (=), a colon (:), a semicolon (;), or a slash (/)) or the medium designator (e.g., [microform])."
Is that something we should try to emulate, or does it bring its own problems?
If the "remainder of title" would not fit into 400 characters, do we need a new property such as "subtitle (continuation)" ? Perhaps as a qualifier, since it would need to be linked to a particular value of "subtitle" ? Jheald (talk) 11:46, 29 April 2018 (UTC)
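As a rough illustration of the MARC 245 approach quoted above, the following Python sketch splits a full title at the first of the ISBD marks listed in the quote and then breaks an over-long remainder into pieces that fit the 400-character limit mentioned in this thread. It is a simplification under stated assumptions: it drops the punctuation mark rather than keeping it in the first part, it ignores medium designators such as [microform], the function names are hypothetical, and the sample title is made up.

```python
import re
import textwrap

# The ISBD marks named in the MARC 245 quote: = : ; /
ISBD_MARKS = re.compile(r"\s*[=:;/]\s*")
MAX_LEN = 400  # string length limit mentioned in the discussion


def split_title(full_title):
    """Split a title at the first ISBD mark into (title, remainder_of_title)."""
    parts = ISBD_MARKS.split(full_title.strip(), maxsplit=1)
    title = parts[0]
    remainder = parts[1] if len(parts) > 1 else ""
    return title, remainder


def chunk(text, limit=MAX_LEN):
    """Break text into pieces of at most `limit` characters, at whitespace where possible."""
    return textwrap.wrap(text, width=limit)


# Made-up example title, not taken from any catalogue record.
title, remainder = split_title("A Short Title: with a much longer remainder; and more")
```

A title with no ISBD mark comes back whole, with an empty remainder. Whether the pieces returned by `chunk` should be stored in one property or several is exactly the open question raised above; the sketch takes no position on that.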

Naming of items[edit]

I've gone with shortened titles for item names, pretty much truncating at the first punctuation. I presume this is okay. One thing I wondered about was preference for title-case / sentence-case (ie how many words to capitalise in item names -- and in book titles). Again, I'm likely to follow the metadata, but I notice that for full titles, there seems to be an avoidance of title-case, presumably because it simply makes them too unreadable. For the shorter forms used for item names I think I prefer title case, but I did notice that a couple of existing items (England delineated (Q28872042) -- unrelated; and A topographical dictionary of Wales (Volume I) (Q25219289) -- a bit mixed up) both preferred sentence case. Are there any strong feelings one way or the other?
Also, naming of editions -- I've gone with England Delineated (1st edition) (Q52281940), England Delineated (2nd edition) (Q52229333), etc. Is this appropriate; what is the preference between this or using years for the suffix instead? (Or using both, eg "(1st edition, 1790)" -- are years helpful to include?) It may be easier to extract years from the data; on the other hand, there may be multiple years for the same edition (as we shall see).


Edition number[edit]

I'm getting a constraint warning for including series ordinal (P1545) as a qualifier to make the edition names more machine interpretable. Is there an objection to these? It seemed to me they might be helpful.
Also, does anyone know of any gadget or add-on to make it easier to re-order statements? eg the sequence of editions on England Delineated (Q52228403) is a bit of a mess at the moment. (I was kind of hoping that if I added series ordinal (P1545) qualifiers the software might sort them itself, but no joy). Yes, it's no big deal to do by hand; but if they were machine-added it could be a bit of a pain.
Your solution is not sufficient to sort editions: if several publishers have each produced several editions, then you need another qualifier to determine which editions belong together before they are sorted.
Ex.: publisher Y published a first and a second edition of a work, and publisher X published a first, a second and a third edition of the same work; how will your system help to sort the editions of publisher Y? To work correctly, you should not rely on data in the work item to handle data about edition items. You are used to the Wikipedia format, where everything is mixed together in the same document. WD is a database: data are split across several containers, or items, and you have to extract data from the different items instead of duplicating it in each of them. Snipre (talk) 22:30, 29 April 2018 (UTC)
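The publisher X / publisher Y example can be sketched as follows in Python. The record layout, item identifiers and function name are hypothetical; the idea is only that edition numbers are meaningful within one publisher, so records extracted from the edition items are grouped by publisher before being sorted by ordinal.

```python
from collections import defaultdict

# Hypothetical records, as they might be extracted from the edition items themselves.
editions = [
    {"item": "X-3rd", "publisher": "X", "ordinal": 3},
    {"item": "Y-2nd", "publisher": "Y", "ordinal": 2},
    {"item": "X-1st", "publisher": "X", "ordinal": 1},
    {"item": "Y-1st", "publisher": "Y", "ordinal": 1},
    {"item": "X-2nd", "publisher": "X", "ordinal": 2},
]


def editions_by_publisher(records):
    """Group edition records by publisher, then sort each group by edition ordinal."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["publisher"]].append(rec)
    return {
        publisher: sorted(recs, key=lambda r: r["ordinal"])
        for publisher, recs in grouped.items()
    }


ordered = editions_by_publisher(editions)
y_items = [rec["item"] for rec in ordered["Y"]]
x_items = [rec["item"] for rec in ordered["X"]]
```

Sorting on the ordinal alone would interleave the two publishers' editions, which is the problem described above.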

Different scans of the same edition[edit]

Turning to the edition items now, I have sometimes found cases where there are multiple different scans in circulation for the same edition -- see eg A Topographical Dictionary of Wales (3rd Edition) (Q52243033) for a particular case.
The constraint checker doesn't seem to like there being multiple different Google Books ID (P675) values for the same item. (Similarly for Open Library ID (P648) and Internet Archive ID (P724)). It also doesn't seem to like them being qualified with volume (P478). Are there real objections to consider here, or would it be appropriate for these constraint conditions to be relaxed a bit?
It probably also doesn't like me using publication date (P577) as a qualifier. In this case, the same edition has gone through various impressions, eg 1843, 1844, 1845. These differ little, apart from a few entries in the errata page, and a change of address at the end of 1843 for the publisher. To me it very much makes sense to group the different impressions together into a single item for the edition -- it makes it much easier to see the major developments of the text.
On the other hand, is it sometimes helpful to track particular scans? For example, very often the Internet Archive or Hathi scan corresponds to a particular scan from Google (although not always). Is it helpful to indicate this in some way? But if so then how?
Also, it may sometimes make sense to put images extracted from different scans of the same edition into different Commons categories (so that each category corresponds to images from a single exemplar of the book). In that case I'm guessing it may make sense to treat them as different exemplars from within the same edition, connected back by exemplar of (P1574) -- but most of the time I'm thinking that is probably an unnecessary complication? Is there an inverse property, to announce that the edition may have different exemplars?
Scans are similar to exemplars and should not be mixed in at the edition level. If you want to add data about one scan, you should create a new item for the scan and link that item to the edition item, just as you link the edition item to the work item.
Ex.: if you want to add the location of one exemplar of the Gutenberg Bible, you should not add this data to the edition item, but create a new item. If the scans have three different IDs and differ in a property such as publication date, this justifies new items to avoid confusion. Snipre (talk) 22:36, 29 April 2018 (UTC)
Okay, so let's take a look at how this might work.
I have created a new class individual book (Q53731850) for distinct printed copies of books (a few more eyes checking over its statements would be very welcome!), and changed the statements on a pair of existing items (hat-tip to User:Sic19), namely On the laws and practice of horse racing, etc., etc (UC copy) (Q51425849) and On the laws and practice of horse racing (UPenn copy) (Q51514189) to make them instances of it, and exemplar of (P1574) a new item On the laws and practice of horse racing (1866 edition) (Q53738443), to which I have moved statements that were specific to the edition rather than the copies.
However, with respect to information about online copies, I have duplicated this on On the laws and practice of horse racing (1866 edition) (Q53738443) as well as the items for the distinct copies, using statement is subject of (P805) to distinguish which copy each scan is taken from. I think it's useful to collect together information about all online copies on the edition item, because I think this is where people will look for that information, both directly as humans, and when writing queries. This creates an issue in the form of a violation of the 'unique values' constraint, but this can perhaps be worked around.
In practice I wouldn't expect many such items for distinct printed copies to be created. Internet Archive ID (P724) already allows collection (P195) to be used as a qualifier, and that should usually be enough to distinguish different scan families without the need for new items. So I would expect individual book (Q53731850) items only to be created rather lazily, in particularly complicated cases, or when there is specific information related to particular copies that people want to record. Jheald (talk) 22:10, 18 May 2018 (UTC)
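The qualifier-based approach described above can be sketched in a few lines of Python. This is only an illustration, assuming a simplified in-memory `Statement` class (hypothetical, not a Wikidata API type) and invented identifier strings; it shows how several Internet Archive ID (P724) values on one edition item could be grouped into scan families through a collection (P195) qualifier.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Simplified stand-in for a Wikidata statement (hypothetical type)."""
    prop: str                                       # property ID, e.g. "P724"
    value: str                                      # main value of the statement
    qualifiers: dict = field(default_factory=dict)  # qualifier property -> value


def scans_by_collection(statements, id_prop="P724", collection_prop="P195"):
    """Group scan identifiers on one edition item by their collection qualifier."""
    groups = defaultdict(list)
    for st in statements:
        if st.prop == id_prop:
            groups[st.qualifiers.get(collection_prop, "unknown")].append(st.value)
    return dict(groups)


# Invented statements on a single edition item; the identifiers are placeholders.
edition_statements = [
    Statement("P724", "example-scan-a", {"P195": "University of California Libraries"}),
    Statement("P724", "example-scan-b", {"P195": "Cornell University Library"}),
    Statement("P724", "example-scan-c", {"P195": "University of California Libraries"}),
]

families = scans_by_collection(edition_statements)
```

With this shape, a separate item per printed copy is only needed when there is more to record about a copy than where its scans came from.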

Annotations for edition number (P393), publisher (P123), and printed by (P872)

In each case I have used stated as (P1932) to indicate how the name was actually stated on the title page. I hope this is acceptable. IMO it's for example quite useful to know that eg all copies of England Delineated (2nd edition) (Q52229333) were stated to be "Second Edition, with Additions and Corrections" -- this doesn't eg indicate an edition "2.1" following on from an initial release "2.0".
Where there isn't yet an item for the printer or publisher, I have used the special <some value> value, and then annotated it with the text from the page -- as eg at Q52283171#P872. In fact, this is what I am intending to do systematically for printers and publishers on initial item upload, then going back to see if there are ones I can match. I hope this is acceptable.
Also, where there is an identifiable address, I have put this in a located at street address (P969) qualifier. Again, it's not suggested on the Books item style page, but I hope this is considered reasonable. With enough of these, it may be quite nice to be able to track the different addresses for a printer or publisher over time, with the works issued from each one.
The only complication I found was with A Topographical Dictionary of Wales (3rd Edition) (Q52243033), where the publisher's address changed during the print-run and/or re-issues of the edition. You can see how I've dealt with this, but I am open to suggestions, if anyone has a better thought.
Wrong. You mix the person and the company. You have to create an item for the company, i.e. S. Lewis and Co., and the address should be saved in the item of the company, not in the book's item. Again, you are mixing data about different concepts into one item. An address is not a characteristic of a book but of a company. Snipre (talk) 22:43, 29 April 2018 (UTC)

Annotations for full work available at (P953)

What qualifiers/annotations are recommended for P953? You can see what I have done at eg Q52243033#P953. Is this appropriate, or are other things that should be added? There are quite a lot of potential qualifiers in current use, eg tinyurl.com/ya5g4mxa. Are there any that it would be particularly valuable to try to make a point of including?
(BTW, I am presuming that if an edition has Google Books ID (P675) or Open Library ID (P648) or Internet Archive ID (P724), then that suffices and it is unnecessary to add a P953 to the same scan?)
Also (perhaps related to the question that User:MartinPoulter raised a few threads above), what is the best way to indicate that a site offers eg a cleaned-up transcription of the full text, such as at Q52243156#P953, rather than the more common page scans + OCR ?
One other question that came up was how best to indicate multiple volumes available at the same link -- for example with Q52241009#P1844, both volumes are available at the same link (but as two different files). I indicated this by adding both volume (P478) = 1 and volume (P478) = 2 as qualifiers on the same statement. But at Q52241558#P953 the two volumes have been combined together into a single scan-file (they may also have been bound together). I tried to indicate this using volume (P478) = "1 & 2", but the constraint checker doesn't like this. Is the preferred way therefore to do what I did for the Hathi trust case? Or is there a different way to indicate two volumes together?
Same as above: instead of creating a bunch of qualifiers, create one item for the scan or the electronic version with all data including the link to the online version. Snipre (talk) 22:46, 29 April 2018 (UTC)
  • There is a qualifier to identify a specific page in a pdf: title page number (P4714), information that can't be stored otherwise. For the remainder, I think it depends how far you want to go. If you think a detailed description is needed, it might be preferable to create separate items.
    --- Jura 11:42, 4 May 2018 (UTC)
wikisource texts 

Also, I would say: I have encountered items where full work available at (P953) had been used to link to a Wikisource page... which was already present in the Wikisource section of the same item. This is really useless and dirty. When an edition is available on Wikisource, just link it as a Wikisource link! --Hsarrazin (talk) 09:41, 18 August 2018 (UTC)

England Described (1818) (Q52284408)

I wasn't sure how to treat this. Should it be treated as an edition, or would it be more appropriate to treat it as a new work in its own right?
On the one hand, it is a much more extensive enlargement and rewriting of England Delineated (Q52228403) than the previous new editions. But on the other hand, it is an enlargement of Q52228403, albeit with a lot of new material, leading to a somewhat different focus.
If one is looking down the list of editions at England Delineated (Q52228403), is it helpful to see it included? (Google in fact titles it as such). Or would a stand-alone item and based on (P144) have made more sense?
There is no rule for that: at what point does a modified edition become a new edition, or even a new work? Usually the contributor who is adding this version has to choose, based on expert or historical considerations. Snipre (talk) 22:51, 29 April 2018 (UTC)
I'm not sure what to use for the P31-statement but to express the relationship to England Delineated (Q52228403) you could use modified version of (P5059) instead of edition or translation of (P629) (or based on (P144)) to express that it is not a direct edition or translation but a modified version. - Valentina.Anitnelav (talk) 15:31, 3 May 2018 (UTC)

Does there *always* need to be a separate work and edition item?

Finally, if (as would be the case with England Described (1818) (Q52284408)), this is the only time the title was issued, is it appropriate to try to combine 'work' and 'edition' in the same item (as OpenLibrary does, or at least displays) ? Or is it still required to create two items, even though they will be rather redundant to each other?
Thanks in advance, Jheald (talk) 00:10, 29 April 2018 (UTC)
Any more thoughts about this?
Per guide to item structure on the project page, a lot of properties are expected to be located on version, edition, or translation (Q3331189) items.
So, in cases when there has only ever been one edition, if we do accept that only a single item should be created, it would make sense to me for it to be made instance of (P31) both version, edition, or translation (Q3331189) and book (Q571).
If we do go down this route, it might be helpful to include edition or translation of (P629) statement pointing to itself -- I think query writers would find this useful, so that separate edition and work items and combined edition/work items could both be dealt with in the same way.
Does that seem a sensible suggestion to people? Jheald (talk) 22:28, 18 May 2018 (UTC)
Yes, if and only if both concepts are merged inside the same item, meaning that all properties linking the edition and the work, as well as all edition properties and all work properties, are present in the same item, then we can consider that solution. The risk is more about constraints: we will complicate the monitoring of property use. Snipre (talk) 23:50, 20 May 2018 (UTC)
@Snipre: So you're saying it would also need has edition (P747) on the item, pointing to itself? Jheald (talk) 09:24, 21 May 2018 (UTC)
@Jheald: Exactly. That's the only way for Lua scripts retrieving data from both work and edition items to be able to work without complicating the code. But again, this solution is possible but not recommended, as we will have problems with some constraint definitions. Snipre (talk) 11:13, 21 May 2018 (UTC)
Question: Has England Described (1818) (Q52284408) ever been published in translation? That would require a separate data item for the edition. So, we're talking about finding a way to have a single data item for a work that was issued only once, in only one language, by only one publisher, from only one location, and never translated nor reprinted. --EncycloPetey (talk) 00:16, 21 May 2018 (UTC)
@EncycloPetey: Agreed. But it's not such an uncommon case -- in fact I would think it is the most common situation for most classes of old books. Jheald (talk) 09:22, 21 May 2018 (UTC)
That's not my experience with old books. My experience is that many were published as UK/US editions, or were published in another language, or had the contents appear later in another edition. --EncycloPetey (talk) 15:07, 21 May 2018 (UTC)
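The self-referencing idea discussed in this thread can be illustrated with a small Python sketch. It assumes claims held in a plain dictionary with placeholder item IDs and invented dates (none of this is real Wikidata data or a real client library); the point is only that code which walks has edition (P747) from a work to its editions needs no special case when a combined item points to itself.

```python
def editions_of(work_id, claims):
    """Follow has edition (P747) from a work item to its edition items."""
    return claims.get(work_id, {}).get("P747", [])


def publication_dates(work_id, claims):
    """Collect publication date (P577) values from every edition of a work."""
    dates = []
    for edition_id in editions_of(work_id, claims):
        dates.extend(claims.get(edition_id, {}).get("P577", []))
    return sorted(dates)


# Placeholder item IDs and dates, invented for illustration only.
claims = {
    # Conventional modelling: one work item and two edition items.
    "Q_WORK": {"P747": ["Q_ED1", "Q_ED2"]},
    "Q_ED1": {"P629": ["Q_WORK"], "P577": ["1801"]},
    "Q_ED2": {"P629": ["Q_WORK"], "P577": ["1809"]},
    # Combined work/edition item: both links point back to the item itself.
    "Q_COMBINED": {
        "P629": ["Q_COMBINED"],
        "P747": ["Q_COMBINED"],
        "P577": ["1818"],
    },
}

separate = publication_dates("Q_WORK", claims)
combined = publication_dates("Q_COMBINED", claims)
```

Without the self-pointing P747 statement, the same function would return an empty list for the combined item, which is the extra code path the self-reference avoids. It does nothing for the constraint concerns raised above.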

subject areas and genres

Sometimes genre (P136)-statements have subject areas/academic disciplines as their values. The most frequent are philosophy (Q5891), art history (Q50637) and history (Q309), but there are also cases like statistics (Q12483) and finance (Q43015). I see that this somehow mirrors the practice in book shops, but I'm rather sceptical whether it is the best way to express the fact that a work is of interest for a certain discipline. I see the following options to deal with subject areas used as genres:

  1. Generally allow instances of academic discipline (Q11862829) to be used as values in genre (P136)-statements
  2. Create a new genre-item for each subject area that is used as a genre
  3. Expand the scope of an already existing property to be applicable for those cases, too (field of work (P101) is the one that comes to my mind, but maybe there are others)
  4. Create a new property <subject area> that has works as its domain and subject areas as its value

I don't really like the first two approaches (they tend to misuse genre (P136) as a catch-all), but what do you think about this issue? - Valentina.Anitnelav (talk) 14:53, 3 May 2018 (UTC)

@Valentina.Anitnelav: Yes, I agree that there is some confusion, mainly because no clear classification of written texts exists.
In my opinion we need 4 properties to describe correctly
If I take the examples you provided, history, finance, statistics,... these are subjects, not genres. If I had to characterize your examples, I would propose textbook (Q83790) as written format and essay (Q35760), treatise (Q384515), scientific writing (Q1965486),... as written genre.
History, finance, statistics are not genres but subjects, and the property main subject (P921) should be used instead of genre (P136). Snipre (talk) 01:04, 4 May 2018 (UTC)
I mainly agree with you, Snipre, and I especially like the idea to separate between form (or written format) and genre.
I also thought about using main subject (P921) for subject areas. I abandoned the idea as this is actually not very accurate: the subject area of a work is seldom the main topic. See for example The religious and historical paintings of Jan Steen (Q29589359). It is a catalogue about Jan Steen (Q205863), not about art history. Art history is the subject area this book is written in or of interest for. On the Genealogy of Morality (Q230302) is about morality, not about philosophy (in contrast to The Problems of Philosophy (Q3393210)). - Valentina.Anitnelav (talk)
@Valentina.Anitnelav: You can add several subjects, so I think you can really say that the book is about art history and Jan Steen (Q205863), and even add the list of works mentioned in the catalog. Snipre (talk) 10:03, 4 May 2018 (UTC)
@Snipre: I see a problem with the use of main subject (P921) because those statements would be inaccurate (not because of the number of values). On the Genealogy of Morality (Q230302) is not about philosophy (e.g. its principles, questions, methods, development) and The religious and historical paintings of Jan Steen (Q29589359) is not about art history (e.g. its principles, questions, methods, development). It should be possible to get all books having philosophy as its main topic (e.g. The Problems of Philosophy (Q3393210) and What is Philosophy? (Q7991586)) without getting every book in the field of philosophy. - Valentina.Anitnelav (talk) 10:56, 4 May 2018 (UTC)
That's why libraries often use "Schlagwortketten" (subject strings) like "Philosophie - 19. Jahrhundert - Nietzsche - Moral". Imho main subject (P921) is the right property, since an editor is free to add a second "main subject" or replace a general subject like "philosophy" with a more precise term like "Frankfurt School". --Kolja21 (talk) 01:36, 11 June 2018 (UTC)
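The distinction argued over in this thread can be made concrete with a short Python sketch. It uses the four books named above with illustrative values taken from the discussion, not from live Wikidata statements; `subject_area` is a stand-in for the hypothetical new property of option 4 and does not correspond to an existing property.

```python
# Illustrative records only. "subject_area" is a hypothetical field standing in
# for the proposed property; "main_subject" mirrors main subject (P921).
BOOKS = {
    "Q3393210": {"label": "The Problems of Philosophy",
                 "main_subject": {"philosophy"},
                 "subject_area": {"philosophy"}},
    "Q7991586": {"label": "What is Philosophy?",
                 "main_subject": {"philosophy"},
                 "subject_area": {"philosophy"}},
    "Q230302": {"label": "On the Genealogy of Morality",
                "main_subject": {"morality"},
                "subject_area": {"philosophy"}},
    "Q29589359": {"label": "The religious and historical paintings of Jan Steen",
                  "main_subject": {"Jan Steen"},
                  "subject_area": {"art history"}},
}


def books_about(topic, books=BOOKS):
    """Items whose main subject is the topic itself."""
    return sorted(qid for qid, b in books.items() if topic in b["main_subject"])


def books_in_field(area, books=BOOKS):
    """Items written within, or of interest to, a subject area."""
    return sorted(qid for qid, b in books.items() if area in b["subject_area"])


about_philosophy = books_about("philosophy")
in_philosophy = books_in_field("philosophy")
```

If philosophy were simply added as a second main subject on every such item, the two functions would return the same list, and the narrower query could no longer be answered.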

Time for a new "subject facet" property ?

@Valentina.Anitnelav, Snipre: Further to the above, I wonder if it would be useful to propose a new "subject facet" property ?

For the Biodiversity Heritage Library (BHL) books, discussed in this section below, that we now have 60,000 items for, the BHL releases a 'keywords' dataset, that it would be useful to think how best to add.

Looking at keywords that have more than 400 hits (from the volumes of the whole collection, not just the items we have titles for), a few we might consider to relate to the form of the item (ie what the item is), viz:

Periodicals (43829); Catalogs (10750); Pictorial works (1635); Internet resource (1455); Electronic books (929); Collected Works (704); Early works to 1800 (682); Catalogs and collections (571);

But mostly they are indicative of the subject matter, ie:

Natural history (10273); Science (9892); Botany (8022); Nursery stock (7413); United States (6392); Plants (6044); Birds (6012); Seeds (5556); Zoology (4745); Nurseries (Horticulture) (4643); Flowers (4330); Plants, Ornamental (3622); Agriculture (3482); Entomology (3219); Gardening (2985); Trees (2984); Seedlings (2855); Vegetables (2828); Geology (2746); Insects (2739); Fruit (2705); Insect pests (2670); Societies, etc (2635); Forests and forestry (2491); Fruit trees (2414); Great Britain (2329); Paleontology (2210); Control (2184); California (2184); Shrubs (2109); New York (State) (2078); Biology (2068); Angiospermas (2017); Bulbs (Plants) (1939); Flora (1803); Mollusks (1799); Germany (1717); Classification (1703); North America (1661); Fisheries (1651); Bibliography (1604); Horticulture (1577); Fishes (1512); Horses (1264); Equipment and supplies (1262); Australia (1220); France (1215); Ornithology (1177); Massachusetts (1155); Diseases and pests (1114); Montana (1091); Grasses (1089); England (1049); Canada (1043); Description and travel (1005); Research (1004); Pennsylvania (999); Alberta (965); History (924); Mexico (916); Anatomy (871); Fruit-culture (868); Illinois (845); India (834); Italy (818); Europa (809); Washington (State) (809); physiology (797); Oceanography (796); Taxonomía (782); Ohio (771); Hunting (750); Varieties (740); Península Ibérica (739); Iowa (725); Marine biology (708); Mammals (689); Pteridófitos (682); Gimnospermas (655); Scientific Expeditions (643); Botanical illustration (637); Roses (630); Lepidoptera (614); Bees (611); America (601); Agricultural implements (585); Berries (585); Statistics (584); North Carolina (583); Prices (582); Beetles (581); Europe (578); Animals (572); Forest reserves (566); Poultry (552); Fishing (544); Obras clásicas (543); Antiquities (534); Seattle (533); Game and game-birds (532); New Jersey (516); University of Washington Botanic Gardens (516); Anatomy, Comparative (514); Colorado (508); Diseases (508); Hongos y líquenes (494); Michigan (491); Evolution (488); Environmental aspects (482); Fungi (481); Veterinary medicine (477); 1809-1884 (463); Plant diseases (462); Engelmann, George, (461); Forest management (461); Wildlife conservation (460); Briófitos (458); Ethnology (455); Medicine (453); Beneficial insects (453); Natuurlijke historie (442); Austria (440); Floriculture (439); New York (439); Field notes (438); Africa (435); Indonesia (429); Plant collecting (427); Florida (424); Learned institutions and societies (421); Microscopy (413); Asia (412); Plantas útiles o venenosas (412); Minnesota (402); Identification (401);

But these are not, in almost all cases, the main subject (P921) of the item. Instead they are more like en:faceted search terms.

So, for example, if we take a book like A systematic arrangement of British plants :with an easy introduction to the study of botany (Q51423679), library catalogues might give the subject as "Botany -- Great Britain" and "Botany -- Ireland" (those two from OCLC, which for copyright reasons we can't take; but an edition at the LoC might have something quite similar). This would correspond to our P921.

On the other hand, the BHL gives keywords "Great Britain", "Ireland", "Plants".

For the reasons Valentina was expressing above, I think these need a different property, that might perhaps be called "subject facet". What do people think? Jheald (talk) 10:55, 8 June 2018 (UTC)
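The split between form keywords and subject keywords described above can be sketched as follows. This is a minimal illustration in Python, assuming the form-related set is exactly the eight keywords singled out in this thread; the function name and the two-way split are my own simplification, not part of any BHL tooling.

```python
# Keywords this thread identifies as describing the form of an item.
FORM_KEYWORDS = {
    "Periodicals", "Catalogs", "Pictorial works", "Internet resource",
    "Electronic books", "Collected Works", "Early works to 1800",
    "Catalogs and collections",
}


def split_keywords(keywords):
    """Separate form keywords from candidate subject-facet keywords."""
    form = [k for k in keywords if k in FORM_KEYWORDS]
    facets = [k for k in keywords if k not in FORM_KEYWORDS]
    return form, facets


# The BHL keywords quoted above for Q51423679.
form, facets = split_keywords(["Great Britain", "Ireland", "Plants"])
```

For that book all three keywords land in the facet list, while a catalogue heading such as "Botany -- Great Britain" would still be modelled separately as main subject (P921).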

Proposed, at Wikidata:Property_proposal/Creative_work#subject_facet Jheald (talk) 23:20, 10 June 2018 (UTC)

Pauly-Wissowa

Hi! I've just noticed that the volumes of Paulys Realenzyklopädie der klassischen Altertumswissenschaft (Q1138524) have instance of (P31) Wikimedia category (Q4167836) (e.g. Pauly-Wissowa vol. I,2 (Q26414652)); however, this conflicts with the constraint of published in (P1433) (e.g. in RE:Ancites (Q15892059)). The problem affects thousands of items and creates thousands of constraint violations. My proposal is to transform the items of the volumes from categories to items of books (e.g. Pauly-Wissowa vol. S I (Q26469375) is actually both!). What do you think? --Epìdosis 15:22, 9 May 2018 (UTC)

✓ Done. --Epìdosis 08:40, 30 May 2018 (UTC)

Award for book or for author ?

Hi, I wonder where award received (P166) should go: on the author item or on the book item. Is it acceptable to duplicate it on both? What about inconsistencies? Excuse me if this has been discussed before and I didn't find it. Thanks, Amadalvarez (talk) 07:27, 12 May 2018 (UTC)

I have added many awards to people. When an award is also associated with a book, it is easy enough to add them as qualifiers. A secondary notion is that authors are more likely to have an item than a book. Thanks, GerardM (talk) 08:08, 12 May 2018 (UTC)
@GerardM: When you say "... to add them as qualifiers.", under which property would you use award received (P166) as a qualifier? Maybe author (P50)? Thanks, Amadalvarez (talk) 22:35, 12 May 2018 (UTC)
In awards such as the Hugos, sometimes the same author can be nominated (finalist) to the same category two times for two different works, so I interpret that in those cases the award winners are not the authors but the works. However, since the winner (P1346) property is a person, I add award received (P166) to both person and literary work. Also, if your local Wikipedia templates support Wikidata integration, this way they can automatically show the awards received by the person in his/her infobox, and by the literary work in its infobox too. --JavierCantero (talk) 08:28, 13 May 2018 (UTC)

Recording the edition format

One of the English-language aliases for property distribution (P437) is "book format".

Is distribution (P437) appropriate to record the format of the books in a particular edition -- eg folio (Q772267), quarto (Q2122442), octavo (Q1307353), duodecimo (Q1266414) etc -- as at eg Q53576187#P437 ?

Or should the values allowed for distribution (P437) be restricted to the currently permitted hardback (Q193955), paperback (Q193934), pamphlet (Q190399), softcover (Q990683), Library binding (Q6542551); and perhaps a new property be introduced for book format, akin to newspaper format (P3912) ? Jheald (talk) 20:36, 19 May 2018 (UTC)

Number of pages

We have the property number of pages (P1104).

If a source gives the number of pages for a book as eg "viii, 187 p." or "338, xlviii p.", are there agreed values to attach to an applies to part (P518) qualifier to denote the number of pages of front matter (front matter (Q24033349)), main content, and appendices respectively ? Jheald (talk) 20:47, 19 May 2018 (UTC)

I would love to hear a cataloguer's professional point of view. From my experience with reproducing old works there is no such thing as uniformity, so at a best guess you are seeing the last numbered page of each section. Page numbering in the fore sections of a work is often variable: some will label plates, some will not. "Number of pages" as a concept is itself problematic, and it changes through time. If we are going to go via sections, then we would also need to start recording the number of plates in a work. Then to make things more complex, when there are additional editions, you can even see inserted pages with nnnA, nnnB, ... so they didn't have to renumber the whole work. Dashed variability and changes in time!  — billinghurst sDrewth 23:28, 20 May 2018 (UTC)
"last numbered page of each section" makes a lot of sense. My second example above is in fact actually stated as "238 [338], xlviii p." in the catalogue -- presumably the last numbered page was also wrongly numbered in this case!
Do you think it would be worth a specific new property, last numbered page? Jheald (talk) 09:18, 21 May 2018 (UTC)
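To make the "last numbered page of each section" reading concrete, here is a rough Python sketch that splits a catalogue pagination statement into its sections. It rests on two assumptions of mine that should be checked against real cataloguing rules: that sections are separated by commas, and that a bracketed number such as "238 [338]" is the cataloguer's correction of a misprinted page number. The function names are hypothetical.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(text):
    """Convert a roman numeral such as 'xlviii' to an integer."""
    total = 0
    prev = 0
    for ch in reversed(text.lower()):
        val = ROMAN_VALUES[ch]
        if val < prev:
            total -= val   # subtractive notation, e.g. the x in 'xl'
        else:
            total += val
            prev = val
    return total


def parse_pagination(statement):
    """Split e.g. 'viii, 187 p.' into (numbering style, last numbered page) pairs."""
    body = re.sub(r"\s*p\.?\s*$", "", statement.strip())
    sections = []
    for part in body.split(","):
        part = part.strip()
        corrected = re.fullmatch(r"(\d+)\s*\[(\d+)\]", part)
        if corrected:
            # Assumption: the bracketed value corrects the printed number.
            sections.append(("arabic", int(corrected.group(2))))
        elif part.isdigit():
            sections.append(("arabic", int(part)))
        elif re.fullmatch(r"[ivxlcdm]+", part.lower()):
            sections.append(("roman", roman_to_int(part)))
        else:
            sections.append(("unparsed", part))
    return sections


example = parse_pagination("238 [338], xlviii p.")
```

Each pair could then become one statement with its own applies to part (P518) qualifier, or one value of a dedicated "last numbered page" property if that were created; plates and inserted pages such as nnnA are deliberately left as unparsed.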

Using volume as a unit

I have been using volume (Q1238720) as a unit for the property number of parts of this work of art (P2635), eg at Q53574199#P2635.

Does anyone know if there is a way to make volume appear in the plural, ie as volumes ? Jheald (talk) 09:36, 20 May 2018 (UTC)

@Jheald: wouldn't the application of plural be something that is more general, and probably be language specific? As we know for many languages there is a general rule, and maybe it could be managed by a general rule with exceptions, though still that will be extensive when you get to number of languages.  — billinghurst sDrewth 23:18, 20 May 2018 (UTC)
@billinghurst: I was thinking of perhaps a slightly more general mechanism, as to whether there was a string that could be specified (in each language) to be shown when the item is being used as a unit. (Though that would still leave the single/plural issue, but we don't usually state when a book or edition is only a volume). I found and added P558 (P558) and unit symbol (P5061), but they don't seem to help. Jheald (talk) 08:46, 21 May 2018 (UTC)

Do we keep both or merge?

How would we handle a situation such as Our native ferns and their allies; with synoptical descriptions of the American Pteridophyta north of Mexico (Q51515725) and Our native ferns and their allies; with synoptical descriptions of the American Pteridophyta north of Mexico (Q51515726)?

These two items are for two different scans of the same edition of a book. The first was scanned by Cornell University from their holdings, but the second was scanned from the University of California libraries. It is the same edition, just different scans from copies at different libraries. --EncycloPetey (talk) 01:22, 30 May 2018 (UTC)

@EncycloPetey: You need one item for the edition, without any data about the scan (available at URL,...) and 2 items, one for each scan. The items about the scan should be linked to the edition item using exemplar of (P1574) and defined as instance of exemplar (Q512674). We already discussed that problem above (see Wikidata_talk:WikiProject_Books#Different_scans_of_the_same_edition). @Jheald: You proposed to use instance of individual book (Q53731850) for a particular exemplar: I don't like the term "book", because this term is not well defined and some people can use it for the edition or even for the work. I think exemplar (Q512674) is more neutral: we have work, edition and exemplar, all can be book depending on the point of view. Snipre (talk) 11:57, 30 May 2018 (UTC)
That method presents a very real problem, however. Each item on Wikisource has been labelled an "edition" or "translation" up until now, but those "editions" are typically backed by a specific scan. So, if we do what you're suggesting, then every single Wikisource-hosted copy will have to be redone, because you're suggesting they are actually exemplars. And thus, every Wikisource copy will have (1) an exemplar item where the Wikisource copy is linked, (2) a separate edition item, and (3) associated work item where the Wikipedia entry is linked. --EncycloPetey (talk) 15:02, 30 May 2018 (UTC)
@EncycloPetey, Jheald: Wikisource does what it wants, and I am not concerned with the model they choose. WD has to deal with other kinds of data: first, some exemplars can have a history, like the Bible of George Washington. Then particular exemplars have some characteristics like library identifiers, so WD has to have specific items to deal with that kind of data. Finally, you didn't make the distinction between editions and print runs: one edition can have several print runs, and each print run has small differences which can change the references (for example, some data can be printed on a different page number). For these reasons WD has to have a specific item for each exemplar. Snipre (talk) 22:06, 7 June 2018 (UTC)
@Snipre: If somebody has a particular need for a particular exemplar then they can create an item for it. But in general, WD does not need to have a specific item for each exemplar. Lazy creation at the time of need will usually be quite sufficient. Jheald (talk) 22:15, 7 June 2018 (UTC)
I would merge the two into one item, and use collection (P195) as a qualifier to distinguish where a particular scan-set is taken from.
There may be a few complicated cases where it may make sense to create distinct items for specific exemplars, but in most cases I think that would be an unnecessary complication. Jheald (talk) 15:22, 30 May 2018 (UTC)
@EncycloPetey, Snipre, Jheald: in this case, I would merge too; maybe create items about exemplars but not for the scans. Snipre: is your point to consider scans as a fifth level of FRBR? Because scans are not exemplars at all. If a library scanned the same exemplar 10 times (which is not unusual at all), do you really suggest creating items for 1 work, 1 edition, 1 exemplar and 10 scans? PS: does anyone know how to contact Open Library to ask for a merge of OL26454530M and OL7247817M (indicated as two different editions, but it's clearly the same one)? Regards, VIGNERON (talk) 13:31, 12 June 2018 (UTC)
@EncycloPetey, VIGNERON, Jheald: Not exactly, I am not creating a new level, I just consider a hard copy of a book and its scan as two exemplars in terms of FRBR. So if I take your example of a library making 10 scans of a paper book, then we have one work, one edition and eleven exemplars. I don't consider the scan as a sublevel of the hard copy but as an equal level. Why? That is the only way to correctly record all the data of the scans without mixing them. Each scan can have one specific identifier, one specific URL for online access, one specific creation date,... How can you treat all that information in one item? How can you retrieve the specific data of one scan when everything is mixed? The question is not whether we can merge the items; the question is which data about scans can be described in WD. If you have 2, 3 or more data per scan, then you HAVE to create several items. So unless you can ensure that now and in the future, no more than one piece of data per scan will be possible, we have to create several items, one for each scan or hard copy.
By the way, explain to me how you plan to add the data about one scan if nobody specifies the data about the hard copy which was used for the scanning? The scan, if it can be identified by any set of specific data, is an exemplar, and the scanning can be considered analogous to translation: we still need a property to indicate which edition was used as the source text for a translation. In some cases the original version in the original language was not used as the text for translation. Ex.: an original version in English was translated into German and the German version, not the English one, was used to generate a French version. In that case, we should be able to specify that relation. We could use the same property to create a relation between a hard copy and a scan, if we have enough data about both "texts" to justify the creation of 2 items. Snipre (talk) 22:39, 13 June 2018 (UTC)
@Snipre: oh ok, I see. That can make some sense. For FRBR, exemplars are only physical (and not always unique), see FRBR, pages 24, 47-48, but as FRBR itself says, « dynamic nature of entities recorded in digital formats merit further analysis ».
In my example, there is only one physical document; the 10 scans are not materialized (reprints exist but are rare, most digital content stays only digital). I don't see any problem with putting 10 URLs of the same physical exemplar on one item for the exemplar: the URL is the only property specific to the scans, everything else is specific to the exemplar (and often specific only to the edition). I agree with your thought but you miss an important point: except for the URL (and derivatives of the URL like identifiers), 99% of the time, there is no specific data about specific scans. For exemplars, it's already quite rare to have specific data (the collection and the history of owners and that's it).
To go back to the original example here, what data would justify having 2 exemplar items? I can understand one item for the edition and one for the exemplar but right now, both are about the edition (and the same 6th edition). I would propose to merge the two current items and maybe create an item for the exemplar (but do we really need an item about the exemplar? I'm not even sure)
Regards, VIGNERON (talk) 08:39, 14 June 2018 (UTC)

Biodiversity Heritage Library

Announcing: Wikidata:WikiProject BHL

The Biodiversity Heritage Library (Q172266) is a large multi-institution project to digitise and make available literature from the past relating to zoology, botany, and the diversity of life.

A few weeks ago, Magnus's Reinheitsgebot created Wikidata items for 63,000 BHL titles out of the (currently) 136,000 in Mix'n'match catalogue 1131. Since then some progress has been made identifying Wikidata items for BHL creators and adding BHL creator ID (P4081); replacing author name string (P2093) with author (P50); and adding some further fields from online sources; but there is still a considerable way to go.

For current statistics, see the dashboard pages for title progress and creator progress now created at Wikidata:WikiProject BHL.

The data has its quirks. The BHL 'title' dataset combines various different sorts of material, including books, periodicals, catalogues, individually bound article reprints, technical reports, etc. Initially these have all been given instance of (P31) = publication (Q732577); a few (but not all periodicals) have now been given instance of (P31) = periodical literature (Q1002697) based on keywords from BHL. It will be quite a challenge to further refine the identification of the material.

Also be aware that the BHL dataset of 'creators' for the titles (currently imported as author (P50) / author name string (P2093)) actually includes people with a considerable variety of relationships to the printed material -- including authors, editors, illustrators, corporate sponsors, various other contributors to works, even former owners of the texts in a few cases. This too could usefully use quite a lot of refinement.

But it's an important collection. Commons currently includes almost 250,000 files from the BHL, coordinated through the c:Commons:Biodiversity Heritage Library project page -- so work structuring the information here may make a real difference to building new pathways to make those Commons images more accessible. With 60,000 titles, I think it's also a very useful test-set to work on, to put our ideas for book data into practice, and to see what practical issues and questions arise, when applying them to a (very diverse) real-world sample of this size.

Anyone with an interest in this data, and/or ideas on how to improve it, is very welcome to add themselves to the Participants section at the bottom of Wikidata:WikiProject BHL page. Jheald (talk) 16:12, 7 June 2018 (UTC)

Just to clarify Jheald: do you have any connection to the BHL? --Succu (talk) 21:21, 24 July 2018 (UTC)
@Succu: None at all. :-) I just was doing some work on the items, and thought that a WikiProject with property-use statistics, a progress page to record work done / doable, and a talk page might be a useful thing to create, as there were likely to be other people also interested in this data. Jheald (talk) 21:30, 24 July 2018 (UTC)

How many edition items for On the Origin of Species?

We currently have 24 different items for English-language versions of On the Origin of Species (Q20124), mostly arising from different copies scanned for the Biodiversity Heritage Library (Q172266) (plus four more versions that are translations).

This query, tinyurl.com/yd4yjod9, gives a summary, because the list on Q20124 has become pretty much impossible to navigate.

Question: How many of these items should we keep, and which (if any) should we merge?

Background: Darwin himself produced six editions of the text, the first and the last being

His official authorised publishers were John Murray (Q1232629) in London, and D. Appleton & Company (Q3011053) in New York. With the exception of On the Origin of Species (1859) (Q20968204), all of the Murray and Appleton items that we have correspond to the 1872 version of the text.

The page-counts in the 'pp' column of the query correspond to the number of scan frames, so small differences here may not be that significant: the highest-numbered page of the Murray 1872, 1880, and 1886 copies is in each case 458; on the other hand that for the 1910 "popular impression" is 432. The Appleton copies are complicated by being in two volumes, sometimes bound together and sometimes not. The highest-numbered page of the second volume is 338 for the two 1889 copies, and 339 for the 1899, 1909, 1915, and 1917 copies.

Other publishers produced versions that may or may not correspond to Darwin's final 6th edition, depending on copyright observance and/or expiry and/or their own particular whims.

The 1902 and 1905 Collier copies (numbered as two volumes) both conclude at page 356; the 1909 "Harvard Classics" edition (single-volume) from the same publisher concludes at page 552.

The 1872(?) and 1899 Burt copies would both conclude at page 538 (except this is missing in the 1899 copy); the earlier copy then adds several pages giving a list of other works in Burt's "Library of the World's Best Books". This pagination also matches the Merrill and Baker copy, from a series called "World's Famous Books".

The Hurst and the Caldwell copies appear to have identical pagination (final page 501), though their title pages are different.

The Books Inc. copy re-orders the material (moving the historical preface to the end), and omits both the index and the comparison of the 6th with earlier editions.

... etc ...

So: how to bring sense to all of this?

We have a large number of copies based on the same text; some are also based on the same typography and pagination; some may even be from the same printing (Q51515167 / Q51515141); though even then we have different scannings, with each scanning available from multiple different sources (ie BHL vs IA).

The page at On the Origin of Species (Q20124) brings little sense of any of this; it certainly doesn't group together the copies based on the same underlying text.

Does it make sense to differentiate eg the three John Murray texts between 1872 and 1886, all with the same pagination, or would it make sense to group these in some way?

How can we best bring some order to all of this? Jheald (talk) 19:46, 12 July 2018 (UTC)
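One way to make the grouping question concrete is to bucket the copies by publisher and highest-numbered page, using only the figures quoted above. This is a minimal sketch, not a proposal for how items should be merged: the tuple layout and the function name `group_by_pagination` are assumptions made for illustration, and identical pagination only suggests (it does not prove) a shared typesetting.

```python
from collections import defaultdict

# (publisher, year, highest-numbered page) as quoted in the discussion above.
# For the Appleton copies the figure is the last page of the second volume.
copies = [
    ("John Murray", 1872, 458),
    ("John Murray", 1880, 458),
    ("John Murray", 1886, 458),
    ("John Murray", 1910, 432),
    ("D. Appleton", 1889, 338),
    ("D. Appleton", 1889, 338),
    ("D. Appleton", 1899, 339),
    ("D. Appleton", 1909, 339),
    ("D. Appleton", 1915, 339),
    ("D. Appleton", 1917, 339),
]


def group_by_pagination(rows):
    """Group copies that share a publisher and a final page number."""
    groups = defaultdict(list)
    for publisher, year, last_page in rows:
        groups[(publisher, last_page)].append(year)
    return dict(groups)


groups = group_by_pagination(copies)
for (publisher, last_page), years in sorted(groups.items()):
    print(f"{publisher}, final page {last_page}: {years}")
```

On this data the ten copies fall into four candidate groups, which is the kind of intermediate grouping the question above is asking about.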

Start with the six editions issued during Darwin's lifetime, published by John Murray (Q1232629). Remove all other items from On the Origin of Species (Q20124). Then match the rest by hand. Systema Naturae (Q29270) or Genera Plantarum (Q1501516) are probably easier to handle, because BHL is missing scans of most editions. --Succu (talk) 19:15, 24 July 2018 (UTC)
Hm, The Complete Works of Charles Darwin Online (Q7727209) gives 497 results! --Succu (talk) 19:46, 24 July 2018 (UTC) ... etc ...

Publishers and imprints

Aubrey
Viswaprabha (talk)
Micru
Tpt
EugeneZelenko
User:Jarekt
Maximilianklein (talk)
Don-kun
VIGNERON (talk)
Jane023 (talk) 08:21, 30 May 2013 (UTC)
Alexander Doria (talk)
Ruud 23:15, 24 June 2013 (UTC)
Kolja21
arashtitan
Jayanta Nath
Yann (talk)
John Vandenberg (talk) 09:14, 30 November 2013 (UTC)
JakobVoss
Danmichaelo (talk) 19:30, 16 February 2014 (UTC)
Ravi (talk)
Mvolz (talk) 08:21, 20 July 2014 (UTC)
Hsarrazin (talk) 07:56, 9 August 2014 (UTC)
Accurimbono
Mushroom
PKM (talk) 19:58, 10 October 2014 (UTC)
Revi 16:54, 29 November 2014 (UTC)
Giftzwerg 88 (talk) 23:36, 1 January 2015 (UTC)
Almondega (talk) 00:17, 5 August 2015 (UTC)
maxlath
Jura to help sort out issues with other projects
Epìdosis
Skim (talk) 13:52, 24 June 2016 (UTC)
Marchitelli (talk) 12:29, 5 August 2016 (UTC)
BrillLyle (talk) 15:33, 26 August 2016 (UTC)
Alexmar983 (talk) 23:53, 28 August 2016 (UTC)
Finn Årup Nielsen (fnielsen) (talk) 10:44, 29 August 2016 (UTC)
Chiara (talk) 14:15, 29 August 2016 (UTC)
Thibaut120094 (talk) 20:31, 14 September 2016 (UTC)
Ivanhercaz | Discusión 15:30, 31 October 2016 (UTC)
YULdigitalpreservation (talk) 17:35, 10 November 2016 (UTC)
User:Jc3s5h
PatHadley (talk) 21:51, 15 December 2016 (UTC)
Erica (ohmyerica) (talk) 19:26, 1 January 2017 (UTC)
User:Timmy_Finnegan
Mauricio V. Genta (talk) 05:38, 12 March 2017 (UTC)
Sam Wilson 09:24, 24 May 2017 (UTC)
Sic19 (talk) 22:25, 12 July 2017 (UTC)
Andreasmperu
MartinPoulter (talk) 09:21, 20 July 2017 (UTC)
Thelmadatter (talk) 01:11, 13 September 2017 (UTC)
Zeroth (talk) 15:01, 16 September 2017 (UTC)
Emeritus
Ankry
Beat Estermann (talk) 20:07, 12 November 2017 (UTC)
Shilonite - specialize in cataloging Jewish & Hebrew books
Elena moz
Oa01 (talk) 10:52, 3 February 2018 (UTC)
Maria zaos (talk) 11:39, 25 March 2018 (UTC)
Wikidelo (talk) 13:07, 15 April 2018 (UTC)
Mfchris84 (talk) 10:08, 27 April 2018 (UTC)
Mlemusrojas (talk) 3:36, 30 April 2018 (UTC)
salgo60 Salgo60 (talk) 12:42, 8 May 2018 (UTC)
Dick Bos (talk) 14:35, 16 May 2018 (UTC)
Marco Chemello (BEIC) (talk) 07:26, 30 May 2018 (UTC)
Harshrathod50
 徵國單  (討論 🀄) (方孔錢 💴) 14:35, 20 July 2018 (UTC)
Alicia Fagerving (WMSE)
Louize5 (talk) 20:05, 11 September 2018 (UTC)
Viztor (talk) 05:48, 6 November 2018 (UTC)
RaymondYee (talk) 21:12, 29 November 2018 (UTC)
Merrilee (talk) 22:14, 29 November 2018 (UTC)
Kcoyle (talk) 22:17, 29 November 2018 (UTC)
JohnMarkOckerbloom (talk) 22:58, 29 November 2018 (UTC)

Helmoony (talk) 19:49, 8 December 2018 (UTC)

Notified participants of WikiProject Books

I'd like to do some work on publishers and imprints. Does anyone know of a standard reference or database (preferably freely accessible online) with info about the dates of publisher mergers, acquisitions, spinoffs, etc.? - PKM (talk) 19:22, 13 July 2018 (UTC)

Novel or book

I've started a discussion at Wikidata:Administrators' noticeboard#Novel or book, which impacts this project. --EncycloPetey (talk) 18:39, 21 July 2018 (UTC)

Why exactly did you ask for the intervention of an administrator without informing me first? The only reasons I can imagine are that you think I needed to be blocked right away because I was endangering the project, or that I have abused the extra tools that I have as an administrator. Is that what happened? Anyway, the place to start a discussion is the Project Chat or the corresponding WikiProject, and it's just basic courtesy to let the other person know. Moving on, if you are writing in this WikiProject, I am going to assume you are aware of the continuous discussions about modeling books and literary works. You can always start a new discussion, but it was agreed beforehand that all books should have instance of book. You also stated that novel (Q8261) is not a literary genre (Q223393), and yet one is an instance of the other. Currently, there are only 4738 items with instance of (P31) novel (Q8261), so I started checking some cases. Out of the 10 items I checked, most had instance of (P31) book (Q571) before being changed to novel (Q8261) by an inexperienced editor. Any reason why you (an experienced editor) made such changes without prior discussion? Andreasm háblame / just talk to me 20:12, 21 July 2018 (UTC)
I have explained my reasons there.
Re: "it was agreed beforehand"; I have seen discussion that touches this subject obliquely, and found multiple proposals to make "novel" a value for "instance of", but have seen no discussion that concluded it should not be used that way. --EncycloPetey (talk) 21:00, 21 July 2018 (UTC)
@EncycloPetey: If you want to understand the book classification, then start by reading Wikidata:WikiProject_Books#Bibliographic_properties. The first principle is to use the FRBR model, so please explain how novel can be included in that model. Snipre (talk) 18:13, 22 July 2018 (UTC)
I have read that. A "novel" is a form of literature, so it would be used at the WORK level of the model for WORKs that are novels. I don't see why this is so hard to understand. This proposal has been made many times, and I have found no objections. --EncycloPetey (talk) 20:46, 22 July 2018 (UTC)

We need a clear model in WD if we want to follow a FRBR structure

It is time to define a clear model for written works in order to be able to continue the data import in WD. We agreed to use the FRBR model, but we never adapted the WD model to that model.

The FRBR model is composed of 4 levels. Currently the WD model (described in Wikidata:WikiProject_Books#Bibliographic_properties) is trying to follow that model, but without clear success. We need to clarify the WD structure and to link it clearly to the FRBR structure.

FRBR model    | Current WD model     | Proposed WD model
Work          | Book                 | Work
Expression    | -                    | -
Manifestation | Edition              | Edition
Item          | Exemplar, manuscript | Exemplar

Propositions:

  1. Proposition 1: Whatever classification is chosen, the classification levels have to be used as the only values for property instance of (P31). There are 4 levels in the FRBR model, so if WD wants to use that model, we should have maximal 4 values for instance of (P31) when used to define a written item.
  2. Proposition 2: We should avoid any use of the term "book" in any of the 4 levels. The term "book" can be used at each level, so confusion is at its maximum when that term is used with no clear definition. And even if we define the term clearly in WD, its use will always be a source of misunderstanding, as few people pay attention to WD definitions.
  3. Proposition 3: The lowest level, "item" in FRBR terms, should be defined by a single value and not by manuscript AND individual book, as is currently described in Wikidata:WikiProject_Books#Bibliographic_properties. A manuscript is an individual book, so a separate manuscript level is not necessary: a manuscript can be described as an individual book written by hand. The proposition is to use exemplar, but other possibilities exist, like version or individual book.
  4. Proposition 4: More properties are necessary to describe some characteristics of a written item:
  • a property to describe the format of the written element. For example, a novel can't be used as a value for genre; a value for genre can be romantic, erotic, dramatic, but we need something to define the format of the text.
  • a property is needed to define whether an exemplar is written by hand or by mechanical means. This property will be used to define a manuscript.
  • a property is needed to describe the support. Currently a scroll is defined by "instance of scroll". As this is in contradiction with Proposition 1, we need a new property to define the physical format of the document.
Examples:
A novel can be written by hand on a scroll made of papyrus. A poem can be printed in a codex made of vellum.
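The two examples can be sketched as records in which instance of (P31) only ever carries the level, and everything else is a separate property. This is only an illustration of the idea behind Propositions 1, 3 and 4: the field names `text_format`, `writing_method`, `support` and `material` are hypothetical stand-ins for the properties being proposed, not existing Wikidata properties, and for brevity all characteristics are attached to one record even though the text format really belongs to the work level.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    """The three levels of the proposed WD model (FRBR 'expression' is unused)."""
    WORK = "work"
    EDITION = "edition"
    EXEMPLAR = "exemplar"


@dataclass(frozen=True)
class WrittenItem:
    label: str
    level: Level                          # the only thing P31 would express
    text_format: Optional[str] = None     # hypothetical property: novel, poem, ...
    writing_method: Optional[str] = None  # hypothetical property: by hand, printed
    support: Optional[str] = None         # hypothetical property: scroll, codex
    material: Optional[str] = None        # hypothetical property: papyrus, vellum

    def is_manuscript(self) -> bool:
        """A manuscript is simply an exemplar that was written by hand."""
        return self.level is Level.EXEMPLAR and self.writing_method == "by hand"


# The two examples given above, expressed as property values.
example_a = WrittenItem("example A", Level.EXEMPLAR, "novel", "by hand", "scroll", "papyrus")
example_b = WrittenItem("example B", Level.EXEMPLAR, "poem", "printed", "codex", "vellum")

print(example_a.is_manuscript(), example_b.is_manuscript())
```

Under this sketch "manuscript" is derived from a property value instead of being a class of its own, which is the point of Proposition 3.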

Comments ? Snipre (talk) 19:23, 22 July 2018 (UTC)

General comments

Thanks for pulling this together. I suggest we organize comments by proposal. - PKM (talk) 20:47, 22 July 2018 (UTC)

this is indeed clarifying some very essential problems that have been very problematic to work with these past years. --Hsarrazin (talk) 08:59, 18 August 2018 (UTC)

Comments on Proposition 1 (classification levels/unique values)

@PKM: Why is creative work (Q17537576) too broad? Just take the example of Gone with the Wind (Q2870): this was a work which doesn't exist only in the form of a written document but also as a movie or songs. That's why I want to avoid the use of book at the level of work, because often a book was adapted into a movie or TV series, and the work should be able to connect all those different forms. For me creative work (Q17537576) is fine. Snipre (talk) 14:29, 23 July 2018 (UTC)
There are books (as physical objects) that contain several literary work (Q7725634) (any anthology, but also all of the "Double Ace" collection, for example, contain two novels per book), and there are even literary work (Q7725634)s that span several books (such as the Spanish translation of Cryptonomicon (Q534975), published in three separate books with different title (P1476) and publication date (P577)). There is not going to be any "point" in the hierarchy that matches "book", neither creative work (Q17537576) nor literary work (Q7725634) or something in between. They are simply unrelated concepts. --JavierCantero (talk) 18:53, 23 July 2018 (UTC)
The description of creative work (Q17537576) includes the word "artistic". There are many works that contain the minimal level of creativity required to qualify for copyright, but are not considered artistic. There are also works that aren't even creative enough to qualify for copyright (at least in the US), but they are certainly books. Example: telephone book. Jc3s5h (talk) 14:27, 26 July 2018 (UTC)
The phrase in the proposal "we should have maximal 4 values for instance of (P31) when used to define a written item" needs to be clarified. In general, an item may have several instance of (P31) statements. Does this mean that any of the items under discussion may have up to 4 instance of (P31) statements? Or does it mean that if we travel from an exemplar, through the connections from more specific to less specific, we will encounter at most 4 items before we come to an item that has no instance of (P31) and only has subclass of (P279)? What bothers me is that some items will clearly be instances of several other items, even after we allow for items implied by the hierarchy. For example, an exemplar could be an instance of a version, and also an instance of a manuscript. Another exemplar could be both an instance of a version, and also an instance of printed matter (Q1261026). Jc3s5h (talk) 14:39, 26 July 2018 (UTC)
  • Did we all notice that "edition" has been renamed "version" in English (version, edition, or translation (Q3331189))? Is there general support for this (and if so, let's use it.) - PKM (talk) 20:58, 22 July 2018 (UTC)
  • Oppose I don't see the justification for this. Everywhere else on Wikidata, we are happy with allowing items to be instances of a subclass (or chain of subclasses) of a particular item. So why not here?
I also think it's a poor fit with where we currently are.
Here's a query for current subclasses of written work (Q47461344), by count of number of items: tinyurl.com/ybkkemsw
What's wrong with identifying that an item is a biographical article (Q19389637) or a Q4423781 or a field study (Q26840222) or a travelogue (Q1164267) or a biographical dictionary (Q1787111) or an autobiography (Q4184) or an atlas (Q162827) or an encyclopedic dictionary (Q975413) -- or indeed a novel (Q8261) or a novella (Q149537) or a novelette (Q472808) or a short story (Q49084) ?
Yes, we probably want to put limits on the permitted classes, but it seems to me a lot more useful and intuitive than saying Uncle Tom's Cabin (Q2222) is a literary work (Q7725634). Jheald (talk) 23:00, 22 July 2018 (UTC)
  • Currently biographical dictionary (Q1787111) is <instance of> literary genre (Q223393) and subclass of both "literary genre" and "written work" (several steps removed). Is that acceptable? I think the proposal is parallel to what Wikidata has decided about occupations - Sandy Koufax isn't <instance of> baseball player, he has <occupation> baseball player. In the case of books, we might choose to say "all books are <instance of> written work" in the same way that all people are <instance of> human. But we don't have to make that choice. If we decide to accept <instance of> "novel/book/dictionary/play/chanson de geste" then I think we would have to make sure all of those things are subclasses of written work (at some remove) (as well as subclasses of genres), and we would have to agree that an item can be both a work and genre. - PKM (talk) 23:34, 22 July 2018 (UTC)
@PKM: Agreed. I think that is the exact analogy.
With regard to work and genre, we don't want the (implied) statement that Dictionary of National Biography (Q1210343) instance of (P31) literary genre (Q223393), so I think the statement biographical dictionary (Q1787111) subclass of (P279) literary genre (Q223393) is wrong and needs to come out.
On the other hand, using P31 to say biographical dictionary (Q1787111) (or some superclass of it) instance of (P31) literary genre (Q223393) is I think perfectly okay. Some care may be needed, to be vigilant for potential issues like this one; but essentially this is a pattern we use right across the site. Jheald (talk) 20:56, 23 July 2018 (UTC)
@PKM, Jheald, EncycloPetey: +1. The main problem is that a written document can be defined according to a long list of characteristics: the format of the text (novel,...), the genre (romantic, dramatic,...), the support (codex, scroll,...), the way the text is written (hand written (manuscript), printed, electronic,...), and plenty of others if we take the time to list all possible characteristics. So the question is: on what basis can we use one particular characteristic, like the text format, to classify a written document? This is based entirely on everyone's opinion.
Using a completely external structure allows us to avoid any conflict: no need to define which characteristic is the best one, no need to create an item for every possible combination of characteristics, no need to define whether novel can be considered a work or not. By creating one dedicated property for each characteristic of books, we keep a uniform treatment of the data, and by using the FRBR structure, we define only the level of classification without trying to apply a personal judgment. Snipre (talk) 16:21, 23 July 2018 (UTC)
Your argument is stuffed with straw: level 1 has nothing to do with the "support" or the "way the text is written"; those are properties of manifestations or exemplars. But the work in abstraction will still be a play, or a novel, or a poem. This is invariant regardless of the specific edition or exemplar examined, and is therefore a property of the work itself. --EncycloPetey (talk) 16:26, 23 July 2018 (UTC)
@EncycloPetey: My argument is perhaps not really focused on the work level, but it is still consistent: why do we classify a book according to its text format and not according to its genre? In a library, it is more common to classify by genre than by text format. So please provide your argumentation, because until now I haven't seen anything that has the appearance of an argument from your side. Snipre (talk) 19:11, 23 July 2018 (UTC)
You are still arguing at straw. This is not an either...or issue. We can do both. Please stop misinterpreting my comments. --EncycloPetey (talk) 19:29, 24 July 2018 (UTC)
@Snipre: There is a spectrum of possible practice here -- at one end, having very few, very generic classes and then conveying everything with properties; at the other, having a great number of very specific classes, ultimately stretching to creating a new class for every possible intersection.
Neither extreme is desirable, IMO. Over the years on Wikidata I think we've found the sweet spot is somewhere in the middle: having a limited number of classes that capture the basic essence of the thing, with further details about it conveyed by statements.
It wouldn't be an approach that would work well with a classic relational database, where one would want to force information as far as possible into strict columns; but it happens to be an approach that does work rather well with a triplestore query engine, like WDQS.
With a few exceptions (such as our key rule for human beings, that their defining essential feature is always that they are a Q5), the tipping point for switching from narrowing classes to adding a statement instead varies from context to context and is always slightly fuzzy and up for grabs (though as a community we can sometimes put down a few key markers). But a good consideration may be how strongly distinctive a class is, as opposed to being a mere intersection. So to say a particular poem is instance of (P31) sonnet (Q80056) would seem reasonable; whereas to say a manuscript is instance of (P31) "British illuminated manuscript" seems rather forced, unless British illuminated manuscripts are such a distinctive thing, with a tendency to such a distinct set of characteristics that set them apart from other illuminated manuscripts, that it truly makes sense to treat them as a distinctive class to themselves.
Finding this 'sweet spot' reasonably well has a number of advantages. Firstly, on the one hand it makes instance of (P31) feel intuitive, a good statement capturing what the thing fundamentally is, without on the other hand requiring too vast a vocabulary of different classes. It groups like-things well. Secondly it means it may be possible to state some characteristics of the things just once, for the class, rather than requiring users to accurately and redundantly input the same piece of information over and over again, per each individual item.
Fortunately SPARQL makes it very easy to extract items, via constructions like (wdt:P31/wdt:P279*)?/wdt:Pxxxx, whether a property is given as a statement directly on the item itself, or as a statement applying to a whole class that the item is an instance of.
So this is the model which has been applied pretty much across the whole of Wikidata; and which I also see substantial benefits for here, with very little against. (Though defining version, edition, or translation (Q3331189) as the unique class for an edition does I think work quite well). Jheald (talk) 20:44, 23 July 2018 (UTC)
@Jheald: I know how WD works, and just looking at the topics appearing on this page, the current model is not clear and solutions are found case by case. But this way is not possible for databases, and especially for machines, which need a rigorous classification to perform data processing. Your sweet spot is just the nightmare of everyone who has to do data extraction or data manipulation: sometimes the model is like that and sometimes the model is different. No way to create automatic tools. Accepting that text formats are defined using instance of and genre using a dedicated property doesn't allow one tool to extract books according to their characteristics: we need to code the queries differently. Then if we start to use values like illuminated manuscript, this is a third way to extract data: to extract all manuscripts, we need to query the sum of 2 classes (illuminated manuscript and manuscript). This is the problem because people don't know how a database works. A database needs structures and rules and not individual solutions. Snipre (talk) 08:58, 24 July 2018 (UTC)
@Snipre: This is a graph database, not a relational database. I've given you the query fragment above which will handle it -- in fact, in a graph database it is rather more efficient to have things in a hierarchy of relevant classes, rather than to have everything in a single mega-class and then to have to filter all of it. Jheald (talk) 09:14, 24 July 2018 (UTC)
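To make the disagreement concrete, the path (wdt:P31/wdt:P279*)?/wdt:Pxxxx quoted above can be mimicked on a toy in-memory triple list. This is a rough sketch only: the item names `ms1` and `ms2`, the property value strings, and the helper names are invented for illustration, `Pxxxx` is kept as the same placeholder used in the comment, and a real query would of course run on WDQS and not in Python.

```python
# Toy triples (subject, property, object). All names below are illustrative.
triples = [
    ("ms1", "P31", "illuminated manuscript"),
    ("ms2", "P31", "manuscript"),
    ("illuminated manuscript", "P279", "manuscript"),
    ("manuscript", "Pxxxx", "written by hand"),   # stated once, on the class
    ("ms2", "Pxxxx", "on vellum"),                # stated directly on the item
]


def classes_of(triples, item):
    """Classes reached via P31 followed by zero or more P279 (P31/P279*)."""
    seen = set()
    stack = [o for s, p, o in triples if s == item and p == "P31"]
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(o for s, p, o in triples if s == cls and p == "P279")
    return seen


def values_via_path(triples, item, prop):
    """Mimic (P31/P279*)?/prop: values stated on the item or on any of its classes."""
    subjects = {item} | classes_of(triples, item)
    return {o for s, p, o in triples if s in subjects and p == prop}


def instances_of(triples, cls):
    """Mimic ?item P31/P279* cls: instances of the class or of any subclass."""
    items = {s for s, p, o in triples if p == "P31"}
    return {i for i in items if cls in classes_of(triples, i)}


print(values_via_path(triples, "ms1", "Pxxxx"))
print(instances_of(triples, "manuscript"))
```

In this toy graph a single traversal finds both items as manuscripts, whichever class they were filed under. Whether that holds up in practice depends on the class hierarchy being kept consistent, which is the concern raised in the reply above.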

To be honest, I can't make head or tail of the proposition here. Could someone try wording it differently? I can think of several ways to read the statement as currently given, and they contradict each other, and none of them strike me as good policy, so perhaps I'm missing something. - Jmabel (talk) 04:10, 26 July 2018 (UTC)

@Snipre: If I understand this proposition, this means there could only be 4 possible instance of (P31) values, which would be FRBR-1, FRBR-2, FRBR-3 and FRBR-4 (let's call them this way to avoid confusion with items actually existing on wikidata), without any possibilities to use subclasses :)

Let's be clear : you tend to refuse subclasses as P31, because of their potential vulnerability to any change made to the wd ontology that could break the system, and propose to distribute the characteristics of specific subclasses across properties instead, which would ensure that no modification to the ontology would result in the loss of info.

Comparing with the work achieved on biographic data : it was a hard fight to group all people as human (Q5), since many contributors thought that writer, politician, athlete, etc, would be more precise. It is now widely accepted, but it was a very long process, and every biographical specificity is in a specific property (almost).

well… having worked a lot on biographies, for authors mainly, I, for one, tend to think that it would be an enormous work, but it could be worth it.

Specific properties should be defined very clearly though :)--Hsarrazin (talk) 10:24, 18 August 2018 (UTC)

Comments on Proposition 2 ("book")

  • I think it's essential to use "book" as an alias where a novice user would expect it. - PKM (talk) 20:47, 22 July 2018 (UTC)
  • Book or novel or play or poem or whatever sort of "work" it is. Not all literary works are books, and even novels may be serialized in a magazine during their first release, or collected within a volume with other works in a later release. "Book" is extremely misleading for many works, but "work" is so generic that it is uninformative and should only be used in situations where no further precision is possible. --EncycloPetey (talk) 20:54, 22 July 2018 (UTC)
  • Weak support I have some sympathy with this. The problem with "book" is that it might suggest either the work in abstract at its most essential level, or a physical manifestation of it, or even a particular edition. At the moment I believe we mostly use it at the 'work' level (and very heavily, with in excess of 100,000 uses). I would prefer to see those 'work' level uses replaced with classes like novel (Q8261) or lexicographic thesaurus (Q179797) or encyclopedic dictionary (Q975413), which make much clearer that their instance is being considered for its content in abstract, rather than as a physical object. Jheald (talk) 23:15, 22 July 2018 (UTC)
  • Novices don't use "book" as a synonym of FRBR's Work but Edition: they expect for the book item to have an ISBN, since they are talking about the physical object and not the creative work. So if you want to keep the "book" term as a more user-friendly choice, you should use it as the third FRBR level. --JavierCantero (talk) 17:36, 23 July 2018 (UTC)
    • oh yes they do : did you never hear someone telling so and so is working on his next book - at this stage this is clearly a work level :) --Hsarrazin (talk) 08:39, 18 August 2018 (UTC)
  • Strong support, clearly book is too polysemic. For aliases, all four levels of the FRBR can have "book" as alias; I'm not sure that putting these aliases there would really help. Cdlt, VIGNERON (talk) 22:15, 25 July 2018 (UTC)
  • Support; "book" should be alias for each level, so all are easily found if someone enters "book" in the UI. - Jmabel (talk) 04:08, 26 July 2018 (UTC)
  • Strong support - book (Q571), as described in French is a physical item, an object described as having pages (a codex in fact) and thus, at best, a format of publication ; in other languages (nl) it is merely a document or a printed work (es), or a medium for distribution of a text (en) (paper or electronic), which are very different things ; in no case can it be a work, or even an edition (except by metonymy (Q41966)). An exemplar could be a book but many exemplars are not books, they can be serial issues, manuscripts, cds, etc, volumen. The fact that it is currently used for all 4 levels by people who are not book professionals (and also by book professionals) is indeed a very clear indication that it is totally inappropriate for our goals with FRBR. It could be an alias for each level though :) --Hsarrazin (talk) 08:38, 18 August 2018 (UTC)

Comments on Proposition 3 ("exemplar")

  • Strong oppose In line with my existing comment on Proposition 1. If an object is e.g. an illuminated manuscript (Q48498), it makes much the most sense to say that upfront. That is exactly what the item is an instance of. Anything else is needless obfuscation and confusion. Jheald (talk) 23:19, 22 July 2018 (UTC)
@Jheald: Your proposition is the same system used for categories in WP: if you have a novel written by hand on a scroll with decorations, do you create an item combining all these features (i.e. "novel written by hand on a scroll with decorations")? And if now you don't have a novel but a poem, do you create a new item "poem written by hand on a scroll with decorations"? And if you have a novel written by hand on a scroll without decoration, would you create the item "novel written by hand on a scroll without decorations" or the item "novel written by hand on a scroll"? Do I need to continue the demonstration, or is it clear that combinations of terms are just a nightmare when considering all possible characteristics of a written document? Snipre (talk) 11:40, 23 July 2018 (UTC)
@Jheald, Snipre: I think it's reasonable to say that the Ellesmere Chaucer (Q1227831) (poorly modeled currently) is <instance of > illuminated manuscript and <exemplar of> Canterbury Tales. - PKM (talk) 19:17, 23 July 2018 (UTC)
@PKM: Yes, I think that would be exactly the right way to model it.
A complication that might arise for some manuscripts (but not I think here) would be if the manuscript collected together a number of texts. The answer in such a case I think would be to have a corresponding number of exemplar of (P1574) statements, but it might be useful to qualify each one with a new property "folio(s)", akin to the existing page(s) (P304), to indicate which part of the MS corresponded to each text.
Pinging @MartinPoulter: here, who I think has recently been working with items for a number of manuscripts from the Bodleian Library. Jheald (talk) 19:47, 23 July 2018 (UTC)
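For illustration, the multi-text case described above could be sketched in Python as a small data structure. This is only a sketch under stated assumptions: the `folios` qualifier stands in for the new "folio(s)" property suggested above, which does not exist yet, and the manuscript, its two texts, and the folio ranges are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property/value pair on an item, with optional qualifiers."""
    prop: str                      # e.g. "P1574" (exemplar of)
    value: str                     # label of the target item
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Item:
    label: str
    statements: list = field(default_factory=list)

    def values_of(self, prop):
        """Return every value stated for the given property."""
        return [s.value for s in self.statements if s.prop == prop]


# Hypothetical manuscript that gathers two texts; all data is invented.
manuscript = Item(
    label="Example miscellany (hypothetical)",
    statements=[
        Statement("P31", "illuminated manuscript (Q48498)"),
        # "folios" is a placeholder for the proposed folio(s) property.
        Statement("P1574", "Text A (hypothetical)", {"folios": "1r-40v"}),
        Statement("P1574", "Text B (hypothetical)", {"folios": "41r-96v"}),
    ],
)


def folios_for(item, work):
    """Look up which folios of the manuscript carry a given text."""
    for s in item.statements:
        if s.prop == "P1574" and s.value == work:
            return s.qualifiers.get("folios")
    return None
```

One exemplar of (P1574) statement per text keeps the manuscript a single item, while the qualifier records where each text sits inside it.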
@PKM: Not sure if your example helps the discussion: if we accept "illuminated manuscript", why can't we accept "English illuminated manuscript" for Ellesmere Chaucer (Q1227831)? Snipre (talk) 19:25, 23 July 2018 (UTC)
@Snipre:, if an editor thought it was reasonable to make an item for "Hiberno-Saxon illuminated manuscript" (which I think is a reasonable class of manuscripts), I'd have no problem with using <instance of> "Hiberno-Saxon illuminated manuscript" for the Lindisfarne Gospels (Q80935) (which today has three <instance of> statements). - PKM (talk)
@PKM: If you choose that system, you can forget about extracting data in an exhaustive way: if one contributor creates an item "Hiberno-Saxon illuminated manuscript" for one case, and another creates "Saxon illuminated manuscript" and "Hiberno illuminated manuscript" and chooses to add two instances ("Saxon illuminated manuscript" and "Hiberno illuminated manuscript") instead of using "Hiberno-Saxon illuminated manuscript", no query will be able to handle those cases. The main problems are having an overview of which descriptions are available and updating old descriptions with new ones. If contributors feel free to create items when they need them, they won't take the time to look for existing descriptions and will create duplicates, as this is the easiest way. People are lazy, so don't expect them to look for what is already available. Snipre (talk) 08:40, 24 July 2018 (UTC)
@Snipre: Not true. This is exactly what SPARQL is designed to be good at. Path queries ( wdt:P279* ) are your friend. Jheald (talk) 09:23, 24 July 2018 (UTC)
@Jheald: Please read my comment once more: I never said that SPARQL is not able to perform the queries I mentioned, I just pointed out that, having 3 models, we need 3 different SPARQL queries to find the same kind of items. Don't you understand that querying documents by genre requires a query based on a dedicated property and querying documents by text format requires a query based on instance/subclass? Don't you see the problem in querying all illustrated documents if we create different classes like illustrated novel, illustrated anthology, ...? Snipre (talk) 15:07, 25 July 2018 (UTC)
@Snipre: The query fragment that I posted in the Proposition 1 discussion checks both ways: whether the item itself has a given property and value directly, or whether the item is an instance of a class that has that property and value. It's not particularly difficult. Jheald (talk) 15:22, 25 July 2018 (UTC)
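A rough sketch of the two ideas in this exchange, written in Python over a toy graph rather than as live SPARQL. Everything in it is an assumption made for illustration: the subclass edges are invented (nothing here claims that a "Hiberno-Saxon illuminated manuscript" class exists on Wikidata), "has decoration" is a made-up property, and the "both ways" check is only a guess at the spirit of the fragment mentioned above, which is not reproduced in this thread. The `superclasses` helper mimics what a `wdt:P279*` path walks.

```python
# Toy "subclass of" (P279) edges: child -> parents. Invented data.
SUBCLASS_OF = {
    "Hiberno-Saxon illuminated manuscript": ["illuminated manuscript"],
    "illuminated manuscript": ["manuscript"],
    "manuscript": [],
}

# Toy property values stated on classes (not on individual items).
CLASS_VALUES = {
    "illuminated manuscript": {"has decoration": "yes"},
}


def superclasses(cls):
    """Return cls plus everything reachable via subclass-of (like P279*)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def is_instance_of(item, target):
    """True if any instance-of value is the target or one of its subclasses."""
    return any(target in superclasses(c) for c in item["instance of"])


def has_value(item, prop, value):
    """Check the item directly, then every class it inherits from."""
    if item.get(prop) == value:
        return True
    return any(
        CLASS_VALUES.get(sup, {}).get(prop) == value
        for c in item["instance of"]
        for sup in superclasses(c)
    )


# Three hypothetical items modelled in different ways.
specific = {"instance of": ["Hiberno-Saxon illuminated manuscript"]}
direct = {"instance of": ["manuscript"], "has decoration": "yes"}
plain = {"instance of": ["manuscript"]}
```

With this shape, a single check for "illuminated manuscript" finds `specific` through the subclass chain, and the decoration check finds both `specific` (via its class) and `direct` (via its own statement), provided the narrower classes are actually linked into the tree.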
  • Comment: I've read the discussion thus far, but don't think I've seen enough specific examples of how this would be applied, or what the actual results would look like. I'd like to see this explored more with varied examples before forming an opinion. Are there others who feel as I do? --EncycloPetey (talk) 22:29, 25 July 2018 (UTC)

Comments on Proposition 4 (new properties, genres)

4A: genre

  • I don't know that Wikidata can solve the world's imprecision around what is meant by "genre". I believe the proper solution is to encourage multiple genre values, so that Childhood's End has genres "novel" and "science fiction", while "The Roads Must Roll" has genres "short story" and "science fiction" (genre-by-form and genre-by-subject, if you will). If we were to create a new property for "genre-by-form", I'm not sure we'd achieve clarity, because some of the minor forms can be confused with genres-by-subject (lyric poetry). Even if we made a new property like "literary form", how would editors know what values to assign, since novel (Q8261) is an <instance of> literary genre, which can be supported by quality citations in multiple languages? (Aside from not supporting splitting genre in the way proposed, I would not want to use the term "format" here, as that has a strong connotation of book format (Q18602566) like hardback, paperback, etc.) I'm going to quote the Oxford Dictionary of Literary Terms at length re: genre, because I think this is important:
Genre The French term for a type, species, or class of composition. A literary genre is a recognizable and established category of written work employing such common conventions as will prevent readers or audiences from mistaking it for another kind. Much of the confusion surrounding the term arises from the fact that it is used simultaneously for the most basic modes of literary art (lyric, narrative, dramatic); for the broadest categories of composition (poetry, prose fiction), and for more specialized sub-categories, which are defined according to several different criteria including formal structure (sonnet, picaresque novel), length (novella, epigram), intention (satire), effect (comedy), origin (folktale), and subject-matter (pastoral, science fiction).

- PKM (talk) 20:47, 22 July 2018 (UTC)

  • We could certainly be more precise about what we mean by "genre", and really ought to disentangle several other salient features. For example, when I work with translations of Greek poetry (whether drama, or lyrical odes, etc.) it can be very important to know whether the translation was done in prose or in verse. The original text may have been poetic, but the translation may not be in the same literary form as the original. Currently, we have no means to indicate this aspect of works, at any level, where it differs from the higher levels. --EncycloPetey (talk) 21:00, 22 July 2018 (UTC)
    • Good point - this is important in translations of Beowulf (and likely many other things) as well. - PKM (talk) 21:03, 22 July 2018 (UTC)
  • It's perhaps worth noting that at least one major resource, viz the Library of Congress Genre/Form Terms (Q47537953) thesaurus, doesn't think that trying to create a wall between genre and form is a game that is worth the candle.
Per my answer to Propositions 1 and 2 above, I would think that the broad literary form -- eg novel (Q8261), poem (Q5185279), encyclopedic dictionary (Q975413) -- should be given as the main instance of (P31) on a work-level item.
Beyond that, in values for genre (P136), I see no merit in trying too hard to systematically segregate genre from form -- comic novel (Q2561390) is as meaningful a genre term as horror film (Q200092). I don't see any particular value in requiring that this instead be given as Comic fiction (Q27640800). Both should be equally acceptable as values.
We do, however, probably need to agree guidance as to where P136 rather than P31 becomes more appropriate to specify the nature of the content.
In terms of subject matter, we have main subject (P921). But it may be useful to be able to individually specify aspects of an overall subject beyond this, as proposed at Wikidata:Property_proposal/Creative_work#subject_facet, based eg on the kind of subject keyword information available from databases like the Biodiversity Heritage Library (Q172266). Jheald (talk) 23:58, 22 July 2018 (UTC)

4B: support, and other properties

  • Can we use distribution (P437) for manuscripts? Certainly manuscripts were a pre-printing method of distribution. As part of this exercise, we should be sure to document guidance on manuscript (Q87167): document written by hand and manuscript (Q2376293): work that an author submits to a publisher or editor for publication. - PKM (talk) 20:47, 22 July 2018 (UTC)
  • Given that I don't buy Snipre's line on Proposition 1, I would have thought that the nature of the physical support of the item would be a defining characteristic of acceptable instance of (P31) values at the item level -- eg manuscript (Q87167), scroll (Q720106), papyrus fragment.
The idea of using distribution (P437) at the 'Manifestation' level is interesting.
One thing that is interesting is what to do if an exemplar is the only known copy of a text. Do we need to have a manifestation level as well? Do we need to have a work level even?
We already have this issue on a larger scale, where we only have one known edition of a work. Do we need to have a work-level item as well? Many many systems (eg OpenLibrary) avoid this, and only create a separate work-level item when they actually need one, to avoid the difficulties of trying to keep two items with largely overlapping property-values (eg author, title, publication date, etc, etc) in sync -- potentially a huge amount of duplication, that users have so far largely run away from. Is there some semaphore we could adopt, to indicate a combined item? Jheald (talk) 00:20, 23 July 2018 (UTC)
@Jheald: The problem with your approach is how do you treat a novel written by hand on a scroll ? Following your reasoning this implies instance of novel + instance of scroll + instance of manuscript ? So you can already foresee the extension of that list when more details are added. So when you have a 3-7 instance of, I think we can really start to ask what is really the concept of the item. Snipre (talk) 11:04, 23 July 2018 (UTC)
I would support some way to handle a single-edition book without multiple items, especially for things like museum exhibition catalogues, if we can agree on how to model them. - PKM (talk) 20:20, 23 July 2018 (UTC)
  • If some item is a poem and a scroll, then let it be a instance of poem (Q5185279) AND a instance of scroll (Q720106). What's the issue here? Don't fight the world, it's not possible to categorize everything with a single taxonomy, as the OOP practitioners know well. --JavierCantero (talk) 18:20, 23 July 2018 (UTC)
@JavierCantero: So can you explain why we use a property for genre and we are not defining everything using instance of (P31) ? Why the text format and the support have more importance to be defined using instance of (P31) and not other characteristics ? You should be coherent: if "it's not possible to categorize everything with a single taxonomy" like you said, we should not create properties.
Just for your information, I am not against using instance of (P31) for describing most characteristics of a book, I just want a coherent system: if you accept to use genre (P136) so why this is a problem to have a dedicated property for text format ? Snipre (talk) 19:19, 23 July 2018 (UTC)
genre (P136) is a natural fit as a property since its value is independent from any other property, qualifier or value of the specific item. That can't be said about a property whose value could be book (Q571) or scroll (Q720106), since these have specific properties to state related to the item, such as ISBN-13 (P212) or the number of pages for a book, properties that the item shouldn't have if it is a scroll (and vice versa: if a scroll had its own specific properties, they shouldn't be set when using a different "text format" value). Using instance of (P31) you ensure that only items defined as books would have book properties (such as ISBN-13 (P212)) and only items defined as scrolls would have their own (the data model enforces that). --JavierCantero (talk) 08:41, 24 July 2018 (UTC)
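A minimal Python sketch of the kind of type-based check described in the comment above. It is not how Wikidata implements constraints; the `ALLOWED_PROPERTIES` mapping, the sample items, and the placeholder values are all invented for illustration.

```python
# Which properties are expected for which instance-of value.
# This mapping is an assumption made up for the example.
ALLOWED_PROPERTIES = {
    "book (Q571)": {"ISBN-13 (P212)", "number of pages"},
    "scroll (Q720106)": set(),
}


def violations(item):
    """List properties on the item that none of its types allow."""
    allowed = set()
    for cls in item["instance of"]:
        allowed |= ALLOWED_PROPERTIES.get(cls, set())
    return sorted(p for p in item["statements"] if p not in allowed)


# Hypothetical items; the ISBN values are placeholders, not real numbers.
a_book = {
    "instance of": ["book (Q571)"],
    "statements": {"ISBN-13 (P212)": "(placeholder)"},
}
a_scroll = {
    "instance of": ["scroll (Q720106)"],
    "statements": {"ISBN-13 (P212)": "(placeholder)"},
}
```

Here `violations(a_scroll)` flags the ISBN statement while `violations(a_book)` flags nothing, which is the behaviour the argument relies on.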

distribution (P437) seems perfectly ok to me for manuscript, codex, volumen, cd, etc. --Hsarrazin (talk) 10:43, 18 August 2018 (UTC)

Other observations

  • Not really germane to any of the above, but the English-language label + definition of literary work (Q7725634) "creative work by a writer created with aesthetic or recreative purposes" seemed off to me.
Having a class with that definition may well be quite useful. But (perhaps the result of too many years exposed to copyright law), it seemed dubious to me to apply the label "literary work" to categorically exclude all non-fiction, biography, history, encyclopedic writings etc. In copyright law at least, these would all be considered literary works, embodying a degree of originality and creativity, and often a degree of literary style as well. Jheald (talk) 23:36, 22 July 2018 (UTC)
    • I agree with that ! for me "literary work" is some work made with "letters" (i.e. any textual work - written or not), not necessarily 'artistic' ; non-fiction literature is literature. --Hsarrazin (talk) 08:57, 18 August 2018 (UTC)
  • Above I've questioned whether we always need to have separate items for all levels. But there is also a converse question: do just these levels always provide a sufficiently fine grouping of different items, in particular of different versions at the edition/manifestation level.
For two examples, consider (i) different scans of a particular edition: A number of libraries, including eg OCLC, and the British Library consider that a particular scanning creates a new version of a particular edition. But it should be tied to the edition that it is a version of.
As a second example, consider (ii) the 28 versions of On the Origin of Species that we currently have items for, as discussed above. These different versions show some clear groupings, which it would be useful to model. Jheald (talk) 00:32, 23 July 2018 (UTC)
@Jheald: Re: scans, how about item "scan" and new property <scan of> "edition"?
For the Origin of Species question, I assume scholars name these groups in some way. Perhaps these are <instance of> a new item "version group" which has parts that are versions? - PKM (talk) 18:47, 24 July 2018 (UTC)
A property <scan of> would also be useful at The Decoration of Houses (Q55740549), published in 2007 from a photographic reproduction of the 1897 first edition. - PKM (talk) 20:41, 24 July 2018 (UTC)

Everyone seems to be ignoring the "expression" level. Seems to me it can be quite useful, especially to distinguish translations of the same work, when (for example) there might be multiple translations into a given language and/or multiple editions of a given translation. - Jmabel (talk) 04:16, 26 July 2018 (UTC)

Proposed refinement of model

Aubrey
Viswaprabha (talk)
Micru
Tpt
EugeneZelenko
User:Jarekt
Maximilianklein (talk)
Don-kun
VIGNERON (talk)
Jane023 (talk) 08:21, 30 May 2013 (UTC)
Alexander Doria (talk)
Ruud 23:15, 24 June 2013 (UTC)
Kolja21
arashtitan
Jayanta Nath
Yann (talk)
John Vandenberg (talk) 09:14, 30 November 2013 (UTC)
JakobVoss
Danmichaelo (talk) 19:30, 16 February 2014 (UTC)
Ravi (talk)
Mvolz (talk) 08:21, 20 July 2014 (UTC)
Hsarrazin (talk) 07:56, 9 August 2014 (UTC)
Accurimbono
Mushroom
PKM (talk) 19:58, 10 October 2014 (UTC)
Revi 16:54, 29 November 2014 (UTC)
Giftzwerg 88 (talk) 23:36, 1 January 2015 (UTC)
Almondega (talk) 00:17, 5 August 2015 (UTC)
maxlath
Jura to help sort out issues with other projects
Epìdosis
Skim (talk) 13:52, 24 June 2016 (UTC)
Marchitelli (talk) 12:29, 5 August 2016 (UTC)
BrillLyle (talk) 15:33, 26 August 2016 (UTC)
Alexmar983 (talk) 23:53, 28 August 2016 (UTC)
Finn Årup Nielsen (fnielsen) (talk) 10:44, 29 August 2016 (UTC)
Chiara (talk) 14:15, 29 August 2016 (UTC)
Thibaut120094 (talk) 20:31, 14 September 2016 (UTC)
Ivanhercaz | Discusión 15:30, 31 October 2016 (UTC)
YULdigitalpreservation (talk) 17:35, 10 November 2016 (UTC)
User:Jc3s5h
PatHadley (talk) 21:51, 15 December 2016 (UTC)
Erica (ohmyerica) (talk) 19:26, 1 January 2017 (UTC)
User:Timmy_Finnegan
Mauricio V. Genta (talk) 05:38, 12 March 2017 (UTC)
Sam Wilson 09:24, 24 May 2017 (UTC)
Sic19 (talk) 22:25, 12 July 2017 (UTC)
Andreasmperu
MartinPoulter (talk) 09:21, 20 July 2017 (UTC)
Thelmadatter (talk) 01:11, 13 September 2017 (UTC)
Zeroth (talk) 15:01, 16 September 2017 (UTC)
Emeritus
Ankry
Beat Estermann (talk) 20:07, 12 November 2017 (UTC)
Shilonite - specialize in cataloging Jewish & Hebrew books
Elena moz
Oa01 (talk) 10:52, 3 February 2018 (UTC)
Maria zaos (talk) 11:39, 25 March 2018 (UTC)
Wikidelo (talk) 13:07, 15 April 2018 (UTC)
Mfchris84 (talk) 10:08, 27 April 2018 (UTC)
Mlemusrojas (talk) 3:36, 30 April 2018 (UTC)
salgo60 Salgo60 (talk) 12:42, 8 May 2018 (UTC)
Dick Bos (talk) 14:35, 16 May 2018 (UTC)
Marco Chemello (BEIC) (talk) 07:26, 30 May 2018 (UTC)
Harshrathod50
 徵國單  (討論 🀄) (方孔錢 💴) 14:35, 20 July 2018 (UTC)
Alicia Fagerving (WMSE)
Louize5 (talk) 20:05, 11 September 2018 (UTC)
Viztor (talk) 05:48, 6 November 2018 (UTC)
RaymondYee (talk) 21:12, 29 November 2018 (UTC)
Merrilee (talk) 22:14, 29 November 2018 (UTC)
Kcoyle (talk) 22:17, 29 November 2018 (UTC)
JohnMarkOckerbloom (talk) 22:58, 29 November 2018 (UTC)

Helmoony (talk) 19:49, 8 December 2018 (UTC)

Notified participants of WikiProject Books

Here's a first-cut model based on Snipre's proposed model and the comments above.

| FRBR model | Current WD model | Proposed WD model <instance of> | Property notes |
|---|---|---|---|
| Work | Book | Any subclass of creative work (Q17537576) [include literary forms, see Note 1] | <genre> and <main subject> go here (+ others) |
| Expression | - | - | - |
| Manifestation | Edition | Any subclass of version, edition, or translation (Q3331189) | <edition or translation of>; include all fields required for bibliographic citation here |
| Item | Exemplar, manuscript | Any subclass of document (Q49848) | <exemplar of> |

Notes:

  1. Literary forms (novel, dictionary) should be subclasses of creative work or its subclasses. Literary forms may be <instance of> genre but not subclass of genre

I think this may meet Snipre's objective of a simple model while allowing the use of a variety of <instance of> statements.

Thoughts on this model? - PKM (talk) 18:39, 24 July 2018 (UTC)
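To make the proposed model concrete, here is a small Python sketch that assigns an FRBR level to a class by walking up a subclass tree until it reaches one of the three root classes from the table. The roots come from the proposal above; the `PARENT` edges are assumed placements, not checked against the live class tree, and each class has a single parent only to keep the example short.

```python
# Root class for each FRBR level, as listed in the proposed model above.
LEVEL_ROOTS = {
    "creative work (Q17537576)": "Work",
    "version, edition, or translation (Q3331189)": "Manifestation",
    "document (Q49848)": "Item",
}

# Toy subclass-of edges (child -> parent); placement is assumed for the
# example and not taken from Wikidata.
PARENT = {
    "novel (Q8261)": "creative work (Q17537576)",
    "illuminated manuscript (Q48498)": "document (Q49848)",
}


def frbr_level(cls):
    """Walk up the toy tree until a level root is reached, else None."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls in LEVEL_ROOTS:
            return LEVEL_ROOTS[cls]
        seen.add(cls)
        cls = PARENT.get(cls)
    return None
```

Under these assumptions an item that is <instance of> novel lands at the Work level and one that is <instance of> illuminated manuscript lands at the Item level; a class outside all three subtrees returns `None`, which is the case that would need discussion.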

We currently have a limited number of classes that are currently in the subtrees of both document (Q49848) and creative work (Q17537576) -- see tinyurl.com/y8hgtw58
These are going to need some work to separate out the physical from the essential. Some we may need to think about more closely. Jheald (talk) 19:06, 24 July 2018 (UTC)
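The overlap check can be sketched in Python on a toy tree. The class names and edges below are invented placeholders, not the actual Wikidata subtrees; on the query service the same idea would be roughly `?class wdt:P279* wd:Q49848 ; wdt:P279* wd:Q17537576`.

```python
# Toy subclass-of edges, child -> list of parents. Invented for illustration.
PARENTS = {
    "class A": ["document (Q49848)"],
    "class B": ["creative work (Q17537576)"],
    "class C": ["class A", "class B"],   # sits under both roots
    "class D": ["class C"],
}


def descendants(root):
    """All classes that reach root through one or more subclass-of edges."""
    found = set()
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            if child not in found and any(p == root or p in found for p in parents):
                found.add(child)
                changed = True
    return found


# Classes that would need the physical and the essential separated out.
both = descendants("document (Q49848)") & descendants("creative work (Q17537576)")
```

In this toy data `both` contains "class C" and "class D": one class with two parents is enough to pull everything beneath it into both subtrees.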
BTW, the talk pages of both document (Q49848) and creative work (Q17537576) have the {{Item documentation}} autodescription template, which gives a very useful view of what is in the subtrees of the two classes at the moment. Striking, because that template goes up rather than down. But it does have a link to the tree tool, which does go down. Jheald (talk) 19:43, 24 July 2018 (UTC)
On a first look, I am not 100% sure that document (Q49848) is going to work. We need our class to be something such that it is 100% clear that, for it and all of its subclasses, for anything that is instance of (P31) one of those classes, it should be self-evident from that statement that the thing is a definite single concrete physical object.
But document (Q49848) has subclasses like papal bull (Q189867) -- which I think really we would consider a type of work. (cf: en:List_of_papal_bulls) But it is hard to deny that it is also a kind of document. So this may need some refinement. Jheald (talk) 20:01, 24 July 2018 (UTC)
@Jheald:, I think you meant document (Q49848) not Maximilian von Schwerin-Putzar (Q89848). :-) (And {{Item documentation}} is my life-saver tool.) I agree some cleanup of document (Q49848) is necessary, but it looks like that is needed in any case. Is an ebook a physical object? Is it a document? We certainly must include ebooks at the edition level, as we cite them in references. However we structure our base model, I can see us having a "problem" list of things to discuss in order to truly standardize how we treat books - but having guidelines and best practices around all aspects of books would be valuable, and the lack of a base model has kept us from focusing on the finer points, IMHO. - PKM (talk) 20:44, 24 July 2018 (UTC)
Fixed now, thanks!! Jheald (talk) 21:24, 24 July 2018 (UTC)
@PKM: The key problem I think is that, in the subtrees of both document (Q49848) and creative work (Q17537576), there are a fair number of class items that don't really clearly embody a work/edition/exemplar distinction, and so wouldn't solidly lead the editor of an item to accurately encode such a distinction.
Also, is eg Mona Lisa (Q12418) a work or an exemplar? Arguably it's a work in a different sense of creative work (Q17537576), one that's not a well-spring for multiple distinct exemplars, unlike say a text as a work.
It would be nice to get some clearer lines and inheritances in our classes here, but it's going to be a lot of work. So being less ambitious, and going back to something like "exemplar", rather than "document", may make sense as the key class at the head of the tree for classes whose instances are individual distinct copies of works of which many individual notable copies may exist. It also gives us a tighter tree of subclasses to watch over and police, to some extent addressing Snipre's (very fair) comment below.
Mirroring this on the other side, it might also make sense to have a specific class for a work in the FRBR sense, ie a "work of which many individual notable copies may exist", that again one could more tightly police the subclasses of, rather than using creative work (Q17537576) for this.
It's a problem we have in all sorts of areas across Wikidata, as my original is-it-abstract-or-is-it-concrete query tinyurl.com/ya4spc62 tried to highlight. But actually I think we may be in rather better shape than the above may indicate, because the important thing to distinguish (as regards properties that may or may not be appropriate) are whether items are versions or whether they are exemplars, or not. And that I think with our existing structure we probably can already do, at least reasonably well, at least in principle (even if there is work to do on some/many individual items). Jheald (talk) 14:23, 26 July 2018 (UTC)
@Jheald:, if you're recommending that "individual book" and "illuminated manuscript" should be <subclass of> "exemplar", I would be 100% happy with that. "Document" was a quick choice for the Item level of FRBR, and is the category choice I was the least confident about.
Same thing with "work" - if we can define a class whose subclasses are clearly Works in the FRBR sense, and get agreement on that, I'm fine.
My one goal here is to get a model that we have consensus on and move on to other challenges. - PKM (talk) 19:03, 26 July 2018 (UTC)
On the other hand, having dedicated properties for text format, genre, ... allows us to have constraints which raise an alarm when the instance/subclass values of the item where the property is used do not respect the rules. Snipre (talk) 14:53, 25 July 2018 (UTC)
  • Whether we enact this proposal or not, we still rely on a correct class tree. Are you proposing that we avoid making use of any class trees at all? --EncycloPetey (talk) 18:53, 26 July 2018 (UTC)
@EncycloPetey: So do you have a class tree to propose? And if possible, a class tree which is OK for everyone. It is already so difficult to agree on which value should be used for the work level (book or work) that I don't expect any agreement on several dozen items classified in a certain order.
But in any case we can't create a model based on something which doesn't exist (the class tree) so unless you propose a class tree right now, my model doesn't rely on something which doesn't exist. Snipre (talk) 08:49, 28 July 2018 (UTC)
Ah, you're taking that strategy, are you? Blame me for your reluctance while dissembling over the question I asked. You were the person who brought up the issue of trees. I only asked you to clarify, and you've tried to turn that against me. My question is straightforward: Are you proposing that we avoid making use of any class trees at all? --EncycloPetey (talk) 13:42, 28 July 2018 (UTC)
@EncycloPetey: I don't blame you, I am just tired of discussing with people who never propose an alternative. You support the use of a class tree? No problem, but please show me how long you have been working on the development of this class tree and especially what you are doing to generate a global agreement for that initiative. Snipre (talk) 20:23, 29 July 2018 (UTC)
You have tried to put responsibility on me each time, but have failed to answer my question. Therefore, I assume that you're not going to answer my question, since I've asked it twice now without getting an answer. --EncycloPetey (talk) 20:27, 29 July 2018 (UTC)

Have we examined FaBiO, the FRBR-aligned Bibliographic Ontology as a guide for our books ontology? It might be helpful for sorting out our class trees. I note they have "novel" as work > artistic work (our creative work?) > literary artistic work > novel. I like that. - PKM (talk) 19:34, 28 July 2018 (UTC)

Use cases for modeling

What are your use cases? If the purpose is to "import data", maybe it would be preferable to outline what you want to import and how it's a problem in the current approach. If the purpose is to discuss a theoretical model that may be in use elsewhere, maybe this isn't really a suitable forum.
--- Jura 04:33, 26 July 2018 (UTC)

@Jura1:, I think there is broad consensus to follow the FRBR model; what we seem to keep going round and round about is what to use for <instance of> at each level.
The current recommended best practice on our project page is to use <instance of> "book" for the work level, but that has been widely disputed in discussions in favor of something more like "work".
My use case is: As an editor, I want to understand a clear recommended best practice for adding a work-level item in WD to go with a new edition-level item (usually one I intend to use as a source for a <stated in> reference). - PKM (talk) 19:23, 26 July 2018 (UTC)
  • It might work better with more specific import questions at hand. Most Wikidata items can be interpreted within one theoretical approach or the other, but this doesn't really answer import questions.
    --- Jura 09:20, 28 July 2018 (UTC)

proposal for Copyright status of a work

Helmoony (talk) 19:49, 8 December 2018 (UTC) Notified participants of WikiProject Books

This proposal should be of interest to all Wikisource contributors who are part of this project too... Wikidata:Property proposal/copyright status --Hsarrazin (talk) 11:52, 25 July 2018 (UTC)

Will we mark the copyright status of the work for each and every country? Laws differ considerably in each nation.
We'd also have to have a means of differentiating the status of content within a volume. Sometimes the text is in public domain, but the illustrations are still under copyright. Or a volume may contain multiple works, some of which are under copyright and some of which are not. Sometimes the primary work is free of copyright, but the annotations, or the introduction are copyrighted. We'd have to have a system that indicates the status of individual components of a work before we can handle copyright issues. --EncycloPetey (talk) 14:41, 25 July 2018 (UTC)
@EncycloPetey: you probably mean, indicate the status of individual components of an edition, since each component is the edition of an individual work which status can be handled on the corresponding work item ? --Hsarrazin (talk) 09:26, 18 August 2018 (UTC)
No, I did not. I avoided being specific, because it is still not clear whether we would mark the work's data item or the edition's data item (or both) for the components, nor how this would be coordinated for a data item that represents a composite collection.

Chaucer

I've started some work on Chaucer. Can folks look at these items and see how they can be improved?

I am particularly stumped by how to show the relationship between the 1894 and 1900 versions of volume IV (edition of an edition? based on?).

I eventually want to get to the goal of indicating that the versions of Chaucer's works in Kelmscott Chaucer (Q4219142) are based on Skeat's 1894 editions, but that's a couple of levels of complexity beyond where I am right now.

✓ Also, I find complete edition (Q16968990) and complete works (Q1978454) confusing. Author A's "collected works" are logically a subset of her "complete works", but a "complete works" is a type of "collected works". And currently one is a book and one is a group of works. Ideas how to sort this? - PKM (talk) 21:43, 27 July 2018 (UTC)

  • Not that we necessarily currently have the properties, but in FRBR terms wouldn't the 1894 and 1900 versions of volume IV be two manifestations of the same expression, with the latter based closely on the former? based on (P144) would capture the relationship, no? - Jmabel (talk) 21:55, 27 July 2018 (UTC)
    Probably not. based on (P144) is usually used on derivative works, not editions of the same thing. We haven't yet determined how we want to handle items that are editions of other editions, or possibly an edition of two separate items simultaneously. --EncycloPetey (talk) 23:12, 27 July 2018 (UTC)
There are volumes in The complete works of Geoffrey Chaucer (Q55776699) that are editions of two (or more) works. - PKM (talk) 18:50, 28 July 2018 (UTC)
Re: complete edition (Q16968990) and complete works (Q1978454) - I am going to follow the ontology at FaBiO, which has anthology > collected works > complete works. I'll build these out accordingly. - PKM (talk) 19:00, 28 July 2018 (UTC)
I note that AAT has only one item "collected works" with the meaning "complete works" (which is the likely source of some of our poor labels), and that they class this as a document genre under "information" (that is, as FRBR:Item). AAT also classes "novel" as a literary genre. This is an illustration, I suppose, that controlled vocabulary =/= ontology. Just commenting for the purpose of general discussion. - PKM (talk) 20:38, 28 July 2018 (UTC)
It turns out that our item labeled "collected works" in English should actually be "complete edition" based on the single sitelink. Andreasmperu and I are sorting that out, and I'll likely move the AAT link which would solve the problem mentioned above. - PKM (talk) 15:01, 31 July 2018 (UTC)

Proposal to clean up the Books class tree using FaBiO[edit]

@Snipre, EncycloPetey, Jheald, JavierCantero, ArthurPSmith, Jura1: EncycloPetey has asked for more examples and Snipre has asked for a proposal on an improved class tree. I have spent some time over the last few days looking at FaBiO, the FRBR-aligned Bibliographic Ontology as a guide for making an FRBR-compliant class tree for Wikidata. Here are my preliminary thoughts:

  • Using a professionally-developed ontology rather than re-inventing the wheel is probably a good idea.
  • FaBiO was first published in 2010 and Version 2 was published in February 2018.
  • FaBiO is widely used and has been mapped to other ontologies.
  • FaBiO is less granular than Wikidata and includes some concepts which are out of scope for the Books project.
  • Getty AAT, while widely used for arts concepts in Wikidata, is not a good source for a primary book structure as it is not based on FRBR.

I think FaBiO would provide a solid basis for a class tree for Books in Wikidata, with the usual caveats that there will be individual concepts where Wikidatans choose to structure our class tree differently, especially if we skip the FRBR:expression level. I would recommend the following best practices:

  • Don't follow FaBiO exactly, but use it as a guide, to be expanded and collapsed where needed.
  • Where we do follow FaBiO, use a <stated in> FRBR-aligned Bibliographic Ontology (Q44955004) reference on the <subclass of> statement.
  • Use exact match (P2888) on concepts that are exact matches to FaBiO concepts.
  • Encourage the use of multiple parents within the same FRBR class (e.g. "biographical dictionary" <subclass of> "dictionary, biography").
  • Encourage the use of <instance of> "genre" on work-level items (especially where so classed in AAT or other vocabularies and referenced).

The following table maps the FRBR:works tree (only) in FaBiO to WD concepts. It's also available in Google Sheets with more details at bit.ly/fabio2wd. (It's been years since I've done a Wikitable - if you can improve the format of this, I'd be delighted!)

Mapping FRBR:works in FaBiO to Wikidata
Wikidata Class Tree (proposed) | FaBiO Class Tree | FaBiO Comment
intellectual work (Q15621286) or written work (Q47461344) | work | "subclass of FRBR work, restricted to works that are published or potentially publishable, and that contain or are referred to by bibliographic references"
announcement (Q567303) | announcement
no item | notification of receipt
retraction notice (Q7316896) | retraction
creative work (Q17537576) | artistic work
literary work (Q7725634) | literary artistic work
musical composition (Q207628) | musical composition
novel (Q8261) | novel
novella (Q149537) | not in FaBiO
novella (Q43334491) (Renaissance) | not in FaBiO
novelette (Q472808) | not in FaBiO
play (Q25379) (many subclasses) | play
poem (Q5185279) (many subclasses) | poem
screenplay (Q103076) | screenplay
short story (Q49084) | short story
biography (Q36279) | biography
autobiography (Q4184) | not in FaBiO
hagiography (Q208628) | not in FaBiO
(many more)
no item | case for support
correction (Q5172784) | correction
critical edition (Q680458) (?) | critical edition
data set (Q1172284) | dataset
essay (Q35760) | essay
no item | examination paper
no item | grant application
not WD Books item | image
no item | instructional work
not WD Books item | metadata
not WD Books item | model
not WD Books item | opinion
not WD Books item | policy
not WD Books item | proposition
not WD Books item | questionnaire
reference work (Q13136) | reference work
encyclopedia (Q5292) | not in FaBiO
dictionary (Q23622) |
(many more)
not WD Books item | reply | "A work that is a reply, either to a letter or other direct communication, or to feedback or comments"
report (Q10870555) | report
review (Q265158) | review
scholarly work (Q55915575) (added) | scholarly work
scholarly article (Q13442814) | not in FaBiO
not WD Books item at work level | sound recording
specification (Q2101564) | specification
vocabulary (Q6499736) (poor match?) | vocabulary
group of works (Q17489659) | work collection
not WD Books item | work package | "component of the case for support of a grant application"
working paper (Q1228945) | working paper

FaBiO data from: Peroni, S., Shotton, D. (2012). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. In Journal of Web Semantics, 17: 33-43. https://doi.org/10.1016/j.websem.2012.08.001. Open Access at: http://speroni.web.cs.unibo.it/publications/peroni-2012-fabio-cito-ontologies.pdf CC-BY 4.0

https://sparontologies.github.io/fabio/current/fabio.html#toc

Similar mappings can be done for editions/exemplars, but I'd like some general feedback on this idea before I do that work (it's done by hand and it's very time-consuming). What do you think of this approach? - PKM (talk) 21:15, 30 July 2018 (UTC)

Overall, I like this approach. We shouldn't be trying to reinvent something that experts have already tackled. I do see a few items not listed, such as patent application (Q3022019), and dissertation (Q1385450) / doctoral thesis (Q187685) (where some clarification or a merger may be needed), and educational material (Q6006020) which should be the same as "instructional work", including textbook (Q83790). I also think group of works (Q17489659) does not quite describe a volume that is a published collection or anthology (Q105420). We'd need to be sure we have that part of the classification tree worked out as well. --EncycloPetey (talk) 21:23, 30 July 2018 (UTC)
FaBiO is very different from our current model on frbr:expression and frbr:manifestation - their "expression" is closer to our "version/edition". I've added the FaBiO side of the mappings to the Google sheet for reference. We might effectively blend these levels. - PKM (talk) 00:30, 31 July 2018 (UTC)
I think this makes sense. When you say "Encourage the use of <instance of> "genre" ...", by genre here are you referring to work types like "novel", etc., or even more specifically to the subject matter (e.g. romance novel (Q858330), crime novel (Q208505), science fiction novel (Q12132683), etc.)? I guess this is ok, but I don't believe it's been common practice for this project up to now... ArthurPSmith (talk) 14:37, 31 July 2018 (UTC)

Why is "musical composition" under "literary work"? Seems unintuitive to me. - Jmabel (talk) 19:22, 31 July 2018 (UTC)

@Jmabel: I agree, though I double-checked, and that's where they place it, with subclass "song". - PKM (talk) 20:56, 31 July 2018 (UTC)

More on FaBiO[edit]

Karen Coyle's FRBR, Before and After: A Look at Our Bibliographic Models has some good general info on FaBiO (and other models) from a librarian's perspective, noting its emphasis on fields relating to the workflow of academic publishing. She also notes "Along with the classes derived from FRBR entities, FaBiO has dozens of properties for bibliographic description, few of which would be considered exact equivalents of descriptive elements in library data." Coyle's work is available as a PDF here. My take today (subject to further exploration) is that FaBiO would be a great reference for our "work" class tree but possibly of lesser value for "versions" and "items". FaBiO also highlights properties we don't have today and might want to add. I'd really like to hear from some of our librarians on this. - PKM (talk) 20:56, 31 July 2018 (UTC)

  • FaBiO's concept "scholarly work" is so obviously useful as a parallel to "creative work" and "reference work" that I have added it as scholarly work (Q55915575). It solves the problem of how to classify the "work" aspect of many books I have added to WD. - PKM (talk) 00:59, 3 August 2018 (UTC)
I've written on "scholarly work" on my blog. It's a common concept but needs a good definition. I like the definition that (I assume you) gave it on Wikidata: "work that reports the result of study and analysis of a topic using scholarly methods". However, note that others define it as "anything published in a scholarly journal." Some insist that it means "peer-reviewed." Others tie it to the fact of having citations (which would make all of WP scholarly...). Also, using the same term "scholarly" in the name and the definition is a kind of definitional "no-no." It would be longer, but how about "work that reports the result of study and analysis of a topic, usually peer-reviewed." Also, we chatted at WikiCite2018 about getting better advice/instructions to editors into WD pages. That means we wouldn't need the whole banana in the definition as long as further info would show up prominently on the page. Kcoyle (talk) 18:41, 1 December 2018 (UTC)

Too many names to ping[edit]

There are too many users (66) on Wikidata:WikiProject Books/Participants for Echo to work (max 50). Ran into it when proposing my property. —Dispenser (talk) 02:01, 6 August 2018 (UTC)

There's an open Phabricator ticket for this problem. - PKM (talk) 20:01, 6 August 2018 (UTC)

Distribution[edit]

Currently distribution (P437) is valid for works but not versions/editions. I've proposed fixing that on the property's Talk page. -PKM (talk) 20:01, 6 August 2018 (UTC)

Refining my proposed model (again)[edit]

Here are my updated thoughts on tackling this problem.

Proposed general approach[edit]

Books and other written works in Wikidata are modeled in 3 layers based on Functional Requirements for Bibliographic Records (Q16388). These layers are:

  • Work, corresponding to frbr:work and representing the intellectual content of a written work.
  • Version or edition, similar to both frbr:expression and frbr:manifestation, but not exactly equal to either of these. The "version" is a published or otherwise distributed version of a "work", with full bibliographic information, that can be searched for online or in a library or archive, and used as a citation to support statements in Wikidata.
  • [exemplar], a physical or digital object that is one and only one instantiation of a "version", such as an individual book in a collection or an illuminated manuscript. This is equivalent to frbr:item.

Proposed class trees[edit]

Works[edit]

The class tree for works is based on version 2 of the FRBR-aligned Bibliographic Ontology (Q44955004). We use the fabio:works hierarchy with modifications and extensions as agreed to by the Wikidata community.

  • Types of works should be <subclass of> [pick work item] or one or more of its subclasses.
  • Types of works may also be <instance of> genre (Q483394) or one of its subclasses, but should not be <subclass of> a genre.
  • Individual works should be <instance of> one or more subclasses of [pick work item].
  • Individual works should be linked to their versions using <has edition>.

Versions/editions[edit]

The class tree for versions is developed specifically for Wikidata.

Items[edit]

  • Bibliographic items should be <instance of> their object type (book, illuminated manuscript, codex).
  • Bibliographic items should be linked to a version using <exemplar of> the version.
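
The three layers and their links can be sketched in Python as below. This is only an illustration under stated assumptions: the class names, field names and helper functions are hypothetical, and the property IDs in the comments (P747, P629, P1574) are my reading of which existing properties correspond to <has edition>, its inverse, and <exemplar of>.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Work:
    """Work layer (frbr:work): the intellectual content."""
    label: str
    instance_of: List[str]  # e.g. ["written work"] or one of its subclasses
    editions: List["Version"] = field(default_factory=list)  # <has edition>, assumed P747


@dataclass(eq=False)
class Version:
    """Version/edition layer: a published, citable version of a work."""
    label: str
    work: Work  # inverse link back to the work, assumed P629
    publication_year: Optional[int] = None
    exemplars: List["Exemplar"] = field(default_factory=list)


@dataclass(eq=False)
class Exemplar:
    """Item layer (frbr:item): exactly one instantiation of a version."""
    label: str
    version: Version  # <exemplar of>, assumed P1574
    object_type: str  # book, illuminated manuscript, codex, ...


def add_version(work: Work, label: str, year: Optional[int] = None) -> Version:
    """Create a version and register it on its work (hypothetical helper)."""
    version = Version(label=label, work=work, publication_year=year)
    work.editions.append(version)
    return version


def add_exemplar(version: Version, label: str, object_type: str) -> Exemplar:
    """Create an exemplar and register it on its version (hypothetical helper)."""
    exemplar = Exemplar(label=label, version=version, object_type=object_type)
    version.exemplars.append(exemplar)
    return exemplar


# Usage with made-up labels: one work, one edition, one physical copy.
frankenstein = Work("Frankenstein; or, The Modern Prometheus", ["written work"])
first_edition = add_version(frankenstein, "first edition", year=1818)
copy = add_exemplar(first_edition, "a library's copy (hypothetical)", "book")
```

Publisher, ISBN and similar statements would live on the Version object only, which is the point of separating the layers.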

Comments[edit]

So many people have dropped out of this conversation that we may never be able to reach consensus. However, here are a few more thoughts based on the conversation:

  • Instructing people to stop using <instance of> "book" for work-level items would be a big deal, since that behavior is recommended in several places, but I think it's the one idea everyone pretty much agrees on.
  • There is enough opposition to the "all humans are Q5 model" (that is, "all frbr:work-type items are <instance of> [some item]") that I don't believe we'd ever get consensus to go that way.
  • So my question is, how can we move forward? - PKM (talk) 19:39, 18 August 2018 (UTC)
    If we can agree on (at least) the proposal for work and version/edition items, it would make for a good start. The item level seems to be generating more conversation, and may need a deeper look. We also need to agree explicitly (with specific examples) how we will handle Wikisource data items. --EncycloPetey (talk) 01:40, 19 August 2018 (UTC)
    Thanks for writing this up, PKM. I fully agree with your model. Concerning "book": I think we should try to discourage people from using it because it can refer to any of the 3 levels. As an alternative I propose to use written work (Q47461344) or any of its subclasses. --Pasleim (talk) 12:05, 20 August 2018 (UTC)
@Pasleim: Change of book to written work: Don't modify Help:source without changing the recommendations on Wikidata:WikiProject Books. Everything should be coherent. Snipre (talk) 18:35, 12 September 2018 (UTC)
@Pasleim, Snipre, EncycloPetey: I'll be sure to modify both Help:source and the recommendations on Wikidata:WikiProject Books, just for works and editions at this time, unless someone else beats me to it. It seems like we have consensus (or at least no objections) to making these changes. - PKM (talk) 19:54, 12 September 2018 (UTC)
@PKM, Pasleim, Snipre, EncycloPetey: I'm all for implementing those 3 layers. Otherwise, if works' instance of (P31) can be so many different things (subclasses of written work (Q47461344) or genre (Q483394)), I would be in favour of having a dedicated "FRBR level" property to be able to know this level without having to do SPARQL requests. (I know this possibility of a dedicated property has been discussed somewhere, it was at least orally discussed at WikiCite 2017, but I can't find the notes). That would be a way to get a "all humans are P31:Q5" equivalent without having to confront resisting forces. -- Maxlath (talk) 16:24, 5 November 2018 (UTC)
This is definitely one option, even if I prefer the other one: use a new property for genre, text format or other characteristics and keep only the 3 FRBR levels as the only values for instance of (P31). Why? For the same reasons we avoid the use of book. If we use subclasses of work as values for instance of (P31), then we have to be clear with the labels of the subclasses to avoid any misunderstanding. Snipre (talk) 10:22, 7 November 2018 (UTC)

Publication[edit]

Does anyone want to take a stab at fixing the parents of publication (Q732577)? It's clearly not a "work" in the FRBR sense. It's currently a parent of "version". - PKM (talk) 20:10, 23 August 2018 (UTC)

Work item properties: Deleting RVK identifiers[edit]

Should Regensburg Classification (P1150) not be used as shown in "Work item properties"? For example Bottroper Protokolle, SWB-Online Katalog: RVK-Notation: GN 9999, MS 1420 etc. @JakobVoss: Why do you remove this property for books? --2A00:C1A0:4882:2F00:4538:E01:D5DA:B609 00:15, 12 September 2018 (UTC)

RVK notations on work items should better be replaced by main subject (P921), genre (P136) and related properties. Regensburg Classification (P1150) is more useful as mapping between RVK classes and Wikidata items. To give an example:

Updating the Project page[edit]

I have changed the example of the project page to show <instance of> written work, not book (which is what the sample item says anyway). I have added a second example with instance of scholarly work. I have added simple one-liners that "works" should be instances of "written work" or its subclasses and that editions should be instances of "version, edition or translation" or its subclasses.

I also suggest:

  • Specifically calling out that <instance of> "book" is deprecated (and making sure "book" is no longer a subclass of "work")
  • Adding a section on class tree and FaBiO (or possibly referencing a subpage on this topic?)

Either way, I'll announce these changes on Project Chat when I am done. Comments? - PKM (talk) 20:02, 16 September 2018 (UTC)

Thank you very much! I will not add instance of (P31)  book (Q571) anymore! --Epìdosis 20:09, 16 September 2018 (UTC)
Re: Works should be instances of written work (Q47461344) or one of its subclasses.
We also need to allow for books that consist of printed musical scores and for books that consist primarily (or entirely) of artwork or photographs. --EncycloPetey (talk) 01:34, 17 September 2018 (UTC)
Do we have classes for those? FaBiO has “musical composition” but our item with that name = “act of composing music”. What would you want to call them? - PKM (talk) 02:20, 17 September 2018 (UTC)
musical composition (Q207628) is a subclass of musical work (Q2188189), although the translations seem to be divided over whether the item refers to the act or the result. --EncycloPetey (talk) 02:48, 17 September 2018 (UTC)
Ugh, then someone will eventually need to duplicate and disambiguate them. I’ve been thinking that works that are primarily artworks or photographs might be “scholarly works” or “reference works”. How would you feel about “Works will generally be instances of written work (Q47461344) or one of its subclasses. There may be exceptions for other types of works issued in book form.” - PKM (talk) 19:58, 17 September 2018 (UTC)
It would help if we could also link to a page or discussion with additional information, such as a copy of the FaBiO table and comments about musical score books, etc. In other words, have the short blurb, but with a link to additional and more detailed information. --EncycloPetey (talk) 00:36, 18 September 2018 (UTC)
I’ll put that on my to-do list. I want to format the FaBiO table a bit differently than above, and I’m thinking about how best to do that. - PKM (talk) 20:23, 18 September 2018 (UTC)
This is not a good idea to start the classification like that: we have musical work (Q2188189), work of art (Q838948), creative work (Q17537576), intellectual work (Q15621286), anonymous work (Q567620), derivative work (Q836950), collective work (Q3594128), recreative work (Q17538258), revoiced work (Q26160672), written work (Q47461344), scientific work (Q11826511), posthumous work (Q17518461), scholarly work (Q55915575),... before saying that "Works should be instances of written work (Q47461344)", we need to have a clear understanding of the top classification but currently we are missing a global picture. Snipre (talk) 12:31, 19 September 2018 (UTC)

I don't agree. Many of these items (intellectual work (Q15621286), creative work (Q17537576), anonymous work (Q567620), derivative work (Q836950), collective work (Q3594128), recreative work (Q17538258), revoiced work (Q26160672), and posthumous work (Q17518461)) are broader concepts than works in the FRBR sense. FRBR-works may be any of these classes but they are also "book-works" whatever that is in WD. I don't believe work of art (Q838948) is a type of "book" in the scope of this project. musical work (Q2188189) is an outlier. I think it is important to define a parent class of which "all of these items are (FRBR works)".

We could take a cue from FaBiO and add a new "parent" "bibliographic work" = "work that is published or potentially publishable, and that contains or is referred to by bibliographic references". I'd also be happy using intellectual work (Q15621286) as the recommended parent class, although I think "written work" is better.

And I would not have made the changes to the project page if I didn't think that we had consensus from those who are still participating in this conversation after (something like) three years. - PKM (talk) 00:15, 20 September 2018 (UTC)

@EncycloPetey: Wikidata:WikiProject Books/Works is a draft of a "more information" page for works. Feel free to edit or suggest changes. - PKM (talk) 20:13, 20 September 2018 (UTC)

Pictorial works?[edit]

@EncycloPetey, kcoyle: I am thinking of a new item pictorial work = "creative work consisting primarily of a selection of images, with minimal or no accompanying text" for the FRBR work-level item for books of photographs or artworks. The name is analogous to pictorial map (Q162206). Thoughts? Other suggestions? - PKM (talk) 20:10, 21 September 2018 (UTC)

Hmm. The terminology seems to be overly vague if you are intending this to be limited to book-like works. It also doesn't seem to match well with the concept used for pictorial map (Q162206). However, I haven't been able (yet) to think of an alternative label. I think the question we need to answer first is "What sort of things would we need to include in such a category?" Would we want to include cartographic works? collections of bird / plant / natural history illustrations? photographic collections? museum catalogs? If we can decide what sorts of items might be included, then we may be better able to assign a label to the concept. --EncycloPetey (talk) 00:58, 22 September 2018 (UTC)
It was suggested that using written work as our base class, we don't have a place for FRBR-works that are primarily artworks or photographs, so I am trying to resolve that. For some reason, map is currently a <subclass of> written work, so that's covered unless we change it. In my mind, atlases are reference works. Art and exhibition catalogs are often (but not always?) scholarly works. So I think this class—if we make it—would include natural history illustrations and non-scholarly photographic and artwork collections. - PKM (talk) 20:26, 22 September 2018 (UTC)
I think these would be good questions to take up at Wikicite 2018, if you will be there. Categorizing works is very hard, not straight-forward, so we need to give good guidance. There are definitely works that are not "written" - including music, dance, maps, films, and all of the visual arts. Let's make sure we have a way to cite all works. I'm game to put some time into this. 184.23.19.186 21:17, 23 November 2018 (UTC)

Thousands of bad edits[edit]

It seems User:Simon Villeneuve is running his account as an unsupervised bot, making thousands of additions, many of which do not fit the WikiProject Books data structure (among other issues). He is adding publisher (P123) to work items, instead of edition items. I have posted this issue to his talk page, but as I said, his account seems to be running automatically and without supervision. There will be a big mess to clean up afterwards. --EncycloPetey (talk) 13:43, 24 September 2018 (UTC)

Hi,
I have made a mistake. It is easy to correct it, but before, I want to be sure that there is a consensus here. I'll write my questions in a moment. Simon Villeneuve (talk) 14:11, 24 September 2018 (UTC)
Ok. We have about 39,000 books that have a publisher entry:
SELECT DISTINCT ?b ?bLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  ?b wdt:P31/wdt:P279* wd:Q571 ;
     wdt:P123 [] .
  MINUS { ?b wdt:P31/wdt:P279* wd:Q3331189 . }
}
I didn't enter all of them myself, so it seems that I'm not the only one who is making that "mistake" (if it is one).
My values come from the infoboxes of enwiki. EncycloPetey (EP) told me earlier on my talk page that there is a consensus not to put a publisher, release date or other property associated with a version/edition on an item dedicated to a book. When I asked him to show me a discussion establishing this consensus, he didn't point me to one, just telling me to come here to talk about it. I forgot the discussion on my talk page and put the publishers back this morning, but I have stopped again until we agree here on what to do.
For now, there are about 129,000 items about books and about 38,000 items about editions/versions. Many items classified as edition/version seem to be wrongly classified, such as Catholic Bible (Q591016) (or, at least, they can't have an editor, release date and other properties like that if I follow EP's logic).
Here are my questions :
1- Does the project plan to create an item for every edition/version of a book?
2- If so, can we put all the publishers of a particular book on the item dedicated to the book until all of these items have been created, or must we wait until every edition/version has been created to do so?
3- If there is a rough consensus not to put a publisher, release date and other such properties on books, why has nobody blocked publisher (P123), publication date (P577), translator (P655) and so on with a none of constraint (Q52558054) for book (Q571)?
I can compare this situation with items dedicated to films. We don't create an item for every language version of a film, or an item for every different publication date (P577) or distributor (P750) in every country. We put all this information on the item dedicated to the film. Simon Villeneuve (talk) 14:52, 24 September 2018 (UTC)
1. see Wikidata_talk:WikiProject_Books#Does_there_*always*_need_to_be_a_separate_work_and_edition_item?. There is not really consensus about this.
2. In my opinion, you can leave the statements there until an edition item is created. But I suggest that we remove publisher data on these 1300 items.
3. It is worse. There is not only a missing none of constraint (Q52558054) but the constraints on publisher (P123) even enforce that publisher can only be used on work items and not on edition items. --Pasleim (talk) 16:57, 24 September 2018 (UTC)
Ok, thank you for the link. I'll read this discussion later.
I forgot to talk about the ~31,000 items about books that have an ISBN-10 or ISBN-13.
SELECT DISTINCT ?b ?bLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  { ?b wdt:P31/wdt:P279* wd:Q571 . ?b wdt:P957 [] . }
  UNION
  { ?b wdt:P31/wdt:P279* wd:Q571 . ?b wdt:P212 [] . }
}
What do we do with them? Simon Villeneuve (talk) 18:34, 24 September 2018 (UTC)
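
Both queries above follow the same pattern (instances of book (Q571) or its subclasses that carry at least one of a set of properties, optionally excluding editions), so they can be generated from one helper. The Python sketch below only builds the query text; the function name and its parameters are hypothetical, and it does not contact the query service.

```python
def book_items_with(property_ids, book_class="Q571", exclude_class=None):
    """Build a SPARQL query for items of book_class (or a subclass)
    that have a value for at least one of the given properties.

    property_ids  -- e.g. ["P957", "P212"] for ISBN-10 / ISBN-13
    exclude_class -- optional class to subtract, e.g. "Q3331189"
    """
    if not property_ids:
        raise ValueError("at least one property ID is required")

    # One group per property, joined with UNION when there are several.
    branches = " UNION ".join("{ ?b wdt:%s [] . }" % pid for pid in property_ids)

    lines = [
        "SELECT DISTINCT ?b ?bLabel WHERE {",
        "  ?b wdt:P31/wdt:P279* wd:%s ." % book_class,
        "  " + branches,
    ]
    if exclude_class:
        lines.append("  MINUS { ?b wdt:P31/wdt:P279* wd:%s . }" % exclude_class)
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }'
    )
    lines.append("}")
    return "\n".join(lines)


# The publisher query (excluding editions) and the ISBN query:
publisher_query = book_items_with(["P123"], exclude_class="Q3331189")
isbn_query = book_items_with(["P957", "P212"])
```

The generated text can be pasted into the query service as-is; counts will of course differ from the 2018 figures quoted above.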
For these items, I believe the "correct" process is to duplicate the item, change the one with the sitelinks to a "work" and the duplicate to an "edition" and remove the incorrect statements on the resulting items. It's a lot of effort and I don't know if there is a way to automate it. Until recently, our published best practice was to use <instance of> book for work items (and "book" was still a subclass of "creative work" and "literary work" last time I looked). We haven't yet broadly communicated this change in recommended practice. This is going to be a long-term effort. - PKM (talk) 20:08, 24 September 2018 (UTC)
It depends on where the sitelinks point. If the sitelinks are to WP, then it should usually be a work, unless the Wikipedia article is about a particular edition or exemplar. If the sitelinks are to Wikisource or Commons, then you will have to look at the Wikisource page or Commons page/category to see whether that page is for an edition or a work.
But when it comes to ISBNs, any values taken from Wikipedia pages will be a mess. The Wikipedia editors routinely added ISBNs for a current edition to all manner of articles about publications, irrespective of the edition. --EncycloPetey (talk) 02:09, 25 September 2018 (UTC)
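
The "duplicate and split" process described above could be prototyped offline before anyone touches live items. The Python sketch below works on plain dictionaries of statements, not on the Wikidata API. Everything in it is an assumption made for illustration: the function name, the choice of which properties count as edition-level, and the decision to copy author (P50) to both items. It also ignores sitelinks, which per the comment above need a human look.

```python
# Properties treated as edition-level in this sketch (illustrative, not an
# agreed project list): publisher, publication date, translator,
# ISBN-10, ISBN-13, edition number.
EDITION_LEVEL = {"P123", "P577", "P655", "P957", "P212", "P393"}

# Properties copied to both items (assumption: author stays on both).
SHARED = {"P50"}

WRITTEN_WORK = "Q47461344"   # written work
VERSION_EDITION = "Q3331189"  # version, edition or translation


def split_book_item(statements):
    """Split a mixed {property: [values]} dict into (work, edition) dicts.

    The old instance-of value (e.g. book, Q571) is dropped and replaced
    by a work-level class on one side and an edition-level class on the other.
    """
    work, edition = {}, {}
    for prop, values in statements.items():
        if prop == "P31":
            continue  # replaced below
        if prop in SHARED:
            work[prop] = list(values)
            edition[prop] = list(values)
        elif prop in EDITION_LEVEL:
            edition[prop] = list(values)
        else:
            work[prop] = list(values)
    work["P31"] = [WRITTEN_WORK]
    edition["P31"] = [VERSION_EDITION]
    return work, edition


# Usage with placeholder values (not real item IDs):
mixed = {
    "P31": ["Q571"],
    "P50": ["Q_AUTHOR"],
    "P123": ["Q_PUBLISHER"],
    "P212": ["978-0-00-000000-2"],
    "P921": ["Q_SUBJECT"],
}
work_item, edition_item = split_book_item(mixed)
```

Linking the two results to each other is left out because the new edition item has no ID until it is created.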

Why author (P50) both on works and editions?[edit]

Since (I suppose) the author of the work is also the author of every single edition, wouldn't it be sufficient to use author (P50) only on work items? --Malore (talk) 14:14, 16 October 2018 (UTC)

There might be a few exceptions (for example works with multiple authors) but, more important, without the author it will be difficult to use the item in different contexts like Template:Cite Q (Q22321052). --Kolja21 (talk) 20:31, 16 October 2018 (UTC)
The three editions of Bayesian Data Analysis (Q29167237) have had slightly different authors, see [12]. When the editions are cited, it is easier if all the information is available on the edition item. — Finn Årup Nielsen (fnielsen) (talk) 00:02, 8 November 2018 (UTC)
The biggest problem with placing an author on an edition occurs when the edition is a translation. There are bots that will automatically label the translation as "edition by author" in other languages, which is incorrect for a translation. Conversations about this issue with people who run such bots have yielded no results. --EncycloPetey (talk) 00:41, 8 November 2018 (UTC)
The typical case I know is textbooks where the authors update the content through several editions; when one author dies or stops contributing, new people take over, but the original authors are still credited alongside the new ones. This is a particular case but it recurs often, and our model needs to accommodate it.
For the case mentioned by EncycloPetey, I think this is a problem of data import and not a structural model requiring a model modification. This way to import data should not be accepted and measures to avoid that are necessary. @EncycloPetey: If you find again that problem and you receive no positive feedback from bot operators, please report this here and we can launch actions as WikiProject: this can have a bigger effect especially when several persons support the bot flag withdrawal. Snipre (talk) 10:08, 8 November 2018 (UTC)
@Snipre: You mean like this one? e.g. This edit identifies a Gutenberg e-book as an "uitgave van Stephen Crane" (edition of/from Stephen Crane). --EncycloPetey (talk) 15:38, 17 November 2018 (UTC)
@EncycloPetey: No, your edit is just a confusing way to indicate that the item is about an edition, but the author is the same for the work and the edition. My case is the following:
work item
author: XX
first edition
author: XX
second edition item
author: XX and YY
And label description is not a real part of WD modelling. Snipre (talk) 13:41, 3 December 2018 (UTC)
But the description is still highly misleading. Stephen Crane was not responsible for that edition. --EncycloPetey (talk) 17:47, 3 December 2018 (UTC)
@EncycloPetey: I don't speak enough that language to clearly judge what was the sense behind that sentence. My opinion is not to consider description or label as relevant to assess an item. Do you speak that language ? Wikidata is the right structure to bypass all languages difference to use more absolute characteristics to define a concept. Snipre (talk) 19:45, 4 December 2018 (UTC)
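
The textbook case discussed in this thread (work by XX, first edition by XX, second edition by XX and YY) can be sketched in Python to show why a citation tool benefits from authors on the edition item. The dictionaries and the function name are hypothetical; the fallback to the work's authors is one possible policy, not something agreed here.

```python
# The example from this thread, as plain data.
work = {"label": "work item", "authors": ["XX"]}
first_edition = {"label": "first edition item", "authors": ["XX"]}
second_edition = {"label": "second edition item", "authors": ["XX", "YY"]}

# An edition item on which nobody has entered an author yet.
bare_edition = {"label": "edition without author statements"}


def authors_for_citation(edition, work):
    """Prefer authors stated on the edition; fall back to the work's authors."""
    return edition.get("authors") or work.get("authors", [])


# Citing the second edition credits both authors ...
second_authors = authors_for_citation(second_edition, work)
# ... while the bare edition inherits the work's author list.
fallback_authors = authors_for_citation(bare_edition, work)
```

With authors only on the work, the second edition would be cited without YY; with authors only on editions, the bare edition would have none.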

editions of a book[edit]

Editions of a book are often described in a non-numeric way (or not purely numeric). How to provide this information in a wikidata item?

Just few examples:

  1. s:en:Page:Ossendowski - Beasts, Men and Gods.djvu/8: Ninth Printing Is this equivalent to edition number (P393) = 9?
  2. s:pl:Strona:Asnyk Adam - Pisma 03. Wydanie nowe zupełne.djvu/007: Wydanie nowe zupełne (Edition new and complete); we do not know how many earlier editions were.
  3. s:pl:Strona:Józef Piłsudski - Wspomnienia o Gabrjelu Narutowiczu (1923).djvu/04: Pierwszy — dziesiąty tysiąc (First — tenth thousand); OK, here we can assume this is edition number (P393) = 1.
  4. s:mul:Page:H.M. Der Untertan.djvu/10: Dreiundachtzigstes bis neunundneunzigstes Tausend (Eighty-third to ninety-ninth thousand). Which edition?

As you can see, at least for Polish and German books of the early 20th century, editions were named according to the size of the print run. You can also find editions like "corrected", "cheap", "hardcover", etc.; the latter may vary slightly in content, not only in cover. And they definitely have different IDs in library catalogues.

Any hints? Ankry (talk) 10:25, 6 November 2018 (UTC)

My opinion:
1) Yes, use 9 as value for edition number (P393). Numbers should be preferred to allow better parsing when comparing data.
2) edition number (P393) = unknown value. There is a value but it is undetermined with the current information.
3) edition number (P393) = 1. Same as 1).
4) edition number (P393) = unknown value. There is a value but it is undetermined with the current information.
Snipre (talk) 10:09, 7 November 2018 (UTC)
1) No. The 9th printing is not the 9th edition. A printing is typically considered part of the same edition as the one it reprints, because no new editorial work has taken place. The book is reprinted from the same typesetting.
We will not always be able to describe the edition with a numerical value. In English publications, there is often a "US" and a "UK" edition published simultaneously. Sometimes the "first" edition is a translation (published in a different language) because the translation goes to press before the original. Sometimes the "first" edition is delayed and a revised edition is published before the original makes it to press. Sometimes even the scholars disagree over the numbering of editions. --EncycloPetey (talk) 12:08, 7 November 2018 (UTC)
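The two positions above can be combined into a rough sketch. It takes Snipre's suggestion of "unknown value" for undetermined edition numbers and EncycloPetey's point that a printing number is not an edition number. The class and function names are hypothetical and are not part of any Wikidata tool. The only Wikidata term used is the snak type "somevalue", which is how "unknown value" is represented in the data model. Whether example 1 should instead get the value 9 remains disputed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditionNumberSnak:
    """Simplified stand-in for an edition number (P393) statement (hypothetical)."""
    snaktype: str                 # "value" or "somevalue" (shown as "unknown value")
    value: Optional[str] = None   # kept as text here; an assumption of this sketch


def edition_number_snak(ordinal: Optional[int],
                        is_printing: bool = False) -> Optional[EditionNumberSnak]:
    """Map a title-page statement to a P393 statement, or to None for no statement."""
    if is_printing:
        # A printing number says nothing about the edition number.
        return None
    if ordinal is None:
        # An edition number exists but cannot be determined from the book.
        return EditionNumberSnak("somevalue")
    return EditionNumberSnak("value", str(ordinal))


# The four examples from the question, read under the assumptions above.
ninth_printing = edition_number_snak(9, is_printing=True)   # example 1
new_complete = edition_number_snak(None)                    # example 2
first_thousands = edition_number_snak(1)                    # example 3
later_thousands = edition_number_snak(None)                 # example 4
```

A caller would then skip the statement entirely when the result is None, rather than guessing an edition number from the printing.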

LCCN (bibliographic)

Currently, Library of Congress Control Number (LCCN) (bibliographic) (P1144) is described as an authority control for works, but the example and any links to the Library of Congress you care to examine are all for editions. Shouldn't the property restrictions be corrected to apply to editions? --EncycloPetey (talk) 02:40, 11 November 2018 (UTC)

Yes. - PKM (talk) 20:33, 11 November 2018 (UTC)

So, how do we get this fixed? --EncycloPetey (talk) 14:45, 14 November 2018 (UTC)

I've changed the constraint [13] --Pasleim (talk) 14:57, 14 November 2018 (UTC)
For instance of (P31) it still has "Wikidata property for authority control for works". Will this create a problem? --EncycloPetey (talk) 01:01, 15 November 2018 (UTC)

Reprints

Should simple reprints of a book (that have no changes in content, no change in copyright date, and no designation as a new edition) get separate items in Wikidata? If the answer is yes, does it matter if the publisher is different? For example, if Penguin Books originally publishes a book in 2015 and then reprints it in 2017 (with no changes in content except for a one-line note that it's a reprint), should each get a separate Wikidata item? For case #2, let's imagine that a modern publisher reprints the original edition of Frankenstein; or, The Modern Prometheus. Should that get a new Wikidata item? Please ping me on any response. Thanks. Kaldari (talk) 01:49, 6 December 2018 (UTC)

I bet EncycloPetey will know the answer to this! Kaldari (talk) 01:54, 6 December 2018 (UTC)
If a modern publisher reprints an old book, it will either be a new edition or a facsimile edition. And if it's a new publisher, that will necessitate a new data item because the date of publication and the publisher are different from the other edition. If it's a facsimile reprint edition, it's de facto a new edition, usually with an ISBN that the original didn't have. For example, the Methuen facsimile reprint of Shakespeare's First Folio, printed 300+ years after the original, can't be fitted into the data item for the original FF; the publisher, date, etc. are all different.
If it's a new printing of an edition, it could get a new data item, but WikiProject Books hasn't really tackled that issue yet. Thus far, we haven't had to worry about that question.
A rule of thumb is: if the data of the "new" edition / printing differs from previous editions / printings, then it needs a new data item. --EncycloPetey (talk) 02:02, 6 December 2018 (UTC)
@EncycloPetey: Here's an actual example of the problem (to prove it's worth worrying about)... Q51499531 is the original U.S. printing of a book in 1912; Q51499519 is a 1915 reprint by a different publisher, while Q51499528 is a 1916 reprint by the same publisher as the 1915 reprint. All three have absolutely identical content except for slight changes to the title page: same pagination, same typesetting, everything. Should all three of these have separate items in Wikidata? Kaldari (talk) 04:58, 6 December 2018 (UTC)
@Kaldari: A different publication date and different publishers are enough for them to be considered different editions. Pagination and typesetting can be the same while the author of the foreword, the illustrator or the number of pages differs. Just remember that an item can be used as a reference, with a page number, to source a statement, so everything should be the same, especially the page numbering, for an edition to be considered a reprint. This is not the case in your examples, so different items are required. Snipre (talk) 11:09, 6 December 2018 (UTC)
@Snipre: I believe it is the case in my examples. Judging by the linked scans at the Internet Archive, all three have the exact same pagination and content, so they are considered reprints, right? What about the case of Q51499519 vs Q51499528? The only difference between these is that one is a 1915 reprint and the other is a 1916 reprint. Otherwise, they are identical and have the same publisher, pagination, and content as each other. Should we have separate items for both? Kaldari (talk) 17:35, 6 December 2018 (UTC)
Personally, my opinion is that we should not have separate items for books that differ only in publication date (but have the same publisher, content, pagination, etc.). Some books are reprinted nearly every year and having separate items for every year creates a needless maintenance burden and makes it difficult to figure out which item is appropriate to link to. Kaldari (talk) 17:43, 6 December 2018 (UTC)
On a tangent here: How does a book printed in 1912 have an ISBN? --EncycloPetey (talk) 03:43, 8 December 2018 (UTC)

Proposal

I propose the following guidelines for editions and reprints:
Each edition of a book should have a separate Wikidata item. If the content, pagination, or publisher changes, a new item should be created for that edition. If a book is an identical reprint of a previous edition by the same publisher (with the same content and pagination) it does not need a new item.
Kaldari (talk) 01:30, 10 December 2018 (UTC)

@Kaldari: Another formulation:
Each edition of a book should have a separate Wikidata item. If the content (foreword, afterword, illustration), pagination (page number), or publication data (publication date, publication place, publisher) changes, a new item should be created for that edition. If a book is an identical reprint of a previous edition (no change in the mentioned properties), it does not need a new item.
Snipre (talk) 10:26, 10 December 2018 (UTC)
@Snipre: I like your wording, except for one detail. I don't think we should include publication date in the second sentence. Although technically the publication date data would stay the same for an identical reprint anyway (since publication date (P577) specifies "date or point in time when a work was first published or released"), it may confuse people if we say "publication date" here, since reprints by definition have new publication dates (but not new first publication dates). Kaldari (talk) 17:47, 10 December 2018 (UTC)
So, applying this to the question above, #How many edition items for On the Origin of Species ?, it seems that most of the entities mentioned in that discussion would end up with different items (creating quite a chaos!) Are there properties that could be used to group together the main different variants of the actual text (ie disregarding differences of pagination, publisher, etc), in order to bring some structure to the set? And/or that would allow one to group together the versions that are most similar to each other? Jheald (talk) 18:28, 10 December 2018 (UTC)
If there are no objections within the next couple of days, I'll add a tweaked version of Snipre's proposal to the WikiProject page so that we have some guidance on this. We can always refine it further if needed. Kaldari (talk) 19:16, 12 December 2018 (UTC)
OK to leave publication date out of the above list of properties.
@Jheald: What is the problem with having hundreds of editions for On the Origin of Species? The only difficulty is describing the editions clearly. And perhaps, as a first step, it is best to focus on the editions with the highest number of exemplars. Nobody requires us to have all editions of all books in WD. Snipre (talk) 20:08, 12 December 2018 (UTC)
It's useful to be able to identify that, although the book was published many times in many different forms, there were only quite a limited number of texts for the book -- so e.g. if somebody wants to look at how the text varied, they're not faced with a huge number of items with no guidance as to how each relates to the others. Similarly, if somebody has an exemplar of a book (or a new scan set), it is useful to be able to identify which of the other published forms it is closest to. Jheald (talk) 16:34, 14 December 2018 (UTC)
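As a rough illustration of the guideline as formulated by Snipre, with publication date left out as Kaldari suggested, the decision could be sketched like this. The record fields and the function name are hypothetical. The publisher names, places and page counts are placeholders, not the real data of the items mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditionRecord:
    """Hypothetical, simplified description of one published form of a book."""
    content_id: str         # placeholder label for text, foreword, afterword, illustrations
    page_count: int
    publisher: str
    publication_place: str
    print_year: int         # year of this printing; deliberately not compared


# Fields whose change calls for a new item under the proposed wording.
COMPARED_FIELDS = ("content_id", "page_count", "publisher", "publication_place")


def needs_new_item(existing: EditionRecord, candidate: EditionRecord) -> bool:
    """Return True if the candidate differs from the existing item in any compared field."""
    return any(getattr(existing, name) != getattr(candidate, name)
               for name in COMPARED_FIELDS)


# Shaped like Kaldari's 1912 / 1915 / 1916 case, with made-up values.
original_1912 = EditionRecord("text-A", 300, "Publisher A", "Place A", 1912)
reprint_1915 = EditionRecord("text-A", 300, "Publisher B", "Place B", 1915)
reprint_1916 = EditionRecord("text-A", 300, "Publisher B", "Place B", 1916)

new_item_for_1915 = needs_new_item(original_1912, reprint_1915)   # publisher changed
new_item_for_1916 = needs_new_item(reprint_1915, reprint_1916)    # only the year changed
```

Under these assumptions the 1915 reprint gets its own item because the publisher changed, while the 1916 reprint shares the 1915 item. The shared `content_id` label also hints at how items with the same text could be grouped, which is the need Jheald describes, although no existing property for that is implied here.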

Introduction to notability

Hi, I have never made an item about a written work, but I would like to do something for some local history books I have here in my room. I am giving them away, and I usually "wikimedianize" all this stuff before I remove it, for example by using the books as sources in some language editions. I feel it's time to also create items on Wikidata about them, but I'd like to know more about the notability guidelines before I start. I did a quick sampling, and it does not seem to me that we have a lot of detailed entries in my language (Italian). Even common books are probably missing.

So, for example, this book is used as a source on Italian Wikipedia and it's one of the books I am giving away. Are those the sort of books whose items we aim to have sooner or later? Or do we just stick to creating items that have at least one clear ID? (Some languages might be better covered in certain databases, so this could be biased, not sure.) --Alexmar983 (talk) 18:58, 7 December 2018 (UTC)

You mean La fortezza di San Barnaba, Firenze 2001? You can create an item for this edition with the properties ISBN-10 (P957) and SWB editions (P1044). Every book that is cited in Wikipedia is notable. --Kolja21 (talk) 01:40, 8 December 2018 (UTC)
Thank you, that's what I needed.--Alexmar983 (talk) 15:18, 8 December 2018 (UTC)