Wikidata talk:WikiProject Manuscripts



Modelling dates and dating methods


Wikidata's designated property to state the time of creation for an entity is inception (P571). Using it in conjunction with qualifier determination method (P459) we can document the method used to determine that date. A recent succinct and yet comprehensive survey of dating methods is given by Enock Omayio, Indu Sreedevi and Jeebananda Panda in their paper "Historical manuscript dating: traditional and current trends", Multimedia Tools and Applications 83.3 (September 2022), available on ResearchGate.

Omayio/Sreedevi/Panda (2022) divide methods for Historical Manuscript Dating (HMD) into three categories:

  1. palaeographic, based on
    1. visual examination (of features such as writing style, manuscript’s layout, nature and type of writing material [support], nature of ink used, probable writing instrument [or pen] used, content of manuscript, nature of binding: based on level of expertise of the palaeographer) or
    2. meta-data (author, place of origin, content, content class, size, language used, layout [twice?]);
  2. physical, either
    1. direct (radiocarbon dating (Q173412), accelerator mass spectrometry (Q530255); deemed destructive because they use a sample extracted from the manuscript) or
    2. indirect (spectroscopy (Q483666)/spectrometry (Q65306641): mass spectrometry (Q180809), Raman spectroscopy (Q862228), infrared spectroscopy (Q70906), atomic emission spectroscopy (Q186548), X-ray spectroscopy (Q901775), surface analysis methods; deemed nondestructive and faster, can be done in situ);
  3. computer-based, either semantic-based or image-based, based on
    1. Hand-crafted features:
      1. Semantic features
      2. Structural features
    2. Deep learned features.

Omayio/Sreedevi/Panda ignore dated or datable manuscripts in their scheme as they focus on dating methods for undated manuscripts. Thus a date given in the colophon (Q372474) of a manuscript (either correctly or as subject to correction) does not feature in the examples above. Jonathan Groß (talk) 17:41, 27 October 2023 (UTC)[reply]
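The inception (P571) plus determination method (P459) pattern described at the start of this section could be checked with a query along the following lines. This is only a sketch: it assumes the Wikidata Query Service (where the wd:, wdt:, p:, ps: and pq: prefixes are predefined) and it assumes manuscripts are typed directly as manuscript (Q87167); the LIMIT is arbitrary.

```sparql
# Manuscripts whose inception statement carries a "determination method" qualifier.
SELECT ?item ?itemLabel ?date ?methodLabel
WHERE {
  ?item wdt:P31 wd:Q87167 ;        # instance of: manuscript
        p:P571 ?statement .        # the full inception statement, not only its value
  ?statement ps:P571 ?date ;       # the stated date
             pq:P459 ?method .     # qualifier: determination method
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

Going through the statement node (p:/ps:/pq:) rather than wdt: is what makes the qualifier reachable.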

"Annotated dataset[s] of scanned images of dated historical manuscripts are required to make models for dating historical manuscripts." Jonathan Groß (talk) 17:41, 27 October 2023 (UTC)[reply]

On a loosely related matter: Lake's 10-volume collection of facsimiles of dated Greek manuscripts until 1200 is available online. Jonathan Groß (talk) 14:23, 6 November 2023 (UTC)[reply]



Genealogical Chronicle of the Kings of England (Q116728094) has two sets of dimensions - closed or folded in its cover, and open or unfolded. Is there a qualifier to indicate which dimension set I am including? PKM (talk) 03:07, 31 October 2023 (UTC)[reply]

Thank you for bringing this up, this is indeed a good question regarding the many types of manuscript that are present in Wikidata.
I intended "dimensions" statements to refer to the page size. Of course, this only really applies to the codex format or single leaves. I'm not sure how to do it with scrolls, e.g. the carbonized scrolls from the Villa dei Papiri. With scraps from scrolls, the maximum extent in height and width is stated sometimes for each (numbered) individual piece, sometimes for a whole leaf as it can be "reconstructed".
For codices, there are usually two sets of dimensions by the standard of modern codicology: The height and width of the page itself as it appears in binding (half a bifolium, if you will), and the extent of the writing area (the area assigned to and used for the main text of the manuscript). Sometimes there are multiple true values for one and the same manuscript, if it contains multiple production units (i.e. is a composite manuscript). Iviron 812 (Q123196004) is such a case, where there are three originally independent production units combined into one volume. The size I have stated refers to the paper size, not the writing area. I have qualified the three sets of dimensions with applies to part (P518) (using the "work"-item as objects).
To attempt to answer your question, we definitely need a qualifier for that. Something like "applies to subject state" with the value being an item like "opened" (unfolded, unrolled) and "closed" (folded, rolled up).
As to the distinction between page size and size of writing area, I'm not sure how to model that. Maybe with a qualifier like "applies to facet" (full page, writing area)? Jonathan Groß (talk) 08:32, 31 October 2023 (UTC)[reply]
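For the existing applies to part (P518) approach used on Iviron 812 (Q123196004), a query sketch could pair the dimension values per production unit. Note the assumptions: that the dimensions are stored as height (P2048) and width (P2049) statements, and that both statements for a given unit carry the same P518 value. The proposed "applies to subject state" and "applies to facet" qualifiers do not exist and are therefore not used here.

```sparql
# Height and width of each production unit of one composite manuscript.
SELECT ?partLabel ?height ?width
WHERE {
  wd:Q123196004 p:P2048 ?heightStatement ;   # assumed: height statements
                p:P2049 ?widthStatement .    # assumed: width statements
  ?heightStatement ps:P2048 ?height ;
                   pq:P518 ?part .           # qualifier: applies to part
  ?widthStatement ps:P2049 ?width ;
                  pq:P518 ?part .            # same part joins the two values
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```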

Stating scribes: P11603 or P6819?


Looking at the ontology, I discovered transcribed by (P11603), which I didn't know. Instead, I was aware of calligrapher (P6819) and its faulty use on a bunch of manuscripts from the Dutch National Library. Perhaps we should look into whether the two properties should be merged. --Jahl de Vautban (talk) 07:06, 31 October 2023 (UTC)[reply]

Agreed. While preparing the data model, I found the two properties which were created independently of one another. calligrapher (P6819) was proposed by somebody from India to model the calligrapher (in the truest sense) of a written document. transcribed by (P11603) was created in 2022 after being proposed by somebody working on Middle English manuscripts. It is clear that the two properties have the same purpose and should be merged.
The major problem I see is one of cultural context: The terms "calligrapher" and "calligraphy" refer to an artful and aesthetically pleasing manner of writing, while "copyist" and "transcriber" (arguably) put the emphasis on the task of the scribe to produce a copy of a pre-existing work. The fact that the former (calligraphy) is often associated with handwritten documents from Oriental (Asian) countries and the latter (copying) is more common to a (medieval) European context may be of importance, I don't really know. What I do know is that there are strong feelings attached to the terms "calligrapher" and "copyist" (like with German Kalligraph and Abschreiber), as either implies an aesthetical judgement on the nature of their work and skill. What I am getting at is, if we decide to merge the properties into one and name it "copyist", we may well alienate a lot of people who use P6819 in calligraphic items that are truly deserving of the name, while on the other hand, if we call the unified property "calligrapher", we will elevate thousands of named or nameless scribes to a level that, looking at their work, they don't really deserve. (TBH some should rather be called "coprographers".)
But this is only my opinion, and it is 80% based on feeling (which is not a good argument) and 20% based on my perception that we may be dealing with differences between writing cultures that need to be taken into account.
If I were hard-pressed to suggest a course of action, it would be to merge the two properties in favour of the label calligrapher, as it is better to flatter a horrible scribe than to demote a calligrapher to copyist.
In an ideal world, I would put together an international conference with conservators and scholars from European, Hebrew, Arab, Persian, Indian, Chinese, Japanese and Mesoamerican manuscript cultures and have them exchange opinions. Jonathan Groß (talk) 08:59, 31 October 2023 (UTC)[reply]
transcribed by (P11603) is relatively new, so anything created before 2023 would have used calligrapher (P6819) for lack of anything better - I did this myself, and I have since changed those statements. I have trouble with using “calligrapher” for a 16th-century copyist of a text, and I would prefer to keep the two properties distinct, even though in some cases one might choose to enter both properties with the same value. - PKM (talk) 20:57, 5 December 2023 (UTC)[reply]

P.S. @Jahl de Vautban: Regarding your hint at a faulty use of P6819: Can you explain in more detail what you mean? I have not yet looked at that but I've noticed numerous items from Dutch collections that are modelled somewhat wonky. Jonathan Groß (talk) 09:11, 31 October 2023 (UTC)[reply]

@Jonathan Groß: I meant that the way it is used conflates the author and the copyist, e.g. on The Hague, KB : ms. 70 E 9 : 5 (Q114989552). I alerted the importer back in February but despite their assurance it didn't improve. --Jahl de Vautban (talk) 09:15, 31 October 2023 (UTC)[reply]

Aah, I see: P6819 points to authors of the texts present in the manuscript. Yikes! @Epìdosis, MartinPoulter: Is there any way to get an error report for this? Jonathan Groß (talk) 09:17, 31 October 2023 (UTC)[reply]

The way I've described it is "the P50 author is the person who created the work of literature while the P6819 calligrapher is the scribe who wrote it on this particular surface". Still, reading the above discussion my understanding of "scribe" is much closer to "copyist" than to "calligrapher". I agree that having the two properties causes confusion. I like the idea of an error report, but what is it looking for? How do we know that P6819 is being used but not to represent the scribe? MartinPoulter (talk) 09:45, 31 October 2023 (UTC)[reply]
@MartinPoulter: A possible starting point might be a query of items with calligrapher (P6819) whose targets do not have either occupation (P106) scribe (Q916292) or occupation (P106) calligrapher (Q3303330) (subclass of artist (Q483501) but not scribe (Q916292), which needs checking).
Regarding your statement "the P50 author is the person who created the work of literature while the P6819 calligrapher is the scribe who wrote it on this particular surface": I fully agree, but are you suggesting that we add P50 to manuscripts? I am strongly opposed to this, as it would bloat the manuscript items without a real need and make curating them harder. Manuscripts should refer to the works they transmit, and P50 should only be used in items on those works. Jonathan Groß (talk) 10:03, 31 October 2023 (UTC)[reply]
@Jonathan Groß: I have tried something different based on the dates, but obviously it's dependent on whether dates exist on the "scribes" (as a matter of fact it didn't detect the above example because of that):
# Manuscripts whose "calligraphers" have life dates more than 500 years apart
SELECT ?item ?itemLabel ?earliestDate ?latestDate
WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item wdt:P31 wd:Q87167 .
  {
    SELECT ?item (MIN(?date) AS ?earliestDate) (MAX(?date) AS ?latestDate)
    WHERE {
      VALUES ?lifeDates { wdt:P569 wdt:P570 wdt:P1317 wdt:P2031 wdt:P2032 }
      ?item wdt:P6819 ?calligrapher .
      ?calligrapher ?lifeDates ?date .
    }
    GROUP BY ?item
  }
  FILTER ((YEAR(?latestDate) - YEAR(?earliestDate)) > 500)
}
Second query based on occupations (many more results):
SELECT DISTINCT ?item ?itemLabel ?calligrapherLabel
WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  VALUES ?occupation { wd:Q916292 wd:Q3303330 wd:Q3997704 }
  ?item wdt:P31 wd:Q87167 ;
        wdt:P6819 ?calligrapher .
  ?calligrapher wdt:P31 wd:Q5 .
  FILTER NOT EXISTS { ?calligrapher wdt:P106 ?occupation }
}
@Jahl de Vautban: thanks for raising the problem and for the queries. I have just added copyist (Q3997704) to the allowed occupations in the second one. I agree with all the above comments; we have some relevant problems: first, transcribed by (P11603) needs to be deleted and its values moved to calligrapher (P6819) (I very much agree with all the reasoning by @Jonathan Groß:, with just one divergence: I would prefer to have "copyist" as label and "calligrapher" as alias, since having "calligrapher" IMHO could be confusing when adding a mere copyist - when adding someone who was truly a calligrapher, this could be specified with the qualifier object has role (P3831) calligrapher (Q3303330)); second, we need to eradicate the wrong uses of these properties, i.e. cases in which the value is not a copyist/calligrapher but the author of the transmitted text (as Jonathan said, the manuscript should only link to the transmitted text(s), and not directly to its author(s)) - I see the biggest problem in the import of manuscripts from The Hague: since it is impossible to convert author to text en masse, maybe the only (painful) solution is just deleting the data (opinions welcome, of course); third, we need to clean up all occurrences of author (P50) on manuscripts, nearly 3k as per the following query:
SELECT ?item ?itemLabel ?author ?authorLabel
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q87167 ; wdt:P50 ?author .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Some need to be converted to calligrapher (P6819), others fall into the second problem (so they should either be converted into the transmitted text, or just removed). --Epìdosis 14:09, 31 October 2023 (UTC)[reply]
I think the VALUES of the second query of Jahl doesn't work for some reason; I would propose this alternative:
SELECT DISTINCT ?item ?itemLabel ?calligrapherLabel
WHERE {
  ?item wdt:P31 wd:Q87167 ;
        wdt:P6819 ?calligrapher .
  ?calligrapher wdt:P31 wd:Q5 .
  MINUS { ?calligrapher wdt:P106 wd:Q916292 } .
  MINUS { ?calligrapher wdt:P106 wd:Q3303330 } .
  MINUS { ?calligrapher wdt:P106 wd:Q3997704 } .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?itemLabel
--Epìdosis 14:25, 31 October 2023 (UTC)[reply]
@Jonathan Groß: "are you suggesting that we add P50 to manuscripts? I am strongly opposed to this" I agree with your opposition; P50 should attach to the literary work. In my experience, manuscript data does sometimes include this "author" property and people try to put it into manuscript records on Wikidata. I'd rather they didn't, but if they are going to add it, I want them to use P50 to avoid confusion with scribes. MartinPoulter (talk) 14:43, 31 October 2023 (UTC)[reply]

I'm glad we're in agreement. I would also advocate removing P50 from all manuscript items, as it is hard enough to curate in the work items. Jonathan Groß (talk) 15:13, 31 October 2023 (UTC)[reply]

I have now added a constraint to author (P50) discouraging its use on manuscripts; hopefully it will be discouraging enough. But we have to (try to) solve present occurrences, sigh ... in some cases it seems that P50 has been used as equivalent to P6819, so maybe these cases can be saved after a manual analysis; for others, I tend to support mass removal, as Jonathan does. --Epìdosis 15:15, 31 October 2023 (UTC)[reply]
This change would make it difficult to group manuscripts as I have done at Wikidata:WP_EMEW/Sources/Leland. This particular project is now moribund anyway, but the ability to group manuscripts by author still seems useful to me. - PKM (talk) 00:03, 4 November 2023 (UTC)[reply]
@PKM, Epìdosis: There should be a way to do this if we combine queries for item → exemplar of (P1574) → work and work → author (P50) → author name. Is it possible? Jonathan Groß (talk) 13:04, 4 November 2023 (UTC)[reply]
Sure, taking The Itinerary of John Leland [Stow transcript] (Q105593039) as example, both author (P50) and creator (P170) should be removed; the query should be adapted to take exemplar of (P1574) into account. Of course, when removing P50 and P170, the item needs to have P1574, otherwise the connection between the manuscript and the author is effectively obliterated. --Epìdosis 13:23, 4 November 2023 (UTC)[reply]
Got it, thanks. I can make those changes, but possibly not today.  :-) PKM (talk) 20:29, 4 November 2023 (UTC)[reply]
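The chain discussed above (manuscript → exemplar of (P1574) → work → author (P50)) might be queried roughly as follows. This is a hedged sketch rather than the query Epìdosis linked: it assumes the Wikidata Query Service with its predefined prefixes, and the ordering and LIMIT are arbitrary choices.

```sparql
# Group manuscripts by the author of the work(s) they transmit,
# without any author (P50) statement on the manuscript item itself.
SELECT ?author ?authorLabel ?item ?itemLabel ?work ?workLabel
WHERE {
  ?item wdt:P1574 ?work ;                    # manuscript is an exemplar of a work
        wdt:P31/wdt:P279* wd:Q87167 .        # manuscript or one of its subclasses
  ?work wdt:P50 ?author .                    # the author is stated on the work
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?authorLabel ?itemLabel
LIMIT 500
```

To restrict the list to a single author, the ?author variable in the third triple can be replaced by that author's item.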

Disentangling P195 and P217


I have started a discussion here. Opinions and contributions are welcome. Jonathan Groß (talk) 08:26, 1 November 2023 (UTC)[reply]

Modelling 'overlapping' manuscripts


After following up on PKM's request I thought I should address the underlying issue with all of you.

As you probably realise, scholars (and consequently, holding institutions and research projects) differ in their definition as to what constitutes a manuscript; they 'overlap', if you catch my drift. The Gregory Aland list of New Testament manuscripts has a different concept from exempli gratia Pinakes (Diktyon) and the Catenae Catalogue. While the latter describe and define manuscripts as they appear in library catalogues (as they are preserved, what Erik Kwakkel calls a "usage unit"), Gregory/Aland and their successors at Münster assign their canonical numbers to manuscripts as they were conceived ("production unit" according to Kwakkel).

To give an example, a New Testament manuscript currently housed in the National Library of Greece (no. 3139) is designated as Minuscule 2936 by Gregory-Aland. Another NT manuscript in the Hungarian Academy of Sciences' Library (K 499) used to bear the Gregory-Aland number 2764 (= Minuscule 2764). But later study has revealed the Greek and the Hungarian manuscripts to originally have been part of one and the same manuscript, meaning they are dependent units. Gregory-Aland's number 2764 is therefore void and both manuscripts have the same designation in Gregory-Aland's catalog, but are still two distinct manuscripts in Pinakes' and the Catenae Catalogue's scheme.

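To deal with this I have opted for a hybrid solution. I have created three items:

  • Minuscule 2936 (Q123161850)
  • National Library of Greece 3139 (Q123348967)
  • Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035)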

I think my reasoning for this is self-explanatory, and there should be no doubt that having these three items is the best solution within Wikidata.

However, when it comes to modelling these items, I have a few qualms which I would like to hear your opinion on.

  1. The relationship between the three items. The straightforward solution is:
    Minuscule 2936 (Q123161850) has part(s) (P527) National Library of Greece 3139 (Q123348967) and Minuscule 2936 (Q123161850) has part(s) (P527) Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035)
    National Library of Greece 3139 (Q123348967) part of (P361) Minuscule 2936 (Q123161850)
    Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035) part of (P361) Minuscule 2936 (Q123161850)
  2. The collection statement. My preferred option is
    Minuscule 2936 (Q123161850) collection (P195) National Library of Greece (Q1467610) applies to part (P518) National Library of Greece 3139 (Q123348967) and Minuscule 2936 (Q123161850) collection (P195) Hungarian Academy of Sciences Library and Information Centre (Q458921) applies to part (P518) Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035)
    National Library of Greece 3139 (Q123348967) collection (P195) National Library of Greece (Q1467610)
    Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035) collection (P195) Hungarian Academy of Sciences Library and Information Centre (Q458921)
  3. The inventory number statement. I do the same as for the collection (see above).
  4. The identifiers. As our ID properties have constraints for distinct values and unique values (as they should), I do it like this:
    Minuscule 2936 (Q123161850) Gregory-Aland-Number (P1577) 2936 (without Diktyon and Catenae Catalogue IDs)
    National Library of Greece 3139 (Q123348967) Diktyon ID (P12042) 5172 and National Library of Greece 3139 (Q123348967) Catenae Catalogue ID (P12109) 4473
    Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035) Diktyon ID (P12042) 11722 and Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035) Catenae Catalogue ID (P12109) 4628
  5. The catalog codes. I add them to all of the items:
    Minuscule 2936 (Q123161850) catalog code (P528) 2936 catalog (P972) Gregory Aland list (Q122926343)
    National Library of Greece 3139 (Q123348967) catalog code (P528) 2936 catalog (P972) Gregory Aland list (Q122926343)
    Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035) catalog code (P528) 2936 catalog (P972) Gregory Aland list (Q122926343) and Hungarian Academy of Sciences Library and Information Centre K 499 (Moravcsik 12) (Q123349035) catalog code (P528) 2764 catalog (P972) Gregory Aland list (Q122926343) (deprecated rank, with reason for deprecated rank (P2241) obsolete (Q107356532))

Number 1 should be uncontroversial. Numbers 2 and 3 (if we ignore our ongoing uncertainty how to model this, see this discussion) may be an instance of unnecessary data duplication? In number 4 I have avoided this, but at the same time made it more difficult to access Diktyon along with Aland numbers. Number 5, finally, in my opinion is neither an instance of data duplication nor of conflation and should be OK.

Looking forward to your opinions. Best, Jonathan Groß (talk) 14:03, 6 November 2023 (UTC)[reply]

I’ve had a similar situation with tapestries and embroideries where a series is broken across multiple collections - example at The Story of Troy (Q77347174). I am following your steps 1 and 2. I think inventory numbers, identifiers, and catalog codes should be on the parent when they apply to the whole manuscript, and otherwise on their individual parts. What you’ve done is thorough, but as you say it’s a lot of data duplication. Open to other approaches. - PKM (talk) 23:39, 6 November 2023 (UTC)[reply]
I agree on 1) and 4) but I disagree on the other points. My reasoning is simple: as of today, Minuscule 2936 (Q123161850) doesn't exist anymore as a physical object but was instead broken into two separate parts. No institution claims to have the whole thing, and indeed adding to it the collection and inventory number would result in an institution holding two objects with the same inventory number, which shouldn't be possible, and furthermore one of which it doesn't effectively hold. What we should try to model is that Minuscule 2936 (Q123161850) is definitely broken into two parts that were once the same item and thus shouldn't show up when I query for the list of all existing manuscripts. This also makes me disagree with 5), on the ground that the number should only apply to the original whole manuscript. I get that you use qualifiers to refine the statement, but I would say that qualifiers are often difficult to query and difficult to source explicitly. --Jahl de Vautban (talk) 08:31, 7 November 2023 (UTC)[reply]

Thank you Jahl for your comment. I agree that my handling as described above is not ideal ("nicht der Weisheit letzter Schluss"). Your take on the data duplication in 2) and 3) is very illuminating for me and I will follow your suggestion. On 5) however I disagree. It is common practice in manuscript databases to assign the GA numbers even if these describe a set of items scattered across multiple holding institutions. The reason being that catalog codes need not be unique, and should be as visible and accessible as possible. This also made me wonder if we shouldn't add the Diktyon numbers both in the "parent" and "children" items (with a qualifier in the parent item like "applies to part"). But this is a matter for debate, I realise, and I'm glad we have this site now to exchange our viewpoints. Best, Jonathan Groß (talk) 14:06, 7 November 2023 (UTC)[reply]

@Jonathan Groß: it's fine, we don't have to agree on everything. More on my idea of being able to tell that a manuscript is broken in several parts: I think both the original manuscript and the resulting parts should be kept as instance of (P31) manuscript (Q87167), because they respectively were and are a manuscript; the only way to refine that would be with a qualifier, but again hard to query. Therefore, we would need another declaration for that. I wonder if we could use significant event (P793) cutting (Q196751) or significant event (P793) separation (Q3182649) or a similar value on the original manuscript. --Jahl de Vautban (talk) 19:49, 7 November 2023 (UTC)[reply]
We might also add instance of (P31) = fragment (Q11086567) to separated parts of a whole (in addition to manuscript (Q87167)). I have used this approach with textile fragments. - PKM (talk) 22:49, 7 November 2023 (UTC)[reply]
This raises the issue of what to do with manuscript fragment (Q30103158). Jonathan Groß (talk) 18:47, 12 November 2023 (UTC)[reply]
On further thought, I think I'd prefer to use manuscript fragment (Q30103158) for these - I keep forgetting that it exists. - PKM (talk) 23:04, 19 November 2023 (UTC)[reply]
Does it make ontological sense to have an item with two P31, one of which is a subclass of the other (real question)? --Jahl de Vautban (talk) 08:23, 22 November 2023 (UTC)[reply]
In my opinion, we should aim for a single P31. I am unsure how to best define this though. I would like to keep it simple (as in "all people are Q5"). But maybe it makes sense to use more specific subclasses instead of Qmanuscript (which need to be modelled sensibly of course): manuscript fragment, palimpsest, scroll, codex, dismembered codex... I am writing this from my phone BTW, hence no item links. Jonathan Groß (talk) 13:00, 22 November 2023 (UTC)[reply]
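If the significant event (P793) idea floated above were adopted, a "list of all existing manuscripts" could skip the dispersed wholes with a filter like the one below. This is purely illustrative: the use of separation (Q3182649) as the marker value is only a proposal made in this thread, not an established convention, and the LIMIT is arbitrary.

```sparql
# Manuscripts that are NOT flagged as separated into several parts.
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q87167 .                              # instance of: manuscript
  FILTER NOT EXISTS { ?item wdt:P793 wd:Q3182649 . }     # hypothetical marker: significant event = separation
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 200
```

A plain truthy statement like this is easier to filter on than a qualifier, which is the concern raised about qualifier-based refinements.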

Dealing with conflation of manuscripts and works


"Manuscripts which aren't manuscripts" was the title I wanted to give this section, but that would be confusing.

One more area we should at some point tackle is separating works from exemplars. Sometimes I come across items that claim to be about manuscripts, but on closer inspection turn out to be about works transmitted in the manuscript. For example, Cleopatra Glossaries (Q5131738) claims to be instance of (P31) codex (Q213924) and instance of (P31) composite manuscript (Q33308141) but the English Wikipedia article covers a set of texts (glossaries) transmitted in a composite manuscript. So ideally we should have a number of items: One for the manuscript, one for the set of texts, and four more for the individual texts. This is what I have done with the Berlin Chronicle (Q21100459) transmitted in Egyptian Museum and Papyrus Collection, P 13296 (Q21100575), and with the Alexandrian World Chronicle (Q21100150) transmitted in the Goleniscev Papyrus (Q21100168).

My question to you is: How can we find these cases? And how should we deal with them? Collect and discuss them individually, or solve each one as we see fit in the moment? Ideally we should have a Taskforce for that. Thanks, Jonathan Groß (talk) 17:00, 18 November 2023 (UTC)[reply]

Other examples for this kind of conflation are: Rawlinson Excidium Troie (Q7297036), Ystorya Dared (Q107550169) ... Jonathan Groß (talk) 17:57, 18 November 2023 (UTC)[reply]

I have found and cleaned up a few of these “in the moment” (Codex Huygens (Q80191660) and Le Regole del disegno (Q123458703)). I don’t think we have enough active members yet to start making task forces, but if people want one that’s great. Focused effort is good. - PKM (talk) 00:48, 19 November 2023 (UTC)[reply]
I’d like to work on the Rawlinson Excidium Troie - ARLIMA has 13 exemplars - but I can’t start on it right away. But if no one beats me to it, it’s on my list. - PKM (talk) 00:09, 22 November 2023 (UTC)[reply]
@Jonathan Groß: The following SPARQL query for Wikidata Query Service produces 958 items that are both a textual work (or one of its subclasses) and a manuscript / writing surface (or one of their subclasses):
SELECT DISTINCT ?item
WHERE { { ?item wdt:P31 / wdt:P279* wd:Q47461344 }
UNION { ?item wdt:P31 / wdt:P279* wd:Q3327760 } .
?item wdt:P31 / wdt:P279* wd:Q87167 .
}
But I think it is just the tip of the iceberg, for many more cases are not marked as such. Ailintom (talk) 08:31, 23 November 2023 (UTC)[reply]
@Ailintom: Thank you! I've gone ahead and created Wikidata:WikiProject Manuscripts/Conflation as a repository for these issues. Jonathan Groß (talk) 09:09, 23 November 2023 (UTC)[reply]
I have placed an even better version of queries on Wikidata:WikiProject Manuscripts/Conflation Ailintom (talk) 09:35, 23 November 2023 (UTC)[reply]
@Jonathan Groß:: Sorry, I made a mistake in my query. Here are better queries:
2705 items that are both a manuscript and a textual work
50 items that are both an archaeological artifact and a literary work
Some queries are too heavy for Wikidata to handle, so one has to choose narrower classes to make it work. Ailintom (talk) 09:14, 23 November 2023 (UTC)[reply]
Note that within the project Epigraphy, we took the decision to have two P31 for each item, inscription (Q1640824) + whatever archaeological item it is. As most texts are unique, it didn't make much sense to have two items for each inscription. However, currently inscription (Q1640824) is a subclass of written work (Q47461344) through epigraph (Q669777). That's not ideal and should be addressed, but that may explain some of the cases. --Jahl de Vautban (talk) 14:03, 23 November 2023 (UTC)[reply]
Hey. It's great to see this discussion happening here. The Manuscript data model on Wikidata has needed an overhaul for some time! Separating works and manuscripts (the artifact) is a great start - similar work needs to be done for other things on Wikidata, such as separating buildings and the organizations they house. It makes for much cleaner, more useful data. A while back I uploaded items for about 500 manuscripts in the National Library of Wales archive, such as Peniarth 481D (Q21541725). You'll see I also opted for 'Exemplar of' to link the mss to the works they contain. However, there are challenges with this approach. Many archives don't have data about works, and some of our mss contain hundreds of poems - each of which is a unique work, raising the question: should each poem have its own item? And if not, how can we associate the authors of the works with the manuscript? Many of our data records list a bunch of authors without specifying the works at all, but currently I'm unsure if these can be linked to the manuscripts on Wikidata at all, since a manuscript can't really have authors. Anyway, just food for thought. I look forward to taking part in the discussion. Best Jason.nlw (talk) 11:10, 20 December 2023 (UTC)[reply]

Are manuscript fragments in-scope of this project?


Hi, my institution, the Swedish National Archives, has a number of medieval manuscripts which we'd be happy to upload to Commons and represent on Wikidata. During the Reformation in Sweden many, most even, manuscripts were destroyed and fragments reused for bookbindings. These fragments represent c. 11 000 manuscripts and are held in a database by us.

So my questions are: Are fragments in-scope of this project? Are individual fragments even of interest to Wikidata or would a better focus be the original manuscripts that scholars have identified/reconstructed from the fragments? Are there any good examples of how fragments or fragments collections should/could be represented on Wikidata? DivadH (talk) 09:58, 21 November 2023 (UTC)[reply]

I think the answer is: Yes, fragments are very much within our project scope, even though we still haven't decided on clear-cut rules how to model them. I will take a closer look at the project later. Jonathan Groß (talk) 14:37, 21 November 2023 (UTC)[reply]
@DivadH: Can we set up a meeting to talk about the collaboration you suggested? I'm available from 16:00 CET during the week, and from 14:00 CET on the weekend. Jonathan Groß (talk) 09:42, 23 November 2023 (UTC)[reply]
Jonathan, my apologies for not getting back to you sooner! I fell sick the day after I wrote this message and am only now back at work. I would be happy to speak but I can't spare the time this week. Could you perhaps contact me on my work mail? david.haskiya at DivadH (talk) 07:24, 28 November 2023 (UTC)[reply]



I'm still a bit confused about using fonds (Q3052382). Should these subcollections within Árni Magnússon Institute for Icelandic Studies (Q627418) be fonds or collection? - PKM (talk) 01:02, 28 November 2023 (UTC)[reply]

@PKM: We started a discussion about that here, but so far no professional archivists have shown up to help us clear up the data modelling. As per our current data model, collection (P195) is recommended for stating the holding institutions (which is far from ideal: "collection" is ambiguous), fonds (P12095) is meant for a record set within an institution. So yes, as of now, you can use P12095 for these subcollections. Cheers, Jonathan Groß (talk) 09:47, 28 November 2023 (UTC)[reply]

Parts of parts of parts


Do we think that each of the 17 parts of Hauksbók AM 544 4to (Q123582890), as described at, needs its own “has part” item, that each of the 13 parts of the first of those 17 parts furthermore needs its own “has part”, and that the text of each of these parts needs a work or edition/translation item that it can be an exemplar of? Just want to be sure I am not overthinking this.

And even if this would be proper in an ideal world, is it less important than making basic items for the hundreds of manuscripts that have no Wikidata items at all? - PKM (talk) 05:52, 1 December 2023 (UTC)[reply]

In my opinion, you should use your best judgement. Personally I would say it's more important to describe whole manuscripts first, but what's most important is quality data in each single item. Jonathan Groß (talk) 11:28, 1 December 2023 (UTC)[reply]
I see that Strahovský kodex DG III 7 (Q123284647) is modeled with multiple "exemplar of" statements, and I think that's a reasonable compromise. - PKM (talk) 23:30, 2 December 2023 (UTC)[reply]
How is this for a standard:
  • A manuscript or codex may link to the (notable) works it contains using one or more exemplar of (P1574) statements, preferably qualified with folio(s) (P7416).
  • A manuscript should link to separated sections, leaves, illuminations, and the like using has part(s) (P527). Any part of a manuscript with its own inventory number (in the same or a different repository/collection) should have its own Wikidata item.
  • Any codicological unit of significance (or having its own Wikipedia article) may also have its own Wikidata item. These should be linked to their parents using has part(s) (P527)/part of (P361).
PKM (talk) 01:01, 8 December 2023 (UTC)[reply]
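For illustration, the proposed standard could be turned into batch statements roughly as follows. This is only a minimal sketch: it assumes QuickStatements V1 tab-separated syntax, assumes that folio(s) (P7416) takes a quoted string value, and uses placeholder item IDs that do not refer to real manuscripts or works. The helper names `exemplar_rows` and `part_rows` are hypothetical.

```python
def exemplar_rows(manuscript_qid, works):
    """Build one 'exemplar of' (P1574) row per work contained in the manuscript.

    `works` is a list of (work_qid, folios) pairs; when folios is given,
    it is attached as a folio(s) (P7416) qualifier.
    """
    rows = []
    for work_qid, folios in works:
        row = [manuscript_qid, "P1574", work_qid]
        if folios:
            # Assumption: string values are written in double quotes.
            row += ["P7416", f'"{folios}"']
        rows.append("\t".join(row))
    return rows


def part_rows(manuscript_qid, part_qids):
    """Link a manuscript to its separately itemised parts in both directions."""
    rows = []
    for part_qid in part_qids:
        rows.append("\t".join([manuscript_qid, "P527", part_qid]))  # has part(s)
        rows.append("\t".join([part_qid, "P361", manuscript_qid]))  # part of
    return rows


if __name__ == "__main__":
    # Placeholder IDs only, for demonstration.
    for line in exemplar_rows("Q1", [("Q2", "1r-10v"), ("Q3", None)]):
        print(line)
    for line in part_rows("Q1", ["Q4"]):
        print(line)
```

Whether a part gets its own item (own inventory number, codicological significance) remains an editorial decision; the sketch only covers the linking once that decision has been made.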
I see our Showcase Item Laurentianus Plutei 70.5 (Q123109695) qualifies "exemplar of" statements with page(s) (P304) rather than folio(s) (P7416). Does anyone have a strong preference? - PKM (talk) 00:17, 10 December 2023 (UTC)[reply]
To be honest, I didn't know folio(s) (P7416) existed. It's far better for codices, I think. For scrolls with only a recto/verso distinction, I would still prefer page(s) (P304). Jonathan Groß (talk) 13:03, 11 December 2023 (UTC)[reply]
That sounds good to me. - PKM (talk) 23:25, 14 December 2023 (UTC)[reply]

Classes of manuscripts


I would like your feedback on Wikidata:WikiProject Manuscripts/Data Model#Classes of manuscripts section. Initially I have phrased it like this:

The standard designation for any manuscript item is MS. instance of (P31) manuscript (Q87167). This is replaced in some cases by:

Other uses like instance of (P31) codex (Q213924), instance of (P31) book (Q571), instance of (P31) papyrus scroll (Q113016548), instance of (P31) papyrus fragment (Q95065857), instance of (P31) lectionary (Q284465) are discouraged.

But I'm unsure if my stance on discouraging instance of (P31) codex (Q213924), instance of (P31) papyrus scroll (Q113016548) and instance of (P31) papyrus fragment (Q95065857) is justified. After all, scrolls and codices are fundamentally different in their properties; their only shared material characteristics are the writing support and ink/paint. This is why I want to present a different practice to you in two variants, and discuss which one we should adopt:

A) The generic designation for any manuscript item is MS. instance of (P31) manuscript (Q87167). Whenever possible, this statement should be replaced by a more specific subclass:

B) In order to prominently feature major characteristics of the manuscript, these properties may be stated as additional P31-statements:

I'm not sure whether we should encourage the use of monomerous codex (Q123476808) and composite manuscript (Q33308141); the former is a very technical term. Bearing in mind that we haven't yet decided on a foundational model (I still favour Gumbert's), we should try not to be too specific with P31.

What do you think? Jonathan Groß (talk) 14:05, 1 December 2023 (UTC)[reply]

What is the distinction between composite manuscript (Q33308141) and codex (Q213924)? Is it just whether the item is bound or not? - PKM (talk) 23:40, 2 December 2023 (UTC)[reply]
composite manuscript (Q33308141) should rather be labelled "composite codex" as it describes a quality unique to codices: A composite manuscript (Q33308141) is a codex composed of two or more codicological units (Q123476271). The opposite term is monomerous codex (Q123476808). Jonathan Groß (talk) 10:15, 3 December 2023 (UTC)[reply]
Thank you. codicological unit (Q123476271) had no instance of (P31) or subclass of (P279). Since it has subclasses, it also needs to be a subclass. I have chosen codicological unit (Q123476271) subclass of (P279) book component (Q63285117) but feel free to change that if there's a better choice. - PKM (talk) 00:00, 4 December 2023 (UTC)[reply]
I am uncertain. Some WikiProjects insist on a single generic instance of (P31) for all items in their domain (so all people are "human" and only "human"; all paintings are "painting"). Conversely, computers, textiles, clothing, built heritage and other projects use one of many values for P31 - built heritage says "use the most appropriate type for the site"; clothing and textiles often use the classification used by the holding institution but may be more or less specific.
My personal preference is
  1. Set P31 to the "most specific subclass of manuscript (Q87167)" (which would include codex (Q213924))
  2. Do set multiple P31s as long as one is not a parent or subclass of another
I think setting P31 to a class and one or more of its subclasses is bad practice in most cases but might make sense here (see discussion at Property talk:P31/Archive#Multiple "instance of" statements in a single article from 2019).
- PKM (talk) 00:38, 4 December 2023 (UTC)[reply]
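The second rule above (multiple P31 values, none of which is a parent or subclass of another) can be checked mechanically. The following is a minimal sketch under stated assumptions: the subclass hierarchy is a small hand-written toy mapping, not the live Wikidata graph, and the function names are hypothetical.

```python
# Toy subclass-of (P279) mapping, assumed for illustration only.
SUBCLASS_OF = {
    "Q213924": ["Q87167"],  # codex -> manuscript
    "Q48498": ["Q87167"],   # illuminated manuscript -> manuscript
    "Q274076": ["Q87167"],  # palimpsest -> manuscript
}


def ancestors(cls, subclass_of):
    """Return all transitive superclasses of cls."""
    seen = set()
    stack = list(subclass_of.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def conflicting_p31(values, subclass_of):
    """Return (subclass, superclass) pairs found among the given P31 values."""
    return sorted(
        (a, b)
        for a in values
        for b in values
        if a != b and b in ancestors(a, subclass_of)
    )


if __name__ == "__main__":
    # Two sibling classes: no conflict.
    print(conflicting_p31(["Q213924", "Q48498"], SUBCLASS_OF))
    # A class together with its parent: flagged.
    print(conflicting_p31(["Q87167", "Q213924"], SUBCLASS_OF))
```

A real check would need the actual P279 graph, for example from a query over the subclasses of manuscript (Q87167).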
I agree that it is unfortunate to set multiple values for P31, especially when they are subclasses of the same class. But the only alternatives I can think of are either (1) create new items for all possible combinations (illuminated palimpsest codex fragment etc.), which is ridiculous, or (2) think of other ways to model these specific properties (i.e. ms. has characteristic (P1552) palimpsest (Q274076), ms. has characteristic (P1552) manuscript fragment (Q30103158), ms. has characteristic (P1552) illuminated manuscript (Q48498)). Jonathan Groß (talk) 09:41, 4 December 2023 (UTC)[reply]
Oh multiple P31s seems like the way to go. - PKM (talk) 20:25, 4 December 2023 (UTC)[reply]
It's an interesting question, though I think we should start with what exactly a manuscript (Q87167) is. I have come up with several dimensions, which I have ranked below. The following was written and rewritten over some hours, so it might not be all that coherent.
  • (0) At present, my understanding of the majority of the descriptions and the properties is that any handwritten document can be a manuscript (Q87167). Interestingly, the existence of manuscript fragment (Q30103158) implies that by using manuscript (Q87167) we actually mean a complete handwritten document. For the document part, we are therefore dealing with objects whose main purpose resides in the text they carry and the information they contain; a painting on paper with e.g. just single descriptive names on it wouldn't be considered a document.
  1. For handwritten, the Spanish description supplies the interesting precision that it only applies to flexible supports, and that brings up the question of the material. The existing data model puts forward parchment (Q226697), papyrus (Q125576) or paper (Q11472), but we could also think of palm-leaf manuscript (Q1641020) or silk (Q37681). The distinction from, say, inscriptions on stone or metal, ostraca or clay tablets would thus lie in the materiality of the writing support rather than the technique used to write on it. That means that an Egyptian papyrus scroll from 2000 BC, a Chinese silk scroll from the 6th century or an English parchment codex from the 12th can all in their own right be considered a manuscript (Q87167). At this point, we could consider whether we want to add made from material (P186) to hundreds of thousands of items or just create a handful of subclasses based on the material to avoid that, but papyrus fragment (Q95065857) points to the fact that those subclasses should all have their fragment equivalents, and it starts to become messy. On the question of fragments we could use state of conservation (P5816), but we need to consider the original piece and not what we have now, which I feel contradicts what I supported in the previous discussion related to manuscripts broken into separate parts.
  2. Let's go now to the form of the writing support. In my own interpretation, this will mostly be the distinction between codex (Q213924) or scroll (Q720106), though I expect there are other ways of dealing with handwritten documents to store them efficiently than rolling them or piling them up. From the previous distinction in material and form, papyrus scroll (Q113016548) appears as an oddity in uniquely combining both. For form, distribution format (P437) could be a candidate, but the descriptions currently used in several languages make me wonder if it's not restricted to works.
  3. Distinct from but related to the form is the question of the sheer number of individual folios or individual pieces a given manuscript (Q87167) has. In my understanding again, a codex can have between one and n; I don't know if a scroll (Q720106) can have more than one codicological unit (Q123476271), but surely there are examples of documents so long that they had to be written on several scrolls? This is closely related to the concept of codicological unit (Q123476271). A composite manuscript (Q33308141) could be described as a manuscript with several CUs, each with a certain number of folios. For this aspect I'm only able to find number of parts of this work (P2635), but this wouldn't be usable on manuscripts as they are not works. Not sure if something else exists.
  4. Then we have some additional properties of the manuscript (Q87167), like does it have pictures or was there another text on the same support before the current one? For this I wasn't able to come up with a good solution; presumably we should refrain from using them as P31, as that bypasses the previous three levels. has characteristic (P1552) could be a solution.
  5. Finally we have the use of the manuscript (Q87167), which would be things like lectionary (Q284465). This I think comes last, because the use of a manuscript (Q87167) is irrelevant to the previous 4 levels and indeed it could have changed over time, from manuscripts used to bind new codices to papyri used to stuff mummies. Interestingly, in this approach, a palimpsestic (?) manuscript could have different uses over time. has use (P366) could be used for this dimension.
Here are my thoughts. --Jahl de Vautban (talk) 22:36, 4 December 2023 (UTC)[reply]
A few thoughts here, by your numbering:
1. I believe our standard should be to add made from material (P186) to every item (in addition to the support, it can include ink, tempera, gold leaf etc., according to what the holding institution specifies). As far as the general hierarchy, it's instructive to look at what the Getty AAT has. They conflate the two senses of "manuscript" into one hierarchy (which I dislike) but otherwise it is similar to what we have done.
2. If we need a property for "format" I would prefer to make a new item "manuscript form" or "manuscript format".
3. I have used number of parts of this work (P2635) for number of unique texts in a codex. I use number of pages (P1104) with unit "leaf" or "folio" for the extent of the entire manuscript or codex, and folio(s) (P7416) for the specific range of folios of an exemplar. (See example at Oxford, Bodleian Library MS. Rawl. D. 893 (Q123655413).) There might be better ways to do this.
4. We might want to use has graphical element (P9344) to list things like manuscript illumination (Q8362), historiated initial (Q4924460), etc.
5. For lectionary (Q284465) and other uses, I have used genre (P136) but has use (P366) would work for me.
- PKM (talk) 03:41, 6 December 2023 (UTC)[reply]
To avoid confusion, I'll respond briefly to some points. The concept of codicological unit (Q123476271), strictly speaking, does not apply to scroll (Q720106) as scrolls are not codices. Of course, a scroll can be subject to similar interventions and alterations as a codex, but to model these with codicological unit (Q123476271) or any of its subclasses would be ontologically wrong. Jonathan Groß (talk) 12:24, 8 December 2023 (UTC)[reply]
I wasn't really implying that, but I do wonder how we would link scrolls in the case of a content so large that it was written over several of them. Anyway, I agree with PKM's choice of properties. If I wrap up, that would mean that we should either stick with manuscript (Q87167) as P31 and find a way to express that they take the form of a codex or a uolumen, or replace manuscript (Q87167) with codex (Q213924) or scroll (Q720106) as subclasses of manuscript. --Jahl de Vautban (talk) 07:57, 15 December 2023 (UTC)[reply]
I have been digging further into subclasses of manuscript (Q87167), and I now think my wording "Set P31 to the most specific subclass of manuscript (Q87167)" is completely unworkable (see the queries direct subclasses of "manuscript" and direct subclasses of "illuminated manuscript"). I now think we should have:
- PKM (talk) 23:11, 16 December 2023 (UTC)[reply]
Dear all, I am a bit late to the party, but may I add my thoughts here. My experience with (Wiki)Data is limited, so please be patient with me in this respect, but I have some experience with medieval manuscripts, and recently have tried to compare all major data models for manuscripts I could find. It struck me that they tend to follow the terminology of manuscript catalogues which in turn is informed by Western palaeography. For this reason, some classes of documents have much more prominence than others - namely codices (because so much ancient literature only survives here), fragments (because so many of the oldest books only survive as fragments) and papyri (not least because of their importance for the Greek New Testament).
In my view, this perspective also explains why there is no subclass of manuscripts to describe letters, charters, and other (unbound) documents written on one or more individual sheets, and why paper plays no big role in our debates - while such documents survive in huge masses, they are not normally described in manuscript catalogues and play little if any role in traditional palaeography.
This model made sense, as you can describe 95% of the relevant material with just three terms, but it is not very logical. It mixes format, state of preservation, and writing material. Of course a codex can be made from papyrus (not a good idea, but people have tried) and/or fragmentary, there are many parchment scrolls, etc. Also, what was relevant to traditional Western palaeography is still relevant to us, but Wikidata is surely more inclusive than Bernhard Bischoff was. Partly, these issues can easily be fixed, but partly not. Namely, I think there is a strong case to keep these things separate:
As I said, I am new here, so I certainly don't mean this as a suggestion to suddenly change long-standing practice; I am well aware how difficult it is to implement changes such as cleaning up the 68 (!) direct subclasses of "manuscript" (search). In any case, if I had to find a logical model to both reconstruct traditional scholarly terminology and include materials not relevant to these traditions, this would be my first idea. atb CRolker (talk) 19:32, 29 May 2024 (UTC)[reply]

Papyrus - material vs. manuscript


papyrus (Q125576) is papyrus the plant material. A couple of weeks ago, User:Shonagon created a well-referenced new item for "papyrus manuscript" - it's now papyrus (Q12043767) after merging with the extant item linked to CSwiki.

We have 109 items that are instance of (P31) papyrus (material). I am going to verify that these are all manuscripts or manuscript fragments and change their P31 to papyrus (manuscript). - PKM (talk) 22:54, 11 January 2024 (UTC)[reply]
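A change like this could be prepared in two steps: list the affected items, then generate the P31 swap for those that have been verified by hand. The sketch below is illustrative only. The SPARQL string assumes the usual `wd:`/`wdt:` prefixes, the rows assume QuickStatements V1 syntax (including the leading `-` for removing a statement), and the item IDs in the demonstration are placeholders.

```python
OLD_CLASS = "Q125576"    # papyrus, the plant material
NEW_CLASS = "Q12043767"  # papyrus, the manuscript

# Query listing every item whose P31 is still the material.
FIND_QUERY = f"SELECT ?item WHERE {{ ?item wdt:P31 wd:{OLD_CLASS} . }}"


def swap_rows(verified_qids):
    """For each manually verified item, remove the old P31 and add the new one."""
    rows = []
    for qid in verified_qids:
        rows.append(f"-{qid}\tP31\t{OLD_CLASS}")  # assumed removal syntax
        rows.append(f"{qid}\tP31\t{NEW_CLASS}")
    return rows


if __name__ == "__main__":
    print(FIND_QUERY)
    # Placeholder IDs standing in for verified papyrus manuscripts.
    for line in swap_rows(["Q1", "Q2"]):
        print(line)
```

The manual verification step stays essential: an item that really is about the material should keep its current P31.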

Hello PKM. I apologize. I searched but didn't notice the existing item papyrus (Q125576) and forgot to report the new item in instance of (P31) on existing items about papyrus manuscripts and thanks for doing it.
The split was necessary. There are 2 different concepts, which have 2 different Art & Architecture Thesaurus ID (P1014). It was ontologically problematic to have items with instance of (P31):papyrus (Q125576) which is a subclass of plant material. I checked all external IDs of papyrus (Q125576) one by one and those about the manuscript on papyrus and not the material papyrus were reported to the new item papyrus (Q12043767). Best regards --Shonagon (talk) 23:21, 11 January 2024 (UTC)[reply]
Thank you @Shonagon for all of this. This is perfect. - PKM (talk) 23:26, 11 January 2024 (UTC)[reply]
I have also added it as superclass of papyrus scroll (Q113016548). --Jahl de Vautban (talk) 23:27, 11 January 2024 (UTC)[reply]
The 109 papyrus manuscripts have their P31s changed. - PKM (talk) 23:41, 11 January 2024 (UTC)[reply]
Thank you all! Jonathan Groß (talk) 15:13, 11 March 2024 (UTC)[reply]

Redundant information and strict modelling


Hi y'all,

Notified participants of WikiProject Manuscripts

@Laboratoire LAMOP: created a lot of manuscript items recently and raised the question of redundancy vs. modelling. When a specific identifier exists, should we re-add the same link with the general property described at URL (P973)? (or even thrice with full work available at URL (P953)).

  • For me, the redundancy is bad and dangerous and should be avoided at all costs (just imagine updating tens of thousands of URLs vs. updating the identifier just once).
  • Laboratoire LAMOP argues that we should follow the modelling (although it's technically *not* on Wikidata:WikiProject Manuscripts/Data Model) to make the link more findable.


Cheers, VIGNERON (talk) 15:23, 16 May 2024 (UTC)[reply]

In my opinion only the ID property if it exists; I perfectly agree with your argument on redundancy. I can somehow understand the use of full work available at URL (P953) (but I would better avoid it if the ID is already present); the generic described at URL (P973), if redundant, is surely to be removed - I routinely do it myself in such cases. Epìdosis 15:26, 16 May 2024 (UTC)[reply]
BNF Manuscripts have an identifier but this is the exception. As in the dashboard, users will use P973 to find out if a manuscript has a description and BNF manuscripts that nevertheless have a description will not be included in the results.
There are currently 75,000 manuscripts with this property, its use seems widespread and should be included in the BNF manuscripts despite its redundancy.
Laboratoire LAMOP (talk) 16:00, 16 May 2024 (UTC)[reply]
I'll use the Handschriftenportal (Q120049859) as an example, because I'm more familiar with it. For the Stuttgart Psalter (Q2359574) e. g. there is a HSP-ID, which refers to a short table of basic information. Then there are two long form catalogue descriptions, which, with their bibliographical information (in many other cases they are scans of older printed catalogues), could be named with described by source (P1343). And finally there is the digitized manuscript itself, obviously a candidate for full work available at URL (P953). Both things I may want to query for: are there useful descriptions and/or are there digital copies of those manuscripts I can use for my research? HHill (talk) 16:27, 16 May 2024 (UTC)[reply]
@HHill: it seems to be a different situation with no formal redundancy. In that case, don't hesitate to indicate all the URLs (with a general property or ideally - if it fits the need - with specific identifier properties). Here the case is that exactly the same URL is generated twice on the same item, once with a general property and once with a specific property. For example, on Q125948913 you have the same URL twice (with described at URL (P973) and with BnF archives and manuscripts ID (P12207)). Cheers, VIGNERON (talk) 18:56, 16 May 2024 (UTC)[reply]
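To make "formal redundancy" concrete: a described at URL (P973) value is redundant when it equals the URL that an identifier statement already produces through its property's formatter URL (P1630), in which `$1` stands for the identifier. Here is a minimal sketch of that comparison; the formatter URL and identifier are made-up examples, not the real values for P12207, and the function names are hypothetical.

```python
def normalise(url):
    """Trim whitespace and a trailing slash so trivially different URLs match."""
    return url.strip().rstrip("/")


def expand_formatter(formatter_url, identifier):
    """Substitute the identifier for the $1 placeholder of a formatter URL."""
    return formatter_url.replace("$1", identifier)


def redundant_urls(described_at_urls, id_statements):
    """Return the P973 values that merely repeat an identifier-generated URL.

    `id_statements` is a list of (formatter_url, identifier) pairs.
    """
    generated = {
        normalise(expand_formatter(fmt, ident)) for fmt, ident in id_statements
    }
    return [url for url in described_at_urls if normalise(url) in generated]


if __name__ == "__main__":
    # Hypothetical formatter URL and identifier, for demonstration only.
    ids = [("https://example.org/ark/$1", "abc123")]
    urls = ["https://example.org/ark/abc123/", "https://other.example/description"]
    print(redundant_urls(urls, ids))
```

URLs that point to a different resource, as in the Handschriftenportal case, would not be flagged by such a comparison.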
Indeed, and in the Handschriftenportal case all the mentioned different aspects can be addressed separately via[HSP-ID].
The example you link to is certainly rather low on a spectrum from rudimentary to extremely detailed manuscript description. HHill (talk) 11:22, 17 May 2024 (UTC)[reply]
The problem is that "described at URL" does not state the URL at which the most detailed and accurate description is found, but rather an unqualified assortment of URLs. This is where ID properties come in handy, and for that reason I strongly agree with Epìdosis and VIGNERON. Jonathan Groß (talk) 16:44, 16 May 2024 (UTC)[reply]
I broadly agree in general that it is best not to have duplicate data, though I'm sympathetic with the discovery problem that specialized properties might pose. That might be mitigated by a well-organized project page like this one, but you need to be aware of it, something we can't take for granted. Does the query service allow for a query that would yield any identifier that is a subclass of Wikidata property for items about manuscripts (Q29546563) for any manuscript? --Jahl de Vautban (talk) 07:03, 17 May 2024 (UTC)[reply]
Yes, the query service allows for a query that would yield any identifier being instance of (P31) Wikidata property for items about manuscripts (Q29546563) or instance of (P31) Wikidata property related to papyrology (Q124542227) (with Wikidata property related to papyrology (Q124542227) subclass of (P279) Wikidata property for items about manuscripts (Q29546563)): the query times out in WDQS, but it works in QLever, giving more than 2.5M results. Epìdosis 20:24, 17 May 2024 (UTC)[reply]
Turns out I didn't properly think this through: a number of the properties concerned aren't in fact identifiers (like illustrator (P110) or exemplar of (P1574)), so obviously the numbers skyrocketed. When specifying only properties with applicable 'stated in' value (P9073) and items that are subclasses of manuscript (Q87167), this query yields 71900 results. --Jahl de Vautban (talk) 08:31, 18 May 2024 (UTC)[reply]
Bonus query with the actual links, though I don't understand why the results drop. --Jahl de Vautban (talk) 08:35, 18 May 2024 (UTC)[reply]
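For readers without access to the linked queries, the idea can be approximated as follows. This is a reconstruction under assumptions, not the query used above: it selects properties that are instances of (a subclass of) Wikidata property for items about manuscripts (Q29546563) and have an applicable 'stated in' value (P9073), then returns their values on items that are instances of manuscript (Q87167) or of one of its subclasses. The helper name `build_query` is hypothetical.

```python
MANUSCRIPT = "Q87167"
MS_PROPERTY_CLASS = "Q29546563"

PREFIXES = [
    "PREFIX wd: <http://www.wikidata.org/entity/>",
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
    "PREFIX wikibase: <http://wikiba.se/ontology#>",
]


def build_query(limit=None):
    """Return a SPARQL query string listing manuscript identifier statements."""
    lines = PREFIXES + [
        "SELECT ?ms ?prop ?value WHERE {",
        # Properties classed as manuscript-related ...
        f"  ?prop wdt:P31/wdt:P279* wd:{MS_PROPERTY_CLASS} ;",
        # ... that have an applicable 'stated in' value (proxy for identifiers) ...
        "        wdt:P9073 ?source ;",
        # ... mapped to their direct-claim predicate.
        "        wikibase:directClaim ?claim .",
        f"  ?ms wdt:P31/wdt:P279* wd:{MANUSCRIPT} ;",
        "      ?claim ?value .",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_query(limit=100))
```

Given the timeout reported for WDQS, running the result with a small LIMIT first, or on QLever, seems prudent.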
Notified participants of WikiProject Manuscripts @Laboratoire LAMOP: it's been almost two weeks and there seems to be an agreement, what should we do now? Can we remove all this redundant data? Can someone put it somewhere explicitly on the modeling page Wikidata:WikiProject Manuscripts/Data Model? Cheers, VIGNERON (talk) 12:44, 29 May 2024 (UTC)[reply]