Wikidata:Property proposal/Gottstein code

Gottstein code

Originally proposed at Wikidata:Property proposal/Generic

   Done: Gottstein code (P11957) (Talk and documentation)
Description: Gottstein code as a representation of a cuneiform sign variant
Data type: String
Domain: paleographic sign variant (Q118867680)
Allowed values: ^(a[0-9]+)?(b[0-9]+)?(c[0-9]+)?(d[0-9]+)?$
Example 1: Cuneiform Sign Variant of CUNEIFORM SIGN A (Neo Assyrian) (Q119228775) → Gottstein code "a3"
Example 2: Cuneiform Sign Variant of CUNEIFORM SIGN AN (Neo Assyrian) (Q119228854) → Gottstein code "a1b2"
Example 3: Cuneiform Sign Variant GIR2 (Old Assyrian) (Q119267190) → Gottstein code "a4b1c2d2"
Source: https://www.materiale-textkulturen.de/download_werk.php?w=4000005
Planned use: Machine-readable description of cuneiform sign variants

Motivation

Cuneiform paleography varies greatly across the centuries in which the script was in use, and it is a subject of research both for philologists and for OCR applications.

Cuneiform signs consist of wedges, each of which consists of a wedge head and possibly a wedge stroke.

The Gottstein System (https://doi.org/10.6105/mtk.mtc_blog.2012.005.Gottstein) classifies wedges according to their position on the unit circle, e.g. a horizontal wedge may be called wedge type a and a vertical wedge may be called wedge type b, etc.

For a full specification of all wedge types a, b, c and d, please refer to https://doi.org/10.6105/mtk.mtc_blog.2012.005.Gottstein, Figure 1 on page 4.

A Gottstein code then expresses the number of wedges per wedge type, e.g. a2b2 (two horizontal and two vertical wedges).
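
As an illustration of how such a code could be read by a program, the following Python sketch splits a Gottstein code into wedge counts per type. It reuses the regular expression given under "Allowed values" in this proposal; the function name parse_gottstein and the choice to reject the empty string (which the pattern alone would accept) are assumptions made for this example, not part of the proposal.

```python
import re

# Pattern copied from the "Allowed values" field of this proposal.
GOTTSTEIN_PATTERN = re.compile(r"^(a[0-9]+)?(b[0-9]+)?(c[0-9]+)?(d[0-9]+)?$")


def parse_gottstein(code):
    """Return the wedge count per type, with 0 for types that are absent.

    Hypothetical helper for illustration only.
    """
    match = GOTTSTEIN_PATTERN.fullmatch(code)
    # Every group in the pattern is optional, so the empty string would
    # match; this sketch assumes an empty code is not meaningful.
    if not code or match is None:
        raise ValueError(f"not a Gottstein code: {code!r}")
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for group in match.groups():
        if group is not None:
            # group looks like "a4": wedge type letter followed by a count
            counts[group[0]] = int(group[1:])
    return counts


print(parse_gottstein("a4b1c2d2"))
```

For the code "a4b1c2d2" from Example 3 this yields four wedges of type a, one of type b, two of type c and two of type d. A string such as "b2a1" is rejected because the pattern requires the types in the order a, b, c, d.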

Apart from searching for cuneiform sign variants with the same or a similar number of wedges, Gottstein codes can be used to identify a sign variant in OCR tasks.
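
A search for variants with a similar number of wedges could be sketched as follows. The proposal does not define what "similar" means, so the distance used here (the sum of the absolute differences of the counts per wedge type) is only one possible assumption, and the helper names wedge_counts and wedge_distance are hypothetical. The three items and codes are the examples listed in this proposal.

```python
import re


def wedge_counts(code):
    """Return the counts for wedge types a, b, c, d as a 4-tuple."""
    found = dict(re.findall(r"([abcd])([0-9]+)", code))
    return tuple(int(found.get(wedge_type, 0)) for wedge_type in "abcd")


def wedge_distance(code_a, code_b):
    """Assumed similarity measure: sum of absolute count differences."""
    pairs = zip(wedge_counts(code_a), wedge_counts(code_b))
    return sum(abs(x - y) for x, y in pairs)


# Example items and Gottstein codes taken from this proposal.
variants = {
    "Q119228775": "a3",
    "Q119228854": "a1b2",
    "Q119267190": "a4b1c2d2",
}

# Rank the known variants by their distance to a query code.
query = "a2b2"
ranked = sorted(variants, key=lambda qid: wedge_distance(query, variants[qid]))
for qid in ranked:
    print(qid, variants[qid], wedge_distance(query, variants[qid]))
```

With the query a2b2, the variant coded a1b2 is ranked first (distance 1), followed by a3 (distance 3) and a4b1c2d2 (distance 7). Since the same code can stand for differently arranged signs, such a ranking narrows down candidates rather than identifying a single sign.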

Gottstein codes may also be used as classification targets in machine learning settings. – The preceding unsigned comment was added by Situxx (talk • contribs) at 19:01, 12 June 2023 (UTC).

Discussion

  •  Comment I only did a year of Sumerian ten years ago; how widespread is the Gottstein system? --Jahl de Vautban (talk) 17:42, 20 June 2023 (UTC)
    Thank you for the comment!
    Some publications apply the Gottstein system, e.g. (https://www.academia.edu/download/41598995/Panayotov_2015_Gottstein_System_on_a_Dig_M_and_N-Ass_Pal_CDLN_2015_17.pdf), but within the Assyriology community the system is not widely used.
    That being said, digital paleographies of cuneiform text corpora are also not (yet) widely applied, and this is a gap in the field.
    However, for computer science, Gottstein codes are very interesting as classification targets for machine learning applications which target cuneiform sign recognition.
    While Gottstein codes do not describe a cuneiform sign uniquely (the same Gottstein code may map to a set of signs that have the same number of wedges per type but arrange them differently), the number of wedges per type alone is important information when classifying cuneiform signs as clusters of wedges.
    If you want to describe cuneiform sign shapes uniquely, that might be more achievable with a PaleoCode (https://academic.oup.com/dsh/article/36/Supplement_2/ii127/6421811) (which could be another property proposal in the future).
    We are currently working on creating a digital paleography of cuneiform signs using Wikidata, which you can see e.g. here:
    https://situx.github.io/paleordia/c/?q=Q87555087&qLabel=%F0%92%80%AD
    You will see that the shape of cuneiform signs varies a lot throughout the centuries, so that one should not try to classify a cuneiform sign with training data from another epoch.
    Here, the Gottstein Code and maybe also a PaleoCode property would come in handy for computer scientists to retrieve machine-readable cuneiform sign variant information for machine learning tasks and for Assyriologists to have a centralized place to discover paleographic sign variants. Situxx (talk) 09:54, 21 June 2023 (UTC)
  •  Support While it is true that the Gottstein system is not widely adopted by Assyriologists, I have the feeling it is primarily due to the lack of computational savvy among those working with paleography. This proposal makes a clear argument for the advancement of computational paleographical analysis in terms of a more specific way to classify sign variants in a machine-readable format. The Gottstein system is a step in the right direction, and would allow for a more descriptive annotation to accompany each sign on a tablet, which in turn could provide machine learning methods for handwriting detection. This would hopefully provide the necessary avenue for the adoption of a PaleoCode as mentioned above. Admndrsn (talk) 17:40, 24 July 2023 (UTC)