Talk:Q11953984

From Wikidata

Autodescription — linguistic unit (Q11953984)

description: any of a range of units of language, whether a word, phrase, clause, sentence, paragraph, whole conversation or a story, morpheme, grapheme, phoneme and syllable


German label "Spracheinheit"

I tried to find German sentences matching the definition as given in the structured data, and failed. "Spracheinheit" would rather be something like language or dialect, but that would need to be carefully checked. --Dan Polansky (talk) 11:44, 23 December 2022 (UTC)

I was inclined to agree after finding some text defining it as a "group of genetically related languages" (right now I instead find A Catalogue of Dictionaries and Grammars of the Principal Languages and Dialects of the World; with a list of the leading works of the Science of Language), but then I found Units, Text and Language: Die Natur der Spracheinheiten which seems to confirm that "Spracheinheit" has been used also in the sense "linguistic unit" we are talking about here.
This label was introduced simultaneously with the English label "language unit" on 29 May 2018, then moved to the alias column during an item merge in 2019.
I therefore think "Spracheinheit" can be returned to the alias column, precisely because it's ambiguous, and we thus want users to find any and all available matches when they search for it. --SM5POR (talk) 10:09, 3 January 2023 (UTC)
@SM5POR: Thank you. In Wiktionary, we require 3 attesting quotations, and that seems to be a nice, fairly lenient standard. Would it be too much to ask for 3 attesting quotations as evidence here? I know this is not based on any Wikidata policy. The idea is to guarantee some minimum standard of being widespread, 3 still being a very lenient standard. --Dan Polansky (talk) 14:00, 6 January 2023 (UTC)
@Dan Polansky: I'm afraid I'm not in a position to either interpret or determine Wikidata policy on source citation. Some properties come with a constraint requiring at least one reference, but this is obviously a pretty low bar. When it comes to labels, descriptions and aliases, those fields have never had their own references, and adding corresponding name (P2561) claims for each label and alias in every language for citation purposes only is probably way beyond best current practice in Wikidata (besides, I think name (P2561) is actually intended for proper nouns, not for words for items in general).
The citations you expect would fit nicely in the lexeme section, however, and I see that Spracheinheit (L890958) was actually added (by a bot? I don't know what triggered it) just a week ago. What's still missing are the individual senses. The lexeme has a reference to DWDS, but I'm unfamiliar with that source and can't tell what it really shows (I haven't explored the links provided there).
When it comes to documenting past and present language, I'm inclined to accept singular instances of words appearing in printed media as evidence of use. I have myself searched Swedish newspapers at tidningar.kb.se for various words and added links to those lexemes (see ombudsman (L239133) for examples, and maybe you have some idea of what sources could be used for ombudsman (L449949)).
In the long run, I think all labels and aliases should be matched by corresponding entries in the lexeme database. One thing I dislike about the aliases is the tendency to include not only words from the respective languages, but also graphical symbols, dingbats and smileys from the vast repository of Unicode characters (see heart (Q826930) for an example; the fact that you technically can add them doesn't mean it's a good idea; it clutters up the visual presentation and blurs the distinction between "text" and "pictures"). So what I prefer is spelled-out words (including names) only. And the source citations go to the lexemes, once they are created. But requiring three citations for a single alias? I haven't used Wiktionary and don't know on what grounds those criteria were chosen. Knowing that a word or other expression is widespread may be important to Wiktionary, but I don't think that criterion applies equally to lexemes or aliases. Better to add statistics to the lexeme showing how frequent it is than to wait until you have seen it three times before you even begin counting.
As a practical matter, requiring the citations to be entered already when a lexeme is created would seriously hamper the development of the database, I think, to the point of it effectively coming to a standstill (I recall the early years of Wikipedia around the year 2000, when the idea was for all articles to go through a review process; as a result, essentially nothing was published). By allowing the database to grow also without citations, it can be used and developed much faster, and you can still extract those portions which have valid citations if you want authoritative data only. It's a work in progress, in every possible aspect. --SM5POR (talk) 15:51, 6 January 2023 (UTC)
@SM5POR: I think the requirement of 3 quotations should only apply in cases where there is reasonable doubt. I agree that an indiscriminate requirement of attesting quotations is overkill. For quotations, one can also use Wiktionary, especially with includable words such as solid-written German compounds. --Dan Polansky (talk) 15:55, 6 January 2023 (UTC)
Given that the word "Spracheinheit" isn't merely mentioned in passing, but is part of the very title of this book, in the area of linguistics, and published in 1995, I think we may conclude that the word is well-documented in this particular sense. --SM5POR (talk) 17:32, 6 January 2023 (UTC)
It may be a bit dated, and the use in the meaning of "unity of language(s)" may be a bit more common (I'm not a linguist), but this use exists and I found three occurrences:
Thank you; it is the right kind of supporting evidence. Someone may want to create a wiktionary:en:Spracheinheit entry, with properly formatted attesting quotations; or I may eventually do it myself, being a seasoned Wiktionary editor. --Dan Polansky (talk) 07:34, 9 January 2023 (UTC)

Misuse of model item (P5869) in linguistic unit

@Dan Polansky, SM5POR: model item (P5869) is about a given item being modeled well within Wikidata, so that users can look at the model item when they want to understand which properties to use when they edit other items. It makes no sense to reference sources outside of Wikidata for this. Please remove the mistaken uses of model item (P5869). ChristianKl 10:23, 2 January 2023 (UTC)

I definitely agree, and I should clarify my involvement: I'm using the opportunity to try to improve the editor feedback mechanisms available in the form of reason for deprecated rank (P2241) and Wikimedia community discussion URL (P7930); see Talk:Q110646418 for an example of my approach. While I have edited a few of the mistaken statements (mostly to deprecate them), I have left the references in place in case Dan wants to reuse them elsewhere, once he has found some more appropriate properties. While honestly mistaken edits should normally be left with deprecated rank and a reason for deprecation to dissuade future editors from repeating the mistake, 26 deprecated statements for the same property would definitely be overkill, and I would suggest that Dan retain at most one, with the reason I have given. --SM5POR (talk) 10:48, 2 January 2023 (UTC)
Using model item is a workaround until the proposed property "superclass of" is approved, which has not happened yet. We should productively discuss what the best intermediate solution is, to serve project needs best. The information that I have entered into linguistic unit via the workaround is vital for establishing the breadth of the concept of linguistic unit. Even now, some questions remain: is a chapter of a book a linguistic unit? Is a whole book a linguistic unit? What, approximately, is the maximum linguistic unit in terms of some form of size?
I am eager to replace this workaround with a better workaround, or even with a proper solution. --Dan Polansky (talk) 13:51, 2 January 2023 (UTC)
Note to other potential readers: In a separate discussion, I have suggested using the Sandbox-Item (P369) property as a workaround for experiments pending the creation of a proposed new property. --SM5POR (talk) 08:03, 3 January 2023 (UTC)
I removed all the incorrectly used model item claims. Misusing a property to mean something else leads to bad data. ChristianKl 22:15, 28 March 2024 (UTC)

Link rot

A URL to Google Books was marked as "link rot". I unmarked it since it makes no sense: what is the chance that a Google Books link is going to rot within a decade? Not very high. Or is there a guideline that clarifies the matter? --Dan Polansky (talk) 14:30, 4 January 2023 (UTC)

Sorry for that. I intended to fix the constraint violations, including setting the language of work or name (P407) "qualifier", but when I accessed the URL I ended up with a message (from Google) telling me the indicated page didn't exist, had expired or something (in contrast, now it seems just fine). I didn't understand why and went looking for an explanatory note item, but found none closer than "link rot" and used it for lack of a more obvious action. This was when I still doubted you knew exactly what you were doing and I thought I was actually doing something constructive. I later realized my involvement was of little use and abandoned my mistaken approach. --SM5POR (talk) 10:10, 5 January 2023 (UTC)