Topic on Wikidata talk:Wikidata Lexeme Forms

عُثمان (talkcontribs)

@Lucas Werkmeister: I have created noun templates for Hindustani (Hindi/Urdu), following a format similar to the Hindko ones, at Wikidata:Wikidata Lexeme Forms/Hindustani. I have been reconsidering the pros and cons of having separate templates for different scripts, and I think at this point the benefits of separate templates outweigh the effort needed to keep the representations in sync. Using edit mode, the Urdu templates could be used to add forms to lexemes that have only the Hindi representations, and vice versa, and I now have some tools to help automate the script conversions. If these work out, I can add some more Hindi/Urdu and Punjabi templates which are set up similarly.

The gender features are marked as optional just to account for the fact that gender statements and features are missing on a lot of Hindustani lexemes, so Lexeme Forms could help fill those out.

عُثمان (talkcontribs)

@Lucas Werkmeister: The templates for all major parts of speech in Hindustani in both scripts/registers are now ready and on the page above.

The Hindi translation of the Lexeme Forms messages was already completed by someone, and I can finish the transcription of those for Urdu in short order.

(Sorry for the extra ping, I am often unsure of what messages result in a notification on here. Whenever you have a chance to take a look.)

Lucas Werkmeister (talkcontribs)

I see, interesting approach… I think this is doable, but since the index page (toolforge:lexeme-forms) groups templates by language code, they’ll be separated there: one group for اردو (ur), and one group for हिन्दी (hi). (I’m not sure if the blocks would be adjacent or not… so far, the groups are sorted by language code, so hi and ur would be quite a bit apart – but I could also put them next to each other, I think.) Does that sound okay?

عُثمان (talkcontribs)

If they can be placed next to each other, that would be perfect - I think it would make sense to place them in the alphabetical position of "hi" since this is what the joined language name also starts with.
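One way the ordering described above could work, as a minimal sketch only: it assumes the index sorts its groups by language code, and the `SORT_ALIASES` mapping and `group_sort_key` function are hypothetical names, not taken from the Lexeme Forms source.

```python
# Hypothetical sketch: sort template groups by language code, but file
# "ur" at the alphabetical position of "hi" so the two groups stay adjacent.
SORT_ALIASES = {"ur": "hi"}  # assumption: ur borrows the sort position of hi


def group_sort_key(language_code: str) -> tuple[str, str]:
    # Primary key: the aliased code; secondary key: the real code,
    # which keeps "hi" ahead of "ur" within the shared position.
    return (SORT_ALIASES.get(language_code, language_code), language_code)


codes = ["ur", "de", "pa", "hi", "en", "hu"]
ordered = sorted(codes, key=group_sort_key)
print(ordered)  # ['de', 'en', 'hi', 'ur', 'hu', 'pa']
```

With a plain sort by code, `ur` would land after `pa`; the alias only changes where the group is filed, not its own code.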

Lucas Werkmeister (talkcontribs)

Regarding the template identifiers, I think it would make more sense to put the language code at the end (e.g. hindustani-noun-masculine-ur) – I think that would match the “more general parts first” guideline that I generally follow for the identifiers, for example:

  • hindustani-noun-masculine-hi
  • hindustani-noun-masculine-ur – still a masculine noun in Hindustani, only the language code changed
  • hindustani-noun-feminine-hi – still a noun but now feminine
  • hindustani-adjective-red-hi – still Hindustani but no longer a noun

Though it’s not totally clear (hi/ur is a bit more “orthogonal” to the other “dimensions” than usual, perhaps). But on a more practical note, it might make editing the URL easier to toggle between the Hindi and Urdu templates ^^ [edit: What do you think?]
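To illustrate the "more general parts first" ordering and the URL-toggling point, here is a small sketch. The helper names `make_identifier` and `toggle_code` are made up for this example and are not part of the tool.

```python
# Hypothetical helpers illustrating the "more general parts first" ordering.
SCRIPT_CODES = ("hi", "ur")


def make_identifier(language: str, *parts: str, code: str) -> str:
    # Join the parts from most general to most specific, language code last.
    return "-".join([language, *parts, code])


def toggle_code(identifier: str) -> str:
    # Swap a trailing -hi for -ur or vice versa; anything else is rejected.
    stem, _, code = identifier.rpartition("-")
    if code not in SCRIPT_CODES:
        raise ValueError(f"no hi/ur suffix on {identifier!r}")
    other = "ur" if code == "hi" else "hi"
    return f"{stem}-{other}"


print(make_identifier("hindustani", "noun", "masculine", code="hi"))
print(toggle_code("hindustani-noun-masculine-hi"))
```

With the code at the end, switching between the two scripts only ever touches the last component of the identifier.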

And a more straightforward thing: the last two forms (the optional ones) in the Hindi version of the feminine noun template have masculine (Q499327) instead of feminine (Q1775415) as the grammatical feature, is that intentional or a copy+paste mistake? (I suspect the latter, since the Urdu version has feminine (Q1775415) for all six forms.)
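A check like the following could flag this kind of mismatch automatically. It is only a sketch: the dictionary layout and the `forms_missing_feature` helper are assumptions for illustration, and the form labels are placeholders rather than the real ones from the template. Only the two item IDs come from the discussion above.

```python
MASCULINE = "Q499327"
FEMININE = "Q1775415"


def forms_missing_feature(template: dict, required: str) -> list[str]:
    # Return the labels of all forms that lack the required grammatical feature.
    return [
        form["label"]
        for form in template["forms"]
        if required not in form["grammatical_features_item_ids"]
    ]


# Made-up miniature of a feminine noun template with six forms, where the
# last two carry the masculine item by mistake (placeholder labels).
feminine_noun_template = {
    "forms": [
        {"label": f"form {n}", "grammatical_features_item_ids": [FEMININE]}
        for n in range(1, 5)
    ]
    + [
        {"label": f"form {n}", "grammatical_features_item_ids": [MASCULINE]}
        for n in range(5, 7)
    ]
}

print(forms_missing_feature(feminine_noun_template, FEMININE))
# ['form 5', 'form 6']
```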

عُثمان (talkcontribs)

That makes sense regarding the language codes; I have changed the identifiers accordingly. And yes, that was a copy-paste mistake on the Hindi feminine vocative forms, thanks for catching that.

Lucas Werkmeister (talkcontribs)

Alright, thanks! Then I’ll start transcribing the templates (and hopefully not notice anything else amiss).

Lucas Werkmeister (talkcontribs)

So far it seems to me like the identifier might as well use “declinable”/“indeclinable”/“comparable” – or are there going to be more adjective templates?

عُثمان (talkcontribs)

It's more common within in-language sources, but these categories are also brought up in Hindi: An Essential Grammar. The names are sort of idiomatic in that they use an adjective of the type they refer to - the common word for red "lal" is invariant, while the word for black "kala" changes for gender, number, and case. Grammars of the other northwestern Indic languages like Punjabi, Hindko, Saraiki, etc. have adopted the same pattern for adjective naming so it is also consistent with that (Saraiki for example has "unfast" adjectives which change for gender but not number, named for an adjective for unfast dyes which does this.)

The reason for not using a more generic label like declinable/indeclinable is there are other types of declinable adjectives which decline along a pattern other than that of "black" (changing just for case and nothing else, for example). There are too few of these in common use in Hindustani to warrant their own templates however.

عُثمان (talkcontribs)

I haven't created a lexeme for the Hindustani cognate yet, but this would be an example of what I mean by an adjective which declines along a different paradigm - ਬਾਕੀ/باقی (L1037483). ਹੱਬਾ/ہبّا (L985132) has the same number of forms but the endings are different. So I think these types of adjectives are too idiosyncratic to have their own templates.

Lucas Werkmeister (talkcontribs)

Google isn’t letting me read this book (“You have reached your viewing limit for this book”, even in a private window), so I’ll have to take your word for it :D thanks! I wasn’t too fazed by red/black (I think I’ve seen something similar in templates for another language, though I don’t remember which one), but handsomest felt a bit stranger ^^

عُثمان (talkcontribs)

That one can be changed to comparable if that sounds less silly, there isn't a competing comparable adjective paradigm so I only chose that one to match the scheme of red / black.

Lucas Werkmeister (talkcontribs)

It’s alright… but why “handsomest” and not just “handsome”?

عُثمان (talkcontribs)

Part of why the color names are used for the other adjectives is that it is apparent from the label what type it is - so for "black adjective" it becomes "kali sifat" if we use "sifat" a feminine word for adjective, or "kala gun" using a masculine word. "Handsomest" indicates the existence of a superlative form in the same way, and English has comparable adjectives but not gendered ones so it is possible to carry that idea over in the calque.

Lucas Werkmeister (talkcontribs)

Alright, that sounds sensible enough to me. Thanks!

Lucas Werkmeister (talkcontribs)

The templates for nouns, adjectives and adverbs should be deployed now – the verbs will need more time, I’m afraid (I’ll try to start them tomorrow but might need more than one evening to finish them).

عُثمان (talkcontribs)

Fantastic, thank you - it took me several evenings to assemble the verb templates, so that is understandable

عُثمان (talkcontribs)

I've made a couple adjustments to these templates - User:Mahir256 pointed out I had used two different spellings of "Hindustani" in Devanagari without realizing it, so I have updated them all to हिंदुस्तानी which seems to be the most common spelling in writing as opposed to हिंदुसतानी (the more common spelling apparently differs from the Urdu spelling and from the pronunciation, but that seems to be expected in this context). I have also updated the items for the transitivity values on the verbs as User:Nikki has started making an effort to shift all of these to new items describing transitivity as a property of verbs instead of the items for the types of verbs themselves.

Lucas Werkmeister (talkcontribs)

I think some of the verb template identifier components should perhaps be switched around too:

  • hindustani-verb-basic-transitive-ur → hindustani-verb-transitive-basic-ur
  • hindustani-verb-additive-transitive-ur → hindustani-verb-transitive-additive-ur
  • hindustani-verb-causative-ur is fine, but mentioning for context
  • hindustani-verb-double-causative-ur → hindustani-verb-causative-double-ur

This way, the first two templates still share the property of being transitive, and the last two also have the causative element in common. What do you think? (And the same would apply to the hi templates too, of course.)

عُثمان (talkcontribs)

So it's slightly counter-intuitive, but the "additive transitive" phase is actually a property of intransitive verbs - this "verb phase" model comes from John Beames's work A Comparative Grammar of the Modern Aryan Languages of India (Q113330708). There is a regular pattern in Hindustani (and Punjabi and the other northwestern Indic languages) where most intransitive verbs can be made to take additional objects. (The inverse used to be true in Hindustani too, but this feature has been dropped over time - the Punjabi templates I am working on also have the "subtractive" phases which allow intransitive forms of transitives, or even avalent forms of intransitives.) The "basic" phase describes the base form of the verb (intransitive or transitive). The "additive" phases then describe extensions of either type of base. If written out fully, the categories would be:

  • verb-basic-intransitive
  • verb-basic-transitive
  • verb-additive-transitive
  • verb-additive-causative
  • verb-additive-double-causative

However, since the intransitive phase can only be a "basic" form, there is no need to distinguish this, and likewise causative forms are only "additive" as there are no verbs with a causative base form. Only the transitive extension of intransitives needs to be distinguished from the "base" transitive template. The idea behind these templates is that it is a lot easier to enter the forms for these if broken up into smaller chunks rather than having one massive template with ~200+ fields, especially since not every verb goes up to double causative - the additive-transitive and causative / double-causative template would be used in edit mode to add these forms to existing lexemes. I think it would be fine to switch the order for double causative to causative double, but the others seem like they might cause additional confusion in this context.

If it would be helpful to look at an example of what I mean, गड़ना/گڑنا (L991835) is a fully modeled Hindustani intransitive verb, and on the talk page of that lexeme there is an explanation of how this verb phase information can be used in the context of larger verbal expressions.

Lucas Werkmeister (talkcontribs)

Okay, I see… tbh, I think I’d prefer to include the “redundant” parts in the identifier in that case? As in your fully written out version, except probably still with causative-double too. I feel like this makes the relationship between the templates a bit clearer. But I’m also okay with keeping the current identifiers if you prefer that.

عُثمان (talkcontribs)

OK, that's fine with me; I have updated the identifiers accordingly. Broadly speaking, the two types of template available can be grouped as the "basic" ones and the "additive" ones so I can see how that would be less confusing.
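For reference, the fully written-out scheme with causative-double, across both language codes, can be enumerated with a short sketch like this. The `PHASES` and `CODES` names are only for illustration; the components themselves are the ones discussed above.

```python
from itertools import product

# Verb phases, each written from the more general part to the more specific.
PHASES = [
    ("basic", "intransitive"),
    ("basic", "transitive"),
    ("additive", "transitive"),
    ("additive", "causative"),
    ("additive", "causative", "double"),
]
CODES = ["hi", "ur"]

identifiers = [
    "-".join(["hindustani", "verb", *phase, code])
    for phase, code in product(PHASES, CODES)
]

for identifier in identifiers:
    print(identifier)
```

This gives ten identifiers, from hindustani-verb-basic-intransitive-hi through hindustani-verb-additive-causative-double-ur, with the basic and additive templates each grouped together.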

Lucas Werkmeister (talkcontribs)

Alright, thanks! The basic-intransitive templates are up now (hi, ur), the rest will follow.

Lucas Werkmeister (talkcontribs)

basic-transitive also up now (hi, ur).

Lucas Werkmeister (talkcontribs)

additive-transitive too (hi, ur). I’m starting to figure out how to efficiently transcribe them ^^

عُثمان (talkcontribs)

Nice, I appreciate your efforts on this since I know these are a lot. So far these have worked great

عُثمان (talkcontribs)

In trying these out some more, I noticed I propagated a slightly hairy mistake across Form 30 specifically in each of the Urdu verb templates. I've updated the template wikitext, but to highlight the changes, these are the corrected strings which should be in the brackets in the example sentences for Form 30 on each of them:

  • basic-intransitive:
    • پھَیلےگی
  • basic-transitive:
    • دھارےگی
  • additive-transitive:
    • پھَیلائےگی
  • additive-causative:
    • کھِلائےگی
  • additive-causative-double:
    • کھِلائےگی

I don't remember what I did that might have resulted in this, but thankfully it looks like there are no issues with any of the other forms or in the Hindi versions of this form upon second review.

Lucas Werkmeister (talkcontribs)

Alright, that fix should be deployed now. Thanks!

Lucas Werkmeister (talkcontribs)

additive-causative also done (hi, ur).

Lucas Werkmeister (talkcontribs)

additive-causative-double also done (hi, ur) – that should be everything now \o/

عُثمان (talkcontribs)

Amazing, thank you! This opens up a lot of possibilities.

عُثمان (talkcontribs)

I spotted another typo I made - on `hindustani-verb-basic-intransitive-ur` Form 1, the bracketed example should be پھَیلنا (I just fixed this on the wiki page).

Lucas Werkmeister (talkcontribs)

Alright, the fix should be deployed now.

Reply to "Hindustani templates ready"