Wikidata:Lexicographical data/Focus languages/Form/Malayalam

Language: Malayalam

Language details

What is the language, language family, usual scripts, where is it spoken, by how many people, and what other languages do speakers (%) of this language usually speak? (Some of this information can be found in the article list of languages by total number of speakers)

Malayalam is spoken by the Malayali people in the Indian state of Kerala and the union territories of Lakshadweep and Puducherry (Mahé district). It is one of the 22 scheduled languages of India and is spoken by 2.88% of Indians (i.e., around 34 million speakers). It is written in the Malayalam script, a non-Latin script. All the letters of the Malayalam script are also documented on Wikidata.

Current representation of this language in Wikimedia projects

Is there a Wikipedia or a Wiktionary? Is it a language in Wikidata? If yes, what are the statistics for pages in Wikipedia or Wiktionary, or for Lexemes in Wikidata? (Details are in m:Complete list of Wikimedia projects, and in the local Special:Statistics pages, and in Ordia for Lexemes.)

Malayalam (Q36236) is documented on Wikidata (with 25 external identifiers) and on 103 Wikipedias. As of 4 March 2021, there are

Current representation of this language in other sources

Is there an open corpus of text for this language? How many books are published in this language? Is this language taught in schools? Is it an official language of a country or region? (Please link to details)

Malayalam has official language status in the state of Kerala and the union territories of Lakshadweep and Puducherry. It is also one of the 22 scheduled languages of India. The language is taught in all schools in Kerala and is widely spoken in various parts of Tamil Nadu, Karnataka, and the Gulf countries. Educational institutions, especially public ones, offer both Malayalam and English as mediums of instruction. Malayalam-language newspapers are available both in print and on the internet. Many books have been published in Malayalam, including poetry and novels. A Malayalam corpus has been integrated into recent language models such as multilingual BERT[1] (also called mBERT). There are also Malayalam versions of Aspell[2] and WordNet. Further details about the language can be found on the Malayalam (English) page as well as on the മലയാളം (Malayalam) page.

Seed group of participants

Describe a bit about the seed group that wants to coordinate and actively participate. Describe its size, its current activity, and why this group will likely still exist in three years' time. Does anyone in the group know how to code? How many in the group know English? How many in the group are not living where the language is spoken, or are not native speakers?

A dedicated project called Wikidata:WikiProject Kerala has been documenting roads, railway stations, schools, administrative divisions, and local bodies (with a particular focus on Kerala) on Wikidata. The work done so far shows how new Wikidata items for these entities were created with labels in both English and Malayalam (please check the current progress here). The complete list of progress-tracking pages can be found here. Contributors to WikiProject Kerala include both native speakers living in the region and members of the diaspora (especially in places where Malayalam is not an official language).

Potential for community growth

Describe the potential for the language community to grow. Is Internet access widely available? Through which kind of devices usually? What is the literacy rate in the language community? Are there universities, vocational schools, or similar institutions, and how large are the student populations?

Kerala was the first state in India to declare access to the internet a basic right[3]. Internet penetration was 56% as reported in May 2020[4]. Malayalam speakers use both desktop and mobile devices to connect to the internet. Speakers write Malayalam online in several ways: transliterating Malayalam words into Latin letters, using keyboards with the Malayalam alphabet, and using translation services. Mobile applications that offer all these features include Android applications such as Gboard[5]. Kerala has the highest literacy rate in India, at 96.2%, as reported in September 2020[6]. A detailed list of universities and educational institutions[7] is documented here.

Openness of the existing community to innovation

If there is a Wikipedia in that language, how open has it been to Wikidata? To Article Placeholder? To bot editing and usage of modules?

In addition to documenting educational and cultural institutions on Malayalam Wikipedia, Wikidata:WikiProject Kerala contributors have also proposed and created many external identifiers specific to Kerala. Examples include Kerala state school code (P7065) for schools, Sarvavijnanakosam ID (P7821) for a Malayalam-language encyclopedia, and LSG localbody code (P8573) for local bodies. Some contributors also use automated tools such as QuickStatements, Lexeme-Forms, PetScan, and HarvestTemplates to add or import content to Wikidata.
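As an illustration of the kind of batch input a tool like QuickStatements accepts, the following is a minimal sketch that generates tab-separated label commands for one item in both English and Malayalam. It is not the project's actual workflow: the helper names `label_command` and `bilingual_label_commands` are hypothetical, the item ID is a placeholder (assumed here to be the Wikidata sandbox item), and the output format assumes the QuickStatements version 1 syntax of `item<TAB>L<language code><TAB>"label"`.

```python
def label_command(item_id: str, lang: str, label: str) -> str:
    """Build one tab-separated command that sets a label on an item.

    Assumes the QuickStatements V1 layout: item, "L" + language code,
    then the label wrapped in double quotes.
    """
    # Reject characters that would break the tab-separated, quoted layout.
    if '"' in label or "\t" in label:
        raise ValueError("label must not contain double quotes or tabs")
    return f'{item_id}\tL{lang}\t"{label}"'


def bilingual_label_commands(item_id: str, english: str, malayalam: str) -> list[str]:
    """Return the English ("en") and Malayalam ("ml") label commands for one item."""
    return [
        label_command(item_id, "en", english),
        label_command(item_id, "ml", malayalam),
    ]


if __name__ == "__main__":
    # Q4115189 is used as a placeholder item ID (assumption: sandbox item).
    for line in bilingual_label_commands("Q4115189", "Kerala", "കേരളം"):
        print(line)
```

The generated lines would be pasted into the tool's batch input; any real batch should be checked on a test item before running it against live items.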

More info

References

  1. https://github.com/google-research/bert/blob/master/multilingual.md
  2. https://launchpad.net/ubuntu/+source/aspell-ml
  3. https://www.thehindu.com/sci-tech/technology/internet/access-to-internet-is-a-basic-right-says-kerala-high-court/article29462339.ece
  4. https://www.onmanorama.com/news/business/2020/05/14/intenet-penetration-in-india-iamai-report.html
  5. https://play.google.com/store/apps/details?id=com.google.android.inputmethod.latin
  6. https://economictimes.indiatimes.com/news/politics-and-nation/at-96-2-kerala-tops-literacy-rate-chart-andhra-pradesh-worst-performer-at-66-4/articleshow/77978682.cms
  7. http://www.niyamasabha.org/codes/ginfo_1.htm