Wikidata:Lexicographical data/Focus languages/Form/Russian

Language: Russian

Language details

Russian (Q7737) is an East Slavic language of the Indo-European family. It uses the Cyrillic alphabet with 33 letters. Russian is the state language of Russia and Belarus, an official language of Kazakhstan, Kyrgyzstan and Abkhazia, and has some status in several other countries (see full list). It is the native language of 153.7 million people and a second language of another 104.3 million, which places it 8th in the list of languages by number of speakers. You can meet a Russian speaker in almost any place on Earth (and, of course, outside it).

Russian has been ranked the second most used language on the web[1].

Russian is a synthetic language with a single strict set of orthographic rules, which helps mutual understanding all over the world. The last major spelling reform took place in 1918 and removed some obsolete letters and rules.

Old Russian is much older than that spelling reform. For Russian written between 1708 and 1917, the IETF language tag would be ru-petr1708.
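
As an illustration of how such a tag breaks down, here is a minimal sketch that splits a tag of the shape language[-variant] into its parts. The function name parse_simple_tag is hypothetical, and the sketch deliberately ignores script, region and extension subtags, so it is not a full IETF language tag parser.

```python
def parse_simple_tag(tag):
    """Split a tag of the shape language[-variant] into its parts.

    Simplification (assumption): script, region and extension subtags
    are not handled; everything after the first hyphen is treated as
    the variant.
    """
    language, _, variant = tag.lower().partition("-")
    return {"language": language, "variant": variant or None}


# Pre-reform spelling carries a variant subtag, modern Russian does not.
parsed = parse_simple_tag("ru-petr1708")
modern = parse_simple_tag("ru")
print(parsed)
print(modern)
```

A tool that handles spelling variants could use the variant part to keep pre-reform and modern spellings of the same word apart.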

Current representation of this language in Wikimedia projects

Wikipedia has more than 1.7 million articles (10th place).

Wiktionary has more than 1.1 million articles (4th place in total pages, ~2nd place in number of lemmas[2]).

Wikisource has more than 0.5 million pages (10th place).

There are also Russian editions of Wikinews, Wikiquote, Wikivoyage, Wikibooks and Wikiversity.

In the Lexeme namespace of Wikidata, Russian is the most represented language (1st in total number of Lexemes, 2nd in Forms, 9th in Senses), which is the result of a very elaborate import of free data from the Russian Wiktionary. The imported data covers more than 50% of text tokens, which makes it possible to roughly represent any text.
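
Token coverage of this kind can be estimated as the share of word tokens in a text that match a known form. The following is a minimal sketch under stated assumptions: the function name token_coverage, the sample sentence and the small set of forms are all hypothetical, tokenization is a simple letters-only regular expression, and matching is case-insensitive. How the figure above was actually measured is not stated here.

```python
import re


def token_coverage(text, known_forms):
    """Return the share of word tokens in `text` found in `known_forms`."""
    # Runs of Unicode letters count as tokens; digits and punctuation are skipped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = {form.lower() for form in known_forms}
    covered = sum(1 for token in tokens if token in known)
    return covered / len(tokens)


# Hypothetical sample data: 7 tokens, 4 of them present in the form set.
sample_text = "Мама мыла раму, а папа читал газету."
sample_forms = {"мама", "мыла", "раму", "папа"}
coverage = token_coverage(sample_text, sample_forms)
print(f"{coverage:.0%} of tokens covered")
```

In practice the set of known forms would come from the Forms stored in the Lexeme namespace, and the text from a corpus.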

Current representation of this language in other sources

There are several curated corpora, two of which are worth special mention.

The National Corpus of Russian is a large, curated, tagged, multi-genre corpus of modern and old Russian. In total it contains 600 million words.

OpenCorpora is an open-source curated corpus of texts under the CC-BY-SA license. It contains ~2 million words.

List of other corpora: https://ruscorpora.ru/new/corpora-other.html.

Virtual Language Observatory: https://vlo.clarin.eu/?1&fq=languageCode:code:rus&fqType=languageCode:or

The large Russian Wikisource can also serve as a free corpus.

Russian is very well studied and described. It is taught in all schools in Russia, and there are many official school and university textbooks.

Seed group of participants

A possible list of participants is:

All have extensive experience in the Wikimedia-verse, a record of long-term activity, and a passion for developing new tools and adapting traditional ones for describing and using language. All are native speakers and live among native speakers. All can also communicate in English. All have some coding experience.

We have several online communities (Discord, Telegram channels/chats, linguistics forums and so on) for consulting and collaborating on Russian linguistics and various kinds of wiki coding, so any technical issue has a good chance of being resolved.

Potential for community growth

Internet access is widely available (~100%), both fixed and mobile (the latter is constantly growing). The literacy rate is ~100%. There are many universities, and ~50% of the population has higher education.

Openness of the existing community to innovation

Russian Wikipedia is one of the main collaborators with Wikidata, providing new tools for it (see WEF, Infobox editing) and using data from it. It is quite open to experiments, but Article Placeholder was not well received. Bot editing is allowed in a controlled way, so that bots do not account for an overwhelming share of edits. Modules are actively being created and used.

  1. http://w3techs.com/blog/entry/russian_is_now_the_second_most_used_language_on_the_web
  2. Triage to estimate a number of lemmas across language versions