I have done very little on Wikidata in the last few weeks. I hope to resume a more normal level of activity soon. My first priorities will be:
- Update read_items.c so the new 2013-09-10 database dump can be read.
- Update the regular database reports from the database dump
- Update countrymerge.c, numbermerge.c, intervalmerge.c so the output is sorted, making it easy to see in a page diff which items on the lists are new
- Update numbermerge.c so Roman numerals are recognized
- Do other requests for new lists
- Prepare to move the database reports from userspace to the Wikidata namespace.
Byrial (talk) 06:24, 12 September 2013 (UTC)
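As an illustration of the Roman numeral item above, here is a minimal C99 sketch of how a link title could be tested for being a Roman numeral. It is not taken from numbermerge.c; the function names (roman_digit, to_roman, parse_roman) and the 1..3999 range are assumptions made for this example. The sketch accepts only canonical upper-case numerals by converting the computed value back to Roman form and comparing it with the input.

```c
#include <string.h>

/* Value of a single Roman digit, or 0 if c is not one. */
static int roman_digit(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;
    }
}

/* Write the canonical Roman form of n (1..3999) to buf.
 * buf must have room for at least 16 bytes. */
static void to_roman(int n, char *buf)
{
    static const int values[] =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *const symbols[] =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof values / sizeof values[0]; i++) {
        while (n >= values[i]) {
            strcat(buf, symbols[i]);
            n -= values[i];
        }
    }
}

/* Return the value of the Roman numeral s (1..3999),
 * or 0 if s is not a canonical upper-case Roman numeral. */
int parse_roman(const char *s)
{
    size_t len = strlen(s);
    int total = 0;

    /* The longest canonical numeral up to 3999 has 15 characters. */
    if (len == 0 || len > 15)
        return 0;

    for (size_t i = 0; i < len; i++) {
        int v = roman_digit(s[i]);
        if (v == 0)
            return 0;
        int next = (i + 1 < len) ? roman_digit(s[i + 1]) : 0;
        /* A smaller digit before a larger one is subtracted. */
        total += (v < next) ? -v : v;
    }
    if (total < 1 || total > 3999)
        return 0;

    /* Reject non-canonical spellings such as "IIII" or "IM". */
    char canon[16];
    to_roman(total, canon);
    return strcmp(canon, s) == 0 ? total : 0;
}
```

With this sketch, parse_roman("XIV") gives 14, while "IIII", "IM" and lower-case "xiv" are rejected. Words such as "MIX" or a single "I" are also valid numerals, so a real title comparison would probably need some context before treating a title as a number.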
My analyses of database dumps
Status: The latest complete database dump of Wikidata is from 2013-09-10. I need to make some program changes before reading it because of changes in the data format. I hope to do that soon.
- /Language statistics for items – Labels, descriptions, aliases and sitelinks per language
- /Property statistics – List of all properties with language and usage statistics
- /Statement statistics – Various statistics about statements
- /namespace statistics – Overview of which namespaces the sitelinks point to
- Taxon analyses:
- /Deleted item values – Deleted items which are used as value in claims
- Family relations and sex:
- /Conflicting or missing sex – Conflicting or missing sex of items used as value for a sex-specific property
- /Uncles – Test if uncle/aunt statements can be derived from father/mother + brother/sister statements
- /Use of he and she in enwiki – Distribution of he/she in English Wikipedia articles about persons of known sex
- /Selflinks – Items linking to themselves in claims, qualifiers or sources
- Items with only one language link
- /Class-type conflict – Items which are a subclass, but not of same type as the superclass
- /Used-disambiguations – Disambiguation items used in claims, qualifiers or sources
- /Pairs – Items which may be pairs of named persons
- /Bad time values – Statements with malformed timevalues
- /P107 – Statistics for property 107 (GND maintype)
- /Identical dates of birth and death – Items with identical dates of birth and death
- /namespace list – Items with sitelinks to talk, user, file, MediaWiki and special pages.
- /Most-aliases/fi – Items with aliases in Finnish.
- /Globes – Globes used in values for coordinate properties.
- Items which can perhaps be merged. Feel free to merge if appropriate, and to ask for more language combinations.
- /numbermerge – Lists of items found by comparing link titles with numbers in two specific Wikipedias.
- /countrymerge – Lists of items found by comparing link titles with country and continent names in two specific Wikipedias.
- /commonsmerge – Lists of items with same Commons category and no links to the same languages.
- /projectmerge – Different items with identical links to Wikipedia and Wikivoyage for same language.
- /Category+name merge – Lists of items with links to pages with the same place in category subtrees in two specific Wikipedias.
- /two substring merge – Lists of items with links with two common substrings in a pattern linked together by a model item.
- /Merge candidates – Various possibly duplicate items. Content may change. Mostly for own use and experiments.
- /Long texts – Items with long labels, descriptions or aliases.
- /Residence – Items with more than one claim with P:P551 (residence).
The statistics are made from database dumps. The pages-articles.xml file from the database dump, which contains the page text for entities in JSON format, is parsed by the program read_items, which adds the data to a MySQL database. Other programs are then used to extract the various statistics from the database.
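To give an idea of the first step, here is a minimal C99 sketch that recognizes the title line of an item page in the XML dump and extracts the numeric item ID. It is not the code of read_items; the function name parse_item_title is hypothetical, and the sketch assumes that each title sits on one line in the form <title>Q42</title>. Parsing the JSON text and writing to the database are left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* If line contains "<title>Q<digits></title>", store the number in *id
 * and return 1. Otherwise return 0 and leave *id unchanged.
 * Hypothetical helper for illustration, not part of read_items. */
int parse_item_title(const char *line, unsigned long *id)
{
    static const char open_tag[] = "<title>Q";
    static const char close_tag[] = "</title>";

    const char *p = strstr(line, open_tag);
    if (p == NULL)
        return 0;
    p += sizeof open_tag - 1;

    /* strtoul would accept a sign or leading whitespace, so require
     * a digit directly after the Q. */
    if (*p < '0' || *p > '9')
        return 0;

    char *end;
    errno = 0;
    unsigned long value = strtoul(p, &end, 10);
    if (errno == ERANGE)
        return 0;

    /* The closing tag must follow the digits immediately. */
    if (strncmp(end, close_tag, sizeof close_tag - 1) != 0)
        return 0;

    *id = value;
    return 1;
}
```

A caller could read the dump line by line with fgets and pass each line to this function. Titles in other namespaces, such as Property:P107, are not matched.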
The programs are licensed under the GNU General Public License, version 3 (or later) and are available for download from http://toolserver.org/~byrial/wikidata-programs/. (I may not always remember to upload new or updated programs. Feel free to contact me if something is missing.) They are written in the 1999 ISO standard of C (known as C99). Use the option "-std=c99" or "-std=iso9899:1999" when compiling with GCC.
If you want to see other lists or statistics about items and properties, please ask on my talk page.
Lists of false positives
If you find a false positive on the merge lists, add it to the list on the merge exclusion page, and it will be excluded in future.
If you find a false positive among the unmerge candidates on a list, add it to the unmerge exclusion page, and it will be excluded in future.
Pages used by translation administrators
- Wikidata pages
- Wikidata:Translation administrators: about
- Wikidata talk:Translation administrators: talk
- Wikidata:Translators' noticeboard
- Special pages
- Special:PageTranslation: pages with translate tags
- Special:LanguageStats: translation statistics
- Special:AggregateGroups: aggregate pages for the statistics
- Special:Translations: all translations of a message
- Special:SearchTranslations: search translations
- Special:NotifyTranslators: notify translators
- Special:Userlist/translationadmin: list of translation admins
- Special:Log/notifytranslators: translation notification
- Special:Log/pagetranslation: page translation
- Special:Log/translationreview: translation review
- Recent translations