User:Mateusz Konieczny


after 15: https://www.wikidata.org/w/index.php?title=User:Mateusz_Konieczny/failing_testcases&action=history

Ontology on Wikidata is systematically broken

In theory, Wikidata has a structured ontology, so you can check what a given item represents - is it a specific tree? A group of trees? A taxon? An event? A human? A mathematical term?

Sadly, Wikidata is currently quite unreliable for that purpose. For example, many things that are not events at all are nevertheless indirectly classified as events. I was unaware that the subclass tree is so unreliable when I tried using it. In effect, I accidentally created a detector of invalid subclass trees on Wikidata (while trying to create something else).

For example, Baltic Cable (Q314180) is an event according to the Wikidata ontology. This is how Wikidata classified it at one point:

Baltic Cable (Q314180)

en: electrical interconnector (power connection between two electricity grids) [1]
en: interconnector (structure enabling energy to flow between networks) [2]
en: assembly (object consisting of multiple parts) [3]
en: merge (entity resulting from the act of combining several entities to form one) [4]
en: occurrence (occurrence of a fact or object in space-time; instantiation of a property in an object) [5] - this was unexpected here, as it indicates an event!
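The chain above can be followed mechanically. What follows is only a minimal sketch, not the actual code of the tool: it assumes the parent links have already been fetched and copies them by hand from the listing, using labels rather than item IDs. The names PARENTS and path_to_class are made up for this illustration.

```python
from collections import deque

# Parent links copied by hand from the listing above (labels, not item IDs).
# A real checker would fetch these links from Wikidata instead of hard-coding them.
PARENTS = {
    "Baltic Cable (Q314180)": ["electrical interconnector"],
    "electrical interconnector": ["interconnector"],
    "interconnector": ["assembly"],
    "assembly": ["merge"],
    "merge": ["occurrence"],
    "occurrence": [],
}


def path_to_class(start, target, parents):
    """Return the chain of classes leading from start to target, or None if there is none."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against cycles in the class graph
                seen.add(parent)
                queue.append(path + [parent])
    return None


chain = path_to_class("Baltic Cable (Q314180)", "occurrence", PARENTS)
if chain is not None:
    print("classified as an event via: " + " -> ".join(chain))
```

A test case for an item that should not be an event would then simply expect path_to_class to return None for it.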

User:Mateusz Konieczny/failing testcases has many such cases, more than I can process. If anyone is interested in fixing such broken classifications, help is welcome. And since the tool exists anyway, maybe such a report may be useful to someone?

(I post such cases to this page, I post a limited number of cases to Wikidata talk:WikiProject Ontology, and no more than once a year I post about this list to Wikidata:Project chat - let me know if that is too much and I should not post about it.)

Note to self: how to generate more reports

  • run reinstall.sh in /home/mateusz/Documents/install_moje/OSM_software/wikibrain_py_package_published
    • to get more reports, try the following steps
  • go to /media/mateusz/OSM_cache and rename wikimedia-connection-cache so that the cache is not used
    • this will reveal mistakes reintroduced by editing on Wikidata - the cached version is NOT reporting them
    • remember to restore oldcache after making reports: I have no reason to suffer invalid versions! Old versions with fixes are superior to new breakage
  • if that is not generating reports: find and re-enable the commented-out
    return wikidata_bugs # count
    in /home/mateusz/Documents/install_moje/OSM_software/wikibrain_py_package_published
    • that removes various exceptions for broken Wikidata ontology that I listed locally
  • add a stat line below
  • if that is not generating a flood of reports, go to my validator of wikipedia/wikidata in OSM and enable the hidden event category in webpage generation
    • I disabled it due to the Wikidata ontology being utterly broken for events
  • create some new tests in the wikibrain package and go back to the start
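The cache step in the list above (rename the cache, generate reports, restore the old cache) is easy to get wrong by forgetting the restore. A possible sketch of automating it is below. It is an assumption-laden illustration, not part of the wikibrain package: the function name cache_set_aside and the ".old" backup suffix are invented here, and it assumes the cache is a directory and that any fresh cache created during the run should be thrown away.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def cache_set_aside(cache_dir, backup_suffix=".old"):
    """Temporarily move cache_dir out of the way and put it back afterwards."""
    cache_dir = Path(cache_dir)
    backup = cache_dir.with_name(cache_dir.name + backup_suffix)
    if backup.exists():
        # A leftover backup means an earlier run was not restored; do not overwrite it.
        raise FileExistsError(f"{backup} already exists, restore it by hand first")
    had_cache = cache_dir.exists()
    if had_cache:
        cache_dir.rename(backup)
    try:
        yield cache_dir
    finally:
        if had_cache:
            # Discard the fresh cache created during the run, then restore the old one.
            if cache_dir.exists():
                shutil.rmtree(cache_dir)
            backup.rename(cache_dir)


# Hypothetical usage; generate_reports() stands for whatever produces the reports:
# with cache_set_aside("/media/mateusz/OSM_cache/wikimedia-connection-cache"):
#     generate_reports()
```

The restore happens in a finally block, so the old cache comes back even if report generation crashes.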

Statistics on how Wikidata ontology is broken

  • 2022-11-27: 41 of 146 tests failed
  • 2022-12-01: 19 of 167 tests failed
  • 2023-01-10: 14 of 232 tests failed
  • 2023-06-22: 25 of 381 tests failed
  • 2023-06-27: 17 of 393 tests failed
  • 2023-07-19: 7 of 476 tests failed
  • 2023-12-16: 371 of 675 tests failed
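Since the number of tests grows over time, the raw counts are easier to compare as failure rates. A small sketch that parses stat lines in the format used above and prints the percentage of failing tests; the names STAT_LINES, STAT_PATTERN and failure_rates are made up for this example.

```python
import re

# Stat lines copied from the list above (repeated entries listed once).
STAT_LINES = """\
2022-11-27: 41 of 146 tests failed
2022-12-01: 19 of 167 tests failed
2023-01-10: 14 of 232 tests failed
2023-06-22: 25 of 381 tests failed
2023-06-27: 17 of 393 tests failed
2023-07-19: 7 of 476 tests failed
2023-12-16: 371 of 675 tests failed
"""

STAT_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}): (\d+) of (\d+) tests failed")


def failure_rates(text):
    """Return a list of (date, fraction of failed tests) pairs found in text."""
    rates = []
    for date, failed, total in STAT_PATTERN.findall(text):
        rates.append((date, int(failed) / int(total)))
    return rates


for date, rate in failure_rates(STAT_LINES):
    print(f"{date}: {rate:.1%} of tests failed")
```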

Thanks to all the people who fixed reported issues!


As a programmer, I am paid for some of my projects. Some of them may involve using Wikidata.

Specifically, asking about accessing Wikidata in a help channel may be paid editing (as I may be involved in programming something that accesses Wikidata), though so far this is rare - in most cases when I asked, it was related to something implemented as a hobby.

If I spot some blatant, small-scale mistakes in data that comes from Wikidata during paid work, I may edit Wikidata rather than blacklisting items within the program.

In such cases I may be editing Wikidata during a job, though I was never hired to edit Wikidata (and am not planning to be hired in such a role).

For example, https://www.wikidata.org/w/index.php?title=Wikidata_talk:Lexicographical_data&diff=prev&oldid=1312236041 was a paid edit, as I was looking for a way to gather data for one of the subcomponents of a system that I am paid to create. Note: I ended up not using Wikidata in that case, as it turned out that other data sources were clearly superior.

Relevant employers

  • Uniwersytet Jagielloński (I evaluated Wikidata; in the end it turned out to be clearly inferior to alternative data sources and was not used)

Other

Wikidata:Tools/User_scripts#overpass

https://github.com/matkoniecz/OSM-wikipedia-tag-validator/blob/master/README.md

see links in hidden gems README (FIST for adding images)

https://www.wikidata.org/wiki/Property:P279#P279$451EFD1A-AFA8-4D8D-9785-B207C7126C09 - conflicts with the property constraint; it would be nice to apply this directly to items...

Babel user information
en-2 This user has intermediate knowledge of English.
pl-N This user has a native understanding of Polish.
Users by language