Wikidata:Events/Data Modelling Days 2023/FeedbackALG


✨---------------✨---------------✨---------------✨---------------✨---------------

Feedback session - How to help enforce consistent modelling with automated list generation?
Facilitation: Lydia Pintscher (WMDE), Daria Ammalainen (WMDE), Ifeatu Nnaobi (WMDE), Danny Benjafield (WMDE)
Notes: Elisha Cohen (WMDE)


✨---------------✨---------------✨---------------✨---------------✨---------------

Previous session: https://etherpad.wikimedia.org/p/DMD2023-FeedbackBiggerPicture
Next session: https://etherpad.wikimedia.org/p/DMD2023-Clinic2
    
👥 Number of participants (including speakers): ~48

🖊️ Notes & links
From 1600 UTC
From https://www.wikidata.org/wiki/Wikidata:Events/Data_Modelling_Days_2023#Friday,_December_1st

Team's meta project page: https://meta.wikimedia.org/wiki/WD4WMP
Report problems you have experienced with Wikidata integration: https://meta.wikimedia.org/wiki/WD4WMP/AddIssue
Sign up for interviews: https://wikimedia.sslsurvey.de/Wikidata-for-Wikimedia-Interviews

Presentation:
Intro to the team: formed ~6 months ago, current team is made up of UX, ComCom, PM. Team is currently doing discovery research, working to have an understanding of the status quo and what the community wants/needs when it comes to Wikidata integration
Automated list generation and Listeria


Do you use tools other than Listeria to create lists?
Some only use Listeria. Some wonder if there is anything other than Listeria?
Lua code for list generation
"I've been meaning to write a bot to make lists because listeria can't do lexemes properly"
"There is also Magnus's new tool but I cannot get it to work"
Name of tool?
https://toolflow.toolforge.org/#/
Off wiki: Snowman to create lists on Govdirectory
Tabernacle is kind of a list tool
Scholia displays lists of co-authors; works by an author, etc.

What do you use Listeria for?
I have used Listeria extensively in Wiki Loves Living Heritage https://meta.wikimedia.org/wiki/Wiki_Loves_Living_Heritage/Elements. It makes visual pages with row template. I made a presentation of all the challenges at GLAM Wiki named "Stretching Meta" https://docs.google.com/presentation/d/1XjXOJYZ2xoE2nV36NJ4Knghu2b6n-wQbpN-WDd4e4uE/edit#slide=id.g296629f6cea_0_119
on Wikispecies: https://species.wikimedia.org/wiki/Special:Contributions/ListeriaBot
List of treated taxa (example: https://species.wikimedia.org/wiki/Template_talk:Clark,_1962a)
List of virus species - https://species.wikimedia.org/wiki/List_of_virus_species
Journals by ISSN: https://species.wikimedia.org/wiki/ISSN
On ESwiki, to migrate wikitext tables to Wikidata;


Issues with Listeria:
"Some Wikipedia tables need additional columns with data that is not suitable for storing in Wikidata"
Sometimes it's down
terrible date generation
"it's hard to customise the appearance, like I wanted to hide the bars at the top and bottom "
+1
"I know that Susanna did a wonderful job at display customization for Listeria (on Meta IIRC) "
"Sometimes it's slow for big or complex queries"
"sometimes an update is just resorting."
so always force a particular unique ordering?
"I've broken pages with Listeria due to use of subtemplates and «a few» thousands of rows."
random errors, hard to problem solve when it doesn't work
"it doesn't support lexemes or other entities properly, you have to hackily make a fake qid"
"+1000, we need Lexemes support!!"
Example shared of how to force it to use Lexemes: https://www.wikidata.org/wiki/User:Nikki/German/Issues
(ah, it seems for lexemes you can give it the lexeme as the "item", but for commons you have to change the M in the MID to a Q)
"One never knows exactly when a Listeria list is going to be updated "
" it also doesn't allow more than one row for an entity "
Possibility to sync Listeria outputs via 1) template 2) the dynamic list from toolforge?
Large result sets require pagination, which is impossible to keep updated
A Listeria update switches translation off, and a translation admin is needed to restart it. And if the project has dozens of pages...
When using row templates, the pages cannot be sorted
When using templates with dynamic data from Wikidata, the dynamic content cannot be searched
One other thing that there is demand for is more mini-lists within infoboxes: a frequent blocker is the inability to ask for values of inverse properties in Lua
In conversation with Wikifunctions about this topic. Hopefully it can be addressed there.
"I'd urge caution using dynamic lists inside infoboxes. Lots of people use them expecting human verified content. Dynamic lists will not be as verifiable compared to static lists"

Listeria needs a pagination function
@Andy: Yes, but all the pages need to be successfully updated at the same time as well
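
Two of the points above fit together: pagination is only stable if the row order is forced to be unique, otherwise rows can move between pages on every update. A minimal Python sketch of that idea, not something Listeria offers today. The row keys `label` and `qid`, the title scheme `base_title/number` and the bullet layout are all assumptions made for illustration. Every page is rendered before anything would be written, so a failure leaves no half-updated set; saving the pages to a wiki is not shown.

```python
def qid_number(qid):
    """Numeric part of an entity ID such as 'Q42', used as a numeric tie-breaker."""
    return int(qid[1:])


def stable_rows(rows):
    """Sort by label, then by entity ID, so equal labels never swap between runs."""
    return sorted(rows, key=lambda r: (r["label"].casefold(), qid_number(r["qid"])))


def paginate(rows, page_size):
    """Split rows into consecutive pages of at most page_size rows."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def build_page_set(rows, page_size, base_title):
    """Render all pages up front and return them as {title: text}."""
    pages = paginate(stable_rows(rows), page_size)
    total = len(pages)
    rendered = {}
    for number, page_rows in enumerate(pages, start=1):
        lines = [f"Page {number} of {total}"]
        lines += [f"* {r['label']} ({r['qid']})" for r in page_rows]
        # Hypothetical subpage naming: "<base title>/<page number>".
        rendered[f"{base_title}/{number}"] = "\n".join(lines)
    return rendered


# Made-up rows; two labels differ only by case to show the tie-break.
rows = [
    {"qid": "Q10", "label": "Alpha"},
    {"qid": "Q2", "label": "alpha"},
    {"qid": "Q7", "label": "Beta"},
]
page_set = build_page_set(rows, page_size=2, base_title="Example list")
```

In a SPARQL query the same tie-break can be expressed by ending the `ORDER BY` clause with the item variable.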



How do you validate the accuracy of your automatically generated lists?
By comparing results with WDQS
"e.g. you notice inaccuracies when some entries which surely should appear in the list are in fact absent "
"I don't validate and often it's fine but sometimes it's a bit frustrating (especially when Listeria remove or add lines without apparent reason) "
Combination of references and history on claims; this is why simple list generation doesn't work: we need to run more complex queries to evaluate the fitness of the information for a purpose
 If you have domain knowledge, errors may jump out at you simply by looking at the list; or if/when you click through to a linked Wikipedia article
Also, often one of the key things the list gives is a way to see which items have more complete or less complete sets of statements, and which ones are missing key info
I hope not to need to check the accuracy. When I'm sure about my SPARQL query, I assume Listeria runs it fine.
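
The "compare with WDQS" approach can be reduced to a set comparison between the entity IDs shown in the generated list and the IDs returned by running the query again. A minimal Python sketch, assuming both ID lists have already been extracted; fetching them from the wiki page and from the query service is left out, and the function name and report keys are hypothetical.

```python
def compare_lists(listed_qids, query_qids):
    """Compare IDs shown in a generated list with IDs from a fresh query run."""
    listed, expected = set(listed_qids), set(query_qids)
    return {
        # Entries the query returns but the list does not show.
        "missing_from_list": sorted(expected - listed),
        # Entries the list shows but the query no longer returns.
        "unexpected_in_list": sorted(listed - expected),
    }


# Made-up IDs for illustration only.
report = compare_lists(["Q1", "Q2", "Q5"], ["Q1", "Q2", "Q3"])
```

An empty report only says that the list matches the query. It does not say that the query or the underlying statements are right, which is where the domain knowledge mentioned above comes in.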

What changes inside and outside the Wikimedia movement will influence the field of automated list generation in the next five years?
Scalability of Wikidata in general
Would love Listeria to work with SDC (structured data on commons)
List making on Commons with the ability to do SPARQL on Commons/Wikibase...
(Listeria already making lists on Commons based just on WDQS: https://commons.wikimedia.org/wiki/Special:Contributions/ListeriaBot )
for listing files on commons, https://eu.wikipedia.org/wiki/Atari:Hezkuntza/Ikusgela/Azpitituluak is the example I helped with recently (the discussion was in the commons telegram channel on the 14th)
Also: Surely Listeria can already run WDQS queries that include federated input from the Commons Query Service (which might even be the entire output of the query) ?
https://phabricator.wikimedia.org/T341405 : An "improved autofix" needed to replace non-standard statements in Wikidata, in order to keep the data model coherent
Wikifunctions
"Easily editing Wikidata from a list on Wikipedia would surely increase the acceptance of such lists on Swedish Wikipedia"
Having more "pretty" formatting options for Listeria or other tools would likely help adoption on Wikipedias
check this one for prettiness: https://sv.wikipedia.org/wiki/Lista_%C3%B6ver_ledam%C3%B6ter_av_Sveriges_riksdag_2022%E2%80%932026#Nuvarande_ledam%C3%B6ter
Not much of the formatting is done in the SPARQL, but in the row template
Nice! I too have hacked the SPARQL to do some really cool formatting, but it'd be nicer if I didn't have to wrestle with it so much and do a lot of trial and error 😃
I'd also like to know how much "load" having unfettered use of Listeria has on our systems. Is it something we need to be more selective about, or just keep making more maintenance pages with Listeria knowing 50% never get looked at?


Potential solutions for automated list generation
Option 1: Defining lists where they are used
Similar to how Listeria works now
SPARQL query is stored on-wiki/client-side
Query is executed automatically
Option 2: Defining lists centrally
SPARQL query is stored on Wikidata
Query is executed automatically on a regular frequency or on demand
Option 3: Wikifunctions
If we allow Wikifunctions to make queries to Wikidata, then we could write functions there that could output a list
Then an article could just call the Wikifunctions function

Pros/cons to each, any thoughts or preferences?
Centralization - is it needed?
Don't see the point of the centralisation, unless the same list is being reused across many wikipedias
@James - In general I agree, though centralization might help us learn/share better
there are lots of "list of x" pages, I think there are a lot of lists that could be shared
centrally would make it easier for smaller wikis
to elaborate, centrally would mean smaller wikis don't have to create and maintain the list-generating code themselves
How will Wikifunctions address the need to perform updates/changes in bulk for WD items?
Performance has not come up so far as a big issue, but if you have tons of lists that have heavy queries that are no longer being used, then it's nice to remove them. In the big picture, this hasn't been a big issue yet. Load is not an urgent concern.
"you can also adjust the frequency to update the listeria list on a monthly basis"

How does the concept of result sets bigger than a wiki page fit in the picture?
This is one of the current issues with Listeria, which is that it can generate results that are too big for a wiki page to hold. Not sure yet how to resolve this problem.
Some kind of pagination function?
Pagination creates a set of pages that are interdependent, and need to be all updated at the same time. Paginated pages are also not sortable.
What is the max size of a wiki page?
@jan: I don't know it in bits and bytes. For different use cases I just test.
@Jan There is a limit of templates you can run in a mediawiki page, don't remember the number.
a list with 2500 rows makes wiki article hard to edit
@Jan: In all cases I just test. I don't know how much of the lag and timeouts are caused by (nested) templates, wikidata lookups (P, Q, label, data etc. templates), Listeria itself, page load etc.
OTOH, an article with 2500 rows is not supposed to be edited manually, imo :-m
Does any solution let the wiki editors add an on-wiki column ("remarks") to the generated table?
Favorite example: list of star trek episodes on en wiki, which has episode descriptions.
Not sure how to resolve, but this issue is known and being considered
for comments, maybe a json subpage which has qids (or some other row-specific unique id) as keys and the extra fields as an object, plus a tool that lets you edit the comments for each row and saves it to the json page
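
The JSON subpage idea can be sketched as a merge keyed on the QID. This is a minimal Python illustration, not an existing tool: the field name `remarks`, the row keys and all IDs are made up, and the editing tool mentioned above is not covered.

```python
import json


def merge_remarks(rows, remarks_json):
    """Attach hand-written fields to generated rows, matched on the row's QID."""
    remarks = json.loads(remarks_json)
    merged = []
    for row in rows:
        extra = remarks.get(row["qid"], {})
        # Generated values win on key clashes, so manual notes cannot
        # overwrite data that came from the query.
        merged.append({**extra, **row})
    return merged


# Content of the hypothetical JSON subpage: QIDs as keys, extra fields as objects.
remarks_page = (
    '{"Q100": {"remarks": "Pilot episode"},'
    ' "Q999": {"remarks": "No longer in the list"}}'
)
# Rows as they might come out of the list generator.
generated = [
    {"qid": "Q100", "label": "Episode 1"},
    {"qid": "Q101", "label": "Episode 2"},
]
table = merge_remarks(generated, remarks_page)
```

Remarks for entities that have dropped out of the query result (`Q999` here) stay in the JSON but are not shown, so nothing written by hand is lost when a row disappears and later returns.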

The Wikifunctions approach would be the most extensible. This is about more than creating simple lists. It's really about curated query and processing functionality to return a dataset for a broad number of uses. Sometimes we might just put a list somewhere like a wiki page, but other times we'll want to use more of the content.
might be useful for e.g. more mini-lists within infoboxes; these are often blocked at the moment because the inverse values of properties can't be accessed from within Lua
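
For context, the inverse lookup that is hard from within Lua is a one-line pattern in SPARQL: find every item that points at a given item through a given property. A small Python sketch that builds such a query as a string; the helper name and the default limit are assumptions, and P50 (author) with Q42 (Douglas Adams) is only an example pair. It builds the query text and does not send it anywhere.

```python
import re


def inverse_values_query(property_id, target_qid, limit=50):
    """Build a SPARQL query for items whose property_id value is target_qid."""
    if not re.fullmatch(r"P[1-9]\d*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    if not re.fullmatch(r"Q[1-9]\d*", target_qid):
        raise ValueError(f"not an item ID: {target_qid!r}")
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:{property_id} wd:{target_qid} .\n"
        "}\n"
        # Ordering on the item keeps the result order stable between runs.
        "ORDER BY ?item\n"
        f"LIMIT {int(limit)}"
    )


# Example: works whose author (P50) is Douglas Adams (Q42).
query = inverse_values_query("P50", "Q42", limit=10)
```

The `wd:` and `wdt:` prefixes are predeclared on the Wikidata Query Service, so the generated text can be pasted there as is.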





❓ Questions and discussions
Question here
Answer here


🎯 Key takeaways and outcomes
...
...


☑️ Next steps
Request to link to Wikidata for Wikimedia Projects team from: https://www.wikidata.org/wiki/Wikidata:Wikidata_in_Wikimedia_projects
...