Wikidata:Requests for permissions/Bot/AmmarBot 6
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 18:44, 27 September 2021 (UTC)
AmmarBot 6
AmmarBot
Operator: Ammarpad
Task/s: Import maximum capacity (P1083) for articles about sports venues and arenas
Code: theatre-venue-data.py
Function details: Go through Wikipedia pages about theatres and sports venues and extract the venue's capacity where it is given. The value is then added to Wikidata as a maximum capacity (P1083) statement. This script was written as part of my Outreachy program work; my mentor is Mike Peel. --Ammarpad (talk) 09:56, 10 August 2021 (UTC)
- Test edits? There looks to be a bug in the summary text in the Python code. If a page listed the capacity as "5 thousand", would your code import 5? BrokenSegue (talk) 21:16, 10 August 2021 (UTC)
- Thanks BrokenSegue. I fixed the edit summary. On the second issue, I considered it, but I believe it is not a common situation. This value is typically (and should be) given in digits, 5000 rather than "5 thousand" (the digits may contain separators, but that is not a problem). I reviewed many pages manually with the script in dry-run mode and confirmed this. However, I found a related issue: some pages contain multiple values and/or additional text that is hard to interpret programmatically. For instance, the value in w:Alexandra Palace is listed as "800 (Panorama Room)<br />1,750 (East Hall/Ice Rink)<br />2,000 (Palm Court)<br />2,500 (West Hall)<br />10,250 (Great Hall)<br />900 (seated)/1300 (seated/standing) (Theatre)", with multiple values. Since it would be hard to correctly infer what each of these refers to, I have restricted the script to pages with a clear numerical value and left the rest for human review. I will add the link for test edits here once the run completes. Ammarpad (talk) 06:45, 12 August 2021 (UTC)
- OK, seems fine. I would suggest adding a check like "capacity must be more than 20 and less than 100k", but I Support. BrokenSegue (talk) 16:57, 12 August 2021 (UTC)
- Done. Ammarpad (talk) 04:32, 17 August 2021 (UTC)
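The filtering described in this thread (accept only a single plain number, optionally with thousands separators, and apply the suggested range check) could be sketched roughly as follows. This is an illustrative sketch only, not the code in theatre-venue-data.py: the function name `parse_capacity`, the constant names, and the choice of exclusive bounds are assumptions made for the example.

```python
import re

# Bounds taken from the suggestion in the discussion ("more than 20 and
# less than 100k"); treating both as exclusive is an assumption.
MIN_CAPACITY = 20
MAX_CAPACITY = 100_000

# A single integer, either plain digits ("5000") or with comma
# thousands separators ("10,250"). Anything else is rejected.
_PLAIN_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def parse_capacity(raw):
    """Return the capacity as an int, or None to leave it for human review.

    Hypothetical helper: values with extra text ("5 thousand"), several
    numbers, or markup such as "<br />" are not interpreted at all.
    """
    text = raw.strip()
    if not _PLAIN_NUMBER.fullmatch(text):
        return None
    value = int(text.replace(",", ""))
    if not (MIN_CAPACITY < value < MAX_CAPACITY):
        return None
    return value


if __name__ == "__main__":
    samples = [
        "5000",
        "10,250",
        "5 thousand",
        "800 (Panorama Room)<br />1,750 (East Hall/Ice Rink)",
        "15",
    ]
    for sample in samples:
        print(repr(sample), "->", parse_capacity(sample))
```

Rejecting anything that is not a lone number means a value like "5 thousand" is skipped rather than imported as 5, and multi-value entries such as the Alexandra Palace one are left for manual review.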