Wikidata:Requests for permissions/Bot/Bean49Bot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 07:19, 6 February 2023 (UTC)[reply]
Bean49Bot
Bean49Bot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Test edits: contribs
Operator: Bean49 (talk • contribs • logs)
Task/s: Update population of Hungarian municipalities.
Function details: Updating data published by Hungarian Central Statistical Office (Q1125966). --Bean49 (talk) 19:13, 17 December 2022 (UTC)[reply]
Data is published on pages like https://www.ksh.hu/apps/hntr.telepules?p_lang=EN&p_id=22327
You can see example on Badacsonytomaj (Q788322). --Bean49 (talk) 12:02, 19 December 2022 (UTC)[reply]
- Question Where is the source code of the bot? You wrote on its user page that it’s released under a free software license, but I see no link to the source code, neither here nor there. (I’d like to increase the bus factor (Q1812883).) —Tacsipacsi (talk) 13:50, 22 December 2022 (UTC)[reply]
- GitHub --Bean49 (talk) 14:55, 22 December 2022 (UTC)[reply]
- This is not your bot, only the framework you use. Where is the source code of the actual bot? —Tacsipacsi (talk) 21:39, 22 December 2022 (UTC)[reply]
- It is not published. --Bean49 (talk) 22:03, 22 December 2022 (UTC)[reply]
- Then please publish it. I suppose the code will work for the 2024, 2027 or 2030 data as well (with at most minor modifications, unless the website changes substantially). By that time, you may not be around, so if you don’t publish the source code now, someone will have to redo the work you’ve already done. Also, even though you’ve done quite some test edits by now, it’s always good to “use the source look” so that others can check if it works in edge cases before the bot hits those edge cases. —Tacsipacsi (talk) 00:17, 23 December 2022 (UTC)[reply]
- I understand your concerns, but I have nothing to publish. I would like to update the population figures as I showcased in the test runs. --Bean49 (talk) 02:31, 23 December 2022 (UTC)[reply]
- Of course there’s some code that can be published. You need to get/have got the numbers from ksh.hu, transform/have transformed them, and publish them to Wikidata. This all needs code. Is it really that big of a request that I’d like to see a bus factor larger than one? —Tacsipacsi (talk) 02:28, 24 December 2022 (UTC)[reply]
- If we could get past this, I would like to update the population data. --Bean49 (talk) 11:12, 25 December 2022 (UTC)[reply]
- We could, you just need to publish the source code – or convince other users to support your request even without publishing the source code, making my opinion a minority. (I’d be disappointed, but if there’s majority for not requiring open source, I’ll accept the community decision.) If your only blocker is that you don’t want to create an account for a source code repository, you can also email me the code, and I’ll publish it for you on https://gitlab.wikimedia.org/. —Tacsipacsi (talk) 14:59, 25 December 2022 (UTC)[reply]
I don't intend to publish anything. This is not the subject of this request. Please consider assigning the bot flag to update the population data. --Bean49 (talk) 15:31, 25 December 2022 (UTC)[reply]
- The subject of the request is your bot. Its source code belongs to the bot. And I won’t consider assigning the bot flag, for two reasons:
- I consider closed-source bots a very bad practice. I think I have explained sufficiently why.
- I’m not a bureaucrat anyway, so I’m technically unable to assign the bot flag.
- —Tacsipacsi (talk) 01:56, 26 December 2022 (UTC)[reply]
Please don't be disturbed by this divergence. I restate my request, because it went in the wrong direction.
I would like to update the population of Hungarian municipalities published by the Hungarian Central Statistical Office (Q1125966). The references for the data will be pages like https://www.ksh.hu/apps/hntr.telepules?p_lang=EN&p_id=22327 You can see example on Badacsonytomaj (Q788322). The last data on items is from 2015. Data is published yearly. I would like to update them with the 2021 data, and later with the 2022 census, when it will be available.
Thank you, if you consider that it will be in the benefit of Wikidata. --Bean49 (talk) 12:08, 26 December 2022 (UTC)[reply]
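For readers following the request, the kind of edit described above can be sketched roughly as follows. This is only an illustration, not the bot's code (which was not published): the helper names `ksh_reference_url` and `population_statement` are hypothetical, the population figure in the usage example is a made-up placeholder, and the choice of population (P1082) with a point in time (P585) qualifier and a reference URL (P854) is an assumption about how such a statement would typically be modelled.

```python
import json
from urllib.parse import urlencode

# Base of the KSH gazetteer pages quoted in the request.
KSH_BASE = "https://www.ksh.hu/apps/hntr.telepules"


def ksh_reference_url(p_id: int, lang: str = "EN") -> str:
    """Build a reference URL of the form quoted in the request (hypothetical helper)."""
    return f"{KSH_BASE}?{urlencode({'p_lang': lang, 'p_id': p_id})}"


def snak(prop: str, datatype: str, value) -> dict:
    """Wrap a value in the Wikibase JSON snak structure."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": datatype},
    }


def population_statement(population: int, year: int, reference_url: str) -> dict:
    """Assemble one population statement in Wikibase JSON form (assumed modelling)."""
    point_in_time = {
        "time": f"+{year:04d}-00-00T00:00:00Z",  # year precision
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 9,  # 9 = year
        "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
    }
    return {
        "type": "statement",
        "mainsnak": snak("P1082", "quantity", {"amount": f"+{population}", "unit": "1"}),
        "qualifiers": {"P585": [snak("P585", "time", point_in_time)]},
        "references": [{"snaks": {"P854": [snak("P854", "string", reference_url)]}}],
    }


if __name__ == "__main__":
    # 1234 is a placeholder, not a real KSH figure.
    statement = population_statement(1234, 2021, ksh_reference_url(22327))
    print(json.dumps(statement, indent=2))
```

The sketch only builds the JSON; actually writing it to an item would go through the Wikibase API or a bot framework and is left out here.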
- Okay, then I’ll keep it simple. I Oppose granting the request without publishing the source code or demonstrating why it’s impossible or unreasonable. —Tacsipacsi (talk) 13:06, 26 December 2022 (UTC)[reply]
- @Bean49: are you intending to write code to do this job? or is it a manual process? if you are writing code can you explain why you can't provide the code to us? many people will not approve bots without source code just because we cannot audit the logic. BrokenSegue (talk) 06:18, 10 January 2023 (UTC)[reply]
- The code is ready but it's a whole Java system not in a state to be published. I can detail the logic here if it's necessary. It's not much of a logic. I am updating the population figures with 2021. The last one is from 2015 from a yearly published data. Everyone can check the test runs. Please let me know, if more details are needed. Thank you. --Bean49 (talk) 10:14, 10 January 2023 (UTC)[reply]
Hungary is a small country in Central Europe, a member of the European Union, with only 3178 municipalities, but every country counts, and for citizens of Hungary it is even more important to have updated population figures. --Bean49 (talk) 11:06, 19 January 2023 (UTC)[reply]
- the problem isn't that Hungary is a small country. The problem is that you aren't offering the code into the open source. Personally I'm indifferent to that requirement as long as the logic is sound but I am confused why you won't publish it. BrokenSegue (talk) 17:15, 19 January 2023 (UTC)[reply]
- What I don't understand is what we are waiting for. Why am I not allowed to update the population? Sorry for my impatience. --Bean49 (talk) 20:17, 19 January 2023 (UTC)[reply]
- you need approval to run a bot on wikidata. you are waiting for there to be consensus that your bot should be approved. there currently isn't consensus. BrokenSegue (talk) 17:11, 20 January 2023 (UTC)[reply]
- That was my question too. What kind of consensus do we need to update population data? Why do we need consensus for that? So far not a single objection to that has come in a month. The way the bot operates was demonstrated with more than 50 test edits. What else are we waiting for? Thank you. --Bean49 (talk) 17:37, 20 January 2023 (UTC)[reply]
- There are objections. You could try to solicit more feedback by asking for a review on the project chat page or elsewhere. We are waiting for most people here to be ok with the bot operation. BrokenSegue (talk) 17:49, 20 January 2023 (UTC)[reply]
- Support I've reviewed the test edits against the source and all seems good. There's no requirement for bot users to publish source code - see Wikidata:Bots#Approval process. The user here has done exactly what is asked of them by the policy, and should not be given the runaround. The guy above waving an oppose is just making stuff up. The bot operator should do a test run of between 50 and 250 edits, so that the community can observe that the bot is working correctly. That's it. Those are the grounds on which approval should (or should not) be granted. Anything else is out of scope. --Tagishsimon (talk) 17:29, 21 January 2023 (UTC)[reply]
- And to be absolutely pedantically clear, it's fine for a user not to be able to support a bot request because they want to see code which is not forthcoming. Knock yourself out. It is absolutely not fine to oppose on that grounds (i.e. oppose on grounds which you just made up and which do not feature in the policy). Whoever closes these sorts of discussions needs to evaluate the reasons for support and oppose !votes, and discard any of either which fall outside the policy scope. --Tagishsimon (talk) 17:48, 21 January 2023 (UTC)[reply]
Could someone approve it please, before the data deprecate and become historical? Thank you. --Bean49 (talk) 11:32, 3 February 2023 (UTC)[reply]
- For the time being, I do not see consensus. Ymblanter (talk) 11:48, 4 February 2023 (UTC)[reply]
- Please excuse me but I don't understand. Consensus for what? Please read comments of Tagishsimon. I perfectly agree with him. Thank you. Bean49 (talk) 12:03, 4 February 2023 (UTC)[reply]
Oppose Sorry but I do not believe that Wikidata is suitable for fast-changing data requiring an unlimited possible number of statements. Please see Q83889294#P1120 for an example of what can happen. If the data could be stored as mw:Help:Tabular Data with possibly the most recent value imported to Wikidata, then that might be the best solution. — Martin (MSGJ · talk) 11:44, 5 February 2023 (UTC)[reply]
- Admittedly, the COVID-19 situation was a pathological case of dramatic population changes, but I can't see a rationale for updating population numbers more often than once a year. How is that different from, say, @BorkedBot updating "Social media followers" at the drop of a hat (which task will soon be discontinued anyway.) Elizium23 (talk) 11:59, 5 February 2023 (UTC)[reply]
- I don't want to update more often than once a year. I would like to update from 2015 to 2021. @MSGJ: could you please reconsider? Bean49 (talk) 13:53, 5 February 2023 (UTC)[reply]
- In 20 years the page will become long and unwieldy. Let's plan for the future and do this properly! — Martin (MSGJ · talk) 14:13, 5 February 2023 (UTC)[reply]
- @MSGJ FYI, I proposed a significant improvement to Tabular data at meta:Community Wishlist Survey 2023/Larger suggestions/Allow querying the Commons tabular data with the Wikidata Query Service to better support large numerical datasets. Unfortunately, it was excluded from the main list because it's too difficult to be considered for the wishlist. Vojtěch Dostál (talk) 14:18, 5 February 2023 (UTC)[reply]
- Seems like they are looking for quick fixes and easy wins :) — Martin (MSGJ · talk) 14:41, 5 February 2023 (UTC)[reply]
- What would be the proper way in your opinion? I don't insist to add a new statement, I just want Wikidata to have an updated figure. Many Wikipedias use this value, not counting others. Bean49 (talk) 14:19, 5 February 2023 (UTC)[reply]
- It's difficult, isn't it? On the one hand I do absolutely want the most up-to-date and reliable figure for population of a place on Wikidata. I also do not want to us to remove statements just because they are not contemporary, unless we can first export them to somewhere suitable. It's not your fault that we don't yet have the means to do this properly. If you are not updating more than once per year then I will withdraw my oppose but just wanted to note that this is not a long-term solution. — Martin (MSGJ · talk) 14:29, 5 February 2023 (UTC)[reply]
- Thank you, I would appreciate it. I don't intend to update more often than once a year, we don't even have more data at source. Thanks, Bean49 (talk) 14:33, 5 February 2023 (UTC)[reply]
- @MSGJ: Would you be so kind to remove your oppose as you offered, to be able to update the population? Thanks, Bean49 (talk) 17:32, 5 February 2023 (UTC)[reply]
- Does "long and unwieldy" really apply to Wikidata items which are not intended to be read by humans like a Wikipedia article? Wikidata is full of machine-readable stuff. "Unwieldy" means a different thing to me as a human than it does to the machines which process and interpret Wikidata. Elizium23 (talk) 14:57, 5 February 2023 (UTC)[reply]
- Did you look at the example I provided? Wikidata items like this are unwieldy to computers as well! We have had frequent issues with Wikipedia articles with very large corresponding Wikidata items because the lua processing time exceeds the maximum limit. The larger the item, the longer it takes to process it. — Martin (MSGJ · talk) 15:38, 5 February 2023 (UTC)[reply]
- Support This seems like a very simple import job which actually does not require any coding at all, I'm surprised that Bean49 is not using a tool such as OpenRefine or Wikibase-CLI. To insist on publishing a code in this case would be pedantic. Anyway, all the filtering of 'items to be updated' can be done on the level of a Wikidata Query, so I doubt that the scripts would be of much use to the community. I agree with @MSGJ: that Wikidata is not suitable for tabular data and there may come a time when we as a community decide to discontinue annual updates of population data, but the time is not here yet (although I'd welcome a RfC on the topic). Vojtěch Dostál (talk) 14:10, 5 February 2023 (UTC)[reply]
- Support I don't think requiring open code for bots is a good step. It only drives people to not ask for bot permission and instead try to solve the problem via quick statements. Population data seems important enough for me to have yearly values. ChristianKl ❪✉❫ 16:57, 5 February 2023 (UTC)[reply]
- Support while i'd prefer to see source code I guess not having it is acceptable BrokenSegue (talk) 19:02, 5 February 2023 (UTC)[reply]
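As an illustration of the point raised above that the 'items to be updated' could be selected with a Wikidata Query, here is a minimal sketch that builds such a SPARQL query. It is an assumption about what a query of that kind might look like, not something posted in the discussion: the function name `stale_population_query` is hypothetical, and filtering only on country (P17) = Hungary (Q28) would also match items that are not municipalities, so a real run would need a tighter class filter.

```python
def stale_population_query(country_qid: str, target_year: int) -> str:
    """Return a SPARQL query for items whose newest dated population (P1082)
    statement is older than the start of target_year (hypothetical helper)."""
    cutoff = f"{target_year}-01-01T00:00:00Z"
    return f"""
SELECT ?item (MAX(?date) AS ?latest) WHERE {{
  ?item wdt:P17 wd:{country_qid} ;
        p:P1082 ?statement .
  ?statement pq:P585 ?date .
}}
GROUP BY ?item
HAVING (MAX(?date) < "{cutoff}"^^xsd:dateTime)
""".strip()


if __name__ == "__main__":
    # Q28 is Hungary; 2021 is the data year named in the request.
    print(stale_population_query("Q28", 2021))
```

The printed query can be pasted into the Wikidata Query Service; the sketch itself makes no network request.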