Wikidata:Requests for permissions/Bot/THEbotIT 1
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 20:06, 15 May 2020 (UTC)
THEbotIT 1
THEbotIT (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: THE IT (talk • contribs • logs)
Task/s: Scans all encyclopedic articles of s:de:Paulys Realencyclopädie der classischen Altertumswissenschaft in a rolling mode and creates items for them or alters existing items.
Code: subfunctions for the creation of Wikidata items on GitHub
Function details: The creation of items is part of the regular maintenance of s:de:Paulys Realencyclopädie der classischen Altertumswissenschaft. The bot runs nightly for a few hours and checks a portion of the growing set of articles (currently 37,000+). In the future the bot may run more frequently, but for shorter durations.
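The rolling mode mentioned above can be pictured as a cursor that moves through the list of RE lemmas one nightly batch at a time and wraps around at the end, so that every article is revisited periodically. The Python below is a minimal sketch under that assumption only. The function name `next_batch`, the batch size and the placeholder lemmas are hypothetical and are not taken from the bot's code.

```python
from typing import List, Tuple


def next_batch(lemmas: List[str], cursor: int, batch_size: int) -> Tuple[List[str], int]:
    """Return the lemmas to check in this run and the cursor for the next run.

    Hypothetical sketch: the cursor wraps around at the end of the list, so
    every article is checked again on a later pass. Articles appended to the
    list are picked up once the cursor reaches them.
    """
    if not lemmas:
        return [], 0
    batch_size = min(batch_size, len(lemmas))
    start = cursor % len(lemmas)
    batch = [lemmas[(start + i) % len(lemmas)] for i in range(batch_size)]
    return batch, (start + batch_size) % len(lemmas)


# Placeholder lemmas; the real list would come from Wikisource.
lemmas = ["A", "B", "C", "D", "E"]
tonight, cursor = next_batch(lemmas, cursor=0, batch_size=3)
tomorrow, cursor = next_batch(lemmas, cursor, batch_size=3)
print(tonight)   # ['A', 'B', 'C']
print(tomorrow)  # ['D', 'E', 'A']
```

The cursor would have to be persisted between nightly runs, for example in a small state file. That storage is not shown here.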
For every article, the bot checks whether an item already exists. If not, it creates a new one. The algorithm calculates a desired state of the item from the information on Wikisource and from further connections in Wikidata. This is a regular process and will run continuously.
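A comparison between a desired state and the current item could look roughly like the following sketch. It assumes a simplified model in which claims are plain strings per property, with no qualifiers or references. `diff_claims` is a hypothetical name. A pure desired-state comparison also removes existing values that the bot did not compute itself, and this matters for statements curated by hand.

```python
from typing import Dict, List, Optional, Tuple

Claims = Dict[str, List[str]]  # simplified: property ID -> list of values


def diff_claims(existing: Optional[Claims], desired: Claims) -> Tuple[str, Claims, Claims]:
    """Compare the desired state of an item with its current state.

    Returns (action, to_add, to_remove), where action is "create", "update"
    or "none". Only properties present in `desired` are compared; any other
    property on the item is left untouched.
    """
    if existing is None:
        # No item exists for the article yet, so everything has to be created.
        return "create", desired, {}
    to_add: Claims = {}
    to_remove: Claims = {}
    for prop, values in desired.items():
        current = existing.get(prop, [])
        added = [v for v in values if v not in current]
        removed = [v for v in current if v not in values]
        if added:
            to_add[prop] = added
        if removed:
            to_remove[prop] = removed
    action = "update" if to_add or to_remove else "none"
    return action, to_add, to_remove
```

With placeholder IDs, `diff_claims({"P31": ["Q1"]}, {"P31": ["Q2"]})` returns `("update", {"P31": ["Q2"]}, {"P31": ["Q1"]})`.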
The algorithm will be developed further: we will add more attributes to the items, and I will improve the test coverage of the current code.
The next test run, with 100 changed items, will take place tonight after 10 pm UTC.
I'm happy to answer further questions. --THE IT (talk) 13:03, 10 April 2020 (UTC)
Comments
- Looks good. @Mfchris84, M2k~dewiki: are currently looking into a similar project about BLKOe. Just a few minor things:
- column (P3903): This seems to remove a correct statement ("502-503") and keep an incorrect one ("502").
- I fixed this before the last test run [1].
- Rather than adding labels in countless languages, title (P1476) could be useful, possibly with the part in parentheses added separately as subtitle (P1680). For Q19993868, this would be "Bizone" and "Βιζώνη".
- Adding title (P1476) and subtitle (P1680) is a good idea. I'm not sure whether parsing the expression in parentheses is reliable enough, but I will consider it. Do you think I should drop the labels? The idea behind them was to show as many users as possible meaningful names for the linked items. --THE IT (talk) 14:14, 11 April 2020 (UTC)
- For NBD, title/subtitle parsing went fairly well (just a few transcription issues at WS and some typos in the original). I'm still working on getting first lines and word counts done.
As for labels, personally, I wouldn't. The descriptions can be more helpful. English would be great, as it's the fallback for all languages. --- Jura 08:03, 12 April 2020 (UTC)
- Asan (Pauly-Wissowa) (Q19992080) had a cross-reference (Q1302249) value in P31, but this was removed [2].
- Yeah, that's not the best move. I will implement a check for cross-references. --THE IT (talk) 14:14, 11 April 2020 (UTC)
- Also, the description was changed from "cross-reference" to "article" [3]
- I will fix this as well. --THE IT (talk) 14:14, 11 April 2020 (UTC)
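The cross-reference fix discussed here comes down to choosing instance of (P31) and the description from the kind of page, without overwriting values that someone set by hand. The sketch below assumes that cross-references are already detected from the Wikisource page; that detection is not shown. Q1302249 is the cross-reference item named above. Q13433827 for an ordinary encyclopedic article, the English descriptions and the function names are assumptions.

```python
from typing import List

# Q1302249 (cross-reference) is the value mentioned in the discussion.
CROSS_REFERENCE = "Q1302249"
# Assumed item for an ordinary encyclopedic article; verify before use.
ENCYCLOPEDIC_ARTICLE = "Q13433827"
BOT_MANAGED_P31 = {CROSS_REFERENCE, ENCYCLOPEDIC_ARTICLE}


def resolve_p31(existing: List[str], is_cross_reference: bool) -> List[str]:
    """Return the instance of (P31) values the item should have.

    `is_cross_reference` is assumed to be determined from the Wikisource
    page elsewhere.
    """
    if any(value not in BOT_MANAGED_P31 for value in existing):
        # Some value was set by hand (e.g. for a preface): keep P31 as it is.
        return list(existing)
    return [CROSS_REFERENCE if is_cross_reference else ENCYCLOPEDIC_ARTICLE]


def resolve_description(is_cross_reference: bool) -> str:
    """Simplified placeholder descriptions that match the chosen P31 value."""
    return "cross-reference" if is_cross_reference else "article"
```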
- I was going to suggest adding document file on Wikimedia Commons (P996), but given the single page format on Commons, I'm not sure about that. If there are not too many per entry, maybe this could be done.
- Good idea, I will implement this. I think I can work with qualifiers here. Maybe link only the first and the last page and indicate this with qualifiers? --THE IT (talk) 14:14, 11 April 2020 (UTC)
- How many are there generally? If it's just two or three, one could easily link each. Some qualifier would be useful to sort them (page(s) or series ordinal). --- Jura 08:03, 12 April 2020 (UTC)
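The suggestion to link each scan with a sorting qualifier could be represented as one document file on Wikimedia Commons (P996) statement per page, each qualified with series ordinal (P1545). This is a hypothetical sketch in a simplified dictionary format, not the bot's implementation. The file names are placeholders.

```python
from typing import Dict, List


def commons_scan_statements(file_names: List[str]) -> List[Dict[str, object]]:
    """Build one P996 statement per scan, numbered with a P1545 qualifier.

    The qualifier value is a string, and numbering starts at 1 in the order
    the file names are given.
    """
    return [
        {"property": "P996", "value": name, "qualifiers": {"P1545": str(i)}}
        for i, name in enumerate(file_names, start=1)
    ]


# Placeholder file names for an article spanning two scanned pages.
statements = commons_scan_statements(["scan_0001.jpg", "scan_0002.jpg"])
```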
- first line (P1922) with the first approx. 300 characters up to a space + " …" could be added.
- Yeah, cool idea. I will implement that as well. I will skip cross-references and non-free articles. --THE IT (talk) 14:14, 11 April 2020 (UTC)
- Personally, I mostly manage to strip formatting/references from the text. I'm trying to get word counts done for NBD too. --- Jura 08:03, 12 April 2020 (UTC)
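The proposed rule for first line (P1922) takes roughly the first 300 characters, cuts back to a space and appends " …". It can be sketched as follows. The sketch assumes the text has already been stripped of wiki markup and references. The function name `first_line` and the fallback to a hard cut when no space is found are illustrative choices.

```python
def first_line(text: str, limit: int = 300) -> str:
    """Shorten article text to about `limit` characters at a word boundary."""
    text = " ".join(text.split())  # collapse line breaks and runs of spaces
    if len(text) <= limit:
        return text
    # Last space within the first limit + 1 characters, so the kept part
    # is at most `limit` characters long.
    cut = text.rfind(" ", 0, limit + 1)
    if cut <= 0:
        cut = limit  # no usable space: fall back to a hard cut
    return text[:cut] + " …"
```

For example, `first_line("alpha beta gamma", limit=10)` returns `"alpha beta …"`.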
- It would be good to see a full range of test edits for all types of changes/creations it does. --- Jura 20:55, 10 April 2020 (UTC), --- Jura 03:56, 11 April 2020 (UTC)
- There were 100 test edits this morning. I will come back with more after fixing the errors. --THE IT (talk) 14:14, 11 April 2020 (UTC)
Thanks for the good advice. :-) I will fix the issues you pointed out. I also found one myself :-( The algorithm deletes already present links to subjects [4]. That's not acceptable. --THE IT (talk) 14:14, 11 April 2020 (UTC)
- First thing: I relinked all deleted items. --THE IT (talk) 14:36, 11 April 2020 (UTC)
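One way to avoid deleting existing subject links is to treat some properties as protected and merge their values instead of replacing them. The sketch below uses a simplified property-to-values model. `merge_claims` and the `PROTECTED_PROPERTIES` set, which contains only main subject (P921), are hypothetical.

```python
from typing import Dict, List, Set

# Properties curated by hand whose values the bot must never remove.
PROTECTED_PROPERTIES: Set[str] = {"P921"}


def merge_claims(existing: Dict[str, List[str]],
                 desired: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the claims the item should end up with.

    Protected properties get the union of existing and desired values, with
    existing values first. For other properties the bot's desired values
    replace existing ones. Properties the bot does not compute are left as
    they are.
    """
    result: Dict[str, List[str]] = {}
    for prop in existing.keys() | desired.keys():
        if prop in PROTECTED_PROPERTIES:
            merged = list(existing.get(prop, []))
            merged += [v for v in desired.get(prop, []) if v not in merged]
            result[prop] = merged
        else:
            result[prop] = list(desired.get(prop, existing.get(prop, [])))
    return result
```

With placeholder IDs, a manually added `{"P921": ["Q100"]}` survives even when the bot's desired state contains no P921 value at all.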
Hi, I'm the person who has spent the past months, and by now a couple thousand edits, linking the RE items to their subjects (example: [5], [6]). I really appreciate THE IT's effort in this and hope he can get the bot flag soon! As a suggestion for improvement, it would be great if the link back from the subject items to the RE items via Property:P1343 could be implemented as soon as possible. Best, --Tolanor (talk) 14:34, 12 April 2020 (UTC)
- @Lymantria: we haven't seen a corrected run yet. --- Jura 07:41, 19 April 2020 (UTC)
- @Jura1: My apologies, I now see this was a premature closing. I'll undo. Lymantria (talk) 07:43, 19 April 2020 (UTC)
- @Lymantria: if @THE IT: prefers, I suppose it could be approved for the working parts. Other functionality could be added later. --- Jura 07:55, 19 April 2020 (UTC)
- THE IT promised an extended test run. Let's await that. If it goes well enough, I will approve (indeed for the working parts). Lymantria (talk) 08:01, 19 April 2020 (UTC)
Is it good practice to add the following references to every claim?
- imported from Wikimedia project German Wikisource
- reference URL https://de.wikisource.org/wiki/RE:<Lemma_Name>
--THE IT (talk) 07:53, 28 April 2020 (UTC)
- References yes, "imported from" (personally I think) it depends. Here, for most, it's fairly obvious that the item is just metadata for the linked Wikisource page. I'd add "imported from Wikimedia project" to the main subject (P921) statement only. --- Jura 08:13, 28 April 2020 (UTC)
- So, reference URL for all claims and "imported from" only for P921? --THE IT (talk) 12:54, 28 April 2020 (UTC)
- Personally, I'd skip reference URL here. --- Jura 16:22, 28 April 2020 (UTC)
- I see, so a reference only for P921. I will implement it this way. --THE IT (talk) 19:25, 28 April 2020 (UTC)
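This thread settled on an "imported from Wikimedia project" (P143) reference on main subject (P921) statements only, with no reference URL. That decision could be expressed as in the sketch below. The reference layout follows the Wikidata JSON data model in simplified form. Q15522295 as the item for German Wikisource is an assumption that should be verified. The function names are hypothetical.

```python
from typing import Dict, List

# Assumed Q-ID of German Wikisource; verify on Wikidata before use.
GERMAN_WIKISOURCE = "Q15522295"
# Properties whose statements get a P143 reference, per this discussion.
REFERENCED_PROPERTIES = {"P921"}


def imported_from_reference(project_qid: str) -> Dict[str, object]:
    """Build a reference block with a single P143 snak pointing to `project_qid`."""
    numeric_id = int(project_qid.lstrip("Q"))
    return {
        "snaks": {
            "P143": [{
                "snaktype": "value",
                "property": "P143",
                "datavalue": {
                    "value": {"entity-type": "item", "numeric-id": numeric_id, "id": project_qid},
                    "type": "wikibase-entityid",
                },
            }]
        }
    }


def references_for(prop: str) -> List[Dict[str, object]]:
    """References to attach to a statement of `prop`; empty for most properties."""
    if prop in REFERENCED_PROPERTIES:
        return [imported_from_reference(GERMAN_WIKISOURCE)]
    return []
```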
Hello everyone, I refactored the code quite a bit. All mentioned bugs should be fixed, and I added some features. For the moment, document file on Wikimedia Commons (P996) and first line (P1922) are not implemented; I wanted to finish the existing claims first. I will add these properties in a later request for permission. The next test run, again with 100 edits, will start in 2.5 hours. --THE IT (talk) 19:39, 11 May 2020 (UTC)
- The run took place last night, and from my perspective it has worked fine so far. The only thing I will remove is the creation of subtitles. The way brackets are used in the RE is too inconsistent (it was published over a period of ~100 years), so I will remove the subtitle claim; it is too error-prone. Can I have a go-ahead for the rest? --THE IT (talk) 07:59, 13 May 2020 (UTC)
Looks good. Would you have it create a few new ones (if there are any available)? Just a few minor things:
- That was a concern of mine too. Next night I will do a run with only newly created items.
- Adding P407=Q188 to the items might be worth doing.
- I will do this. --THE IT (talk) 13:00, 13 May 2020 (UTC)
- Done. --THE IT (talk) 15:26, 14 May 2020 (UTC)
- If the title is wrapped in [7], maybe another language code could be used (grc?). Sample: Q19980984#P1476.
- The title is not fetched from the text; I take it from the page title. I thought about other languages: detecting Greek is probably possible, but detecting Latin is not (and a lot of the titles are Latin). So my logic at this point is that the language of the encyclopedia is German, and so are all titles? --THE IT (talk) 13:00, 13 May 2020 (UTC)
- Don't worry about it in that case. --- Jura 13:12, 13 May 2020 (UTC)
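The approach described here takes the title from the page name and tags every title as German. A minimal sketch of it follows. It assumes page titles of the form RE:<Lemma_Name>, as in the reference URL pattern above. `title_value` is a hypothetical helper that returns a monolingual-text value with `text` and `language` keys, and "RE:Bizone" is only an illustrative input.

```python
from typing import Dict

PAGE_PREFIX = "RE:"


def title_value(page_title: str) -> Dict[str, str]:
    """Build a title (P1476) value from the Wikisource page title.

    The "RE:" prefix is dropped if present, and German ("de") is used as the
    language for every title, since Latin titles cannot be told apart reliably.
    """
    if page_title.startswith(PAGE_PREFIX):
        lemma = page_title[len(PAGE_PREFIX):]
    else:
        lemma = page_title
    return {"text": lemma, "language": "de"}
```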
- If the page isn't an encyclopedic article, I think another P31 value is preferable (e.g. index, preface). The same goes for cross-references (but I haven't noticed any). Given the low number of indexes/prefaces, that could be fixed afterwards, but it shouldn't be overwritten by the bot.
- Q19981026 is a cross-reference article from last night. The other cases you mentioned are occasional (only 10 or so introduction articles); I could mark them as Q920285. Good idea? --THE IT (talk) 13:00, 13 May 2020 (UTC)
- P31_values has a list of what we used for BLK. There is an item for prefaces; others we had to create. --- Jura 13:12, 13 May 2020 (UTC)
- Thanks for this list. I will use the items there. --THE IT (talk) 15:26, 14 May 2020 (UTC)
- I haven't checked P6216 statements.
- I was very thorough at this point ;-). --THE IT (talk) 13:00, 13 May 2020 (UTC)
- Items used for volumes should probably be cleaned up, but this isn't due to the bot's activity.
- @Tolanor: Could you help here? --THE IT (talk) 13:00, 13 May 2020 (UTC)
- I can fix them later, but I still have to finish the ones for BLK. --- Jura 13:12, 13 May 2020 (UTC)
--- Jura 09:00, 13 May 2020 (UTC)
I fixed some more bugs. At the moment some new items are being created, though not many (apparently not many are missing). --THE IT (talk) 15:26, 14 May 2020 (UTC)
- Looks good. I suppose Iunius 111 (Pauly-Wissowa) (Q94412414) doesn't link to Arthur Stein (Q711593) as there is no Wikisource page for it. BTW, as copyright status (P6216) isn't actually part of WS, maybe these statements should have "imported from" references too. --- Jura 15:39, 15 May 2020 (UTC)
- Regarding Iunius 111 (Pauly-Wissowa) (Q94412414), you are right. There will be a link in 2021, once the page for Stein is created on Wikisource.
- I will add imported from for copyright status (P6216).
- Let me know when I can start regular bot runs. Regardless, I will monitor the first nightly runs closely and keep the edits to a few hundred per night. --THE IT (talk) 19:11, 15 May 2020 (UTC)
- @Lymantria: can you review/flag it? --- Jura 19:22, 15 May 2020 (UTC)