Wikidata:Requests for permissions/Bot/AliciaFagervingWMSE-bot 8
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved, see my comment at the bottom. --Ymblanter (talk) 17:10, 2 November 2017 (UTC)
AliciaFagervingWMSE-bot 8
Operator: Alicia Fagerving (WMSE)
Task/s: The function of the bot is to import data about immovable cultural heritage to Wikidata as part of Wikimedia Sverige's Connected Open Heritage Project.
This request is for data about cultural heritage monuments in the Brussels region (Belgium) from the Wiki Loves Monuments Database.
Code: The bot uses Python and the Pywikibot framework. The code is available on GitHub: Framework, Specific table processing script
Function details:
The bot processes data from the Wiki Loves Monuments Database, in this case be-bru_(nl), about 1100 items. They will all make use of protected heritage site in Brussels ID (P3600) as an identifier.
Here's an example of how the data is translated into Wikidata properties: Wikidata:WikiProject WLM/Mapping tables/be-bru_(nl)/preview. And here are matched items which will be updated: Wikidata:WikiProject_WLM/Mapping_tables/be-bru_(nl)/matches
Some test edits have been made: Stade du Vivier d'Oie (Q2113474), Hotel de France (Q38906626), Fagus sylvatica (Q146149), Eclectic house (Q38906676), Traditional house (Q38906702), Hotel d'Ittre (Q38906726).
--Alicia Fagerving (WMSE) (talk) 13:14, 7 September 2017 (UTC)
- Comment: instance of (P31) shouldn't have cultural property (Q2065736). We have a designated property for this: heritage designation (P1435). Maybe you can try to clean up previous edits as well.
--- Jura 14:57, 7 September 2017 (UTC)
- The mapping makes use of heritage designation (P1435) = protected heritage site in Brussels (Q19346745) as well. It uses cultural property (Q2065736) as a backup when mapping through Wikidata:WikiProject WLM/Mapping tables/be-bru (nl)/objtype fails, to ensure that none of the newly created items are without an instance of (P31) at all. /André Costa (WMSE) (talk) 10:49, 28 September 2017 (UTC)
- We have now updated the code (phab:T176960) to only use instance of (P31) = cultural property (Q2065736) when there are no instance of (P31) claims already on the item. /André Costa (WMSE) (talk) 11:06, 29 September 2017 (UTC)
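For readers following the thread, the P31 rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the bot's actual code: the function name `choose_p31`, the sample mapping, and the normalisation of the objtype string are all hypothetical; only the fallback value cultural property (Q2065736) and the "only when no P31 exists yet" rule come from the discussion.

```python
# Fallback named in the discussion: cultural property (Q2065736).
FALLBACK_P31 = "Q2065736"

# Hypothetical excerpt of an objtype mapping table (Dutch objtype -> item).
# The real table lives at WikiProject WLM/Mapping tables/be-bru (nl)/objtype.
SAMPLE_MAPPING = {
    "huis": "Q3947",   # house
    "kerk": "Q16970",  # church building
}


def choose_p31(objtype, objtype_mapping, existing_p31):
    """Return the list of P31 values that should be added to an item.

    objtype:         raw |objtype= value from the monuments list (may be None)
    objtype_mapping: dict from normalised objtype to a Q-id
    existing_p31:    Q-ids already present as P31 on the matched item
    """
    # Assumption: objtype values are matched case-insensitively.
    key = (objtype or "").strip().lower()
    mapped = objtype_mapping.get(key)
    if mapped:
        # A specific type is known; add it unless it is already there.
        return [] if mapped in existing_p31 else [mapped]
    if existing_p31:
        # Mapping failed, but the item already has some P31: add nothing.
        return []
    # Mapping failed and the item has no P31 at all: use the fallback.
    return [FALLBACK_P31]
```

With this sketch, an unmapped objtype on a brand-new item yields the fallback, while the same objtype on an item that already carries any P31 claim yields no new claim.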
- It would be nice if those who want to import geodata would participate in the discussion of what the definition of Wikidata geocoordinates means. If you add ill-defined data, you are at risk of having to do it over again. See Wikidata:Project chat#Any wikibase:geoPrecision users?. Location of a stadium to hundredths of an arcsecond? What does that mean? Jc3s5h (talk) 15:58, 7 September 2017 (UTC)
- What edits are you commenting on? I don't see that happening. In fact, the items seem to lack coordinates. --- Jura 07:46, 9 September 2017 (UTC)
- There are 3 entries which have coordinates. These coordinates have likely been added by Wikipedia users. We use those values verbatim, just as if those users had entered them through the Wikidata UI. There is no consistent way for us to determine precision for the widely varying types of objects that are imported from the Monuments database. /André Costa (WMSE) (talk) 10:49, 28 September 2017 (UTC)
- Comment: The third sample above seems to be a mix-up with an existing item: https://www.wikidata.org/w/index.php?title=Q146149&action=history . Please check your samples and do spot checks on your own edits.
--- Jura 07:46, 9 September 2017 (UTC)
- That is the unfortunate side-effect of a manual linking done on Wikipedia. We've blacklisted taxon and are adding a preview mode (phab:T176949) of all matched items to make it easier to spot errors. Unfortunately the types of items that can be monuments are too varied to allow for an explicit whitelist. /André Costa (WMSE) (talk) 10:49, 28 September 2017 (UTC)
- A new mechanism has been added to produce a preview of the matches: Wikidata:WikiProject_WLM/Mapping_tables/be-bru_(nl)/matches. This allows for easy inspection and augmentation of the blacklist prior to import. /André Costa (WMSE) (talk) 10:09, 29 September 2017 (UTC)
- We've also blacklisted any matches to items that have a subclass of (P279) claim. /André Costa (WMSE) (talk) 11:06, 29 September 2017 (UTC)
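As an illustration of the two blacklist rules mentioned above (reject matches that are a taxon, and reject matches that carry any subclass of (P279) claim), here is a minimal sketch. It is not the bot's implementation: the function names, the dict-based representation of an item's claims, and the sample data are assumptions made for the example.

```python
TAXON = "Q16521"  # taxon

# Types that a matched item must not be an instance of.
# Assumption: the blacklist is a plain set that can be extended after
# inspecting the preview of matches.
BLACKLISTED_TYPES = {TAXON}


def is_acceptable_match(claims):
    """Decide whether a matched item may be updated.

    claims: dict mapping a property id (e.g. "P31") to a list of Q-ids.
    """
    # Any subclass of (P279) claim suggests the item describes a class,
    # not an individual monument.
    if claims.get("P279"):
        return False
    # Reject items whose instance of (P31) hits the blacklist.
    return not BLACKLISTED_TYPES.intersection(claims.get("P31", []))


def split_matches(matches):
    """Split {qid: claims} into (accepted, rejected) lists of Q-ids."""
    accepted, rejected = [], []
    for qid, claims in matches.items():
        if is_acceptable_match(claims):
            accepted.append(qid)
        else:
            rejected.append(qid)
    return accepted, rejected
```

Under these rules the mistaken match to Fagus sylvatica (Q146149) would land in the rejected list, where it can be reviewed by hand instead of being edited.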
@Jura1: We've now implemented solutions for the problems raised above. Would you mind taking another look? /André Costa (WMSE) (talk) 11:06, 29 September 2017 (UTC)
- Also @Romaine: in case you want to take a look as you were involved in the mapping. /André Costa (WMSE) (talk) 11:15, 29 September 2017 (UTC)
- Would you do some test edits and link to them?
--- Jura 06:51, 1 October 2017 (UTC)
- @Jura1: The following are new examples: Sonian Forest (Q2254455), Saule blanc (Salix alba) (Q41527253), Monument Gabrielle Petit (Q41527227), Forest-South railway station (Q952542) and Set of eclectic houses (Q41527288). /André Costa (WMSE) (talk) 10:53, 2 October 2017 (UTC)
- They seem mostly ok to me, except that I'd try to do away entirely with P31=cultural property. For Monument Gabrielle Petit (Q41527227), you could use monument (Q4989906). Merely identifying it from the label seems to work. If it's too complicated to do it beforehand, maybe the import project could at least plan a subsequent cleanup effort.
--- Jura 10:19, 8 October 2017 (UTC)
@André Costa (WMSE): Three things that are important for the import. First of all, we should not have duplicate items created in Wikidata, as some of the monuments already exist in Wikidata. In the past people have been creating articles in Wikipedia about certain monuments from the lists. While the articles have been added to Wikidata, the monument identifiers have not (for a large part). To prevent duplicate items from being created, the category on Wikipedia (w:nl:Categorie:Beschermd erfgoed in het Brussels Hoofdstedelijk Gewest) with the monuments must first be checked to confirm that all of them have a monument ID on Wikidata. I think I can do that in the coming days.
Thirdly we need to make sure that instance of (P31) is added to every newly created monument in Wikidata (like house, church, park, etc). Is that available in the database import for every monument? Romaine (talk) 04:30, 2 October 2017 (UTC)
- @Romaine: If you could do that it would be great. Every monument which has an |objtype= which has been mapped in Wikidata:WikiProject WLM/Mapping tables/be-bru (nl)/objtype will get that as its instance of (P31). Otherwise it will get the fallback instance of (P31) = cultural property (Q2065736) (unless it already has another instance of (P31)). We log whenever we fail to match an objtype so that a volunteer could work off that list, if desired, to do the parts which the bot couldn't. /André Costa (WMSE) (talk) 07:39, 2 October 2017 (UTC)
- Hi André Costa (WMSE), in the past days I have been going through all the monument lists from the Brussels Capital Region, both in French and in Dutch, and have added the monument identifiers to items that already existed on Wikidata (because of already existing articles). Now altogether 324 items have a monument identifier. The rest can be imported now. I do however have some remarks:
- Addresses: Streets in Brussels have two names: one in Dutch, one in French. For that reason we always add the language, see: here
- Addresses: Is it possible to import also the French address/street?
- Addresses: Some streets have the Streetname followed by 0. This 0 should not be imported.
- English description: the description added is currently "heritage site in Brussel, Belgium". There is a spelling error in it and it does not make sense. First of all, a subject is not a monument but a house, other building, etc. If you can't be specific, do not add it, because it has to be removed and will be a waste of time. Example: here
- Labels/descriptions in general: many monuments have the same name, so the description must specify the difference. In these cases the address is the difference, so I added the address there, like here. Is this possible in case of duplicate names?
- French labels: The official monument register in Brussels has official names for monuments in both Dutch and French, but only Dutch was added to the newly created items. Why?
- P31: if you can't be specific, saying it is a monument does not make sense, as that is not what the subject is. In those cases it should not be added. I will enter it later on, but do not want to remove + add (= waste of time).
- Concerning the 324 items on Wikidata that already have a monument identifier: if the database has a different label, please add it as an alias. I assume that the address and other data that has not been added to these items will be added by your bot?
- I hope it is possible to take these remarks into account; if needed I can import the data as well. Romaine (talk) 07:39, 9 October 2017 (UTC)
- @Romaine: Thanks for the matching! In general the imported data all comes from the lists on nl.wikipedia, so we don't have access to any information that is not there/in the Monuments database.
- Multilingual names: The only street names we have are those from the lists on nl.wikipedia. I'll add a "nl" label to these (per the method you described) but I don't have access to the French names.
- Street names with 0: This is how the data has been entered in e.g. w:nl:Lijst_van_beschermd_onroerend_erfgoed_in_Anderlecht. Considering the various formats the address takes, I would recommend either fixing it there or importing it and then fixing these here.
- The typo in the English description was actually due to the use of the Dutch name of Brussels. I've removed the English description.
- There is no easy way of detecting if the label+description is unique. The catch-all solution would be to add the address to all descriptions, which doesn't feel very elegant.
- For P31 I'm not sure I agree. For me an item without a P31 is worse than one which is not specific enough (but basically correct). If the type has not been matched in Wikidata:WikiProject WLM/Mapping tables/be-bru (nl)/objtype then the only thing we actually know is that it's a monument. If that can be manually replaced by something more specific later then that is of course better, but leaving it empty because "at some point it will be done perfectly" does not feel right to me. /André Costa (WMSE) (talk) 08:56, 16 October 2017 (UTC)
- Please let us know when the bot is ready for approval. --Ymblanter (talk) 07:41, 15 October 2017 (UTC)
- We're implementing a few of Romaine's suggestions and waiting for one more round of feedback, then we should be done. /André Costa (WMSE) (talk) 08:56, 16 October 2017 (UTC)
- Some of the suggestions have been implemented and tested in the Sandbox to make sure they give the desired result. The bot should be ready for approval again. /André Costa (WMSE) (talk) 08:23, 20 October 2017 (UTC)
- Thanks. Then I will use the street names on FR Wikipedia to add them to the items later.
- That a 0 is added is a mistake, as 0 is not part of an address. I assume it originates in a spreadsheet or database in which no house number was entered. It would be better if we can remove this before importing, otherwise it will become much more work.
- I will see what I can do about the labels and descriptions after the import. As the city is half French, we certainly need a French label. English is also handy. Perhaps I can import that together with the street names in French.
- Concerning P31, in general I agree with you, however in this case I would prefer not to. Once you have done your import, I can add P31 based on the Typologie column in the tables. If you add P31, I have to remove it directly after you, which takes extra work. And no, not at some point: I will do it directly after you have imported it, as this is my number one priority.
Also I will look up and add the coordinates as that is something I also consider a requirement. Thanks! Romaine (talk) 00:30, 26 October 2017 (UTC)
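The stray 0 discussed above could be dropped before import with something like the sketch below. This is an assumption-laden illustration, not the bot's code: it supposes the placeholder always appears as a standalone "0" at the very end of the address string, and the function name `clean_address` and the sample street name in the usage note are made up for the example.

```python
import re

# A lone "0" at the end of the string, separated from the street name by
# whitespace. House numbers such as "10" or "20" are not affected because
# the zero must be preceded by whitespace.
_TRAILING_ZERO = re.compile(r"\s+0$")


def clean_address(address):
    """Remove a placeholder house number 0 from the end of an address."""
    return _TRAILING_ZERO.sub("", address.strip())
```

For instance, `clean_address("Kerkstraat 0")` gives the bare street name, while an address ending in a real house number such as 10 is left unchanged. Addresses in other formats (the lists use several) would need their own handling.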
- @Romaine: We have updated the Wikidata:WikiProject WLM/Mapping tables/be-bru_(nl)/preview with the last changes, removing the 0's from street names and also the default P31's. --Alicia Fagerving (WMSE) (talk) 11:30, 30 October 2017 (UTC)
- I have started the live upload now :) --Alicia Fagerving (WMSE) (talk) 12:46, 30 October 2017 (UTC)
- @Romaine: Also a short note that, per your request, the address is added when the description+label is the same as that of another object. /André Costa (WMSE) (talk) 15:52, 30 October 2017 (UTC)
- André Costa (WMSE) & Alicia Fagerving (WMSE): I see the import is still ongoing. Please let me know when you have finished. Romaine (talk) 16:21, 31 October 2017 (UTC)
- @Romaine: The import is indeed finished, some statistics will be published here once they're done generating. --Alicia Fagerving (WMSE) (talk) 07:28, 1 November 2017 (UTC)
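The rule mentioned in this thread (append the address only when label+description collide with another object) might look roughly like the following. This is a hedged sketch rather than the bot's actual logic: the record layout, the function name `disambiguate`, the separator, and the restriction to collisions inside a single batch are all assumptions.

```python
from collections import Counter


def disambiguate(records):
    """Append the address to descriptions that would otherwise collide.

    records: list of dicts with the keys "label", "description" and
             "address" (hypothetical layout for this example).
    Returns new dicts; the input records are not modified.
    """
    # Count how often each (label, description) pair occurs in the batch.
    counts = Counter((r["label"], r["description"]) for r in records)
    result = []
    for r in records:
        description = r["description"]
        is_duplicate = counts[(r["label"], description)] > 1
        if is_duplicate and r.get("address"):
            # Assumption: a comma-separated suffix is an acceptable format.
            if description:
                description = f"{description}, {r['address']}"
            else:
                description = r["address"]
        result.append({**r, "description": description})
    return result
```

Note that this only compares records within the batch being imported; checking against items already on Wikidata would need a separate lookup, which is not shown here.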
- Now I am not sure - the bot went on to do the task without approval and finished it, right? --Ymblanter (talk) 16:48, 2 November 2017 (UTC)
- @Ymblanter: My apologies, this is on me. I misread the discussion and instructed Alicia to start the run when she got back to work. /André Costa (WMSE) (talk) 17:04, 2 November 2017 (UTC)
- Good, I will close the discussion for archiving purposes, but if @Romaine: still has issues, I would advise him to contact André or Alicia directly. --Ymblanter (talk) 17:10, 2 November 2017 (UTC)