User talk:Nataliya Keberle

MIC import

Hi, note that replacing normal existing labels with uppercase labels is not appropriate (see Help:Label). Also note that for company headquarters we use headquarters location (P159), not located in the administrative territorial entity (P131) (see Wikidata:WikiProject Companies/Properties). The references are also not ideal: stated in (P248) should link to a specific source, not a general term like Market Identifier Code (Q6770697), and publisher (P123) is missing. Overall, it is preferable to discuss such large-scale imports somewhere in advance (for example at Wikidata:Project chat or Wikidata talk:WikiProject Companies). Jklamo (talk) 09:31, 12 October 2022 (UTC)

Apologies, this is my first bulk update. Thanks for noticing. I used the data from the MIC sheet as is; the uppercase text comes from there. I can add a publisher qualifier to MIC. Shall I undo all updates and redo them with cleaned-up data? 188.163.5.158 09:37, 12 October 2022 (UTC)
hi @Jklamo! Nataliya is continuing/updating the work that I did 1.5 years ago. She has followed the way I did it, and there have been no comments on that work.
  • Uppercase: agree: the MIC import should not override labels when present
  • headquarters location (P159): don't quite agree. Many of these are not stock exchanges (not companies) but just markets (more a marketing mechanism than anything else), so is it correct to say they have a HQ?
  • References: disagree. It is stated in the best possible way. In addition to the general "Market Identifier Code" (database), there is the specific "MIC code". How do you want it changed?
stated in: Market Identifier Code
retrieved: September 2022
MIC market code: BJSE
  • publisher (P123): disagree. To my knowledge, no external identifier claim carries a "publisher". Such common knowledge should be attached to the identifier definition, not the claim.
Vladimir Alexiev (talk) 11:44, 12 October 2022 (UTC)
The affected items that I noticed are for companies and/or stock exchanges. Since a company or other organization itself generally isn't tied to a location, as opposed to its headquarters, P159 makes more sense to me for such items. I'm not sure what would be an example of a "marketing mechanism" item, but if P159 doesn't seem right for it, then I'd assume P131 is even less suitable. 2001:7D0:81FD:BC80:C5E1:CD41:6119:6634 12:51, 12 October 2022 (UTC)
headquarters location instead of located in the administrative territorial entity is applicable: in other sources of financial information the address is named "headquarters location". Thanks for pointing this out. Nataliya Keberle (talk) 08:48, 13 October 2022 (UTC)
Hi @Jklamo! 1) Labels in uppercase come from the MIC data sheet, which is the official ISO document. Converting them into title case is wrong. I guess it is a question of which official document "weighs" more or is more important. Labels are not overridden if present (OR project cross-check).
2) re: publisher: along with the property MIC market code there is the entity Market Identifier Code, which has official website and described by source statements and carries all the knowledge needed to avoid duplicating it in each company's data. Nataliya Keberle (talk) 08:27, 13 October 2022 (UTC)
"Labels in uppercase": I disagree: Wikidata wants labels in title case, and no ISO standard specifies labels should be in uppercase. However, fixing the case of labels is very non-trivial, so if someone can do it, they should. It should not be up to Natalia (or me as ingester of the original dataset) to fix the case of labels and descriptions. Vladimir Alexiev (talk) 08:53, 21 October 2022 (UTC)
I suppose uppercase labels are sort of fine as preliminary data. However, in the items in question these all-caps aliases were also added to items where the case had already been fixed. 2001:7D0:81FD:BC80:AD62:23B7:55D8:119D 12:15, 21 October 2022 (UTC)

Please pay attention to corrections like Special:Diff/1747313604. In this item you have now re-added statements that are about Luminor Bank (Q28966957) (Estonian legal entity), not about Luminor Bank AS (Q26969139) (former Latvian legal entity). Also, it seems you misinterpret your source data: it looks like you add P31=stock exchange (Q11691) only because the subject has a MIC market code. The latter, however, is also assigned to e.g. trade reporting facilities and entities operating an exchange or trading platform, see [1]. This bank itself quite certainly isn't a stock exchange. 2001:7D0:81FD:BC80:C5E1:CD41:6119:6634 12:38, 12 October 2022 (UTC)

Anonymous, why did you remove this contribution instead of making whatever fixes are needed?
> looks like you add P31=stock exchange only because the subject has MIC market code.
We have implemented some rather sophisticated logic based on:
This bank has category Not Specified (NSPD): do you know (or can you guess) why it has a MIC, and add that type?
Looking at https://www.wikidata.org/wiki/Talk:Q26969139, I don't know why Estonian users insist on being Anonymous.
Nataliya, please figure out precisely which Luminor entity has a MIC and add the MIC statements again. If Anonymous reverts your changes again as anonymous, then I'll move to ban him for a while.
Let's be constructive rather than just removing contributions! -- Vladimir Alexiev (talk) 09:07, 21 October 2022 (UTC)
I reverted because it was all wrong, i.e. all the data was about some other entity. I don't feel it's fair to intimidate me with a ban because of it. As I already pointed out, the MIC data does appear to be about the Estonian bank, while the edit was made to the item about the Latvian bank (Q26969139). It is a lot of work to validate this data and to sort it between items, and I don't feel it's my responsibility to do it. I appreciate that you are now investigating what went wrong and that the second time Nataliya reverted their edits on their own.
Sorry, I'm not familiar with the NSPD category and other specifics of MIC data, and I can't help much when it comes to interpreting this data correctly. I only wanted to point out that the given P31 statement makes little sense, no matter whether it was added to the item about the Latvian bank or the item about the Estonian bank, and I hope this assertion itself was also helpful. 2001:7D0:81FD:BC80:AD62:23B7:55D8:119D 12:15, 21 October 2022 (UTC)

Please fix uppercase item labels

As discussed above, you ran a huge QS job [2] and screwed up a lot of items. They should not have all-capital names. Please correct this. And please be careful when doing such large jobs. BrokenSegue (talk) 20:28, 22 December 2022 (UTC)

you also screwed up descriptions by setting them to all caps, see [3]. Descriptions should never be all capitals. Please fix. BrokenSegue (talk) 20:32, 22 December 2022 (UTC)
Hi @BrokenSegue!
The task is difficult because:
  • There are many acronyms that must be left as is (eg APA, OTF, OTP, NASDAQ, STOXX, etc). So the bot should only change (capitalize) usual words found in a dictionary
  • Prepositions should be in lower case, eg "BOLSA DE COMERCIO DE SANTA FE" should become "Bolsa de Comercio de Santa Fe".
  • Titles and descriptions include not only English words, but also other languages
I don't think it should be up to me or @Nataliya Keberle to fix this ALLCAPS problem.
  • However, I agree with @Jklamo and others that Nataliya should not have overwritten existing names
  • @BrokenSegue as for [3], I think that MIC's description "ELECTRONIC MARKET MAKER" is more authoritative than the original "market-making firm". Of course, it would be better if rendered as "electronic market maker"...
So to sum it up: can anyone suggest software that can do proper capitalization, taking into account the difficulties listed above? Vladimir Alexiev (talk) 08:28, 23 December 2022 (UTC)
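The capitalization rules listed above (keep acronyms, lowercase prepositions, capitalize ordinary words) can be sketched in a few lines of Python. This is only an illustration, not the tool that was eventually used: the word lists `ACRONYMS` and `LOWERCASE_WORDS` and the function names are hypothetical, and a real run would need far larger, multilingual lists plus handling of punctuation.

```python
# Illustrative word lists (assumptions); real data would need many more entries.
ACRONYMS = {"APA", "OTF", "OTP", "NASDAQ", "STOXX"}
LOWERCASE_WORDS = {"de", "del", "la", "of", "and", "the", "for", "in"}


def recase_label(text: str) -> str:
    """Turn an ALL-CAPS label into title case, keeping known acronyms."""
    result = []
    for index, word in enumerate(text.split()):
        if word.upper() in ACRONYMS:
            # Known acronyms stay fully uppercase.
            result.append(word.upper())
        elif index > 0 and word.lower() in LOWERCASE_WORDS:
            # Prepositions/articles are lowercased, except as the first word.
            result.append(word.lower())
        else:
            result.append(word.capitalize())
    return " ".join(result)


def recase_description(text: str) -> str:
    """Lowercase an ALL-CAPS description, keeping known acronyms."""
    return " ".join(
        word.upper() if word.upper() in ACRONYMS else word.lower()
        for word in text.split()
    )


print(recase_label("BOLSA DE COMERCIO DE SANTA FE"))
print(recase_description("ELECTRONIC MARKET MAKER"))
```

The sketch deliberately leaves unknown words to plain capitalization, so acronyms missing from the list (and words attached to punctuation) would still come out wrong and need manual review.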

UPDATE: I notice that IotaFinance (the formatterURL site) has proper capitalization:

So, should we scrape the IotaFinance site? --Vladimir Alexiev (talk) 09:00, 23 December 2022 (UTC)

@Vladimir Alexiev: Look, if the source information you are using is bad, don't import from that source. You imported all-caps data when Wikidata doesn't want all-caps data. So either roll back the import or fix it. Expecting others to write a bot to fix it for you or blaming the source isn't a great look. You created this mess. There's a sure-fire way to fix it, which is to go through all the edits by hand. At the very least you should do this to fix the overwritten descriptions/labels. I can maybe argue that an all-caps description is better than no description. BrokenSegue (talk) 16:31, 23 December 2022 (UTC)

The ISO MIC standard is not a "bad source" if you care about stock exchanges. Exchanges written in uppercase are better than no exchanges. We've spent over 100h matching and importing this data, and I don't think your assessment "a mess" is fair. Vladimir Alexiev (talk) 15:19, 24 December 2022 (UTC)

@Vladimir Alexiev: look, you seem to agree an error was made when properly written labels/descriptions were overwritten with all-caps ones. All I'm asking you to do is to fix that. BrokenSegue (talk) 17:15, 24 December 2022 (UTC)
After scraping IotaFinance, the labels and descriptions that did not conform to Wikidata requirements have finally been updated, in batches https://quickstatements.toolforge.org/#/batch/109105,
https://quickstatements.toolforge.org/#/batch/109107,
https://quickstatements.toolforge.org/#/batch/109108. Thanks go to you, @BrokenSegue, to @Vladimir Alexiev and Alexandr Ositsyn. Nataliya Keberle (talk) 17:26, 17 January 2023 (UTC)
this is better but still doesn't meet our style guide for descriptions. BrokenSegue (talk) 17:56, 17 January 2023 (UTC)
@BrokenSegue Can you be more specific about your last remark? Vladimir Alexiev (talk) 08:21, 22 February 2023 (UTC)
@Vladimir Alexiev: Please see Help:Description. In particular, descriptions should not begin with an uppercase letter, but there are lots of different kinds of errors. I don't even understand the description you set for Caveat Emptor (Q107223115). Why is there a dash? Why do you repeat the label in the description? Etc. etc. BrokenSegue (talk) 15:43, 22 February 2023 (UTC)
That's easy: it comes from the ISO MIC standard list. Vladimir Alexiev (talk) 19:34, 28 February 2023 (UTC)
sure, that's where you got it. but it's not appropriate for Wikidata to be in that form. BrokenSegue (talk) 04:25, 1 March 2023 (UTC)
I want to join @BrokenSegue here and say this: @Vladimir Alexiev, @Nataliya Keberle, please revert your changes to descriptions entirely, not just decapitalise them. While some of your changes are probably acceptable, changing human-readable descriptions to complete gibberish is awful and should not have been done on 2,000 articles without a discussion. Please revert your changes as soon as possible. stjn[ru] 14:28, 14 April 2023 (UTC)
To you "Equities, bonds, currencies, money market instruments, commodities and MOEX board" may be gibberish. To someone who actually works with stock exchanges, that's useful information.
I trust that the creators of ISO MIC know more about the important information about stock exchanges than you and I. Which doesn't mean we should uncritically take all their information: in this case I've edited the description to say it's in Moscow (check it out).
But PLEASE: we've contributed tons of useful stock exchange info, and all we get from the community is criticism about the descriptions? -- Vladimir Alexiev (talk) 07:11, 30 April 2023 (UTC)
Ah! I see in https://www.wikidata.org/wiki/Talk:Q2632892 that "stock exchange located in Moscow, Russia" is automatically generated.
And you've reverted the more informative MIC description "Equities, bonds, currencies, money market instruments, commodities and MOEX board", which is written by stock market experts, just because it sounds better to you. Vladimir Alexiev (talk) 07:13, 30 April 2023 (UTC)
For the record, now the description is both informative and readable: "Moscow stock exchange that deals in equities, bonds, currencies, money market instruments, commodities and runs the MOEX board" Vladimir Alexiev (talk) 13:45, 21 June 2023 (UTC)
Nata, there is a small percentage of errors in the batches (I checked the last one). Please try "show only errors" and then "try to fix errors". Vladimir Alexiev (talk) 14:09, 19 January 2023 (UTC)
Done manually for those 7 errors. Nataliya Keberle (talk) 15:34, 19 January 2023 (UTC)

Wrong "end time" qualifiers

@nikolatulechki There are some badly formatted "end time" qualifiers, eg 20211025 in https://www.wikidata.org/wiki/Q93355424#P7534.

I found it only because there's a validation error: "The difference between start time (25 April 2011) and end time (year 20211025) should be between 0 year and 10,000 year."

Could you try to fix them? This comes from your batch https://iw.toolforge.org/quickstatements/#.2Fbatch.2F97968 Vladimir Alexiev (talk) 09:06, 23 December 2022 (UTC)
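As an illustration of the problem, a compact date such as 20211025 appears to have been read as a year rather than as 25 October 2021. A minimal Python sketch of one possible repair step follows; it assumes the source values are always in `YYYYMMDD` form and that the target is the QuickStatements time syntax with day precision (`/11`). The function name `compact_date_to_qs` is hypothetical, and this is not necessarily how the batch was actually fixed.

```python
from datetime import datetime


def compact_date_to_qs(value: str) -> str:
    """Convert a compact YYYYMMDD string to a QuickStatements time value.

    Raises ValueError if the input is not a valid calendar date, so
    malformed source values are caught instead of being imported.
    """
    parsed = datetime.strptime(value, "%Y%m%d")
    # "/11" is the precision suffix for a full date (day precision).
    return parsed.strftime("+%Y-%m-%dT00:00:00Z/11")


print(compact_date_to_qs("20211025"))
```

Parsing with an explicit format, rather than passing the raw string through, means an impossible value such as 20211325 fails loudly before it reaches a batch.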

There are 80 MIC codes with "end time" qualifier, and they are all fixed: https://w.wiki/6Mzv Vladimir Alexiev (talk) 08:24, 22 February 2023 (UTC)