User talk:So9q


About this board


Previous discussion was archived at User talk:So9q/Archive 1 on 2019-11-18.

Mistakenly making hydrochlorothiazide (Q423930) a subject

Rdmpage (talkcontribs)

Hi, I've noticed that you've made Q423930 the main subject of papers that include "Aquarius" in the title, e.g. Q100953456 "Aquarius philippinensis sp.n., a large endemic water strider (Insecta: Heteroptera: Gerridae) from ancient crater lakes in South Luzon, Philippines" (there are others). In these papers, Aquarius is Q2859194, a genus of insects. Is it possible to revert these edits? I guess this is a limitation of using simple text matching to determine the subject, especially for items that have lots of synonyms that also match other items.

So9q (talkcontribs)
Rdmpage (talkcontribs)

I guess it's an occupational hazard of trying to determine what a paper is about based on its title. Taxonomic names can be a nightmare given all the possible name clashes. I wonder if what we need is something sophisticated enough to "know" whether a paper is likely to be on a species or a chemical compound, e.g. Q41799598.

So9q (talkcontribs)

Are you in the Wikicite Telegram group? There we talk about how to better categorize all this knowledge. This is a blunt tool; using AI to read the abstracts would be a huge improvement. @houcemeddine: works on that if I am not mistaken, but few abstracts are available as open data at this time, I'm afraid.

Rdmpage (talkcontribs)

Yes, I am in that group. Abstracts are one way to determine what a manuscript is about, but I wonder whether we can get useful information from the surrounding network of connections? Knowing something about the journals and the authors may often tell us whether it's likely that a string refers to a chemical compound or a taxon.

So9q (talkcontribs)
Reply to "Mistakenly making hydrochlorothiazide (Q423930) a subject"

Suggestion to migrate usage comments in your User Scripts to use mw.loader.load

Jimman2003 (talkcontribs)

Nice work on the user scripts! Just a suggestion... MediaWiki has deprecated importScript since 1.16. I recommend you do a search and replace or use a regex to automate this task... there is a chance I can help in a week, though.
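A regex-based search and replace along these lines could handle the migration. This is only a sketch: it assumes the scripts are hosted on www.wikidata.org and that the page titles contain no quotes or characters needing URL encoding.

```javascript
// Rewrite legacy importScript() calls to mw.loader.load() with a raw-page URL.
// Assumes the scripts live on www.wikidata.org and titles need no encoding.
function migrate(src) {
  return src.replace(
    /importScript\s*\(\s*(['"])([^'"]+)\1\s*\)/g,
    (_, q, page) =>
      `mw.loader.load(${q}https://www.wikidata.org/w/index.php?title=${page}&action=raw&ctype=text/javascript${q})`
  );
}
```

For example, `migrate("importScript('User:Example/foo.js');")` yields a `mw.loader.load(...)` call pointing at the same page via `action=raw&ctype=text/javascript`.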

Jimman2003 (talkcontribs)

Can I find you through Matrix/IRC/<chat app here>? If so, what is your username there?

So9q (talkcontribs)
Reply to "Suggestion to migrate usage comments in your User Scripts to use mw.loader.load"
Abbreviations

Mxn (talkcontribs)

Regarding this reversion, there are a number of other abbreviations added as forms that I guess should be reverted for the same reason. I went through adding them for a federated query involving OpenStreetMap data last year. This discussion yielded no consensus on whether forms or separate lexemes would be appropriate for abbreviations, but perhaps something has changed in the meantime. In any case, the important thing is to make sure this data isn't lost. I'm not very experienced with lexeme modeling, so I'd appreciate it if you could start one for the "S." abbreviation that I could follow in the other cases. Thanks!

Reply to "Abbreviations"
{{q|Q2006368}}

Autom (talkcontribs)
Autom (talkcontribs)

Sorry, you are right, I was thinking of the meaning "available capacity that is not being used".

So9q (talkcontribs)

Nice to meet someone who works in the field. I searched on DDG and found nothing, hence my revert. As long as we can back it up with a source, it can be called whatever you like 😃

Reply to "{{q|Q2006368}}"

Call for participation in the interview study with Wikidata editors

Kholoudsaa (talkcontribs)

Dear So9q,

I hope you are doing well,

I am Kholoud, a researcher at King’s College London, and I work on a project as part of my PhD research that develops a personalized recommendation system to suggest Wikidata items for the editors based on their interests and preferences. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I would love to talk with you to learn how you currently choose the items you work on in Wikidata and to understand the factors that might influence such decisions. Your cooperation will give us valuable insights into building a recommender system that can help improve your editing experience.

Participation is completely voluntary. You have the option to withdraw at any time. Your data will be processed under the terms of UK data protection law (including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018). The information and data that you provide will remain confidential; it will only be stored on the password-protected computers of the researchers. We will use the anonymized results to provide insights into editors' practices in selecting items for editing, and we will publish the results of the study at a research venue. If you decide to take part, we will ask you to sign a consent form, and you will be given a copy of this consent form to keep.

If you’re interested in participating and have 15-20 minutes to chat (I promise to keep the time!), please either contact me at or use this form with your choice of the times that work for you.

I’ll follow up with you to figure out what method is the best way for us to connect.

Please contact me using the email mentioned above if you have any questions or require more information about this project.

Thank you for considering taking part in this research.



Reply to "Call for participation in the interview study with Wikidata editors"
Proper nouns ending in -s

LA2 (talkcontribs)

When a proper noun has nominative = genitive, does it work to give just one form, as in Lexeme:L475151, or should I give two forms?

So9q (talkcontribs)

No idea 😅 Are you on Telegram? We haven't yet settled for Swedish how proper nouns should look in the forms. The best thing is probably to run a search and see how others have done it. I know Fnielsen has added a lot to Danish lexemes, in any case.

So9q (talkcontribs)
LA2 (talkcontribs)

The most common practice seems to be to register two forms, even though they are identical, so I will do the same.

So9q (talkcontribs)

Great. That sounds good.
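The agreed two-form pattern can be sketched as data. This is an illustrative shape only, not the exact Wikibase lexeme JSON model (which uses property and item IDs for categories and features), and "Mars" is just a hypothetical example of a Swedish proper noun whose nominative and genitive are spelled identically.

```javascript
// Illustrative shape only -- not the exact Wikibase lexeme JSON model.
// "Mars" is a hypothetical Swedish proper noun whose nominative and
// genitive forms are spelled identically, so both forms are registered.
const lexeme = {
  lemma: { sv: 'Mars' },
  lexicalCategory: 'proper noun',
  forms: [
    { representation: { sv: 'Mars' }, grammaticalFeatures: ['nominative'] },
    { representation: { sv: 'Mars' }, grammaticalFeatures: ['genitive'] },
  ],
};
```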

Reply to "Proper nouns ending in -s"
Wikidata:Requests for permissions/Bot/GZWDer (flood) 6

GZWDer (talkcontribs)

Do you have any comments on my recent edit?

GZWDer (talkcontribs)

I planned to add a reference to a large number of words (50,000 in the original source, of which at least 6,000 exist in Wikidata), but I think we need the community to discuss it first. See Wikidata_talk:Lexicographical_data#Moby_Part_of_Speech_List. (For the time being, only existing lexemes will be edited.)

So9q (talkcontribs)

Sounds good to me. As long as it's a limited-scope bot job with some examples to judge the quality by, it sounds like a good idea :) I'm very happy you are helping to find these CC0 sources.

GZWDer (talkcontribs)

I want to receive some opinions on what is considered "everything from before is cleaned up"; Wikidata is a work in progress and edits are not required to be perfect. Usually, most items created by bots do not have most information filled in, but that is not usually considered an issue. Regarding duplicates, many tasks will create new duplicates and it is not possible to check them one by one (given that the number of items created is several million), but what would be acceptable for already-created items?

So9q (talkcontribs)

Yeah, it might not be a reasonable request at all. It's not a demand anyway, just what I would do myself and wish of others. I actually have something to clean up myself from an old QS batch that was, eh, misguided. 😅 The difference here is that you have someone nagging you and I don't. Nobody else seems to back me up, so maybe you're fine and in good standing? Ask Nikki, he is the only one I remember having mentioned you in Telegram (concerning lexemes). If you make any further bot requests, I would very much like them to be limited in scope and to come with example edits. I would also love to see a bot that, for scientific articles for example:

  1. finds a missing DOI
  2. looks up the authors
  3. imports any missing authors with an ORCID, with data from at least one source
  4. imports the paper, links any known authors, and puts the rest in author name strings

Since no one has written that, I'm writing one now 😃
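The steps above could be sketched roughly like this. This is an assumption-laden sketch, not an existing bot: the Crossref and Wikidata Query Service endpoints are real public APIs, but the overall flow and the `createAuthorFromOrcid` helper are purely illustrative.

```javascript
// Sketch of the wished-for import pipeline. The endpoints are real public
// APIs; the flow and createAuthorFromOrcid() are illustrative only.
const CROSSREF = 'https://api.crossref.org/works/';
const WDQS = 'https://query.wikidata.org/sparql';

// SPARQL lookup of a Wikidata item by ORCID iD (P496).
function orcidQuery(orcid) {
  return `SELECT ?item WHERE { ?item wdt:P496 "${orcid}" } LIMIT 1`;
}

// Hypothetical helper: create a new author item from the ORCID record.
async function createAuthorFromOrcid(orcid) {
  throw new Error('not implemented in this sketch');
}

async function importPaper(doi) {
  // 1. Fetch the paper's metadata by DOI from Crossref.
  const work = (await (await fetch(CROSSREF + encodeURIComponent(doi))).json()).message;

  const authors = [];        // resolved Wikidata item URIs
  const authorStrings = [];  // fallback: author name string (P2093)
  for (const a of work.author ?? []) {
    if (a.ORCID) {
      // 2.-3. Resolve the ORCID iD to an item, creating it if missing.
      const id = a.ORCID.replace(/^https?:\/\/orcid\.org\//, '');
      const url = `${WDQS}?format=json&query=${encodeURIComponent(orcidQuery(id))}`;
      const bindings = (await (await fetch(url)).json()).results.bindings;
      authors.push(bindings[0]?.item.value ?? await createAuthorFromOrcid(id));
    } else {
      authorStrings.push(`${a.given ?? ''} ${a.family ?? ''}`.trim());
    }
  }
  // 4. A real bot would now create the paper item, linking `authors` via
  //    author (P50) and `authorStrings` via author name string (P2093).
  return { title: work.title?.[0], doi, authors, authorStrings };
}
```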

GZWDer (talkcontribs)

For lexemes, I have stated before that I will discuss every import at Wikidata talk:Lexicographical data seven days in advance. Wikidata's lexeme coverage is very limited and many very basic words are missing (you can see the contributions of GZWDer (flood); many of the words are very common), but due to the quality of the sources we have, I proposed the Lexeme Mix'n'Match import approach. There are several kinds of online resources: (i) databases like WordData, WordNet and Flexion (Q101183911) (from which I also imported a part), which contain a large number of invalid and duplicated entries (and for German, we still need to discuss the proper way to handle all inflection forms); (ii) online dictionaries, which may be more reliable than (i), but their senses are copyrighted (so we may use a Mix'n'Match-like approach to manually match them); (iii) older text dictionaries, which may be in the public domain but either do not provide part of speech at all or only allow it to be extracted through some complicated process; and (iv) word lists that provide nothing but words (and may contain plenty of non-lemma forms).

For authors of articles, there are some sources, but either: (1) they conflate many people into one (Semantic Scholar); (2) they contain multiple profiles for one person (Microsoft Academic); or (3) they do not allow data mining per their terms of use (MathSciNet, Scopus). The nearest thing is the ORCID API, which Magnus Manske's bot (currently inactive) was working from.

So9q (talkcontribs)

I like the idea of a Mix'n'Match approach. I read up on it yesterday and I'm now using the user script, which is very user-friendly. 😃 This is probably the best tool Manske ever wrote. QS is also good but seems to be almost abandoned, and that detracts a lot from its value IMO. QS also needs training on the user's part and is not very intuitive IMO.

Reply to "Wikidata:Requests for permissions/Bot/GZWDer (flood) 6"
Re: topic:Vmaxvvzztj2kun8c

Arlo Barnes (talkcontribs)

Please find a good place on Wikidata to summarise progress made, for those who don't have a phone number to use Telegram.

Reply to "Re: topic:Vmaxvvzztj2kun8c"
Wikidata_talk:Lexicographical_data#Tools_idea:_Lexeme_Mix'n'Match

Mateusz Konieczny (talkcontribs)
GZWDer (talkcontribs)

This new idea needs some input: in the future, data should be imported into a new system instead of directly into Wikidata, so that invalid words can be filtered out in advance.

Reply to "Wikidata_talk:Lexicographical_data#Tools_idea:_Lexeme_Mix'n'Match"