Topic on User talk:Jsamwrites

So9q (talkcontribs)
Jsamwrites (talkcontribs)

@So9q Thanks. I will test it.

So9q (talkcontribs)
So9q (talkcontribs)
So9q (talkcontribs)
Jsamwrites (talkcontribs)
So9q (talkcontribs)
Jsamwrites (talkcontribs)

@So9q Thanks. I will use this pre-release version.

So9q (talkcontribs)

New pre-release out with new features :) WDYT?

Jsamwrites (talkcontribs)

@So9q I see a lot of interesting features in this version. Great job!! But there are a couple of issues.

  1. I think DEBUG mode is switched on by default; I see a lot of messages on the screen.
  2. I want to try the jobs on Toolforge, but it's not very clear how to run prepared jobs on Toolforge/PAWS.
  3. I tried the -l option, and it doesn't work anymore. I think it has been replaced by the -a option, but with -a I have to press Enter for every new scholarly article. I am not sure whether you are introducing this as a feature to replace the previous batch option.
Jsamwrites (talkcontribs)

@So9q Updated the above reply. I think I checked out the wrong branch. Things are working fine. I struck out some of my previous comments.

So9q (talkcontribs)
Jsamwrites (talkcontribs)

@So9q Thanks for this tutorial.

After connecting via ssh, I ran the following command (assuming that itemsubjector already exists) and got the following error:


$ become itemsubjector

You are not a member of the group tools.itemsubjector.

Any existing member of the tool's group can add you to that.

So9q (talkcontribs)

Yeah, you have to create your own tool in the web interface, name it whatever you like, e.g. "itemsubjector-jsam", and use that.

I updated the guide with links.

Jsamwrites (talkcontribs)

@So9q Thanks for the updated guide. I am now able to ssh and run.

So9q (talkcontribs)

Great! I see you are editing a lot. Counting all your edits for September and October until now, you have 1M edits! If all of them are main subject edits, then you have single-handedly taken Wikidata from 14M to 15M total main subjects on the 37M articles. Wow!

See https://qlever.cs.uni-freiburg.de/wikidata/zZAhrs which times out in Blazegraph.

Jsamwrites (talkcontribs)
So9q (talkcontribs)

According to this query we had 27M articles without any main subject before. I'm curious to see how many it is after our effort. At best, it is 1M less, so still 26M to go! :D

So9q (talkcontribs)

We are now almost down to 26M lacking P921! Nice work. 25M is our next target. How many weeks do you think it will take? 3?

Jsamwrites (talkcontribs)
So9q (talkcontribs)

That is a different measurement 😃 My search lists all articles without any P921.
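
For reference, a count of "articles without any P921" can be phrased roughly as in the sketch below. It is not the query behind the short link earlier in the thread: the function name is made up for illustration, and it assumes that articles are modeled as instance of (P31) scholarly article (Q13442814). It only builds the query text, since a count this broad may well time out on the public endpoint.

```python
def build_missing_main_subject_query(article_class: str = "Q13442814") -> str:
    """Return a SPARQL query counting articles that have no P921 (main subject).

    Assumption: articles are identified by P31 (instance of) pointing at
    `article_class`, which defaults to Q13442814 (scholarly article).
    """
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(?article) AS ?count) WHERE {{
  ?article wdt:P31 wd:{article_class} .
  FILTER NOT EXISTS {{ ?article wdt:P921 ?subject . }}
}}
""".strip()


if __name__ == "__main__":
    # Print the query so it can be pasted into a SPARQL endpoint of your choice.
    print(build_missing_main_subject_query())
```

Counting articles with at least one P921 instead would be a different measurement, which is the distinction made above.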

So9q (talkcontribs)

How many reverts have you got per 100,000 edits? I got a few when I matched Canada, so I stopped matching countries.

Jsamwrites (talkcontribs)

@So9q I recall a couple of them from some weeks ago. The first one was 'Systemic therapy', which was ambiguous since it occurs in two different fields: psychology and cancer therapy.

  1. Q108744083 (Newly created one)
  2. Q1929812

One possible improvement for ItemSubjector:

show a warning when an item uses the property P1889 (different from), i.e., when there are two items with the same label.


The second one was alcoholism vs. alcoholism treatment. I think that I added the former to an item which already had the latter. So I am now careful with single-word labels.
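
The P1889 suggestion above could look roughly like the sketch below. This is not how ItemSubjector does it; the function names are hypothetical, and it assumes the item is available as a dict in the standard Wikidata entity JSON layout (claims keyed by property ID, each with a mainsnak).

```python
import warnings


def different_from_targets(entity: dict) -> list[str]:
    """Return the QIDs that an entity links to via P1889 (different from)."""
    targets = []
    for claim in entity.get("claims", {}).get("P1889", []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue:  # "no value" / "unknown value" snaks carry no datavalue
            targets.append(datavalue["value"]["id"])
    return targets


def warn_if_ambiguous(qid: str, entity: dict) -> bool:
    """Warn when the subject has P1889 links; return True if a warning was issued."""
    targets = different_from_targets(entity)
    if targets:
        warnings.warn(
            f"{qid} is marked 'different from' (P1889) {', '.join(targets)}; "
            "check the label before matching."
        )
        return True
    return False


if __name__ == "__main__":
    # Sample data shaped like Wikidata JSON; the link between these two
    # items is assumed here purely for illustration.
    sample = {
        "claims": {
            "P1889": [
                {"mainsnak": {"datavalue": {"value": {"id": "Q108744083"}}}}
            ]
        }
    }
    warn_if_ambiguous("Q1929812", sample)
```

A check like this would only catch items where somebody has already added P1889, so it would complement, not replace, being careful with single-word labels.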

So9q (talkcontribs)
So9q (talkcontribs)

Fixed and working on master :)

Jsamwrites (talkcontribs)
So9q (talkcontribs)

Thanks for creating that and for inviting me. I feel honored. I signed up for WikidataCon and I will do my very best to attend. Would you like me to prepare something?


I just created a new query that might interest you/the participants: Wikidata:SPARQL query service/queries/examples#Galaxies ordered by the ones that are most linked from scientific articles

Also this query now times out on Blazegraph. A similar query works on QLever on older data from a few months back. I'm planning to set up a QLever instance in Toolforge in the near future.


Also: Did you try the new feature I pushed on master yesterday? :D

Jsamwrites (talkcontribs)

@So9q I tested the newest feature. It will be quite useful for certain subjects. I may need to test it more.

I did not yet test the Issue 26 fix.


Thanks for your reply concerning WikidataCon. I feel that the participants may have questions about ItemSubjector. If possible, you could present some of its key features.

So9q (talkcontribs)

Alright, I'll try to prepare a little demo video of the tool in action so they get a feeling for the interface.


People might want a QS-like front end, which could be a fun project.


I personally would like to make multiple batches and not have to wait for one to end before the next can be started.

So9q (talkcontribs)

I pushed QuickStatements export support to master (first step towards a WebUI) :) I did not find time to prepare a video, unfortunately.

If I don't make it into the meeting, please ask the participants whether they would like a Web UI for the tool that integrates with QS. See https://github.com/dpriskorn/ItemSubjector/issues/29

There has been very little traffic in the Git repo so far, so I wonder if a WebUI is worth the effort.

Jsamwrites (talkcontribs)
Jsamwrites (talkcontribs)
Jsamwrites (talkcontribs)

@So9q I tested the QS export option. It's working well. Great option. Thanks.

One observation: the generated CSV file is missing 'inferred from' (maybe this is intended).

So9q (talkcontribs)

Oh, that's a bug. I'll open an issue.

So9q (talkcontribs)

We are almost down to 25M articles missing P921 now. :) I'm working on medicine and there are thousands of subjects to cover still...

Jsamwrites (talkcontribs)

@So9q Yes, 25.16 M :)

Jsamwrites (talkcontribs)

@So9q < 25M articles now :)

So9q (talkcontribs)
Jsamwrites (talkcontribs)

@So9q Thanks for the update. I will take a look. Scholia statistics show that 20 M "main subject" values are now available.

So9q (talkcontribs)

Do you know how many P921 statements there were before I made the tool?

So9q (talkcontribs)

I have been working on Q35456 lately and will work on all of the medicines for the foreseeable future. Using --sparql with --limit in the newest version makes the tool way better IMO, because it first fetches (while I walk the dogs), and then I can approve/disapprove in one go at the end, start the job in k8s, and leave it running until it finishes.

My biggest k8s job so far has been around 30k items, running for 8 hours or more. Now I'm thinking that jobs of 100k items with a runtime of 24h might be easy to achieve.
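
As a rough sanity check of those numbers, here is a small back-of-the-envelope sketch. The helper names are made up, and it assumes throughput stays constant between jobs, which may not hold.

```python
def items_per_hour(items: int, hours: float) -> float:
    """Average throughput of a finished job."""
    return items / hours


def estimated_hours(items: int, rate_per_hour: float) -> float:
    """Runtime of a job of `items` items at a constant hourly rate."""
    return items / rate_per_hour


# Figures from the message above: about 30k items in 8 hours (or more).
rate = items_per_hour(30_000, 8)                  # 3750.0 items per hour
hours_for_100k = estimated_hours(100_000, rate)   # roughly 26.7 hours

if __name__ == "__main__":
    print(f"{rate:.0f} items/hour, {hours_for_100k:.1f} hours for 100k items")
```

Since the 30k job took "8 hours or more", 3,750 items per hour is an upper bound on the observed rate, so a 100k job would land a little above 24h unless the newer version runs faster.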

Jsamwrites (talkcontribs)

@So9q Based on what I remember, I think the number was somewhere between 15M and 16M (Scholia statistics). I may have to dig through my tweets. I shared some screen captures in the beginning. Check this. It shows the difference: from around 15M to 17M (October 2021). I think that the majority of them come from ItemSubjector.


Documenting essential medicines (Q35456) will be useful. Wow, 100k jobs will be great.

So9q (talkcontribs)

I just started an 88k batch. Easy, it took a few minutes to run all the queries and a few minutes to review. 😀

Jsamwrites (talkcontribs)