How do you decide which articles are notable? I saw the User:DaxServer/crossref-a6091a2f6a1 file, but it does not tell me much.
User talk:DaxServer
@Egon Willighagen I thought they're all notable given they're scientific publications. Are they not?
No, that's certainly not enough. Because of the scalability issues, the consensus is not to mass-import scientific articles. Only clearly notable articles can be added, for example: references that support an existing statement, and perhaps articles with a 500+ Altmetric.com score, since those are clearly the topic of very active discussion. And I guess articles that are cited in one of the Wikipedias. Articles with more than 200 citations are likely notable. Or articles by a famous scholar. Most articles are not notable.
Thanks for the notes, I'll import those with the 500+ Altmetrics score and 200+ citations
Please also see this discussion: Wikidata:Project chat#Mass-import policy. Wikidata has a serious scalability issue and everyone is being careful not to break the system.
Thanks for the link. I'll follow the discussion along!
I also think that all studies should, and need to, be imported to enable all sorts of useful things that are not possible before that is done, such as the statistics in Scholia. Please see this discussion and the discussion linked there.
I think any limitation would need to be temporary until scalability issues are solved.
Thanks for importing all these studies, by the way. I also think the Altmetric minimum should be lowered to 300+. I went through nearly all studies with an Altmetric score up to that threshold monthly for many months (most recently only quarterly), and many of them are very notable. Studies with an Altmetric score between 300 and 500/700 may need some additional metric to filter out the less notable ones, depending on what other data you can query (if there's nothing better, one could use the citation count). Moreover: is the Altmetric score imported as well? That would be very useful metadata to have here, and I suggested open-source improved altmetrics scores here. I'll also follow that discussion along.
Personally, I also do want to see all of Crossref imported once the scalability issues are solved.
@GZWDer Well, then I think Wikidata for scholarly papers, as well as Scholia, is a lost cause, and it would make no sense to spend any time importing and maintaining studies in WD, as it would be wasted time with no potential to ever be useful. Could you please explain why you think so? I made several points here and especially in the linked discussions, such as that the charts about e.g. studies on a certain subject or studies by certain authors would be/remain false and misleading.
Even if citation count is not everything, and for Scholia it will almost always be biased, to get a relatively accurate estimate we need to import more than just the influential articles, since we have no way to express that an article is cited if the citing article is not in Wikidata.
Thanks for clarifying, I very much agree with you and I'm sorry I misread your previous comment as "do not want to see" (I thought I double-checked but apparently didn't).
@Egon Willighagen Do you know if Altmetric provides some sort of public dumps/data? I was not able to find one.
Maybe you can extract the "Top 100"s from here: https://www.altmetric.com/top100/home/
And I guess this was their follow up: https://www.altmetric.com/blog/varied-layers-of-attention-the-altmetric-500/
I don't think so. They are a company, and the data is their IP. They have an API: https://www.altmetric.com/solutions/altmetric-api/ They have badges, but it seems the documentation is hidden (at best): https://www.altmetric.com/solutions/altmetric-badges/
No, the documentation is still available: https://badge-docs.altmetric.com/
I suggested adding the Altmetric badges to Scholia. I don't know how ScienceOpen shows them, but there all the studies have the Altmetric metadata, and one can sort and filter by it, which is very useful (I don't think getting this data indirectly through that site would make sense, but maybe that's something to consider). As suggested there, I think a Wikimedia-owned-and-developed altmetrics score would be great for many reasons. The Altmetric score (Q130219307) is currently the most used one. I guess one could simply run API queries that fetch the Altmetric data and bulk-update the respective study items. Alternatively, one could not import the scores into Wikidata at all and implement this only in Scholia, but the former would be better.
In that case, I'll resume my work with the articles with 200+ citations.
I think some discussion about the number of citations would be due then. 200+ seems extremely high – that is like one study per month, and one of the top 20 papers of the year by Altmetric score. Look at these examples, which got ~100 k reads (10 k is significant) and were in the top Altmetric scores of Q1 2024: 1 2 3 (they got 2, 23, and 12 citations). Not even the study on potential life on Venus from a few years ago has reached 200 citations by now, and that is probably one of the top 10 studies of the decade, or something on that order.
Moreover, I'd suggest you or somebody else import at least the top 2 k studies per month sorted by Altmetric score. If you can't use the Altmetric API for that, then please use the ScienceOpen site; they may also have an API. This would at least fix the issue that most of even the most notable studies are missing here, even though there are millions of scholarly paper items in WD. I created some items manually but only added a DOI, thinking a bot would populate the remainder, but so far none did. Example: Q126363497
Is there any method with which you're choosing the articles from Crossref to add items for?
@Mahir256 Yes, I've added the filters here User:DaxServer/crossref-a6091a2f6a1 that I've used for the latest batch.
Can you please also add a filter to not add retracted articles like Q126581861?
@Egon Willighagen Noted.
Do you have a timeline? I found several more retracted articles being added in your process.
Egon, could you post some examples? Perhaps the filters I put in to remove the retracted articles aren't enough. Thanks!
Yeah, I was thinking about a SPARQL query to list them all, but I am still not sure how to write one that includes the item history and does things like: "Match all retracted articles created by account X" :(
Never mind, I found them. I'll remove them from my lists. If you observe any more hereafter, please let me know. Thanks!
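For anyone hitting the same problem later: SPARQL alone cannot see item history, so the "created by account X" part needs a separate MediaWiki API lookup per item. A minimal sketch of that two-step approach (the retracted-paper QID Q45182324 and the exact API parameters are assumptions worth double-checking):

```python
import urllib.parse

# Step 1: a SPARQL query listing retracted articles.
# Q45182324 ("retracted paper") is an assumed QID - please verify.
RETRACTED_QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q45182324 .  # instance of: retracted paper
}
LIMIT 100
"""

def first_revision_url(qid: str) -> str:
    """Step 2: build a MediaWiki API URL that returns the oldest revision
    of an item; its 'user' field is the account that created the item."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": qid,
        "rvdir": "newer",   # oldest revision first
        "rvlimit": "1",
        "rvprop": "user|timestamp",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)
```

One would run the SPARQL query first, then call the API URL for each returned QID and keep only the items whose first-revision user matches the bot account.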
Please don't try to shoehorn MET departments into collection (P195). That's not how the property is defined here. Please remove them.
Thanks for the info @Multichill (A). I'll update them.
Thank you. I understand the wish to track which department of a (large) museum is responsible for an object. Maybe maintained by (P126) is a good option for that? Not sure. It would need a broader discussion. Wikidata talk:WikiProject Visual arts is a suitable place to discuss that. Maybe you can start a topic?
Did you discover Wikidata:WikiProject Visual arts/Item structure? It's the best overview we have of how to model artworks. For the example of The Abolitionists in the Park (Q116445658): painting at the Metropolitan Museum of Art (MET, 2022.259):
- location (P276) is missing
- inventory number (P217) is missing (should be added as a statement, adding it as a qualifier to collection is optional)
- attribution text (P8264) is for license information on Commons. It shouldn't be used here at all. Provenance should be added as structured data
I expanded The Abolitionists in the Park (Q116445658): painting at the Metropolitan Museum of Art (MET, 2022.259) a bit based on the source.
Thanks for the link to the Item structure. I've been looking for one here and there around the MET and GLAM pages but wasn't able to discover it. This will be of great help! I'll start a discussion at the talk page regarding the maintained by.
I imported all the MET paintings some time ago. The last time I did it, it was based on the API ( https://collectionapi.metmuseum.org/public/collection/v1/objects/875741 for example). I will run it again. It's a pretty extensive framework that adds all this data.
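For reference, a record from that API can be reduced to the Wikidata-relevant fields along these lines. This is a sketch: the field names follow the public MET collection API, but the sample values below are illustrative placeholders, not a real API response.

```python
def summarize_met_object(obj: dict) -> dict:
    """Pick the fields most relevant for a Wikidata item out of a
    MET collection API record."""
    return {
        "inventory_number": obj.get("accessionNumber"),  # -> P217
        "title": obj.get("title"),
        "department": obj.get("department"),             # curatorial dept.
        "accession_year": obj.get("accessionYear"),
    }

# Illustrative sample using the same field names the API returns:
sample = {
    "objectID": 875741,
    "accessionNumber": "2022.259",
    "title": "The Abolitionists in the Park",
    "department": "Modern and Contemporary Art",  # placeholder value
    "accessionYear": "2022",
}
```

Missing fields come back as `None`, which is convenient when deciding whether a statement can be added at all.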
That's a relief. Please let me know once you're done.
It's running now for a while creating new ones like The Last Civil War Veteran (Q126099913) and updating existing items.
Looks like User:Fuzheado created a bunch of items (like Two Trees (Q116445196)) without inventory number (P217). That is causing duplicate items.
You have the MET data set up in OpenRefine, right? Can you add missing inventory numbers? Quite a few are missing it. Query might need a bit of filtering for weird cases like Johann Balthasar Probst (Q18508222).
Going through the query, it looks like User:Fuzheado batch-created a lot of very incomplete items like Dog (Q116727308). Not sure what the idea was with these and why more data (like the inventory numbers) wasn't added.
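The query mentioned above could look roughly like this. A sketch only: Q160236 for the Metropolitan Museum of Art is an assumption, and the "weird cases" filtering mentioned earlier is not included.

```python
import urllib.parse

# MET-collection items that have no inventory number (P217).
# Q160236 (Metropolitan Museum of Art) is an assumed QID.
MISSING_P217 = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P195 wd:Q160236 .              # collection: MET
  FILTER NOT EXISTS { ?item wdt:P217 ?inv }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def endpoint_url(query: str) -> str:
    """GET URL for the Wikidata Query Service, JSON results."""
    return ("https://query.wikidata.org/sparql?format=json&query="
            + urllib.parse.quote(query))
```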
Ya, I have the data. I can add them.
That would be nice. Maybe also add the year it entered the collection, like this?
It's running for ~19K items. I'll upload the rest tomorrow.
Most of them are done, except a few that need manual intervention. Other than those, I think you can run your bot to fill in the data.
I also replied at Wikidata_talk:WikiProject_Visual_arts#Collecting_info_about_the_curatorial_departments_in_art_Items: for the MET let's do a pilot with maintained by (P126) so that we no longer have it in collection (P195). Can you do that?
Thanks for cleaning up my mess. May I know how you are locating these duplicates, so that I can also look?
Also, I maintain some Indian geospatial datasets, and I added a locator plugin for them which you might find useful. User:RamSeraph/IndianopenmapsLocater.js is the plugin, and this is a sample locator page.
Hey @RamSeraph, I was matching the subdistricts on OSM with Wikidata and Wiki tags. So I downloaded them ( https://overpass-turbo.eu/s/1NYY ) into OpenRefine and reconciled the OSM names. For those that didn't match, I had to search for them manually in the respective local wikis and EN wiki.
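The reconciliation step described above can be sketched as a naive exact-name match. This is purely illustrative (the names and QID below are placeholders); OpenRefine's reconciliation is fuzzier and scores candidates rather than requiring exact matches.

```python
def match_by_name(osm_names, wd_labels):
    """Case-insensitive exact matching of OSM names against Wikidata
    labels; unmatched names are returned for manual review."""
    index = {label.strip().lower(): qid for qid, label in wd_labels.items()}
    matched, unmatched = {}, []
    for name in osm_names:
        qid = index.get(name.strip().lower())
        if qid:
            matched[name] = qid
        else:
            unmatched.append(name)
    return matched, unmatched

# Hypothetical example data:
osm = ["Addateegala", "Unknownpuram"]
labels = {"Q12345678": "Addateegala"}  # placeholder QID
```

Anything landing in the `unmatched` list is what then has to be hunted down by hand in the local wikis.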
Thanks for pointing me to your script. Your repo looks pretty good. I'll explore it further!
I see. I haven't done much manual checking.
BTW, I do have a slightly older version of the wiki–OSM subdistrict mapping if you are interested. This was done around half a year ago. https://gist.github.com/ramSeraph/4aa69bd42838a81cc5224c75af584b59
I was planning to update OSM with a script but never got around to it, as it was not clear to me whether it would be considered an automated edit and require going through the whole process of doing a writeup about it.
This actually helps. I'll check it out tomorrow. Thanks ;)
@DaxServer, I saw your edits linking OSM relation IDs in Wikidata ( example). Please check Wikidata:OpenStreetMap#Linking from Wikidata to OSM, where it is suggested not to link them, as OSM IDs are not stable.
Bummer
Hello, this does not look like a sensible addition to me https://www.wikidata.org/w/index.php?title=Q1073621&diff=2182236210&oldid=2147821545
Thank you
Hi @Maculosae tegmine lyncis Thanks for the correction!
@DaxServer, I noticed that you are updating mandals as per the new districts. I have the data collected from OSM, which will ensure an error-free update through QuickStatements. If you are doing it manually, can you stop, as there is scope for errors?
Absolutely! Please import them and ping me :)
@Arjunaraoc Do you also want me to stop changing the descriptions, like this one?
Yes. I will take care of that too.
Okay, thanks
I have completed the update. Can you do a random check and confirm the quality of the update? Thanks
I only looked at the mandal counts query from your page. Would you be able to add the end date and start date for the mandals? Or should it be done manually? If so, I could do that soon-ish
I have already updated all the affected mandals (e.g. Addateegala mandal). As you seem to have created new QIDs for mandals on enwiki, it is better to merge them with the corresponding Telugu mandal QIDs, as those QIDs are already being used in OSM. If my response does not address the issue, share an example with the change required.
Ahh, you have updated the mandals. I was looking at the district QID and wondering. Like Srikakulam district: the mandals in P131 need qualifiers, for those that changed. I'll try to look at the mandals randomly and check the import.
Re the merged ones, did you notice something that I didn't merge?
Thanks for correcting the Addateegala mandal start date. OK, I will work on end dates and start dates for the P150 property of districts. I thought I had handled that case, but it looks like I did not.
I did not handle the old districts of changed mandals; I will do that.
I have completed updating the mandals on the old district pages with their end dates. Can you check and let me know if I missed any other updates?
You're quite fast! Thanks for your good work :)
Thanks for your appreciation. It's nice to have a co-working person like you on Wikidata. I was introduced to Wikidata four years back, but only recently got a better hang of it. I am trying to clean up all the issues inadvertently introduced by other contributors as they tried to contribute.
@DaxServer, nice to see that you are actively editing Wikidata regarding the AP district changes. I have seen that you have added revenue divisions for some mandals in Tirupati district. As per Wikidata guidance, P131 should point to the lowest administrative unit containing the entity. If we follow that, we have to use only revenue divisions for P131 on mandals, and then use districts as P131 for the revenue division entities. Though revenue divisions are an intermediate layer, their use for the public is mostly related to escalation of land issues at the mandal level; mandals are more prominent in general use. I have also experimentally added revenue divisions for Prakasam district in the past. I am of the view that we should not bother with revenue divisions in Wikidata. Please share your view. I am deferring further edits till we reach consensus on this. BTW, I tried to see how Karnataka districts are represented. They are using State - Revenue Division - District - Taluk, whereas the actual hierarchy is State - Revenue Division - Sub-Division - District - Taluk. Taluk entities do not have P131 at this time. Having too many layers makes the usage of Wikidata through queries more complex. If you take census data, they consider districts and subdistricts as the primary hierarchy and ignore the revenue division in their documents, as far as I know.
@Arjunaraoc Nice to see you too! As the revenue divisions are not really used outside of land issues, as you said, the District -> Mandal -> Village hierarchy seems very natural. Although it would be really useful to also map the relation between revenue divisions and other entities somewhere; I have no idea where. I might not be the best person to talk to about Wikidata, but there is a Telegram group for India Wikidata which could be useful, if you are not already part of it. You could ask @Planemad if you'd like to join. I think you are already part of the OSM IN group, aren't you? I remember seeing your name.
Either way, do you think district -> mandal -> village is the better representation for P131?
@Arjunaraoc You might want to post in Wikidata talk:WikiProject India, as others might also want to chip in if they are interested. (P.S. If you need my input, please always ping me, as I'm not watching Wikidata changes.)
Thanks for your prompt reply. I agree that district-mandal-village is certainly better. So far, several editors from the Telugu states have experimented with Wikidata, sometimes adding data with validation problems. I think, at least for this AP districts restructuring update, we two can form the core and implement the agreed hierarchy. This requires removing the revenue division properties that were added on a trial basis by you and me. I will go ahead with the work, as I have collected the changes in mandal data of districts from OSM, which I can confirm is accurate for the erstwhile 670 mandals. I will consider joining the Wikidata Telegram group in due course.
OK, I will post in the project space so that others are aware.
@Arjunaraoc Thanks. One thing that would be important is to retain the past districts using the start time, end time, and preferred rank qualifiers, if you are not using them already.
Yes, I have added end times for all the mandals that are being transferred. Rank qualifiers cannot be set automatically using QuickStatements. At Telugu Wikipedia, we started using a Wikidata-based infobox for mandals and villages in Prakasam district. I am working on expanding it to all mandals, as otherwise the work is not sustainable and quality cannot be assured, due to the small and decreasing set of Wikipedians.
@Arjunaraoc I think you can set qualifiers: Help:QuickStatements#Add statement with qualifiers
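For what it's worth, QuickStatements qualifiers are just extra tab-separated property/value pairs on the same line, so the start/end dates can be generated mechanically. A sketch (the QIDs below are placeholders; the exact syntax should be checked against the help page linked above):

```python
def qs_p131_line(item, parent, start=None, end=None):
    """One QuickStatements line: located in (P131) with optional
    start time (P580) / end time (P582) qualifiers.
    Dates use the QS time format +YYYY-MM-DDT00:00:00Z/11 (day precision)."""
    parts = [item, "P131", parent]
    if start:
        parts += ["P580", f"+{start}T00:00:00Z/11"]
    if end:
        parts += ["P582", f"+{end}T00:00:00Z/11"]
    return "\t".join(parts)

# Placeholder QIDs for a mandal moving into a new district:
line = qs_p131_line("Q1234", "Q5678", start="2022-04-04")
```

One such line per changed mandal would give both the new P131 value and its start-time qualifier in a single batch.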
It's quite cool to see more integration of Wikidata into Wikipedia projects. Hope to see more!! If you think I could be of help somewhere (except writing content), let me know :)
It is true that maintenance is indeed a burden on the rest of us who are active. I still haven't figured out how to ensure the quality of Wikidata and revert errors or vandalism. The Recent Changes feed has an option to monitor them, but I haven't found it to my liking. What is your current process for monitoring Wikidata-related things?
As per the QuickStatements help, rank cannot be set or changed. I have seen some bots doing the rank setting based on the latest datasets.
I mainly focus on the AP mandals data. I check it once every three months or so, with queries for counts in each district. I also track relevant Wikidata changes in my tewiki recent changes, with the Wikidata changes option enabled.
I am working on adding coordinate info to AP mandals, using the data in tewiki article pages. I have reached a count of 345. For the rest, I need to use the mandal HQ pages, if they have coordinates. Can you work on populating the new districts' information, except the constituent mandals, which I will add based on wiki articles?
Ahh, "rank" qualifiers! I didn't see that. Would you be able to share those queries?
I'll work my way through the new districts slowly
@DaxServer, You can see my queries on my userpage. For counts of mandals, see the section labeled counts.