User talk:DaxServer


What (and how many) articles is your code importing?

Egon Willighagen (talkcontribs)
DaxServer (talkcontribs)

@Egon Willighagen I thought they're all notable given they're scientific publications. Are they not?

Egon Willighagen (talkcontribs)

No, that's certainly not enough. Because of the scalability issues, the consensus is to not mass-import scientific articles. The only clearly notable articles that can be added are things like: references to support an existing statement, and perhaps an article getting a 500+ Altmetrics.com score, so it's clearly the topic of very active discussion. And I guess articles that are cited in one of the Wikipedias. Articles with more than 200 citations are likely notable. Or articles by a famous scholar. Most articles are not notable.

DaxServer (talkcontribs)

Thanks for the notes, I'll import those with a 500+ Altmetrics score and those with 200+ citations

Egon Willighagen (talkcontribs)
DaxServer (talkcontribs)

Thanks for the link. I'll follow the discussion along!

Prototyperspective (talkcontribs)

I also think that all studies should/need to be imported, to enable all sorts of useful things that are not possible before that is done, such as the statistics of Scholia. Please see this discussion and the discussion linked there.

I think any limitation would need to be temporary until scalability issues are solved.

Thanks for importing all these studies, by the way. I also think the altmetrics minimum should be lowered to 300+. I went through nearly all studies with an altmetrics score up to that, monthly for many months (most recently only quarterly), and many of these are very notable. Studies with an altmetric score between 300 and 500/700 may need some additional metric to filter out the less notable ones, depending on what other data you can query (if there's nothing better, one could use the citation count). Moreover: is the altmetrics score imported as well? That would be very useful metadata to have here; I suggested open-source improved altmetrics scores here. I'll also follow that discussion along.

GZWDer (talkcontribs)

Personally, I also do want to see all of Crossref imported, if we solve the scalability issues.

Prototyperspective (talkcontribs)

@GZWDer Well, then I think Wikidata for scholarly papers, as well as Scholia, is a lost cause, and it would make no sense to spend any time importing and maintaining studies in WD, as it's wasted time with no potential to ever be useful. Could you please explain why you think so? I made several points here, and especially in the linked discussions, such as that the charts about, e.g., studies on a certain subject or studies by some authors would be/remain false and misleading.

GZWDer (talkcontribs)

Even if citation count is not everything, and for Scholia it will almost always be biased, to get a relatively accurate estimate we need to import more than just the influential articles, since we have no way to express that an article is cited if the citing article is not in Wikidata.

Prototyperspective (talkcontribs)

Thanks for clarifying. I very much agree with you, and I'm sorry I misread your previous comment as "do not want to see" (I thought I double-checked, but apparently didn't).

DaxServer (talkcontribs)

@Egon Willighagen Do you know if Altmetrics provides some sort of public dumps/data? I was not able to find any

Egon Willighagen (talkcontribs)
Prototyperspective (talkcontribs)

I suggested adding the altmetrics badges to Scholia. I don't know how ScienceOpen shows them, but there all the studies have the altmetrics metadata, and one can sort and filter by it, which is very useful (I don't think getting this data indirectly through that site would make sense, but maybe that's something to consider). As suggested there, I think a Wikimedia-owned-and-developed altmetrics score would be great for many reasons. The Altmetrics score (Q130219307) is currently the most used. I guess one could just run API queries that fetch the altmetrics data and bulk-update the respective study items. Alternatively, one could not import the scores into Wikidata at all and implement this only in Scholia, but the former would be better.
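For what it's worth, a minimal sketch of the kind of API query this would involve, assuming the free Altmetric details endpoint (https://api.altmetric.com/v1/doi/<doi>); the DOI below is a placeholder, and the 300+ threshold is the one proposed in this thread:

```python
import requests

def altmetric_score(doi: str) -> float | None:
    """Fetch the Altmetric attention score for a DOI via the free
    details endpoint; it returns HTTP 404 when no record exists."""
    resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=30)
    if resp.status_code == 404:
        return None  # no Altmetric record for this DOI
    resp.raise_for_status()
    return resp.json().get("score")

# Example: check a paper against the 300+ threshold proposed above.
score = altmetric_score("10.1234/example-doi")  # placeholder: replace with a real DOI
if score is not None and score >= 300:
    print(f"Candidate for import / score update: {score}")
```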

DaxServer (talkcontribs)

In that case, I'll resume my work with the articles with 200+ citations.

Prototyperspective (talkcontribs)

I think some discussion about the number of citations would be due then. 200+ seems extremely high – that is like one study per month, and one of the top 20 papers of the year by altmetrics. Look at these examples, which got ~100 k reads (10 k is significant) and were in the top altmetrics of Q1 2024: 1 2 3 (they got 2, 23, and 12 citations). Not even the study on potential life on Venus from years ago has reached 200 citations by now, and that is probably one of the top 10 studies of the decade, or something on that order.

Moreover, I'd suggest you or somebody else import the top 2 k (at least) studies per month, sorted by altmetrics. If you can't use the altmetrics API for that, then please use the ScienceOpen site; they may also have an API. This would at least fix the issue that most of even the most notable studies are missing here, despite there being millions of scholarly-paper items in WD. I created some items manually but only added a DOI, thinking a bot would populate the remainder, but so far none has. Example: Q126363497

Reply to "What (and how many) articles is your code importing?"
Mahir256 (talkcontribs)

Is there any method by which you're choosing which articles from Crossref to add items for?

DaxServer (talkcontribs)
Egon Willighagen (talkcontribs)

Can you please also add a filter to not add retracted articles like Q126581861?

DaxServer (talkcontribs)
Egon Willighagen (talkcontribs)

Do you have a timeline? I found several more retracted articles being added in your process.

DaxServer (talkcontribs)

Egon, could you post some examples? Perhaps the filters I put in place to remove the retracted articles aren't enough. Thanks!

Egon Willighagen (talkcontribs)

Yeah, I was thinking about a SPARQL query to list them all, but I am still not sure how to write one that includes the item history and does things like: "Match all retracted articles created by account X" :(
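For what it's worth, one workable pattern is to split the job: list retracted articles with a plain SPARQL query, then ask the MediaWiki API for each item's first revision to get the creating account (WDQS itself can't see history). A sketch, assuming retracted papers are tagged with P31 = Q45182324 ("retracted paper"); that modelling and the account name are the parts to verify:

```python
import requests

WDQS = "https://query.wikidata.org/sparql"
API = "https://www.wikidata.org/w/api.php"

# Assumption: retracted articles carry P31 = Q45182324 ("retracted paper").
QUERY = "SELECT ?item WHERE { ?item wdt:P31 wd:Q45182324 } LIMIT 200"

def item_creator(qid: str) -> str:
    """Return the username behind an item's first revision."""
    params = {
        "action": "query", "format": "json", "prop": "revisions",
        "titles": qid, "rvdir": "newer", "rvlimit": 1, "rvprop": "user",
    }
    pages = requests.get(API, params=params, timeout=30).json()["query"]["pages"]
    return next(iter(pages.values()))["revisions"][0]["user"]

rows = requests.get(
    WDQS, params={"query": QUERY, "format": "json"}, timeout=60
).json()["results"]["bindings"]

for row in rows:
    qid = row["item"]["value"].rsplit("/", 1)[-1]
    if item_creator(qid) == "DaxServer":  # the "account X" of this thread
        print(qid)
```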

DaxServer (talkcontribs)

Never mind, I found them. I'll remove them from my lists. If you observe any more hereafter, please let me know. Thanks!

Reply to "New scientific articles"
Multichill (talkcontribs)
DaxServer (talkcontribs)
Multichill (talkcontribs)

Thank you. I understand the wish to track which department of a (large) museum is responsible for an object. Maybe maintained by (P126) is a good option for that? Not sure. It would need a broader discussion. Wikidata talk:WikiProject Visual arts is a suitable place to discuss that. Maybe you can start a topic?

Did you discover Wikidata:WikiProject Visual arts/Item structure? It's the best overview we have on how to model artworks. For an example, see The Abolitionists in the Park (Q116445658): a painting at the Metropolitan Museum of Art (MET, 2022.259).
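The statement table from that example didn't survive here, but from the details above the core of the model reduces to something like this (a sketch using the usual artwork properties; the full recommended structure is on the linked page):

```python
# Core statements for an artwork item like Q116445658, per the details above.
# P31 = instance of, P195 = collection, P217 = inventory number (which is
# conventionally qualified with the collection it applies to).
core_statements = {
    "P31": "Q3305213",              # painting
    "P195": "Q160236",              # Metropolitan Museum of Art
    "P217": ("2022.259", {"P195": "Q160236"}),  # accession number + qualifier
}
```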

Multichill (talkcontribs)
DaxServer (talkcontribs)

Thanks for the link to the Item structure. I've been looking for one here and there around the MET and GLAM pages but wasn't able to discover it. This will be of great help! I'll start a discussion on the talk page regarding the maintained by property.

Multichill (talkcontribs)
DaxServer (talkcontribs)

That's a relief. Please let me know once you're done.

Multichill (talkcontribs)
Multichill (talkcontribs)
Multichill (talkcontribs)

You have the MET data set up in OpenRefine, right? Can you add the missing inventory numbers? Quite a few items are missing one. The query might need a bit of filtering for weird cases like Johann Balthasar Probst (Q18508222).

Going through the query, it looks like User:Fuzheado batch-created a lot of very incomplete items, like Dog (Q116727308). Not sure what the idea was with these, or why more data (like the inventory numbers) hasn't been added.
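A sketch of the query in question, assuming the usual modelling (P195 = collection, Q160236 = Metropolitan Museum of Art, P217 = inventory number); the weird cases mentioned above would still need filtering out:

```python
import requests

# MET collection items that lack an inventory number (P217).
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P195 wd:Q160236 .
  FILTER NOT EXISTS { ?item wdt:P217 ?inv . }
}
LIMIT 1000
"""

rows = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    timeout=60,
).json()["results"]["bindings"]

for row in rows:
    print(row["item"]["value"])
```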

DaxServer (talkcontribs)

Ya, I have the data. I can add them.

Multichill (talkcontribs)

That would be nice. Maybe also add the year it entered the collection, like this?

DaxServer (talkcontribs)

It's running for ~19K items. I'll upload the rest tomorrow.

DaxServer (talkcontribs)

Most of them are done, except a few that need manual intervention. Other than those, I think you can run your bot to fill in the data.

Multichill (talkcontribs)
Reply to "Incorrect collection usage"
RamSeraph (talkcontribs)

Thanks for cleaning up my mess. May I know how you are locating these duplicates, so that I can also look?

Also, I maintain some Indian geospatial datasets, and I added a locator plugin for them which you might find useful. User:RamSeraph/IndianopenmapsLocater.js is the plugin. And this is a sample locator page.

DaxServer (talkcontribs)

Hey @RamSeraph, I was matching the subdistricts on OSM with their wikidata and wiki tags. So I downloaded them (https://overpass-turbo.eu/s/1NYY) into OpenRefine and reconciled the OSM names. For those that didn't match, I had to search for them manually in the respective local wikis and the EN wiki.

Thanks for pointing me to your script. Your repo looks pretty good. I'll explore it further!
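For reference, a sketch of fetching such relations with their wikidata/wikipedia tags straight from the Overpass API (the admin_level for Indian subdistricts is an assumption here; the saved overpass-turbo query above is the authoritative one):

```python
import requests

ADMIN_LEVEL = "7"  # assumption: subdistrict level in India; verify against the saved query

QUERY = f"""
[out:json][timeout:120];
area["ISO3166-1"="IN"][admin_level=2]->.india;
relation(area.india)["boundary"="administrative"]["admin_level"="{ADMIN_LEVEL}"];
out tags;
"""

resp = requests.post(
    "https://overpass-api.de/api/interpreter", data={"data": QUERY}, timeout=180
)
for rel in resp.json()["elements"]:
    tags = rel.get("tags", {})
    print(rel["id"], tags.get("name"), tags.get("wikidata"), tags.get("wikipedia"))
```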

RamSeraph (talkcontribs)

I see. I haven't done much manual checking.

BTW, I do have a slightly older version of the wiki-OSM subdistrict mapping, if you are interested. This was done around half a year ago. https://gist.github.com/ramSeraph/4aa69bd42838a81cc5224c75af584b59

I was planning to update OSM with a script but never got around to it, as it was not clear to me whether it would be considered an automated edit and require going through the whole process of writing it up.

DaxServer (talkcontribs)

This actually helps. I'll check it out tomorrow. Thanks ;)

Reply to "Duplicate subdistrict entries"

OSM relation id linking on wikidata

Arjunaraoc (talkcontribs)
DaxServer (talkcontribs)

Bummer

Reply to "OSM relation id linking on wikidata"
Maculosae tegmine lyncis (talkcontribs)
DaxServer (talkcontribs)
Reply to "species & open map"
Arjunaraoc (talkcontribs)

@DaxServer, I noticed that you are updating mandals as per the new districts. I have the data collected from OSM, which will ensure an error-free update through QuickStatements. If you are doing it manually, can you stop, as there is scope for errors?

DaxServerOnMobile (talkcontribs)

Absolutely! Please import them and ping me :)

DaxServer (talkcontribs)

@Arjunaraoc Do you also want me to stop changing the descriptions, like this one?

Arjunaraoc (talkcontribs)

Yes. I will take care of that too.

DaxServer (talkcontribs)

Okay, thanks

Arjunaraoc (talkcontribs)

I have completed the update. Can you do a random check and confirm its quality? Thanks

DaxServer (talkcontribs)

I only looked at the mandal counts query from your page. Would you be able to add the end date and start date for the mandals? Or should it be done manually? If so, I could do that soon-ish

Arjunaraoc (talkcontribs)

I have already updated all the affected mandals (e.g. Addateegala mandal). As you seem to have created new QIDs for mandals on enwiki, it is better to merge them with the corresponding Telugu mandal QIDs, as those QIDs are already being used in OSM. If my response does not address the issue, share an example with the change required.

DaxServer (talkcontribs)

Ahh, you have updated the mandals. I was looking at the district QID and wondering. In Srikakulam district, for example, the mandals in P131 need qualifiers for those that changed. I'll try to look at the mandals randomly and check the import.

Re the merged ones, did you notice something that I didn't merge?

Arjunaraoc (talkcontribs)

Thanks for correcting the Addateegala mandal start date. OK, I will work on end dates and start dates for the P150 property of districts. I thought I had handled that case, but it looks like I did not.

Arjunaraoc (talkcontribs)

I did not handle the old districts of the changed mandals; I will do that.

Arjunaraoc (talkcontribs)

I have completed updating the mandals on the old district pages with their end dates. Can you check and let me know if I missed any other updates?

DaxServer (talkcontribs)

You're quite fast! Thanks for your good work :)

Arjunaraoc (talkcontribs)

Thanks for your appreciation. It's nice to have a co-worker like you on Wikidata. I was introduced to Wikidata four years back, but only recently got a better hang of it. I am trying to clean up all the issues that were inadvertently introduced by other contributors as they tried to contribute.

Reply to "Updating constituent mandals"

About revenue divisions for Andhrapradesh mandals in wikidata

Arjunaraoc (talkcontribs)

@DaxServer, Nice to see that you are actively editing Wikidata regarding the AP district changes. I have seen that you have added revenue divisions for some mandals in Tirupati district. As per Wikidata guidance, P131 should point to the lowest administrative unit containing the entity. If we follow that, we have to use only revenue divisions as P131 for mandals, and then use districts as P131 for the revenue division entities. Though revenue divisions are an intermediate layer, their use for the public is mostly related to escalation of land issues at the mandal level; mandals are more prominent in general use. I have also experimentally added revenue divisions for Prakasam district in the past. I am of the view that we should not bother with revenue divisions in Wikidata. Please share your view. I am deferring further edits till we reach consensus on this.

BTW, I tried to see how Karnataka districts are represented. They are using State - Revenue Division - District - Taluk, whereas the actual hierarchy is State - Revenue Division - Sub-Division - District - Taluk. Taluk entities do not have P131 at this time. Having too many layers makes using Wikidata through queries more complex. If you take census data, they consider districts and subdistricts as the primary hierarchy and ignore the revenue divisions in their documents, as far as I know.

DaxServer (talkcontribs)

@Arjunaraoc Nice to see you too! As the revenue divisions are not really used outside of land issues, as you said, the District -> Mandal -> Village hierarchy seems very natural. Although it would be really useful to also map the relation between revenue divisions and other entities somewhere; I have no idea where. I might not be the best person to talk to about Wikidata, but there is a Telegram group for India Wikidata, if you are not already part of it, which could be useful. You could ask @Planemad if you'd like to join. I think you are already part of the OSM IN group, aren't you? I remember seeing your name.

Either way, do you think district -> mandal -> village is a better representation for P131?

DaxServer (talkcontribs)

@Arjunaraoc You might want to post at Wikidata talk:WikiProject India, as that might be a better place, so others can also chip in if they are interested. (P.S. If you need my input, please always ping me, as I'm not watching Wikidata changes.)

Arjunaraoc (talkcontribs)

Thanks for your prompt reply. I agree that district-mandal-village is certainly better. So far, several editors from the Telugu states have experimented with Wikidata, sometimes adding data with validation problems. I think, at least for this AP districts restructuring update, the two of us can form the core and implement the agreed hierarchy. This requires removing the revenue division statements that were added on a trial basis by you and me. I will go ahead with the work, as I have collected the changes in the mandal data of districts from OSM, which I can confirm is accurate for the erstwhile 670 mandals. I will consider joining the Wikidata Telegram group in due course.

Arjunaraoc (talkcontribs)

OK, I will post in the project space so that others are aware.

DaxServer (talkcontribs)

@Arjunaraoc Thanks. One thing that would be important is to retain the past districts using the start time, end time, and preferred rank qualifiers, if you are not using them already.

Arjunaraoc (talkcontribs)

Yes, I have added end times for all the mandals that are being transferred. Rank qualifiers cannot be set automatically using QuickStatements. At Telugu Wikipedia, we started using Wikidata-based infoboxes for the mandals and villages in Prakasam district. I am working on expanding this to all mandals, as otherwise the work is not sustainable and quality cannot be assured, due to the small and decreasing set of Wikipedians.

DaxServer (talkcontribs)

@Arjunaraoc I think you can set qualifiers: Help:QuickStatements#Add statement with qualifiers (see the sketch at the end of this message).

It's quite cool to see more integration of Wikidata into Wikipedia projects. Hope to see more!! If you think I could be of help somewhere (except writing content), let me know :)

It is true that maintenance is indeed a burden on those of us who are active. I still haven't figured out how to ensure the quality of Wikidata and revert errors or vandalism of sorts. The Recent Changes feed has an option to monitor them, but I haven't found it to my liking. What is your current process for monitoring Wikidata-related things?
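A sketch of that qualifier syntax, as QuickStatements V1 commands generated from Python (tab-separated; P580/P582 are the start/end time qualifiers, and the QID and date below are placeholders for illustration only):

```python
# Build a QuickStatements V1 command adding a P131 statement with
# start time (P580) and end time (P582) qualifiers. /11 = day precision.
def p131_with_dates(mandal: str, district: str, start: str, end: str | None = None) -> str:
    parts = [mandal, "P131", district, "P580", f"+{start}T00:00:00Z/11"]
    if end:
        parts += ["P582", f"+{end}T00:00:00Z/11"]
    return "\t".join(parts)

# Q4115189 is the Wikidata sandbox item; the date is a placeholder.
print(p131_with_dates("Q4115189", "Q4115189", "2022-04-04"))
```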

Arjunaraoc (talkcontribs)

As per the QuickStatements help, ranks cannot be set or changed. I have seen some bots setting ranks based on the latest datasets.


I mainly focus on the AP mandals data. I check it once every three months or so, with queries for the counts in each district. I also track relevant Wikidata changes in my tewiki recent changes, with the "Wikidata changes" option enabled.

I am working on adding coordinate info to AP mandals, using the data in tewiki article pages. I have reached a count of 345. For the rest, I need to use the mandal HQ pages, if they have coordinates. Can you work on populating the new districts' information, except the constituent mandals, which I will add based on wiki articles?

DaxServer (talkcontribs)

Ahh, "rank" qualifiers! I didn't see that. Would you be able to share those queries?

I'll slowly work my way through the new districts

Arjunaraoc (talkcontribs)

@DaxServer, you can see my queries on my user page. For counts of mandals, see the section labeled "counts".
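A sketch of what such a count query might look like (the mandal class QID below is a placeholder to fill in; the real queries are on the user page mentioned above):

```python
import requests

Q_MANDAL = "Q_FILL_ME_IN"  # placeholder: QID of the relevant mandal class

QUERY = f"""
SELECT ?district (COUNT(?mandal) AS ?mandals) WHERE {{
  ?mandal wdt:P31 wd:{Q_MANDAL} ;
          wdt:P131 ?district .
}}
GROUP BY ?district
ORDER BY DESC(?mandals)
"""

rows = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    timeout=60,
).json()["results"]["bindings"]

for row in rows:
    print(row["district"]["value"], row["mandals"]["value"])
```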

Reply to "About revenue divisions for Andhrapradesh mandals in wikidata"