How do you decide which articles are notable? I saw the User:DaxServer/crossref-a6091a2f6a1 file, but it does not tell me much.
User talk:DaxServer
@Egon Willighagen I thought they're all notable given they're scientific publications. Are they not?
No, that's certainly not enough. Because of the scalability issues, the consensus is not to mass-import scientific articles. Only clearly notable articles can be added, for example: references that support an existing statement, and perhaps articles with a 500+ Altmetric.com score, since those are clearly the topic of very active discussion. And I guess articles that are cited in one of the Wikipedias. Articles with more than 200 citations are likely notable. Or articles by a famous scholar. Most articles are not notable.
Thanks for the notes, I'll import those with the 500+ Altmetrics score and 200+ citations
Please also see this discussion: Wikidata:Project chat#Mass-import policy. Wikidata has a serious scalability issue and everyone is being careful not to break the system.
Thanks for the link. I'll follow the discussion along!
I also think that all studies should, and need to, be imported to enable all sorts of useful things that are not possible before that is done, such as the statistics in Scholia. Please see this discussion and the discussion linked there.
I think any limitation would need to be temporary until scalability issues are solved.
Thanks for importing all these studies, by the way. I also think the Altmetric minimum should be lowered to 300+. I went through nearly all studies with an Altmetric score up to that threshold monthly for many months (most recently only quarterly), and many of them are very notable. Studies with an Altmetric score between 300 and 500/700 may need some additional metric to filter out the less notable ones, depending on what other data you can query (if there's nothing better, one could use the citation count). Moreover: is the Altmetric score imported as well? That would be very useful metadata to have here, and I suggested open-source improved altmetrics scores here. I'll also follow that discussion along.
Personally, I also do want to see all of Crossref imported once the scalability issues are solved.
@GZWDer Well, then I think Wikidata for scholarly papers, as well as Scholia, is a lost cause, and it would make no sense to spend any time importing and maintaining studies in WD, as it would be wasted time with no potential to ever be useful. Could you please explain why you think so? I made several points here and especially in the linked discussions, such as that the charts about e.g. studies on a certain subject or studies by certain authors would be/remain false and misleading.
Even if citation count is not everything, and for Scholia it will almost always be biased, to get a relatively accurate estimate we need to import more than just the influential articles, since we have no way to express that an article is cited if the citing article is not in Wikidata.
Thanks for clarifying, I very much agree with you and I'm sorry I misread your previous comment as "do not want to see" (I thought I double-checked but apparently didn't).
@Egon Willighagen Do you know if Altmetric provides some sort of public dumps/data? I was not able to find one.
Maybe you can extract the "Top 100"s from here: https://www.altmetric.com/top100/home/
And I guess this was their follow up: https://www.altmetric.com/blog/varied-layers-of-attention-the-altmetric-500/
I don't think so. They are a company, and the data is their IP. They have an API: https://www.altmetric.com/solutions/altmetric-api/ They have badges, but it seems the documentation is hidden (at best): https://www.altmetric.com/solutions/altmetric-badges/
No, the documentation is still available: https://badge-docs.altmetric.com/
I suggested adding the Altmetric badges to Scholia. I don't know how ScienceOpen shows them, but there all the studies have the Altmetric metadata, and one can sort and filter by it, which is very useful (I don't think getting this data indirectly through that site would make sense, but maybe that's something to consider). As suggested there, I think a Wikimedia-owned-and-developed altmetrics score would be great for many reasons. The Altmetric score (Q130219307) is currently the most used one. I guess one could simply run API queries that fetch the Altmetric data and bulk-update the respective study items. Alternatively, one could not import the scores into Wikidata at all and implement this only in Scholia, but the former would be better.
In that case, I'll resume my work with the articles with 200+ citations.
I think some discussion about the number of citations would be due then. 200+ seems extremely high – that is like one study per month, and one of the top 20 papers of the year by Altmetric score. Look at these examples, which got ~100 k reads (10 k is significant) and were in the top Altmetric scores of Q1 2024: 1 2 3 (they got 2, 23, and 12 citations). Not even the study on potential life on Venus from a few years ago has reached 200 citations by now, and that is probably one of the top 10 studies of the decade, or something on that order.
Moreover, I'd suggest you or somebody else import at least the top 2 k studies per month sorted by Altmetric score. If you can't use the Altmetric API for that, then please use the ScienceOpen site; they may also have an API. This would at least fix the issue that most of even the most notable studies are missing here, even though there are millions of scholarly paper items in WD. I created some items manually but only added a DOI, thinking a bot would populate the remainder, but so far none did. Example: Q126363497
Is there any method with which you're choosing the articles from Crossref to add items for?
@Mahir256 Yes, I've added the filters here User:DaxServer/crossref-a6091a2f6a1 that I've used for the latest batch.
Can you please also add a filter to not add retracted articles like Q126581861?
@Egon Willighagen Noted.
Do you have a timeline? I found several more retracted articles being added in your process.
Egon, could you post some examples? Perhaps the filters I put in to remove the retracted articles aren't enough. Thanks!
Yeah, I was thinking about a SPARQL query to list them all, but I am still not sure how to write one that includes the item history and does things like: "Match all retracted articles created by account X" :(
Never mind, I found them. I'll remove them from my lists. If you observe any more hereafter, please let me know. Thanks!
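For anyone hitting the same problem later: SPARQL alone cannot see item history, so the "created by account X" part needs a separate MediaWiki API lookup per item. A minimal sketch of that two-step approach (the retracted-paper QID Q45182324 and the exact API parameters are assumptions worth double-checking):

```python
import urllib.parse

# Step 1: a SPARQL query listing retracted articles.
# Q45182324 ("retracted paper") is an assumed QID - please verify.
RETRACTED_QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q45182324 .  # instance of: retracted paper
}
LIMIT 100
"""

def first_revision_url(qid: str) -> str:
    """Step 2: build a MediaWiki API URL that returns the oldest revision
    of an item; its 'user' field is the account that created the item."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": qid,
        "rvdir": "newer",   # oldest revision first
        "rvlimit": "1",
        "rvprop": "user|timestamp",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)
```

One would run the SPARQL query first, then call the API URL for each returned QID and keep only the items whose first-revision user matches the bot account.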
Please don't try to shoehorn MET departments into collection (P195). That's not how the property is defined here. Please remove them.
Thanks for the info @Multichill (A). I'll update them.
Thank you. I understand the wish to track which department of a (large) museum is responsible for an object. Maybe maintained by (P126) is a good option for that? Not sure. It would need a broader discussion. Wikidata talk:WikiProject Visual arts is a suitable place to discuss that. Maybe you can start a topic?
Did you discover Wikidata:WikiProject Visual arts/Item structure? It's the best overview we have of how to model artworks. For the example of The Abolitionists in the Park (Q116445658): painting at the Metropolitan Museum of Art (MET, 2022.259):
- location (P276) is missing
- inventory number (P217) is missing (should be added as a statement, adding it as a qualifier to collection is optional)
- attribution text (P8264) is for license information on Commons. It shouldn't be used here at all. Provenance should be added as structured data
I expanded The Abolitionists in the Park (Q116445658): painting at the Metropolitan Museum of Art (MET, 2022.259) a bit based on the source.
Thanks for the link to the Item structure. I've been looking for one here and there around the MET and GLAM pages but wasn't able to discover it. This will be of great help! I'll start a discussion at the talk page regarding the maintained by.
I imported all the MET paintings some time ago. The last time I did it, it was based on the API ( https://collectionapi.metmuseum.org/public/collection/v1/objects/875741 for example). I will run it again. It's a pretty extensive framework that adds all this data.
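For reference, a record from that API can be reduced to the Wikidata-relevant fields along these lines. This is a sketch: the field names follow the public MET collection API, but the sample values below are illustrative placeholders, not a real API response.

```python
def summarize_met_object(obj: dict) -> dict:
    """Pick the fields most relevant for a Wikidata item out of a
    MET collection API record."""
    return {
        "inventory_number": obj.get("accessionNumber"),  # -> P217
        "title": obj.get("title"),
        "department": obj.get("department"),             # curatorial dept.
        "accession_year": obj.get("accessionYear"),
    }

# Illustrative sample using the same field names the API returns:
sample = {
    "objectID": 875741,
    "accessionNumber": "2022.259",
    "title": "The Abolitionists in the Park",
    "department": "Modern and Contemporary Art",  # placeholder value
    "accessionYear": "2022",
}
```

Missing fields come back as `None`, which is convenient when deciding whether a statement can be added at all.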
That's a relief. Please let me know once you're done.
It's running now for a while creating new ones like The Last Civil War Veteran (Q126099913) and updating existing items.
Looks like User:Fuzheado created a bunch of items (like Two Trees (Q116445196)) without inventory number (P217). That is causing duplicate items.
You have the MET data set up in OpenRefine, right? Can you add missing inventory numbers? Quite a few are missing it. Query might need a bit of filtering for weird cases like Johann Balthasar Probst (Q18508222).
Going through the query, it looks like User:Fuzheado batch-created a lot of very incomplete items like Dog (Q116727308). Not sure what the idea was with these and why more data (like the inventory numbers) wasn't added.
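The query mentioned above could look roughly like this. A sketch only: Q160236 for the Metropolitan Museum of Art is an assumption, and the "weird cases" filtering mentioned earlier is not included.

```python
import urllib.parse

# MET-collection items that have no inventory number (P217).
# Q160236 (Metropolitan Museum of Art) is an assumed QID.
MISSING_P217 = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P195 wd:Q160236 .              # collection: MET
  FILTER NOT EXISTS { ?item wdt:P217 ?inv }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def endpoint_url(query: str) -> str:
    """GET URL for the Wikidata Query Service, JSON results."""
    return ("https://query.wikidata.org/sparql?format=json&query="
            + urllib.parse.quote(query))
```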
Ya, I have the data. I can add them.
That would be nice. Maybe also add the year it entered the collection, like this?
It's running for ~19K items. I'll upload the rest tomorrow.
Most of them are done, except a few that need manual intervention. Other than those, I think you can run your bot to fill in the data.
I also replied at Wikidata_talk:WikiProject_Visual_arts#Collecting_info_about_the_curatorial_departments_in_art_Items: for the MET let's do a pilot with maintained by (P126) so that we no longer have it in collection (P195). Can you do that?
Thanks for cleaning up my mess. May I know how you are locating these duplicates, so that I can also look?
Also, I maintain some Indian geospatial datasets, and I added a locator plugin for them which you might find useful. User:RamSeraph/IndianopenmapsLocater.js is the plugin, and this is a sample locator page.
Hey @RamSeraph, I was matching the subdistricts on OSM with Wikidata and Wiki tags. So I downloaded them ( https://overpass-turbo.eu/s/1NYY ) into OpenRefine and reconciled the OSM names. For those that didn't match, I had to search for them manually in the respective local wikis and EN wiki.
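The reconciliation step described above can be sketched as a naive exact-name match. This is purely illustrative (the names and QID below are placeholders); OpenRefine's reconciliation is fuzzier and scores candidates rather than requiring exact matches.

```python
def match_by_name(osm_names, wd_labels):
    """Case-insensitive exact matching of OSM names against Wikidata
    labels; unmatched names are returned for manual review."""
    index = {label.strip().lower(): qid for qid, label in wd_labels.items()}
    matched, unmatched = {}, []
    for name in osm_names:
        qid = index.get(name.strip().lower())
        if qid:
            matched[name] = qid
        else:
            unmatched.append(name)
    return matched, unmatched

# Hypothetical example data:
osm = ["Addateegala", "Unknownpuram"]
labels = {"Q12345678": "Addateegala"}  # placeholder QID
```

Anything landing in the `unmatched` list is what then has to be hunted down by hand in the local wikis.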
Thanks for pointing me to your script. Your repo looks pretty good. I'll explore it further!
I see. I haven't done much manual checking.
BTW, I do have a slightly older version of the wiki–OSM subdistrict mapping if you are interested. This was done around half a year ago. https://gist.github.com/ramSeraph/4aa69bd42838a81cc5224c75af584b59
I was planning to update OSM with a script but never got around to it, as it was not clear to me whether it would be considered an automated edit and require going through the whole process of doing a writeup about it.
This actually helps. I'll check it out tomorrow. Thanks ;)
@DaxServer, I saw your edits linking OSM relation IDs in Wikidata ( example). Please check Wikidata:OpenStreetMap#Linking from Wikidata to OSM, where it is suggested not to link them, as OSM IDs are not stable.
Bummer
Hello, this does not look like a sensible addition to me https://www.wikidata.org/w/index.php?title=Q1073621&diff=2182236210&oldid=2147821545
Thank you
Hi @Maculosae tegmine lyncis Thanks for the correction!
@DaxServer, I noticed that you are updating mandals as per the new districts. I have the data collected from OSM, which will ensure an error-free update through QuickStatements. If you are doing it manually, can you stop, as there is scope for errors?
Absolutely! Please import them and ping me :)
@Arjunaraoc Do you also want me to stop changing the descriptions, like this one?
Yes. I will take care of that too.
Okay, thanks
I have completed the update. Can you do a random check and confirm the quality of the update? Thanks
I only looked at the mandal counts query from your page. Would you be able to add the end date and start date for the mandals? Or should it be done manually? If so, I could do that soon-ish
I have already updated all the affected mandals (e.g. Addateegala mandal). As you seem to have created new QIDs for mandals on enwiki, it is better to merge them with the corresponding Telugu mandal QIDs, as those QIDs are already being used in OSM. If my response does not address the issue, share an example with the change required.
Ahh, you have updated the mandals. I was looking at the district QID and wondering. Like Srikakulam district: the mandals in P131 need qualifiers, for those that changed. I'll try to look at the mandals randomly and check the import.
Re the merged ones, did you notice something that I didn't merge?
Thanks for correcting the Addateegala mandal start date. OK, I will work on end dates and start dates for the P150 property of districts. I thought I had handled that case, but it looks like I did not.
I did not handle the old districts of changed mandals; I will do that.
I have completed updating the mandals on the old district pages with their end dates. Can you check and let me know if I missed any other updates?
You're quite fast! Thanks for your good work :)
Thanks for your appreciation. It's nice to have a co-working person like you on Wikidata. I was introduced to Wikidata four years back, but only recently got a better hang of it. I am trying to clean up all the issues inadvertently introduced by other contributors as they tried to contribute.
@DaxServer, nice to see that you are actively editing Wikidata regarding the AP district changes. I have seen that you have added revenue divisions for some mandals in Tirupati district. As per Wikidata guidance, P131 should point to the lowest administrative unit containing the entity. If we follow that, we have to use only revenue divisions for P131 on mandals, and then use districts as P131 for the revenue division entities. Though revenue divisions are an intermediate layer, their use for the public is mostly related to escalation of land issues at the mandal level; mandals are more prominent in general use. I have also experimentally added revenue divisions for Prakasam district in the past. I am of the view that we should not bother with revenue divisions in Wikidata. Please share your view. I am deferring further edits till we reach consensus on this. BTW, I tried to see how Karnataka districts are represented. They are using State - Revenue Division - District - Taluk, whereas the actual hierarchy is State - Revenue Division - Sub-Division - District - Taluk. Taluk entities do not have P131 at this time. Having too many layers makes the usage of Wikidata through queries more complex. If you take census data, they consider districts and subdistricts as the primary hierarchy and ignore the revenue division in their documents, as far as I know.
@Arjunaraoc Nice to see you too! As the revenue divisions are not really used outside of land issues, as you said, the District -> Mandal -> Village hierarchy seems very natural. Although it would be really useful to also map the relation between revenue divisions and other entities somewhere; I have no idea where. I might not be the best person to talk to about Wikidata, but there is a Telegram group for India Wikidata which could be useful, if you are not already part of it. You could ask @Planemad if you'd like to join. I think you are already part of the OSM IN group, aren't you? I remember seeing your name.
Either way, do you think district -> mandal -> village is the better representation for P131?
@Arjunaraoc You might want to post in Wikidata talk:WikiProject India, as others might also want to chip in if they are interested. (P.S. If you need my input, please always ping me, as I'm not watching Wikidata changes.)
Thanks for your prompt reply. I agree that district-mandal-village is certainly better. So far, several editors from the Telugu states have experimented with Wikidata, sometimes adding data with validation problems. I think, at least for this AP districts restructuring update, we two can form the core and implement the agreed hierarchy. This requires removing the revenue division properties that were added on a trial basis by you and me. I will go ahead with the work, as I have collected the changes in mandal data of districts from OSM, which I can confirm is accurate for the erstwhile 670 mandals. I will consider joining the Wikidata Telegram group in due course.
OK, I will post in the project space so that others are aware.
@Arjunaraoc Thanks. One thing that would be important is to retain the past districts using the start time, end time, and preferred rank qualifiers, if you are not using them already.
Yes, I have added end times for all the mandals that are being transferred. Rank qualifiers cannot be set automatically using QuickStatements. At Telugu Wikipedia, we started using a Wikidata-based infobox for mandals and villages in Prakasam district. I am working on expanding it to all mandals, as otherwise the work is not sustainable and quality cannot be assured, due to the small and decreasing set of Wikipedians.
@Arjunaraoc I think you can set qualifiers: Help:QuickStatements#Add statement with qualifiers
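For what it's worth, QuickStatements qualifiers are just extra tab-separated property/value pairs on the same line, so the start/end dates can be generated mechanically. A sketch (the QIDs below are placeholders; the exact syntax should be checked against the help page linked above):

```python
def qs_p131_line(item, parent, start=None, end=None):
    """One QuickStatements line: located in (P131) with optional
    start time (P580) / end time (P582) qualifiers.
    Dates use the QS time format +YYYY-MM-DDT00:00:00Z/11 (day precision)."""
    parts = [item, "P131", parent]
    if start:
        parts += ["P580", f"+{start}T00:00:00Z/11"]
    if end:
        parts += ["P582", f"+{end}T00:00:00Z/11"]
    return "\t".join(parts)

# Placeholder QIDs for a mandal moving into a new district:
line = qs_p131_line("Q1234", "Q5678", start="2022-04-04")
```

One such line per changed mandal would give both the new P131 value and its start-time qualifier in a single batch.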
It's quite cool to see more integration of Wikidata into Wikipedia projects. Hope to see more!! If you think I could be of help somewhere (except writing content), let me know :)
It is true that maintenance is indeed a burden on the rest of us who are active. I still haven't figured out how to ensure the quality of Wikidata and revert errors or vandalism. The Recent Changes feed has an option to monitor them, but I haven't found it to my liking. What is your current process for monitoring Wikidata-related things?
As per the QuickStatements help, rank cannot be set or changed. I have seen some bots doing the rank setting based on the latest datasets.
I mainly focus on the AP mandals data. I check it once every three months or so, with queries for counts in each district. I also track relevant Wikidata changes in my tewiki recent changes, with the Wikidata changes option enabled.
I am working on adding coordinate info to AP mandals, using the data in tewiki article pages. I have reached a count of 345. For the rest, I need to use the mandal HQ pages, if they have coordinates. Can you work on populating the new districts' information, except the constituent mandals, which I will add based on wiki articles?
Ahh, "rank" qualifiers! I didn't see that. Would you be able to share those queries?
I'll work my way through the new districts slowly
@DaxServer, You can see my queries on my userpage. For counts of mandals, see the section labeled counts.