Wikidata talk:WikiProject Source MetaData

From Wikidata

Diamond OA journals

A new list of diamond OA journals (many of which are not in DOAJ) has just been published as part of a project:

What do people think about inclusion of non-DOAJ listed diamond OA journals in Wikidata? It could end up being the most complete repository for such items. T.Shafee(evo&evo) (talk) 08:03, 10 March 2021 (UTC)

@Evolution and evolvability: I support the indexing in Wikidata, of course, and I could even support labeling green/gold/black/diamond/whatever as properties. My personal opinion is that all the labeling and distinction about green etc. has been a really distracting error that takes up too much of the documentation and prevents people from having conversations about open access. It is hard enough to explain the difference between open and non-open to newbies, and in my opinion it is too much of a communication challenge to also bring new terms like diamond into the conversation. The open access community is not very large and Wikipedia is a big part of it. As a stylistic decision for clarity, I wish that on wiki we could stay on point and talk only about open/non-open, and tell the rest of the open access community to simplify the conversation and follow our lead. "Diamond" does not even have anything to do with the ability to access content; it is just about the journal's history of business practice. Blue Rasberry (talk) 20:45, 11 March 2021 (UTC)

Bias in medicine in all journal indexing

Everyone here already expects this, but check out this search for law journals (thanks Daniel Mietchen for the query):

SELECT DISTINCT ?item ?title (COUNT(?work) AS ?works)  
WITH {
  SELECT ?item ?title  WHERE {
    ?item wdt:P236 ?issn ;
          wdt:P1476 ?title .
    FILTER REGEX (LCASE(?title), "(\\blaw|\\blegal|\\bcourt)")
  }
}
AS %journals

WHERE {
  INCLUDE %journals
  ?work wdt:P1433 ?item .
}
GROUP BY ?item ?title
ORDER BY DESC(?works)
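
To see which titles the FILTER in the query catches, the same pattern can be tried offline. This is a minimal Python sketch, assuming the query service's REGEX follows Java regex rules, where `\b` is a word boundary just as in Python's `re`; the function name and the sample titles are made up for illustration.

```python
import re

# Same alternation as the SPARQL FILTER, with a word boundary (\b) before
# each keyword; titles are lower-cased first, mirroring LCASE(?title).
LEGAL_TITLE = re.compile(r"(\blaw|\blegal|\bcourt)")


def looks_legal(title: str) -> bool:
    """Return True if the lower-cased title has a keyword at the start of a word."""
    return LEGAL_TITLE.search(title.lower()) is not None


if __name__ == "__main__":
    for t in ["Journal of Law, Medicine & Ethics", "Claw Research",
              "Supreme Court Review", "Lawyer Monthly", "The Lancet"]:
        print(f"{t!r}: {looks_legal(t)}")
```

Because `\b` only anchors the start of each keyword, words such as "Lawyer" also match, while "Claw" does not.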


Here we see that among the top law journals indexed in Wikidata, only a few are outside the field of medicine. This is not just a Wikidata bias but a library cataloging bias, due to medicine being much more open with metadata and law intentionally being much more closed with metadata. This came up because I was talking with some researchers at my university's law school who were asking about the representation of law in Wikipedia. I do not expect an easy solution to this; however, I am going to have a conversation, through our librarians, with HeinOnline (Q5699635), a major United States database for legal research. I want their metadata. Blue Rasberry (talk) 20:53, 11 March 2021 (UTC)

  • I don't think we do "all journal indexing". So there isn't really a bias. People interested in some fields contribute, that's all. --- Jura 23:22, 11 March 2021 (UTC)

Grant proposal to build a Visual Editor for Citoid Web Translators

Hi everyone! Together with Diegodlh, we are presenting a grant proposal to build a visual editor for Citoid web translators. The goal of this visual editor is to make it easier for non-technical users to create and edit Citoid web translators, in order to increase the website coverage of this citation metadata retrieval service. The grant proposal is open for feedback and endorsement, and we are also seeking collaborators and volunteers, so if you are interested, please visit the grant proposal page! --Scann (talk) 12:25, 18 March 2021 (UTC)

Wikicite-Zotero plugin: simple entity creation vs author disambiguation

Hi, all! I'm developing a WikiCite plugin for Zotero that provides citation metadata support. One of the plugin's features is synchronization of "cites work" (P2860) relationships between the local Zotero library and Wikidata. For that, both citing and cited items must be entities in Wikidata (i.e., they must have QIDs). The plugin can fetch QIDs from Wikidata; it currently does so using unique identifiers (DOI and ISBN) and SPARQL queries, but the next pre-release will use Wikidata's reconciliation API so that titles can be used as well.
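
A DOI-based lookup only works if the DOI is written the way Wikidata stores it. The sketch below assumes the Wikidata convention of storing DOI (P356) values in upper case and strips common resolver prefixes (an assumed, non-exhaustive list); it only builds the SPARQL query string and does not send it anywhere. Function names are hypothetical.

```python
import re

# Resolver/URI prefixes commonly found in front of a bare DOI (assumed list).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip resolver prefixes and upper-case the DOI (assumed P356 convention)."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    return doi.upper()


def doi_lookup_query(raw: str) -> str:
    """Build a SPARQL query returning items whose DOI (P356) equals the given DOI."""
    doi = normalize_doi(raw)
    # Escape backslashes and double quotes so the DOI is a valid SPARQL string literal.
    literal = doi.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:P356 "{literal}" . }}'
```

For example, `doi_lookup_query("doi:10.1000/abc")` produces a query matching the literal `"10.1000/ABC"`.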

If no exact match is found in Wikidata, the plugin will offer a series of possible candidates to choose from. If no relevant candidates are found, the user will be offered the option to create a new entity in Wikidata. Originally, I wanted the plugin to handle this itself, using the Wikidata API. However, I thought that dealing with author disambiguation (to use P50, author, instead of P2093, author name string, where possible) would be too much work for now.
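
If new items are created with author name strings, the commands could look like the following sketch. It assumes the QuickStatements V1 tab-separated syntax (`CREATE`, `LAST`, `Len` for an English label, `en:"..."` for monolingual text), uses instance of (P31) scholarly article (Q13442814), and records author order with series ordinal (P1545). The function name and parameters are hypothetical, and titles and names are assumed to contain no tabs or double quotes.

```python
from typing import List


def quickstatements_for_article(title: str, doi: str, authors: List[str],
                                lang: str = "en") -> str:
    """Return QuickStatements V1 commands creating an article item whose
    authors are recorded as author name string (P2093) statements."""
    lines = [
        "CREATE",
        f'LAST\tL{lang}\t"{title}"',          # label in the given language
        "LAST\tP31\tQ13442814",                # instance of: scholarly article
        f'LAST\tP1476\t{lang}:"{title}"',     # title (monolingual text)
        f'LAST\tP356\t"{doi.upper()}"',       # DOI, upper-cased
    ]
    # One P2093 statement per author; P1545 (series ordinal) keeps the order.
    for position, name in enumerate(authors, start=1):
        lines.append(f'LAST\tP2093\t"{name}"\tP1545\t"{position}"')
    return "\n".join(lines)
```

Keeping the series ordinal on each name string makes it easier for a later disambiguation pass to replace a specific string with an author (P50) statement.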

So, in the end, I leaned toward using zotkat's QuickStatements translator to output QS commands. However, QuickStatements doesn't seem to provide a simple way to disambiguate author name string (P2093) statements. The user has to manually (1) locate these commands, (2) disambiguate them, and (3) replace them with author (P50) statements.
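
Step (3) can be automated once names have been resolved to QIDs. The sketch below rewrites tab-separated QS commands before they are run; the `resolved` mapping stands in for whatever disambiguation step produced it (hypothetical here), and keeping the original string with object named as (P1932) is an assumed convention, not a requirement of QuickStatements.

```python
from typing import Dict, List


def replace_name_strings(commands: List[str], resolved: Dict[str, str]) -> List[str]:
    """Turn P2093 (author name string) commands into P50 (author) commands
    for names that have been disambiguated to a QID; others are left as is."""
    out = []
    for line in commands:
        parts = line.split("\t")
        if len(parts) >= 3 and parts[1] == "P2093":
            name = parts[2].strip('"')
            qid = resolved.get(name)
            if qid is not None:
                # Keep the remaining qualifiers (e.g. P1545 series ordinal) and
                # record the original string with P1932 (object named as).
                line = "\t".join([parts[0], "P50", qid] + parts[3:]
                                 + ["P1932", f'"{name}"'])
        out.append(line)
    return out
```

This only covers commands for items that do not exist yet; changing statements on existing items would additionally require removing the old P2093 statements.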

My questions, therefore, are:

  • My assumption was that letting the user easily create new entities from the plugin with author name string (P2093) statements by default (where P50 statements might be possible) would be undesirable. But maybe my assumption is wrong. Would it be OK to do this, as long as the reconciliation API is used to minimize the chance of creating duplicates?
  • If, otherwise, QuickStatements is preferred, am I missing a simpler way of replacing author name string with author statements? If not, does it make sense that one such simpler way would be worth developing?

Thank you! --Diegodlh (talk) 18:41, 29 March 2021 (UTC)

@Diegodlh: Are you familiar with Author Disambiguator? Reconciling author names to author items is quite tricky in general (even ORCIDs are not perfectly reliable: often the data doesn't say exactly *which* author has the ORCID, and you still have to do a name match of some sort). So I would encourage you to proceed with directly creating the items using author name string (P2093), as you say was the original plan. ArthurPSmith (talk) 19:08, 29 March 2021 (UTC)
@ArthurPSmith: Thank you very much for your reply! Yes, I'm familiar with the Author Disambiguator tool. Great tool! User:Egon_Willighagen suggested trying to fetch ORCIDs from Crossref and using these to disambiguate author name strings into author entities. It was pointed out that this is what the citation.js QuickStatements plugin does. I will consider this as an intermediate option between no disambiguation at all and full author disambiguation from the plugin. In the meantime, creating the items using author name string (P2093) sounds great. Thanks! --Diegodlh (talk) 19:53, 29 March 2021 (UTC)
Yes, the Crossref data is probably a good source; I think they do link the ID to the specific author, at least for most papers where that relation exists. ArthurPSmith (talk) 12:31, 30 March 2021 (UTC)
I may use the QuickStatements export translator anyway, as an intermediate step, to take advantage of the fact that it already handles part of the translation from Zotero item to Wikidata entity. I have therefore opened a thread in the QS translator repository asking whether fetching ORCIDs from Crossref (and reconciling published in (P1433) values) should be done within the translator itself, or afterwards by a tool that converts QS commands into MediaWiki API requests. --Diegodlh (talk) 02:27, 31 March 2021 (UTC)
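
A minimal sketch of the Crossref route discussed above: it reads the `author` list of a Crossref work record, assuming the Crossref REST API layout in which each author may carry `given`, `family` and an `ORCID` URL, and checks each ORCID iD's ISO 7064 MOD 11-2 check digit before it is used to look for an item with that ORCID iD (P496). No network request is made; the record passed in and the function names are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD check digit (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not set(digits[:15]) <= set("0123456789"):
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15].upper() == expected


def authors_with_orcids(work: Dict) -> List[Tuple[int, str, Optional[str]]]:
    """Return (position, name, ORCID or None) for each author of a Crossref
    work record (assumed 'author' list layout of the Crossref REST API)."""
    rows = []
    for position, author in enumerate(work.get("author", []), start=1):
        name = " ".join(p for p in (author.get("given"), author.get("family")) if p)
        orcid = author.get("ORCID")
        if orcid:
            orcid = orcid.rstrip("/").rsplit("/", 1)[-1]  # drop the URL prefix
            if not orcid_is_valid(orcid):
                orcid = None
        rows.append((position, name, orcid))
    return rows
```

Keeping the position alongside the ORCID is what lets a tool tie the ID to one specific author name string, which is the point raised in the replies above.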

Translated scientific articles

My name is Victor Venema and I am member of a new initiative on translations of scientific articles/texts. We want to make it easier to find translations and (thus) make it more worthwhile to make them. Wikidata would be a good place to store such information, but there seem to be many ways to do so and I was wondering whether we could give people some guidance on the best way.

Jakob Voß made a query to find translations of scientific works for us. This query filters out scientific works by asking whether a publication has a DOI. (To see how adding a translation works I recently added a WMO report I wrote, which does not have a DOI.)

Everyone does it differently, but there seem to be two main methods: 1) one Item, which uses "full work available at URL" with the language as a qualifier; 2) one Item for the work and multiple Items for the translations as editions of the work. The former creates fewer Items, but may not always work: sometimes we only know, e.g., that the British Library has the translation, but do not have a URL. The latter may be what librarians prefer, especially for books, which may have multiple editions anyway, but it is more work.

Titles are sometimes translated in multiple descriptions; sometimes an Item has multiple titles (not always with the language as a qualifier, sometimes just in brackets within the title). Translated descriptions are a bit of a problem because two Items cannot have the same English label and description; I solved that for my report by adding the language in brackets to the description.

What do you think? VVenema (talk) 16:18, 13 April 2021 (UTC)
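
Method (2) can be sketched as QuickStatements V1 commands for a separate translation item. This assumes edition or translation of (P629) as the link to the original, language of work or name (P407), translator (P655) and instance of (P31) scholarly article (Q13442814); in the usage example Q188 is German, while Q100 and Q200 are hypothetical placeholders for the original and the translator. The function name is hypothetical, and text values are assumed to contain no tabs or double quotes.

```python
from typing import Optional


def translation_item_commands(original_qid: str, title: str, lang_code: str,
                              language_qid: str, description: str,
                              language_name: str,
                              translator_qid: Optional[str] = None) -> str:
    """Return QuickStatements V1 commands for a translation item that links
    to the item of the original work (method 2)."""
    lines = [
        "CREATE",
        f'LAST\tL{lang_code}\t"{title}"',           # label in the translation's language
        # English description with the language appended in brackets.
        f'LAST\tDen\t"{description} ({language_name})"',
        "LAST\tP31\tQ13442814",                     # instance of: scholarly article (assumed)
        f'LAST\tP1476\t{lang_code}:"{title}"',      # title (monolingual text)
        f"LAST\tP407\t{language_qid}",              # language of work or name
        f"LAST\tP629\t{original_qid}",              # edition or translation of
    ]
    if translator_qid:
        lines.append(f"LAST\tP655\t{translator_qid}")  # translator
    return "\n".join(lines)


print(translation_item_commands("Q100", "Ein Titel", "de", "Q188",
                                "report", "German", translator_qid="Q200"))
```

Everything else (journal, DOI of the original, authors) stays on the original item, so the translation item remains short.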

@VVenema: (2) is probably the best approach for the way Wikidata works; a translation may be published in a different journal and have a number of other differing attributes (DOI, page number for older cases, etc.) and presumably has at least the additional attribute of a translator, though that could be added as a qualifier for full work available at URL (P953) too. Otherwise if all but a small number of attributes of the translation are identical to the original then maybe (1) would be ok. But generally I would go with (2). ArthurPSmith (talk) 17:21, 13 April 2021 (UTC)
@ArthurPSmith: I am new here and was waiting for more feedback, but I guess this was it. May I assume that more people have read this exchange than people who responded and that they did not respond because they mostly agreed? I am fine with suggestion (2), it was also how I in the end did it myself for the report I used to test how adding a translation works. When I did this test I was a bit overwhelmed by the large number of options on how to do this. Would it make sense to write some sort of guidance on how to add translations to Wikidata? It would have saved me quite some time exploring the options and picking one. VVenema (talk) 16:59, 26 April 2021 (UTC)
You'll get the most feedback with a post on Project chat. If you are looking for some sort of vote or consensus, there is an RFC process... but yes I am sure at least a few other interested people track this page and would have responded if they thought they had something to add. Adding a piece of documentation on what you are doing and guidance would be great, and this is the place for it - you can see all the sub-pages of this WikiProject on the main project page here. ArthurPSmith (talk) 17:34, 26 April 2021 (UTC)
  • @VVenema: Scientific articles generally have just one item (not a work item and an edition item, as some try to do for books). WMF already struggles with the current load, so we can't really maintain a second set of 30 million items for the same articles on Wikidata.
For translations of an article, it would probably be good to have a separate item for each translation and link it to the article item. Maybe we should have a new pair of properties, "has translation" and "translation of", to link them together. The more general has edition or translation (P747) and based on (P144) have a different primary focus.
As it's fairly common for articles to have an abstract in English, I don't think translated abstracts alone should lead to the creation of new items. The same goes for title translations. As you probably noticed, both aspects are already (partly) handled on the existing items.
Essentially, this would lead to the creation of a fairly short item about the translation linking to a more general one about the article that was translated (avoid repeating any information present there). --- Jura 14:42, 3 May 2021 (UTC)
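
Whichever pair of linking properties is used, both directions should be present. The "has translation"/"translation of" pair proposed above does not exist yet, so this sketch uses the existing inverse pair edition or translation of (P629) and has edition or translation (P747) as a stand-in; statements are modelled as plain (subject, property, object) tuples rather than fetched from Wikidata, and the QIDs in the test are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject QID, property, object QID)

# Treated as inverses of each other; swap in a new property pair if one is created.
INVERSE = {"P629": "P747", "P747": "P629"}


def missing_inverses(statements: Iterable[Triple]) -> Set[Triple]:
    """Return the inverse statements that would be needed to make every
    P629/P747 link bidirectional."""
    stmts = set(statements)
    missing = set()
    for subj, prop, obj in stmts:
        inv = INVERSE.get(prop)
        if inv is not None and (obj, inv, subj) not in stmts:
            missing.add((obj, inv, subj))
    return missing
```

Only the `INVERSE` mapping would need to change if dedicated translation properties were introduced.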

All upper case for titles?

I've run across some scientific article items where the title is in ALL CAPS. I suspect this is how they were entered in whatever database a bot pulled them from. My question is: is there a preferred case for this property for scholarly articles? Should the case in Wikidata match the case used in the journal? The convention in many journals is to capitalize the first letter and proper nouns, but this is certainly not consistent across time or journals. Thanks.--Friesen5000 (talk) 20:04, 27 April 2021 (UTC)

As far as possible the statements should match whatever the journal article currently shows (if it is available online), and not what other databases may say about it. ArthurPSmith (talk) 21:13, 27 April 2021 (UTC)
@Friesen5000: I agree with ArthurPSmith. A complicating factor, however, is that journals sometimes use all caps almost as a 'font', rather than meaning that the title is really in capitals. T.Shafee(evo&evo) (talk) 09:00, 28 April 2021 (UTC)
@Evolution and evolvability: Yes, I find the upper-case 'font' for article titles is especially prevalent in older literature. Thank you and ArthurPSmith for your input. I'll leave them as I find them.--Friesen5000 (talk) 18:22, 28 April 2021 (UTC)
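
Since the advice here is to match whatever the journal shows rather than convert case automatically, a tool could at most flag all-caps titles for a human to check against the journal. A minimal sketch; the function name and the `min_letters` threshold (which skips bare acronyms) are arbitrary choices for illustration.

```python
def is_all_caps_title(title: str, min_letters: int = 8) -> bool:
    """Flag titles written entirely in capitals for manual review.

    Strings with fewer than min_letters letters (e.g. a bare acronym) are
    not flagged; the threshold is an arbitrary illustrative choice.
    """
    letters = [c for c in title if c.isalpha()]
    return len(letters) >= min_letters and title.isupper()
```

`str.isupper()` ignores digits and punctuation, so a title such as "STUDIES ON 2 STRAINS" is still flagged.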

Wikipedia Citations in Wikidata

Several people in this group were interested in/endorsers of the WikiCite grant proposal: m:Wikicite/grant/Wikipedia Citations in Wikidata.
Today, the team at OpenCitations tweeted that the code for the project is now available here: https://github.com/opencitations/wcw

As described on that page:

"It's a collection of scripts that can be used to extract citations from the English Wikipedia to external bibliographic resources, and then to upload them to Wikidata. Our goal is to develop four software modules in Python (the codebase from now on) that can be easily reused by developers in the Wikidata community:
  • extractor: a module to extract citation and bibliographic information from articles in the English Wikipedia;
  • converter: a module to convert extracted information into a CSV-based format compliant with a shareable bibliographic data model, e.g., the OpenCitations Data Model;
  • enricher: a module for reconciling bibliographic resources and people (obtained in step 2) with entities available in Wikidata via their persistent identifiers (primarily DOIs, QIDs, ORCIDs, VIAFs, then also persons, places and organisations if time allows);
  • pusher: a module to disambiguate, deduplicate, and load citation and bibliographic data in Wikidata that reuses code already developed by the wikidata community as much as possible.
The repository folder structure reflects these same modules that constitute the entire workflow."
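
To make the first two stages concrete, here is a toy sketch of an extractor and a converter. It is not code from the wcw repository: it assumes citations appear as `{{cite journal}}` templates with no nested templates inside, and the CSV columns are chosen for illustration rather than taken from the OpenCitations Data Model. All names are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

# Matches {{cite journal ...}} templates; assumes no nested templates inside.
CITE_JOURNAL = re.compile(r"\{\{\s*cite journal\s*\|(.*?)\}\}",
                          re.IGNORECASE | re.DOTALL)


def extract_citations(wikitext: str) -> List[Dict[str, str]]:
    """Extractor: return the named parameters of each {{cite journal}} template."""
    citations = []
    for match in CITE_JOURNAL.finditer(wikitext):
        params = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:
                params[key.strip().lower()] = value.strip()
        citations.append(params)
    return citations


def to_csv(citations: List[Dict[str, str]],
           columns: Tuple[str, ...] = ("title", "journal", "doi")) -> str:
    """Converter: serialise selected fields as CSV (illustrative columns)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    for c in citations:
        writer.writerow({col: c.get(col, "") for col in columns})
    return buf.getvalue()
```

A real extractor would use a proper wikitext parser, since citation parameters often contain nested templates or links with `|` characters.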

There are more details at github. I will leave it to the team themselves to answer any questions etc., I just wanted to share :-) LWyatt (WMF) (talk) 13:25, 3 May 2021 (UTC)

Wikisource integration project, requesting assistance for documentation + implementation

Last year, we (User:KCVelaga, on behalf of Open Heritage Foundation) were funded through the WikiCite program to improve the integration between Wikisource and Wikidata. The project has three development components: a module, a bot and a tool. You can read about each of them in detail in this document. I am requesting help with the documentation of the technical development.

To get started, we have completed the development and testing of two modules on beta-Wikisource (https://en.wikisource.beta.wmflabs.org/wiki/Module:Index_data and https://en.wikisource.beta.wmflabs.org/wiki/Module:Index_template). While we are supporting and overseeing the deployment of these on a couple of mainstream Wikisources, it would be good if other Wikisource communities could deploy them themselves. We can support them with fixes in the long term, but not with step-by-step implementation. To deploy these modules, people will primarily need to translate text and decide whether or not to display certain properties (by commenting out lines of code). We have attempted to document which lines need to be translated at https://github.com/tshrinivasan/wikisource_wikidata_integration/blob/main/translate-examples.txt, but it didn't turn out very well, as none of us has experience in technical documentation.

So we are requesting guidance on how best to develop documentation, especially documentation that is friendly to non-technical folks. A meeting would help us communicate our needs in more detail and understand documentation practices.

Thank you, message posted by LWyatt (WMF) (talk) on behalf of User:KCVelaga. 11:22, 11 May 2021 (UTC)

@LWyatt (WMF), KCVelaga: While we are supporting and overseeing the deployment of these on a couple of mainstream Wikisources: does this include enWS, or do we need to do this ourselves? I'm happy to do so, but I don't want to get in the way. Inductiveload (talk) 15:49, 11 May 2021 (UTC)
@Inductiveload: Thanks for asking. We are currently working with Indic languages and would like to develop documentation so that others can deploy the modules themselves. The project is set to conclude in the next few weeks and, unfortunately, we don't have enough human resources to provide active support to more than three wikis. However, we can still help with queries and fixes after the project ends. Regarding enWS, it should be fairly easy to deploy, as there wouldn't be anything to translate, and we are happy to support the deployment. But the key step is to achieve community consensus: deploying this module, and the bot that will accompany it, will affect literally all Index pages on a Wikisource, so the community should reach an agreement. If someone can start a discussion as soon as possible, we would be happy to answer any questions that come up. KCVelaga (talk) 04:55, 12 May 2021 (UTC)