Wikidata talk:WikiProject Source MetaData

From Wikidata

Mining COVID19 research using [R] and Wikidata

For people interested in [R], text mining, Wikidata, COVID-19 and open data: a project has been posted to text-mine and analyse the COVID-19 literature, and to annotate publications' Wikidata items with main subject (P921) values. Details at Wikidata_talk:WikiProject_COVID-19 and in the GitHub repo. T.Shafee(evo&evo) (talk) 03:53, 19 March 2020 (UTC)

Towards more consistent P31 usage across the WikiCite corpus

I would like to take the bulk fixing error discussion as a starting point to review the data model for bibliographic items. In particular, I think it would be useful to have one (or a few) standard instance of (P31) values for all of them, similar to how human (Q5) is used for all items about humans. A candidate for such a generic P31 value could be publication (Q732577), written work (Q47461344) or document (Q49848). Once we have sorted that out, we could go for additional properties (think "publication type", "document type" or similar) that would specify things like monograph (Q193495), preprint (Q580922) or technical report (Q3099732). --Daniel Mietchen (talk) 16:07, 20 March 2020 (UTC)
Mattsenate (talk) 13:11, 8 August 2014 (UTC)
KHammerstein (WMF) (talk) 13:15, 8 August 2014 (UTC)
Mitar (talk) 13:17, 8 August 2014 (UTC)
Mvolz (talk) 18:07, 8 August 2014 (UTC)
Daniel Mietchen (talk) 18:09, 8 August 2014 (UTC)
Merrilee (talk) 13:37, 9 August 2014 (UTC)
Pharos (talk) 14:09, 9 August 2014 (UTC)
DarTar (talk) 15:46, 9 August 2014 (UTC)
HLHJ (talk) 09:11, 11 August 2014 (UTC)
Blue Rasberry (talk) 18:02, 11 August 2014 (UTC)
Micru (talk) 20:11, 12 August 2014 (UTC)
JakobVoss (talk) 12:23, 20 August 2014 (UTC)
Finn Årup Nielsen (fnielsen) (talk) 02:06, 23 August 2014 (UTC)
Jodi.a.schneider (talk) 09:24, 25 August 2014 (UTC)
Abecker (talk) 23:35, 5 September 2014 (UTC)
Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:21, 24 October 2014 (UTC)
Mike Linksvayer (talk) 23:26, 18 October 2014 (UTC)
Kopiersperre (talk) 20:33, 20 October 2014 (UTC)
Jonathan Dugan (talk) 21:03, 20 October 2014 (UTC)
Hfordsa (talk) 19:26, 5 November 2014 (UTC)
Vladimir Alexiev (talk) 15:09, 23 January 2015 (UTC)
Runner1928 (talk) 03:25, 6 May 2015 (UTC)
Pete F (talk)
econterms (talk) 13:51, 19 August 2015 (UTC)
Sj (talk)
author  TomT0m / talk page
guillom (talk) 21:57, 4 January 2016 (UTC)
·addshore· talk to me! 17:43, 18 January 2016 (UTC)
Bodhisattwa (talk) 16:08, 29 January 2016 (UTC)
Ainali (talk) 16:51, 29 January 2016 (UTC)
Shani Evenstein (talk) 21:29, 5 July 2018 (UTC)
Skim (talk) 07:17, 6 November 2018 (UTC)
PKM (talk) 23:19, 19 November 2018 (UTC)
Ocaasi (talk) 22:19, 29 November 2018 (UTC)
Trilotat Trilotat (talk) 15:43, 16 February 2019 (UTC)
NAH
Iwan.Aucamp
Notified participants of WikiProject Source MetaData --Daniel Mietchen (talk) 16:07, 20 March 2020 (UTC)
LeadSongDog (talk) 21:42, 23 March 2016 (UTC)
RobLa-WMF (talk) 01:24, 25 March 2016 (UTC)
Kosboot (talk) 20:45, 30 March 2016 (UTC)
Sydney Poore/FloNight♥♥♥♥ 15:10, 14 April 2016 (UTC)
Peaceray (talk) 18:40, 28 April 2016 (UTC)
PKM (talk) 16:29, 1 May 2016 (UTC)
Aubrey (talk) 12:42, 25 August 2016 (UTC)
Chiara (talk) 12:47, 25 August 2016 (UTC)
Marchitelli (talk) 19:02, 1 September 2016 (UTC)
YULdigitalpreservation (talk) 17:44, 9 December 2016 (UTC)
Satdeep Gill (talk) 14:59, 2 February 2017 (UTC)
Pintoch (talk) 09:44, 28 February 2017 (UTC)
Raymond Ellis (talk) 16:06, 1 April 2017 (UTC)
Crazy1880 (talk) 18:21, 16 June 2017 (UTC)
T Arrow (talk) 07:55, 22 June 2017 (UTC)
GerardM (talk) 08:25, 30 July 2017 (UTC) With a particular interest of opening up sources about Botany and opening up any freely licensed publications.
Clifford Anderson (talk) 18:26, 11 August 2017 (UTC)
Jsamwrites (talk) 07:52, 27 August 2017 (UTC)
Krishna Chaitanya Velaga (talk) 09:52, 19 September 2017 (UTC)
Capankajsmilyo (talk) 18:32, 19 September 2017 (UTC)
Hsarrazin (talk) 20:41, 15 October 2017 (UTC)
Mlemusrojas (talk) 10:15, 6 December 2017 (UTC)
Samat (talk)
Ivanhercaz (Talk) 20:27, 25 December 2017 (UTC)
Simon Cobb (User:Sic19 - talk page) 21:20, 21 January 2018 (UTC)
Mahdimoqri (talk) 20:22, 26 March 2018 (UTC)
Maria zaos (talk) 18:45, 9 April 2018 (UTC)
Jaireeodell (talk) 14:07, 23 April 2018 (UTC)
Egon Willighagen (talk) 12:29, 10 May 2018 (UTC)
RobinMelanson (talk) 2:13, 25 November 2018 (UTC)
Vladimir Alexiev (talk) 03:02, 4 December 2018 (UTC) interested, in particular because of TRR project https://m.wikidata.org/wiki/Q56259739
Maxlath (talk) 18:36, 6 January 2019 (UTC)
Dcflyer (talk) 21:38, 26 January 2019 (UTC)
Trilotat Trilotat (talk) 15:39, 16 February 2019 (UTC)
Mfchris84 (talk) 05:37, 18 April 2019 (UTC)
Salgo60 (talk)
Walkuraxx (talk) 14:58, 18 July 2019 (UTC)
NAH
FULBERT (talk) 17:14, 10 November 2019 (UTC)
Wolfgang8741 (talk) 20:35, 19 April 2020 (UTC)
Csisc (talk) 17:46, 26 April 2020 (UTC)
Notified participants of WikiProject Source MetaData/More (second round of pings). --Daniel Mietchen (talk) 16:07, 20 March 2020 (UTC)
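A minimal Python sketch of the proposed model, under explicit assumptions: items are plain dicts mapping property IDs to lists of values, publication (Q732577) is picked from the candidate generic classes purely for illustration, and `PUBTYPE` is a hypothetical placeholder for the "publication type" property, which does not exist on Wikidata. It shows that a work moving from preprint to published article would only change its type statement, while P31 stays the same.

```python
# Hypothetical sketch of the proposed model: one generic P31 value for every
# bibliographic item, plus a separate "publication type" statement.
# Assumptions: items are dicts of property ID -> list of values, and
# "PUBTYPE" is a placeholder for a property that does not exist on Wikidata.
GENERIC_CLASS = "Q732577"  # publication, one candidate generic class
PUB_TYPE = "PUBTYPE"       # hypothetical "publication type" property


def conforms(item: dict) -> bool:
    """True if the item carries only the generic P31 value and a publication type."""
    return item.get("P31") == [GENERIC_CLASS] and bool(item.get(PUB_TYPE))


def change_type(item: dict, new_type: str) -> dict:
    """Return a copy with a new publication type; P31 is left untouched."""
    updated = {prop: list(values) for prop, values in item.items()}
    updated[PUB_TYPE] = [new_type]
    return updated


preprint = {"P31": [GENERIC_CLASS], PUB_TYPE: ["Q580922"]}  # preprint
article = change_type(preprint, "Q13442814")  # scholarly article, after publication
```

Keeping P31 fixed in this way means anything filtering on P31 sees the same class before and after publication.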

The approach we have usually followed is simply to use monograph (Q193495), preprint (Q580922) or technical report (Q3099732) as values of instance of (P31). These values would not even scratch the surface of publication types: scientific articles, for example, would also have to be distinguished from « news article », although both are articles. In the end, a specialized property would face exactly the same publication-ontology problem as instance of (P31) does. author  TomT0m / talk page 16:50, 20 March 2020 (UTC)
@Daniel Mietchen: Presumably you have had a look at Wikidata:WikiProject_Books? I think more consistency would be great; I also have some concerns that I raised in this discussion. This may also be of interest: Wikidata:Requests_for_comment/Wikidata_to_use_data_schemas_to_standardise_data_structure_on_a_subject. I think the WikiProject Books model makes quite a lot of sense, but it should be clarified for specific cases; it is hard to say under which WikiProject this should go. We do need better ontologies, but I agree with TomT0m that new properties like the ones you suggest won't help much. Iwan.Aucamp (talk) 16:54, 20 March 2020 (UTC)

TomT0m, the difference is that P31 needs to be stable throughout the life cycle of an academic work to accommodate data flows and feed apps like Scholia. Thus classes like "preprint" and "conference paper" vs "journal article" are not suitable, whereas "monograph" (scientific book) vs "scientific article" is OK. Vladimir Alexiev (talk) 19:28, 23 March 2020 (UTC)

I would say that, in terms of immediate practical usefulness, tidying up P31 statements for scientific journal (Q5633421) should be given more attention. A random example, Developmental Dynamics (Q59752), shows that it is also an instance of academic journal (Q737498), a second subclass of magazine genre (Q21114848). It is an instance of hybrid open access journal (Q5953270), and there are certainly people who believe that open-access status should be dealt with by a dedicated property. And it is an instance of society journal (Q73364223), which is a rather awkward way of recording important information about the publisher. Charles Matthews (talk) 05:48, 20 May 2020 (UTC)

EntitySchema for preprints

There is a draft one at EntitySchema:E185, named preprint (E185). --Daniel Mietchen (talk) 16:26, 7 April 2020 (UTC)

On a related note, do we have any automatic import of preprint information into Wikidata (see the discussion at Wikidata_talk:WikiProject_COVID-19)? T.Shafee(evo&evo) (talk) 10:44, 15 April 2020 (UTC)
@Evolution and evolvability: Most of the COVID-19 preprints so far have been imported by Konrad Foerstner. Not sure where his code is, though, and it did not account for author order by way of series ordinal (P1545), which causes problems in terms of author disambiguation. --Daniel Mietchen (talk) 18:32, 25 April 2020 (UTC)
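To illustrate why missing series ordinal (P1545) qualifiers cause trouble, here is a hypothetical Python helper (not part of any existing import tool) that restores author order from them. It assumes each author statement is a simplified dict with a `name` and an optional `P1545` string value. Ordinals are compared as integers, because plain string comparison would put "10" before "2", and statements without an ordinal are kept at the end in their original order.

```python
# Hypothetical sketch: restore author order from series ordinal (P1545)
# qualifiers. Assumption: each author statement is a dict with "name" and an
# optional "P1545" string, mirroring how Wikidata stores series ordinals.


def ordered_authors(statements: list) -> list:
    """Return author names sorted by P1545; statements without it go last."""
    def sort_key(statement: dict) -> tuple:
        ordinal = statement.get("P1545")
        if ordinal is not None and ordinal.isdigit():
            return (0, int(ordinal))  # numeric order, so "10" sorts after "2"
        return (1, 0)  # missing or non-numeric ordinal: after the ordered ones

    # sorted() is stable, so unordered authors keep their input order
    return [s["name"] for s in sorted(statements, key=sort_key)]
```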

Discontinued journals as a test corpus

As part of exploring the limits of WikiCite, I am looking into discontinued journals, for which (i) we can hope for a good degree of completeness, while (ii) some documentation may be harder to find than for current ones. Is anyone working along these lines? To get things started, here is a query for items that

  • have an ISSN (P236) statement, hinting that they are periodicals
  • have an end time (P582) qualifier on the publisher (P123) statement, meaning that they ceased publishing with at least one publisher
  • have an official website (P856) statement, so there is a chance that additional information could be gathered
  • do not have any items published in them.

The following query uses these:

  • Properties: ISSN (P236), publisher (P123), official website (P856), published in (P1433), end time (P582)
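    # Journals with an ISSN, a publisher statement carrying an end time
    # qualifier, and an official website
    SELECT DISTINCT ?journal ?journalLabel WHERE {
      {
        # Inner query: only the first 100 candidates reach the filter below
        SELECT DISTINCT ?journal WHERE {
          ?journal wdt:P236 ?issn;
            wdt:P123 ?publisher;
            p:P123 ?publisherStatement.
          ?publisherStatement pq:P582 ?endTime.
          ?journal wdt:P856 ?website.
        }
        LIMIT 100
      }
      # Keep only journals that no item claims to be published in (P1433)
      FILTER(NOT EXISTS { ?item wdt:P1433 ?journal. })
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }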

Among the 38 current results, several look like suitable starting points. I am particularly attracted to Scholia (Q15755172) because

  • it is not biomedical
  • I found this blog post that describes its corpus as "Scholia and Scholia Reviews published 862 contributions by 392 scholars and academics at 193 universities and other institutions in 36 countries", which seems a useful size for such a test corpus
  • it is a namesake of our tool Scholia (Q45340488)

--Daniel Mietchen (talk) 20:56, 25 April 2020 (UTC)
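For anyone wanting to rerun the query above outside the Query Service UI, here is a minimal Python sketch using only the standard library. It assumes the public endpoint https://query.wikidata.org/sparql with `format=json`; `build_wdqs_request` and `journal_labels` are illustrative names, and the User-Agent string in the usage comment is a placeholder to replace with your own descriptive one, as Wikimedia's User-Agent policy asks.

```python
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_request(query: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request asking WDQS to return SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )


def journal_labels(results: dict) -> list:
    """Extract the ?journalLabel values from a SPARQL JSON results document."""
    return [row["journalLabel"]["value"] for row in results["results"]["bindings"]]


# Usage (needs network access; QUERY holds the SPARQL text above):
# import json
# request = build_wdqs_request(QUERY, "discontinued-journals-sketch/0.1 (contact: you@example.org)")
# with urllib.request.urlopen(request) as response:
#     print(journal_labels(json.load(response)))
```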

  • @Daniel Mietchen: This sounds like a great idea. A discontinued journal that I am familiar with and could get you full metadata for is Physics Physique Физика (Q85793224). It doesn't match your WDQS query because there are no statements on the Wikidata item at all, but none of its (some quite significant) papers appear to be listed in Wikidata yet either. ArthurPSmith (talk) 14:34, 27 April 2020 (UTC)
@ArthurPSmith: Sounds great — let's give that a try then! --Daniel Mietchen (talk) 14:56, 27 April 2020 (UTC)
PS: I linked some seed papers to Physics Physique Физика (Q85793224). --Daniel Mietchen (talk) 20:22, 27 April 2020 (UTC)
It will be interesting to see how this develops. What particular limits do you expect there to be? If you're looking for another option, there's Sussex Notes and Queries (Q92202389) which stopped in 1971 so might present a challenge as something less recent; the contents are documented in the Archaeology Data Service and the Sussex Records Society's website. Richard Nevell (talk) 17:32, 27 April 2020 (UTC)
@Richard Nevell: I expect a number of potential limits, e.g. in terms of our ability to track down things like
  • the identities (and perhaps affiliations) of authors referred to by author name strings (and perhaps affiliation strings) for publications in the journal — see P B GATNE (Q41919945) or D A Nordlund (Q92128997) for examples
  • exact publication dates
  • citations from and especially to its articles
  • periods with specific editors, publishers, place of publication etc.
If we had some test corpora, we could quantify such things, which could help guide further curation efforts. --Daniel Mietchen (talk) 20:22, 27 April 2020 (UTC)
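One way such quantification could look, as a hypothetical Python sketch rather than an existing tool: for a corpus of articles given as simplified dicts, it measures the share of author slots resolved to items (author, P50) rather than left as author name string (P2093), and the share of publication dates (P577) stored with day precision (Wikidata time precision 11). Citations and editor periods would need further properties and are left out here.

```python
# Hypothetical sketch: quantify two of the limits listed above for a corpus.
# Assumptions: each article is a simplified dict with optional lists under
# "P50" (author) and "P2093" (author name string), and an optional "P577"
# dict holding the Wikidata time precision of the publication date.
DAY_PRECISION = 11  # Wikidata time precision for a full date (day)


def corpus_metrics(articles: list) -> dict:
    """Share of author slots resolved to items, and of day-precise dates."""
    resolved = sum(len(a.get("P50", [])) for a in articles)
    unresolved = sum(len(a.get("P2093", [])) for a in articles)
    total_authors = resolved + unresolved
    exact_dates = sum(
        1 for a in articles
        if a.get("P577", {}).get("precision", 0) >= DAY_PRECISION
    )
    return {
        "resolved_author_share": resolved / total_authors if total_authors else 0.0,
        "exact_date_share": exact_dates / len(articles) if articles else 0.0,
    }
```

Tracking these shares over time would show whether curation of a test corpus is actually closing the gaps.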
Sounds good. I've sometimes wondered if there's a way to tap into Google Scholar for citations to works. It can be noisy sometimes, including websites based on Wikipedia for example, but does surface some interesting uses such as references from MA theses where they're available online. Richard Nevell (talk) 18:54, 28 April 2020 (UTC)
Speaking of which, I added end time (P582) for Greater Manchester Archaeological Journal (Q42721106) and Cheshire Past (Q44323342). Do they need official website (P856) to show up in the query, or does the filter exclude journals where Wikidata has items for articles published in them? (It helps if I read the query properly.) Richard Nevell (talk) 19:09, 28 April 2020 (UTC)
Update: With the help of ArthurPSmith, I have now created all the missing items for articles published in Physics Physique Физика (Q85793224). Next steps: annotating topics, citations (to and from) and authors, and the latter with affiliations etc. --Daniel Mietchen (talk) 21:39, 28 April 2020 (UTC)

Handle preprints?

Is there any consistency in how we should handle preprints? For instance, [1] has a journal paper, a "versionless" preprint and a versioned preprint. There only seems to be a single version. I have currently used based on (P144) to link from the journal paper to one of the preprints. Perhaps the two preprints should be merged into one. Should any further merging be done? --Finn Årup Nielsen (fnielsen) (talk) 09:34, 30 April 2020 (UTC)

@Fnielsen: I can imagine that merging the preprints would be logical in most current cases. I think significant event (P793) with the value submission (Q76903164) would be preferable (it could even be used to record all preprints); see the example at Q57912487#P793. I'd strongly prefer to avoid publication date (P577) for preprint (Q580922) items, since preprint servers usually carefully avoid saying that something is 'published' as such. Using significant event (P793) would also make article timelines easy to track. What do you reckon? T.Shafee(evo&evo) (talk) 07:45, 3 May 2020 (UTC)
Maybe there should be a generic preprint property that would work like arXiv ID (P818) but link to arbitrary URLs. Setting up preprints as editions would be a bit tedious. Ghouston (talk) 09:34, 3 May 2020 (UTC)
Additionally, I think Q56795015 and Q57912487 could definitely be merged. In most cases, there'll be a single preprint version and a single published version. Articles in F1000Research (Q27701587) will be tricky, since they often mint many versions, and the preprint and published versions are hosted by the same organisation. That's partly why I favour a single item with statements indicating the preprint version(s). @Ghouston: what do you think of just using URL (P2699) as the arbitrary URL property? T.Shafee(evo&evo) (talk) 10:21, 3 May 2020 (UTC)
I'd expect URL (P2699) and/or official website (P856) to refer to the location of the final version, not a preprint. Ghouston (talk) 14:14, 3 May 2020 (UTC)
I agree that, for preprints, URL (P2699) would work better as a qualifier on a significant event (P793) statement of submission (Q76903164) for each preprint version. T.Shafee(evo&evo) (talk) 05:10, 7 May 2020 (UTC)
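The single-item approach discussed in this thread could be sketched as follows. This is a hypothetical Python illustration on simplified dicts, not Wikibase JSON: each preprint version becomes a significant event (P793) statement with the value submission (Q76903164), qualified with URL (P2699) and, as an added assumption not proposed above, point in time (P585) so that versions can be ordered into a timeline.

```python
# Hypothetical sketch of one item per work, with preprint versions recorded as
# significant event (P793) = submission (Q76903164) statements.
# Assumptions: statements are simplified dicts, not full Wikibase JSON, and
# point in time (P585) is added as a qualifier so versions can be ordered.
SIGNIFICANT_EVENT = "P793"
SUBMISSION = "Q76903164"


def add_preprint_event(item: dict, url: str, date: str) -> None:
    """Append a submission event for one preprint version (date as YYYY-MM-DD)."""
    item.setdefault(SIGNIFICANT_EVENT, []).append(
        {"value": SUBMISSION, "qualifiers": {"P2699": url, "P585": date}}
    )


def preprint_timeline(item: dict) -> list:
    """List (date, URL) pairs of all preprint submissions, oldest first."""
    events = [
        statement["qualifiers"]
        for statement in item.get(SIGNIFICANT_EVENT, [])
        if statement["value"] == SUBMISSION  # ignore other kinds of events
    ]
    # ISO dates sort correctly as strings
    return sorted((q["P585"], q["P2699"]) for q in events)
```

Keeping every version on the same item avoids merge decisions later, at the cost of more qualifiers per statement.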

About that roadmap

A roadmap for WikiCite was laid out in August of 2018 - people voted on the options and the outcome is more or less:

For now:

  • 1. Centralized: 4 (+2 alternative choice, +1 second choice)
  • 2. Namespace: 3 (+2 alternative choice)
  • 3. Sister site: 0 (+1 second choice)
  • 4. Federated: 0

Eventually:

  • 1. Centralized: 2 (+2 alternative choice, +1 second choice)
  • 2. Namespace: 3 (+2 alternative choice)
  • 3. Sister site: 0 (+1 second choice)
  • 4. Federated: 2

I'm not sure what this means in practice, but if it is considered finalized, I think it would be good to record it somewhere. If the matter is not finalized, it would be good to know which option we should operate under for now. Iwan.Aucamp (talk) 10:31, 10 May 2020 (UTC)

Hello Iwan - my sense is, for now we should add citation data to Wikidata, in the centralized approach; eventually moving to some combination of 2,3,4. Sj (talk) 12:03, 10 May 2020 (UTC)

Notified participants of WikiProject Source MetaData

  • I agree with SJ. 1 is what is possible now; 2 and 3 seem possible only with Wikimedia movement financial investment and community organization, which do not appear to exist; and 4 can happen anytime any external organization such as a university invests in a Wikibase instance, which also does not appear to be in the works anywhere. The practical development which I think has happened is that the growth rate of Wikicite has slowed. This tension arose when Wikicite was 60% of all the items in Wikidata. Now Wikicite content is about 31% of all the items in Wikidata because Wikidata is growing to expand capacity for a range of projects. When Wikicite was the only project using scarce space it seemed more like an emergency, and now we instead have to do longer-term planning for many projects which will all grow over time. Solving only Wikicite does not address the many other projects of comparable size which are also incoming. I think everyone is getting the idea to seek lots of feedback and be selective when a large upload is possible. Blue Rasberry (talk) 15:13, 11 May 2020 (UTC)

Functioning of ORCIDator

Is it not working? I've tried to run it for Q60023087 and it does nothing. --Infovarius (talk) 19:10, 18 May 2020 (UTC)

Recent media & publications

Have you given or created a presentation, paper, tutorial, poster, research or documentation related to WikiCite and open citations since the WikiDataConference 2019?

If so, please add it to the list of Media & Events on Meta wiki so we can all keep track:
Meta:WikiCite/media#2020

Many events, including most of those approved under the 'satellite grants' program, have had to be cancelled or postponed indefinitely in recent months. But that does not mean people have stopped producing excellent work relating to linked bibliographic data in the Wikidataverse. Quite the opposite! So it would be very helpful if you could ensure that the work is easily findable by adding it to the list linked above.

Relatedly, I am currently preparing the 2019/20 WikiCite annual report, following on from the last three annual reports. I would like to mention many of these things in it if possible (not to take credit for them, but to demonstrate the variety and quality of work being done in our sector).

Sincerely,
LWyatt (WMF) (talk), project manager for WikiCite. 17:24, 20 May 2020 (UTC)