Wikidata talk:WikiProject Source MetaData

New Tool for Creating Items from a Pubmed ID

Hi all, I made a tool to help create items for journal articles from a PubMed ID. It uses WikidataIntegrator, a Python package created by User:Sebotic for creating bots and interacting with Wikidata. Check out the page here, and let me know any comments or suggestions. – The preceding unsigned comment was added by Gstupp (talk • contribs) at 20:50, 28 January 2017 (UTC).

Where do I see the record of Source M.D.?

I ran a batch, but see no evidence that it worked. I cannot find anything about the ISBNs that I listed. I cannot find the books that it might have created based on those ISBNs. I see nothing that indicates the effort produced anything. Where should I look? If the "batch created" is relevant, it was 20181010142459. Thank you. -Trilotat (talk) 19:42, 10 October 2018 (UTC)

Done. Thank you. -Trilotat (talk) 14:35, 16 November 2018 (UTC)

Questions about "scholarly articles"

I am the first to admit that I don't understand how batches of scholarly articles are selected to be added to Wikidata, but I am puzzled by some items I discovered yesterday. How do we end up with these kinds of items?

  • A "scholarly article" that is a book review of a 1973 book that does not have a Wikidata item, and whose author does not have a Wikidata item. See Joan of Arc (Q58606956). (I added the book, but I wonder how valuable these items are.)
  • A "scholarly article" that is chapter 22 of a book which did not have a Wikidata item (thus no "published in" statement). See Margery Kempe (Q58236291). (I added the book.)
  • A "scholarly article" that is a two-line death notice of a general practitioner who does not meet our notability criteria. (See Robert (“Bob”) Tennant. (Q55527198) and the obituary; another at Margaret Wilson (née Fyfe). (Q46255903).)
  • Most concerning, three items with <instance of> "scholarly article" and <title> "Algebra (English)". These are in fact online editions of multi-chapter algebra textbooks in German, two of which have useful and disambiguating subtitles. (Algebra (Q55869482), Algebra (Q56627314), and Algebra (Q56637998).) (I improved these, but I have not created "work" items to associate with these editions.)

Is this just an inevitable side effect of loading massive amounts of citation data? Are there process improvements that we could make to avoid these? - PKM (talk) 22:12, 15 November 2018 (UTC)

I think we should stop creating new scholarly articles by batches until there is consensus (and resources) to import established corpora with a meaningful scope and a reasonable metadata quality threshold. − Pintoch (talk) 22:29, 15 November 2018 (UTC)
Support. I also stumble upon these massive data imports which have never been checked manually. Wikidata should not be used as a data dump. -- JakobVoss (talk) 09:10, 16 November 2018 (UTC)
Speaking in a personal capacity here (as opinions differ wildly in the community on this topic), I'd very much welcome a proposal ensuring that every large-scale data import is linked to documentation including: a specific statement of purpose, a clear impact story (why are we doing this, who's benefiting from this data), expected data quality/maintenance costs or issues, and a well-defined projection on the scope/completeness of the import (in terms of # of items and statements to be created). I have been expressing concerns in the past about the ingestion of sparse, non-random datasets that are not representative of any well-defined catalog. Inferences based on these datasets can be flawed unless there is a notion of completeness or scope built into them. Documenting these imports in an accessible way, and explaining the process behind them, would go a long way in providing visibility and a shared understanding of their purpose. This would be much more useful than a binary decision/RfC-style recommendation as to whether a specific dataset should be allowed to exist or not in Wikidata.--DarTar (talk) 19:47, 4 December 2018 (UTC)

Notifying the project as this is quite important:
Mattsenate (talk) 13:11, 8 August 2014 (UTC)
KHammerstein (WMF) (talk) 13:15, 8 August 2014 (UTC)
Mitar (talk) 13:17, 8 August 2014 (UTC)
Mvolz (talk) 18:07, 8 August 2014 (UTC)
Daniel Mietchen (talk) 18:09, 8 August 2014 (UTC)
Merrilee (talk) 13:37, 9 August 2014 (UTC)
Pharos (talk) 14:09, 9 August 2014 (UTC)
DarTar (talk) 15:46, 9 August 2014 (UTC)
HLHJ (talk) 09:11, 11 August 2014 (UTC)
Lawsonstu (talk) 15:15, 11 August 2014 (UTC)
Blue Rasberry (talk) 18:02, 11 August 2014 (UTC)
Micru (talk) 20:11, 12 August 2014 (UTC)
JakobVoss (talk) 12:23, 20 August 2014 (UTC)
Finn Årup Nielsen (fnielsen) (talk) 02:06, 23 August 2014 (UTC)
Jodi.a.schneider (talk) 09:24, 25 August 2014 (UTC)
Abecker (talk) 23:35, 5 September 2014 (UTC)
Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:21, 24 October 2014 (UTC)
Mike Linksvayer (talk) 23:26, 18 October 2014 (UTC)
Kopiersperre (talk) 20:33, 20 October 2014 (UTC)
Jonathan Dugan (talk) 21:03, 20 October 2014 (UTC)
Hfordsa (talk) 19:26, 5 November 2014 (UTC)
Vladimir Alexiev (talk) 15:09, 23 January 2015 (UTC)
Runner1928 (talk) 03:25, 6 May 2015 (UTC)
Pete F (talk)
econterms (talk) 13:51, 19 August 2015 (UTC)
Sj (talk)
author  TomT0m / talk page
guillom (talk) 21:57, 4 January 2016 (UTC)
·addshore· talk to me! 17:43, 18 January 2016 (UTC)
Bodhisattwa (talk) 16:08, 29 January 2016 (UTC)
Ainali (talk) 16:51, 29 January 2016 (UTC)
Shani Evenstein (talk) 21:29, 5 July 2018 (UTC)
Skim (talk) 07:17, 6 November 2018 (UTC)
PKM (talk) 23:19, 19 November 2018 (UTC)
Ocaasi (talk) 22:19, 29 November 2018 (UTC)
Notified participants of WikiProject Source MetaData. − Pintoch (talk) 10:48, 16 November 2018 (UTC)

I do not see any problem here. I would like to see as many articles in Wikidata. From a WikiCite perspective, one major concern of Wikidata is the lack of comprehensiveness: That it does not contain every paper and book. Scholars that visit Scholia would be disappointed when they see that only 10% of their publication are there. If we want to create automated bibliographies on Wikipedia from Wikidata information, then the data should be there. I experience a minor issue when using Magnus Manske's sourcemd: it assumes that every DOI is a scholarly article. That is not the case. In particular, I see errors in connection with Springer's book series, where the individual books, primarily conference proceedings, are miscategorized as scholarly articles instead of editions. That is, however, something I can live with. Perhaps we should focus on building a tool that will handle the Springer books and chapters. -- Finn Årup Nielsen (fnielsen) (talk) 11:27, 16 November 2018 (UTC)
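
The misclassification described above could in principle be reduced by checking the record type before choosing a class. A minimal sketch, assuming Crossref-style records that carry a `type` field with values such as `journal-article` or `proceedings`; the mapping table, the class labels, and the function name are hypothetical illustrations, not how sourcemd actually works:

```python
from typing import Optional

# Hypothetical mapping from Crossref-style "type" values to the kind of
# Wikidata item that might be created. The choices are illustrative only.
CROSSREF_TYPE_TO_CLASS = {
    "journal-article": "scholarly article",
    "proceedings-article": "scholarly article",
    "book-chapter": "chapter",
    "book": "edition",
    "monograph": "edition",
    "edited-book": "edition",
    "proceedings": "edition",
}


def classify_doi_record(record: dict) -> Optional[str]:
    """Return a class label for the record, or None when the type is
    missing or unknown and the record should be reviewed by hand."""
    return CROSSREF_TYPE_TO_CLASS.get(record.get("type"))
```

With a table like this, a conference proceedings volume would come out as an edition, and anything with an unrecognised type would be held back for manual review instead of defaulting to "scholarly article".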
@Fnielsen: "I would like to see as many articles in Wikidata." Well, that is a problem I think, because at the moment Wikidata really cannot handle importing these millions of DOIs. This is putting a significant strain on the servers and degrading the service. As far as I can tell there is no consensus for these batch imports, so they should stop. "one major concern of Wikidata is the lack of comprehensiveness: That it does not contain every paper and book. Scholars that visit Scholia would be disappointed when they see that only 10% of their publication are there." That is not a major concern of Wikidata, it is a major concern of Wikicite. So I see two solutions:
  • Either we import the entire Crossref database (with the appropriate filters to get rid of pathological cases like the ones above) - but good luck with convincing the community and WMDE that this is something Wikidata can be used for - with the current hardware resources this seems unmanageable to me;
  • Or ad-hoc batch imports should stop and Scholia should not advertise Wikidata as a place where you can expect to find all your publications.
In any case it really does not make sense to keep adding batches of publications without a well defined scope, as far as I can tell.
Pintoch (talk) 11:46, 16 November 2018 (UTC)
@Pintoch: Thanks for bringing up these issues. I hope that you raise more issues and encourage others to raise more issues. There are lots of tough questions here which do not have easy answers. If you want direct answers then please ask shorter, single questions in their own sections.
This project, "WikiProject Source Metadata", has participants who upload lots of citations and also write the model for citations. There is a community which might be larger and more organized at meta:WikiCite which is actually seeking to address the challenge of sorting all the academic publications. Although there is overlap in the communities, the composition of the membership of these groups, their goals, and their editing strategies are different. Consider talking to both. The WikiCite community on meta has been organized enough to present 3 conferences, so seems to have some ability beyond this community discussion board.
You asked why those sources are in Wikidata, despite being short passages, single book chapters, or other odd publications. Those passages have an identifier like a DOI. They are also part of a strategic subset that someone selected.
You asked about strain on the servers and service. The WikiCite community is treating this with urgency. Comment at Wikidata:WikiCite/Roadmap for how to fix this. Blue Rasberry (talk) 15:21, 16 November 2018 (UTC)
To me the problem is not the import of large numbers of bibliographic records but the automatic import of large numbers of bibliographic records without intellectual quality control. Every time I selected a "strategic subset" and imported the data, I had to manually go through the list of imported items and correct ugly artifacts such as those noted above. -- JakobVoss (talk)
@Bluerasberry: "They are also part of a strategic subset that someone selected." Is there a page somewhere that lists these strategic subsets, and shows that more than one person finds them strategic? Do people file requests for bot tasks where these scopes are discussed? As far as I can tell, people just import their own publications, those of their friends and colleagues, or those of anyone who uses the #icanhazwikidata hashtag on Twitter… Is that the strategy?
I am quite active both in the Wikidata and Wikicite communities, and I am aware of the roadmap. It is great that this discussion is taking place. Sadly, I personally do not have anything to propose to scale Wikicite at the moment. What I find problematic is that while this discussion is taking place, random publication imports keep happening. There are many other ways to contribute to Wikicite: import journals, institutions, publishers, conferences, open access policies, notable researchers… why don't we focus on that instead while we figure out a solution for publications? Let's try to find more Donna Stricklands instead of importing our own publications, distinctions and awards! "The WikiCite community on meta has been organized enough to present 3 conferences, so seems to have some ability beyond this community discussion board." The discussion can happen elsewhere, but it must happen on Wikidata too as long as Wikidata is used to host the data. The fact that conferences are organized does not waive anything, I think. − Pintoch (talk) 18:31, 16 November 2018 (UTC)
@Bluerasberry: I can't find any discussions at meta:WikiCite about what should or should not be imported, or how. Are these discussions happening on the email discussion list? All I found was the statement "WikiProject Source MetaData is the place on Wikidata where coordination of these efforts happens." So this is where I posted. Lest anyone be confused, I am 100% supportive of the WikiCite effort. I just want quality entries that can be used as references and linked to their subjects, authors, and the books/journals in which they appear. If there are best practices for modeling scholarly articles, I'd like to see them and participate in improving them. I'll take your advice and post separate questions about specific cases. - PKM (talk) 22:04, 16 November 2018 (UTC)
Let me emphasize too that I love this project and I would be very, very happy if we could import the entire Crossref. But we need a viable plan for that and in the meantime we should not try to import everything we can get away with while the admins aren't looking. − Pintoch (talk) 09:03, 17 November 2018 (UTC)
"This is putting a significant strain on the servers and degrading the service." - Really? Do you have a citation for that? Or a statement from the dev team? Last I heard, they were quite sanguine about the volume of content being added. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:11, 20 November 2018 (UTC)
@Pigsonthewing: Well, I am only a user so I only have a partial view on this, but here are the few clues I have:
  • The main resource that needs to be shared is the editing throughput (number of edits made each minute), and Wikicite already takes up a fair share of that. We have seen various incidents recently with the dispatch lag going up, the query service going out of sync, the Wikidata API returning errors and other issues like that.
  • WMDE has had to hack custom code into the search profile to penalize scholarly articles in search results, so that they don't crowd the results for users who look for other items. That is not a good sign.
  • My understanding is that the size of Wikicite in the query service could also potentially slow down unrelated queries (which would therefore time out more frequently), but I don't know Blazegraph enough to be sure of that. Dario (WMF) wrote at Wikidata:WikiCite/Roadmap#Growing_pains that the rapid ingestion of content is taking a toll on the querying infrastructure, causing frequent timeouts.
If I am just making up these concerns then I would be very happy to be told so. In that case I will gladly file a bot request to import all journal articles from Crossref. − Pintoch (talk) 09:58, 21 November 2018 (UTC)
  • Completely agree with @Pintoch:. Dumping DOIs indiscriminately and inciting unnotable authors (like myself) to self-aggrandizement through #icanhazwikidata does not increase the value of Wikidata. It just increases the demoability of Scholia on isolated cases. But it creates usability problems and in time will lead to stricter Notability enforcement, imho.
    • don't assume that everything's roses at CrossRef, just look at their metadata completeness reports. I have as part of Tracking of Research Results (Q56259739), and their resolved authors (orcid) and affiliations (grid) are in single digit percentages. So if we dump all journal articles from CrossRef (50M or so), who's going to clean them and resolve them?
    • Articles should not be dumped if the authors are not resolved. Magnus clearly states in the "author name string" proposal (https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/39) that it's a stopgap measure, only to be used for important/reference articles. Don't use it as an excuse to dump obituaries and other junk (after the important physician from the obit is created, then the obit could be created as a reference item, not before! Or just use the DOI link as a reference).

Notified participants of WikiProject Source MetaData --Vladimir Alexiev (talk) 23:53, 3 December 2018 (UTC)
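
The "don't dump articles whose authors are not resolved" rule above could be expressed as a pre-import check. A minimal sketch, assuming Crossref-style records whose `author` list entries carry an `ORCID` key when the author is resolved; the function names and the 0.5 threshold are hypothetical and would need community agreement:

```python
def resolved_author_fraction(record: dict) -> float:
    """Fraction of authors in the record that have an ORCID attached.

    Records without any author list count as fully unresolved (0.0).
    """
    authors = record.get("author", [])
    if not authors:
        return 0.0
    resolved = sum(1 for author in authors if author.get("ORCID"))
    return resolved / len(authors)


def passes_author_threshold(record: dict, minimum: float = 0.5) -> bool:
    """Hypothetical import gate: accept a record only if at least
    `minimum` of its authors are resolved."""
    return resolved_author_fraction(record) >= minimum
```

A gate like this does not clean anything by itself; it only keeps records with mostly unresolved authors out of a batch until someone has resolved them.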

About the demoability of Scholia - I don't even think importing our own publications makes for better demos. Who would demo Wikidata by first showing an item about themselves? Does Multichill demo the wikiproject Sum of all paintings by showing the Wikidata items about his own works of art? − Pintoch (talk) 00:19, 7 December 2018 (UTC)

Cleanup subpages

The list of subpages is an unusable mess. See Wikidata:Requests_for_deletions#Bulk_deletion_request_of_outdated_WikiCite_Listeria_pages and help clean up outdated content. -- JakobVoss (talk) 09:10, 16 November 2018 (UTC)

The pages have been deleted, but the current subpages still contain much material of unclear value. If pages were created in 2016 and have been dormant since, or the tasks listed there have been done, we had better archive and summarize their content. -- JakobVoss (talk) 06:17, 18 November 2018 (UTC)
+1. Coming into this project four years in, it's hard to tell from the many subpages what has actually been completed and what would be helpful to work on. - PKM (talk) 23:27, 19 November 2018 (UTC)

Author name strings

I'm new here and very interested in the project, looking forward to using the references database for Wikipedia. Just a technical question (hope this is the right place): Why did you decide to have a simple author name string instead of providing separate fields for surname and given name? When reusing the data (e.g., for Wikipedia), we really need to know which is which. This is an issue if the surname is composed of more than one word (e.g., "K. van Bibber" – any automatic tool would list this incorrectly as "Bibber, K. van" when it is in fact "van Bibber, K."). And this can lead to great confusion for Chinese names, where the order of given name and surname is the reverse of what we are used to in Western societies (except in American sources, which force them into the Western system). I fear that this issue can make the data quite useless as these cases are actually quite common; how are they handled? --Jens Lallensack (talk) 14:40, 16 November 2018 (UTC)
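
The failure mode described in the comment above is easy to reproduce. A minimal sketch of the kind of naive "last token is the surname" rule an automatic tool might apply; the function name is hypothetical and this is not taken from any existing citation tool:

```python
def naive_invert(name: str) -> str:
    """Invert a name by treating the last whitespace-separated token
    as the surname. This is the naive rule, shown to illustrate the
    problem, not a recommended approach."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(naive_invert("K. van Bibber"))  # Bibber, K. van
```

The output is "Bibber, K. van" where the correct form is "van Bibber, K.", because nothing in the plain string says where the surname starts.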

A similar idea came to my mind: for authors who have an item, the string that is used to credit the author should not be a plain statement but a qualifier of the « author » statement:
 author: (the author item)
     credited as : K. van Bibber
@Jens Lallensack: As you point out, the notion of name parts is very cultural. Therefore we should not base any information system on the assumption that any name can be split into a first and last name. See for instance DBLP: Some Lessons Learned, which explains why you should avoid parsing names into subfields in a bibliographic information system. − Pintoch (talk) 18:40, 16 November 2018 (UTC)
Thanks for the answers. That is rather disappointing though. Why not provide fields for first and last name optionally, in addition to the author name string? Thinking about it, the lack of this data is a big issue. The citation format "last name, first name" is the standard in both academia and at least the English Wikipedia (I am active in both). Also, in many cases you want to cite the initials of the first names only (in the English Wikipedia, we often go for the initials because we usually do not know the full first name of a number of authors, and we do not want to have a mixture of initials and full names). All this will not be possible with Wikidata if first names are not separated from last names. With this issue, I fear that source metadata from Wikidata will never be widely used by either academia or Wikipedia. --Jens Lallensack (talk) 22:36, 16 November 2018 (UTC)
@Jens Lallensack: family name (P734) and given name (P735) on the item for the author may give what you are looking for. Jheald (talk) 09:10, 17 November 2018 (UTC)
Thanks, that looks a bit better, but will not work in the many cases where authors published under different names: We always need to cite the name as it was presented in the source, per convention and for reasons of retrievability. If, for example, an author variously published works both with and without the middle initial, we have to cite the name exactly as it was presented in the respective source, even if we end up with several different variants of the name of a single person in the data. If the author changed his/her name (e.g., marriage), we, again, need to cite the name under which he/she published the respective source. Consequently, the "family name" given in the item of the author is not necessarily what we need to cite. For those reasons, we need to include the name within the item for the source itself, because this data is source-specific. I see no way around it; but please let me know if I am mistaken, I'm eager to learn. --Jens Lallensack (talk) 10:21, 17 November 2018 (UTC)
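
One way to keep the source-specific name, along the lines of the "credited as" qualifier suggested earlier in this thread, is to store the printed name on the author statement of the source item and fall back to the author item's label only when it is missing. A minimal sketch; the class, the field names, and the example names are all hypothetical and do not correspond to any existing Wikidata tool or property:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthorStatement:
    """One author statement on a source item (hypothetical model)."""
    author_label: str                   # current label of the author item
    credited_as: Optional[str] = None   # name exactly as printed in this source


def cited_name(statement: AuthorStatement) -> str:
    """Prefer the name as printed in the source; otherwise use the
    label of the author item."""
    return statement.credited_as or statement.author_label


# The same (made-up) person credited differently in two sources.
early_paper = AuthorStatement(author_label="Jane A. Smith", credited_as="J. Doe")
later_paper = AuthorStatement(author_label="Jane A. Smith")
print(cited_name(early_paper))  # J. Doe
print(cited_name(later_paper))  # Jane A. Smith
```

This keeps one item per person while still letting each source record the name under which the work was published.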
One question: I would like to read (and possibly join) the discussions regarding the design of the data models of the different publication types, however I was unable to find them. Does anybody know where these discussions are taking place or whom to talk to? Thanks, --Jens Lallensack (talk) 09:16, 18 November 2018 (UTC)
@Jens Lallensack: these discussions mostly happen through debates around the creation of new properties. On each property you should be able to find a link to the page where it was discussed, indicated with property proposal discussion (P3254). For instance, for author name string (P2093), the discussion is at Wikidata:Property_proposal/Archive/39#P2093. − Pintoch (talk) 09:22, 18 November 2018 (UTC)

Short death notices

Robert (“Bob”) Tennant. (Q55527198) and Margaret Wilson (née Fyfe). (Q46255903) are examples of two-line death notices of medical practitioners who do not meet our notability criteria. Should these items be flagged for deletion (and more importantly, if they are deleted, will some other process likely add them back in because their PubMed IDs are "missing" from Wikidata?) Is there value to including these items? - PKM (talk) 22:11, 16 November 2018 (UTC)

@PKM: I think the best place to start is "measure the value" rather than "is there a value". I hope that to start you will grant a value greater than 0, because we have a person, an obituary in a reliable source, and multiple facts about the person.
There are 3 notability criteria at Wikidata:Notability. #1 is linking to other wiki articles, like a Wikipedia article. This person does not have a Wikipedia article, so that is a fail. The other two criteria, "clearly identifiable" and "fulfills structural need" seem like passes to me. Fewer than 1% of physicians get an obituary in a medical journal. To me that makes these two seem like plausible candidates for being high importance in their field. These Wikidata entries have some value now.
To get maximal value out of this in the long term we need Wikidata items for both the subject of the obituary and the obituary itself. The item for the person is much more valuable if we can fill out properties including year of birth and death, place of residence, occupation, and institutional affiliates for education and work. Since they were the subjects of obituaries, maybe they accomplished something significant in their lives, and maybe not, but at minimum with just the content in the obituary we get useful insights.
For example, if anyone queried the count of physicians who were prestigious enough to enter the Wikidata media record and who were practicing in the 1950s, then we can get insights into the ratios of how many globally at that time were female, what ethnicities get recognition, what fields of medicine were most represented, what locations put their physicians of that era into the media record, and what hospitals / schools have ties to historical personages. We can establish the permanent public global record of humanity here, and it probably is the case that some hospitals have records of physicians in decades past and some hospitals seemingly left no media footprint.
A near future plan for Wikidata is to query for a university or hospital and profile them to exhaustion for whatever everyone associated with them did, the demographic breakdown of whoever got media recognition representing them, and the demographic breakdown for whoever benefited from their output.
How would you measure the cost versus benefit of this? How would you feel about collecting every obituary in every academic journal? Blue Rasberry (talk) 15:24, 18 November 2018 (UTC)
@Bluerasberry:. Okay, you've convinced me that these items have potential value. As you say, unlocking that potential value is dependent on someone or some process teasing the biographical data out of these notices to create items for the physicians. - PKM (talk) 21:08, 18 November 2018 (UTC)
@Bluerasberry: You have not convinced me, sorry. If someone is actually interested in tracking those people, they'll create them first, before creating their obits. They'll go to https://www.bmj.com/content/333/7557/48.4, pay 30 EUR, and parse sentences like "Former general practitioner Highland area (b 1928; q Aberdeen 1951; DCH, DPH)" to create a person and some facts from his life. Now tell me do you truly believe this will happen within the next 20 years, and why does WD need such dead weight (no pun intended). It's easy to dump DOIs, it's much harder to do something useful with the data. --Vladimir Alexiev (talk) 00:06, 4 December 2018 (UTC)
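
For the single sentence quoted in the comment above, the parsing step might look like the following. A minimal sketch, assuming "b" means born and "q" means qualified, and assuming every notice follows exactly this pattern (real notices may well not); the function and field names are hypothetical:

```python
import re
from typing import Optional

# Pattern for one notice of the form:
#   <description> (b <year>; q <place> <year>; <degree>, <degree>, ...)
NOTICE = re.compile(
    r"^(?P<description>.*?)\s*\("
    r"b (?P<born>\d{4});\s*"
    r"q (?P<qualified_at>.+?) (?P<qualified_year>\d{4});\s*"
    r"(?P<degrees>[^)]*)\)"
)


def parse_notice(text: str) -> Optional[dict]:
    """Extract structured fields from a notice, or return None when the
    text does not follow the assumed pattern."""
    match = NOTICE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["born"] = int(fields["born"])
    fields["qualified_year"] = int(fields["qualified_year"])
    fields["degrees"] = [d.strip() for d in fields["degrees"].split(",") if d.strip()]
    return fields


notice = "Former general practitioner Highland area (b 1928; q Aberdeen 1951; DCH, DPH)"
print(parse_notice(notice))
```

This covers only the mechanical part. It says nothing about access to the full text, about notices that deviate from the pattern, or about matching the extracted facts to the right person item.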

A definition I had to learn about "intelligence" was "Intelligence is what the intelligence test measures." A scholarly article is what is published in scholarly publications. Now some of these publications have little merit and some have a lot of merit; I am not a scientist and I am not there to judge. When I work on them, I link articles to authors or authors to publications and generate both thanks to what ORCID knows as people's publications. In this way a fine web is woven. The problem described here is one where people assume that individual articles or authors are assessed. They are not. I just worked on a chemical award and for those awardees with an ORCID identifier I submitted a job to add publications and co-authors. Literally hundreds of edits are made as a consequence. They are known good thanks to ORCID. They do and could include publications Wikipedia could use to prove its points but they prevent one publication from being exclusively claimed when it is not. Thanks, GerardM (talk) 12:03, 9 December 2018 (UTC)

This is a Wiki and this Wiki will host all the citations of all Wikipedias

To all of you that talk about notability, one objective is to include all references of all Wikipedias. That is in itself a project that is going on with a database separate from Wikidata, and it is being ingested in phases. All the issues raised above are as appropriate for this subset but, it being a subset, you will not see the wood for the trees. Leaving it as a subset will not have us see all papers on a subject, it will make authors with retractions not show. We will be left with information like a stamp collection.

The process of cleaning up data is largely based on available information from ORCiD. We import many, many authors and their publications from there. This process is involved and it does link publications to authors directly but only for authors who have a public record. As a consequence the Scholia information becomes comprehensive and the authors gain their notability through their work. When you look at properly processed Scholia information, you get a lot of information including co-authors, subjects, where people published, date lines and citations.

There is no such information available to us elsewhere. It is important to have this.

What I always find funny when people complain about problems is their lack of Wiki perspective. This is a Wiki and it is allowed to be incomplete and not always correct. The point is that we acknowledge that Wikidata is a work in progress. We can all note that a lot of effort goes into the development of citation data and there is a vision of why it makes sense to have it. To put it bluntly, thanks to all this work, scientists who are open about their work gain notability, they will be more likely to be cited in Wikipedia and it will help us to find a neutral point of view as we will know the literature on a subject, any subject. Thanks, GerardM (talk) 06:11, 4 December 2018 (UTC)

Beyond "Thank you all" I do not understand your sketch. --Succu (talk) 22:48, 4 December 2018 (UTC)
@GerardM: Your comment is contradictory: how can you think that WPs will use WD data if you claim "it is allowed to be incomplete and not always correct"? WP won't use WD until WD can prove that its data are correct and well maintained, so instead of trying to import ever more data, better stop data import and curate existing data even if the dataset is quite small. And instead of speaking about WD's future, perhaps we can discuss the real use of WD data by Wikipedias: the status is the same among the main Wikipedias. WD data is considered unreliable, due to bad data imports and lack of data protection against vandalism, and as not fulfilling the local Wikipedias' rules, so no massive data from WD are currently used. Just some examples: all RfCs in WP:en finished with no agreement to use WD, infoboxes using WD are regularly replaced in WP:fr,... Snipre (talk) 12:48, 5 December 2018 (UTC)
You’re incorrect to assume data won’t ever be used by any major Wikipedia (as you did also recently on PC) and I don’t really understand where such strong statements are going. The problem with your approach is that you don’t explain where you will find the manpower to curate data if … Wikipedians are not involved. Few contribute directly to Wikidata, and free datasets won’t fall out of the sky. author  TomT0m / talk page 13:27, 5 December 2018 (UTC)
@TomT0m: Please explain to me where my reasoning is wrong when considering that WPs clearly mentioned the unreliability of WD as one of the major drawbacks to using it as a data source, while at the same time some WD contributors claim that having wrong data in WD is not a problem.
The claim of GerardM is typically an argument for the opponents of WD use in WP, so we just give reasons to those who are looking for WD weaknesses. As long as we adopt this kind of strategy, how do you want to convince WP to use WD?
Please read the results of the last RfC and explain why the major Lua infobox using WD data in WP:fr can be systematically replaced by infoboxes using local data and not the inverse?
If you want to have Wikipedians curating WD data, you need to hear what their demands are concerning data quality and act accordingly. Please just comment on the following sentence from the closing comment at the end of the RfC, "...if Wikipedia wants to use data from Wikidata, there needs to be clear assurances on the reliability of this data", vs. the claim of GerardM. Where am I wrong in saying that the position of GerardM is one of the major problems to solve if we really want to provide what WPs are requesting? Snipre (talk) 16:24, 5 December 2018 (UTC)
@Snipre: Your statements are far too strong, as for example frwiki uses Wikidata in infoboxes or for citing work items in bibliographies (fr:Modèle:Bibliographie has about half a thousand inclusions, which is not bad considering you have to find an item id for the work, and this is not user friendly). But my point is, why would data need to be curated if you allow only imports from reliable databases? If the alternative is either « Wikidata is perfect and Wikipedias use it » or « Wikidata is not perfect and Wikidata is not used », we are simply going nowhere. There is a middle point and we are already somewhere in between. But I don’t think your radical position helps to find it. author  TomT0m / talk page 16:40, 5 December 2018 (UTC)
@TomT0m: Please provide me with a link to an RfC or any other community discussion in WP:fr allowing the use of WD data in an unconditional way. Use of WD data is tolerated, not completely accepted. And your example is a very good one: can you provide me with the link to the discussion which decided to use WD data in infobox Modèle:Bibliographie? This discussion is mandatory according to the last RfC about the use of WD data. Your example is the correct description of WD use in WP: use which is restricted by some strong constraints, limited use, or use limited to special topics.
And please read again what I said: I never said we have to have perfect data, but we can't accept errors. We have to work correctly from the beginning and put the correct tools in place to avoid errors. This doesn't mean errors can't occur. A change of mindset has to happen: the time when everything could be imported is over, and data has to be checked in some way before import, tools to curate data have to be developed, and people have to integrate quality into their contributions. I saw nothing corresponding to that in the claim of GerardM; I read the inverse instead. Snipre (talk) 10:45, 6 December 2018 (UTC)
@Snipre: As far as I know, quality has not been that much of a concern in the RfCs of frwiki, and the restrictions would be the exact same ones if Wikidata had stricter policies. And I’m not talking about how things are supposed to be in theory but how they are in practice. (Plus, data quality is not a concern in using the « bibliographie » template; the concern is a usability one: if the data is not transcluded through a template, it implies the use of a QID in the plain code of the page, which is the main concern of the frwiki community.) author  TomT0m / talk page 11:08, 6 December 2018 (UTC)

Wikipedia is a Wiki and as such incomplete and not necessarily always correct. The notion that Wikidata will be useful only when it is complete and perfect is just an opinion. When Wikidata includes all citations of all Wikipedias, it will provide a substantially superior service, and not using it or not considering its use will be just silly. This is NOT about all the issues Wikipedians come up with; this is about sources and citations, consequently info boxes are not a consideration. Thanks, GerardM (talk) 14:15, 5 December 2018 (UTC)

@GerardM: Sorry, but did you spend some time in the other WPs? When someone can show that the WP quality is higher than the WD quality, how can you expect them to use WD data? WPs are working harder than us to provide better quality, and they can integrate data directly from reference sources using bots, so large WPs don't require WD.
If you want to sell a product, you have to be sure that your product corresponds to a demand. What is the main demand of WPs? Just read that summary to see what wikipedians are looking for. Snipre (talk) 16:24, 5 December 2018 (UTC)
So you want me to acknowledge the opinions some Wikipedians have about Wikidata.. Even though in this context it is not relevant.. Fine.
First, like Commons, Wikidata has a symbiotic relation with Wikipedia. Like Commons, Wikidata is not only about Wikipedia; for both, Wikipedia represents a subset of what they have to offer. When Wikipedians, through their lack of appreciation of what a Wiki is, reject Wikidata, they do not understand how Wikidata can help with their disambiguation. Particularly in lists there is an error rate of over 4%, and such issues could be found with tooling that has been suggested for years now.
Second, Wikipedias, all of them, are a subset of what Wikidata has to offer. A substantial number of red links in a Wikipedia includes a lot of information. It is not offered to Wikipedia readers, and imho Wikipedia does a disservice to the motto: "the sum of all knowledge". Arguably this still reflects Wikipedia, but what happens with "these papers and authors" is that they are imported from ORCiD; as a consequence, for authors and papers a "Scholia" is built up. I am building up Scholias particularly for scientists on Twitter and scientists in the news; my aim is to build awareness that the Scholia information, information that is free, reflects the merit of a scientist and the authors they cooperate with.
Third, bravo to the quality drives of Wikipedians. We grieve the cost we incur because it is at the expense of cooperation and collaboration. When I read what English Wikipedia has to say in what it calls the "2018 state of affairs" I find little connection to what I perceive as the potential of Wikidata and the blinders Wikipedians willingly wear.
For as long as the staff of Wikimedia consider Wikidata as secondary to Wikipedia (the same is true for Commons, by the way) and Wikipedians have this inflated sense of the importance of Wikipedia, I do not care to "sell" Wikidata as a product. For me Wikipedia, Commons and Wikidata are not products and I am not going to sell them as such. Thanks, GerardM (talk) 06:09, 6 December 2018 (UTC)
PS @Snipre: Where is their and your self reflection.. Do they/you not understand that this attitude is parasitic? Thanks, GerardM (talk) 07:10, 6 December 2018 (UTC)
@GerardM: Sorry, but do you read what you write? "Wikidata has a symbiotic relation with Wikipedia." A symbiosis means both participants agreed to work together. So please show me the agreement of WP:en to use WD as an open source for data.
You can say what you want about the mentioned RfC by considering it as the opinions of some Wikipedians, but RfCs are currently the common way to express the WP community's opinion, so please respect that process.
And if you don't understand that WPs were able to work for years without WD and that a majority of Wikipedians are ready to continue to work like that, as they don't see the advantages of WD, I think we can close the discussion. Snipre (talk) 10:57, 6 December 2018 (UTC)
<grin> Do you know what service Wikidata provides to Wikipedia? </grin> Apparently not. From the start of Wikidata, all interwiki links have been organised in Wikidata; it provides a superior service to Wikipedia. As to my appreciation, tell me WHY I am wrong, not that I have to respect something I keep away from for good reasons. No, symbiosis does not mean agreement, it is how things effectively are. Thanks, GerardM (talk) 11:24, 6 December 2018 (UTC)