Wikidata:Requests for comment/Semi-automatic Addition of References to Wikidata Statements
An editor has requested the community to provide input on "Semi-automatic Addition of References to Wikidata Statements" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- go to your Gadgets;
- tick the Primary Sources item (penultimate one, Wikidata-centric section) and press the save button at the bottom of the page;
- on the left sidebar, click on the gear icon, next to the Random Primary Sources item link;
- select a dataset, such as strephit-confident, strephit-supervised, strephit-rule-based, or freebase;
- you can test the tool in 3 ways:
  - the Random item sidebar link shows an Item to be curated;
  - the Primary Sources list sidebar link (Tools section) allows filtering by dataset and property;
  - an Item of your choice.
Please note that:
- known technical problems are filed at the code repository: https://github.com/Wikidata/primarysources/issues
- known feature requests are open at: https://github.com/Wikidata/primarysources/issues?q=is%3Aopen+is%3Aissue+label%3A%22open+task%22
- previous discussions were collected at: Wikidata_talk:Primary_sources_tool
- more reactions are posted at: m:Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal
Thanks! --Hjfocs (talk) 12:53, 26 May 2016 (UTC)
Contents
- 1 User interface feature requests
- 1.1 Filter for Items
- 1.2 Background color of suggested claims
- 1.3 Highlight text snippet used to extract the claim
- 1.4 Edit the claim before approval
- 1.5 Show suggested claims at the top of the page
- 1.6 Don't reload the page when approving/disapproving a claim
- 1.7 Live scanning of new sources
- 1.8 Creating of items for parents and children
- 1.9 Updating reference for time instead of creating a new claim when a reference URL is added
- 2 A whitelist for sources
- 3 Issues
- 4 StrepHit sources
User interface feature requests
Filter for Items
My suggestion for the primary sources tool would be to add a feature that shows users only items they are interested in. It should be possible to somehow filter for these items. The solution could be to choose only items that include a specific claim. For example, if someone is interested in items related to chemistry, he should be able to select only the items that contain the property instance of (P31) and have chemical compound (Q11173) as a value. Till Sauerwein (talk) 08:56, 26 May 2016 (UTC)
- probably should be instance of (P31) OR subclass of (P279) , and allow for additional subclass of (P279) between the instance and the class (item) of interest. ArthurPSmith (talk) 15:03, 26 May 2016 (UTC)
- Yes, subclass of (P279), instance of (P31), language of work or name (P407), original language of film or TV show (P364), etc. CC0 (talk) 20:18, 9 November 2016 (UTC)
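The filter discussed above could be expressed as a query. What follows is only a minimal sketch, not how the Primary Sources tool selects items: it assumes the `wd:` and `wdt:` prefixes predefined by the Wikidata Query Service, and the function name `build_item_filter_query` and its `limit` parameter are hypothetical.

```python
def build_item_filter_query(class_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for items related to a class of interest.

    An item matches if it is an instance of (P31) OR a subclass of (P279)
    the class, with any number of additional subclass of (P279) steps in
    between, as suggested in the discussion.
    """
    # Reject anything that does not look like an item ID such as "Q11173".
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a valid item ID: {class_qid!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    # Property path: one P31 or P279 step, then zero or more P279 steps.
    return (
        "SELECT ?item WHERE { "
        f"?item (wdt:P31|wdt:P279)/wdt:P279* wd:{class_qid} . "
        "} "
        f"LIMIT {limit}"
    )


# Example: items related to chemical compound (Q11173).
query = build_item_filter_query("Q11173", limit=50)
print(query)
```

The same path could be extended with further properties (for example P407 or P364) if the tool were to offer them as filter criteria.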
Background color of suggested claims
A minor change would be to change the color of the claims to be curated. At the moment, they appear in a slightly darker blue compared to the other claims. Maybe a more noticeable color like yellow or orange would help to highlight the statement to be curated. Till Sauerwein (talk) 08:56, 26 May 2016 (UTC)
Highlight text snippet used to extract the claim
Could you include an overview of where the NLP pipeline found the supporting text in the document? This snippet could be highlighted and shown to the user. Perhaps, for example, provide annotations on hypothesis so the user can browse the reference page and check it for themselves. Bomarrow1 (talk) 08:54, 26 May 2016 (UTC)
Edit the claim before approval
Add a feature where you can make minor changes to the claim. For example, if the claim has only minor inconsistencies, the user should be able to keep the claim and only change the value a little bit. Right now there are only two possibilities: reject or accept the claim. Till Sauerwein (talk) 09:15, 26 May 2016 (UTC)
- It would be better to just approve/reject as necessary and make the edits. With changes it is NOT from the tool. Thanks, GerardM (talk) 06:32, 1 December 2016 (UTC)
Show suggested claims at the top of the page
It would be more comfortable if all claims that are suggested for curation show up at the top of the item page. Till Sauerwein (talk) 09:15, 26 May 2016 (UTC)
Don't reload the page when approving/disapproving a claim
Reloading the page makes using the tool unfun because the user has to wait for the reload and, more importantly, loses context.
Quite often I would approve a claim but I don't bother because I don't want to go through a page reload. ChristianKl (talk) 10:40, 5 October 2016 (UTC)
- The interface seems to support submitting without reloading but tries to reload anyway, then the "primary sources" tool loads again. CC0 (talk) 20:21, 9 November 2016 (UTC)
Live scanning of new sources
It would be great if StrepHit would scan all new reference URLs that are added in Wikidata and within minutes add all claims it can find in the reference URL to Wikidata. It might be worth doing the same with en.Wiki sources. ChristianKl (talk) 10:10, 14 October 2016 (UTC)
Creating of items for parents and children
When a source provides both the name and a birthday or the year of birth and the year of death, I think it would be great if the Primary Sources tool would allow one-click creation of the item for the person and automatically link it correctly. ChristianKl (talk) 10:10, 14 October 2016 (UTC)
Updating reference for time instead of creating a new claim when a reference URL is added
Sometimes, the tool creates duplicate claims for time point data like date of birth, when a reference is approved. It'd be more fun to add sources if they don't have to be manually copied again. CC0 (talk) 20:31, 9 November 2016 (UTC)
A whitelist for sources
I love the reasoning behind the Primary Sources tool, and I have it turned on by default. However, I guess I'm a pretty strict and critical editor and I turn down most reference suggestions, as I don't find them reliable enough. And I did add quite a few domains to the blacklist. But to do some reverse thinking: I would be totally willing to work on a whitelist of reliable websites that can serve as reliable sources for suggested claims and references. In my field of expertise (art and culture) I'd be able to produce these quite quickly, actually. I can also imagine that certain websites would be most useful for referencing specific types of claims (e.g. RKDartists would be a great source to feed references to birth dates and death dates, birth and death places of artists), so perhaps source whitelists per property could be an idea. Spinster 💬 15:27, 13 June 2016 (UTC)
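To make the per-property whitelist idea above more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `SOURCE_WHITELIST` mapping, the `is_whitelisted` helper, and the use of the domain `rkd.nl` for RKDartists are not taken from the tool or from any agreed list.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: property ID -> domains trusted for that property.
# P569 = date of birth, P570 = date of death,
# P19 = place of birth, P20 = place of death.
# The domain below is an assumed example, not an agreed community list.
SOURCE_WHITELIST = {
    "P569": {"rkd.nl"},
    "P570": {"rkd.nl"},
    "P19": {"rkd.nl"},
    "P20": {"rkd.nl"},
}


def is_whitelisted(property_id: str, reference_url: str) -> bool:
    """Return True if the URL's host is trusted for the given property."""
    host = (urlparse(reference_url).hostname or "").lower()
    allowed = SOURCE_WHITELIST.get(property_id, set())
    # Accept the domain itself and its subdomains, but not look-alikes
    # such as "evilrkd.nl".
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_whitelisted("P569", "https://rkd.nl/explore/artists/1"))
print(is_whitelisted("P31", "https://rkd.nl/explore/artists/1"))
```

A whitelist keyed by property, rather than one global list, lets a source be trusted for birth and death data without being trusted for everything else.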
Issues
I've been using Primary Sources for a while now, and I would like to outline some issues which I feel would make me use it more and make it more productive.
Dataset
- There are a number of exact duplicate statements displayed. There's no point in keeping exact duplicates in the dataset or displaying them.
- There are a number of references referring to pages that are now dead, to unreachable servers, or to pages that redirect to the main page (sites that do that without issuing 404 should be launched on a rocket into the Sun but that's a different issue :). These refs should be removed.
- A lot of unreferenced statements are very hard to evaluate. Maybe StrepHit could help in that?
- I think some of the people data are confusing people with the same name - e.g. Louis Eugene King (Q6687111) and Louis King (Q11322) - the latter is a movie actor and director, but the former is not, but Primary Sources thinks he is. Maybe it's a problem with the original data, but maybe an issue was introduced along the way. I wonder if there's an easy way to move the data to the right person.
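For the exact-duplicate point in the list above, a minimal sketch of how such entries could be dropped before display. The `Suggestion` record and its fields are hypothetical and do not reflect the actual Primary Sources dataset format; the sample values are made up.

```python
from typing import Iterable, List, NamedTuple


class Suggestion(NamedTuple):
    """Hypothetical shape of one suggested statement."""
    item: str
    prop: str
    value: str
    reference_url: str


def drop_exact_duplicates(suggestions: Iterable[Suggestion]) -> List[Suggestion]:
    """Keep the first occurrence of each suggestion, preserving order."""
    seen = set()
    unique = []
    for s in suggestions:
        # Two suggestions are exact duplicates only if every field matches.
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique


# Made-up sample data: the first two entries are exact duplicates.
sample = [
    Suggestion("Q1", "P569", "1900-01-01", "https://example.org/a"),
    Suggestion("Q1", "P569", "1900-01-01", "https://example.org/a"),
    Suggestion("Q1", "P569", "1900-01-01", "https://example.org/b"),
]
print(len(drop_exact_duplicates(sample)))
```

Entries that differ only in their reference are deliberately kept here, since they are not exact duplicates.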
UI
- The fact that each click causes a full page reload makes working with it excruciatingly slow. If I have 20 statements on a big page, that is 20 big page reloads.
- Even more frustrating, sometimes after clicking on approve the reload does not return to the same place on the page, so I have to scroll down to locate where I was.
- There are a lot of duplicate data items that differ only in references or qualifiers. There should be a way to join them without having to transfer references/qualifiers manually.
- Some claims are an improvement on an existing claim - e.g. a more precise date. Right now I need to do several operations to replace one with another. It'd be nice if I could just replace them with one click. That's especially true for properties that usually tend to have only one value - such as date of birth, gender, publication date, etc.
- Sometimes the claim is right but needs to be amended (e.g. it claims a year, but I actually have a source that has the date too). Now, I have to either create a new statement and then reject the PS claim, or accept the PS claim and then edit it. It'd be nice if I could just edit the PS claim in-place.
I will add more if I remember it :) Laboramus (talk) 20:26, 15 June 2016 (UTC)
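The request in the UI list above to join claims that differ only in their references could look roughly like this. It is a minimal sketch under stated assumptions: claims are plain dictionaries with hypothetical keys `item`, `prop`, `value`, and `references`, qualifiers are ignored, and nothing here reflects how the tool stores its data.

```python
from typing import Dict, List


def merge_references(claims: List[dict]) -> List[dict]:
    """Join claims with the same item, property, and value into one claim
    that carries the union of their references (order preserved)."""
    merged: Dict[tuple, dict] = {}
    for claim in claims:
        key = (claim["item"], claim["prop"], claim["value"])
        # Create the merged entry on first sight of this (item, prop, value).
        entry = merged.setdefault(
            key,
            {"item": claim["item"], "prop": claim["prop"],
             "value": claim["value"], "references": []},
        )
        for ref in claim["references"]:
            # Skip references that were already collected.
            if ref not in entry["references"]:
                entry["references"].append(ref)
    return list(merged.values())


# Made-up sample data: two claims with the same value, different references.
claims = [
    {"item": "Q1", "prop": "P569", "value": "1900",
     "references": ["https://example.org/a"]},
    {"item": "Q1", "prop": "P569", "value": "1900",
     "references": ["https://example.org/b", "https://example.org/a"]},
    {"item": "Q1", "prop": "P570", "value": "1950", "references": []},
]
for c in merge_references(claims):
    print(c["prop"], c["references"])
```

Replacing a claim with a more precise one (for example a year with a full date) would need an extra rule for comparing values, which this sketch does not attempt.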
StrepHit sources
The list of candidate reliable sources was first stated in m:Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline#Biographies, discussed in m:Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline, and implemented in [1].
Genealogics
@Tfmorris1: in m:Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline and [2] you claim that genealogics.org should be discarded. Could you please expand on this? That source was selected for 2 reasons:
- it is included in the mix'n'match catalog list;
- Identifiers have their specific Wikidata property, i.e., Property:P1819.
Thanks, Hjfocs (talk) 09:15, 16 June 2016 (UTC)
- Not sure why I need to repeat myself, but quoting the page you referenced (which is the one containing the actual entry, thus my comment there): "Genealogics is a secondary source (at best) and none of the aggregated person data should be used if the ban on crowdsourced sites is to be honored." There are lots of properties for identifiers for data sources that Wikidata doesn't consider a valid import source: Freebase, IMDB, etc, etc. Tfmorris1 (talk) 21:04, 16 June 2016 (UTC)
- Was any action ever taken on this? It's hard to track the status when the discussion is spread over three different locations. Tfmorris1 (talk) 13:34, 26 July 2016 (UTC)
- @Tfmorris1: Was a specific decision to ban crowdsourced data ever made? If so, could you link to it? ChristianKl (talk) 09:50, 22 August 2016 (UTC)
- @ChristianKl: I can't cite a specific page off the top of my head, but believe it's implicit in the ban on Freebase, IMDB -- heck, even Wikipedia, as sources. None of them are considered "reliable" enough to be cited. Perhaps someone else who's more familiar with the rules can cite chapter and verse. Tfmorris1 (talk) 05:39, 20 September 2016 (UTC)
- De facto we have a lot of Wikipedia citations and even explicit tools to make citing Wikipedia easier. If multiple sources are available it's better to use more authoritative sources but if the only source that's available is in that category I don't think we have made any policy decision against citing it. In the case of Wikipedia we even made policy decisions by having tools to make it easier to have those citations. ChristianKl (talk) 09:59, 20 September 2016 (UTC)
https://familysearch.org provides really great genealogy information because it's about basic documents like birth certificates. ChristianKl (talk) 21:40, 2 October 2016 (UTC)