User talk:Billinghurst/Archives/2019

From Wikidata
This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Re

Replying to you at ProjectChat. —Zhxy 519 (talk) 15:29, 12 January 2019 (UTC)

digital document

Hey billinghurst, a while ago you proposed creating a Wikisource-specific badge called digital document (Q28064618), see phab:T153186 and this discussion. I am having difficulty finding enough documentation for this badge type (in English Wikisource or Wikidata). Can you please roughly explain how this badge should be used, and how I can look up at Wikisource whether a given sitelink qualifies for this badge? Thanks, MisterSynergy (talk) 10:39, 14 October 2019 (UTC)

@MisterSynergy: Typically, works at WSes are added from scans of published works, which contributors have initially proofread and, hopefully, subsequently validated; this is because the works are predominantly from before 1923. That clearly does not relate to modern documents that are published in electronic format, where producing a "scan" to reproduce a work is somewhere between redundant and ridiculous. These types of documents would be considered digital documents. At the WSes such documents will have a reference to their electronic source.  — billinghurst sDrewth 10:44, 14 October 2019 (UTC)
Thanks, makes sense. As I have zero experience with Wikisource, where can I look up the status of a document? —MisterSynergy (talk) 19:20, 14 October 2019 (UTC)
@MisterSynergy: It is a deduction from examining the source/citation. For a contribution there should be a source on the talk page within s:template:textinfo, though many just plonk it somewhere on the page, hopefully in the "notes" section. If the work is from scans and transcluded (most of our old works), then look for a /‾source‾\ tab and the page numbering down the side, e.g. s:Apollo 11 Goodwill Messages. Modern electronic documents would be things like press releases: whilst we could dump one into a document and upload it, that seems pretty pointless when a copy and paste is equally accurate. Some do prefer the solid source. My guidance is that if it didn't come with printed page numbers, then it probably is electronic.  — billinghurst sDrewth 20:48, 14 October 2019 (UTC)
Okay, I start to understand how this works (I guess, …). The Index pages (accessible via "Source" tab) seem to be categorizing themselves based on the value given in "Progress" field into one of these categories. I now speculate that:
There does not seem to be an explicit category for works that should have a problematic (Q20748094) badge (though I assume that their Index pages appear in one of the other subcategories of en:s:Category:Index), and there is also no explicit category for works that should have a digital document (Q28064618) badge. In particular, the latter status is not formally encoded anywhere in (English) Wikisource, correct? —MisterSynergy (talk) 08:24, 15 October 2019 (UTC)
"Category:Index" relates to the Index: ns, which holds only scanned works and is there for proofreading/transcription. Digital documents do not require proofreading, so they arrive by copy and paste, by import, etc.

Yes to the three aligned categories, though it is possible for works to arrive without scans to be proofread; that is a nightmare to undertake, and no one takes that route any more. There is no digital document category. Note that Index: pages are backend, not display. An Index: can contain one work, less than one work, or many works. What is transcluded and itemised in WD can be different. Wikisources are not as homogeneous as WP. Also note that many Index: pages that are "unproofread" don't have anything transcluded, so no item will exist. Examples:

  • s:Index:Confiscation in Irish history.djvu sits in not proofread at the index category level, though many of the pages are proofread and none of the pages are transcluded.
  • s:Index:Thom's Irish who's who.djvu is some way through proofreading and similarly sits in not proofread. It has front pages transcluded and is WD-itemised; it has some ancillary ToC built, and some of the entries are transcluded (though these haven't kept up with proofreading), with each created entry individually WD-itemised. Proofreading and indexing continue.
Comment: there is no tool available to apply badges, either individually or by broader bot application, e.g. petscan. The WEF-framework tool is okay for most aspects of creating an item, though it lacks some elements and does not apply badges. We are creating editions, which are separate from the creative work items, which means you generally have to create two items for every work at a Wikisource. Creating items manually at WD is a PITA and takes way too many edits; at least the WEF tool gets it done in one edit (if the items exist). Soooooooo adding WS editions to WD can be a tedious exercise, which many will not do, and some of us will do the basics. We work hard on author pages at enWS and their pairs at WD, so that is mostly ducks in a line, except for problematic authors. Note that rather than streams of one-way discourse, if you are in IRC then we can have a two-way conversation. @MisterSynergy:  — billinghurst sDrewth 09:13, 15 October 2019 (UTC)

Thanks for your detailed comments, they help a lot for a Wikisource beginner like me.
The reason I am asking about the badges is that pywikibot has meanwhile gained badge support (since this summer), and technically one could also make direct API calls to automate badge management; the latter is clearly not advisable for large-scale batches. Yet with pywikibot it seems possible to automate badge management with a "Badge Bot", if the badge status can systematically be read from the connected wikis (template transclusions, categorization, etc.) or from their MediaWiki APIs. In any case, for a "Badge Bot" the Wikisource badges would require totally different treatment from the other six regular badges, and there will probably be quite some differences between the Wikisource projects, as you also indicated in your previous comment.
On the other hand, I have meanwhile come to doubt somewhat whether we should attach the Wikisource badges to sitelinks at all. My take is that Wikisource editors have sufficient local tools (categories, Index pages, templates, …) to have an overview of the page status, so Wikidata is clearly not the venue where they look for backlogs; thus, it is probably also not of much use that the Wikisource badges appear in the Query Service. The only real use case that I see would be interwikilinks that are decorated according to proofread status in other Wiktionary wikis… Not sure where I will be settling with my thoughts. —MisterSynergy (talk) 10:28, 15 October 2019 (UTC)
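For illustration only, a direct API call of the kind mentioned above could be prepared roughly as follows. This is a minimal sketch, not a recommended workflow: it only builds the parameters for the Wikibase `wbsetsitelink` action and sends nothing. The helper name `build_badge_request`, the page title "Example page" and the token value are hypothetical; Q4115189 is the Wikidata sandbox item and Q28064618 is the digital document badge discussed in this thread.

```python
from typing import Dict, List


def build_badge_request(item_id: str, site_id: str, page_title: str,
                        badge_ids: List[str], csrf_token: str) -> Dict[str, str]:
    """Build the POST parameters for a wbsetsitelink call (hypothetical helper).

    The badges given here replace whatever badges the sitelink currently has.
    """
    return {
        "action": "wbsetsitelink",
        "format": "json",
        "id": item_id,                   # item that owns the sitelink
        "linksite": site_id,             # global site id, e.g. "enwikisource"
        "linktitle": page_title,         # title of the linked page
        "badges": "|".join(badge_ids),   # multiple values are pipe-separated
        "token": csrf_token,             # CSRF token obtained from the API beforehand
    }


params = build_badge_request(
    item_id="Q4115189",            # Wikidata sandbox item, used as a placeholder
    site_id="enwikisource",
    page_title="Example page",     # hypothetical page title
    badge_ids=["Q28064618"],       # digital document
    csrf_token="PLACEHOLDER",      # not a real token
)
```

The resulting dictionary would be sent as a POST request to the Wikidata API endpoint by an authenticated client; for anything beyond a handful of edits, a framework such as pywikibot is the better route, as noted above.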

@MisterSynergy: Okay that helps. (waving hands and thinking out loud) Let us explore/progress this from the item end. You are correct that it will not be universal for WSes, though where wikis are utilising proofread page it will be pretty similar, especially where Index: ns is involved. [Prefaced statement: a Commons: scanned edition is a direct one to one relationship to a Wikisource Index: page]

Edition level
  • A WD-edition item can be one to one with an index (so one file: at Commons), and from there we can read the status directly from the Index: page status. [majority of situations]
  • A WD-edition item can be one to many Index: pages, e.g. volumes of works, or separate scans due to file size; where this happens I would expect multiple listings of Commons scans (and that would be multiple Index: pages from which to get a status).
  • Multiple WD-editions of works can relate to one Index:, and that would be root works, think here "Works of Charles Dickens", or some weird disjointed compilation, which could have multiple works in the one volume. So there is no correlation between Commons: file and item, though the Index: page status reflects the state of all the published works.

Note: Issues arise where the scan file is held directly at enWS rather than Commons, so there are zero scans listed in the WD item.

Entry level

The proofread status of a page is inhaled and displayed as a ribbon on the top left of a page, examples

  • s:Thom's Irish Who's Who/Cairnes, William Plunket is a section of p. 29 from the published work, and the status of the item is directly related to the status of p. 29, currently proofread. There is a hidden statement
    <table class="pr_quality noprint" title="0 validated pages, 1 only proofread page and 0 not proofread pages">
  • s:Thom's Irish Who's Who/Brabazon, Major-Gen. Sir John Palmer comprises components of pages 22 and 23, which today are both at proofread, so the quality statement is
    <table class="pr_quality noprint" title="0 validated pages, 2 only proofread pages and 0 not proofread pages">

So at the entry level the status can be determined from the lowest level with a non-zero count in the quality statement. Also note that the page and status can change quietly in the Page:/Index: namespaces, without an overt change in the main ns.
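The rule above can be sketched in a few lines of Python. This is an illustration only, assuming the `title` attribute always follows the wording shown in the two examples and that only these three levels appear in it; the function name `entry_status` is hypothetical.

```python
import re
from typing import Optional

# Quality levels named in the title text, ordered from lowest to highest.
LEVELS = ("not proofread", "only proofread", "validated")


def entry_status(title: str) -> Optional[str]:
    """Return the lowest quality level that has a non-zero page count.

    `title` is the text of the title attribute of the pr_quality table.
    Returns None when every count is zero or nothing could be parsed.
    """
    for level in LEVELS:
        # Matches e.g. "1 only proofread page" or "2 only proofread pages".
        match = re.search(r"(\d+) " + re.escape(level) + r" pages?", title)
        if match and int(match.group(1)) > 0:
            return level
    return None


cairnes = "0 validated pages, 1 only proofread page and 0 not proofread pages"
brabazon = "0 validated pages, 2 only proofread pages and 0 not proofread pages"
print(entry_status(cairnes))
print(entry_status(brabazon))
```

Both example entries resolve to "only proofread". An entry drawing on one validated page and one page that is not yet proofread would resolve to "not proofread", since the weakest page determines the status.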

Digital

I don't think that we are going to get "digital document" neatly within the scope without better work at the wikis. I am guessing that it will need manual categorisation, either directly on a page, or through the use of specified templates.

At this point, I would say that the bot can assist, and maybe categorise within WD:

  1. what is a scan-supported edition or subpage/entry from a scan at Commons
  2. what is a scan-supported edition or subpage/entry from a scan at a local WS
  3. what is a free-text document (unknown status or digital document)

Then we can work on a means to review and potentially tag those free-text documents. [Means to do so to be determined].  — billinghurst sDrewth 11:18, 15 October 2019 (UTC)
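As a rough sketch of that three-way split, the triage could look like the following. The `MainspacePage` record and its `scan_host` field are hypothetical stand-ins for whatever a bot would actually read from the wiki (for example, whether an Index: page backs the text and where its scan file is hosted); nothing here reflects an existing tool, and the second example title is made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceKind(Enum):
    SCAN_AT_COMMONS = 1    # scan-supported, file hosted at Commons
    SCAN_AT_LOCAL_WS = 2   # scan-supported, file hosted at the local Wikisource
    FREE_TEXT = 3          # no scan: unknown status or digital document


@dataclass
class MainspacePage:
    title: str
    # Hypothetical field: "commons", "local", or None when no scan backs the page.
    scan_host: Optional[str]


def classify(page: MainspacePage) -> SourceKind:
    """Sort a page into one of the three groups listed above."""
    if page.scan_host == "commons":
        return SourceKind.SCAN_AT_COMMONS
    if page.scan_host == "local":
        return SourceKind.SCAN_AT_LOCAL_WS
    # Everything else still needs review before any badge is applied.
    return SourceKind.FREE_TEXT


pages = [
    MainspacePage("Apollo 11 Goodwill Messages", "commons"),
    MainspacePage("Some press release", None),  # hypothetical title
]
for page in pages:
    print(page.title, classify(page).name)
```

Pages landing in `FREE_TEXT` are the ones that would then go through the review and tagging step, whose means is still to be determined.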