User:Jheald/VR

(JH non-VR focus reminders)

  • JH: Finish Stop of the Exchequer rewrite
  • JH: Pepys ID imports
  • Done: Suggest identifiers for WDQS queries -- [1] posted to wikidata mailing list
  • JH: Galloway Glens placenames ID property proposal [2] - Number of matches would be very few
  • JH: WikiTree: pages for remaining Brooke/Lawrence connections. Rowe tree. Ask for review.
  • JH: Ongoing TP de-duplication. Finish the WikiTree dupe sweep. Extract MPs, look for dupes. Baronets.
  • JH: KTop: Scotland artworks pilot. Make items. Upload files. qy: make sure Artwork template can handle unknown/stated as.

Phase 1 key activities

Prong 1: Map EMEW IDs to existing Wikidata items, or create new items when they do not exist.

  • A) propose and seek approval of a Wikidata property for EMEW ID
waiting on first-iteration EMEW ID landing page for some examples
will need items for EMEW, VR (TODO) done
Most EMEW IDs will relate to things from imported gazetteers --> the contents of these gazetteers are important datasets to match for Prong 2
May want a second ID for EMEW facts, so that we can easily reference eg coordinates in EMEW from a particular gazetteer to the relevant EMEW factoid.
It is possible we may not have wikidata items for all EMEW IDs (notability?) -- but we certainly should if any other online resource has an identifier for the same thing.

Prong 2: Create or improve Wikidata items for concepts significant to the Viae Regiae project

  • See below.

Prong 3: Investigate & prototype ways for EMEW to present information pulled from Wikidata

  • EMEW should have core ability to present info for a single EMEW id, and for a single "EMEW fact" ID.
  • Possible usages of WD info:
    • eg plot a list of EMEW ids on the VR map pulled via giving a URL for a query on WDQS. (lists also specifiable in a VR url ?)
    • eg infoboxes on the VR-wiki ?
    • eg routinely using information from wd to augment info presented by VR for a mapped place ?
INVESTIGATE sweet spot between pulling across data from wd on the fly vs keeping it locally at VR, perhaps updating it weekly vs full integration at VR
Best methods for plumbing in on-the-fly re-use? How best to do this should be well established by now. Need to find & ask the right people.
  • On the Wikidata side, need to quality-improve those external-id datasets. Are they well-matched? How complete are they - are there more matches that could be made? (esp for items likely to appear in EMEW) --> also relevant datasets for Prong 2.
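A minimal sketch of the "pull a list of places from a WDQS query" idea above, in Python. Assumptions: the EMEW ID property does not exist yet, so Vision of Britain place ID (P3616) stands in for it; the function names are illustrative only, not anything EMEW or VR actually provides. The code only builds the query URL and parses the standard SPARQL JSON result format; fetching the URL is left to whatever HTTP client the site already uses.

```python
import re
from typing import Dict, List
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(id_property: str, limit: int = 100) -> str:
    """SPARQL for items that have the given external ID and a coordinate (P625)."""
    return (
        "SELECT ?item ?itemLabel ?extId ?coord WHERE {\n"
        f"  ?item wdt:{id_property} ?extId ;\n"
        "        wdt:P625 ?coord .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(sparql: str) -> str:
    """URL that returns the query results as SPARQL JSON."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


# WDQS returns coordinates as WKT literals, longitude first: "Point(lon lat)".
_POINT = re.compile(r"Point\(([-\d.eE+]+) ([-\d.eE+]+)\)")


def parse_results(payload: Dict) -> List[Dict]:
    """Turn SPARQL JSON bindings into simple rows ready for plotting."""
    rows = []
    for binding in payload["results"]["bindings"]:
        match = _POINT.match(binding["coord"]["value"])
        if match is None:
            continue  # skip anything that is not a plain Earth point
        lon, lat = float(match.group(1)), float(match.group(2))
        rows.append({
            "qid": binding["item"]["value"].rsplit("/", 1)[-1],
            "label": binding.get("itemLabel", {}).get("value", ""),
            "ext_id": binding["extId"]["value"],
            "lat": lat,
            "lon": lon,
        })
    return rows
```

The same parsing would serve either approach under INVESTIGATE: calling the URL on the fly, or running it on a schedule and caching the rows locally at VR.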

Dataset: Maps, texts, sources used by VR

Maps and texts

AIM: What different editions and resources are there for the key maps and texts being worked on by VR? Useful to flesh out with as complete items as possible.

-- In particular what digital editions? Are there IIIF versions? For maps: can we translate our annotations between different digital base versions? What data is needed to enable this?

(see especially User:PKM/Notes/VR for more on these)

(can be useful to look at in Reasonator, as most many->one relationships are recorded only on the "many" side, not the "one" (eg: exemplar of (P1574), edition or translation of (P629)). QUERY: Plug-in that does something similar on a standard wikidata page? Even then, other than a curated page like PKM's it can be hard to get a sense of how the individual documents group together.)

STRETCH GOAL: Also useful to track secondary literature: bibliography of books, articles, etc about the creators and the works. Look for such items where we have them, and make sure they have main subject (P921) set, so we can easily find them.

(QUERY: For texts, what is useful for Wikisource? eg how to give WS cleaned-up OCR, how to include annotations in mark-up (or strip?) )

Specialist gazetteers and thesauruses

  • Data-sources being used by VR (eg specialist gazetteers, thesauruses): we should have an item for each of them. Focus list?

Referenced documents

Wikidata pages for archival documents that are being mined for information, eg

  • Inquisitions post-mortem
inquisition post mortem (Q6036904) is a completely blank item at the moment
We should probably create items for each individual Inq p.m. & try to relate them to individuals and manors
  • Quarter sessions
Quarter Sessions (Q7269253). It looks like some county-level items exist for Wales, see Reasonator [3] under "from related items".
Also Middlesex Quarter Sessions (Q16997751), England, Kent, Quarter Sessions and Court Files - FamilySearch Historical Records (Q94425097), Great Britain. Court of Quarter Sessions of the Peace (Glamorgan) Papers relating to, NLW MS 5203E (Q56177665) exist, with very limited information
We should probably have an item for every court VR cites (and ideally every court that existed, with bibliographic data). archives at (P485) important.
Unclear if finer-level items would have value
  • Probate records
At the very least, Wikidata should have information about the different probate registries. UNCLEAR whether there should be items on WD at the level of individual probate records -- possibly not

Datasets: Geo-spatial

PROCESS:
1) try to match by 3rd party shared identifiers.
2) try to match based on nearness + type + name similarity (progressively widening the net), based on making list of the items (with particular characteristics) nearest each thing, then calculating name similarity
2A) also try to geo-match the other way, from likely items to things in the dataset, via making list of the things nearest each item.
3) perhaps Mix'n'match for remainder -- but people are apt to match similarly named things that are quite different
ISSUE: How to open up, to allow easier parallel participation? (? see if Andrew Lih has ideas -- cf how the Met project is participatory -- PAWS notebooks? shareable Google sheets?)
QUERY: Are any tools (OpenRefine, newer MnM versions) capable of anything like the above?
One thing MnM does represent is an easily-found central redlink list, for items that do not yet exist but maybe should. Item creation is labour intensive.
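A rough sketch of step 2 of the PROCESS above (nearness + type + name similarity, progressively widening the net), in Python. Everything here is an assumption for illustration: the `Place` record, the radii, the 0.8 similarity threshold, and the use of `difflib` for name similarity are placeholders, not settings anyone has agreed on.

```python
import math
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Place(NamedTuple):
    ident: str   # dataset ID or Wikidata QID
    name: str
    kind: str    # eg "settlement", "bridge", "church"
    lat: float
    lon: float


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(
    record: Place,
    items: Sequence[Place],
    radii_km: Sequence[float] = (0.5, 2.0, 5.0),
    min_similarity: float = 0.8,
) -> Optional[Tuple[Place, float, float]]:
    """Return (item, similarity, distance_km) for the best candidate, or None.

    Only items of the same kind are considered. The search radius widens step
    by step, and stops at the first radius that yields a good enough name match.
    """
    same_kind: List[Tuple[float, Place]] = [
        (haversine_km(record, item), item) for item in items if item.kind == record.kind
    ]
    for radius in radii_km:
        scored = [
            (name_similarity(record.name, item.name), dist, item)
            for dist, item in same_kind
            if dist <= radius
        ]
        if not scored:
            continue
        # Prefer the highest similarity; break ties by the shorter distance.
        sim, dist, item = max(scored, key=lambda t: (t[0], -t[1]))
        if sim >= min_similarity:
            return item, sim, dist
    return None
```

Step 2A is the same call with the roles swapped: pass a Wikidata item as `record` and the dataset entries as `items`. Records that return `None` are the natural remainder for step 3.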

Already in EMEW, or planned (prong 1)

See https://viaeregiae.org/wiki/Datasets
We may have most of these; we ought to have all of them.
Unclear whether spreadsheet is available. No item-level accessible website, so a property not appropriate. But catalog code (P528) + catalog (P972) would be reasonable, or fall back to described by source (P1343).
A few (17 + 3 not found by query) have UK National Archives ID (P3029) links - should be more? Also, most should have National Heritage List for England number (P1216) -- hunt down the missing ID, and merge reported ID dupes that appear.
  • Bridges. (pre-1200 / tudor / pre-1900)
Here are bridges that have listed building links: https://w.wiki/yzz , and we may well have more. But most are much, much later than the target period. Dataset could be significantly enhanced by drilling more data out of the listing info -- eg inception (P571), made from material (P186) ... other properties for bridges?
  • Markets and Fairs -- recurrent events, that happen to have locations.
Current data for market (Q132510) : https://w.wiki/zdS and fair (Q288514) https://w.wiki/zdT -- mostly modern; not always well distinguished from places where they occur
  • Forests and Chases
  • Churches
  • Church schools

Gazetteers being used to locate points

Not yet populated. Some items have coordinates. Hierarchy: hundred / parish / sub-parish / places / ... -- work down. Also, fast-track places with Saxton citations.
Property proposal for secondary ID currently running: Wikidata:Property proposal/English Placenames MADS ID
Currently our items for Hundreds/Wapentakes are particularly weak or nonexistent. Final set should also be matched to Vision of Britain unit ID (P3615). QUERY: correct linkage between settlement and wapentake? located in the administrative territorial entity (P131) with end time (P582) ?? what end time?
TODO: Propose property

Potential link-outs (prong 3)

Cf listing of external IDs found on items with Vision of Britain place ID (P3616), on rhs at Property talk:P3616/stats

Property talk pages ought to have detailed info on extraction progress, & any systematic issues. But may well not. Probably needs a dashboard, to check whether monitoring is in place, whether original datasets or scrapes are accessible, etc.

Possibilities could include:

Good coverage (though not complete). Needs sanity checking -- issues such as Mortimers Cross, Herefordshire linked to Battle of Mortimer's Cross (Q620585) (because that's what the site linked to on Wikipedia). Perhaps as many as 2000 potentially anomalous. In some cases may have knocked-on to other external IDs.
Reasonable coverage (?).
ISSUE: currently linked to settlements, which may or may not be right. (?should be manors -- but then won't be cross-referenced. So perhaps both? People don't like that in external IDs, but might be acceptable)
Valuable for manors, churches, religious houses, as well as parishes and settlements.
Currently 2000 IDs, only a percentage of the possible matches, and at the moment only for settlements (Drawn from what had been matched at VoB).
Would be nice to have the list behind their app. Additional property for VCH Numerical ID ?
issue: a *huge* amount still just instance of (P31) = Kittisford Barton (Q17555635). And not well de-duplicated. Possible subsets: Grade I, II*, II buildings. Scheduled monuments. Churches. Monasteries (mostly done?). Bridges. Manor houses. ...?
perhaps - ?perhaps for churches
eg for monasteries (above) and manors (below)

as well as

Potential as additions to EMEW & the VR map

May be more lurking in en:Category:Former_manors_in_Somerset & up the tree -- investigate with PetScan -> tag as manors -> but then create separate items, if the item is also a settlement, or a building?
ISSUE: Appropriate relationship with settlement? And with the actual building (often later). Probably just create new items for almost all of them, use spouse (P26) to link to settlements, headquarters location (P159) (?) to link to manor building?
Example: Kittisford (Q21061464), Manor of Kittisford (Q21061823), Kittisford Barton (Q17555635)
  • ... ?

Datasets: Thesaurus entries

  • If VR is using any standard external controlled-vocabulary or thesaurus (eg for the type of the geo-spatial feature), we should make sure that we can identify the corresponding wikidata items, and vice-versa.
tbc

Allied groups of items, relevant to many items with EMEW ids, that may need work

  • Counties
The class historic county of the United Kingdom (Q67376938) and property historic county (P7959) (proposal discussion) were recently created. Needs attention to make sure part of (P361) and located in the administrative territorial entity (P131) are consistently present.
QUERY: Relation with "ordinary" items for counties is not clear to me.
-- if Essex (Q23240) is the item for "Essex" from the earliest times to the present day (and also has all the sitelinks) (? = the Bonnie-AND-Clyde item), then what is Essex (Q67442940) (? 'Clyde' -- but in what sense?) and how should it relate to the first?
  • Historical sub-divisions of counties
Hundreds, wapentakes, rapes, etc. Need some love. (see note on EPNS-DEEP dataset above). Also other traditional areas defined within counties.

Cross-cutting issues

Other interested users

  • Important not to tread on toes, and to build collaboration. So need to be able to identify other users with interests in any of the above focus areas.
-- something like the 'recent changes' queries at Wikidata:WikiProject_BHL/Statistics:Titles may be useful, to see who has been editing the items.
QUERY: Most interaction will be on wikidata in the usual ways, but useful to be able to invite ppl to VR #wikidata slack channel, when there are things to talk through?

Metrics and tracking

  • Useful (and motivational!) to be able to compare before & after for states of groups of items. Dashboards to see how we're getting on, and what is still thin.
Listeria query pages are useful - edit history makes it easy to compare past & present
Examples - dashboard tool, BHL tracking pages, VoB stats page, disambiguation progress for The Peerage; others ?
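A small sketch of the before & after comparison, in Python. It assumes (hypothetically) that each snapshot of a focus set has been reduced to a mapping from item ID to the set of property IDs present on that item, eg from two runs of the same query; the function names are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

Snapshot = Dict[str, Set[str]]  # item ID -> property IDs present on the item


def coverage(snapshot: Snapshot, properties: Iterable[str]) -> Dict[str, float]:
    """Fraction of items in the snapshot that carry each property."""
    total = len(snapshot)
    result = {}
    for prop in properties:
        have = sum(1 for props in snapshot.values() if prop in props)
        result[prop] = have / total if total else 0.0
    return result


def coverage_change(
    before: Snapshot, after: Snapshot, properties: Iterable[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Per property: (coverage before, coverage after, difference)."""
    props = list(properties)
    b, a = coverage(before, props), coverage(after, props)
    return {p: (b[p], a[p], a[p] - b[p]) for p in props}
```

Properties with low "after" coverage are the ones that are still thin; a positive difference is the motivational part.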

Wikidata sister projects

  • Scope for focus drives, and/or one-off improvements, on Wikipedias, Commons, Wikisource, etc for items in focus sets here?
(& not just en-wiki : Best wikipedia coverage of Scottish heritage items may be de-wiki)
QUERY: Wikisource possible uploads for texts improved/annotated here -- WS style guide for marking up?

Links

Misc & other thoughts