User:Jheald/VR
- This page initial version - STATUS: still work-in-progress, but getting there (?)
- Property proposal for GB1900 -- Wikidata:Property proposal/GB1900 ID
- TODO: WikiProject pages
- TODO: Manors items: assess state, discuss with Jo Pugh, also: how to relate/distinguish from settlement items (eg OpenDomesday links)
- TODO: make sure all historic county of the United Kingdom (Q67376938) items have appropriate part of (P361) & located in the administrative territorial entity (P131) stmts.
- HOLD pending better understanding of what is needed
- TODO: data pages to better understand county of England (Q171809) & ceremonial county of England (Q180673) vs historic county of the United Kingdom (Q67376938)
- canvass opinions for new WikiProjects for ENG, WAL, SCO, IRL -- some starter tweets sent
- TODO: KTop breakdown - by this week
- TODO: Gazetteer coordinates: discuss at Project Chat
- Items for VR, EMEW -- Viae Regiae (Q105547906), Gazetteer of Early Modern England and Wales (Q105548625)
- TODO: matches for items with known EPNS DEEP Saxton hits
- TODO: think about metrics to measure quality improvements. Are there baseline queries that should be run now?
(JH non-VR focus reminders)
- JH: Finish Stop of the Exchequer rewrite
- JH: Pepys ID imports
- Suggest identifiers for WDQS queries -- [1] posted to wikidata mailing list
- JH: Galloway Glens placenames ID property proposal [2]
- Number of matches would be very few
- JH: WikiTree: pages for remaining Brooke/Lawrence connections. Rowe tree. Ask for review.
- JH: Ongoing TP de-duplication. Finish the WikiTree dupe sweep. Extract MPs, look for dupes. Baronets.
- JH: KTop: Scotland artworks pilot. Make items. Upload files. qy: make sure Artwork template can handle unknown/stated as.
Phase 1 key activities
Prong 1: Map EMEW IDs to existing Wikidata items, or create new items when they do not exist.
- A) propose and seek approval of a Wikidata property for EMEW ID
- waiting on first-iteration EMEW ID landing page for some examples
- will need items for EMEW, VR (was TODO; now done)
- Most EMEW IDs will relate to things from imported gazetteers --> the contents of these gazetteers are therefore important datasets to match for Prong 2
- May want a second ID for EMEW facts, so that we can easily reference eg coordinates in EMEW from a particular gazetteer to the relevant EMEW factoid.
- It is possible we may not have wikidata items for all EMEW IDs (notability?) -- but we certainly should if any other online resource has an identifier for the same thing.
Prong 2: Create or improve Wikidata items for concepts significant to the Viae Regiae project
- See below.
Prong 3: Investigate & prototype ways for EMEW to present information pulled from Wikidata
- EMEW should have core ability to present info for a single EMEW id, and for a single "EMEW fact" ID.
- Possible usages of WD info:
- eg plot a list of EMEW ids on the VR map pulled via giving a URL for a query on WDQS. (lists also specifiable in a VR url ?)
- eg infoboxes on the VR-wiki ?
- eg routinely using information from wd to augment info presented by VR for a mapped place ?
- INVESTIGATE sweet spot between pulling across data from wd on the fly vs keeping it locally at VR, perhaps updating it weekly vs full integration at VR
- Best methods for plumbing in on-the-fly re-use? Should be well-established how to best do by now. Need to find & ask the right people.
- On the Wikidata side, need to quality-improve those external-id datasets. Are they well-matched? How complete are they - are there more matches that could be made? (esp for items likely to appear in EMEW) --> also relevant datasets for Prong 2.
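SKETCH (Python, assumptions flagged): one way the "URL for a query on WDQS" idea above might look. The EMEW ID property does not exist yet, so the property is passed in as a parameter; `build_places_query` and `wdqs_url` are illustrative names, not anything VR or EMEW provides. The code only builds the URL; fetching and plotting are left out.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_places_query(id_property: str, limit: int = 100) -> str:
    """SPARQL for items that carry the given external-ID property and a coordinate.

    id_property is a Wikidata property ID such as "P3627"; the future
    EMEW ID property would be substituted here (assumption: not yet created).
    """
    return (
        "SELECT ?item ?itemLabel ?id ?coord WHERE {\n"
        f"  ?item wdt:{id_property} ?id ;\n"
        "        wdt:P625 ?coord .\n"  # P625 = coordinate location
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def wdqs_url(query: str) -> str:
    """Return a GET URL that asks WDQS to run the query and reply in JSON."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

For example, `wdqs_url(build_places_query("P3627"))` gives a URL for items that have a Survey of English Place-Names ID (P3627) and a coordinate location (P625). Whether VR pulls such a URL on the fly or caches the result locally is exactly the open question above.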
Dataset: Maps, texts, sources used by VR
Maps and texts
AIM: What different editions and resources are there for the key maps and texts being worked on by VR? Useful to flesh out with as complete items as possible.
- -- In particular what digital editions? Are there IIIF versions? For maps: can we translate our annotations between different digital base versions? What data is needed to enable this?
(see especially User:PKM/Notes/VR for more on these)
- Christopher Saxton (Q2099402) : Atlas of the Counties of England and Wales (Q27919282)
- John Speed (Q1245028) : The Theatre of the Empire of Great Britaine (Q105484787)
- Humphrey Llwyd (Q5941507)
- John Ogilby (Q1389271)
(can be useful to look at in Reasonator, as most many->one relationships are recorded only on the many, not the one (eg: exemplar of (P1574), edition or translation of (P629))). QUERY: Is there a plug-in that does something similar on a standard wikidata page? Even then, other than on a curated page like PKM's, it can be hard to get a sense of how the individual documents group together.
STRETCH GOAL: Also useful to track secondary literature: bibliography of books, articles, etc about the creators and the works. Look for such items where we have them, and make sure they have main subject (P921) set, so we can easily find them.
(QUERY: For texts, what is useful for Wikisource? eg how to give WS cleaned-up OCR, how to include annotations in mark-up (or strip?) )
Specialist gazetteers and thesauruses
- Data-sources being used by VR (eg specialist gazetteers, thesauruses): we should have an item for each of them. Focus list?
Referenced documents
Wikidata pages for archival documents that are being mined for information, eg
- Inquisitions post-mortem
- inquisition post mortem (Q6036904) is a completely blank item at the moment
- We should probably create items for each individual Inq p.m. & try to relate them to individuals and manors
- Quarter sessions
- Quarter Sessions (Q7269253). It looks like some county-level items exist for Wales, see per Reasonator [3] "from related items".
- Also Middlesex Quarter Sessions (Q16997751), England, Kent, Quarter Sessions and Court Files - FamilySearch Historical Records (Q94425097), Great Britain. Court of Quarter Sessions of the Peace (Glamorgan) Papers relating to, NLW MS 5203E (Q56177665) exist, with v limited information
- We should probably have an item for every court VR cites (and ideally every court that existed, with bibliographic data). archives at (P485) important.
- Unclear if finer-level items would have value
- Probate records
- At the very least, Wikidata should have information about the different probate registries. UNCLEAR whether there should be items on WD at the level of individual probate records -- possibly not
- ... others : see VR wiki: Sources
Datasets: Geo-spatial
- PROCESS:
- 1) try to match by 3rd party shared identifiers.
- 2) try to match based on nearness + type + name similarity (progressively widening the net), based on making list of the items (with particular characteristics) nearest each thing, then calculating name similarity
- 2A) also try to geo-match the other way, from likely items to things in the dataset, via making list of the things nearest each item.
- 3) perhaps Mix'n'match for remainder -- but people are apt to match similarly named things that are quite different
- ISSUE: How to open up, to allow more easy parallel participation? (? see if Andrew Lih has ideas -- cf how the Met project is participatory -- PAWS notebooks? shareable Google sheets?)
- QUERY: Are any tools (OpenRefine, newer MnM versions) capable of anything like the above?
- One thing MnM does represent is an easily-found central redlink list, for items that do not yet exist but maybe should. Item creation is labour intensive.
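SKETCH (Python, assumptions flagged): a minimal version of step 2 of the PROCESS above, matching on nearness + type + name similarity while progressively widening the net. The `Place` record, the radii, the 0.8 similarity threshold and the use of `difflib` for name similarity are all illustrative choices, not settled VR practice; real gazetteer names would probably need spelling normalisation first.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


@dataclass
class Place:
    ident: str   # gazetteer ID or Wikidata QID
    name: str
    kind: str    # e.g. "settlement", "manor", "bridge"
    lat: float
    lon: float


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates(entry, items, radii_km=(1.0, 3.0, 10.0), min_similarity=0.8):
    """Return (item, distance_km, similarity) tuples from the narrowest radius
    that yields any same-type candidate, best name match first."""
    same_kind = [i for i in items if i.kind == entry.kind]
    for radius in radii_km:
        hits = []
        for item in same_kind:
            dist = haversine_km(entry, item)
            sim = name_similarity(entry.name, item.name)
            if dist <= radius and sim >= min_similarity:
                hits.append((item, dist, sim))
        if hits:
            return sorted(hits, key=lambda h: (-h[2], h[1]))
    return []
```

Step 2A is the same call with the roles swapped: pass a Wikidata item as `entry` and the gazetteer things as `items`. Anything that comes back with more than one candidate, or none, is what would be left for Mix'n'match or manual review.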
Already in EMEW, or planned (prong 1)
- Monasteries. @DrJACameron list: [4]. Current data: https://w.wiki/zcm
- We may have most of these; we ought to have all of them.
- Unclear whether spreadsheet is available. No item-level accessible website, so a property not appropriate. But catalog code (P528) + catalog (P972) would be reasonable, or fall back to described by source (P1343).
- A few (17 + 3 not found by query) have UK National Archives ID (P3029) links - should be more? Also, most should have National Heritage List for England number (P1216) -- hunt down the missing ID, and merge reported ID dupes that appear.
- Bridges. (pre-1200 / tudor / pre-1900)
- Here are bridges that have listed building links: https://w.wiki/yzz , and we may well have more. But most are much, much later than the target period. Dataset could be significantly enhanced by drilling more data out of the listing info -- eg inception (P571), made from material (P186) ... other properties for bridges?
- Markets and Fairs -- recurrent events, that happen to have locations.
- Current data for market (Q132510) : https://w.wiki/zdS and fair (Q288514) https://w.wiki/zdT -- mostly modern; not always well distinguished from places where they occur
- Forests and Chases
- Churches
- Church schools
Gazetteers being used to locate points
- Survey of English Place-Names ID (P3627) (aka EPNS DEEP).
- Not yet populated. Some items have coordinates. Hierarchy: hundred / parish / sub-parish / places / ... -- work down. Also, fast-track places with Saxton citations.
- Property proposal for secondary ID currently running: Wikidata:Property proposal/English Placenames MADS ID
- Currently our items for Hundreds/Wapentakes are particularly weak or nonexistent. Final set should also be matched to Vision of Britain unit ID (P3615). QUERY: correct linkage between settlement and wapentake? located in the administrative territorial entity (P131) with end time (P582) ?? what end time?
- GB1900 At VoB [5]. Discussed in 2019 paper
- TODO: Propose property
Potential link-outs (prong 3)
- Cf listing of external IDs found on items with Vision of Britain place ID (P3616), on rhs at Property talk:P3616/stats
Property talk pages ought to have detailed info on extraction progress, & any systematic issues. But may well not. Probably needs a dashboard, to check whether monitoring place, original datasets or scrapes accessible, etc.
Possibilities could include:
- Good coverage (though not complete). Needs sanity checking -- issues such as Mortimers Cross, Herefordshire linked to Battle of Mortimer's Cross (Q620585) (because that's what the site linked to on Wikipedia). Perhaps as many as 2000 potentially anomalous. In some cases may have knocked-on to other external IDs.
- Reasonable coverage (?).
- ISSUE: currently linked to settlements, which may or may not be right. (?should be manors -- but then won't be cross-referenced. So perhaps both? People don't like that in external IDs, but might be acceptable)
- Valuable for manors, churches, religious houses, as well as parishes and settlements.
- Currently 2000 IDs, only a percentage of the possible matches, and at the moment only for settlements (Drawn from what had been matched at VoB).
- Would be nice to have the list behind their app. Additional property for VCH Numerical ID ?
- issue: a *huge* amount still just instance of (P31) = Kittisford Barton (Q17555635). And not well de-duplicated. Possible subsets: Grade I, II*, II buildings. Scheduled monuments. Churches. Monasteries (mostly done?). Bridges. Manor houses. ...?
- perhaps - ?perhaps for churches
- eg for monasteries (above) and manors (below)
as well as
- Survey of English Place-Names ID (P3627)
- GB1900 if web pages for the IDs ever go live
- ... what else ?
Potential as additions to EMEW & the VR map
- Manors TNA list: [6] Current data: https://w.wiki/zce CONTACT: Jo Pugh tweet
- May be more lurking in en:Category:Former_manors_in_Somerset & up the tree -- investigate with PetScan -> tag as manors -> but then create separate items, if the item is also a settlement, or a building?
- ISSUE: Appropriate relationship with settlement? And with the actual building (often later). Probably just create new items for almost all of them, use spouse (P26) to link to settlements, headquarters location (P159) (?) to link to manor building?
- Example: Kittisford (Q21061464), Manor of Kittisford (Q21061823), Kittisford Barton (Q17555635)
- ... ?
Datasets: Thesaurus entries
- If VR is using any standard external controlled-vocabulary or thesaurus (eg for the type of the geo-spatial feature), we should make sure that we can identify the corresponding wikidata items, and vice-versa.
- tbc
Allied groups of items, relevant to many items with EMEW ids, that may need work
- Counties
- The class historic county of the United Kingdom (Q67376938) and property historic county (P7959) (proposal discussion) were recently created. Needs attention to make sure part of (P361) and located in the administrative territorial entity (P131) are consistently present.
- QUERY: Relation with "ordinary" items for counties is not clear to me.
- -- if Essex (Q23240) is the item for "Essex" from the earliest times to the present day (and also has all the sitelinks) (? = the Bonnie-AND-Clyde item), then what is Essex (Q67442940) (? 'Clyde' -- but in what sense?) and how should it relate to the first?
- Historical sub-divisions of counties
- Hundreds, wapentakes, rapes, etc. Need some love. (see note on EPNS-DEEP dataset above). Also other traditional areas defined within counties.
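SKETCH (Python, assumptions flagged): a small helper that generates a WDQS query listing historic county items that lack part of (P361) or located in the administrative territorial entity (P131). The function name and the query shape are illustrative only. Note that `wdt:` matches best-rank statements only, so deprecated statements would count as missing.

```python
import re


def missing_statement_query(class_qid, required_props):
    """SPARQL listing instances of class_qid together with each required
    property for which the item has no (truthy) statement."""
    props = list(required_props)
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a QID: {class_qid!r}")
    for prop in props:
        if not re.fullmatch(r"P[1-9]\d*", prop):
            raise ValueError(f"not a property ID: {prop!r}")
    values = " ".join(f"wd:{prop}" for prop in props)
    return (
        "SELECT ?item ?itemLabel ?missing ?missingLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        f"  VALUES ?missing {{ {values} }}\n"
        # Map each property entity to its wdt: predicate.
        "  ?missing wikibase:directClaim ?claim .\n"
        "  FILTER NOT EXISTS { ?item ?claim [] }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        "ORDER BY ?missing ?item"
    )
```

`missing_statement_query("Q67376938", ["P361", "P131"])` gives the text to paste into WDQS. Each row is one item plus one property it lacks, so an item missing both appears twice.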
Cross-cutting issues
Other interested users
- Important not to tread on toes, and to build collaboration. So need to be able to identify other users with interests in any of the above focus areas.
- -- something like the 'recent changes' queries at Wikidata:WikiProject_BHL/Statistics:Titles may be useful, to see who has been editing the items.
- QUERY: Most interaction will be on wikidata in the usual ways, but useful to be able to invite ppl to VR #wikidata slack channel, when there are things to talk through?
Metrics and tracking
- Useful (and motivational!) to be able to compare before & after for states of groups of items. Dashboards to see how we're getting on, and what is still thin.
- Listeria query pages are useful - edit history makes it easy to compare past & present
- Examples - dashboard tool, BHL tracking pages, VoB stats page, disambiguation progress for The Peerage; others ?
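SKETCH (Python, assumptions flagged): one simple before/after metric would be per-property coverage for a focus set of items, computed from two snapshots (eg saved query results). The snapshot shape, a dict from item ID to a dict of property ID to list of values, is an assumption made for illustration, as are the function names.

```python
from collections import Counter


def coverage(items, properties):
    """Fraction of items with at least one value for each property.

    items: {item_id: {property_id: [values, ...]}}
    """
    total = len(items)
    counts = Counter()
    for claims in items.values():
        for prop in properties:
            if claims.get(prop):  # missing key or empty list both count as absent
                counts[prop] += 1
    return {prop: (counts[prop] / total if total else 0.0) for prop in properties}


def coverage_change(before, after):
    """Difference in coverage (after minus before) for every property seen."""
    props = sorted(set(before) | set(after))
    return {prop: after.get(prop, 0.0) - before.get(prop, 0.0) for prop in props}
```

Running `coverage` now on the focus sets (counties, hundreds, monasteries, bridges, ...) and storing the result would give the baseline that the TODO near the top of the page asks about; `coverage_change` then reports movement at each later snapshot.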
Wikidata sister projects
- Scope for focus drives, and/or one-off improvements, on Wikipedias, Commons, Wikisource, etc for items in focus sets here?
- (& not just en-wiki : Best wikipedia coverage of Scottish heritage items may be de-wiki)
- QUERY: Wikisource possible uploads for texts improved/annotated here -- WS style guide for marking up?
Links
- Wikidata-EMEW
- https://viaeregiae.org/
- Slack
- Github
- twitter: https://twitter.com/ViaeRegiae
- User:PKM/Notes/VR
- Wikidata:WikiProject_UK_and_Ireland exists (and would be parent project), but has never attracted a community
Misc & other thoughts
- Tracking page for Blaeu: https://www.wikidata.org/wiki/User:Jheald/Blaeu
- !!! Significant weaknesses in historic county of the United Kingdom (Q67376938) set. part of (P361) and located in the administrative territorial entity (P131) need to be everywhere. instance of (P31) may need checking. Also Chapman code should be added (w/ identifier shared with (P4070) qualifier as necessary)
- Tweet to @gbhgis re VoB <--> EPNS DEEP <--> GB1900 identified matches. See also comment on Slack
- Looking at SaxtonSpeed on Mirador [7] (Cambridge U.L.), cf Q105487401#P6108 -- QUERY: Is there a wikidata JS gadget to send IIIF manifests to a favourite viewer?
- AI and map feature capture -- slack
- IIIF style-guide (under construction) for map annotations
- WMF IIIF status - slack
- Reach-out to User:Spinster re current thinking on next-gen annotation support for Commons: tweet, esp when number of annotations may be very large
- How we might capture areas of text on maps and link them to corresponding glyphs -- slack
- Reverse map-warping tweet
- PERSONAL: Understand Recogito better. Texts with OCR issues - can it cope? Output from Leland useful for Wikisource? IA linked page <-> OCR format : allowing copy&paste
- QUERY: Use VoB as base Recogito gazetteer? Or perhaps WD populated places? (slack)
- "Towards a national collection" interim reports: https://twitter.com/nat_collection/status/1356199280067928066
- Index of Place Names (Q65050559) -- ought to have WD property ?