Wikidata:WikiProject LD4 Wikidata Affinity Group/Affinity Group Calls/Meeting Notes/2021-01-12
Call details
- Date: 2021-01-12
- Topic: Using QuickStatements for bulk uploading archived website data to Wikidata
- Presenter: Peter Chan
- Link to original agenda with link to recording: https://docs.google.com/document/d/14KHUNQoTo5eX6iclLJdVSnOyPLNyPYQA4V854Mc7iY4/edit
Presentation materials
Notes
Introduction to Peter Chan’s work with modeling data for archived websites in Wikidata
Publish Archived Website Metadata as LOD
- Collections of archived websites at Stanford are described in the SearchWorks catalog
- These catalog records are still not linked data
- https://searchworks.stanford.edu/view/mk656nf8485
- Collection level item in Wikidata: https://www.wikidata.org/wiki/Q98908071
- Item for website in Wikidata: https://www.wikidata.org/wiki/Q99851990
- Modeling issues
- Archive URL is a qualifier and a subproperty of URL
- It also has formatting constraints
- When using the collection property, a constraint suggests a location, meaning a physical location rather than a virtual or Internet one
- Options might include allowing a virtual location with collection, or adding virtual collection as a property
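Because archive URL currently works as a qualifier, an archived copy is attached to a statement about the live address rather than standing on its own. The following is a minimal sketch of that pattern as a single QuickStatements V1 line. It assumes the live address is stored under official website (P856) and the archived copy under archive URL (P1065). The item ID (the Wikidata Sandbox item, Q4115189) and both URLs are hypothetical placeholders, not Stanford's actual data.

```python
# Hypothetical sketch: build one QuickStatements V1 line that records a live
# URL with its archived copy attached as an "archive URL" (P1065) qualifier.
# Assumption: the live address lives under "official website" (P856).


def archived_url_statement(item: str, live_url: str, archive_url: str) -> str:
    """Return a tab-separated V1 line: item, property, value, qualifier pair."""
    fields = [
        item,
        "P856",               # main statement: official website (assumed)
        f'"{live_url}"',      # V1 string values are wrapped in double quotes
        "P1065",              # qualifier: archive URL
        f'"{archive_url}"',
    ]
    return "\t".join(fields)


line = archived_url_statement(
    "Q4115189",  # Wikidata Sandbox item, used as a safe placeholder
    "https://www.example.gov/",
    "https://archive.example.org/20200101000000/https://www.example.gov/",
)
print(line)
```

If the community instead adopted K1's idea of a top-level archive URL, the same helper would simply emit P1065 as the main property.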
Discussion[edit]
- K1: this approach makes sense within Wikidata's current use of "archive URL," but it makes me feel like "archive URL" should be a top-level property in Wikidata (as opposed to a qualifier as it is now), and you could add collection as a qualifier to a given archive URL
- The website itself is not in the collection of the web archive, but the archive URL is
- D1 agrees
- E1: I agree with that too. It’s the archived website that’s being cataloged; not the website itself.
- K1: Maybe the URL of the item should be the web archive URL, and the item should be an instance of something like “archived website”
- G1: Asking for more granularity with “instance of”? “Archived website” versus “live website”: both are websites, though.
- K1: I'm still a little unsure that it makes sense to say that the website is in the collection of the Fugitive US Agencies Web Archive. It's not that the website is in the collection of the web archive; rather, the archive URL is. So either an item gets created for the archived website, or a property for the archive URL could be added to the website's item. Is there a difference, in Wikidata terms, between saying something is an instance of a live website and an instance of an archived website? I know this is something that my colleagues at the Frick…
- M1, S1, et al. at the Frick worked on a BIBFRAME model for web archives that might help inform a data model for Wikidata
- M1: When we were creating the profile for web archives, it became clear that to make the distinctions we needed between live sites and archived sites (our records link to both when available), we had to split them into two separate instances. In part that was because the software didn't allow us to distinguish between the two using things like collection date or the actual URL itself. In Wikidata, though, an item for the live site is different from one for the archived site: the archived site has the collection it's associated with, and you can include crawl dates, etc. That level of granularity obviously doesn't apply to a live site. Those were the distinctions we were making. Distinguishing between the two, and then potentially linking them as pointing to the same resource, would help alleviate confusion about where the archived site fits within a collection.
- E1: Why didn’t you use the archives at property for the web archives?
- https://www.wikidata.org/wiki/Property:P485
- Not sure it necessarily makes sense with the way you have it modeled, because archives at usually connects a person to an institution
- It might be interesting to use if you're collecting particular agencies’ websites: add that information to the agency's Wikidata item to say that its archives are at Stanford's web archives. That gives people who are interested in the entity you are describing an alternative point of access into those web archives.
- Peter had considered it, but wasn’t sure whether it could be used in this context
- J1: In my experience, Wikidata property constraints get adjusted fairly often, and warnings may therefore appear and disappear from time to time. Usually (and in the best case) this is due to community feedback on how properties are being, and should be, used. But it can be a bit bewildering to folks who edit Wikidata repeatedly.
- (Also in my experience, if an unprecedented Wikidata warning shows up that seems completely unreasonable, it’s often gone within a day or two after the constraint that raised it gets reverted.)
- H1: This is a really interesting query! What was the predicate for "main subject", and does "Federal Government" have a subject entity linked to LCSH or another controlled vocabulary? (Apologies if I missed that and it's already in the article.)
- Predicate: main subject (P921)
- H1: I do see this in the article "Linked the collection to 3 main subjects which have identifiers from authoritative agencies such as the Library of Congress" so that answers that last question
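H1's question can be checked directly against the Wikidata Query Service. The sketch below builds a query URL listing the main subjects (P921) of the collection item Q98908071 with their Library of Congress authority IDs (P244), where present. It assumes the subjects are recorded on the collection item itself, as the article excerpt indicates. The code only constructs the URL; it does not fetch it.

```python
from urllib.parse import urlencode

# Assumption: the collection item Q98908071 carries its subjects as
# "main subject" (P921) statements; P244 is the Library of Congress authority ID.
QUERY = """
SELECT ?subject ?subjectLabel ?lcAuthority WHERE {
  wd:Q98908071 wdt:P921 ?subject .
  OPTIONAL { ?subject wdt:P244 ?lcAuthority . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def wdqs_url(query: str) -> str:
    """Return a Wikidata Query Service URL that runs `query` and returns JSON."""
    params = urlencode({"query": query, "format": "json"})
    return "https://query.wikidata.org/sparql?" + params


url = wdqs_url(QUERY)
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) returns the SPARQL results as JSON.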
- D2: Is your plan to upload all of Stanford’s archived websites to Wikidata? This would be a great presentation to show Archive-It users if they are hosting any virtual conferences. Perhaps Society of American Archivists (SAA) also?
- A1: Suggests the collection creator property, instance of digital collection (Q60474998), and instance of web archive (Q30047053)
- Would just use instance of: web archive. Rather than “archived website with the title in the label”, would just say “archived website” and use the “title” property to record the title
- E1: Also +1 to A1’s comment to add the instance of web archive to help with retrieval
- The label is very odd: "archived website with the title in the label". I would just say "archived website" and use the "title" property to record the title
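The discussion's suggestions can be pulled together in one possible reading as a QuickStatements V1 batch. This batch uses instance of web archive (Q30047053) per A1, the description "archived website" with the site's name in title (P1476) per E1, and the archive URL as the item's URL (P2699) per K1. The collection (P195) points at the Stanford collection item Q98908071 from these notes. The helper name, the example title, and the archive URL are hypothetical, and the group did not settle on this model.

```python
# Hypothetical sketch of a V1 batch creating one archived-website item,
# following the suggestions raised in the discussion (not a settled model).


def archived_website_commands(title: str, archive_url: str, lang: str = "en") -> list[str]:
    """Return V1 command lines that create an item and describe it."""
    rows = [
        ["CREATE"],
        ["LAST", f"L{lang}", f'"{title}"'],             # label
        ["LAST", f"D{lang}", '"archived website"'],     # description, per E1
        ["LAST", "P31", "Q30047053"],                   # instance of: web archive, per A1
        ["LAST", "P1476", f'{lang}:"{title}"'],         # title as monolingual text
        ["LAST", "P195", "Q98908071"],                  # collection: Stanford collection item
        ["LAST", "P2699", f'"{archive_url}"'],          # URL of the archived copy, per K1
    ]
    return ["\t".join(row) for row in rows]


commands = archived_website_commands(
    "Example Agency website",
    "https://archive.example.org/20200101000000/https://www.example.gov/",
)
print("\n".join(commands))
```

Pasting the printed lines into QuickStatements' V1 import box would create one item. Each extra website just repeats the block with its own CREATE.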
Bulk uploading data
Introduction to QuickStatements
- See slides
Helpful Resources About QuickStatements
Demo
Questions
- S2: How did you convert your data to the QS syntax (V1 and CSV)?
- The CSV syntax was easy with this data: simply changed the column headers
- V1 was more involved. For large datasets, you could try this tool: https://ash-dev.toolforge.org/wdutils/csv2quickstatements.php
- K1: Can QuickStatements be used to bulk update or change existing data in Wikidata, or is it mostly for creating new items/properties?
- It can do both
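The "change the column headers" approach for the CSV syntax can be sketched with Python's standard csv module. The input column names (id, site_title, description, type) and the sample row are hypothetical. The target headers follow the QuickStatements CSV convention of qid, Len, Den, and property IDs, with an empty qid meaning "create a new item". String-valued properties may need extra quoting, so check the Help:QuickStatements page before uploading.

```python
import csv
import io

# Hypothetical mapping from a local spreadsheet's headers to QuickStatements
# CSV headers: qid (empty = new item), Len (English label), Den (description),
# P31 (instance of).
HEADER_MAP = {"id": "qid", "site_title": "Len", "description": "Den", "type": "P31"}

SOURCE = (
    "id,site_title,description,type\n"
    ",Example Agency website,archived website,Q30047053\n"
)


def rename_headers(source_csv: str, header_map: dict[str, str]) -> str:
    """Copy every row unchanged, swapping only the column headers."""
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[header_map[name] for name in reader.fieldnames])
    writer.writeheader()
    for row in reader:
        writer.writerow({header_map[key]: value for key, value in row.items()})
    return out.getvalue()


converted = rename_headers(SOURCE, HEADER_MAP)
print(converted)
```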
- Add a statement with V1 Command syntax: https://www.wikidata.org/wiki/Help:QuickStatements#Add_simple_statement
- Create an item with V1 Command syntax: https://www.wikidata.org/wiki/Help:QuickStatements#Item_creation
- Add a statement with CSV syntax:
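Updating existing data in V1 usually means adding a statement to an existing item, and "changing" a value can be expressed as removing the old statement and adding the new one. The sketch below assumes the V1 convention that a line prefixed with "-" removes the matching statement; confirm this against the Help:QuickStatements page. The helper names are hypothetical, and the Wikidata Sandbox item (Q4115189) stands in for a real item.

```python
# Hypothetical helpers for editing existing items with V1 commands.
# Assumption: a leading "-" on a V1 line removes the matching statement.


def add_statement(item: str, prop: str, value: str) -> str:
    """V1 line adding `prop` = `value` to an existing item."""
    return "\t".join([item, prop, value])


def remove_statement(item: str, prop: str, value: str) -> str:
    """V1 line removing the statement `prop` = `value` from an item."""
    return "-" + add_statement(item, prop, value)


def replace_statement(item: str, prop: str, old: str, new: str) -> list[str]:
    """Express a value change as a removal followed by an addition."""
    return [remove_statement(item, prop, old), add_statement(item, prop, new)]


lines = replace_statement("Q4115189", "P31", "Q5", "Q30047053")
print("\n".join(lines))
```

Removing and re-adding also drops any qualifiers and references on the old statement, so re-add those if they matter.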
- P1: Is there any way QuickStatements will warn you if you are creating a duplicate?
- Best to first reconcile your data with Wikidata
- Creating new Wikidata items with OpenRefine and QuickStatements
- Reconciliation with OpenRefine Affinity Group Call
- Use “run” to load statements. “Run in background” may create duplicates: https://www.wikidata.org/wiki/Help:QuickStatements#Using_QuickStatements_version_2_in_batch_mode
- If you do realize you created a duplicate, you can merge them
- Existing statements with an exact match (property and value) will not be added again; however additional references might be added to the statement.
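Since QuickStatements will not stop a CREATE from producing a duplicate item, a local pre-check before building the batch can help alongside proper reconciliation. The sketch below assumes you already have a mapping from known URLs to existing QIDs, for example exported after reconciling in OpenRefine. It also drops repeats within the batch itself. The function names, the trailing-slash normalization, and the sample data are hypothetical.

```python
# Hypothetical pre-check: split incoming URLs into ones already on Wikidata
# (update those items) and ones that genuinely need a new item.


def normalize(url: str) -> str:
    """Very light normalization: trim whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def plan_batch(urls: list[str], existing: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return (url -> existing QID, urls to create), skipping in-batch repeats."""
    matched: dict[str, str] = {}
    to_create: list[str] = []
    seen: set[str] = set()
    for url in urls:
        key = normalize(url)
        if key in seen:
            continue  # same site listed twice in this batch
        seen.add(key)
        if key in existing:
            matched[url] = existing[key]
        else:
            to_create.append(url)
    return matched, to_create


existing = {"https://www.example.gov": "Q4115189"}
matched, to_create = plan_batch(
    ["https://www.example.gov/", "https://www.example.org/", "https://www.example.org"],
    existing,
)
print(matched, to_create)
```

Only the URLs in `to_create` would get CREATE commands. The matched ones get plain statements on their existing QIDs, which QuickStatements will not duplicate if an identical statement is already there.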