Wikidata:WikiProject 20th Century Press Archives/Tools & tasks

Tools

Task: Link individual PM20 folders to Wikidata

Often, the titles of PM20 folders can be matched to Wikidata items only one by one, through manual lookup or browsing. Tools like Mix-n-match do not work well for certain types of folders, particularly those in the subject archives (Länder-/Sacharchiv). The manual workflow is therefore briefly described here:

  1. Search for or discover the folder via the web application or via lists of folders, e.g. from the country/topics archive (sortable and filterable folder list).
    We use the folder Deutschland (bis 1945) : Enteignung von Juden, Arisierung (1933-1945) (Expropriation of Jews in Germany 1933-1945, Aryanization) as the example here.
  2. Copy the persistent link of the folder, which lies behind the icon "Mappen-Zitier-Link" (folder citation link). This is normally done by right-clicking and choosing "Copy link address" (or a similarly named function) in the browser.
  3. Search for the matching Wikidata item, e.g. Aryanization (Q664017).
  4. Go to the bottom of the item page and click "add statement".
  5. Start typing "pm20 folder" in the Property input box and select "PM 20 folder ID".
  6. Paste the persistent URL copied in step 2 and shorten it (e.g., from http://purl.org/pressemappe20/folder/sh/126128,208307 to "sh/126128,208307").
  7. Click "publish".
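
The shortening in step 6 can also be done programmatically. The following is a minimal Python sketch, not part of the project's tooling; the function name folder_id_from_purl is hypothetical, and the URL prefix is taken from the single example given in step 6.

```python
from urllib.parse import urlparse

# Path prefix of PM20 persistent folder links, as seen in the example of step 6.
PURL_FOLDER_PATH = "/pressemappe20/folder/"


def folder_id_from_purl(url: str) -> str:
    """Return the folder ID part of a PM20 persistent folder URL (hypothetical helper)."""
    parsed = urlparse(url.strip())
    if parsed.netloc != "purl.org" or not parsed.path.startswith(PURL_FOLDER_PATH):
        raise ValueError(f"not a PM20 folder link: {url!r}")
    # Everything after the prefix is the ID to paste into Wikidata.
    folder_id = parsed.path[len(PURL_FOLDER_PATH):].rstrip("/")
    if not folder_id:
        raise ValueError(f"no folder ID in: {url!r}")
    return folder_id


print(folder_id_from_purl("http://purl.org/pressemappe20/folder/sh/126128,208307"))
```

The check on host and path prefix is a deliberate choice: it rejects links that are not persistent folder links instead of silently producing a wrong ID.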

Add links to Wikipedia, if appropriate

Sometimes, PM20 folders may be a valuable external complement to Wikipedia articles. Links to Wikipedias in different languages are displayed at the bottom, or in the right-hand column, of the Wikidata item page. Each Wikipedia has its own rules on when and how to add external links - please check them carefully.

English Wikipedia

  • See the rules on w:Wikipedia:External links.
  • In order to be able to receive feedback on your edits, log into Wikipedia as a named user.
  • If the folder contents look like a valuable addition to the corresponding WP article, edit its "External links" section (or add == External links == at the bottom of the article, but above categories and the like) - see example.
  • Use the PM20 template with the folder ID described above, e.g.
   * {{PM20|FID=sh/126128,208307}}
to add a link. By default, the WP article name is inserted into the link. If this does not fit well, you can insert an additional |NAME=... inside the curly brackets with a better-fitting description of the folder content.
  • Adding a short description of your edit in the "Summary" field helps watchers of the article.

German Wikipedia

  • See the rules at de:Wikipedia:Weblinks.
  • In order to be able to receive feedback on your edits, log into Wikipedia as a named user.
  • If the folder contents look like a valuable addition to the corresponding WP article, edit its "Weblinks" section (or add == Weblinks == at the bottom of the article, but above the section "Einzelnachweise" (individual citations), categories and the like) - see example.
  • Use the Pressemappe template with the folder ID described above, e.g.
   * {{Pressemappe|FID=sh/126128,208307}}
to add a link. By default, the WP article name is inserted into the link. If this does not fit well, you can insert an additional |NAME=... inside the curly brackets with a better-fitting description of the folder content.
  • Adding a short description of your edit in the "Zusammenfassung und Quellen" field helps watchers of the article.
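
When preparing many such links, the template calls from the English and German Wikipedia sections above can be generated rather than typed. This is a small Python sketch under the assumption that only the FID and NAME parameters shown above are needed; the helper name pm20_link_line is hypothetical.

```python
from typing import Optional

# Template names as given in the sections above, keyed by Wikipedia language code.
TEMPLATE_NAMES = {"en": "PM20", "de": "Pressemappe"}


def pm20_link_line(folder_id: str, wiki: str = "en", name: Optional[str] = None) -> str:
    """Build one list line with the PM20 template call (hypothetical helper)."""
    parts = [TEMPLATE_NAMES[wiki], f"FID={folder_id}"]
    if name:
        # Optional override when the article name does not describe the folder well.
        parts.append(f"NAME={name}")
    return "* {{" + "|".join(parts) + "}}"


print(pm20_link_line("sh/126128,208307"))
print(pm20_link_line("sh/126128,208307", wiki="de", name="Arisierung (1933-1945)"))
```

The NAME value in the second call is only an illustration of the override, not a recommended wording.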

Regular maintenance tasks

Add PM20 ID via GND ID ("pm20 via gnd")

This was initially run for 1600+ IDs. Whenever GND IDs that are known from not-yet-linked PM20 folders are added to Wikidata items, we can automatically add the PM20 ID to those items.

 cd /opt/sparql-queries/bin
 perl make_qs_input.pl ../wikidata/missing_pm20_id_via_gnd.rq qsStatement

The query and the script are available on GitHub.
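
The idea behind "pm20 via gnd" can be illustrated without the SPARQL query. The sketch below is not the make_qs_input.pl script; it assumes two in-memory mappings as input, assumes P4293 as the property ID for the PM20 folder ID, and emits tab-separated QuickStatements (V1) lines. All QIDs, GND IDs and folder IDs in the example call are made-up placeholders.

```python
from typing import Dict, List

PM20_FOLDER_ID = "P4293"  # assumption: property ID of "PM20 folder ID"


def qs_gnd_statements(item_gnd: Dict[str, str], folder_by_gnd: Dict[str, str]) -> List[str]:
    """Join items and folders on the GND ID and emit one QuickStatements line per match.

    item_gnd:      {QID: GND ID} for items that have a GND ID but no PM20 ID yet
    folder_by_gnd: {GND ID: PM20 folder ID} for not-yet-linked folders
    """
    lines = []
    for qid, gnd_id in sorted(item_gnd.items()):
        folder_id = folder_by_gnd.get(gnd_id)
        if folder_id is not None:
            # String values are enclosed in double quotes in QuickStatements V1.
            lines.append(f'{qid}\t{PM20_FOLDER_ID}\t"{folder_id}"')
    return lines


# Placeholder data only; none of these IDs is taken from PM20 or Wikidata.
items = {"Q4115189": "100000001", "Q13406268": "100000002"}
folders = {"100000001": "pe/000001"}
for line in qs_gnd_statements(items, folders):
    print(line)
```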

Set qualifiers ("pm20 folder name" / "pm20 doc count")

QuickStatements input files for named as (P1810), number of works (P3740) and number of works accessible online (P5592) are generated via

 cd /opt/sparql-queries/bin
 perl make_qs_input.pl ../pm20/folder_names_qs.rq qsStatement
 perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
 perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement

Because company names are currently being cleaned up, creation of "named as" qualifiers is restricted to sh, wa and pe for now.

The folder names / doc counts queries and the script are available on GitHub.
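
For orientation, a QuickStatements (V1) line carrying these qualifiers might be assembled as follows. This is a sketch rather than the output format of make_qs_input.pl, which is not shown here; it assumes P4293 as the PM20 folder ID property and appends the qualifiers named above (P1810, P3740, P5592) as extra property/value pairs. The folder ID, name and counts in the example call are invented.

```python
from typing import Optional


def qs_qualifier_line(qid: str, folder_id: str, name: Optional[str] = None,
                      total: Optional[int] = None, online: Optional[int] = None) -> str:
    """Build one tab-separated statement line with optional qualifiers (hypothetical helper)."""
    fields = [qid, "P4293", f'"{folder_id}"']  # assumption: P4293 = PM20 folder ID
    if name is not None:
        fields += ["P1810", f'"{name}"']       # named as
    if total is not None:
        fields += ["P3740", str(total)]        # number of works
    if online is not None:
        fields += ["P5592", str(online)]       # number of works accessible online
    return "\t".join(fields)


# Invented values for illustration only.
print(qs_qualifier_line("Q64589732", "pe/000000", name="Hopff, Albert", total=3, online=3))
```

Names containing double quotes would need escaping before being passed in; the sketch leaves that out.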

One-time tasks

Add items for all un-linked person folders

After extended Mix-n-match work, manual lookup of heads of state and of folders with multiple documents, and some testing, items for all 346 remaining person folders were created automatically. As discussed on the talk page,

 perl add_missing_wikidata.pl pm20_pe create

(script, query) was executed and the output pasted into QuickStatements. Jneubert (talk) 15:08, 13 June 2019 (UTC)

Rather minimal example item: Albert Hopff (Q64589732)
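
As a rough idea of what item-creation input for QuickStatements (V1) looks like, here is a Python sketch. It is not the output of add_missing_wikidata.pl, which is not reproduced on this page; it assumes a minimal item consisting of a label, instance of (P31) human (Q5), and the PM20 folder ID (assumed to be P4293). The folder ID in the example call is a placeholder.

```python
from typing import List


def qs_create_person(label: str, folder_id: str, lang: str = "en") -> List[str]:
    """Return QuickStatements V1 lines that create a minimal person item (hypothetical helper)."""
    return [
        "CREATE",                          # start a new item
        f'LAST\tL{lang}\t"{label}"',       # label in the given language
        "LAST\tP31\tQ5",                   # instance of: human
        f'LAST\tP4293\t"{folder_id}"',     # assumption: P4293 = PM20 folder ID
    ]


# Placeholder folder ID; the real ID of this folder is not given on this page.
print("\n".join(qs_create_person("Albert Hopff", "pe/000000")))
```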

Add person information from PM20 to WD

 perl add_missing_wikidata.pl pm20_pe enhance P106

Create Mix-n-match catalog for newspapers

DONE A Mix-n-match catalog for newspapers and journals from PM20 was created, comprising 1359 entries from the internal "publikation" database table, with the ZDB ID as key. Records without a ZDB ID were omitted, and some duplicates (e.g. the same ZDB ID for a paper and its supplement) were skipped. (input file) --Jneubert (talk) 06:52, 8 September 2019 (UTC)
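
The selection rule described above (ZDB ID as key, records without ZDB ID omitted, duplicate ZDB IDs skipped) could look roughly like this in Python. The record layout and the field names zdb_id and title are assumptions for illustration; the actual "publikation" table and the input file format are not shown here.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def catalog_entries(records: Iterable[Dict[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Keep the first record per ZDB ID and drop records that have none."""
    seen = set()
    entries = []
    for rec in records:
        zdb_id = (rec.get("zdb_id") or "").strip()
        if not zdb_id or zdb_id in seen:
            continue  # no key, or e.g. a supplement sharing the paper's ZDB ID
        seen.add(zdb_id)
        entries.append((zdb_id, rec.get("title") or ""))
    return entries


# Invented sample rows; IDs and titles are placeholders.
sample = [
    {"zdb_id": "0000001-1", "title": "Example Paper"},
    {"zdb_id": "0000001-1", "title": "Example Paper, Supplement"},
    {"zdb_id": None, "title": "Paper without ZDB ID"},
]
for zdb_id, title in catalog_entries(sample):
    print(f"{zdb_id}\t{title}")
```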

Replace Wikipedia links which do not use the templates

Links to (webopac|webopac0).hwwa.de and zbw.eu/beta/p20 will become obsolete, probably by end of 2020. Therefore, all references to such links have to be replaced.
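
To find candidates for replacement in a page's wikitext, a pattern for the host names mentioned above may help. This is a Python sketch, not an existing project tool; the URL paths in the sample text are invented, and only the hosts and the /beta/p20 prefix come from the text above.

```python
import re
from typing import List

# Matches links to webopac.hwwa.de, webopac0.hwwa.de and zbw.eu/beta/p20.
# The optional "www." prefix is an assumption.
OBSOLETE_LINK = re.compile(
    r"https?://(?:webopac0?\.hwwa\.de|(?:www\.)?zbw\.eu/beta/p20)[^\s\]|}<]*",
    re.IGNORECASE,
)


def find_obsolete_links(wikitext: str) -> List[str]:
    """Return all obsolete PM20 URLs found in the given wikitext."""
    return OBSOLETE_LINK.findall(wikitext)


# Invented sample wikitext; the paths after the host names are placeholders.
sample = (
    "* [http://webopac.hwwa.de/some/path Old folder link]\n"
    "* {{PM20|FID=sh/126128,208307}}\n"
    "* [http://zbw.eu/beta/p20/some/page Old beta link]\n"
)
for url in find_obsolete_links(sample):
    print(url)
```

Links that already use the templates are not matched, so the result lists only what still has to be replaced.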

Folder links

Document links

Direct links to documents or pages have to be replaced, too. This depends on the introduction of persistent addresses for documents.

Add PM20 geo/subject folders

  • Add PM20 geo codes to linked items according to existing mapping
  • Upper level categories (first and second level)
    • DONE Translate subject category labels to (British) English
    • DONE Create items for PM20 subject categories (160 in total)
      perl add_missing_wikidata.pl pm20_subject_category
      perl add_missing_wikidata.pl pm20_subject_category enhance P361 (partOf hierarchy)
      Two dozen items which link to special intermediate levels not transferred to Wikidata got no partOf link and need to be fixed
    • DONE Create items for folders (3776 in total)
      perl add_missing_wikidata.pl pm20_subject_folder - temporarily interrupted because QuickStatements was creating duplicates
  • All remaining categories
    • DONE Translate subject category labels to English
    • DONE Fix hierarchy
    • DONE Create items for PM20 subject categories (exactly the 1452 categories from "klassifikator WHERE klass_code='JE' and mappen_anzahl is not null")
      perl add_missing_wikidata.pl pm20_subject_category
    • DONE Create category hierarchy
      perl add_missing_wikidata.pl pm20_subject_category enhance P361 (partOf hierarchy)
    • DONE Create category sort label
      perl add_missing_wikidata.pl pm20_subject_category enhance P8484
      perl add_missing_wikidata.pl pm20_geo_category enhance P8483
    • DONE Create items for folders
      perl add_missing_wikidata.pl pm20_subject_folder
    • DONE Set document counts
      perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
      perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
  • OPTIONAL (later)
    • Map subject categories to WD items (via main subject (P921))
    • Create all known geo and subject categories, even those currently without folders (for later use in film sections)
    • Create reverse has part statements (issues: meaningful order, completeness)
    • Create film sections for countries not or incompletely represented as folders, create pages and add the corresponding geo codes

Add company/institution folders

  • Retrieving and using direct links
    • DONE via GND
    • DONE via linked Wikipedia page in PM20
  • for each segment
    DONE for Dutch, for English, for German, IN PROGRESS for French (Mnm, search, QS, errors), ...
    • Mapping
      • Rules for inexact matches, expressed via mapping relation type (P4390):
      • Create a Mix-n-match catalog for company folders with documents, ordered by document count, matching against organization and Wikipedia for the corresponding language
        • for all entries, including already mapped
        • with synonyms (altLabel), names with GND excluded
          ./make_mnm_input.sh pm20 nl
      • Map from top
        • Create OpenRefine project in the same order (matching English labels) (???)
        • only for unmapped entries (after mnm automatch)
      • Create list of QS insert statements and use in parallel for creating missing items
    • Create QS inserts for all unmapped entries (using the country code lists above)
      TODO exclude only items that are mapped exactly or without a qualifier
      ./add_missing_wikidata.pl pm20_co
    • Update Mix-n-match (Action -> Katalog manuell synchronisieren -> Mix-n-match aktualisieren)
  • Cleanup / extension re. inexact mappings
    • set "related match" mapping relation qualifier for all co/person, co/building etc. mappings
    • repeat creation of QS inserts (as above)
  • Adding instance-of statements (if not existent)
  • Mapping and import of industry sector
  • Mapping and import of headquarters location
  • Interlinking with companies
    • successor/predecessor
    • subsidiary/parent company
  • Interlinking with persons
    • founder: perl add_missing_wikidata.pl pm20_co enhance P112
    • board: perl add_missing_wikidata.pl pm20_co enhance P3320
    • advisory board: perl add_missing_wikidata.pl pm20_co enhance P5052
  • Perhaps, link or create separate items for companies identified by GND (zbwext:includesInstitutionNamed)
    • Note as part of the description

Activity log

Rough log of PM20-related activities