Help talk:Import BLKÖ from wikisource


Previous discussion

The discussion that preceded the mass import for BLKÖ can be found here: Wikidata:Bot_requests#BLKÖ

Thanks to @Jura1: for starting this Help-page with interesting maintenance queries. --Mfchris84 (talk) 10:03, 7 April 2020 (UTC)[reply]

  • Thanks for your feedback. I added a few that might not necessarily be useful (any more) for this project, but could help similar imports in the future. --- Jura 11:41, 7 April 2020 (UTC)[reply]


For genealogical tables, I left a note on WikiProject_Genealogy. I think adding scans for all of these on Commons would be helpful. The question is how far we want to go with other information in these tables. --- Jura 20:14, 14 April 2020 (UTC)

A few property proposals that could help (directly now, indirectly maybe later) with the effort are:

All these seem somewhat peripheral to other users' interests, so more constructive input and support from your side could help. --- Jura 20:14, 14 April 2020 (UTC)[reply]

@Jura1: Thanks a lot for all your work. Interesting reports and queries, and of course a lot of work to do! Unfortunately I don't have that much time at the moment, at least to inform the German Wikisource community so that they can interact with and improve all these items. ;-) At the moment it is hard to start QuickStatements batch jobs: only one of 20 jobs was loaded and initialized, which is a little bit frustrating. --Mfchris84 (talk) 09:47, 17 April 2020 (UTC)
OK, now all batches are running to create all the missing articles. Some articles won't be created due to errors on duplicate labels and descriptions, which need some manual inspection or correction afterwards. I will report on that asap. The update batches for each volume left many errors in QuickStatements; the statements are all correct, but there were some troubles with QuickStatements. Most of the errors may have happened while adding the qualifiers on the described by source statement in the biographical items. We should probably query for missing qualifiers there (the "statement is subject of", volume and page qualifiers). --Mfchris84 (talk) 09:59, 17 April 2020 (UTC)
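A minimal sketch of the missing-qualifier check suggested above, run against an item's JSON as returned by the Wikidata entity API rather than as a query. The property IDs for volume (P478) and page(s) (P304) are assumptions added here; the thread only names the qualifiers. The function name is hypothetical.

```python
# Qualifiers expected on each BLKÖ "described by source" (P1343) statement.
# P478 and P304 are assumed IDs for "volume" and "page(s)"; P805 is named in the thread.
EXPECTED_QUALIFIERS = {
    "P805": "statement is subject of",
    "P478": "volume",
    "P304": "page(s)",
}
BLKOE = "Q665807"  # Biographisches Lexikon des Kaiserthums Oesterreich


def missing_qualifiers(entity):
    """Return one list of missing qualifier IDs per incomplete BLKÖ statement."""
    report = []
    for statement in entity.get("claims", {}).get("P1343", []):
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if value.get("id") != BLKOE:
            continue  # described by some other source
        present = statement.get("qualifiers", {})
        missing = [pid for pid in EXPECTED_QUALIFIERS if pid not in present]
        if missing:
            report.append(missing)
    return report
```

Statements that carry all three qualifiers are left out of the report, so an empty list means the item needs no follow-up batch.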
@Mfchris84: Good news. Thanks for doing that. It seems we are moving close to the end [1]. Can you try to restart the batches with errors? Personally, I don't use the batch mode of QS, but v.2 non-batch mode often requires starting a batch several times until all parts of the edits go through. Oddly, v.1 generally works better. --- Jura 10:15, 17 April 2020 (UTC)

@Mfchris84, M2k~dewiki:

  • I added a short status at Help:Import_BLKÖ_from_wikisource#Steps. Maybe once we get Mix'n'Match for main subject going (with Magnus' help), you could inform WS about it. Ideally, we could offer them a few features that can already be done with Wikidata (e.g. complete main subject there, display information from Wikidata about the subject). BTW, could you provide some input on the property proposals? --- Jura 05:00, 23 April 2020 (UTC)

Maintenance

i will improve some points within the next few days/weeks (due to lack of time unfortunately):

  • Inform the deWikisource community about this ongoing project, especially to engage them in some improvements here and vice versa on Wikisource (e.g. missing cross-reference links, changing wiki-links from disambiguation pages to the correct biographical pages, etc.)
    • "cross-reference" created by the Wikisource community and not representing any cross-reference in the dictionary: BLKÖ:Munsch, Leopold
  • Cross-references are a particular problem. Some (or maybe many) of them are not 'tagged' as "(Verweis)" in the page title, so we have to check the wikitext in more detail and guess that really short pages are cross-references.
    • An easy way to detect them based on this Wikidata ingest could be to query all described by source statements where more than one 'statement is subject of' qualifier exists. That gives us the BLKÖ items where a cross-reference wasn't detected properly.
  • Double biographies: some biographical articles contain two biographies (e.g. for siblings), which isn't mentioned in the page title or in the template:BLKÖ values. (I'm looking for some examples.)

--Mfchris84 (talk) 10:03, 7 April 2020 (UTC)[reply]

It might be easier to sort them out once all items are created. --- Jura 11:41, 7 April 2020 (UTC)[reply]

When compiling the first lines, I changed a few more P31. There are now some 1000 cross-references compared to 16,000 articles. I was wondering about the following questions:

  • Should we use a different P31 for cross-references by Wikisource editors (as opposed to those by Constant v. Wurzbach)? I would tend to do that.
  • Should we handle cross-references to articles that mention other people differently? I'm not quite sure how.

--- Jura 20:14, 14 April 2020 (UTC)[reply]

First line

Hi @Jura1:, (as an inclusionist ;-) ) I really liked your idea of adding first line (P1922). With my Wikisource parser for BLKÖ I tried different approaches to fetch the first line, e.g. the 'first sentence' functionality provided by the MediaWiki API, or different regex statements. MediaWiki stops at the first dot, which is absolutely useless in this project, because we have so many abbreviations like "geb." or "gest." in the first line. The different regexes also fail to extract exactly the first sentence, because the articles aren't well structured and contain so many shortened terms, and therefore so many dots that do not terminate the sentence. How did you extract the lines? E.g. for Benâczy, Franz von (BLKÖ) (Q88593329) you also don't have the whole first sentence, which would be "Benâczy, Franz von (Rittmeister, geb. zu Ende der ersten Hälfte des vorigen Jahrhunderts, gest. ?). Ward am 4. October 1759 zum Graf Erdödi’schen Husaren-Regimente assentirt, avancirte 1770 zum Corporal, 1776 zum Wachtmeister, 1778 zum Unterlieutenant, 1785 zum Oberlieuten. und 1790 zum Secondrittmeister." Do you have some ideas? --Mfchris84 (talk) 10:16, 7 April 2020 (UTC)

I tried to go for the first sentence here and at Naval Biographical Dictionary (NBD). For both, I concluded that I couldn't extract that reliably. NBD has the additional problem that the first sentence isn't necessarily that important (some start with "John Doe is the son of William Doe and the brother of James Doe").
In a first test for this, I went for approx. 300 characters and ended it with "…" (the property could take up to 1500 characters). Samples at [2]. What do you think? I could make it longer or shorter. --- Jura 11:41, 7 April 2020 (UTC)[reply]
  • I went ahead with 300 characters after stripping most formatting and references (except those in single square brackets "[]"). Page numbers are rendered as [123], similar to Wikisource. Shall I change something? If you find some errors to fix, please leave a note here or revert me (for now all articles for volume 8 have the statements + all articles without P921 for volumes 1 to 7, see comment below). --- Jura 20:14, 14 April 2020 (UTC)
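The fixed-length approach described above can be sketched roughly as follows. This is an illustration only, not the script actually used for the import: the function name, the cut at a word boundary and the whitespace handling are assumptions, and SAMPLE is a shortened copy of the Benâczy text quoted earlier in this section.

```python
import re

# Shortened copy of the Benâczy entry quoted above, for illustration only.
SAMPLE = (
    "Benâczy, Franz von (Rittmeister, geb. zu Ende der ersten Hälfte des "
    "vorigen Jahrhunderts, gest. ?). Ward am 4. October 1759 zum Graf "
    "Erdödi’schen Husaren-Regimente assentirt"
)


def first_line(text, limit=300, ellipsis="…"):
    """Collapse whitespace and cut `text` to roughly `limit` characters."""
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Prefer to end at a word boundary rather than inside a word.
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut.rstrip(" ,;:") + ellipsis


# Splitting at the first ". " breaks on the abbreviation "geb.",
# which is why a sentence-based cut was dropped in favour of a length-based one.
naive = SAMPLE.split(". ")[0]
```

Here `naive` ends right after "geb", while `first_line(SAMPLE)` returns the sample unchanged because it is shorter than 300 characters.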

Cross-references

BLKÖ contains a lot of cross-references between articles, which allow different spellings or pseudonyms to be listed. Some of these cross-references have the suffix "(Verweis)" in the Wikisource page title. This allowed us to tag them automatically with the statement instance of (P31) → cross-reference (Q1302249). But many of these items are missing this suffix, and there is no value in the Wikisource template or page from which this information could easily be extracted by machine.

The data model of a BLKÖ cross-reference indicates that main subject (P921) should not point to the described person but to the referenced biographical article. Furthermore, biographical items on Wikidata should have only one described by source (P1343) → Biographisches Lexikon des Kaiserthums Oesterreich (Q665807) statement with the qualifier statement is subject of (P805) → biographical item.

This query shows us items where the described by source (P1343) statement has more than one statement is subject of (P805) qualifier. We could use the list to correct the BLKÖ items as cross-references, clean up the main subject of these cross-references so that it points to the biographical article in BLKÖ, and remove the links from the described by source (P1343) statement in the biographical Wikidata item. --Mfchris84 (talk) 11:22, 7 April 2020 (UTC)
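The linked query itself is not reproduced on this page. As a rough illustration of the same check, the sketch below groups exported rows, one row per statement is subject of (P805) qualifier, and keeps the keys that have more than one article. The row format and the function name are assumptions; the key can be a statement ID or, more coarsely, the person item.

```python
from collections import defaultdict


def keys_with_several_articles(rows):
    """rows: iterable of (statement_or_item_id, article_qid) pairs."""
    articles = defaultdict(set)
    for key, article in rows:
        articles[key].add(article)
    # Keep only keys that point to more than one BLKÖ article item.
    return {key: sorted(found) for key, found in articles.items() if len(found) > 1}


rows = [
    # The two BLKÖ articles for Adam Friedrich Oeser discussed in this section.
    ("Q215129", "Q89569467"),
    ("Q215129", "Q89569452"),
    # Hypothetical placeholder IDs for a person with a single article.
    ("Q_PERSON", "Q_ARTICLE"),
]
candidates = keys_with_several_articles(rows)
```

Each entry in `candidates` is a place where one of the linked BLKÖ items is probably an undetected cross-reference, an addendum or a duplicate entry and needs a manual look.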

Interesting example
For Adam Friedrich Oeser (Q215129) there exist two BLKÖ articles: 1) Oeser, Adam Friedrich (BLKÖ) (Q89569467) and 2) Oefner, Friedrich (BLKÖ) (Q89569452). The latter is technically a form of cross-reference, but it is also more than that, because the editor collected various (biographical) information about the pseudonym there, along with references where it was stated. So I think it is both an encyclopedic article about the pseudonym and its history and usage, and a cross-reference to Oeser, Adam Friedrich (BLKÖ) (Q89569467).

I edited Oefner, Friedrich (BLKÖ) (Q89569452) as mentioned above and also changed Adam Friedrich Oeser (Q215129) accordingly: I think Oefner, Friedrich (BLKÖ) (Q89569452) is much more than a cross-reference, but it isn't a fully qualified biographical article. So I changed the rank of the two BLKÖ claims in Adam Friedrich Oeser (Q215129). --Mfchris84 (talk) 09:03, 29 April 2020 (UTC)
In slightly different cases, I used addendum (Q352858) in P31. Eventually, someone might want to look into cross-references in detail and make sure we gather all information that can be gained from them, especially cross-references for persons mentioned in other articles. --- Jura 10:37, 3 May 2020 (UTC)

Place names

The first lines include places of birth/death. While "Wien" is still in use, some others might no longer be (e.g. Constant's place of birth). Maybe we could store these as lexemes. That has the advantage that we can match them gradually to items and that we separate the selection of names from their identification. I will try to do a few samples. --- Jura 20:14, 14 April 2020 (UTC)

Maybe you are talking especially about the German names of places which are now located in Eastern European countries? I am not sure if a lexeme is really appropriate (but I have absolutely no idea about lexemes at all). We could enrich the existing place items with the "old" German names as aliases there. --Mfchris84 (talk) 08:45, 29 April 2020 (UTC)
Place names are words too. If you look at de.wiktionary.org you will find plenty. I did an experiment at reports/names/comparison/place of birth. There is some interest to get this working. The main problem is that people interested in lexemes aren't necessarily interested in place names and people interested in place names might not have tried to figure out the lexeme structure yet. Another aspect is that features that could help for place names are still to be developed and development has slowed down, if it hasn't stopped. So place name lexemes tend to gain information that I think is only marginally important for these. --- Jura 10:37, 3 May 2020 (UTC)

For all articles that don't have P921, there is now first line (P1922) for volume 1 to 8 (see overview).

Unless there is some matching that could be done with other means, I would try to use the first lines to prepare a list for Mix-n'-match (per Help:Add main subject with Mix-n-Match).

Are there any simpler options left that could be used? I think GND or the template at Wikisource is already exhausted. --- Jura 20:14, 14 April 2020 (UTC)[reply]

  • I try to complete P921 with Harvesttemplates. This just found a few additional ones (< 20 for 25000).
Also, some came up with "no target page found". For some this is normal (dewiki article was moved and the redirect deleted [3]), for others, I'm not sure why. Samples [4][5][6]. --- Jura 00:36, 18 April 2020 (UTC)[reply]
It can't really work with harvesttemplates .. wrong wiki. --- Jura 17:15, 21 April 2020 (UTC)[reply]

Structured Data On Commons

The genealogical tables could be described very well with Structured Data on Commons. I tried something on: Genealogical Table of House of Sedlnitzky

--Mfchris84 (talk) 09:27, 18 April 2020 (UTC)[reply]

There is much to be added around here .. maybe SDC will be fully operational by then ;) --- Jura 10:56, 19 April 2020 (UTC)

German Wikisource

Hi @Jura1:, your edit on German Wikisource Vorlage:BLKÖ was reverted by an IP. I don't want to start an edit war right now. As I promised weeks ago, I will inform the deWikisource community asap about the ongoing edits on Wikidata. With some information on the data model and some visualizations showing the power of querying and linking items on Wikidata, it won't be hard to get consensus for your idea of adding the person QID to the BLKÖ infobox on Wikisource. --Mfchris84 (talk) 11:58, 22 May 2020 (UTC)

@Mfchris84: No worries. It's one of the steps I had listed on Help:Import_BLKÖ_from_wikisource#Steps and I consider it done now.
If I were active there, I'd probably revert the IP and eventually ask for the page to be semi-protected. We don't want random IPs to revert edits for no reason. Feel free to do what you prefer, I won't follow up on it. The link on "biographical articles" did make it easier to switch from ws to wd. --- Jura 11:54, 23 May 2020 (UTC)
@Mfchris84: I think it would still be useful to navigate there. Would you kindly restore it? --- Jura 16:39, 5 July 2020 (UTC)[reply]
@Jura1: I have posted s:de:Wikisource:Skriptorium#BLKÖ_und_Wikidata on German Wikisource, saying that it is now time to extend the textbox template there with a link to the biographical Wikidata item. As is usual on German Wikisource, I will wait ten days before restoring your edit on the template. Mfchris84 (talk) 07:49, 16 July 2020 (UTC)
@Mfchris84: thanks. --- Jura 08:31, 18 July 2020 (UTC)[reply]
@Mfchris84: I was expecting some lively discussion (?) --- Jura 07:43, 23 July 2020 (UTC)[reply]
@Jura1: In a way I expected it too, but it seems there won't be one ;-) Two thanks on my Skriptorium edit indicate that there will be a 'consensus' on extending the template, but let us wait the typical 10-14 days on German Wikisource after announcing the enhancement before I restore your edit on the template. (The deWS community isn't huge, but very accurate! E.g. the missing previous discussion was presumably the only reason for reverting your edit.) Mfchris84 (talk) 07:48, 23 July 2020 (UTC)
@Mfchris84: no hurry. BTW, there is an upcoming online event that also publishes "resource papers" about Wikidata. I was wondering if you would be interested in co-authoring one about these items. Maybe Q94383904 could be a model for it. Timeframe is somewhat short though. --- Jura 08:12, 23 July 2020 (UTC)[reply]
@Jura1:, oh that sounds very interesting! Coincidentally, I am also thinking about a resource paper on linking the first German illustrated magazine, Die Gartenlaube (also transcribed on German Wikisource), to Wikidata and Structured Data on Commons. So we could merge both ideas into one resource paper (like 'German Wikisource bibliographic metadata') or go forward with two separate projects to bring out the specifics more clearly (BLKÖ has a lot of special structure in its data, Gartenlaube is a more or less straightforwardly modelled magazine). --Mfchris84 (talk) 13:27, 23 July 2020 (UTC)
@Mfchris84: I'm not really knowledgeable about the other. BLKO has the advantage that WMF people might be able to relate to it .. it's a WP 170 years before WP. --- Jura 13:45, 23 July 2020 (UTC)[reply]
@Jura1: Let's try it with two different papers. Hopefully that won't be 'too much German Wikisource' for the reviewers. ;-) Mfchris84 (talk) 13:48, 23 July 2020 (UTC)
@Mfchris84: ok. Uses of either resource are likely different as well. BTW, would you happen to have access to the papers published about the (paper) BLKO? The German article lists a few, but it might take me some time to retrieve them through my channels. --- Jura 13:56, 23 July 2020 (UTC)
@Jura1: sorry, which papers do you mean precisely?  – The preceding unsigned comment was added by Mfchris84 (talk • contribs).
Wurzbach-Aspekte (Q97826757) and Q97824007 mainly. I thought they were mentioned in the article about the work, but it's in the biography. Anyways, found them in the meantime. --- Jura 11:29, 12 August 2020 (UTC)[reply]
@Jura1: German Wikisource BLKÖ Infobox shows now the link to the corresponding Wikidata-Item. --Mfchris84 (talk) 12:36, 19 August 2020 (UTC)[reply]

Use as reference

A series of statements are now referenced with article items. numbers/refs/by property does a break-down by property (notably, dates and places of birth/death, occupation).

This is picked up in some Wikipedia editions:

--- Jura 11:54, 23 May 2020 (UTC)[reply]


Phase completed

@Mfchris84, M2k~dewiki, Pyfisch, Zabia, AndreasPraefcke:

All articles now have items and (almost) all main subjects for biographical articles are defined.

There are a few things that are still ongoing (see steps), but the bulk of the work is done.

I'd be glad to have your feedback on the current status and suggestions for things we could do later/reports we could add. As I understand it, Mfchris84 will share this with the wider dewikisource community at a later stage.

For Wikisource, I made a suggestion for a change to the infobox here (see discussion above). If you need help to code more features, I can give it a try (or find someone who knows how). Maybe an additional (minor) one: personally, I'd have placed the genealogical tables on separate pages (and possibly transcluded into the family articles). This would also make it easier to analyze the text of the article.

Anyways, congrats on the amazing effort to ensure transcriptions of all these articles, and thanks for helping to create these items.

As the 129th anniversary is coming close, maybe one of you wants to seize the occasion for wider circulation of this. --- Jura 14:09, 24 May 2020 (UTC)


Additional articles and mentions of the same person

This search finds a series of entries about the same person. Essentially, these are part of the following types:

  • (1) cross-reference (using cross-reference (Q1302249)): this would generally be a short note about some person already discussed in another article. This can be an article about the person or their family. Some of these notes are longer than others. The Wikisource edition has cross-references not present in the original work.
  • (2) addendum (using addendum (Q352858)): a note published later explicitly completing (or correcting) the biography of the person. This is a sample list of such addenda. It also includes entries about new people. Wurzbach stopped publishing addenda at some point. Members of the Habsburg family described in volumes 6/7 (published 1860/1861) are also present as Grand Dukes of Tuscany in volume 46 (1882). Others were already featured as members of the house of Este in volume 4 (1858). The Tuscany articles can also be seen as a full rewrite.
  • (3) duplicate entry (using duplicate entry (Q1263068)): a second article about the same person that doesn't reference the first entry (in one way or the other). For some, the text strongly suggests that it's the same person; for others, only additional information allows that conclusion to be drawn.
  • (4) conflation: according to dewiki this entry (and a tombstone) incorrectly conflate two persons.

For most of the above, I added corresponding P31. There may be some mixup between addendum (2) and duplicate entry (3), as this isn't necessarily visible when checking only the text presented on a single Wikisource page. --- Jura 08:31, 18 July 2020 (UTC)[reply]

Cites

I made some progress on these. 44% of biographical articles now have at least one work cited (sample).

Some of the works cited are also suitable for "described by source" on a person's article (sample). One of the last columns in the "by work" list mentions the percentage where this is done.

Once a reference is added to one entry, the report at Help:Import_BLKÖ_from_wikisource/reports/cites/1 makes it easier to identify more articles with the same reference (the string search fails in some cases). --- Jura 08:31, 18 July 2020 (UTC)[reply]

Items for works in both lists are now available. For some, I should be able to add more info from there, for others I hope someone else will eventually do it from other sources.
Basic data I try to have: is it a periodical? If not: when was it published? language is generally already included. --- Jura 07:43, 23 July 2020 (UTC)[reply]

Adding gender automatically

Many items about persons have statements for their given names but no gender recorded: men ~5000 entries, women ~70 entries. Is there any objection to adding the gender automatically? --Pyfisch (talk) 12:00, 24 August 2020 (UTC)

Via the link to male|female given name. Good idea! Go ahead @Pyfisch: ;-) --Mfchris84 (talk) 17:21, 24 August 2020 (UTC)
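A minimal sketch of how such an inference could work, assuming each given-name item is classified as male given name (Q12308941) or female given name (Q11879590) and that the result is written as male (Q6581097) or female (Q6581072). These item IDs and the function name are assumptions added here, not taken from the thread. Persons whose names point in both directions (e.g. a male name combined with "Maria") are skipped rather than guessed.

```python
# Assumed Wikidata item IDs; verify before any real batch run.
MALE_GIVEN_NAME = "Q12308941"
FEMALE_GIVEN_NAME = "Q11879590"
MALE = "Q6581097"
FEMALE = "Q6581072"

NAME_CLASS_TO_GENDER = {MALE_GIVEN_NAME: MALE, FEMALE_GIVEN_NAME: FEMALE}


def infer_gender(given_name_classes):
    """given_name_classes: one collection of P31 values per given name of a person.

    Returns the inferred gender item, or None if the names are
    unclassified or contradict each other.
    """
    genders = set()
    for classes in given_name_classes:
        for cls in classes:
            if cls in NAME_CLASS_TO_GENDER:
                genders.add(NAME_CLASS_TO_GENDER[cls])
    if len(genders) == 1:
        return genders.pop()
    return None  # ambiguous or unknown: leave for manual review
```

Returning None for mixed cases keeps the automated batch conservative; those items can be listed separately for manual checking.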

Lexeme coverage

There is now a tool to calculate lexeme coverage for Wikipedia, e.g. Lexicographical_coverage#de and Lexicographical_coverage/Missing/de (exceptions: Lexicographical_coverage/Filter/de).

It would be interesting to have the same for s:de:Kategorie:BLKÖ:

Maybe one should filter against items for family names/given names as well, maybe not.

There is an attempt to check places of birth against lexemes at /reports/names/comparison/place_of_birth.

@DVrandecic (WMF): how complicated would it be to generate these? Can you run them with the existing tool? --- Jura 11:51, 5 March 2021 (UTC)[reply]

@Jura1: I would do that in two steps: first, one to acquire the corpus - to create text files with just the text. Second, to run the completeness script against that. I am happy to help with the second step, or even to do it - that is comparatively little work. But the first step would be a little bit of work, I guess. --DVrandecic (WMF) (talk) 19:35, 5 March 2021 (UTC)[reply]

@DVrandecic (WMF): sounds good. Can you work with the output from s:de:Special:Export,

I think anything that is in a template can be stripped. Also, I added frequent first names to Help:Import BLKÖ from wikisource/reports/lexemes/coverage/filter.

Something similar based on a fr-Wikisource category should give 100% coverage. --- Jura 13:14, 6 March 2021 (UTC)[reply]

@Jura: Yes, that is a good start. Stripping the templates etc., setting up the downloads, etc. is what I'd prefer to be done by someone else. Give me one file with all the clean text, I will do the rest :) --DVrandecic (WMF) (talk) 19:12, 6 March 2021 (UTC)
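The first step discussed above (reducing exported wikitext to plain text by stripping everything inside templates) could be sketched as follows. This is a rough illustration under assumptions: it works on the wikitext of a single page that has already been pulled out of the export file, the function names are hypothetical, and it handles only templates and wiki-links, not references or tables.

```python
import re


def strip_templates(wikitext):
    """Remove {{...}} templates, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])  # keep text outside any template
            i += 1
    return "".join(out)


def clean_text(wikitext):
    """Strip templates, unwrap wiki-links and collapse runs of spaces."""
    text = strip_templates(wikitext)
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    return re.sub(r"[ \t]+", " ", text).strip()
```

For example, `clean_text("{{BLKÖ|x={{y}}}}Geboren in [[Wien (Stadt)|Wien]], gest. in [[Graz]].")` gives "Geboren in Wien, gest. in Graz.". Concatenating the cleaned pages would then produce the single text file requested for the coverage script.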