User talk:Trilotat

pulling from talk page at Q59806611

@Daniel Mietchen: I would also be really happy to work on a dedicated corpus. Truth is, I'm not sure how that even works. I'm really new to (and admittedly obsessing over) Wikidata and am really erratic in my efforts. I'm not organized or coherent in my work, so some direction (and accountability) would do me good. I welcome your advice. Trilotat (talk) 14:17, 18 December 2018 (UTC)

family name "Παπούλιας/Papoulias" set on several items by your quickstatements batch

Hi there! Have a look at this: https://www.wikidata.org/w/index.php?title=Q59863249&diff=814531120&oldid=prev

https://www.wikidata.org/w/index.php?title=Q59861384&diff=814508671&oldid=prev

https://www.wikidata.org/w/index.php?title=Q59860599&diff=814497249&oldid=prev

https://www.wikidata.org/w/index.php?title=Q59865998&diff=814551273&oldid=prev

It looks like the family name Παπούλιας (Q59208560) is being set on a few items that might not be named that. Can you take a look? Moebeus (talk) 18:11, 18 December 2018 (UTC)

@Moebeus: I have no idea why that is happening. I simply created an item for them with their ORCID. @Daniel Mietchen: Have you ever seen anything like this? Is there something with their last name that's creating the family name property? Trilotat (talk) 20:33, 18 December 2018 (UTC)

Erratum

Let's move the discussion here, if that's okay, so we don't bore the pants off the Request a Query page. Here is the next set of queries; I've added a line to exclude errata which point back to themselves - I think that was caused by glitchy data in the report servers. Not sure where you stand on wanting additional checks that the item found is the antecedent of the erratum ... adding a couple of lines which say ?errata wdt:P1433 ?PI . ?item wdt:P1433 ?PI . will check that they were both published in the same journal. Where I got up to last night was trying the same trick with author names, but not really getting the response I expected. I've put this page on my watchlist, so if you care to discuss, I'll be alerted. I'll go off and play with the reports some more and see if I come up with anything interesting. I think the more of these you can knock on the head with quickstatements, the more we'll see the wood for the trees. (Equally, quickstatements is being a bit bothersome right now - lots of complaints about it logging users out. Nothing's easy.) --Tagishsimon (talk) 17:58, 19 January 2019 (UTC)

Erratum where title starts - Erratum to:

SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Erratum to:"))
  bind(replace(?errataLabel,"Erratum to: ","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}

Erratum where title starts - Corrigendum:

SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Corrigendum:"))
  bind(replace(?errataLabel,"Corrigendum: ","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}

Erratum where title starts - Corrigendum to:

SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Corrigendum to:"))
  bind(replace(?errataLabel,"Corrigendum to: ","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}

Erratum where title starts - Erratum: Corrigendum:

SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Erratum: Corrigendum:"))
  bind(replace(?errataLabel,"Erratum: Corrigendum: ","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}

Erratum where title starts - Erratum: correction:

SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Erratum: correction:"))
  bind(replace(?errataLabel,"Erratum: correction: ","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}
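The five queries above differ only in the title prefix, so one way to ring the changes for further prefixes is to generate the query text from a template. What follows is a minimal Python sketch under stated assumptions, not a tool that exists on Wikidata: `build_erratum_query` is a hypothetical helper, and it assumes the prefix is plain text with no double quotes, backslashes or regex metacharacters (SPARQL's replace() treats its second argument as a regular expression).

```python
def build_erratum_query(prefix: str) -> str:
    """Return the erratum-matching query text for one title prefix.

    Hypothetical helper: it only fills the prefix into the query
    pattern used in this thread; it does not run the query.
    """
    # Refuse characters that would end the SPARQL string literal early.
    if '"' in prefix or "\\" in prefix:
        raise ValueError("prefix must not contain double quotes or backslashes")
    return f"""SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE
{{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {{?itemZ wdt:P2507 ?errata . }} # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"{prefix}"))
  bind(replace(?errataLabel,"{prefix} ","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}}"""


if __name__ == "__main__":
    # Print one query per prefix, ready to paste into the query service.
    for title_prefix in ["Erratum to:", "Corrigendum:", "Corrigendum to:"]:
        print(build_erratum_query(title_prefix))
        print()
```

Each printed query can then be pasted into the query service by hand; the sketch itself makes no network calls.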
I'm fine with moving it here. I appreciate the time you're spending on the effort and with me. Regarding QS, I was afraid that the logging-out problem might be something on my end (thx for the validation that it's not). I do think the erratum article should be in the same publication, but I'm not sure you want to bog down the report to achieve that. I'm not sure there's much risk of matching titles in different publications (maybe there is). I will keep revisiting the list. There are so many; I'll make every effort to plug away at it via QS.
Well, these queries are now (at this moment and maybe just for the moment) returning no results. Thanks for the amazing help. I'll keep plugging away at these queries. Trilotat (talk) 06:03, 20 January 2019 (UTC)
Here are a couple of additional variations, dealing with other formats of the item label. Can I leave it to you to ring the changes on this for "Corrigendum:", "Errata to:" and "Errata"? Good work, btw. The overall list is decreasing nicely. --Tagishsimon (talk) 13:51, 20 January 2019 (UTC)
NP!! You're the creative talent on this project. I'm the mindless editor! I'll take from here. You're brilliant! Trilotat (talk) 19:13, 20 January 2019 (UTC)
SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Corrigendum to:"))
  bind(replace(?errataLabel,"Corrigendum to: “","") as ?111) .
  bind(replace(?111,"”","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}
SELECT ?errata ?errataLabel ?itemLabel ?item
WHERE 
{
  hint:Query hint:optimizer "None" .
  ?errata wdt:P31 wd:Q1348305 .                  # ?errata is an errata
  filter not exists {?itemZ wdt:P2507 ?errata . } # there's no ?item pointing to the errata.
  ?errata rdfs:label ?errataLabel . filter(lang(?errataLabel)="en")
  filter(strstarts(?errataLabel,"Corrigendum to:"))
  bind(replace(?errataLabel,"Corrigendum to: “","") as ?111) .
  bind(strbefore(?111,"”") as ?222) .
  bind(replace(?222,"”","") as ?itemLabel) .
  ?item rdfs:label ?itemLabel.
  filter(strlen(?itemLabel)>20)
  filter(?item != ?errata)
}
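To check offline what the two quoted-title variations above extract from a label, the same clean-up can be sketched in Python. This is only an illustration: `original_title` is a hypothetical helper, and unlike strbefore() in the second query it keeps the remaining text when no closing quote mark is found.

```python
from typing import Optional


def original_title(label: str, prefix: str = "Corrigendum to:") -> Optional[str]:
    """Strip an erratum prefix and curly quotes from a label.

    Returns the presumed title of the original paper, or None when the
    label does not start with the prefix or the remainder is too short
    to be a safe match (the queries use strlen > 20 for the same reason).
    """
    if not label.startswith(prefix):
        return None
    rest = label[len(prefix):].strip()
    if rest.startswith("\u201c"):  # opening curly quote
        rest = rest[1:]
        # Keep only the text before the closing curly quote, if there is one.
        rest = rest.partition("\u201d")[0]
    return rest if len(rest) > 20 else None


if __name__ == "__main__":
    # Hypothetical label, made up purely for illustration.
    sample = "Corrigendum to: \u201cA sufficiently long article title here\u201d [extra text]"
    print(original_title(sample))
```

The returned string is what the queries then compare against rdfs:label of candidate items.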

What about when the errata is another "scholarly article"?

Thanks again for continuing to help me with this... I noticed that the queries filter for instance of (P31) erratum (Q1348305), but there are times when the errata is also a scholarly article (Q13442814), e.g.:

Correction to “Localized gravity/topography admittance and correlation spectra on Mars: Implications for regional and global evolution” (Q58090997)

I tried to edit the queries to filter for scholarly articles and it failed. Notice that this starts with "Correction to"? I also tried to edit that text string into the queries and it also failed. Drat. You did say text strings are problematic... Trilotat (talk) 02:18, 20 January 2019 (UTC)

That's a big problem. There are millions of items with P31=scholarly article (Q13442814) - too many to enable us to select them all and filter them for string-starts-with "Correction to". @Fnielsen: - the game here, Finn, is to find all scholarly articles with titles starting "correction to" or "erratum to", &c, such that we can point the erratum back at the principal paper, and the principal to the errata paper. Might you have tools or a dataset which would facilitate coding any such item with erratum (Q1348305) rather than or in addition to scholarly article (Q13442814)? --Tagishsimon (talk) 02:57, 20 January 2019 (UTC)
It occurs to me that this query will find the problem children; and you can change the search term in the line mwapi:srsearch "Errata haswbstatement:P31=Q13442814".
SELECT DISTINCT ?item ?itemLabel 
WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Search";
                    wikibase:endpoint "www.wikidata.org";
                    mwapi:srsearch "Errata haswbstatement:P31=Q13442814".
    ?title wikibase:apiOutput mwapi:title.
  }
  BIND(IRI(CONCAT(STR(wd:), ?title)) AS ?item)
  FILTER NOT EXISTS { ?item wdt:P921 wd:Q1348305. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
# LIMIT 10000
--Tagishsimon (talk) 13:36, 20 January 2019 (UTC)
Cool. I would say the REAL PROBLEM is those errata that are simply titled "Erratum" (of which there are 311), even worse when they hide the name of the original article behind a paywall. Aaargh. I will start connecting the dots where I can. Trilotat (talk) 14:37, 20 January 2019 (UTC)

copy your queries to my SPARQL user page

@Tagishsimon: May I copy your queries to my SPARQL user page here:

SPARQL Queries I use

I will certainly give you credit.

No credit required; do as you wish with them :) --Tagishsimon (talk) 19:33, 20 January 2019 (UTC)
Thanks!
Per your Retractions query request [1], I've coded a bunch of papers as retraction (Q45203135) and/or retraction notice (Q7316896). You can probably adapt one of the earlier queries to look for the source papers & link them. --Tagishsimon (talk) 02:00, 25 January 2019 (UTC)

RegEx

@Eihel: You mentioned at [2] that you "allowed [yourself] to make a RegEx." What does that mean? Thanks Trilotat (talk) 11:54, 2 April 2019 (UTC)

Hello Trilotat, I mentioned "simplified". A regex with [A-Z0-9]{1} is identical to [A-Z0-9]. Then I added a line comment on my modification. If you test [A-Z0-9]{1}, it will be reported as a "meaningless quantifier". Cordially. --Eihel (talk) 12:18, 2 April 2019 (UTC)
@Eihel: I understood about 5% of your reply... I will try to learn enough about what regex is and does and get back to you. Thanks for your patience as I attempt to muddle through this. Trilotat (talk) 13:14, 2 April 2019 (UTC)
  • The first part, between square brackets, is a list of allowed characters. In this list, the upper-case letters A to Z and the digits 0 to 9 are allowed, exactly once. (ASCII characters)
  • A list, or any other element, may be followed by a "quantifier" between braces. For example, a{3} means exactly three consecutive a's, no more, no less.
  • If my identifier pattern is a{3} and I write "abc", the last two characters do not match and the regular expression fails. If we write a, that already means exactly one a. Following it with the quantifier {1} is useless.
  • The RegEx puts a constraint on the value. If the user writes a value that does not match the RegEx, an error occurs. It also helps prevent vandalism. Are you at 6% now? Moreover, if you have problems with a new Property, I am ready to help you (as far as I am able); you are not disturbing me at all. Wikipedially. --Eihel (talk) 13:52, 2 April 2019 (UTC)
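The points above can be checked with a short Python sketch. It is only an illustration: Python's re module stands in for whatever regex engine validates the property value, it does not print a "meaningless quantifier" notice itself, and the helper name `is_valid` is made up for this example.

```python
import re

# The two spellings discussed above, plus the a{3} example.
WITH_QUANTIFIER = re.compile(r"[A-Z0-9]{1}")
WITHOUT_QUANTIFIER = re.compile(r"[A-Z0-9]")
TRIPLE_A = re.compile(r"a{3}")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """True when the whole value matches the pattern, as a format constraint would require."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    # {1} changes nothing: both patterns accept and reject the same values.
    for value in ["A", "7", "a", "AB", ""]:
        print(repr(value), is_valid(WITH_QUANTIFIER, value), is_valid(WITHOUT_QUANTIFIER, value))
    # a{3} accepts exactly three consecutive a's and nothing else.
    for value in ["aaa", "abc", "aa", "aaaa"]:
        print(repr(value), is_valid(TRIPLE_A, value))
```

Using fullmatch rather than search mirrors how a format constraint is applied to the entire value, not to a substring of it.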

@Eihel: Eihel, do you think that the proposed NGMDb ID is ready for creation? I'm anxious to start applying the ID to publications/articles/maps. Thanks in advance. Trilotat (talk) 22:07, 18 April 2019 (UTC)

✓ Done --Eihel (talk) 00:42, 19 April 2019 (UTC)

Scraping websites

@Eihel: Hello... I am trying to build a scraping protocol to pull data from each of the ~16K Geolex ID (P6202) units at https://ngmdb.usgs.gov/Geolex/search. Doing so requires that I write a line that searches https://ngmdb.usgs.gov/Geolex/Units/ for the variety of possibilities that look like:

I need to replace the "Aalenian_11821" part with wildcards. The form is a term that starts with a capital letter followed by some lower-case letters (there could be another capitalized term for two-word units, like GlenRose_8370), then an underscore and a varying number of digits. I'm using webscraper. I thought REGEX terms were what I'd use, but I'm ignorant of that language. Thoughts on how to write the search? Thanks, Trilotat (talk) 17:06, 12 April 2019 (UTC)
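One way to express the form described above as a regular expression is sketched below in Python. It rests only on the two examples given (Aalenian_11821 and GlenRose_8370) and assumes unit names contain nothing but capitalized words of ASCII letters; real Geolex names with hyphens, digits or other characters would need a wider pattern. The names `UNIT_SLUG` and `find_unit_slugs` are hypothetical.

```python
import re
from typing import List

# One or more capitalized words, an underscore, then one or more digits.
UNIT_SLUG = re.compile(r"(?:[A-Z][a-z]+)+_\d+")


def find_unit_slugs(text: str) -> List[str]:
    """Return every substring of text that looks like a unit slug."""
    return UNIT_SLUG.findall(text)


if __name__ == "__main__":
    # Sample text for illustration only; it is not copied from the Geolex site.
    sample = "Units/Aalenian_11821 and Units/GlenRose_8370"
    print(find_unit_slugs(sample))
```

In a scraper, the same pattern text could be pasted wherever the tool accepts a regular expression for matching links.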

✓ Done + corrections, Geolex ID (P6202). Looking forward to hearing from you. --Eihel (talk) 19:40, 18 April 2019 (UTC)

Your modification

Hello, if you have a reference, add it, but do not delete another reference to make room for yours. Thank you. --Eihel (talk) 22:24, 18 April 2019 (UTC)

Where did I do that such that you see it as incorrect? I recently replaced another reference that I had previously added, but today realized is a bad link. Trilotat (talk) 22:26, 18 April 2019 (UTC)
This info here lists references that weren't valid, so I replaced with good IDs. It was apparent when I was a rookie editor, that's the reference I was trying to provide.... I think. I am sensing that was bad form. Trilotat (talk) 22:40, 18 April 2019 (UTC)
You're right: error 404. Sorry. --Eihel (talk) 22:43, 18 April 2019 (UTC)