User talk:Matěj Suchánek



Xaris333 (talkcontribs)
Matěj Suchánek (talkcontribs)

Do you mean "from the article via the infobox"?

Xaris333 (talkcontribs)

No. I want to have the value in the body of the article. Not the template.

Matěj Suchánek (talkcontribs)

You can use the Wikidata template with arguments id=Q28870988|property=P1082. Or if it's more convenient for you to use page names, then you can specify it with title=.
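The same population figure can also be read outside wikitext, from the item's JSON as served by Special:EntityData. Below is a minimal sketch, not how the template itself is implemented. It assumes the standard Wikibase JSON layout (claims → mainsnak → datavalue) and a simplified "best rank" rule. The function name `best_quantity` and the sample numbers are hypothetical.

```python
from typing import Optional


def best_quantity(entity: dict, prop: str) -> Optional[str]:
    """Return the "amount" of the best-ranked value statement for `prop`.

    Assumes the Wikibase JSON layout: entity["claims"][prop] is a list of
    statements, each carrying a "rank" and a "mainsnak" whose datavalue is
    a quantity such as {"amount": "+1350", "unit": "1"}.
    """
    statements = entity.get("claims", {}).get(prop, [])
    # Deprecated statements never count; preferred ones win over normal ones.
    usable = [s for s in statements if s.get("rank") != "deprecated"]
    preferred = [s for s in usable if s.get("rank") == "preferred"]
    for statement in preferred or usable:
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "unknown value" / "no value" snaks carry no amount
        return snak["datavalue"]["value"]["amount"]
    return None


# Made-up, trimmed-down entity with two population statements.
sample = {
    "claims": {
        "P1082": [
            {"rank": "normal",
             "mainsnak": {"snaktype": "value",
                          "datavalue": {"type": "quantity",
                                        "value": {"amount": "+1200", "unit": "1"}}}},
            {"rank": "preferred",
             "mainsnak": {"snaktype": "value",
                          "datavalue": {"type": "quantity",
                                        "value": {"amount": "+1350", "unit": "1"}}}},
        ]
    }
}

print(best_quantity(sample, "P1082"))  # +1350
```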

Xaris333 (talkcontribs)

I was trying this but without property=, just P1082 :) Thanks!!

Reply to "Property of a different item"
Jura1 (talkcontribs)

Good initiative with the cleanup of the disambiguation items!

Looks like we have plenty of them without even a single valid sitelink. The deletion list exploded.

Matěj Suchánek (talkcontribs)

Messy disambiguations are a huge debt that we haven't had under control since Wikidata was established. I'll try to help with the deletion list.

My intention was to have them all imported into Magnus' Duplicity, so that users can help with connecting them where they belong. However, this initiative is being disrupted by Emaus' bot, which keeps re-adding some of those links.

By the way, are you aware of ?

Jura1 (talkcontribs)

I thought that cebwiki stuff had finally stopped ..

Matěj Suchánek (talkcontribs)

(Now I realise you were talking about disconnecting redirects, so ignore the rest.)

Jura1 (talkcontribs)

Still: interesting tool.

  • Seems you cleaned out most redirects. I wonder how many pile up in a week or a month. Maybe it's sufficient to remove them once a month.
  • I couldn't find any deleted pages or "false positives".
  • For articles, enwiki has some subcategories for disambiguation pages. Maybe these should be filtered too. I don't think much would be lost by automatically undoing things like the bot edit you mention.
Matěj Suchánek (talkcontribs)

Deleted pages were all cleaned up, "false positives" is a feature I haven't implemented yet (it is meant to help with the following).

For the last issue, I don't understand why some wikis tend to stop tagging disambiguations with __DISAMBIG__ (enwiki, frwiki...). Don't they know this is the modern way to tell the tools that a page is a disambiguation?
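When a page uses __DISAMBIG__, tools can detect it through the page's "disambiguation" page prop instead of guessing from categories. Below is a minimal sketch that reads a parsed reply to `action=query&prop=pageprops&ppprop=disambiguation`. It assumes the reply was requested with `formatversion=2`, so "pages" is a list. The function name and the sample reply are hypothetical.

```python
import json
from typing import List


def disambiguation_titles(response: dict) -> List[str]:
    """Return titles whose page props include "disambiguation".

    Expects the parsed reply to action=query&prop=pageprops&ppprop=disambiguation
    requested with formatversion=2, where "pages" is a list rather than a dict.
    """
    pages = response.get("query", {}).get("pages", [])
    return [p["title"] for p in pages if "disambiguation" in p.get("pageprops", {})]


# Made-up reply: only the first page was tagged with __DISAMBIG__.
raw = """{"query": {"pages": [
  {"pageid": 1, "ns": 0, "title": "Mercury", "pageprops": {"disambiguation": ""}},
  {"pageid": 2, "ns": 0, "title": "Mercury (planet)"}
]}}"""

print(disambiguation_titles(json.loads(raw)))  # ['Mercury']
```

A page on a wiki that dropped the magic word simply lacks the prop, so it would never show up here. That is the gap described above.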

Jura1 (talkcontribs)

The other day, I tried adding an infobox to an enwiki article to figure out what it displays... I think it only showed P18. So explaining what may be the modern way probably still has a long way to go.

From my pov, the main problem with these items is that they may bury links to actual articles (similar to the Malay link you mentioned). Removing these links should probably be the priority. Identification would probably need some fine tuning for each wiki.

The other problem is that items that only link disambiguation pages are generally not useful for any other purpose at Wikidata. As we have gotten better with constraints and systematically add P31 and descriptions to these, I think this is less of an issue now. Maybe Duplicity should skip them directly.

Matěj Suchánek (talkcontribs)

the main problem with these items is that they may bury links to actual articles - yes, this was the trigger of my efforts. By the way, there are 85,026 entries in the tool's database (which we may consider the upper bound).

Anyway, I'll see what can be done about those "mis-flagged" pages. Unless something better comes up, I'll write a bash script with wiki-category pairs that marks the matching pages as FALSE, and run it periodically.
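The wiki-category pair idea boils down to a simple rule. A tool entry is flagged as a false positive when its page sits in a listed category on the same wiki. Below is a minimal sketch in Python rather than bash. The category names, the `false_positives` function and the sample data are all hypothetical, and the real pairs would need tuning per wiki.

```python
from typing import Dict, Iterable, Set, Tuple

Page = Tuple[str, str]  # (wiki database name, page title)

# Hypothetical (wiki, category) pairs; members of these categories are treated
# as legitimately linked and are not offered for disconnection.
SKIP_PAIRS: Set[Tuple[str, str]] = {
    ("enwiki", "Set index articles"),
    ("enwiki", "Surnames"),
}


def false_positives(entries: Iterable[Page],
                    categories: Dict[Page, Set[str]],
                    skip_pairs: Set[Tuple[str, str]] = SKIP_PAIRS) -> Set[Page]:
    """Return the entries that sit in a skip-listed category on their own wiki."""
    flagged = set()
    for wiki, title in entries:
        for category in categories.get((wiki, title), set()):
            if (wiki, category) in skip_pairs:
                flagged.add((wiki, title))
                break
    return flagged


entries = [("enwiki", "Smith"), ("enwiki", "Mercury"), ("dewiki", "Smith")]
categories = {
    ("enwiki", "Smith"): {"Surnames"},
    ("enwiki", "Mercury"): {"Disambiguation pages"},
    ("dewiki", "Smith"): {"Surnames"},  # the pair is enwiki-only, so not flagged
}
print(false_positives(entries, categories))  # {('enwiki', 'Smith')}
```

Keying on (wiki, category) rather than on the category name alone keeps a rule written for one wiki from leaking into another wiki's entries.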

Jura1 (talkcontribs)

Couldn't we disconnect the others directly?

If it helps, I can add some statement to category items for those categories that may include pages that can be linked from dab items.

Matěj Suchánek (talkcontribs)
Jura1 (talkcontribs)

Yeah, or category combines topic: "para-disambiguation and disambiguation", applies to part: enwiki

Is there a way to list the categories that the 85,000 sitelinks (or at least the enwiki ones) are in?

For enwiki, I think anything named "set index on" or "disambiguation" shouldn't be bothered. Maybe some others too.

Matěj Suchánek (talkcontribs)

To be honest, none that I'd be keen to start on. (I cannot join tables from different databases, page titles are stored differently, not to mention how clumsy I am when working on Toolforge...)
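One way around the missing cross-database join is to fetch both result sets separately and join them in a script. Below is a minimal sketch. It assumes the tool stores main-namespace sitelink titles with spaces, and that the enwiki replica yields (page title, category) pairs with underscores and no namespace prefix. Both assumptions, the function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_db_key(sitelink_title: str) -> str:
    """Sitelink title -> page_title form (main namespace only).

    Assumes the title has no namespace prefix; the replica stores
    underscores where the sitelink has spaces.
    """
    return sitelink_title.replace(" ", "_")


def join_categories(sitelinks: Iterable[str],
                    category_rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each sitelink title to the (sorted) categories its page is in."""
    by_page = defaultdict(list)
    for page_title, category in category_rows:
        by_page[page_title].append(category)
    # .get() does not insert keys into the defaultdict, so misses stay empty.
    return {title: sorted(by_page.get(to_db_key(title), [])) for title in sitelinks}


# Made-up sample: titles from the tool, rows from the replica.
sitelinks = ["Mercury", "New York"]
rows = [
    ("New_York", "Place name disambiguation pages"),
    ("New_York", "Disambiguation pages"),
    ("Mercury", "Set index articles"),
]
print(join_categories(sitelinks, rows))
```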

Jura1 (talkcontribs)

Let me think about it. How many of the 85,000 are on enwiki?

Matěj Suchánek (talkcontribs)

34,727 (obviously the first)

Jura1 (talkcontribs)
Matěj Suchánek (talkcontribs)
Jura1 (talkcontribs)

It mentions Quarry explicitly.

I did some checks on the 20 items I found with PetScan and w:Category:Living people. Interesting sample. Some have items, others not. Some happened through page moves, page conversions or merely interwiki additions by users or the bot mentioned above.

Matěj Suchánek (talkcontribs)

Whatever, Quarry can only access public replicas, not the tool's private one (it would need to know my credentials etc.).

FYI EmausBot scoring again.

Jura1 (talkcontribs)
Jura1 (talkcontribs)

I will have a look at the tool.

Personally, I still like my plan outlined some time ago. I don't think we should invest too much time in manual edits in this.

Reply to "cleanup"

Mark for translation

Summary by Epìdosis

Done

Epìdosis (talkcontribs)

Could you mark this for translation? Thank you!

Matěj Suchánek (talkcontribs)

Done. (I usually check pages to mark on weekends but yesterday I forgot again.)

Property P361 in items of Moscow Metro stations

Michgrig (talkcontribs)

Hi Matěj,

Based on your batch 2886, QuickStatementsBot removed property P361 from items of Moscow Metro stations (example). If this property is not meant for this case, please advise us which property we should use to say that those stations are part of the Moscow Metro.

Matěj Suchánek (talkcontribs)
Reply to "Property P361 in items of Moscow Metro stations"
Oravrattas (talkcontribs)

Hi,

Your bot appears to be removing important information from labels of elections: e.g. turning "Saint Lucian general election, 2016" into simply "Saint Lucian general election", or "United Kingdom general election, 2015" into "United Kingdom general election". Can you please prevent it from doing this?

Matěj Suchánek (talkcontribs)

I have stopped it for a while.

Oravrattas (talkcontribs)

Thanks. Are you able to see what other edits like this it might have made, to undo them?

Matěj Suchánek (talkcontribs)

I have made a list of items that were modified by my bot with "cleanup" at the end of the edit summary. Certainly not all of those 40,000 items have this problem. I will scan all of those changes and see what I can do. Do you think this is specific to English?
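Scanning those changes amounts to comparing each item's label before and after the bot edit and flagging edits that only stripped a trailing year. Below is a minimal sketch. It assumes the old and new labels are available per item, for example from the edit diffs. It only recognizes the English ", YYYY" suffix shape, so other languages would need their own patterns. The names and sample data are hypothetical.

```python
import re

# A label such as "United Kingdom general election, 2015": base + ", " + 4-digit year.
TRAILING_YEAR = re.compile(r"^(?P<base>.+), (?P<year>\d{4})$")


def lost_year(old_label: str, new_label: str) -> bool:
    """True if the edit did nothing but strip a trailing ", YYYY" from the label."""
    match = TRAILING_YEAR.match(old_label)
    return bool(match) and match.group("base") == new_label


edits = {  # hypothetical item -> (label before, label after)
    "itemA": ("United Kingdom general election, 2015", "United Kingdom general election"),
    "itemB": ("Saint Lucian general election, 2016", "Saint Lucian general election, 2016"),
}
to_restore = [item for item, (old, new) in edits.items() if lost_year(old, new)]
print(to_restore)  # ['itemA']
```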

Reply to "removing years from Election labels"
Gkml (talkcontribs)
Matěj Suchánek (talkcontribs)

You will find more information on Help:Merge. In short, you will either:

  1. activate the tool Merge in the preferences
  2. or use this form
Reply to "Thanks to inform me"

Why require {{Q|21503247}} {{P|131}} for {{P|197}}?

Liuxinyu970226 (talkcontribs)

I don't think stations in Singapore even need P131 value(s).

Matěj Suchánek (talkcontribs)
Reply to "Why require {{Q|21503247}} {{P|131}} for {{P|197}}?"

Pages that link to Wikidata (Q2013)

Summary by Matěj Suchánek

This will have been done by the end of the weekend.

Dcljr (talkcontribs)

At Wikidata:Project chat, we said:

Many pages, including things like Berlin Hauptbahnhof (Q1097) and .NET Framework (Q5289), are linked directly to Wikidata (Q2013), apparently through references on certain properties that say they are "imported from Wikidata". Is this legit? Should a Wikidata item be allowed to cite Wikidata as a source? - dcljr (talk) 06:52, 24 August 2017 (UTC)
These references should be replaced using inferred from (P3452). I have got an approval for my bot to clean up this kind of sourcing, I could also take a look at those. Matěj Suchánek (talk) 07:10, 24 August 2017 (UTC)
[…]

Since this exchange has now been archived, I just wanted to remind you about it and ask: is there any progress on this? Have you decided what, if anything, should be done?

Matěj Suchánek (talkcontribs)

Thanks for your reminder. There hasn't been any progress yet because my bot was completely occupied with another (major) task... until yesterday (yes, it took months). So now I will try to focus on this one. I will notify you about my progress here later.
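A first step for such a cleanup is finding the statements whose references cite Wikidata itself. Below is a minimal sketch over an item's JSON. It assumes these references use imported from Wikimedia project (P143) with the value Wikidata (Q2013). It only detects them and does not decide what the replacement inferred from (P3452) value should be. The function name and the sample statement IDs are hypothetical.

```python
from typing import Iterator, Tuple

WIKIDATA = "Q2013"
IMPORTED_FROM = "P143"  # imported from Wikimedia project (assumed reference property)


def self_citing_statements(entity: dict) -> Iterator[Tuple[str, str]]:
    """Yield (property, statement ID) for statements that have a reference
    whose P143 value is Wikidata (Q2013)."""
    for prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            for reference in statement.get("references", []):
                snaks = reference.get("snaks", {}).get(IMPORTED_FROM, [])
                if any(s.get("snaktype") == "value"
                       and s["datavalue"]["value"].get("id") == WIKIDATA
                       for s in snaks):
                    yield prop, statement["id"]
                    break  # one matching reference is enough per statement


def imported_from(item_id: str, numeric_id: int) -> dict:
    """Build a made-up reference block with a single P143 snak."""
    return {"snaks": {IMPORTED_FROM: [{
        "snaktype": "value",
        "datavalue": {"type": "wikibase-entityid",
                      "value": {"entity-type": "item",
                                "numeric-id": numeric_id, "id": item_id}}}]}}


# Made-up item: one statement sourced to Wikidata, one to English Wikipedia (Q328).
sample = {
    "claims": {
        "P31": [{"id": "EXAMPLE$1", "references": [imported_from("Q2013", 2013)]}],
        "P17": [{"id": "EXAMPLE$2", "references": [imported_from("Q328", 328)]}],
    }
}
print(list(self_citing_statements(sample)))  # [('P31', 'EXAMPLE$1')]
```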

Dcljr (talkcontribs)
Matěj Suchánek (talkcontribs)
TomT0m (talkcontribs)
Matěj Suchánek (talkcontribs)
TomT0m (talkcontribs)

Seriously, Lua functions do not handle redirects? We should fix them, then. I'll check with the constraint reports maintainer whether it's possible to do a second pass over the constraint violations to see if a perceived violation is a redirect, and remove it from the result in that case.
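The suggested second pass could look like the following. For each reported violation whose value is a redirect, follow the redirect and re-run the check on the target. If the target passes, drop the report. Below is a minimal sketch. The redirect map, the `satisfies` check and the item names are hypothetical stand-ins for whatever the constraint report actually has available.

```python
from typing import Callable, Dict, List, Tuple

Violation = Tuple[str, str]  # (subject item, value item that looked wrong)


def resolve(item: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow item redirects; the hop limit guards against redirect loops."""
    for _ in range(max_hops):
        if item not in redirects:
            break
        item = redirects[item]
    return item


def second_pass(violations: List[Violation],
                redirects: Dict[str, str],
                satisfies: Callable[[str], bool]) -> List[Violation]:
    """Drop violations whose value is a redirect to an item that passes the check."""
    kept = []
    for subject, value in violations:
        if value in redirects and satisfies(resolve(value, redirects)):
            continue  # only "violated" because the check did not follow the redirect
        kept.append((subject, value))
    return kept


# Made-up data: itemX was merged into itemZ, and only itemZ satisfies the constraint.
violations = [("itemA", "itemX"), ("itemB", "itemY")]
redirects = {"itemX": "itemZ"}
print(second_pass(violations, redirects, lambda item: item == "itemZ"))  # [('itemB', 'itemY')]
```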

Jura1 (talkcontribs)

For items that are instances of Q47150325, the sitelink titles aren't necessarily ideal labels.

Matěj Suchánek (talkcontribs)

I agree. But we could run a bot over all of them with the correct label (if I had a pattern for each language).

Jura1 (talkcontribs)

For the languages I had patterns for, it's mostly done. Could you skip these for now? It took considerable effort to delete all the Wikinews strings in these items.

Matěj Suchánek (talkcontribs)

This is a query that can be used to import labels in any language if the language has a proper label for the given month:

SELECT ?item ?lang ?new WITH {
  SELECT DISTINCT ?lang ?month ?pattern {
    VALUES ?lang { 'es' 'tr' } .
    ?item wdt:P31 wd:Q47150325;
          wdt:P585 ?date;
          rdfs:label ?label FILTER( LANG( ?label ) = ?lang ) .
    ?item ^schema:about [ schema:inLanguage ?lang; schema:name ?title ] FILTER( ?title != ?label ) .
    BIND( YEAR( ?date ) AS ?year ) .
    BIND( MONTH( ?date ) AS ?month ) .
    BIND( DAY( ?date ) AS ?day ) .
    BIND( REPLACE( ?label, STR( ?year ), '\\$year' ) AS ?pattern1 ) .
    BIND( REPLACE( ?pattern1, STR( ?day ), '\\$day' ) AS ?pattern ) .
    FILTER( ?label != ?pattern1 && ?pattern1 != ?pattern ) .
    FILTER( !REGEX( ?pattern, '\\$day.*\\$day' ) ) .
  }
} AS %patterns WHERE {
  INCLUDE %patterns .
  ?item wdt:P31 wd:Q47150325;
        wdt:P585 ?date .
  FILTER( MONTH( ?date ) = ?month ) .
  BIND( REPLACE( ?pattern, '\\$year', STR( YEAR( ?date ) ) ) AS ?_new ) .
  BIND( REPLACE( ?_new, '\\$day', STR( DAY( ?date ) ) ) AS ?new ) .
  MINUS { ?item rdfs:label ?new } .
} ORDER BY ?date ?lang
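The query's trick is to take one known-good label per month, turn it into a template by replacing the year and then the day with `$year` and `$day`, and fill that template in for every other item of the same month. Below is a minimal Python sketch of that logic for a single language, under stated assumptions. The sample labels are assumed to be already filtered to those that differ from the sitelink title, as the subquery does. The function names, item names and data are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Set, Tuple


def derive_pattern(label: str, when: date) -> Optional[str]:
    """Turn a good label into a template with $year and $day slots.

    Mirrors the query: replace the year first, then the day; reject labels where
    either substitution changes nothing or the day appears more than once.
    """
    with_year = label.replace(str(when.year), "$year")
    pattern = with_year.replace(str(when.day), "$day")
    if with_year == label or pattern == with_year or pattern.count("$day") > 1:
        return None
    return pattern


def fill(pattern: str, when: date) -> str:
    return pattern.replace("$year", str(when.year)).replace("$day", str(when.day))


def propose_labels(samples: List[Tuple[str, date]],
                   items: Dict[str, date],
                   existing: Dict[str, str]) -> List[Tuple[str, str]]:
    """Propose (item, label) pairs for items whose month has a usable pattern."""
    by_month: Dict[int, Set[str]] = {}
    for label, when in samples:
        pattern = derive_pattern(label, when)
        if pattern is not None:
            by_month.setdefault(when.month, set()).add(pattern)
    proposals = []
    for item, when in sorted(items.items(), key=lambda kv: kv[1]):  # like ORDER BY ?date
        for pattern in sorted(by_month.get(when.month, ())):
            new = fill(pattern, when)
            if existing.get(item) != new:  # like MINUS { ?item rdfs:label ?new }
                proposals.append((item, new))
    return proposals


samples = [("1 de enero de 2018", date(2018, 1, 1))]
items = {"itemA": date(2019, 1, 5), "itemB": date(2019, 2, 5), "itemC": date(2018, 1, 1)}
existing = {"itemC": "1 de enero de 2018"}
print(propose_labels(samples, items, existing))  # [('itemA', '5 de enero de 2019')]
```

Here itemB gets nothing because no February pattern was found, and itemC is skipped because it already has the proposed label.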


Jura1 (talkcontribs)

Looks good. It seems to be endless work, though, as your bot keeps adding more:

Reply to "labels"