Topic on User talk:Oravrattas

Pallor (talkcontribs)

Please don't do this, because you will delete data. If you are unsatisfied with the form of data entry, you have the option to display the data arranged in the form you think is appropriate, but the sudden deletion of the data is against the interests of the project. Pallor (talk) 21:54, 21 January 2023 (UTC)

Oravrattas (talkcontribs)
Pallor (talkcontribs)

Again, I ask you not to delete sourced data from Wikidata. I draw your attention to two important facts:

  1. you have the opportunity to correct the data you believe to be incorrect
  2. "It doesn't make sense" is not an argument for deletion; the person who uploaded the data presumably saw meaning in it.

If you don't like it, just leave it and move on.

Oravrattas (talkcontribs)

Data entered in this way is fundamentally incorrect, and no more valid than a single statement on Elizabeth Taylor (Q34851) like:

spouse (P26): Richard Burton (Q151973), normal rank
    start time: 15 March 1964
    start time: 10 October 1975
    end time: 26 June 1974
    end time: 29 July 1976

A human can certainly make some sense out of a claim like this, but this is not how we choose to enter data, because Wikidata is primarily for machines to understand, and so has certain semantics for what data means. Glomming multiple incompatible qualifiers onto a single statement causes many standard SPARQL queries to produce incorrect results, and makes it impossible to add any further qualifiers without making things even worse.
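To illustrate the point about incompatible qualifiers, here is a minimal Python sketch. It assumes that a query matches the start and end qualifiers of a statement independently, which is roughly how a join over both qualifiers behaves. The dates are the ones from the example statement above; the dict layout and the `query_rows` helper are hypothetical and not part of any Wikidata tooling.

```python
from itertools import product

# Qualifier values copied from the example statement above. The dict layout
# is a hypothetical stand-in for how a query engine sees one statement.
statement = {
    "start time": ["1964-03-15", "1975-10-10"],
    "end time": ["1974-06-26", "1976-07-29"],
}


def query_rows(stmt):
    """Return every (start, end) row that a naive join over the qualifiers yields."""
    return list(product(stmt["start time"], stmt["end time"]))


rows = query_rows(statement)
for start, end in rows:
    # ISO-formatted dates compare correctly as plain strings.
    note = "" if start <= end else "  <- ends before it starts"
    print(start, end, note)
```

Two marriages come back as four rows. One row ends before it starts, and another spans both marriages as if they were a single period. Nothing in the statement itself says which start belongs to which end.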

In the specific case here—that of someone holding the same political office over multiple consecutive periods—the standard data model provides two ways to enter that information. I corrected the data to the format of the first approach. Moving it to the second approach is significantly more time-consuming, and so I chose not to do so here, given the first approach fundamentally does not lose any information.

Pallor (talkcontribs)

Unfortunately, we are not talking about the same thing.

You are talking about the structure of the data, which can be the topic of a separate discussion, and I am talking about whether one editor has the right to delete what another editor entered into the database and supported with a source.

The part of your answer that refers to the latest data deletion is false. You write that you repaired the element, but this is not true: you damaged it considerably.

There is no query that shows the finance minister for the governments involved since you deleted the governments. There is no query that would show who were the representatives of Fidesz in the relevant legislative term, since you deleted the qualifier for the term. (Of course, you can write a query in which you enter the start and end dates of the term, but this is more difficult because you have to look up the data).

The merged terms can be broken apart with a bot afterwards, because the boundary dates (beginning and end of the term) are available, but if you erase important data, the person operating the bot has to sweat blood to reconstruct, for example, governments or legislative terms from the incomplete data.
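The date part of such a bot pass could be sketched as follows. This is only an illustration under assumptions: the term labels and boundary dates are made up, `split_by_terms` is a hypothetical helper, and writing the results back to Wikidata is not shown.

```python
from datetime import date


def split_by_terms(start, end, terms):
    """Split one merged [start, end] mandate into one piece per overlapping term.

    `terms` maps a term label to its (first day, last day).
    """
    pieces = []
    for label, (term_start, term_end) in terms.items():
        piece_start = max(start, term_start)
        piece_end = min(end, term_end)
        if piece_start <= piece_end:  # the mandate overlaps this term
            pieces.append((label, piece_start, piece_end))
    return pieces


# Hypothetical boundary dates, for illustration only.
terms = {
    "term A": (date(2000, 1, 1), date(2003, 12, 31)),
    "term B": (date(2004, 1, 1), date(2007, 12, 31)),
    "term C": (date(2008, 1, 1), date(2011, 12, 31)),
}

merged = (date(2001, 6, 1), date(2007, 12, 31))
pieces = split_by_terms(*merged, terms)
for label, piece_start, piece_end in pieces:
    print(label, piece_start, piece_end)
```

The sketch only recovers dates. Any qualifier that was removed from the merged statement and cannot be derived from the dates would still have to come from somewhere else.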

Once again, I ask you not to delete sourced data from Wikidata. This data is used by other queries (such as this and this, etc.), and information is lost if you start deleting it at random. Fix it or leave it.

Oravrattas (talkcontribs)

As far as I can see Wikidata:WikiProject every politician/Hungary/Cabinet/Q111475445 does not use the P5054 qualifier anywhere within the query, so I'm unsure how deleting it from any statements would affect the result. Rather it takes the approach of looking for P39 statements with dates that intersect with boundary dates of the bound cabinet item.
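The date-intersection approach described above can be sketched in Python. This is not the query used on that page, only an illustration of the overlap test it relies on; the cabinet dates, the office holders, and the `overlaps` helper are all hypothetical.

```python
from datetime import date


def overlaps(start, end, bound_start, bound_end):
    """True if [start, end] intersects [bound_start, bound_end].

    `end` or `bound_end` may be None, meaning "still ongoing".
    """
    starts_before_bound_ends = bound_end is None or start <= bound_end
    ends_after_bound_starts = end is None or end >= bound_start
    return starts_before_bound_ends and ends_after_bound_starts


# Hypothetical cabinet boundaries and office holders, for illustration only.
cabinet = (date(2010, 1, 1), date(2013, 12, 31))
holders = [
    ("minister X", date(2008, 1, 1), date(2010, 6, 30)),
    ("minister Y", date(2011, 3, 1), None),
    ("minister Z", date(2014, 2, 1), None),
]

matched = [name for name, start, end in holders if overlaps(start, end, *cabinet)]
print(matched)
```

Because the match depends only on dates, it still works for statements that lack a cabinet or term qualifier. The trade-off is that it cannot tell apart two bodies that were active over the same period.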

A better version of the legislative term members query (which IIRC is in use for similar pages in some other countries) would operate in the same manner, and be more generically useful, as in most countries relying on a P2937 qualifier will miss a lot of data. (In the case of Wikidata:WikiProject every politician/Hungary/data/Assembly/1998-2002 mentioned above, for example, it would also find István Csurka (Q721791) and István Nyakó (Q51880837) who both have P39s that span the term, but are missing a P2937 qualifier to it, and are thus currently absent from the table.)

These sorts of queries also show why the approach of glomming multiple P2937 or P5054 qualifiers onto a single statement produces broken data. Simply adding an additional column for, say, elected in (P2715), produces a query equivalent to:

SELECT ?item ?itemLabel ?start ?end ?district ?party ?election ?electionLabel
WHERE
{
  # P39 (position held) statements for Q17590876 that carry a
  # P2937 (parliamentary term) qualifier pointing at Q50357311
  ?item p:P39 ?statement .
  ?statement ps:P39 wd:Q17590876 ; pq:P2937 wd:Q50357311 .
  OPTIONAL { ?statement pq:P580 ?start }      # start time
  OPTIONAL { ?statement pq:P582 ?end }        # end time
  OPTIONAL { ?statement pq:P768 ?district }   # electoral district
  OPTIONAL { ?statement pq:P4100 ?party }     # parliamentary group
  OPTIONAL { ?statement pq:P2715 ?election }  # elected in
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

This tells us that Zoltán Balog (Q218882) and Imre Pesti (Q1233411) were elected to the 2006–2010 legislative term (Q50357311) in the 2014 Hungarian parliamentary election (Q4275093), which is clearly wrong.

Pallor (talkcontribs)

Again, this is only about the data structure. That was useful, but I'd really like to get a response like "ok, I won't delete any more data". Unfortunately, this conversation will never be useful for both of us: I have been trying to achieve this since January, but you avoid responding. The funny thing is that I have understood what you write from the very first moment, but you don't respond to what I ask.

The reason for combining the data of several cycles is that much of the data comes from several sources and has to be uploaded individually; the transfer to Wikidata cannot be sped up with a bot or QuickStatements. Therefore, the time spent on data entry must be shortened. If nothing changes during a representative's mandate (he is elected in the same electoral district, represents the same party, does not switch to another faction, and does not interrupt his mandate), then there is a good chance that the data of several cycles will be entered in the same P39. If any of these changes, I separate the cycles. I have to come up with an optimal solution not only for today's and recent elections, but also for the 15,000 mandates of more than 40 national election cycles over 175 years. When all the data is uploaded and the accompanying data is clarified, the cycles can be separated with a bot (if anyone has the capacity to do so).

About P2715: I may be narrow-minded, but I find it almost impossible to have precise P2715 data for any country. One must think not only of the national parliamentary elections, but also of the regional and/or interim elections of one hundred to one hundred and fifty years ago, i.e. every electoral event through which a representative entered parliament. It is also necessary to record the repeated elections ordered after successfully appealed results. That means that, within a single legislative cycle, data on approximately 20-30 (or even more) elections should be available. In the case of Hungary this would be, in order of magnitude, more than a thousand election elements before P2715 could be used. There are data on 42 cycles in Hungary, 58 in the United Kingdom, and 120 in the United States, with an unimaginable number of elections: do you think it is really a realistic expectation that there should be an element for each of these? Are there historical sources for all the small local by-elections that got representatives into parliament? If, on the other hand, there is not an element for every election, isn't it a conceptual error to build the data structure on that? I'm afraid (at least that's how I see it at the moment) that I won't be the one to create the elements for all the Hungarian elections that would allow P2715 to be used.

You are right that P5054 is not used by the Q111475445 list, but this also has its drawback: I could not compile the composition of the historical counter-governments (governments operating in overlapping periods). Although I have only tried this for Hungary, are you sure that it is possible to compile a list of the governments of the Spanish Civil War period, or of the governments of any occupied country during the Second World War, without using P5054? But regardless of whether that succeeds or not, I use P5054 in various queries for data entry and verification, and that is reason enough to expect another user not to delete it. I hope you also sense the contradiction: you warn that certain queries won't work when representative seats are entered with combined legislative cycles, and then delete data without which my queries REALLY won't work.

Thank you for pointing out the incomplete data, all the more so because some of it concerns a term that I myself thought was accurate (post-2002). It is always possible to improve, but if you delete data at random, you make the other person's job extremely difficult, so I ask you (not for the first time) not to do this.
