Wikidata talk:Data donation



Hi @Lydia Pintscher (WMDE):

Data donations need champions feeling responsible for making them a success in the long run.

This phrase makes me a little uncomfortable. "Champions", really? I understand Wikidata needs to be kind to contributors, but ... I personally would feel like a child being called a "champion of data maintenance" :) I realize I'm even more uncomfortable than that. This is a collaborative project. Maintaining data is a collaborative effort; it feels a little weird and hell of conservative to suppose a donor has a solution for maintenance problems. It could even be a love killer for him ... it would be a shame if he didn't even try to go to the bot import page when he has good data but does not feel so sure of himself. TomT0m (talk) 19:20, 8 April 2015 (UTC)

I think this part is absolutely crucial for the long-term viability of Wikidata. I've talked to hundreds of organisations, companies and institutions interested in donating data. Only 2 understood initially that they cannot just dump their data into Wikidata and expect magic to happen. They need to understand that working with Wikidata is more than just throwing some data over the fence, and they won't unless we tell them very explicitly. The text also says that it doesn't have to be them doing the long-term work. All in all, survival of the community is much, much more important than having a bit more data for Wikidata. --Lydia Pintscher (WMDE) (talk) 08:53, 9 April 2015 (UTC)
@Lydia Pintscher (WMDE): I'm doubtful. Predicting the future is kind of a hard exercise; pretty much everyone who tried was wrong ... Beyond the chicken and egg problem (no data => no data, and nothing to maintain), nothing says that the path you envision is the right one and that this is the path that will make it work. Wikimedia projects have always been projects where users were encouraged to make any kind of edit, the WMF is currently hungry for users, and we take the problems and figure them out as they come. Isn't it a little counterproductive to scare contributors like that? After all, they do what they can, like the rest of us. If their data proves to be useful, no doubt the community will figure something out along the way. I think of OSM. Who would have contributed by mapping their neighborhood if someone had told them "you first have to organise things with your neighbors so that they maintain the data, even if you move away"? The maintenance of data is of course also a challenge for OSM, I guess, but the initial effort is also a big thing. The project needed to be credible before some local administrations would look into helping it, for example. TomT0m (talk) 15:03, 10 April 2015 (UTC)
We don't have the issue of not having data. We do have data. And we are already struggling to keep it in good shape. So we need to put measures in place to make it work. I am not predicting the future. I am looking at where we are right now and the mindset big data donations are offered with. We're not prohibiting anything but really people need to understand that there is more to it than just dropping a big pile of data. --Lydia Pintscher (WMDE) (talk) 15:11, 10 April 2015 (UTC)
@Lydia Pintscher (WMDE): How do we compare to Freebase? I have the impression that at the moment we still have only a fraction of Wikipedia's infoboxes. The recent surreal discussion with a Wikipedian about whether or not Wikidata is a reliable source told me that some Wikipedians still do not really get Wikidata, it seems. So no, I would not say Wikidata has the data. We have not totally bootstrapped yet, and we don't really have the Wikimedian community yet. This is a crucial step towards data maintenance, isn't it? Maybe a little bait like more data would speed up the infobox migrations, which do not seem fast at all. TomT0m (talk) 15:33, 10 April 2015 (UTC)
We are at about half of Freebase. And we do not have all of Wikipedia's infoboxes. The biggest missing pieces are due to technical limitations, namely unit support. The issue with the discussion you link to is not the amount of data we have but trust in the data we have. Which is exactly what I am getting at. We need to make sure the data we have is in good shape so these trust issues are addressed. We're not making this faster by taking on a lot more data and maintaining it worse. --Lydia Pintscher (WMDE) (talk) 15:47, 10 April 2015 (UTC)
@Lydia Pintscher (WMDE): If we don't get Wikipedians, we'll lose manpower for maintenance. Your equation seems wrong: "more data = more maintainers", and more maintenance automation efforts because Wikidata will become more important, is a point of view that can totally be defended. Plus, Wikipedians demand reliability; you demand reliable data plus maintenance, so you ask more of importers than Wikipedians ask in order to use (and thus maintain?) the data ... Back to the chicken and egg problem. If organisations have data, it often more than meets the standard admissibility criteria on Wikipedia, as we could cite those organisations as sources. TomT0m (talk) 15:55, 10 April 2015 (UTC)
Tom: I happen to agree with Lydia. We need curators. Maybe this is a better word. We as individual contributors (i.e. not usually aligned with an organization) cannot and should not be expected to maintain other people's data without those people helping us in return. We simply do not have the manpower even if you include the vast number of contributors to Wikimedia wikis and the increasing power of automation we have with Widar and other tools.

Your analogy regarding OSM is flawed. Correct me if I'm wrong, but OSM was predominantly built by individual persons, some of whom have tirelessly donated their time and data to that effort. By extension this means that the data on OSM was built in a way which was maintainable for that community. (I'm sure they've had automation too....) One of the problems we have here (Wikipedia has this problem too, at least) is that we need experts who know what they're doing and why they're doing it when they're interacting with a data item, and this is another reason we should want curators/champions. Thirdly, it's damaging to our relationship with these organizations when they have the incorrect expectation that magic will happen, and then we won't get new data to play with because they will have abandoned our project after "it went wrong the first time". And not having more data to play with when we've agreed to have that relationship is unequivocally bad. I would rather not have a relationship, or have a very limited one where everyone is happy with the data, than have a relationship that fails and leaves us at Wikidata with the bad-faith feelings of failure. --Izno (talk) 19:27, 10 April 2015 (UTC)

I think this is far more complicated for OSM. It used, for example, vast amounts of public data, such as public-domain US government data, and mass imports like this data given by a company. This video of the visual growth shows vast data imports at once, then probably community activity in very populated places. But the community wanted to map the world; it seems that they refused outdated data, but they did not fear that data from unpopulated places was unlikely to be maintained often by a local community; they imported away. Of course Wikidata is different, but the question we need to understand is "how"? Do the companies that have data and want to import it really want to import it just to dump it? I doubt it. It's because they think that Wikidata is something interesting that is building something new, or that they will be able to cooperate. This is a neutral place for cooperation. The forms this cooperation will take are yet to be invented. Otherwise it's just business as usual for them. And Wikidata is just yet another database? TomT0m (talk) 10:19, 11 April 2015 (UTC)
TomT0m, Wikidata has nearly two million taxa. All these data came from Wikimedia projects! So I'd love to have some Wikimedians checking names or providing sources for current taxonomic viewpoints. Having experts or „champions” or curators, as Izno put it, would be very helpful. Just dreaming. --Succu (talk) 20:12, 10 April 2015 (UTC)
@Succu: Seems a little out of scope, but it does seem like a dream indeed to have one champion verifying alone :) A question about taxa: are the data actually used in Wikipedia? There are active communities of Wikipedians working on taxonomy; one crucial step is to make them cooperate. We don't need a champion, we need consistent cross-Wikipedia teamwork in that area. That seems not so obvious to the local communities, who seem to somehow distrust or be wary of each other ... They fear losing control, for example. Cross-language discussion can be a problem. If data maintenance or infobox building discussion is to take place mainly on Wikidata, we'll have to do some social engineering or figure out what does not work to make cooperation a reality. TomT0m (talk) 09:55, 11 April 2015 (UTC)
It's not really out of scope. For example, Lsjbot (Q17430942) (run by Sverker Johansson (Q17417773)) dumped more than 1,000,000 article stubs into svwiki, warwiki and cebwiki, all of low quality and taken from a database full of errors. The Wikipedians of these communities don't care about this fact. But we have to maintain the basic data, creating hints like Wikimedia duplicated page (Q17362920) and finding reliable sources. So this situation is similar to data donations by third parties. Curating a database is a full-time job. A volunteer can't fulfil this job. The donor should champion his data to keep it in good shape. Sverker didn't. „Social engineering” can help to solve special taxonomic or nomenclatural questions. „Social engineering” is worthless for curating a bulk of 2,000,000 items. Regards --Succu (talk) 19:06, 11 April 2015 (UTC)
@Succu: This policy won't change that a bit. A Wikipedian who created a lot of articles will see the corresponding items created here. Plus, low-quality data is an argument that is totally understandable in the importer bot discussion. It is hardly a maintenance problem. TomT0m (talk) 14:18, 12 April 2015 (UTC)
TomT0m, this is not a Wikidata policy and was never meant as such. It's only an informal page. One of the things it tries to tell a visitor is: if you want to donate a (large) amount of data, take care of it and make your data useful within Wikidata. That's all. --Succu (talk) 22:21, 12 April 2015 (UTC)
@Succu: I know that. All I'm saying is that by being too careful, we can sometimes end up not moving at all. Which is not good either. TomT0m (talk) 09:36, 13 April 2015 (UTC)

Large update to page

Hi all, I've been working on an updated version of the page, in which I have tried to simplify it for people who don't know about Wikidata. It still needs some work, but I would appreciate comments and additions.


--John Cummings (talk) 10:34, 4 January 2016 (UTC)

Recommended tools

How about replacing "Primary Sources" with QuickStatements (Q20084080)? The latter is maintained and suitable for large-scale imports. The import format for both is the same.
--- Jura 16:34, 16 April 2016 (UTC)

✓ Done
--- Jura 10:48, 23 April 2016 (UTC)
I don't think it was a good idea to remove the primary sources tool here. The workflow it supports of having people do a second check of large imports is very important. Why did you remove it and not just add the other one? --Lydia Pintscher (WMDE) (talk) 09:38, 25 April 2016 (UTC)
I think we should recommend tools that are known to work for recent large-scale imports and are maintained. Other tools are listed on the tools page anyway.
QuickStatements works for this and, afaik, has been used for most recent large-scale imports. Even Mix'n'match relies on it. I'm not aware of recent imports with the other tool that have been successful, and from the complaints I'm reading, it seems that maintenance has ceased. I think an attempt was made to use it for some data from enwiki, but its outcome was, let's say, "mixed".
--- Jura 11:54, 28 April 2016 (UTC)

"do not use the name of an organisation in the title"

Do we actually mean "do not use the name of an organisation as the title"? I am wondering since I often recommend that GLAM partners use "First Last (Organisation)" as the name of the account they use while at work (thereby allowing them to keep any private account separate). Looking at WMF/WMDE, it would also seem that the recommendation goes against the standard for naming work accounts. -- 15:12, 8 June 2016 (UTC)

divulging use of wikidata: adding entry visualising public knowledge

I would like to add an entry of a public tool for visualising wikidata.

I would like to add an entry here:

Q: what does the markup <! T:#number> and <tvar|link> refer to, and how do I use it?

The link above refers to the visualisation section, but the hyperlink is attached to the [edit] of the section above (bot): how can the structure of the main page be corrected? (I would have liked to do it myself, but I see markup that I have not seen in Wikipedia and don't want to mess up the page.) Gg4u (talk) 19:04, 14 January 2017 (UTC)Gg4u

DB rights

The text...

If you are an institution based in Europe, the whole of your ID list may be under database copyright

Not sure about other countries, but South Korean copyright law has a sui generis database right (Q688416), and in the Korean courts the DB right has been acknowledged (to the owner of a non-WMF wiki) in a civil case. Maybe this statement can be replaced with...

If you are an institution based in a country that protects database rights, the whole of your ID list may be under database copyright

or anything more like that. While I could be bold, I thought listening for 3rd party opinion doesn't harm. — regards, Revi 17:47, 5 November 2017 (UTC)

Renaming the page to something like Wikidata:Publishing data on Wikidata

The current title is not a correct description of the page and also suggests that people publishing data on Wikidata are giving something up, which is not very helpful. I think Wikidata:Publishing data on Wikidata would be a better description of the page. I understand there are a few extra technical steps to renaming a page with translations, and I have found out what to do.


--John Cummings (talk) 19:56, 25 April 2018 (UTC)