Wikidata talk:WikiProject Tabular data


Phabricator


COVID-19 tabular data


I am planning to import a COVID-19 case dataset (https://github.com/stccenter/COVID-19-Data) as tabular data on Commons, and would like some input here. The dataset focuses on COVID-19 data for subnational divisions across the world. The data are collected, manually checked and curated by the NSF Spatiotemporal Innovation Center, which is jointly operated by George Mason, Harvard and UCSB. (COI disclosure: I am collaborating with them on this project to collect COVID-19 data, though I am not a member of the institution.) Currently there are data from about 4,500 administrative divisions. I have uploaded a few samples on Commons (for example [1], [2]), as well as a few summary tables (for example [3], [4]). Any thoughts? --Stevenliuyi (talk) 21:44, 7 September 2020 (UTC)

TiagoLubiana 01:35, 16 March 2020
Daniel Mietchen 01:42, 16 March 2020 (UTC)
Jodi.a.schneider 02:45, 16 March 2020 (UTC)
Chchowmein 02:45, 16 March 2020 (UTC)
Dhx1 03:38, 16 March 2020 (UTC)
Konrad Foerstner 06:02, 16 March 2020 (UTC)
Netha Hussain 06:19, 16 March 2020 (UTC)
Bodhisattwa 06:56, 16 March 2020 (UTC)
Neo-Jay 07:04, 16 March 2020 (UTC)
John Samuel 07:31, 16 March 2020 (UTC)
KlaudiuMihaila 07:53, 16 March 2020 (UTC)
Salgo60 09:11, 16 March 2020 (UTC)
Andrawaag 10:12, 16 March 2020 (UTC)
Whidou 10:16, 16 March 2020 (UTC)
Blue Rasberry 15:07, 16 March 2020 (UTC)
TJMSmith 16:15, 16 March 2020 (UTC)
Egon Willighagen 16:49, 16 March 2020 (UTC)
Nehaoua 20:32, 16 March 2020 (UTC)
Andy Mabbett (UTC)
Peter Murray-Rust 00:00, 17 March 2020 (UTC)
Kasyap 02:45, 17 March 2020 (UTC)
Denny 16:21, 17 March 2020 (UTC)
Kwj2772 16:56, 17 March 2020 (UTC)
Joalpe 22:47, 17 March 2020 (UTC)
Finn Årup Nielsen (fnielsen) 10:59, 18 March 2020 (UTC)
Skim 11:45, 18 March 2020 (UTC)
SCIdude 15:15, 18 March 2020 (UTC)
Evolution and evolvability 01:23, 20 March 2020 (UTC)
Susanna Ånäs (Susannaanas) 07:05, 20 March 2020 (UTC)
Mlemusrojas 15:30, 20 March 2020 (UTC)
Yupik 20:23, 20 March 2020 (UTC)
Csisc 23:05, 20 March 2020 (UTC)
OAnick 10:26, 21 March 2020 (UTC)
Gnoeee 12:28, 21 March 2020 (UTC)
Jjkoehorst 14:27, 21 March 2020 (UTC)
So9q 08:58, 22 March 2020 (UTC)
Nandana 14:58, 23 March 2020 (UTC)
Addshore 15:56, 23 March 2020 (UTC)
Librarian lena 18:19, 24 March 2020 (UTC)
Jelabra 19:19, 24 March 2020 (UTC)
AlexanderPico 23:34, 27 March 2020 (UTC)
Higa4 02:51, 29 March 2020 (UTC)
JoranL 19:56, 29 March 2020 (UTC)
Alejgh 11:04, 1 April 2020 (UTC)
Will (Wiki Ed) 17:36, 1 April 2020 (UTC)
Ranjithsiji 04:47, 2 April 2020 (UTC)
AntoineLogean 07:35, 2 April 2020 (UTC)
Hannolans 17:22, 2 April 2020 (UTC)
Farmbrough 21:15, 3 April 2020 (UTC)
Ecritures 21:26, 3 April 2020 (UTC)

Notified participants of WikiProject COVID-19 --Stevenliuyi (talk) 21:56, 7 September 2020 (UTC)

@Stevenliuyi: Looks good. I see that you are using cumulative data; that is, the November 2020 figure counts the cases that occurred during that month plus all previous cases, not just the cases that occurred during the month. We should either have a way to document that, or decide that all datasets should work the same way. Although cumulative data may seem more natural for COVID, I am not sure it would be the best solution overall, especially for longer time series. --Zolo (talk) 12:07, 8 March 2021 (UTC)
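
To make the cumulative versus per-period distinction concrete, here is a minimal Python sketch of converting between the two conventions. The function names and the figures are hypothetical and purely illustrative; they are not taken from any of the uploaded tables.

```python
from itertools import accumulate


def cumulative_to_increments(cumulative):
    """Turn running totals into per-period counts (first period keeps its total)."""
    increments = []
    previous = 0
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


def increments_to_cumulative(increments):
    """Turn per-period counts back into running totals."""
    return list(accumulate(increments))


# Illustrative numbers only: cumulative cases at the end of three months.
cumulative_cases = [100, 250, 400]
monthly_cases = cumulative_to_increments(cumulative_cases)
```

Since either form can be derived from the other, the main requirement is that each table states which convention it uses.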

Community Wishlist Survey 2021

user:Zolo user:Jheald user:Moebeus user:Stevenliuyi user:Theklan Waldyrious (talk) Sj

Notified participants of WikiProject Tabular data

I've added the following for the wishlist survey, that you may want to consider:

Jheald (talk) 19:46, 17 November 2020 (UTC)

Structuring and documenting tabular data


If we want data to be easily usable, we should try to have predictable, documented data structures.

For tabular case data (P8204), we already have at least two different structures:

The second file has more columns, which sounds ok. But there are also columns that have the same meaning in both files, but with different names. That does not sound good.

My proposal would be:

  • Recommend starting column names with the Wikidata property number when possible
  • Document how the data should be structured in the relevant Wikidata property. We could start with creating a "suggested fields" property-type property that would provide guidelines about how to use tabular-data properties. For instance:

<tabular case data (P8204)> <Recommended fields> point in time (P585), number of cases (P1603), etc.
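
As a rough illustration of how such a convention could be checked, here is a minimal Python sketch. It assumes a table schema shaped as a list of fields, each with a `name`, `type` and localizable `title`, which is how I understand the Commons tabular data format. The `P585_date`-style field names, the `RECOMMENDED_FIELDS` mapping and both helper functions are hypothetical; no "recommended fields" property exists yet.

```python
import re

# Hypothetical stand-in for the proposed "recommended fields" property.
RECOMMENDED_FIELDS = {"P8204": ["P585", "P1603"]}

# A field name counts as property-prefixed if it is "P<digits>" alone
# or "P<digits>_" followed by a readable suffix.
PROPERTY_PREFIX = re.compile(r"^(P\d+)(?:_|$)")


def property_ids(schema):
    """Return the property IDs that prefix the field names, in schema order."""
    ids = []
    for field in schema["fields"]:
        match = PROPERTY_PREFIX.match(field["name"])
        if match:
            ids.append(match.group(1))
    return ids


def missing_recommended(schema, recommended):
    """Return the recommended property IDs that no field name starts with."""
    present = set(property_ids(schema))
    return [pid for pid in recommended if pid not in present]


# Illustrative schema only; not copied from an existing .tab page.
example_schema = {
    "fields": [
        {"name": "P585_date", "type": "string", "title": {"en": "Date"}},
        {"name": "P1603_cases", "type": "number", "title": {"en": "Number of cases"}},
        {"name": "notes", "type": "string", "title": {"en": "Notes"}},
    ]
}

missing = missing_recommended(example_schema, RECOMMENDED_FIELDS["P8204"])
```

Putting the property number first while keeping a readable suffix would let scripts match columns by ID and still leave the names legible to editors.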

Of course the data themselves are on Commons, but there is no really relevant place for this kind of discussion on Commons, and Wikidata is the place to go for data-related issues.


Pinging users who contributed the data: user:Stevenliuyi and user:Mxn. --Zolo (talk) 10:07, 7 March 2021 (UTC)

@Zolo: Tabular data fields have localizable titles in addition to names, so it would be pretty reasonable to standardize on QIDs or property IDs. Not sure which is better though. Several Wikipedia templates and modules would need to be updated to recognize the new field names, and so would the scripts that keep these tables up to date. (At the moment, it looks like I'm the only one still actively updating COVID case tables via a somewhat scripted process...) – Minh Nguyễn 💬 11:36, 14 August 2021 (UTC)
That said, I would caution against treating most of this tabular case data as something to be aggregated across geographies, which seems to be a motivation behind ideas about querying tabular data. Every source has different methodology, especially from one geography to another. In particular, there are different practices around retroactively updating past data, which is the main argument in favor of maintaining historical case data as tabular data instead of as Wikidata items. – Minh Nguyễn 💬 04:31, 15 August 2021 (UTC)

Scope and Deletion/Undeletion discussion at Commons

user:Zolo user:Jheald user:Moebeus user:Stevenliuyi user:Theklan Waldyrious (talk) Sj

Notified participants of WikiProject Tabular data

Following the deletion of a number of .tab data files on Commons, participants in this group may like to know that over the last couple of weeks there has been a discussion thread at Commons:Village Pump about the deletions, which has now been followed by the opening of an undeletion request for discussion. Input from here may be useful. Jheald (talk) 20:17, 31 May 2023 (UTC)