Wikidata talk:WikiProject Tabular data
Phabricator
- Workboard for Tabular Data: https://phabricator.wikimedia.org/tag/commons-datasets/
- All tabular data tickets should include the tag commons-datasets, to be included in this workboard
- At the moment no WMF organisational group appears to 'own' tabular data, or to regularly dedicate people/resources/time/activity to it.
- but some tickets are also tagged for the maps, wikidata, or jsonconfig boards. Jheald (talk) 09:26, 10 August 2020 (UTC)
COVID-19 tabular data
I am planning to import a COVID-19 case dataset (https://github.com/stccenter/COVID-19-Data) as tabular data on Commons, and would like some input here. The dataset focuses on COVID-19 data for subnational divisions across the world. Data are collected, manually checked and curated by the NSF Spatiotemporal Innovation Center, which is jointly operated by George Mason, Harvard and UCSB. (COI disclosure: I am collaborating with them on this project to collect COVID-19 data, though I am not a member of the institution.) Currently there are data from about 4,500 administrative divisions. I have uploaded a few samples on Commons (for example [1], [2]), as well as a few summary tables (for example [3], [4]). Any thoughts? --Stevenliuyi (talk) 21:44, 7 September 2020 (UTC)
Notified participants of WikiProject COVID-19 --Stevenliuyi (talk) 21:56, 7 September 2020 (UTC)
- @Stevenliuyi: Looks good. I see that you are using cumulative data: the figure for November 2020 means the cases that occurred during that month plus all previous cases, not just the cases that occurred during the month. We should either have a way to document that, or decide that all data should work the same way. Though using cumulative data may sound more natural for COVID, I am not sure it would be the best solution overall, especially for longer time series. --Zolo (talk) 12:07, 8 March 2021 (UTC)
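To make the cumulative versus per-period distinction concrete, here is a minimal Python sketch that turns a cumulative series into per-period counts. The row layout (a period label followed by a cumulative count) and the sample numbers are assumptions for illustration only; they are not taken from any of the tables discussed here.

```python
from typing import List, Tuple


def cumulative_to_incremental(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Convert (period, cumulative_count) rows into (period, new_count) rows.

    Rows are assumed to be sorted by period. The first row keeps its value,
    since there is no earlier total to subtract from it.
    """
    result = []
    previous = 0
    for period, total in rows:
        result.append((period, total - previous))
        previous = total
    return result


# Made-up numbers, for illustration only.
sample = [("2020-09", 100), ("2020-10", 250), ("2020-11", 400)]
print(cumulative_to_incremental(sample))
```

A negative difference in the output would mean that a later cumulative total is lower than an earlier one, for example after the source revised its figures downward.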
Community Wishlist Survey 2021
Notified participants of WikiProject Tabular data
I've added the following for the wishlist survey, that you may want to consider:
Jheald (talk) 19:46, 17 November 2020 (UTC)
Structuring and documenting tabular data
If we want data to be easily usable, we should try to have predictable, documented data structures.
For tabular case data (P8204), we already have at least two different structures:
- commons:Data:COVID-19 (STCenter)/US/Q110739.tab
- commons:Data:COVID-19 cases in Santa Clara County, California.tab
The second file has more columns, which sounds ok. But there are also columns that have the same meaning in both files, but with different names. That does not sound good.
My proposal would be:
- Recommend starting column names with the Wikidata property number when possible
- Document how the data should be structured in the relevant Wikidata property. We could start with creating a "suggested fields" property-type property that would provide guidelines about how to use tabular-data properties. For instance:
<tabular case data (P8204)> <Recommended fields> point in time (P585), number of cases (P1603), etc.
Of course the data themselves are on Commons, but there is no really relevant place for this kind of discussion on Commons, and Wikidata is the place to go for data-related issues.
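As a rough illustration of the proposal above, the Python sketch below checks whether a table's field names cover a set of recommended properties. Everything specific in it is hypothetical: the `RECOMMENDED_FIELDS` mapping stands in for a "recommended fields" property that does not exist yet, the `P585_date` naming pattern is only one possible reading of "starting column names with the property number", and the sample schema merely mimics the `schema.fields` layout of Commons .tab pages.

```python
import re
from typing import Dict, List, Set

# Hypothetical guideline: properties that a table linked via P8204
# (tabular case data) would be expected to cover.
RECOMMENDED_FIELDS: Dict[str, List[str]] = {
    "P8204": ["P585", "P1603"],  # point in time, number of cases
}


def property_ids(schema: dict) -> Set[str]:
    """Collect the property IDs that field names start with.

    For example, a field named 'P585_date' or just 'P585' yields 'P585'.
    """
    found = set()
    for field in schema["fields"]:
        match = re.match(r"(P\d+)(?:_|$)", field["name"])
        if match:
            found.add(match.group(1))
    return found


def missing_fields(schema: dict, table_property: str) -> List[str]:
    """Return recommended property IDs that no field name starts with."""
    present = property_ids(schema)
    recommended = RECOMMENDED_FIELDS.get(table_property, [])
    return [pid for pid in recommended if pid not in present]


# Made-up schema, for illustration only.
sample_schema = {
    "fields": [
        {"name": "P585_date", "type": "string", "title": {"en": "Date"}},
        {"name": "deaths", "type": "number", "title": {"en": "Deaths"}},
    ]
}
print(missing_fields(sample_schema, "P8204"))
```

A check like this could run in the scripts that update the tables, so that a file missing a recommended column is flagged before it is saved.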
Pinging users who contributed the data: user:Stevenliuyi and user:Mxn. --Zolo (talk) 10:07, 7 March 2021 (UTC)
- @Zolo: Tabular data fields have localizable titles in addition to names, so it would be pretty reasonable to standardize on QIDs or property IDs. Not sure which is better though. Several Wikipedia templates and modules would need to be updated to recognize the new field names, and so would the scripts that keep these tables up to date. (At the moment, it looks like I'm the only one still actively updating COVID case tables via a somewhat scripted process...) – Minh Nguyễn 💬 11:36, 14 August 2021 (UTC)
- That said, I would caution against treating most of this tabular case data as something to be aggregated across geographies, which seems to be a motivation behind ideas about querying tabular data. Every source has different methodology, especially from one geography to another. In particular, there are different practices around retroactively updating past data, which is the main argument in favor of maintaining historical case data as tabular data instead of as Wikidata items. – Minh Nguyễn 💬 04:31, 15 August 2021 (UTC)
Scope and Deletion/Undeletion discussion at Commons
Notified participants of WikiProject Tabular data
Following deletion of a number of .tab data files on Commons, participants in this group may like to know that in the last couple of weeks there has been a discussion thread at Commons:Village Pump about the deletions, that has now been followed by the opening of an undeletion request for discussion. Input from here may be useful. Jheald (talk) 20:17, 31 May 2023 (UTC)