  • Make Wikidata a relevant company data repository, with rich, accurate, and very well referenced data: start by covering all of the world's largest companies, then add most midsize companies (down to a minimum of about 20 employees), and finally, over the longer term, add many smaller companies.
  • Ensure data quality to enable reuse. This includes the accuracy of the data itself, the reference(s) to where it was sourced from, and the technical standard under which the data were produced. Every piece of actual company data must have its source referenced.
  • Wikidata is already used "live" in several Wikipedias, making data integrity paramount.
  • Transfer the industry classification currently done through Wikipedia categories to Wikidata.
    • Or adjust the various Wikipedia industry classifications (made by tagging a company page with a category indicating its industry) to conform to the main industry classification scheme adopted here at Wikidata (see more below).
  • Develop a controlled vocabulary for industry (P452); about 100 industries are currently in use.
    • The authoritative classifications of industries (economic activities) come from UNstat and Eurostat: UN ISIC, EU NACE (and national versions thereof), and NA NAICS (US, CA, MX). For products it's more complex: the UN has CPC, HS, and SITC, while the EU has CPA, CN, and PRODCOM. See e.g. … and linked data at … --Vladimir Alexiev (talk) 20:02, 23 February 2017 (UTC)
    • Given the global scope of Wikidata, we might best go with the 56 industries classified according to the International Standard Industrial Classification, Revision 4 (ISIC Rev. 4). This is the set used by the World Input-Output tables [[1]]. They take data from all 28 EU countries and 15 other major countries in the world and transform it to be comparable using these industries. It's the broadest "nearly global" coverage I can find. It would also be advisable to accommodate multiple industry assignments per entity / establishment, each recording the standard and year that were followed, chosen from a specifically enumerated list. For example, in North America data will often be available according to the most current and highly granular 2017 NAICS system [[2]], and there are concordances between versions; see [[3]] and [[4]]. Looking toward a future where large amounts of company data are machine-imported, it would be best to preserve the original, most detailed industry codes available (such as the 6-digit NAICS code) together with the standard and year associated with the assigned code(s). Given the year and the detail, the concordances can later be used to machine-add different codes as needed. Granular users are then accommodated, and so are people doing cross-country / global analysis at the 56-industry level. Rjlabs (talk) 05:14, 13 March 2017 (UTC)
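
The industry proposal above (keep the original, most detailed code together with its standard and year, then use concordances to derive coarser codes on demand) can be sketched as a small data model. This is a minimal Python sketch under stated assumptions: the names `IndustryCode`, `CONCORDANCE`, `DIVISION_TO_SECTOR`, `to_isic4` and `to_sector` are hypothetical, the single concordance row and the sector label `J62_J63` are illustrative placeholders rather than values copied from the official tables, and it treats the concordance as one-to-one even though real concordances are often many-to-many.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndustryCode:
    """One industry assignment, stored exactly as it was coded at the source."""
    standard: str  # classification system, e.g. "NAICS", "NACE", "ISIC"
    version: str   # year or revision of that system, e.g. "2017" or "Rev. 4"
    code: str      # the most detailed code available, e.g. a 6-digit NAICS code


# Hypothetical concordance table: (standard, version, code) -> ISIC Rev. 4 class.
# Illustrative entry only; real rows must come from the official concordance files,
# which are often many-to-many rather than one-to-one as assumed here.
CONCORDANCE = {
    ("NAICS", "2017", "541511"): "6201",
}

# Hypothetical roll-up from ISIC Rev. 4 division (first two digits) to a coarse
# sector label of the kind used for 56-industry analysis (assumed label).
DIVISION_TO_SECTOR = {
    "62": "J62_J63",
    "63": "J62_J63",
}


def to_isic4(assignment: IndustryCode) -> Optional[str]:
    """Translate one stored assignment to ISIC Rev. 4, or None if no mapping is known."""
    if assignment.standard == "ISIC" and assignment.version == "Rev. 4":
        return assignment.code  # already in the target system
    return CONCORDANCE.get((assignment.standard, assignment.version, assignment.code))


def to_sector(assignment: IndustryCode) -> Optional[str]:
    """Roll an assignment up to the coarse sector level, or None if it cannot be mapped."""
    isic = to_isic4(assignment)
    if isic is None:
        return None
    return DIVISION_TO_SECTOR.get(isic[:2])


# A company keeps every original assignment; coarser codes are derived on demand.
assignments = [
    IndustryCode("NAICS", "2017", "541511"),
    IndustryCode("ISIC", "Rev. 4", "6201"),
]
sectors = {to_sector(a) for a in assignments}
```

Because the stored assignments are never overwritten, a corrected or newer concordance can be re-applied later without losing the code the source actually published.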

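The data-quality goal that every piece of company data must have its source referenced can be checked mechanically. A minimal sketch, assuming an item has already been fetched as Wikidata entity JSON (for example via Special:EntityData), in which `claims` maps property IDs to lists of statements and each statement may carry a `references` list; the function name `unreferenced_statements` and its `exempt` parameter are hypothetical.

```python
from typing import Any


def unreferenced_statements(
    entity: dict[str, Any], exempt: frozenset[str] = frozenset()
) -> list[tuple[str, str]]:
    """List (property ID, statement ID) pairs for statements that carry no reference.

    Expects the Wikidata entity JSON layout, where "claims" maps each property ID
    to a list of statements and a statement's optional "references" key holds its
    sources. Properties in `exempt` are skipped (a hypothetical policy hook, e.g.
    for purely structural properties).
    """
    missing = []
    for prop, statements in entity.get("claims", {}).items():
        if prop in exempt:
            continue
        for statement in statements:
            # A missing or empty "references" list means the statement is unsourced.
            if not statement.get("references"):
                missing.append((prop, statement.get("id", "")))
    return missing
```

Running this over a batch of company items gives a worklist of statements that still need a source before the data can be considered fit for reuse.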

How to contribute

  • Read the items on the discussion page (see the tab at the top of this page) and participate.
  • Look at the properties page (see the tab near the top of this page) and contribute.
  • Coordinate with Wikidata:WikiProject_Economics to ensure that both projects' data needs are well met and harmonized.
  • Add instance of (P31): business (Q4830453) and other properties.
  • Fix the class hierarchy. Independent (or was it incorporated?) cities, to a common Joe like me, are not companies. Vladimir Alexiev (talk)
    • Indeed, the classes under organization (Q43229) are chaotic and poorly structured, and need heavy refactoring. company (Q783794) is ill-defined, and there is also business (Q4830453). There are also many other classes at that level that are poorly thought through. (Isn't a bank a "type of" company, rather than its own animal?) Currently the overly "flat" structure under organization (Q43229) obscures more than it reveals. Under organization (Q43229) there should be a top-level split between entities "counted in GDP" and those that are not. Most are. However, if it's a pure social organization with zero economic impact, it should be hived off into its own class (stamp collectors who are not dealers, coffee clubs, unfunded petition groups, etc.). Under organizations typically counted in GDP, there should be further divisions that match those used by most national statistical agencies that compute GDP, such as private for-profit entities (most companies in non-Communist nations), plus relevant categories for various government-funded operations, private non-profits, trade groups, PACs, etc. Every entity that has a payroll, borrows money, has revenues, or takes in contributions counts in GDP and must fit into the hierarchy somewhere. This includes schools, cities, counties, the military, churches, public utilities, government regulators, other non-profits, etc. In this project we are primarily concerned with significant organizations that contribute to GDP and are typically private and for-profit. However, we are also interested in local output, industry output, and total output, and in how these private for-profit organizations contribute to them. Rjlabs (talk) 18:19, 31 March 2017 (UTC)
      • But cities and other government entities certainly employ people, so they should fit in somewhere. I've been running into issues with, for example, Indian tribes, which are generally given the instance of (P31) value "ethnic group", which currently doesn't come under the organization hierarchy here at all. Separating geographic and organizational entities can also be tricky: there's the "city" case, of course, but even for a hospital, say, the building may itself be historically important and in a sense a separate item from the medical facility that operates there. Also note how Wikidata has decided to handle people: we treat all people as instance of human (Q5), and define "types" via other properties on the person, such as position, country, etc. While I don't think it makes sense to lump all organizations directly under organization (Q43229), maybe there should be just a small number of top-level subclasses (government agency, private for-profit company, public corporation, non-profit organization, etc.), with "types" defined via other properties? ArthurPSmith (talk) 19:09, 31 March 2017 (UTC)
    • Arthur, great comment; it really gets me thinking. As a start, I'm trying to draw in some economics experts for more eyes on this (see: …). Initially, I think taking a Statistical Business Registry (SBR) approach may be the way to go. All countries that report GDP have one, and they are very hard-core data-wise. Country experts on SBRs have faced this challenge and are likely willing to share approaches (they also benefit if Wikidata can mesh with their structures). In addition, I continue to ponder your points. Is the GDP of an Indian tribe part of US GDP, or is the tribe a sovereign nation? On geographic vs. legal entities, hopefully the SBR experts can be of some help. Finding really important research groups that have few employees and little square footage (and so fly under the traditional radar), yet do very important high-tech work, is a good challenge; most SBR experts are also tasked with finding small high-tech entities that are growing rapidly, so we can pour additional gasoline on them and really grow our economy. I like your idea of lumping together all organizations, with the only top-level distinction being whether they count in GDP or not. One more thought (then on to my "activities of daily living", which have backed up considerably): there are additional models beyond the "classic" economic model (which is followed by virtually all SBR experts), such as environmental accounting, social accounting, and sustainability accounting. As long as we are engineering "for the next 20 years", we probably ought to think about what data structures at Wikidata around the "business registry" would facilitate those accounting models going forward... Rjlabs (talk) 15:33, 1 April 2017 (UTC)
  • Help out with longer-term scalability and compatibility issues, and ensure that company data can be aggregated up to the national and global levels for industry and economic analysis. See: draft here.
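
The class-hierarchy discussion and the aggregation goal fit together: if each organization carries one of a small number of top-level classes plus a "counts in GDP" flag, with finer "types" expressed through other properties, then national and global roll-ups become simple group-by operations. A minimal sketch with hypothetical field names (`top_class`, `counts_in_gdp`, and so on) that are not existing Wikidata properties; it sums employee counts only, because summing revenues across firms double-counts intermediate sales and is not a GDP (value-added) measure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Organization:
    """Hypothetical flattened record for one organization item."""
    name: str
    top_class: str       # one of a few top-level classes, e.g. "private for-profit", "government"
    counts_in_gdp: bool  # the proposed top-level split
    country: str
    industry: str        # coarse industry label, e.g. one of the 56 ISIC-based industries
    employees: int


def aggregate_employees(orgs, by=("country", "industry")):
    """Sum employees over organizations that count in GDP, grouped by the given fields."""
    totals = defaultdict(int)
    for org in orgs:
        if not org.counts_in_gdp:
            continue  # e.g. purely social clubs are excluded from economic totals
        key = tuple(getattr(org, field) for field in by)
        totals[key] += org.employees
    return dict(totals)
```

Passing `by=("industry",)` gives a global figure per industry, while `by=("country", "top_class")` gives national totals per top-level class; keeping the hierarchy shallow is what makes both groupings cheap.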

