Wikidata:WikiProject Journalists


The goal of this project is to organize and add information about journalists to Wikidata. Wikimedia projects rely on high-quality sources as references. Many of these sources are written by people who do not yet have items in Wikidata. How can we help fix that?


For now we will simply pose some questions and fill them in as we go.

Existence?

Should this project exist or are there other projects that it should be merged with?

Current State

What is the current state of journalist data in Wikidata?

What are the most common items and properties?

Items

Properties

What are some useful SPARQL queries that can be used to assess the current data?
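
As a starting point, here is a minimal sketch of one such query, wrapped in Python so it can be reused for different properties. It counts items whose occupation (P106) is journalist (Q1930187) and that also carry a given property. The helper names `coverage_query` and `run_query` and the User-Agent string are illustrative assumptions, not part of any existing tool; the endpoint is the public Wikidata Query Service.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
JOURNALIST = "Q1930187"  # item for "journalist", as used elsewhere on this page


def coverage_query(property_id: str) -> str:
    """Build a SPARQL query counting journalists that have the given property.

    Hypothetical helper for illustration only.
    """
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a property ID: {property_id!r}")
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        f"  ?person wdt:P106 wd:{JOURNALIST} ;\n"
        f"          wdt:{property_id} ?value .\n"
        "}"
    )


def run_query(query: str) -> dict:
    """Send the query to the Wikidata Query Service and return the parsed JSON."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        # Assumed User-Agent value; replace with one that identifies your tool.
        headers={"User-Agent": "WikiProjectJournalists-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # P214 is the VIAF ID property; this only prints the query text.
    print(coverage_query("P214"))
```

Calling `coverage_query` with P214 (VIAF ID), P213 (ISNI), P227 (GND ID) or P2002 (Twitter username) and passing each result to `run_query` would give a rough picture of how well each identifier is covered among journalists. The query matches only items with the exact occupation value and does not follow subclasses, which keeps it cheap but undercounts.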

What does the ontology look like?

Sub-classes of journalist graph

https://angryloki.github.io/wikidata-graph-builder/?property=P279&item=Q1930187&mode=reverse

What is the Gender gap?

Gender gap report from Denelezh

SQID Report on Q1930187 (journalist)

Reasonator Report on Q1930187 (journalist)

Scholia

Scholia is a linked data project focused on academic publications, but it can sometimes generate interesting reports for journalists. It is a good showcase of what is possible if enough linked data about journalists is put into Wikidata. For example,

Deduplication and Record Linkage

There are already on the order of 100k journalists in Wikidata. Any attempt to add new data in bulk will need to resolve collisions between incoming and existing journalists. This is a very common problem, and solutions typically involve record linkage. Can we design record linkage solutions for existing databases? Will we need a custom record linkage model for each database we try to incorporate, or are there common features that we can use across multiple databases?

Currently the closest thing we have to a unique ID for them is their Twitter ID!
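
To make the record linkage question concrete, here is a minimal sketch of a name-based matcher that uses only the Python standard library. It is not a proposed model for this project: the record layout (`name`, `birth_year`, `qid`), the 0.9 threshold, and the example records are all assumptions made for illustration, and a real pipeline would likely add more fields and a blocking step.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower()
    )
    # Sorting tokens makes "Surname, Given" and "Given Surname" compare equal.
    return " ".join(sorted(cleaned.split()))


def match_score(incoming: dict, existing: dict) -> float:
    """Return a similarity in [0, 1]; conflicting birth years rule a match out."""
    a, b = incoming.get("birth_year"), existing.get("birth_year")
    if a is not None and b is not None and a != b:
        return 0.0
    return SequenceMatcher(
        None, normalize_name(incoming["name"]), normalize_name(existing["name"])
    ).ratio()


def best_match(incoming: dict, candidates: list, threshold: float = 0.9):
    """Return the best-scoring candidate at or above the threshold, else None."""
    scored = [(match_score(incoming, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None


if __name__ == "__main__":
    # Invented records; the QIDs are placeholders, not real items.
    existing = [
        {"qid": "Q_EXAMPLE_1", "name": "López, Ana Maria", "birth_year": 1975},
        {"qid": "Q_EXAMPLE_2", "name": "Anna Lopes", "birth_year": 1990},
    ]
    incoming = {"name": "Ana María López", "birth_year": 1975}
    print(best_match(incoming, existing))
```

Records that fall below the threshold would be queued for manual review or created as new items, which is the same decision a reconciliation tool asks a human to make.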

OpenRefine

The OpenRefine tool is one well-supported method of doing this.

Existing Databases

What existing databases of journalists exist and how can we integrate their data?

Muck Rack

Good visibility on Google. It seems to have a page for every journalist, and each page has a summary of who they have written for, excerpts of their work, links to their social media, and their Twitter feed.

The journalist can take ownership of each page, and corrections are dealt with via a chat mechanism that can be actioned within a few hours.

The unique ID of the page is proprietary, and the links they show are to proprietary sites too, such as Twitter. I would like to see them add and use an open cross-platform ID.

The Factual

Not publicly available or offered commercially, but they maintain an internal database of journalists.

Standards

How should journalism data be structured?

What information do we need about journalists, publishers, newspapers?

What is the best way to handle freelancers?

Should we link news sites to their ratings on

Can we incorporate data from the Wikipedia:Reliable_sources/Perennial_sources or the other way around?

Is there a unique ID for us to use for each journalist from an open and independent organisation?

For example, instead of a "Twitter handle", which has become the de facto "unique ID", it should be something like:

  • Integrated Authority File, ISNI, VIAF or WorldCat

Also, judging by how disorganised most journalists' media presence can be, the ID will need to have been assigned to them automatically rather than being something they had to apply for.

Can we develop ShEx expressions that encapsulate these expectations?

How should we handle referencing?

Related External Projects

JournalList

"A Networked List of News Publishers" --https://journallist.net/

Managers of the trust.txt framework (see this video for a short introduction and comparison to existing "txt" solutions such as robots.txt and ads.txt).

Related Wikimedia Projects and Sites

What lessons can we learn from existing projects? How can we collaborate?

Tools