Wikidata:Property proposal/Semantic Scholar paper ID

Semantic Scholar paper ID

Originally proposed at Wikidata:Property proposal/Creative work

Done: Semantic Scholar paper ID (P4011) (Talk and documentation)

Description	identifier for an article in the Semantic Scholar database
Represents	Semantic Scholar (Q22908627)
Data type	External identifier
Domain	work (Q386724)
Allowed values	\w+
Example	The Semantic Web Revisited (Q29037447) -> 5acd1dd3da5752e1de4c5b46f75b7aec2bc50503
Formatter URL	https://www.semanticscholar.org/paper/$1
See also	Wikidata:Property proposal/Google scholar paper ID; Wikidata:Property proposal/Semantic Scholar author ID
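
As a minimal sketch, assuming the allowed-values pattern and formatter URL listed above, an ID could be checked and expanded into a link roughly like this. The function name `paper_url` is hypothetical and is not part of any Wikidata or Semantic Scholar tooling.

```python
import re

# Pattern and formatter URL copied from the proposal above.
ALLOWED = re.compile(r"\w+")
FORMATTER_URL = "https://www.semanticscholar.org/paper/$1"


def paper_url(paper_id: str) -> str:
    """Return the Semantic Scholar URL for paper_id (hypothetical helper)."""
    # The whole value must match the allowed-values pattern.
    if not ALLOWED.fullmatch(paper_id):
        raise ValueError(f"not a valid Semantic Scholar paper ID: {paper_id!r}")
    # Substitute the ID for the $1 placeholder of the formatter URL.
    return FORMATTER_URL.replace("$1", paper_id)


# The example ID from the proposal (The Semantic Web Revisited).
print(paper_url("5acd1dd3da5752e1de4c5b46f75b7aec2bc50503"))
```

Note that `\w+` is very permissive; it only rules out values containing characters such as slashes or spaces.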

Motivation

Semantic Scholar (Q22908627) is a nice paper archive with great statistics and recommendations/links. Most papers have open-access full text (it sources arXiv and CiteSeerX). It's still much smaller than Google Scholar (e.g. "semantic" finds 140k on Semantic Scholar and 3.9M on Google Scholar), but I think it's growing in importance. I'll talk to them about getting API access for en:Wikipedia:OABOT, as discussed with @Pintoch. It is NOT limited to semantic web or computer science only. #WikiCite2017 Vladimir Alexiev (talk) 13:33, 25 May 2017 (UTC)

#WikiCite :-) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 28 May 2017 (UTC)

Discussion

Support It's worth having entries for all significant identifiers online. [eventually we'll need a better way to browse the available identifiers! but whenever data is coming from a source that is primarily indexed by some ID, we should be able to track that ID here.] Sj (talk) 12:09, 25 May 2017 (UTC)
Support --Andrawaag (talk) 12:48, 25 May 2017 (UTC)
Support --Daniel Mietchen (talk) 17:07, 25 May 2017 (UTC)
Support. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 28 May 2017 (UTC)
Support. YULdigitalpreservation (talk) 17:17, 1 June 2017 (UTC)
@Pigsonthewing, Sj, Andrawaag, Daniel Mietchen, YULdigitalpreservation: Done ChristianKl (talk) 11:50, 3 June 2017 (UTC)

SemanticScholar Corpus

Contacted them, let's see the answer: "Is there an API we can use to find Open Access sources for papers? After the recent WikiCite conference, we created props for Semantic Scholar on Wikidata, e.g. see https://www.wikidata.org/wiki/Wikidata:Property_proposal/Semantic_Scholar_paper_ID. We'd use this API to feed https://en.wikipedia.org/wiki/Wikipedia:OABOT. Thanks in advance!" --Vladimir Alexiev (talk) 08:16, 6 June 2017 (UTC)

Erika McAuliffe <support@semanticscholar.freshdesk.com> replied: "Thank you for your feedback! We're pleased that you find this information interesting. We have many more ideas and features planned for Semantic Scholar, and we hope you continue to be delighted. We don't currently have an API, but you can find the subset of our corpus that powered Citeomatic (http://labs.semanticscholar.org/citeomatic) here: http://labs.semanticscholar.org/corpus/"
@Pintoch: can you take a look at this corpus and share some thoughts on whether we can use it for finding Open Access papers? --Vladimir Alexiev (talk) 12:10, 14 June 2017 (UTC)

"Over 7 million published research papers in Computer Science and Neuroscience". The format is rather simple and uses the IDs we already got:

Semantic Scholar paper ID (P4011): id, inCitations, outCitations
Semantic Scholar author ID (P4012): authors.ids

So it's a great basis for matching papers against WP/WD by title and author names:

{
  "id": "060e50b8752fdd799201fd9570e0bb668f017402",
  "title": "A review of Web searching studies and a framework for future research",
  "paperAbstract": "Research on Web searching is at an incipient stage. ...",
  "keyPhrases": [
    "OPAC",
    "..."
  ],
  "authors": [
    {
      "ids": [
        "7981846"
      ],
      "name": "Bernard J. Jansen"
    },
    "..."
  ],
  "inCitations": [
    "81027fc698ca6f49f506c3d5cf679178f3c74df1",
    "..."
  ],
  "outCitations": [
    "3811f1176f27b4030bda7b6e431e6ce45cb89996",
    "2b0a8ac61e63a6c4dca5290b93b7622976a6b273",
    "..."
  ],
  "year": 2001,
  "s2Url": "http://semanticscholar.org/paper/060e50b8752fdd799201fd9570e0bb668f017402",
  "venue": "Seattle Tech Conf"
}
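
A rough sketch of how such a record might be turned into a title-plus-author matching key follows. The helper names `normalize` and `match_key` are hypothetical, the record is a trimmed copy of the example above, and the normalization rules (lowercasing, dropping accents and punctuation, keeping only the last word of each author name) are assumptions rather than anything the corpus or OAbot prescribes.

```python
import json
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_key(record: dict) -> tuple:
    """Build a (title, author surnames) key for one corpus record."""
    surnames = []
    for author in record.get("authors", []):
        if not isinstance(author, dict):
            continue  # skip placeholders such as "..."
        parts = normalize(author.get("name", "")).split()
        if parts:
            # Assumption: the last word of the name is the surname.
            surnames.append(parts[-1])
    return normalize(record.get("title", "")), tuple(sorted(surnames))


# Trimmed copy of the example record shown above.
record = json.loads("""
{
  "id": "060e50b8752fdd799201fd9570e0bb668f017402",
  "title": "A review of Web searching studies and a framework for future research",
  "authors": [{"ids": ["7981846"], "name": "Bernard J. Jansen"}],
  "year": 2001
}
""")
print(record["id"], match_key(record))
```

The same key would have to be computed for the WP/WD side (item label and author names) before comparing; exact key equality is only a first pass, and fuzzy title matching would catch more cases.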

Hi Vladimir Alexiev, thanks a lot for looking into this! If I remember correctly, Semantic Scholar uses only PDFs crawled by CiteSeerX (which is already covered by OAbot through BASE). Could you ask your contact at Semantic Scholar if that is still the case? Otherwise, if these new PDFs do not overlap with any other existing source in the bot, it should be possible to import the dump into dissemin's backend, but I try to avoid that (since this is a static dataset that will not be updated). − Pintoch (talk) 12:59, 14 June 2017 (UTC)

Hi Pintoch, I don't think SemScholar is limited to CiteSeerX. E.g. see Mario Lipinski on citeseerx (1 paper) vs on semscholar (20). SemScholar seems to have mixed up two authors with the same name, but even so it has 6-7 papers (on compsci and extraction from PDF) by the one present on citeseerx. --Vladimir Alexiev (talk) 13:50, 14 June 2017 (UTC)

SemanticScholar Citeomatic

Citeomatic http://labs.semanticscholar.org/citeomatic.

I tried this tool on a recent paper (http://vladimiralexiev.github.io/pubs/Tagarev2017-DomainSpecificGazetteer.pdf) and the results are impressive: http://labs.semanticscholar.org/citeomatic/url/56780d97eac3744403ddaf551dcad872811692d0.

It correctly parsed the title, the abstract and most of the authors (but parsed "Toloşi" as "Tolo¸sitolo¸si").
I don't know where it got "2014" for the year.
Most importantly: "We found 49 new citations and 1 that you have already cited...". Export: you can explore/read the papers right there, and export them one by one for your bibliography.
So that's a great new way of adding citations of papers you've never read, thus making your paper a lot more scholarly ;-)
This is just a light joke: the value of this tool for exploring areas of science is huge! --Vladimir Alexiev (talk) 12:28, 14 June 2017 (UTC)
