Wikidata:Property proposal/Semantic Scholar paper ID
Semantic Scholar paper ID
Originally proposed at Wikidata:Property proposal/Creative work
Description | identifier for an article in the Semantic Scholar database |
---|---|
Represents | Semantic Scholar (Q22908627) |
Data type | External identifier |
Domain | work (Q386724) |
Allowed values | \w+ |
Example | The Semantic Web Revisited (Q29037447) -> 5acd1dd3da5752e1de4c5b46f75b7aec2bc50503 |
Formatter URL | https://www.semanticscholar.org/paper/$1 |
See also |
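As an illustration of how the "Allowed values" pattern and the "Formatter URL" above fit together, here is a minimal Python sketch. The function name `paper_url` is hypothetical and not part of the proposal; the pattern, the URL template and the example identifier are taken from the template above.

```python
import re

# Values copied from the proposal template above.
FORMATTER_URL = "https://www.semanticscholar.org/paper/$1"
ALLOWED_VALUES = re.compile(r"\w+")


def paper_url(paper_id: str) -> str:
    """Expand the formatter URL for one identifier (hypothetical helper).

    Raises ValueError if the identifier does not match the allowed pattern.
    """
    if not ALLOWED_VALUES.fullmatch(paper_id):
        raise ValueError(f"not a valid identifier: {paper_id!r}")
    return FORMATTER_URL.replace("$1", paper_id)


# Example value from the proposal: The Semantic Web Revisited (Q29037447).
print(paper_url("5acd1dd3da5752e1de4c5b46f75b7aec2bc50503"))
```

Note that `\w+` is looser than the 40-character hexadecimal example value, so this check only rejects identifiers containing characters such as spaces or slashes.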
- Motivation
Semantic Scholar (Q22908627) is a nice paper archive with great statistics and recommendations/links. Most papers have open access full text (it sources arXiv and CiteSeerX). It is still considerably smaller than Google Scholar (e.g. a search for "semantic" finds 140k results on Semantic Scholar and 3.9M on Google Scholar), but I think it's increasing in importance. I'll talk to them about getting API access for en:Wikipedia:OABOT, as discussed with @Pintoch:. It is NOT limited to semantic web or computer science only. #WikiCite2017 Vladimir Alexiev (talk) 13:33, 25 May 2017 (UTC)
- #WikiCite :-) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 28 May 2017 (UTC)
- Discussion
- Support It's worth having entries for all significant identifiers online. [eventually we'll need a better way to browse the available identifiers! but whenever data is coming from a source that is primarily indexed by some ID, we should be able to track that ID here.] Sj (talk) 12:09, 25 May 2017 (UTC)
- Support --Andrawaag (talk) 12:48, 25 May 2017 (UTC)
- Support --Daniel Mietchen (talk) 17:07, 25 May 2017 (UTC)
- Support. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 28 May 2017 (UTC)
- Support. YULdigitalpreservation (talk) 17:17, 1 June 2017 (UTC)
- @Pigsonthewing, Sj, Andrawaag, Daniel Mietchen, YULdigitalpreservation: Done ChristianKl (talk) 11:50, 3 June 2017 (UTC)
SemanticScholar Corpus
Contacted them, let's see the answer: Is there an API we can use to find Open Access sources for papers? After the recent WikiCite conference, we created props for Semantic Scholar on Wikidata, eg see https://www.wikidata.org/wiki/Wikidata:Property_proposal/Semantic_Scholar_paper_ID. We'd use this API to feed https://en.wikipedia.org/wiki/Wikipedia:OABOT. Thanks in advance! --Vladimir Alexiev (talk) 08:16, 6 June 2017 (UTC)
- Erika McAuliffe <support@semanticscholar.freshdesk.com> replied: "Thank you for your feedback! We're pleased that you find this information interesting. We have many more ideas and features planned for Semantic Scholar, and we hope you continue to be delighted. We don't currently have an API, but you can find the subset of our corpus that powered Citeomatic http://labs.semanticscholar.org/citeomatic here: http://labs.semanticscholar.org/corpus/"
- @Pintoch: can you take a look at this corpus and share some thoughts on whether we can use it for finding Open Access papers? --Vladimir Alexiev (talk) 12:10, 14 June 2017 (UTC)
"Over 7 million published research papers in Computer Science and Neuroscience". The format is rather simple and uses the IDs we already got:
- Semantic Scholar paper ID (P4011): id, inCitations, outCitations
- Semantic Scholar author ID (P4012): authors.ids
So it's a great basis for matching papers against WP/WD by title and author names:
    {
      "id": "060e50b8752fdd799201fd9570e0bb668f017402",
      "title": "A review of Web searching studies and a framework for future research",
      "paperAbstract": "Research on Web searching is at an incipient stage. ...",
      "keyPhrases": [ "OPAC", "..." ],
      "authors": [ { "ids": [ "7981846" ], "name": "Bernard J. Jansen" }, "..." ],
      "inCitations": [ "81027fc698ca6f49f506c3d5cf679178f3c74df1", "..." ],
      "outCitations": [ "3811f1176f27b4030bda7b6e431e6ce45cb89996", "2b0a8ac61e63a6c4dca5290b93b7622976a6b273", "..." ],
      "year": 2001,
      "s2Url": "http://semanticscholar.org/paper/060e50b8752fdd799201fd9570e0bb668f017402",
      "venue": "Seattle Tech Conf"
    }
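A minimal Python sketch of how such records could be indexed for matching by title and author names. Several things here are assumptions rather than facts from the corpus documentation: that the dump holds one JSON object per line, that author names are in "Given Family" order, and that a normalized title plus sorted surnames is a good enough key. The helpers `normalize`, `match_key` and `index_corpus` are hypothetical names, and the sample line is a trimmed copy of the record above.

```python
import json
import re
import unicodedata

# Trimmed copy of the sample record above, as a single line of JSON
# (assumption: the corpus dump stores one JSON object per line).
SAMPLE_LINE = (
    '{"id": "060e50b8752fdd799201fd9570e0bb668f017402", '
    '"title": "A review of Web searching studies and a framework for future research", '
    '"authors": [{"ids": ["7981846"], "name": "Bernard J. Jansen"}, "..."], '
    '"year": 2001}'
)


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).split())


def match_key(title, author_names):
    """Build a lookup key from the normalized title and the sorted surnames.

    Assumes names are written "Given Family", so the last token is the surname.
    """
    token_lists = (normalize(name).split() for name in author_names)
    surnames = sorted(tokens[-1] for tokens in token_lists if tokens)
    return (normalize(title), tuple(surnames))


def index_corpus(lines):
    """Map match keys to Semantic Scholar paper IDs (P4011 values)."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip non-object entries such as the "..." elision in the sample.
        names = [a["name"] for a in record.get("authors", []) if isinstance(a, dict)]
        index[match_key(record["title"], names)] = record["id"]
    return index


index = index_corpus([SAMPLE_LINE])
# A title as it might appear on a Wikidata item, with different capitalization.
key = match_key(
    "A Review of Web Searching Studies and a Framework for Future Research",
    ["Bernard J. Jansen"],
)
print(index.get(key))
```

An exact key like this misses papers whose titles or author lists differ slightly between sources, so a real matcher would likely need fuzzy comparison plus a manual review step.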
- Hi Vladimir Alexiev, thanks a lot for looking into this! If I remember correctly, Semantic Scholar uses only PDFs crawled by CiteSeerX (which is already covered by OAbot through BASE). Could you ask your contact at Semantic Scholar if that is still the case? Otherwise, if these new PDFs do not overlap with any other existing source in the bot, it should be possible to import the dump into dissemin's backend, but I try to avoid that (since this is a static dataset that will not be updated). − Pintoch (talk) 12:59, 14 June 2017 (UTC)
- Hi Pintoch, I don't think SemScholar is limited to CiteSeerX. E.g. see Mario Lipinski on citeseerx (1 paper) vs on semscholar (20). SemScholar seems to have mixed two authors with the same name, but even so it has 6-7 papers (on compsci and extraction from PDF) by the one present on citeseerx. --Vladimir Alexiev (talk) 13:50, 14 June 2017 (UTC)
SemanticScholar Citeomatic
Citeomatic http://labs.semanticscholar.org/citeomatic.
I tried this tool on a recent paper (http://vladimiralexiev.github.io/pubs/Tagarev2017-DomainSpecificGazetteer.pdf) and the results are impressive: http://labs.semanticscholar.org/citeomatic/url/56780d97eac3744403ddaf551dcad872811692d0.
- It parsed the title and abstract correctly, as well as most of the authors (but it parsed "Toloşi" as "Tolo¸sitolo¸si").
- I don't know where it got "2014" for the year.
- Most importantly: "We found 49 new citations and 1 that you have already cited..." Export: you can explore/read the papers right there, and export them one by one for your bibliography.
- So that's a great new way of adding citations of papers you've never read, thus making your paper a lot more scholarly ;-)
- This is just a light joke: the value of this tool for exploring areas of science is huge! --Vladimir Alexiev (talk) 12:28, 14 June 2017 (UTC)