Wikidata:SourceMD/instructions

Typical use

Overview

  1. Get one or more identifiers for a publication: a DOI, PMID, or PMCID.
  2. Go to tools.wmflabs.org/sourcemd/ (fully automated batch mode) or to tools.wmflabs.org/sourcemd/index_old.php (semiautomated mode). These instructions describe the semiautomated mode.
  3. Enter the identifiers, one per line.
  4. Run the tool.
  5. Check the output in SourceMD if you wish, then proceed to QuickStatements.
  6. Run the batch in Wikidata:QuickStatements.
  7. Done!
  8. Check the output in the resulting Wikidata records if you wish.
  9. If you identify any problems, address them with further Wikidata editing.

Collect media identifiers

SourceMD accepts input in these forms: DOI, PMID, or PMCID.

Traditional citation styles based on paper publishing may not list these identifiers. Citation systems for digital publishing may show them. Often the publication itself will list the media identifier.
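
As a rough aid for sorting identifiers before pasting them in, the sketch below guesses the type of an identifier from its shape. The function name and the patterns are illustrative assumptions, not SourceMD's own validation rules: a DOI is taken to start with "10." followed by a registrant code and a slash, a PMCID to be "PMC" followed by digits, and a PMID to be digits only.

```python
import re

# Illustrative patterns only (assumptions, not SourceMD's own checks).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMCID_RE = re.compile(r"^PMC\d+$", re.IGNORECASE)
PMID_RE = re.compile(r"^\d+$")


def classify_identifier(text: str) -> str:
    """Guess whether a string looks like a DOI, PMCID, or PMID."""
    value = text.strip()
    if DOI_RE.match(value):
        return "DOI"
    if PMCID_RE.match(value):
        return "PMCID"
    if PMID_RE.match(value):
        return "PMID"
    return "unknown"
```

A value reported as "unknown" is probably worth checking by hand before it goes into the input box.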

Put the identifiers into the input box

List one identifier per line. Use only one identifier per publication; do not use both DOI and PMCID for the same item. When in doubt, use the PubMed Central ID (PMCID).

You can list multiple identifiers on multiple lines; this will create multiple items in a batch.
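
If you keep your publications in a spreadsheet or script, a minimal sketch like the following could prepare the text for the input box. It assumes each publication is a dictionary with optional "pmcid", "pmid", and "doi" keys (hypothetical names), picks exactly one identifier per publication, and prefers the PMCID as advised above. Ranking the PMID ahead of the DOI is an assumption based on the identifier sequence described later on this page.

```python
# Order of preference: PMCID first, then PMID, then DOI (see assumption above).
PREFERENCE = ("pmcid", "pmid", "doi")


def build_input(publications):
    """Return one identifier per publication, one per line."""
    lines = []
    for pub in publications:
        for key in PREFERENCE:
            value = pub.get(key)
            if value:
                lines.append(str(value).strip())
                break  # use only one identifier per publication
    return "\n".join(lines)
```

For example, a publication that has both a DOI and a PMID contributes only its PMID, so the same paper is never listed twice in one batch.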

[Screenshot: SourceMD input field]

SourceMD stages information for review

[Screenshot: SourceMD takes the source identifier and returns source metadata to edit]

SourceMD collects information from off-wiki databases and formats it for inclusion into Wikidata.

You can edit the text that SourceMD presents, but typically there is no reason to change anything.

SourceMD returns different information depending on the identifier supplied. When an academic paper is published, the following sequence of events occurs:

  1. Publishers report to Crossref that they have media to publish.
  2. Crossref assigns a DOI to the media and registers it in its database.
  3. About one day later, PubMed checks whether the media is in its index of medical publications. If it is, PubMed copies the Crossref data, including the DOI, and assigns a PMID to the work.
  4. About a week later, PubMed Central shares a free-to-read copy of the publication, but only if it has an agreement to publish it. If it publishes, it takes the DOI and the PMID and also assigns a PMCID.

For Wikidata, this means you should submit the PMCID whenever possible. Submitting the PMCID gives Wikidata the PMCID, the PMID, and the DOI. Submitting the PMID gives Wikidata the PMID and the DOI. Submitting the DOI gives Wikidata only the DOI.

Transfer data from SourceMD to QuickStatements

  1. Press the "Open in QuickStatements" button.
  2. All of the statements previewed in SourceMD will be transferred to the QuickStatements interface.

Run QuickStatements

  1. If you have not previously authorized QuickStatements, you will be prompted to allow it to make changes on behalf of your account.
  2. Press the "Run" button.
  3. When it has finished, you will be able to view the item(s) you created.

Consider output of QuickStatements

After QuickStatements has processed your input, review the newly created item(s).

  1. If you ran QuickStatements for a single item, you can view the item from the Done screen.
  2. If you ran QuickStatements as a batch, no new screen appears when it finishes, but you can open the batch log report and select the items that were edited in that batch. From there, click the item title or Q number to review the output of QuickStatements.

Special cases

Merge records

Identify multiple Wikidata items for one publication

Wikidata may mistakenly contain more than one item for the same media. Correct this error by merging the items.

This can happen with SourceMD when one person processes one identifier for a publication, such as a DOI, and another person later processes a different identifier for the same publication, such as a PMID. The tool could then create separate items.
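
One way to spot candidates for merging is to group items by a shared identifier. The sketch below assumes you already have a list of (Q number, DOI) pairs from some export or query (how you obtain it is outside this sketch), and reports every DOI that appears on more than one item. DOIs are compared case-insensitively by converting them to upper case. The function name and the Q numbers in the usage example are placeholders.

```python
from collections import defaultdict


def find_duplicate_items(items):
    """Map each DOI that occurs on several items to the list of those items."""
    by_doi = defaultdict(list)
    for qid, doi in items:
        # DOIs are case-insensitive, so normalise before grouping.
        by_doi[doi.strip().upper()].append(qid)
    return {doi: qids for doi, qids in by_doi.items() if len(qids) > 1}


# Placeholder data: the first two entries describe the same publication.
candidates = find_duplicate_items([
    ("Q1", "10.1000/abc"),
    ("Q2", "10.1000/ABC"),
    ("Q3", "10.1000/xyz"),
])
```

Each group returned this way is only a candidate; confirm by hand that the items describe the same publication before merging.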

Use the merge function

Please see Help:Merge for a detailed discussion of how to perform merges.

The best way is to use the Merge.js gadget to perform merges. Add the gadget to your account as described on the help page, then use the drop-down "More" menu at the top right of any page to access the merge action.


[Screenshot: Wikidata edit log]

Verify the merge

View the edit logs from both pages to verify that the merge was successful. Make sure the redirect works.

[Screenshot: edit log of the item into which the merge was made]
[Screenshot: edit log of the item that was redirected to the merged item]
[Screenshot: the redirect of the item from which the merge was made]

Changing the SourceMD formatting

The output that SourceMD generated from the DOI can be edited before it is sent to QuickStatements.

  1. Properties can be added. Terms must be separated by tabs. LAST always adds the statement to the newly created item.
  2. If the article's author already has a Q number, you can add a P50 (author) statement at this point with the author's Q number as the value, because the scraped value is always P2093 (author name string). Be sure to keep the correct P1545 (series ordinal) for that author.
  3. Deprecated properties can be removed.
[Screenshot: the intermediate output that SourceMD scraped]
[Screenshot: a deprecated property that should be removed]
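
The edits listed above can be sketched as plain string handling. This example assumes the staged output is a set of tab-separated QuickStatements commands, one per line, and that a qualifier such as P1545 is appended to a statement as an extra property and value pair with the string value in double quotes. Both helper names are hypothetical, and Q42 in the usage note is only a placeholder Q number.

```python
def author_statement(author_qid: str, ordinal: int) -> str:
    """Build a line that adds P50 (author) with a P1545 (series ordinal) qualifier."""
    # LAST targets the item created earlier in the same batch.
    # The ordinal is a string value, so it is wrapped in double quotes.
    return "\t".join(["LAST", "P50", author_qid, "P1545", f'"{ordinal}"'])


def drop_property(commands: str, prop: str) -> str:
    """Remove every line whose main statement uses the given property."""
    kept = []
    for line in commands.splitlines():
        fields = line.split("\t")
        # fields[1] is the property of the main statement, if the line has one.
        if len(fields) >= 2 and fields[1] == prop:
            continue
        kept.append(line)
    return "\n".join(kept)
```

Calling author_statement("Q42", 2) produces a line for the second author, and drop_property(text, "P364") removes the deprecated language statement from the staged text while leaving all other lines unchanged.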

Property applied in error

If there is a statement that was applied in error or is deprecated, you can remove it.

  1. Click on the edit pencil in the right-hand corner.
  2. Select "Remove".

In the example shown, P364 (original language of work) is deprecated and should be replaced with P407 (language of work or name).

[Screenshot: a deprecated property that should be removed]

Duplicated field

A statement may have accidentally been duplicated.

  1. Click on the edit pencil for the superfluous statement.
  2. Select "Remove".
[Screenshot: two statements about the date]
[Screenshot: the incomplete date should be removed]

Other

[Screenshots of the SourceMD tool]