Topic on User talk:Magnus Manske


Matching CoSing numbers using multiple identifiers

Teolemon (talkcontribs)

Hi Magnus,

The CoSing number property has recently been created for chemical compounds. It is the EU's canonical identifier for chemistry and cosmetics. The dataset holds 25,000 identifiers, along with cross-references to all the other chemistry identifier systems and useful information for properties and labels.

I had first thought of truncating the file for import with Mix'n'match, but I wondered what could be done to get the most out of the file (see the snippet of the file on the linked page).

Property talk:P3073#Importing the identifiers

Magnus Manske (talkcontribs)

I've been thinking about a new tool for dealing with generic tables for a while now. This looks like a good object to study.

Teolemon (talkcontribs)

Duly noted. Being able to match against several columns for consistency checks would be useful for Mix'n'match (or another tool).

UNII is an even larger dataset (similar, but from the FDA) with the same kind of issues. It has already been imported, but because labels and synonyms were lacking, the matching has not been as good as it could have been had the other columns been used.
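To make the multi-column idea concrete, here is a minimal Python sketch of such a consistency check. It is not how Mix'n'match works; the column names (`cas`, `ec`), the item keys, and all identifier values are made-up placeholders, and a real import would read them from the CoSing file and from Wikidata.

```python
# Hypothetical column names; the real CoSing export may label them differently.
ID_COLUMNS = ("cas", "ec")


def match_row(row, items):
    """Match one table row against known items using several identifier columns.

    Returns (status, item_id):
      "consistent": every populated column that matched points to the same item
      "conflict":   different columns point to different items
      "unmatched":  no column matched any item
    """
    matched = set()
    for col in ID_COLUMNS:
        value = row.get(col)
        if not value:
            continue  # skip empty cells instead of matching on them
        for item_id, item in items.items():
            if item.get(col) == value:
                matched.add(item_id)
    if not matched:
        return ("unmatched", None)
    if len(matched) > 1:
        return ("conflict", None)
    return ("consistent", next(iter(matched)))


# Toy data with placeholder identifiers (not real CAS or EC numbers).
items = {
    "ITEM_A": {"cas": "cas-0001", "ec": "ec-0001"},
    "ITEM_B": {"cas": "cas-0002", "ec": "ec-0002"},
}

print(match_row({"cas": "cas-0001", "ec": "ec-0001"}, items))
print(match_row({"cas": "cas-0001", "ec": "ec-0002"}, items))
print(match_row({"cas": "cas-9999", "ec": ""}, items))
```

Rows that come back as "conflict" are the ones worth a manual look, since at least one identifier in the source row or in the existing item is probably wrong.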
