synonyms in DwC Archives

Hilmar Lapp

18 Mar 2014

19:55

I'm looking for recommendations on how best to put synonyms for taxon records into DwC Archive format.

I'm assuming that these would go into an extension file. Do I have this right? What I'm having more trouble with is determining the right column term. There's dwc:vernacularName, which is also in the examples, but what about synonyms of different types that come with taxonomies (such as NCBI's) or that result from merging taxonomies? There isn't an obvious candidate in DwC, and the list at http://rs.gbif.org/core/dwc_taxon.xml doesn't have a suggestion that would seem pertinent either.

Any suggestions, pointers to documentation or examples?

-hilmar

-- Hilmar Lapp -:- informatics.nescent.org/wiki -:- lappland.io


Chuck Miller

18 Mar

21:44

Hilmar,

Sticking strictly to Darwin Core and not adding RDF, I think there are a couple of DwC terms that are attributes that can be used to identify a synonym:

taxonomicStatus - The status of the use of the scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept. Recommended best practice is to use a controlled vocabulary. Examples: "invalid", "misapplied", "homotypic synonym", "accepted".

relationshipOfResource - The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. Examples: "duplicate of", "mother of", "endoparasite of", "host to", "sibling of", "valid synonym of", "located within".

There's also acceptedNameUsage and acceptedNameUsageID, which, if used, imply that the name they are associated with is a synonym of the accepted name.

But so far there is no guideline for how to organize synonyms in a Darwin Core Archive. They can be embedded in the core file using relationshipOfResource from a synonym name to an accepted name in the same file. Or they can be in an extension file, where the extension file may be called Synonyms and thus define a one-to-many "synonym relationship" from the taxonID in the core file to synonym names in the extension file. There are probably other ways. RDF adds the ability to be more explicit about the relationships.

Rich Pyle has lectured prolifically on this, so I'm sure he has good advice to offer.

Chuck

John Wieczorek

22:10

A point of information. One cannot use dwc:relationshipOfResource in a Darwin Core Archive core file. That term is not among those in the Simple Darwin Core (http://rs.tdwg.org/dwc/terms/simple/index.htm), and so it can't be part of a Taxon Core file (http://tools.gbif.org/dwca-validator/extension.do?id=http://rs.tdwg.org/dwc/...).

All of what you want to do could be done with a Taxon Core and a ResourceRelationship extension (http://tools.gbif.org/dwca-validator/extension.do?id=http://rs.tdwg.org/dwc/...), but there has to be a better way. Surprisingly, I do not see any extension made for this purpose.
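For readers who want to see what the Taxon Core plus ResourceRelationship layout could look like, here is a minimal sketch that generates a `meta.xml` descriptor for such an archive. The file names `taxon.txt` and `relationship.txt`, the column order, and the choice of core fields are assumptions made for illustration; nothing in this thread prescribes them.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"      # Darwin Core term namespace
DWCA_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of the archive descriptor


def _add_table(parent, tag, row_type, location, id_tag, terms):
    """Describe one data file: column 0 is the id/coreid, the rest are terms."""
    table = ET.SubElement(parent, tag, {
        "rowType": row_type,
        "fieldsTerminatedBy": "\\t",   # written literally as \t in meta.xml
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "encoding": "UTF-8",
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(table, "field", {"index": str(index), "term": DWC + term})
    return table


def build_meta():
    """Return a meta.xml string for a Taxon core with one extension."""
    archive = ET.Element("archive", {"xmlns": DWCA_NS})
    # Core file: one row per taxon (file name and fields are assumptions).
    _add_table(archive, "core", DWC + "Taxon", "taxon.txt", "id",
               ["scientificName", "taxonomicStatus"])
    # Extension file: relationshipOfResource lives here, not in the core.
    _add_table(archive, "extension", DWC + "ResourceRelationship",
               "relationship.txt", "coreid",
               ["relatedResourceID", "relationshipOfResource"])
    return ET.tostring(archive, encoding="unicode")


print(build_meta())
```

Note that this layout still needs an identifier for the related resource, so it only helps when the synonym can be identified in its own right.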


Markus Döring

22:35

Hi Hilmar,

GBIF, Catalogue of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.

The common idea is to include synonyms together with accepted taxa in the core file. This also allows one to add extension data to synonyms, for example bibliographic references, type data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageID links to the basionym, and taxonomicStatus declares a specific type of synonym, such as homotypic/heterotypic or later/junior synonym. The scientificName is used for both accepted and synonym records.

You should be able to find many DwC-A examples in the GBIF dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST

For example, try these: http://data.canadensys.net/ipt/archive.do?r=vascan http://ipt.speciesfile.org:8080/archive.do?r=orthoptera

Cheers, Markus
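As a rough illustration of the core-file convention described above, the sketch below reads a tiny tab-separated Taxon core and groups synonym rows under the accepted record that their acceptedNameUsageID points to. The three rows and their names ("Aus bus" and so on) are made-up placeholders, not taken from any real checklist, and the function name is hypothetical.

```python
import csv
import io

# Made-up core file: accepted taxa and synonyms share one table.
CORE = """taxonID\tscientificName\ttaxonomicStatus\tacceptedNameUsageID\toriginalNameUsageID
t1\tAus bus (Smith, 1900)\taccepted\t\tt2
t2\tXus bus Smith, 1900\thomotypic synonym\tt1\tt2
t3\tAus cus Jones, 1950\theterotypic synonym\tt1\t
"""


def synonyms_by_accepted(core_text):
    """Map each accepted scientificName to its (synonym, status) pairs."""
    rows = list(csv.DictReader(io.StringIO(core_text), delimiter="\t"))
    by_id = {row["taxonID"]: row for row in rows}
    grouped = {}
    for row in rows:
        target = row["acceptedNameUsageID"]
        # A row is treated as a synonym when it points at a different record.
        if target and target != row["taxonID"]:
            accepted = by_id[target]["scientificName"]
            grouped.setdefault(accepted, []).append(
                (row["scientificName"], row["taxonomicStatus"]))
    return grouped


for accepted, synonyms in synonyms_by_accepted(CORE).items():
    print(accepted)
    for name, status in synonyms:
        print("  =", name, "[" + status + "]")
```

Because every synonym is a full core row with its own taxonID, extension data can be attached to it in the same way as to an accepted taxon.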


Hilmar Lapp

19 Mar

02:44

Tony - your example matches pretty much what GBIF does.

John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.

Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use case I'm interested in, synonyms are metadata of taxon records and do not have their own identifiers. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)

One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.

-hilmar
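To make the request concrete, here is a sketch of what rows in such a names-only synonym extension might look like, by analogy with the vernacular names extension. No such extension exists according to this thread, so the column names `synonymName` and `synonymType`, the type values, and the placeholder names are all hypothetical.

```python
import csv
import io


def write_synonym_extension(synonyms):
    """Serialize {core taxon id: [(name, type), ...]} as tab-separated rows.

    Synonyms carry no identifier of their own; each row is keyed only by
    the id of the core taxon record it describes.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Hypothetical header: these are not Darwin Core terms.
    writer.writerow(["coreid", "synonymName", "synonymType"])
    for taxon_id, names in synonyms.items():
        for name, kind in names:
            writer.writerow([taxon_id, name, kind])
    return buffer.getvalue()


# Placeholder data: one taxon with two identifier-less synonyms.
example = {"t1": [("Xus bus", "synonym"), ("Aus buss", "misspelling")]}
print(write_synonym_extension(example), end="")
```

The one-to-many link from the core taxonID to plain name strings is the same shape Chuck described for a "Synonyms" extension file.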


Roderic Page

05:37

Hi Hilmar,

I think the phrase “simply names” summarises part of the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the “same” name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).

We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they’ve been applied. Instead we have a flat Darwin Core with row upon row of things that we call “taxa” but which, in many cases, aren’t. I get why we like flat Darwin Core, but sometimes the world isn’t flat.

Regards,

Rod
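One way to picture the separation argued for above is a small data model in which names and taxa are distinct entities with their own identifiers. This is only a sketch under assumptions: the class and field names are invented for illustration, the identifiers are placeholders rather than real nomenclator ids, and the two-way split into "nomenclatural" and "taxonomic" is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Name:
    """A name as a first-class entity, e.g. a record from a nomenclator."""
    name_id: str
    name_string: str
    basionym_id: Optional[str] = None  # set when the name is a recombination


@dataclass
class Taxon:
    """A taxon with its own id, linked to every name applied to it."""
    taxon_id: str
    accepted_name_id: str
    applied_name_ids: List[str] = field(default_factory=list)


def synonym_kind(a: Name, b: Name) -> str:
    """Classify why two names applied to one taxon are synonyms."""
    root_a = a.basionym_id or a.name_id
    root_b = b.basionym_id or b.name_id
    # Same original name: a relationship between names.
    # Different original names: a relationship between taxa (an opinion).
    return "nomenclatural" if root_a == root_b else "taxonomic"


# Placeholder names and ids.
xus_bus = Name("n1", "Xus bus")
aus_bus = Name("n2", "Aus bus", basionym_id="n1")  # moved to another genus
aus_cus = Name("n3", "Aus cus")
taxon = Taxon("t1", accepted_name_id="n2", applied_name_ids=["n1", "n2", "n3"])

print(synonym_kind(aus_bus, xus_bus))
print(synonym_kind(aus_bus, aus_cus))
```

A flat core file collapses both classes into one row type, which is the conflation this message objects to.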


--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ORCID: http://orcid.org/0000-0002-7101-9767