synonyms in DwC Archives
I'm looking for recommendations on how best to put synonyms for taxon records into DwC Archive format.
I'm assuming that these would go into an extension file. Do I have this right? What I'm having more trouble with is determining the right column term. There's dwc:vernacularName, which is also in the examples, but what about synonyms of different types that come with taxonomies (such as NCBI's) or that result from merging taxonomies? There isn't an obvious candidate in DwC, and the list at http://rs.gbif.org/core/dwc_taxon.xml doesn't have a suggestion either that would seem pertinent.
Any suggestions, pointers to documentation or examples?
-hilmar
Hilmar, Sticking strictly to Darwin Core and not adding RDF, I think there are a couple of DwC terms that are attributes that can be used to identify a synonym:
taxonomicStatus - The status of the use of the scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the experts opinion. It must be linked to a specific taxonomic reference that defines the concept. Recommended best practice is to use a controlled vocabulary. Examples: "invalid", "misapplied", "homotypic synonym", "accepted".
relationshipOfResource - The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. Examples: "duplicate of", "mother of", "endoparasite of", "host to", "sibling of", "valid synonym of", "located within".
There's also acceptedNameUsage and acceptedNameUsageID, which, if used, imply that the name the terms are associated with is a synonym of the accepted name.
But so far there is no guideline for how to organize synonyms in a Darwin Core Archive. They can be embedded in the core file using relationshipOfResource from a synonym name to an accepted name in the same file. Or they can be in an extension file, where the extension file may be called Synonyms and thus define a one-to-many "synonym relationship" from the taxonID in the core file to synonym names in the extension file. There are probably other ways. RDF adds the ability to be more explicit about the relationships.
Rich Pyle has lectured prolifically on this so I'm sure he has good advice to offer.
Chuck
A point of information. One cannot use dwc:relationshipOfResource in a Darwin Core Archive Core file. That term does not belong among those in the Simple Darwin Core (http://rs.tdwg.org/dwc/terms/simple/index.htm), and so it can't be part of a Taxon Core file (http://tools.gbif.org/dwca-validator/extension.do?id=http://rs.tdwg.org/dwc/...).
All of what you want to do could be done with a Taxon Core and a ResourceRelationship extension (http://tools.gbif.org/dwca-validator/extension.do?id=http://rs.tdwg.org/dwc/...), but there has to be a better way. Surprisingly, I do not see any extension made for this purpose.
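As a rough illustration of the Taxon Core plus ResourceRelationship combination mentioned above, here is a minimal Python sketch. The identifiers, names and the "synonym of" value are made up, and the reading direction (resourceID, then relationshipOfResource, then relatedResourceID) is an assumption made for this example rather than something the thread settles. In a real archive, a meta.xml file would also map each column to its term.

```python
import csv
import io

# Taxon core rows (made-up identifiers and names).
core_rows = [
    {"taxonID": "t1", "scientificName": "Aus bus Smith, 1900"},
    {"taxonID": "t2", "scientificName": "Cus bus (Smith, 1900)"},
]

# ResourceRelationship extension rows. Assumed reading direction:
# resourceID --relationshipOfResource--> relatedResourceID.
relationship_rows = [
    {"resourceID": "t2", "relatedResourceID": "t1", "relationshipOfResource": "synonym of"},
]


def to_tsv(rows, fieldnames):
    """Serialise dict rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def dangling_ids(core, relationships):
    """Return IDs used in relationships that have no row in the core file."""
    known = {row["taxonID"] for row in core}
    used = set()
    for rel in relationships:
        used.update((rel["resourceID"], rel["relatedResourceID"]))
    return sorted(used - known)


relationship_tsv = to_tsv(
    relationship_rows, ["resourceID", "relatedResourceID", "relationshipOfResource"]
)
print(relationship_tsv)
print("dangling:", dangling_ids(core_rows, relationship_rows))
```

Note that both ends of each relationship need an identifier that resolves to a core row, which is what `dangling_ids` checks.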
Hi Hilmar,
GBIF, Catalogue of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.
The common idea is to include synonyms together with accepted taxa in the core file. This also allows one to add extension data to synonyms, for example bibliographic references, type data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageID links to the basionym, and taxonomicStatus declares a specific type of synonym such as homotypic/heterotypic or later/junior synonym. The scientificName is used for both accepted and synonym records.
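The core-file layout described above can be sketched in a few lines of Python. This is only an illustration: the taxonIDs, names and status strings are invented, and treating a row with an empty acceptedNameUsageID as accepted is an assumption of the sketch, not a rule taken from the guidelines.

```python
import csv
import io

# Hypothetical Taxon core rows; the IDs, names and status values are made up.
CORE_TSV = (
    "taxonID\tscientificName\ttaxonomicStatus\tacceptedNameUsageID\toriginalNameUsageID\n"
    "t1\tAus bus Smith, 1900\taccepted\t\tt1\n"
    "t2\tCus bus (Smith, 1900)\thomotypic synonym\tt1\tt1\n"
    "t3\tAus dus Jones, 1950\theterotypic synonym\tt1\tt3\n"
)


def read_core(text):
    """Parse a tab-delimited core file into a dict keyed by taxonID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["taxonID"]: row for row in reader}


def synonyms_by_accepted(core):
    """Group synonym rows under the taxonID of their accepted record."""
    grouped = {}
    for row in core.values():
        accepted_id = row["acceptedNameUsageID"]
        # Assumption: an empty or self-referencing acceptedNameUsageID marks an accepted row.
        if accepted_id and accepted_id != row["taxonID"]:
            grouped.setdefault(accepted_id, []).append(row)
    return grouped


core = read_core(CORE_TSV)
grouped = synonyms_by_accepted(core)
for accepted_id, rows in grouped.items():
    print(core[accepted_id]["scientificName"])
    for row in rows:
        print("  =", row["scientificName"], "[" + row["taxonomicStatus"] + "]")
```

Because every synonym is a full core row with its own taxonID, extension rows (references, type data) can point at it in the same way as at an accepted taxon.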
You should be able to find many dwca examples in the gbif dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST
For example try these: http://data.canadensys.net/ipt/archive.do?r=vascan http://ipt.speciesfile.org:8080/archive.do?r=orthoptera
Cheers, Markus
Tony - your example matches pretty much what GBIF does.
John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.
Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use case I'm interested in, synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)
One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.
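To make the idea concrete, here is a minimal Python sketch of what such a name-only synonym extension might look like when read back. No such extension is registered as far as this thread establishes, so the column names synonymName and synonymType are purely hypothetical, as are the IDs and names in the sample data.

```python
import csv
import io
from collections import defaultdict

# Core file: one row per taxon (made-up ID and name).
CORE_TSV = (
    "taxonID\tscientificName\n"
    "n1\tAus bus\n"
)

# Hypothetical extension file: synonyms are plain strings keyed by the core
# taxonID and carry no identifier of their own.
EXT_TSV = (
    "taxonID\tsynonymName\tsynonymType\n"
    "n1\tAus cus\tsynonym\n"
    "n1\tAus buss\tmisspelling\n"
)


def load_synonyms(ext_text):
    """Return {taxonID: [(synonymName, synonymType), ...]} from the extension."""
    by_taxon = defaultdict(list)
    for row in csv.DictReader(io.StringIO(ext_text), delimiter="\t"):
        by_taxon[row["taxonID"]].append((row["synonymName"], row["synonymType"]))
    return dict(by_taxon)


synonyms = load_synonyms(EXT_TSV)
for row in csv.DictReader(io.StringIO(CORE_TSV), delimiter="\t"):
    print(row["scientificName"], "->", synonyms.get(row["taxonID"], []))
```

The layout mirrors how vernacular names hang off a core record: many extension rows per taxonID, none of which needs an identifier.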
-hilmar
Hi Hilmar,
I think the phrase “simply names” summarises part of the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the “same” name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).
We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they’ve been applied. Instead we have a flat Darwin Core with row upon row of things that we call “taxa” but which, in many cases, aren’t.
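The separation of names from taxa described above might be sketched like this in Python. The class and field names are hypothetical, and the name_id values are stand-ins for identifiers that a nomenclator would issue; nothing here reflects an actual database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    name_id: str      # hypothetical stand-in for a nomenclator-issued identifier
    name_string: str


@dataclass
class Taxon:
    taxon_id: str
    accepted_name_id: str
    applied_name_ids: list = field(default_factory=list)  # every name applied to this taxon


# Made-up names and one made-up taxon that links to them by identifier.
names = {
    "n1": Name("n1", "Aus bus"),
    "n2": Name("n2", "Cus bus"),
    "n3": Name("n3", "Aus dus"),
}
taxon = Taxon("x1", accepted_name_id="n1", applied_name_ids=["n1", "n2", "n3"])


def other_names(taxon, names):
    """Names applied to the taxon other than the accepted one."""
    return [
        names[name_id].name_string
        for name_id in taxon.applied_name_ids
        if name_id != taxon.accepted_name_id
    ]


print(names[taxon.accepted_name_id].name_string, "<-", other_names(taxon, names))
```

Here a taxon never stores a name string itself; it only points at name records, so the same name can be linked to more than one taxon without being duplicated.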
I get why we like flat Darwin Core, but sometimes the world isn’t flat.
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ORCID: http://orcid.org/0000-0002-7101-9767
Dear Hilmar,
I completely concur with Rod. An additional remark is that Biodiversity Informatics has unfortunately twisted the meaning of the word ‘synonym’ as it is used in the different codes where it designates primarily a relationship between taxa and only secondarily by extension a relationship between names (because we designate taxa by using names). Formally, two different combinations of the same basionym are not synonyms: they are different combinations.
The flat DarwinCore cannot handle (easily) the 3 necessary Ids that have to be managed:
- TaxonId
- BasionymId
- NameAsStringId
Technically, the NameAsStringId could be used alone, but only if there is a second field that specifies whether it is a basionym, and a third that indicates whether it is the current accepted name; in the latter case, the NameAsStringId can be used as the TaxonId. So in any case a triplet has to be managed; otherwise, if you are given an id alone, you cannot know what it is beyond a simple NameAsString.
But to be honest, I prefer for clarity to have 3 different codes. That is what we have in FishBase: a SpecCode as TaxonId, a SynCode as NameAsStringId, and the Catalog of Fishes (CofF) CAS_SPC as the BasionymId. So we do exactly what Rod suggests: we use the acknowledged nomenclator. Strictly speaking, and referring to my remark above, CofF has ids for all basionyms, whether they are linked to valid taxa or synonym taxa. What CofF does not have is an id for the new combinations and misspellings.
Where FishBase is not completely consistent is that we also use a SynCode for misidentifications and misapplied names, while these relationships should be managed in a table other than the table of NameAsStrings. Also, as Markus described, the current accepted NameAsString is repeated in both our Taxon and our NameAsString tables. These were compromises to simplify the structure of the database.
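The three identifiers can be illustrated with a small Python sketch. The record layout and all values are hypothetical and do not reproduce FishBase or Catalog of Fishes data; the classification is also deliberately simplified, since a misspelling would share a basionym with the name it misspells.

```python
from typing import NamedTuple


class NameRecord(NamedTuple):
    name_string_id: str
    name_string: str
    basionym_id: str
    taxon_id: str


# Made-up records: three names placed in taxon x1, one in taxon x2.
records = [
    NameRecord("s1", "Aus bus", "b1", "x1"),  # original combination
    NameRecord("s2", "Cus bus", "b1", "x1"),  # new combination of the same basionym
    NameRecord("s3", "Aus dus", "b2", "x1"),  # different basionym, same taxon
    NameRecord("s4", "Aus eus", "b3", "x2"),
]


def relation(a, b):
    """Classify two name records using only their basionym and taxon ids."""
    if a.basionym_id == b.basionym_id:
        return "combinations of the same basionym"
    if a.taxon_id == b.taxon_id:
        return "synonyms"
    return "unrelated"


for other in records[1:]:
    print(records[0].name_string, "/", other.name_string, ":", relation(records[0], other))
```

With only a name-string id, the first two cases could not be told apart, which is the point of carrying all three identifiers.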
BW Nicolas.
I’m reluctant to dive into this conversation, because there are so many things going on. Very briefly, Rod and Nicolas come the closest to reflecting my own interpretation of the “simple” explanation. One question for Nicolas – why do you need “NameAsStringID”? Why not just NameString (literal)? What value is there for assigning an identifier to what is effectively a string of UTF-8-encoded characters?
The key here, as Rod (and others) alluded to, is that different people have different meanings for the word “Synonym” in this context; and this is further confused by the whole names vs. concepts thing.
Ignoring the names v. concepts issue altogether, we have various interpretations of “synonym” as including or excluding homotypic synonyms, heterotypic synonyms, and orthographic variations. That is further confounded by the “names as conceptual objects” vs. “names as literal text strings” issues.
The answer is obvious: reduce everything to atomized Taxon Name Usage instances (where the word “usage” comes from in DwC) to track “names as objects”, and index every unique text string to track the strings themselves. GNUB is well along this path on the former for the groups that have been populated, and we are now ramping up our efforts to bulk-populate the content from various sources. Once you populate the Protonym (~=Basionym) Usages (the hardest part, by far), then you can start to capture subsequent usages en masse to represent the spectrum of how names have been actually used through history -- different spellings, different classifications and genus combinations, different ranks (e.g., as a full species vs. as a subspecies of another species), and treatments of names as valid (=accepted) or as junior heterotypic synonyms. With that properly captured & indexed, you can do all sorts of wonderful, magical, amazing things that have only been dreamed about.
So, the problem is well understood. And the solution is also well understood (as a result of the development of GNUB and GNI and associated services via GNA, guided by a series of NOMINA meetings going back several years). What we need to do now is populate the content (particularly GNUB), and then implement the services.
I realize this is getting pretty far away from Hilmar’s original question – but that has been reasonably well addressed by others (particularly Markus, Tony, Rod and Nicolas).
One final point to keep in mind when discussing this stuff – particularly in a flattened DwC sort of structure:
The unqualified statement “Aus bus is a synonym of Aus cus” should never be represented in the form of metadata without an “According to” to go along with it. Even objective synonyms should be understood in the context of the relevant Code (i.e., where the “According To” is effectively the Code of Nomenclature itself). This is why the “usage” qualifiers were added to the various DWC terms in the Taxon class – to emphasize that the status of a name as being either accepted, or as representing a synonym of another name, is implied to be in the context of a particular usage (i.e., a particular “according to”).
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Bailly, Nicolas (WorldFish) Sent: Tuesday, March 18, 2014 8:57 PM To: Roderic Page; Hilmar Lapp Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] synonyms in DwC Archives
Dear Hilmar,
I completely concur with Rod. An additional remark is that Biodiversity Informatics has unfortunately twisted the meaning of the word ‘synonym’ as it is used in the different codes where it designates primarily a relationship between taxa and only secondarily by extension a relationship between names (because we designate taxa by using names). Formally, two different combinations of the same basionym are not synonyms: they are different combinations.
The flat DarwinCore cannot handle (easily) the 3 necessary Ids that have to be managed:
- TaxonId
- BasionymId
- NameAsStringId
Technically, the NameAsStringId could be used alone, but only if there is a second field that specifies whether it is a Basionym or not, and a third that indicates whether it is the current accepted name: in the latter case, the NameAsStringId can be used as the TaxonId. So in any case, a triplet has to be managed; otherwise, if you give an id alone, you cannot know what it is beyond a simple NameAsString.
But to be honest I prefer for clarity to have 3 different codes. It is what we have in FishBase: a SpecCode as TaxonId, a SynCode as NameAsStringId, and we use the Catalog of Fishes (CofF) CAS_SPC as the BasionymId – so we do exactly what Rod suggests, we use the acknowledged nomenclator. Strictly speaking and referring to my remark above, CofF has ids for all basionyms, whether they are linked to valid taxa or synonym taxa. What CofF does not have is an id for the new combinations and misspellings.
Where FishBase is not completely consistent is that we use a SynCode also for the misidentifications and the misapplied names, while these relationships should be managed in another table than the table of NameAsStrings. Also like Markus described, the current accepted NameAsString is repeated both in our Taxon and our NameAsString tables. These were compromises to simplify the structure of the database.
BW
Nicolas.
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Roderic Page Sent: Wednesday, March 19, 2014 1:38 PM To: Hilmar Lapp Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] synonyms in DwC Archives
Hi Hilmar,
I think the phrase “simply names” summarises part of the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the “same” name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).
We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they’ve been applied. Instead we have a flat Darwin Core with row upon row of things that we call “taxa” but which, in many cases, aren’t.
I get why we like flat Darwin Core, but sometimes the world isn’t flat.
Regards
Rod
On 19 Mar 2014, at 02:44, Hilmar Lapp hlapp@nescent.org wrote:
Tony - your example matches pretty much what GBIF does.
John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.
Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use-case I'm interested in, synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)
One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.
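A minimal sketch of what such a names-only synonym extension could look like when read back in, assuming tab-delimited files joined on the core taxonID in the same way the vernacular names extension is. The column names synonymName and synonymType are hypothetical (they are not Darwin Core terms), and the rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical core file: one row per taxon, keyed by taxonID.
CORE = "taxonID\tscientificName\ntx1\tXus bus\n"

# Hypothetical extension file: the first column repeats the core taxonID.
# "synonymName" and "synonymType" are made-up column names, not DwC terms.
EXTENSION = (
    "taxonID\tsynonymName\tsynonymType\n"
    "tx1\tAus bus\tsynonym\n"
    "tx1\tAus buus\tmisspelling\n"
)

def attach_synonyms(core_text, extension_text):
    """Return {scientificName: [(synonymName, synonymType), ...]}."""
    synonyms = defaultdict(list)
    for row in csv.DictReader(io.StringIO(extension_text), delimiter="\t"):
        synonyms[row["taxonID"]].append((row["synonymName"], row["synonymType"]))
    return {
        row["scientificName"]: synonyms.get(row["taxonID"], [])
        for row in csv.DictReader(io.StringIO(core_text), delimiter="\t")
    }

print(attach_synonyms(CORE, EXTENSION))
```

In this layout the synonyms carry no identifier of their own; they exist only as rows attached to a taxon record.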
-hilmar
On Tue, Mar 18, 2014 at 6:35 PM, Markus Döring m.doering@mac.com wrote:
Hi Hilmar,
GBIF, Catalogue of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.
The common idea is to include synonyms together with accepted taxa in the core file. This allows one to also add extension data to synonyms, for example bibliographic references, types data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageID for the basionym and taxonomicStatus to declare a specific type of synonym such as homo/heterotypic or later/junior synonym. The scientificName is used both for accepted and synonym records.
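As a rough illustration of this layout, the sketch below reads a small tab-delimited core file and groups synonym rows under their accepted record by following acceptedNameUsageID to taxonID. The rows and identifiers are invented, and letting an accepted record point at itself is an assumption of this sketch rather than something the guidelines are quoted as requiring.

```python
import csv
import io

# Hypothetical core taxon file; rows and identifiers are invented for illustration.
CORE = (
    "taxonID\tscientificName\tacceptedNameUsageID\toriginalNameUsageID\ttaxonomicStatus\n"
    "t1\tXus bus (Linnaeus, 1758)\tt1\tt2\taccepted\n"
    "t2\tAus bus Linnaeus, 1758\tt1\tt2\thomotypic synonym\n"
    "t3\tXus cus Brown, 1950\tt1\tt3\theterotypic synonym\n"
)

def synonyms_by_accepted(core_text):
    """Group synonym names under the scientificName of their accepted record."""
    rows = list(csv.DictReader(io.StringIO(core_text), delimiter="\t"))
    by_id = {row["taxonID"]: row for row in rows}
    grouped = {}
    for row in rows:
        accepted_id = row["acceptedNameUsageID"]
        if accepted_id == row["taxonID"]:
            continue  # assumption: an accepted record points at itself
        accepted_name = by_id[accepted_id]["scientificName"]
        grouped.setdefault(accepted_name, []).append(
            (row["scientificName"], row["taxonomicStatus"])
        )
    return grouped

print(synonyms_by_accepted(CORE))
```

Because every synonym is a full core row with its own taxonID, extension rows (references, type data) can be attached to it just as they are to accepted taxa.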
You should be able to find many dwca examples in the gbif dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST
For example try these:
http://data.canadensys.net/ipt/archive.do?r=vascan
http://ipt.speciesfile.org:8080/archive.do?r=orthoptera
Cheers,
Markus
Using the string as the key …
The way we manage the synonymy pro parte would not work (we repeat the string and have 2 different syn records), but that is because of compromises we make for simplicity.
For homonyms, you would necessarily need to include the authorship, but maybe that means that assigning a numeric Id adds more meaning than the strict string, which is again a compromise. In other words, we don’t have a repository of meaningless strings like GNI; our synonym table is thus implicitly closer to a subset of GNUB.
So OK, you have a point: what I call NameAsString with its Id in FishBase is not strictly a name as string. I suppose that is the difference I saw when comparing FishBase and GNA, but I did not pay enough attention to the vocabulary. …
Agree with the rest of your analyses and suggestions. And with the less sexy part of it: populate the GNUB! With the most tedious part: the “according to”! Unfortunately, I don’t see that it can be entirely automated …
And yes, I dream about it … !!!
BW Nicolas.
Thanks, Nicolas – that makes sense.
I guess I’ve come to the view that the most sensible path is to split nomenclatural data management processes into two tracks:
Track 1: Names as text strings. This is the GNI track, and it literally treats a sequence of UTF-8 characters as a unique “thing”. These are different records in GNI, all of which refer to the same Protonym (“bus”):
Aus bus Linnaeus 1758
Aus bus Linnaeus, 1758
Aus bus L. 1758
Aus bus Linnaeus
Aus bus Linn.
Aus bus Lin.
Aus bus L.
Aus bus
Aus buus Linnaeus
Xus bus (L.)
Xus bus (Linn.)
Xus bus (Linn., 1758)
Xus bus (Linn. 1758)
[and so on….]
It doesn’t matter if “Aus bus” is a homonym – there is still only one record in GNI for each unique UTF-8-encoded text string.
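A small sketch of the "names as text strings" idea, assuming nothing more than exact string equality as the test for uniqueness. The numeric index values are invented; this is not how GNI itself assigns identifiers.

```python
# Name strings taken from the list above; the repeated "Aus bus" is deliberate.
NAME_STRINGS = [
    "Aus bus Linnaeus 1758",
    "Aus bus Linnaeus, 1758",
    "Aus bus L.",
    "Aus bus",
    "Aus bus",
    "Xus bus (L.)",
]

def build_string_index(strings):
    """Give each distinct string exactly one record, numbered in order of first appearance."""
    index = {}
    for s in strings:
        # No parsing or normalisation: differing punctuation means a different record.
        index.setdefault(s, len(index) + 1)
    return index

index = build_string_index(NAME_STRINGS)
print(len(index))
```

Whether two of these records refer to the same Protonym, or to homonyms, is deliberately not something the string index knows about.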
Track 2: Names as objects. This is the GNUB track, where the subset of all Taxon Name Usage instances (TNUs) that represent Protonyms also represent the “name” as an object. As such, the text string is just a property of that object (i.e., metadata). Records in GNUB look something like this (simplified):
TNUID  ProtID  Reference      ParTNUID  ValidTNUID  Rank  Namestring
-----  ------  -------------  --------  ----------  ----  ----------
1      1       Linnaeus 1758  -         1           Gen.  Aus
2      2       Linnaeus 1758  1         2           sp.   bus
3      1       Smith 1850     -         3           Gen.  Aus
4      2       Smith 1850     3         4           sp.   buus
5      5       Jones 1900     -         5           Gen.  Aus
6      6       Jones 1900     5         6           sp.   bus
7      7       Brown 1950     -         7           Gen.  Xus
8      2       Brown 1950     7         8           sp.   bus
9      9       Brown 1950     7         9           sp.   cus
10     7       Pyle 2000      -         10          Gen.  Xus
11     2       Pyle 2000      10        11          sp.   bus
12     9       Pyle 2000      -         11          -     cus
TNUIDs 1, 2, 5, 6, 7 & 9 are Protonyms; the others are subsequent usages (as indicated by the ProtID).
TNUID 5 represents a genus-level homonym for TNUID 1; and TNUID 6 is a species-level primary homonym for TNUID 2.
TNUID 4 is a misspelling of the species epithet, representing a homotypic “synonym” of Aus bus.
TNUID 8 puts an existing species in a different genus (Xus, TNUID 7), representing a homotypic “synonym” of Aus bus.
TNUID 12 establishes Aus cus Brown 1950 as a junior heterotypic “synonym” of Aus bus Linnaeus 1758 (according to Pyle 2000)
The “Reference” column represents the “According to”.
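The table above can be sketched as plain records to show how the Protonyms and the "according to" fall out of the columns. The class and field names are invented for this illustration and are not GNUB's actual schema; "-" in the table is represented as None.

```python
from typing import NamedTuple, Optional

class TNU(NamedTuple):
    tnuid: int
    prot_id: int
    reference: str              # the "according to"
    parent_tnuid: Optional[int]
    valid_tnuid: int
    rank: Optional[str]
    namestring: str

TNUS = [
    TNU(1, 1, "Linnaeus 1758", None, 1, "Gen.", "Aus"),
    TNU(2, 2, "Linnaeus 1758", 1, 2, "sp.", "bus"),
    TNU(3, 1, "Smith 1850", None, 3, "Gen.", "Aus"),
    TNU(4, 2, "Smith 1850", 3, 4, "sp.", "buus"),
    TNU(5, 5, "Jones 1900", None, 5, "Gen.", "Aus"),
    TNU(6, 6, "Jones 1900", 5, 6, "sp.", "bus"),
    TNU(7, 7, "Brown 1950", None, 7, "Gen.", "Xus"),
    TNU(8, 2, "Brown 1950", 7, 8, "sp.", "bus"),
    TNU(9, 9, "Brown 1950", 7, 9, "sp.", "cus"),
    TNU(10, 7, "Pyle 2000", None, 10, "Gen.", "Xus"),
    TNU(11, 2, "Pyle 2000", 10, 11, "sp.", "bus"),
    TNU(12, 9, "Pyle 2000", None, 11, None, "cus"),
]

by_id = {t.tnuid: t for t in TNUS}

# A usage is a Protonym when it points at itself as its own ProtID.
protonyms = [t.tnuid for t in TNUS if t.tnuid == t.prot_id]

# A usage treats its name as a heterotypic synonym when ValidTNUID points elsewhere;
# the statement is always qualified by the usage's own reference.
synonymy = [
    (t.namestring, by_id[t.valid_tnuid].namestring, t.reference)
    for t in TNUS
    if t.valid_tnuid != t.tnuid
]

print(protonyms)
print(synonymy)
```

Each synonymy statement comes out as a triple of name, valid name and reference, so the "according to" is never dropped.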
No – it cannot be “entirely” automated. However, Rob Whitton has been developing tools that allow us to photograph a checklist table or index (or both!) in a book or published article, OCR it, clean up the OCR, disambiguate against Protonyms in GNUB, and then insert TNUs in GNUB – at a rate of about 100 TNUs in 8 minutes. For example, just during the past week or two while developing & testing the tools, Rob has processed nearly 4,000 TNUs from ten different publications (regional checklists, books, etc.). The key is in having the Protonyms already in GNUB to anchor the new TNUs to. In April we will flesh out some tools that focus on capturing Protonyms in a similar fashion (bulk import).
Aloha,
Rich
From: Bailly, Nicolas (WorldFish) [mailto:N.Bailly@cgiar.org] Sent: Wednesday, March 19, 2014 1:36 AM To: Richard Pyle; 'Roderic Page'; 'Hilmar Lapp' Cc: 'TDWG Content Mailing List' Subject: RE: [tdwg-content] synonyms in DwC Archives
Using the string as the key …
The way we manage the synonymy pro parte would not work (we repeat the string and have 2 different syn records), but it is because of compromises we do for simplicity.
For homonyms, you would need mandatorily to include the authorship, but maybe it means that assigning a numeric Id is adding more meaning than the strict string, which is again a compromise. To say it in other words, we don’t have a repository of meaningless strings like GNI, our synonym table is thus closer to a subset of the GNUB implicitly.
So ok you got a point: what I call NameAsString with its Id in FishBase is not strictly a name as string. I suppose it is the difference I saw comparing FishBase and GNA, but I did not pay attention enough to the vocabulary. …
Agree with the rest of your analyses and suggestions. And with the less sexy part of it: populate the GNUB! With the most tedious part: the “according to”! Unfortunately, I don’t see that it can be entirely automated …
And yes, I dream about it … !!!
BW
Nicolas.
From: Richard Pyle [mailto:pylediver@gmail.com] On Behalf Of Richard Pyle Sent: Wednesday, March 19, 2014 6:16 PM To: Bailly, Nicolas (WorldFish); 'Roderic Page'; 'Hilmar Lapp' Cc: 'TDWG Content Mailing List' Subject: RE: [tdwg-content] synonyms in DwC Archives
I’m reluctant to dive into this conversation, because there are so many things going on. Very briefly, Rod and Nicolas come the closest to reflecting my own interpretation of the “simple” explanation. One question for Nicolas – why do you need “NameAsStringID”? Why not just NameString (literal)? What value is there for assigning an identifier to what is effectively a string of UTF-8-encoded characters?
The key here, as Rod (and others) alluded to, is that different people have different meanings for the word “Synonym” in this context; and this is further confused by the whole names vs. concepts thing.
Ignoring the names v. concepts issue altogether, we have various interpretations of “synonym” as including or excluding homotypic synonyms, heterotypic synonyms, and orthographic variations. That is further confounded by the “names as conceptual objects” vs, “names as literal text strings” issues.
The answer is obvious: reduce everything to atomized Taxon Name Usage instances (where the word “usage” comes from n DwC) to track “names as objects”, and index every unique text string to track the strings themselves. GNUB is well along this path on the former for the groups that have been populated, and we are now ramping up our efforts to bulk-populate the content from various sources. Once you populate the Protonym (~=Basionym) Usages (the hardest part, by far), then you can start to capture subsequent usages en masse to represent the spectrum of how names have been actually used through history -- different spellings, different classifications and genus combinations, different ranks (e.g., as a full species vs. as a subspecies of another species), and treatments of names as valid (=accepted) or as junior heterotypic synonyms. With that properly captured & index, you can do all sorts of wonderful, magical, amazing things that have only been dreamed about.
So, the problem is well understood. And the solution is also well understood (as a result of the development of GNUB and GNI and associated services via GNA, guided by a series of NOMINA meetings going back several years). What we need to do now is populate the content (particularly GNUB), and then implement the services.
I realize this is getting pretty far away from Hilmar’s original question – but that has been reasonably well addressed by others (particularly Markus, Tony, Rod and Nicolas).
One final point to keep in mind when discussing this stuff – particularly in a flattened DwC sort of structure:
The unqualified statement “Aus bus is a synonym of Aus cus” should never be represented in the form of metadata without an “According to” to go along with it. Even objective synonyms should be understood in the context of the relevant Code (i.e., where the “According To” is effectively the Code of Nomenclature itself). This is why the “usage” qualifiers were added to the various DwC terms in the Taxon class – to emphasize that the status of a name as being either accepted, or as representing a synonym of another name, is implied to be in the context of a particular usage (i.e., a particular “according to”).
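As a minimal sketch of this point, the check below flags synonym rows that carry no "according to". It assumes flat rows keyed by Darwin Core Taxon terms (taxonID, taxonomicStatus, nameAccordingTo); the sample rows and the function name are hypothetical and only illustrate the idea.

```python
# Hypothetical flat rows keyed by Darwin Core Taxon terms.
ROWS = [
    {"taxonID": "1", "scientificName": "Aus cus",
     "taxonomicStatus": "accepted", "nameAccordingTo": "Pyle 2000"},
    {"taxonID": "2", "scientificName": "Aus bus",
     "taxonomicStatus": "heterotypic synonym",
     "acceptedNameUsageID": "1", "nameAccordingTo": "Pyle 2000"},
    {"taxonID": "3", "scientificName": "Aus dus",
     "taxonomicStatus": "synonym",
     "acceptedNameUsageID": "1", "nameAccordingTo": ""},
]


def missing_according_to(rows):
    """Return taxonIDs of synonym rows that have no nameAccordingTo value."""
    missing = []
    for row in rows:
        status = row.get("taxonomicStatus", "").lower()
        according_to = row.get("nameAccordingTo", "").strip()
        if "synonym" in status and not according_to:
            missing.append(row["taxonID"])
    return missing


print(missing_according_to(ROWS))
```

Matching on the substring "synonym" is an assumption about the status vocabulary in use; a real checklist would test against its own controlled vocabulary.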
Aloha,
Rich
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Bailly, Nicolas (WorldFish) Sent: Tuesday, March 18, 2014 8:57 PM To: Roderic Page; Hilmar Lapp Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] synonyms in DwC Archives
Dear Hilmar,
I completely concur with Rod. An additional remark is that Biodiversity Informatics has unfortunately twisted the meaning of the word ‘synonym’ as it is used in the different codes where it designates primarily a relationship between taxa and only secondarily by extension a relationship between names (because we designate taxa by using names). Formally, two different combinations of the same basionym are not synonyms: they are different combinations.
The flat DarwinCore cannot handle (easily) the 3 necessary Ids that have to be managed:
- TaxonId
- BasionymId
- NameAsStringId
Technically, the NameAsStringId could be used alone, but only if there is a second field that specifies whether it is a Basionym or not, and a third that indicates whether it is the current accepted name: in the latter case, the NameAsStringId can be used as the TaxonId. So in any case, a triplet has to be managed; otherwise, if you give an id alone, you cannot know what it is beyond a simple NameAsString.
But to be honest I prefer for clarity to have 3 different codes. It is what we have in FishBase, a SpecCode as TaxonId, a SynCode as NameAsStringId, and we use the Catalog of Fishes (CofF) CAS_SPC as the BasionymId – so we do exactly what Rod suggests, we use the acknowledged nomenclator. Strictly speaking and referring to my remark above, CofF has ids for all basionyms, whether they are linked to valid taxa or synonym taxa. What CofF does not have is an id for the new combinations and misspellings.
Where FishBase is not completely consistent is that we use a SynCode also for the misidentifications and the misapplied names, while these relationships should be managed in another table than the table of NameAsStrings. Also, as Markus described, the current accepted NameAsString is repeated both in our Taxon and our NameAsString tables. These were compromises to simplify the structure of the database.
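The triplet can be sketched as a small record type, assuming three independent identifier spaces as described above. The class name, field names, and sample identifiers are hypothetical; this is not the FishBase schema. Grouping by basionym_id versus taxon_id shows why one id alone is not enough: new combinations share a basionym, while all names applied to a taxon share a taxon id.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    name_string_id: str  # identifies this exact name string
    basionym_id: str     # identifies the original name it derives from
    taxon_id: str        # identifies the taxon the name is applied to
    name_string: str


RECORDS = [
    NameRecord("N1", "B1", "T1", "Aus bus"),
    NameRecord("N2", "B1", "T1", "Xus bus"),  # new combination of B1
    NameRecord("N3", "B2", "T1", "Xus cus"),  # other basionym, same taxon
]


def group_names(records, key):
    """Group name strings by one of the three identifier fields."""
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, key)].append(record.name_string)
    return dict(groups)


by_taxon = group_names(RECORDS, "taxon_id")
by_basionym = group_names(RECORDS, "basionym_id")
print(by_taxon)
print(by_basionym)
```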
BW
Nicolas.
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Roderic Page Sent: Wednesday, March 19, 2014 1:38 PM To: Hilmar Lapp Cc: TDWG Content Mailing List Subject: Re: [tdwg-content] synonyms in DwC Archives
Hi Hilmar,
I think the phrase “simply names” summarises part of the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the “same” name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).
We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they’ve been applied. Instead we have a flat Darwin Core with row upon row of things that we call “taxa” but which, in many cases, aren’t.
I get why we like flat Darwin Core, but sometimes the world isn’t flat.
Regards
Rod
On 19 Mar 2014, at 02:44, Hilmar Lapp hlapp@nescent.org wrote:
Tony - your example matches pretty much what GBIF does.
John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.
Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use case I'm interested in, synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)
One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.
-hilmar
On Tue, Mar 18, 2014 at 6:35 PM, Markus Döring m.doering@mac.com wrote:
Hi Hilmar,
GBIF, Catalogue of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.
The common idea is to include synonyms together with accepted taxa in the core file. This allows one to also add extension data to synonyms, for example bibliographic references, type data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageID for the basionym, and taxonomicStatus to declare a specific type of synonym such as homo/heterotypic or later/junior synonym. The scientificName is used both for accepted and synonym records.
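A rough sketch of that layout, assuming a tab-delimited core file whose columns are the Darwin Core terms named above. The three sample rows and the helper names are made up for illustration. Synonym rows point at the accepted row through acceptedNameUsageID, so grouping them is a simple lookup on taxonID.

```python
import csv
import io

# Hypothetical core taxon file: accepted names and synonyms side by side.
CORE_TSV = (
    "taxonID\tscientificName\ttaxonomicStatus\t"
    "acceptedNameUsageID\toriginalNameUsageID\n"
    "1\tAus bus Linnaeus, 1758\taccepted\t\t1\n"
    "2\tXus bus (Linnaeus, 1758)\thomotypic synonym\t1\t1\n"
    "3\tXus cus Brown, 1950\theterotypic synonym\t1\t3\n"
)


def load_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def synonyms_by_accepted(rows):
    """Map each accepted scientificName to its (synonym, status) pairs."""
    by_id = {row["taxonID"]: row for row in rows}
    result = {}
    for row in rows:
        accepted_id = row["acceptedNameUsageID"]
        if accepted_id and accepted_id != row["taxonID"]:
            accepted_name = by_id[accepted_id]["scientificName"]
            result.setdefault(accepted_name, []).append(
                (row["scientificName"], row["taxonomicStatus"])
            )
    return result


rows = load_rows(CORE_TSV)
print(synonyms_by_accepted(rows))
```

Note that every synonym row needs its own taxonID here, which is the requirement Hilmar raises for sources that do not assign identifiers to synonyms.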
You should be able to find many dwca examples in the gbif dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST
For example try these:
http://data.canadensys.net/ipt/archive.do?r=vascan
http://ipt.speciesfile.org:8080/archive.do?r=orthoptera
Cheers,
Markus
Am 18.03.2014 um 22:44 schrieb Chuck Miller Chuck.Miller@mobot.org:
Hilmar,
Sticking strictly to Darwin Core and not adding RDF, I think there are a couple of DwC terms that are attributes that can be used to identify a synonym:
taxonomicStatus - The status of the use of the scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept. Recommended best practice is to use a controlled vocabulary. Examples: "invalid", "misapplied", "homotypic synonym", "accepted".
relationshipOfResource - The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. Examples: "duplicate of", "mother of", "endoparasite of", "host to", "sibling of", "valid synonym of", "located within".
There’s also acceptedNameUsage and acceptedNameUsageID, which, if used, imply that the name the terms are associated with is a synonym of the accepted name.
But, so far there is no guideline for how to organize synonyms in a Darwin Core Archive. They can be embedded in the core file using relationshipOfResource from a synonym name to an accepted name in the same file. Or they can be in an extension file, where the extension file may be called Synonyms and thus define a one-to-many “synonym relationship” from the taxonID in the core file to synonym names in the extension file. There are probably other ways. RDF adds the ability to be more explicit about the relationships.
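The extension-file option could look roughly like the sketch below. It assumes a core file with one row per taxon and a separate synonym table keyed by the core taxonID. The extension's column names ("synonym", "synonymType") are invented for illustration and are not Darwin Core terms, in line with the remark that no guideline exists for this layout.

```python
import csv
import io

# Hypothetical core file: one row per accepted taxon.
CORE = "taxonID\tscientificName\n1\tAus bus\n2\tAus dus\n"

# Hypothetical "Synonyms" extension: many rows may share one taxonID.
SYNONYM_EXT = (
    "taxonID\tsynonym\tsynonymType\n"
    "1\tXus bus\thomotypic synonym\n"
    "1\tXus cus\theterotypic synonym\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attach_synonyms(core_rows, ext_rows):
    """Join extension rows onto core taxa through the shared taxonID."""
    taxa = {
        row["taxonID"]: {"scientificName": row["scientificName"],
                         "synonyms": []}
        for row in core_rows
    }
    for row in ext_rows:
        taxa[row["taxonID"]]["synonyms"].append(row["synonym"])
    return taxa


taxa = attach_synonyms(read_tsv(CORE), read_tsv(SYNONYM_EXT))
print(taxa)
```

In this layout the synonyms need no identifiers of their own, but they also cannot carry further extension data such as references or type information.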
Rich Pyle has lectured prolifically on this so I’m sure he has good advice to offer.
Chuck
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Hilmar Lapp Sent: Tuesday, March 18, 2014 2:55 PM To: TDWG Content Mailing List Cc: Dan Leehr Subject: [tdwg-content] synonyms in DwC Archives
I'm looking for recommendations on how best to put synonyms for taxon records into DwC Archive format.
I'm assuming that these would go into an extension file. Do I have this right? What I'm having more trouble with is determining the right column term. There's dwc:vernacularName, which is also in the examples, but what about synonyms of different types that come with taxonomies (such as NCBI's) or that result from merging taxonomies? There isn't an obvious candidate in DwC, and the list at http://rs.gbif.org/core/dwc_taxon.xml doesn't have a suggestion either that would seem pertinent.
Any suggestions, pointers to documentation or examples?
-hilmar
Makes sense too. Nicolas. Absent from tomorrow to Saturday.
From: Richard Pyle [mailto:pylediver@gmail.com] On Behalf Of Richard Pyle Sent: Thursday, March 20, 2014 2:05 AM To: Bailly, Nicolas (WorldFish); 'Roderic Page'; 'Hilmar Lapp' Cc: 'TDWG Content Mailing List' Subject: RE: [tdwg-content] synonyms in DwC Archives
Thanks, Nicolas – that makes sense.
I guess I’ve come to the view that the most sensible path is to split nomenclatural data management processes into two tracks:
Track 1: Names as text strings. This is the GNI track, and it literally treats a sequence of UTF-8 characters as a unique “thing”. These are different records in GNI, all of which refer to the same Protonym (“bus”):
Aus bus Linnaeus 1758
Aus bus Linnaeus, 1758
Aus bus L. 1758
Aus bus Linnaeus
Aus bus Linn.
Aus bus Lin.
Aus bus L.
Aus bus
Aus buus Linnaeus
Xus bus (L.)
Xus bus (Linn.)
Xus bus (Linn., 1758)
Xus bus (Linn. 1758)
[and so on….]
It doesn’t matter if “Aus bus” is a homonym – there is still only one record in GNI for each unique UTF-8-encoded text string.
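Track 1 can be sketched as an index keyed on the exact character sequence. This is only an illustration of the "one record per unique string" rule and not how GNI is implemented; the class name and the integer ids are hypothetical.

```python
class NameStringIndex:
    """Assigns one id per unique name string, with no normalization."""

    def __init__(self):
        self._ids = {}

    def add(self, name_string):
        # The exact string is the key, so "Linnaeus 1758" and
        # "Linnaeus, 1758" end up as two separate records.
        return self._ids.setdefault(name_string, len(self._ids) + 1)

    def __len__(self):
        return len(self._ids)


index = NameStringIndex()
strings = [
    "Aus bus Linnaeus 1758",
    "Aus bus Linnaeus, 1758",
    "Aus bus L.",
    "Aus bus Linnaeus 1758",  # exact repeat, so no new record
]
ids = [index.add(s) for s in strings]
print(ids, len(index))
```

Nothing in the index says which strings refer to the same Protonym or whether a string is a homonym; that is left to Track 2.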
Track 2: Names as objects. This is the GNUB track, where the subset of all Taxon Name Usage instances (TNUs) that represent Protonyms also represent the “name” as an object. As such, the text string is just a property of that object (i.e., metadata). Records in GNUB look something like this (simplified):
TNUID  ProtID  Reference      ParTNUID  ValidTNUID  Rank  Namestring
-----  ------  -------------  --------  ----------  ----  ----------
1      1       Linnaeus 1758  -         1           Gen.  Aus
2      2       Linnaeus 1758  1         2           Sp.   bus
3      1       Smith 1850     -         3           Gen.  Aus
4      2       Smith 1850     3         4           sp.   buus
5      5       Jones 1900     -         5           Gen.  Aus
6      6       Jones 1900     5         6           sp.   bus
7      7       Brown 1950     -         7           Gen.  Xus
8      2       Brown 1950     7         8           sp.   bus
9      9       Brown 1950     7         9           sp.   cus
10     7       Pyle 2000      -         10          Gen.  Xus
11     2       Pyle 2000      10        11          sp.   bus
12     9       Pyle 2000      -         11          -     cus
TNUIDs 1, 2, 5, 6, 7 & 9 are Protonyms; the others are subsequent usages (as indicated by the ProtID). TNUID 5 represents a genus-level homonym for TNUID 1; and TNUID 6 is a species-level primary homonym for TNUID 2. TNUID 4 is a misspelling of the species epithet, representing a homotypic “synonym” of Aus bus. TNUID 8 puts an existing species in a different genus, representing a homotypic “synonym” of Aus bus. TNUID 12 establishes Aus cus Brown 1950 as a junior heterotypic “synonym” of Aus bus Linnaeus 1758 (according to Pyle 2000).
The “Reference” column represents the “According to”.
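The simplified table can be loaded into a few lines of Python to show how these statements fall out of the ids. This is a sketch over the twelve example rows only, not GNUB's schema or API. The rule used for heterotypic synonyms (the valid usage traces to a different Protonym) is one reading of the example and is an assumption.

```python
from collections import namedtuple

TNU = namedtuple("TNU", "tnu_id prot_id reference parent_id valid_id rank name")

# The twelve example rows; "-" in the table is represented as None.
TNUS = [
    TNU(1, 1, "Linnaeus 1758", None, 1, "Gen.", "Aus"),
    TNU(2, 2, "Linnaeus 1758", 1, 2, "Sp.", "bus"),
    TNU(3, 1, "Smith 1850", None, 3, "Gen.", "Aus"),
    TNU(4, 2, "Smith 1850", 3, 4, "sp.", "buus"),
    TNU(5, 5, "Jones 1900", None, 5, "Gen.", "Aus"),
    TNU(6, 6, "Jones 1900", 5, 6, "sp.", "bus"),
    TNU(7, 7, "Brown 1950", None, 7, "Gen.", "Xus"),
    TNU(8, 2, "Brown 1950", 7, 8, "sp.", "bus"),
    TNU(9, 9, "Brown 1950", 7, 9, "sp.", "cus"),
    TNU(10, 7, "Pyle 2000", None, 10, "Gen.", "Xus"),
    TNU(11, 2, "Pyle 2000", 10, 11, "sp.", "bus"),
    TNU(12, 9, "Pyle 2000", None, 11, None, "cus"),
]


def protonyms(tnus):
    """A usage is a Protonym when it is its own ProtID."""
    return [t.tnu_id for t in tnus if t.tnu_id == t.prot_id]


def heterotypic_synonyms(tnus):
    """Usages treated as synonyms of a usage with a different Protonym."""
    by_id = {t.tnu_id: t for t in tnus}
    found = []
    for t in tnus:
        valid = by_id[t.valid_id]
        if valid.tnu_id != t.tnu_id and valid.prot_id != t.prot_id:
            found.append((t.tnu_id, valid.tnu_id, t.reference))
    return found


print(protonyms(TNUS))
print(heterotypic_synonyms(TNUS))
```

Each result keeps the Reference of the usage that asserts it, so the synonymy is never stated without its "according to".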
No – it cannot be “entirely” automated. However, Rob Whitton has been developing tools that allow us to photograph a checklist table or index (or both!) in a book or published article, OCR it, clean up the OCR, disambiguate against Protonyms in GNUB, and then insert TNUs in GNUB – at a rate of about 100 TNUs in 8 minutes. For example, just during the past week or two while developing & testing the tools, Rob has processed nearly 4,000 TNUs from ten different publications (regional checklists, books, etc.). The key is in having the Protonyms already in GNUB to anchor the new TNUs to. In April we will flesh out some tools that focus on capturing Protonyms in a similar fashion (bulk import).
Aloha, Rich
From: Bailly, Nicolas (WorldFish) [mailto:N.Bailly@cgiar.org] Sent: Wednesday, March 19, 2014 1:36 AM To: Richard Pyle; 'Roderic Page'; 'Hilmar Lapp' Cc: 'TDWG Content Mailing List' Subject: RE: [tdwg-content] synonyms in DwC Archives
Using the string as the key … The way we manage synonymy pro parte would not work (we repeat the string and have 2 different syn records), but that is because of compromises we made for simplicity. For homonyms, you would necessarily need to include the authorship, but maybe that means that assigning a numeric Id adds more meaning than the strict string, which is again a compromise. In other words, we don’t have a repository of meaningless strings like GNI; our synonym table is thus implicitly closer to a subset of GNUB. So OK, you have a point: what I call NameAsString with its Id in FishBase is not strictly a name as string. I suppose that is the difference I saw comparing FishBase and GNA, but I did not pay enough attention to the vocabulary. …
Agree with the rest of your analyses and suggestions. And with the less sexy part of it: populate the GNUB! With the most tedious part: the “according to”! Unfortunately, I don’t see that it can be entirely automated …
And yes, I dream about it … !!!
BW Nicolas.
Rod,
Darwin Core deliberately uses the terminology NameUsage to refer to flat name OR taxon records. I was in favor of also replacing dwc:taxonID with dwc:nameUsageID to achieve some consistency and be clear about what we are talking about, but we stuck with taxonID for reasons I cannot remember.
Markus
On 19 Mar 2014, at 06:37, Roderic Page Roderic.Page@glasgow.ac.uk wrote:
Hi Hilmar,
I think the phrase “simply names” summarises part ot the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the “same” name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).
We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they’ve been applied. Instead we have a flat Darwin Core with row upon row of things that we call “taxa” but which, in many cases, aren’t.
I get why we like flat Darwin Core, but sometimes the world isn’t flat.
Regards
Rod
On 19 Mar 2014, at 02:44, Hilmar Lapp hlapp@nescent.org wrote:
Tony - your example matches pretty much what GBIF does.
John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.
Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use-case I'm interested in synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)
One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.
-hilmar
On Tue, Mar 18, 2014 at 6:35 PM, Markus Döring m.doering@mac.com wrote: Hi Hilmar,
GBIF, Catalog of life and others have produced guidelines for how to express taxonomies with synonyms and these are in widespread use already since over a year. I will forward links tomorrow when Im back at my desk.
The common idea is to include synonyms together with accepted taxa in the core file. This allows one to also add extension data to synonyms, for example bibliographic references, types data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageiD for the basionym and taxonomicStatus to declare a specific type of synonym such as homo/heterotypic or later/junior synonym. The scientificName is used both for accepted and synonym records.
You should be able to find many dwca examples in the gbif dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST
For example try these: http://data.canadensys.net/ipt/archive.do?r=vascan http://ipt.speciesfile.org:8080/archive.do?r=orthoptera
Cheers, Markus
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage LinkedIn: http://uk.linkedin.com/in/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ ORCID: http://orcid.org/0000-0002-7101-9767
Historical Darwin Core interjection. The term taxonID was not changed because it is the ID referring to the class of information, which remained "Taxon". For both consistency (a rule that every class should have an associated identifier term with the lower camel case name of the class followed by 'ID') and for the sake of inertia (not creating undue burden for backward compatibility), the term name taxonID was retained.
On Wed, Mar 19, 2014 at 2:40 AM, Markus Döring m.doering@mac.com wrote:
Rod,
Darwin Core deliberately uses the terminology NameUsage to refer to flat name OR taxon records. I was in favor of also replacing dwc:taxonID with dwc:nameUsageID to achieve some consistency and be clear about what we are talking about, but we stuck with taxonID for reasons I cannot remember.
Markus
On 19 Mar 2014, at 06:37, Roderic Page Roderic.Page@glasgow.ac.uk wrote:
Hi Hilmar,
I think the phrase "simply names" summarises part of the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the "same" name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).
We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they've been applied. Instead we have a flat Darwin Core with row upon row of things that we call "taxa" but which, in many cases, aren't.
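One way to picture the separation argued for above, as a rough sketch only: names and taxa are distinct records with their own identifiers, and each taxon links to the names that have been applied to it. The class names, fields, and identifier values are assumptions for illustration, not a proposed standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """A name as a first-class record, e.g. one served by a nomenclator."""
    name_id: str        # hypothetical nomenclator identifier
    name_string: str


@dataclass
class Taxon:
    """A taxon with its own identifier, linked to the names applied to it."""
    taxon_id: str
    accepted_name_id: str
    applied_name_ids: list = field(default_factory=list)


def names_applied_to(taxon, names_by_id):
    """Return the accepted name first, then every other applied name."""
    ids = [taxon.accepted_name_id] + list(taxon.applied_name_ids)
    return [names_by_id[name_id].name_string for name_id in ids]


# Invented placeholder records.
names = {n.name_id: n for n in [Name("name:1", "Aus bus"), Name("name:2", "Cus bus")]}
taxon = Taxon("taxon:1", accepted_name_id="name:1", applied_name_ids=["name:2"])
labels = names_applied_to(taxon, names)
```

In this layout a synonym never needs to pose as a taxon; it is just another `Name` linked from the `Taxon` record.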
I get why we like flat Darwin Core, but sometimes the world isn't flat.
Regards
Rod
On 19 Mar 2014, at 02:44, Hilmar Lapp hlapp@nescent.org wrote:
Tony - your example matches pretty much what GBIF does.
John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.
Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use case I'm interested in, synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)
One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.
-hilmar
On Tue, Mar 18, 2014 at 6:35 PM, Markus Döring m.doering@mac.com wrote:
Hi Hilmar,
GBIF, Catalogue of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.
Hilmar,
To include synonyms in the core file, they technically do not need IDs of their own. You really only need synonym IDs if you want to link further data (like typification) to them. Otherwise they just point to the accepted record.
That said, there is a desire in DwC archives to require an identifier for every record; see last year's occurrenceID discussion. The best would be if you could make up some identifier, for example by adding a -synX suffix to the accepted taxonID, or even just using the name alone if it's known to be unique within your dataset. Using an extension for synonyms would probably be useful and feel natural for many datasets, but I am concerned that we introduce more and more alternative ways of expressing the same kind of data, and that becomes a huge burden on the consumer side.
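A small sketch of the "-synX" suffix idea, assuming the synonyms for one accepted taxonID are available as a list of name strings. The function name `mint_synonym_ids` and the sample values are hypothetical; sorting the names first is an added assumption so that the same input always yields the same IDs.

```python
def mint_synonym_ids(accepted_id, synonym_names):
    """Return (synthetic_id, name) pairs of the form '<acceptedID>-syn<N>'.

    Names are de-duplicated and sorted so the numbering is reproducible;
    note that the IDs still shift if a synonym is added or removed later.
    """
    ordered = sorted(set(synonym_names))
    return [
        (f"{accepted_id}-syn{n}", name)
        for n, name in enumerate(ordered, start=1)
    ]


# Invented placeholder values.
minted = mint_synonym_ids("t1", ["Cus bus", "Aus buss", "Cus bus"])
```

Identifiers produced this way are local conveniences only; consumers have no way to tell them apart from identifiers issued by the source taxonomy unless that is documented with the dataset.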
Here are links to the documents I mentioned before:
Publishing Species Checklists, Best Practices http://www.gbif.org/resources/2548
GBIF GNA Profile Reference Guide for Darwin Core Archive, Core Terms and Extensions: http://www.gbif.org/resources/2562
The GBIF documents are from 2011 and likely in need of some updating in specific areas, but they still provide a good overview and lots of details.
In addition there is a Catalogue of Life document that I cannot find online anymore, so I have uploaded the last version I have here: i4Life Darwin Core Archive Profile https://dl.dropboxusercontent.com/u/457027/ChecklistExchangeFormat-v1.6.pdf
Markus
On Wed, Mar 19, 2014 at 5:35 AM, Markus Döring m.doering@mac.com wrote:
Hilmar,
To include synonyms in the core file, they technically do not need IDs of their own.
How would you recommend I do this then, i.e., which column header should be used for that? (I'm assuming you are not requiring that I make up IDs, which to me seems a non-starter, see below.)
That said, there is a desire in DwC archives to require an identifier for every record; see last year's occurrenceID discussion. The best would be if you could make up some identifier, for example by adding a -synX suffix to the accepted taxonID, or even just using the name alone if it's known to be unique within your dataset.
It is not necessarily unique. And making up identifiers can lead to all kinds of awkward problems downstream (how do consuming applications distinguish real identifiers that can be linked to and/or resolved from those that are simply hacks and otherwise bogus?), so I agree it's a possibility, but it strikes me as a last resort; we ought to be able to do better than that.
Using an extension for synonyms would probably be useful and feel natural for many datasets, but I am concerned we introduce more and more alternative ways of expressing the same kind of data and that becomes a huge burden on the consumer side.
I share that concern in principle. However, DwCA made a deliberate choice to flatten out its core taxon table and force a 1:1 relationship between row and <taxon or whatever other designated type> record. Since there may be multiple synonyms for a taxon, of different types, I'm not sure how you would do this in the core table unless you mint an identifier for each one that doesn't have one from the nomenclator, and cast all records into type Taxon, whether that's what they are or not.
All - to come back to my original question, I fully agree with the concerns re: proper treatment of synonyms for nomenclatural applications. But we also shouldn't forget that there's a wide area of application for taxonomies as an informatics tool, to discover and connect data linked to taxa. One of the key requirements for such applications when finding data by taxon is to find it by all names that were or are being potentially used to label the taxon, including basionyms, invalid names, misspellings, vernacular names. This may turn up false positives, and I agree the more provenance one has for the synonyms the better the ability to weed those out subsequently. But having an extensive list of synonyms is still critical, even if all one can say is that it's "related". Having a rich set of synonyms was one of the driving use-cases for synthesizing the Vertebrate Taxonomy Ontology [1] (and its predecessor, the Teleost Taxonomy Ontology); it was enough of a pain that we would have gladly used an existing nomenclator's taxonomy.
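To make the lookup use case concrete, here is a rough sketch that builds a name index from a core taxon table plus a synonym extension keyed on taxonID. No such synonym extension is defined in the thread; the row layout (taxonID, name, kind), the function `build_name_index`, and all sample values are assumptions for illustration.

```python
from collections import defaultdict

# Invented placeholder rows: (taxonID, scientificName)
core_rows = [("t1", "Aus bus"), ("t2", "Aus dus")]

# Hypothetical extension rows: (taxonID of the core record, name, kind of name)
synonym_rows = [
    ("t1", "Cus bus", "synonym"),
    ("t1", "Aus buss", "misspelling"),
    ("t2", "Cus bus", "misapplied"),
]


def build_name_index(core, extension):
    """Map every known label (case-insensitively) to the taxonIDs it may refer to."""
    index = defaultdict(set)
    for taxon_id, name in core:
        index[name.casefold()].add(taxon_id)
    for taxon_id, name, _kind in extension:
        index[name.casefold()].add(taxon_id)
    return index


index = build_name_index(core_rows, synonym_rows)
hits = sorted(index["cus bus"])
```

The lookup for "cus bus" returns both taxa, which is the false-positive case mentioned above: the index finds candidates, and the kind of each name (kept in the extension rows) is what a consumer would use to weed them out.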
What has brought me to this in the first place is the use-case of taxonomy synthesis, and the recognition that not only have we not managed in 300 years to converge on a universal taxonomy, there is also no generally accepted format for exchanging taxonomies. There are several dozen taxonomies, each of which comes in its own idiosyncratic format, often enough a straight database dump. Perhaps there's an opportunity here for the community as a whole to converge on DwCA as a universal taxonomy exchange format. I was going to write up some thoughts on that as a blog post; hopefully I get to that over the next few days, as I think we're really not far away in terms of what's missing.
-hilmar
[1] Midford, Peter, Thomas Dececchi, James Balhoff, Wasila Dahdul, Nizar Ibrahim, Hilmar Lapp, John Lundberg, et al. 2013. "The Vertebrate Taxonomy Ontology: A Framework for Reasoning across Model Organism and Species Phenotypes." *Journal of Biomedical Semantics* 4 (1): 34. http://dx.doi.org/10.1186/2041-1480-4-34
On Mar 18, 2014, at 6:35 PM, Markus Döring m.doering@mac.com wrote:
The common idea is to include synonyms together with accepted taxa in the core file. This also allows one to attach extension data to synonyms, for example bibliographic references, type data, etc. The term acceptedNameUsageID links a synonym to the accepted record in the core file (targeting its taxonID), originalNameUsageID points to the basionym, and taxonomicStatus declares the specific type of synonym, such as homotypic/heterotypic or later/junior synonym. scientificName is used for both accepted and synonym records.
On Mar 19, 2014, at 5:40 AM, Markus Döring m.doering@mac.com wrote:
I was in favor of also replacing dwc:taxonID with dwc:nameUsageID to achieve some consistency and be clear about what we are talking about, but we stuck with taxonID for reasons I cannot remember.
Some feedback from EOL, for what it's worth. We basically follow what Markus described. We expect synonym names to be included in the same file with taxa. We expect that for synonyms, the row in the Taxon file will include the scientificName, will indicate the status of the name using taxonomicStatus, and will reference the preferred record using acceptedNameUsageID (which will point to the taxonID of the accepted record; it's worth noting that the different suffixes are confusing for users, and I would support Markus's proposed nameUsageID, if only as a term equivalent to taxonID).
We have noticed a common case where the provider does not use acceptedNameUsageID and instead uses parentNameUsageID to link synonyms to the preferred record. So we watch for cases where a record has a taxonomicStatus indicating that it is not preferred, has no acceptedNameUsageID, but does have a parentNameUsageID pointing to something that looks to be of the same rank, and we interpret that as synonymy as well. This is not a practice to recommend, but something our parsing methods support.
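The fallback described above could be sketched roughly as follows. This is not EOL's actual parser; the function `resolve_accepted_id`, the set of status values treated as "not preferred", and the use of taxonRank for the same-rank check are all assumptions for illustration.

```python
# Assumed vocabulary of statuses that mark a record as not preferred.
NON_PREFERRED = {"synonym", "homotypic synonym", "heterotypic synonym",
                 "invalid", "misapplied"}


def resolve_accepted_id(record, records_by_id):
    """Return the taxonID of the accepted record for a synonym, or None."""
    accepted_id = record.get("acceptedNameUsageID")
    if accepted_id:
        return accepted_id  # the recommended, explicit link

    # Fallback: non-preferred status + parent of the same rank => synonymy.
    status = (record.get("taxonomicStatus") or "").lower()
    rank = record.get("taxonRank")
    parent = records_by_id.get(record.get("parentNameUsageID"))
    if (status in NON_PREFERRED and rank and parent is not None
            and parent.get("taxonRank") == rank):
        return parent["taxonID"]
    return None


# Invented placeholder records.
records = {
    "g1": {"taxonID": "g1", "taxonRank": "genus", "taxonomicStatus": "accepted"},
    "t1": {"taxonID": "t1", "taxonRank": "species", "taxonomicStatus": "accepted",
           "parentNameUsageID": "g1"},
    "t2": {"taxonID": "t2", "taxonRank": "species", "taxonomicStatus": "synonym",
           "parentNameUsageID": "t1"},
    "t3": {"taxonID": "t3", "taxonRank": "species", "taxonomicStatus": "synonym",
           "acceptedNameUsageID": "t1"},
}
resolved = {key: resolve_accepted_id(rec, records) for key, rec in records.items()}
```

Only `t2` goes through the fallback; `t1` keeps its genus as a genuine parent because its status is accepted and the ranks differ.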
EOL does not use ResourceRelationship for any purposes.
-Patrick
---------------------------- Biodiversity Informatics Encyclopedia of Life Marine Biological Laboratory Woods Hole, MA
Hilmar, I've been in multiple discussions over the last year on how to exchange complex taxonomic data with or without DwCA and we still don't have a consensus. Complex taxonomic data to me means that the dataset can include many-to-many relationships. The checklists GBIF is ingesting include one-to-many relations (e.g. many synonyms to one accepted name), and DwCA has been demonstrated to handle that structure. But, as the many different suggestions show, there are multiple methods that can be employed to do it. And it would be great if we could agree on a "primary" DwCA method for most to follow, and then make exceptions when necessary or too complex.
But you also touch on another of the ongoing issues in biodiversity information: the desire for a "universal taxonomy" that simply lists all the "taxa" and provides all the "related names" for those taxa. This desire has been repeatedly expressed by many fields outside biological systematics: conservation, ethno-economic use, niche modelling, land use, CITES, and so on. Taxonomists resist, but I think the closest we have come to that desire so far is Catalogue of Life with 1.5 million species. Their aim is to create a consensus global checklist for all species, but achieving 100% of all species is extremely challenging when some organismal groups simply don't have a consensus taxonomy yet.
But even if CoL were complete, eternal stability in taxonomic classification simply doesn't exist. Until thousands of taxonomists stop collecting and analyzing specimens, they will continue to make revisions to taxonomic classifications based on new discoveries, and some names will change from accepted to synonym, or vice versa, for valid scientific reasons. CoL deals with that change by publishing an Annual and a Dynamic Checklist. The Annual Checklist provides a stable consensus taxonomy for at least one year, but not eternally. NCBI's taxonomy contains about 300,000 species and applies only to the sequenced organisms there. And, as they have always done, they give this disclaimer: "The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information." Yet folks continue to use NCBI taxonomy as an authoritative source.
Chuck
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Hilmar Lapp Sent: Wednesday, March 19, 2014 9:26 AM To: Markus Döring Cc: TDWG Content Mailing List; Dan Leehr Subject: Re: [tdwg-content] synonyms in DwC Archives
On Wed, Mar 19, 2014 at 5:35 AM, Markus Döring <m.doering@mac.commailto:m.doering@mac.com> wrote: Hilmar,
to include synonyms in the core file they technically do not have to have ids on their own.
How would you recommend I do this then, i.e., which column header should be used for that? (I'm assuming you are not requiring that I make up IDs, which to me seems a non-starter, see below.)
Said that there is the desire in dwc archives though to require an identifier for a record, see last years occurrenceID discussion. The best would be if you could make up some identifier, for example by adding a -synX suffix to the accepted taxonID or even just using the name alone if its known to be unique within your dataset.
It is not necessarily unique. And making up identifiers can lead to all kinds of awkward problems downstream (how do consuming applications distinguish real identifiers that can be linked to and/resolved from those that are simply hacks and otherwise bogus), so I agree it's possibility but it strikes me as a last resort; we ought to be able to do better than that.
Using an extension for synonyms would probably be useful and feel natural for many datasets, but I am concerned we introduce more and more alternative ways of expressing the same kind of data and that becomes a huge burden on the consumer side.
I share that concern in principle. However, DwCA made a deliberate choice to flatten out its core taxon table and force a 1:1 relationship between row and <taxon or whatever other designated type> record. Since there may be multiple synonyms for a taxon, of different types, I'm not sure how you would do this in the core table unless you mint an identifier for each one that doesn't have one from the nomenclator, and cast all records into type Taxon, whether that's what they are or not.
All - to come back to my original question, I fully agree with the concerns re: proper treatment of synonyms for nomenclatural applications. But we also shouldn't forget that there's a wide area of application for taxonomies as an informatics tool, to discover and connect data linked to taxa. One of the key requirements for such applications when finding data by taxon is to find it by all names that were or are potentially being used to label the taxon, including basionyms, invalid names, misspellings, and vernacular names. This may turn up false positives, and I agree that the more provenance one has for the synonyms, the better one can weed those out subsequently. But having an extensive list of synonyms is still critical, even if all one can say is that it's "related". Having a rich set of synonyms was one of the driving use-cases for synthesizing the Vertebrate Taxonomy Ontology [1] (and its predecessor, the Teleost Taxonomy Ontology); it was enough of a pain that we would have gladly used an existing nomenclator's taxonomy.
What brought me to this in the first place is the use-case of taxonomy synthesis, and the recognition that not only have we not managed in 300 years to converge on a universal taxonomy, but there is also no generally accepted format for exchanging taxonomies. There are several dozen taxonomies, each of which comes in its own idiosyncratic format, often enough a straight database dump. Perhaps there's an opportunity here for the community as a whole to converge on DwCA as a universal taxonomy exchange format. I was going to write up some thoughts on that as a blog post; hopefully I get to that over the next few days, as I think we're really not far away in terms of what's missing.
-hilmar
[1] Midford, Peter, Thomas Dececchi, James Balhoff, Wasila Dahdul, Nizar Ibrahim, Hilmar Lapp, John Lundberg, et al. 2013. "The Vertebrate Taxonomy Ontology: A Framework for Reasoning across Model Organism and Species Phenotypes." Journal of Biomedical Semantics 4 (1): 34. http://dx.doi.org/10.1186/2041-1480-4-34
Here are links to the documents I mentioned before:
Publishing Species Checklists, Best Practices http://www.gbif.org/resources/2548
GBIF GNA Profile Reference Guide for Darwin Core Archive, Core Terms and Extensions: http://www.gbif.org/resources/2562
The GBIF documents are from 2011 and likely in need of updating in specific areas, but they still provide a good overview and lots of detail.
In addition there is a Catalog of Life document that I cannot find online anymore so I have uploaded the last version I have here: i4Life Darwin Core Archive Profile https://dl.dropboxusercontent.com/u/457027/ChecklistExchangeFormat-v1.6.pdf
Markus
Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use-case I'm interested in, synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)
One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.
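To make the "synonyms that are simply names" idea concrete, here is a rough sketch of how a consumer might read such an extension, assuming a tab-delimited file keyed by the core taxonID. No such extension is defined in this thread: the column names `synonymName` and `synonymType` and the sample rows are hypothetical placeholders, not Darwin Core terms.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extension content: one row per synonym, no synonym identifier.
EXTENSION_TEXT = (
    "taxonID\tsynonymName\tsynonymType\n"
    "T1\tAus bus\tsynonym\n"
    "T1\tAus buss\tmisspelling\n"
    "T2\tCus dus\tsynonym\n"
)


def synonyms_by_taxon(text):
    """Group (name, type) pairs under the core taxonID they describe."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["taxonID"]].append((row["synonymName"], row["synonymType"]))
    return dict(grouped)


for taxon_id, names in synonyms_by_taxon(EXTENSION_TEXT).items():
    print(taxon_id, names)
```

The only key in each row is the taxonID of the core record, so the one-to-many relationship is expressed without inventing identifiers for the synonyms themselves.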
-hilmar
On Tue, Mar 18, 2014 at 6:35 PM, Markus Döring <m.doering@mac.com> wrote: Hi Hilmar,
GBIF, Catalog of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have already been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.
The common idea is to include synonyms together with accepted taxa in the core file. This also allows one to add extension data to synonyms, for example bibliographic references, type data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageID links to the basionym, and taxonomicStatus declares a specific type of synonym such as homotypic/heterotypic or later/junior synonym. The scientificName term is used for both accepted and synonym records.
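As an illustration of how a consumer could use this layout, the sketch below builds a lookup from every name in the core file to the accepted name it points at via acceptedNameUsageID. The rows are hypothetical placeholders, and treating an empty acceptedNameUsageID as "this record is itself accepted" is an assumption of the sketch, not something the guidelines are quoted as saying here.

```python
CORE_ROWS = [  # placeholder records, for illustration only
    {"taxonID": "T1", "scientificName": "Aus bus",
     "taxonomicStatus": "accepted", "acceptedNameUsageID": ""},
    {"taxonID": "T2", "scientificName": "Aus cus",
     "taxonomicStatus": "homotypic synonym", "acceptedNameUsageID": "T1"},
    {"taxonID": "T3", "scientificName": "Dus bus",
     "taxonomicStatus": "heterotypic synonym", "acceptedNameUsageID": "T1"},
]


def build_name_index(rows):
    """Map each scientificName to the set of accepted names it resolves to."""
    by_id = {row["taxonID"]: row for row in rows}
    index = {}
    for row in rows:
        # Assumption: an empty acceptedNameUsageID means the row is accepted.
        accepted_id = row.get("acceptedNameUsageID") or row["taxonID"]
        accepted = by_id.get(accepted_id)
        if accepted is None:
            continue  # dangling reference; skipped in this sketch
        index.setdefault(row["scientificName"], set()).add(accepted["scientificName"])
    return index


index = build_name_index(CORE_ROWS)
print(index["Dus bus"])
```

A set is used as the value because the same name string can, in principle, appear in more than one row and resolve to different accepted taxa.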
You should be able to find many dwca examples in the gbif dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST
For example try these: http://data.canadensys.net/ipt/archive.do?r=vascan http://ipt.speciesfile.org:8080/archive.do?r=orthoptera
Cheers, Markus
On 18.03.2014 at 22:44, Chuck Miller <Chuck.Miller@mobot.org> wrote:
Hilmar,
Sticking strictly to Darwin Core and not adding RDF, I think there are a couple of DwC terms that are attributes that can be used to identify a synonym:
taxonomicStatus - The status of the use of the scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept. Recommended best practice is to use a controlled vocabulary. Examples: "invalid", "misapplied", "homotypic synonym", "accepted".
relationshipOfResource - The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. Examples: "duplicate of", "mother of", "endoparasite of", "host to", "sibling of", "valid synonym of", "located within".
There's also acceptedNameUsage and acceptedNameUsageID, which, if used, imply that the name the terms are associated with is a synonym of the AcceptedName.
But, so far there is no guideline for how to organize synonyms in a Darwin Core Archive. They can be embedded in the core file using relationshipOfResource from a synonym name to an accepted name in the same file. Or they can be in an extension file, where the extension file may be called Synonyms and thus define a one-to-many "synonym relationship" from the taxonID in the core file to synonym names in the extension file. There are probably other ways. RDF adds the ability to be more explicit about the relationships.
Rich Pyle has lectured prolifically on this so I'm sure he has good advice to offer.
Chuck
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Hilmar Lapp Sent: Tuesday, March 18, 2014 2:55 PM To: TDWG Content Mailing List Cc: Dan Leehr Subject: [tdwg-content] synonyms in DwC Archives
I'm looking for recommendations on how best to put synonyms for taxon records into DwC Archive format.
I'm assuming that these would go into an extension file. Do I have this right? What I'm having more trouble with is determining the right column term. There's dwc:vernacularName, which is also in the examples, but what about synonyms of different types that come with taxonomies (such as NCBI's) or that result from merging taxonomies? There isn't an obvious candidate in DwC, and the list at http://rs.gbif.org/core/dwc_taxon.xml doesn't have a suggestion either that would seem pertinent.
Any suggestions, pointers to documentation or examples?
-hilmar
--
Hilmar Lapp -:- informatics.nescent.org/wiki -:- lappland.io
I'm pretty sure we use acceptedNameUsage and acceptedNameUsageID when we ask partners to share synonyms or when we create connectors to generate DwC-A files with synonyms. I wish we had some guidelines or documentation about this, though. Will consult with Patrick Leary and get back to you.
Cyndy
Hi Hilmar, all,
This is how I currently handle synonyms in my IRMNG DwCA export file (rightly or wrongly):
TaxonID: 1268813
ScientificName: Herpetophytum (F.R.R. Schlechter) F.G. Brieger, 1981
TaxonomicStatus: synonym
AcceptedNameUsageID: 1243758
where 1243758 is the TaxonID of Dendrobium O. Swartz, 1799, the currently accepted name for this taxon.
For the latter, the AcceptedNameUsageID will be the same as the TaxonID, indicating that this is currently an accepted name (it will also have TaxonomicStatus = accepted).
Is this not sufficient for the requirement to handle synonyms?
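For illustration, the two records described above could be written to a tab-delimited core file roughly as follows. This is only a sketch: the column order, the lower-camel-case headers and the helper names are assumptions, and the real IRMNG export may well differ.

```python
import csv
import io

FIELDS = ["taxonID", "scientificName", "taxonomicStatus", "acceptedNameUsageID"]

# The two records from the message above: the accepted name points at itself.
ROWS = [
    {"taxonID": "1243758",
     "scientificName": "Dendrobium O. Swartz, 1799",
     "taxonomicStatus": "accepted",
     "acceptedNameUsageID": "1243758"},
    {"taxonID": "1268813",
     "scientificName": "Herpetophytum (F.R.R. Schlechter) F.G. Brieger, 1981",
     "taxonomicStatus": "synonym",
     "acceptedNameUsageID": "1243758"},
]


def to_core_text(rows):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def is_accepted(row):
    """A record is accepted when acceptedNameUsageID equals its own taxonID."""
    return row["acceptedNameUsageID"] == row["taxonID"]


print(to_core_text(ROWS))
```

Note that this layout works here because every synonym already has its own TaxonID; it does not by itself address sources where synonyms come without identifiers.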
Regards - Tony
Dr Tony Rees Manager | Divisional Data Centre Marine and Atmospheric Research CSIRO E Tony.Rees@csiro.au T +61 3 6232 5318 CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, TAS 7001, Australia www.cmar.csiro.au/datacentre Manager, OBIS Australia regional Node, http://www.obis.au LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36
participants (10)
- Bailly, Nicolas (WorldFish)
- Chuck Miller
- Cynthia Parr
- Hilmar Lapp
- John Wieczorek
- Markus Döring
- Patrick Leary
- Richard Pyle
- Roderic Page
- Tony.Rees@csiro.au