[tdwg-content] synonyms in DwC Archives

Bailly, Nicolas (WorldFish) N.Bailly at cgiar.org
Wed Mar 19 12:36:10 CET 2014


Using the string as the key …
The way we manage pro parte synonymy would not work (we repeat the string and have 2 different synonym records), but that is because of compromises we make for simplicity.
For homonyms, you would necessarily have to include the authorship, but that may mean that assigning a numeric Id adds more meaning than the strict string carries, which is again a compromise. In other words, we don’t have a repository of meaningless strings like GNI; our synonym table is thus implicitly closer to a subset of GNUB.
So OK, you have a point: what I call NameAsString with its Id in FishBase is not strictly a name as string. I suppose that is the difference I saw when comparing FishBase and GNA, but I did not pay enough attention to the vocabulary. …

Agree with the rest of your analyses and suggestions. And with the less sexy part of it: populate the GNUB! With the most tedious part: the “according to”! Unfortunately, I don’t see that it can be entirely automated …

And yes, I dream about it … !!!

BW
Nicolas.


From: Richard Pyle [mailto:pylediver at gmail.com] On Behalf Of Richard Pyle
Sent: Wednesday, March 19, 2014 6:16 PM
To: Bailly, Nicolas (WorldFish); 'Roderic Page'; 'Hilmar Lapp'
Cc: 'TDWG Content Mailing List'
Subject: RE: [tdwg-content] synonyms in DwC Archives

I’m reluctant to dive into this conversation, because there are so many things going on.  Very briefly, Rod and Nicolas come the closest to reflecting my own interpretation of the “simple” explanation.  One question for Nicolas – why do you need “NameAsStringID”?  Why not just NameString (literal)? What value is there for assigning an identifier to what is effectively a string of UTF-8-encoded characters?

The key here, as Rod (and others) alluded to, is that different people have different meanings for the word “Synonym” in this context; and this is further confused by the whole names vs. concepts thing.

Ignoring the names v. concepts issue altogether, we have various interpretations of “synonym” as including or excluding homotypic synonyms, heterotypic synonyms, and orthographic variations.  That is further confounded by the “names as conceptual objects” vs. “names as literal text strings” issues.

The answer is obvious:  reduce everything to atomized Taxon Name Usage instances (which is where the word “usage” in DwC comes from) to track “names as objects”, and index every unique text string to track the strings themselves.  GNUB is well along this path on the former for the groups that have been populated, and we are now ramping up our efforts to bulk-populate the content from various sources.  Once you populate the Protonym (~=Basionym) Usages (the hardest part, by far), then you can start to capture subsequent usages en masse to represent the spectrum of how names have been actually used through history -- different spellings, different classifications and genus combinations, different ranks (e.g., as a full species vs. as a subspecies of another species), and treatments of names as valid (=accepted) or as junior heterotypic synonyms. With that properly captured & indexed, you can do all sorts of wonderful, magical, amazing things that have only been dreamed about.

So, the problem is well understood.  And the solution is also well understood (as a result of the development of GNUB and GNI and associated services via GNA, guided by a series of NOMINA meetings going back several years). What we need to do now is populate the content (particularly GNUB), and then implement the services.

I realize this is getting pretty far away from Hilmar’s original question – but that has been reasonably well addressed by others (particularly Markus, Tony, Rod and Nicolas).

One final point to keep in mind when discussing this stuff – particularly in a flattened DwC sort of structure:

The unqualified statement “Aus bus is a synonym of Aus cus” should never be represented in the form of metadata without an “According to” to go along with it.  Even objective synonyms should be understood in the context of the relevant Code (i.e., where the “According To” is effectively the Code of Nomenclature itself).  This is why the “usage” qualifiers were added to the various DwC terms in the Taxon class – to emphasize that the status of a name as being either accepted, or as representing a synonym of another name, is implied to be in the context of a particular usage (i.e., a particular “according to”).
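
To make the point concrete, here is a minimal Python sketch (not anything GNUB or DwC prescribes) in which a synonymy statement cannot be created without its “according to”. The class name UsageAssertion, its fields, and the sources “Reference A” and “Reference B” are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageAssertion:
    """One statement about a name, always tied to a source (hypothetical model)."""
    name: str
    status: str          # e.g. "accepted" or "synonym"
    accepted_name: str   # the name treated as accepted in this usage
    according_to: str    # the source making the statement

    def __post_init__(self):
        # Refuse unqualified statements: no source, no assertion.
        if not self.according_to.strip():
            raise ValueError("a synonymy statement needs an 'according to'")


# Placeholder data: two hypothetical sources that disagree about "Aus bus".
assertions = [
    UsageAssertion("Aus bus", "synonym", "Aus cus", "Reference A"),
    UsageAssertion("Aus bus", "accepted", "Aus bus", "Reference B"),
]


def status_of(name, according_to):
    """Return (status, accepted_name) for a name in the context of one source."""
    for a in assertions:
        if a.name == name and a.according_to == according_to:
            return a.status, a.accepted_name
    return None
```

Asking for the status of “Aus bus” only makes sense together with a source, and the two placeholder sources give different answers, which is exactly why the qualifier matters.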

Aloha,
Rich


From: tdwg-content-bounces at lists.tdwg.org On Behalf Of Bailly, Nicolas (WorldFish)
Sent: Tuesday, March 18, 2014 8:57 PM
To: Roderic Page; Hilmar Lapp
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] synonyms in DwC Archives

Dear Hilmar,

I completely concur with Rod. An additional remark is that Biodiversity Informatics has unfortunately twisted the meaning of the word ‘synonym’ as it is used in the different codes where it designates primarily a relationship between taxa and only secondarily by extension a relationship between names (because we designate taxa by using names). Formally, two different combinations of the same basionym are not synonyms: they are different combinations.

The flat DarwinCore cannot handle (easily) the 3 necessary Ids that have to be managed:

- TaxonId
- BasionymId
- NameAsStringId

Technically, the NameAsStringId could be used alone, but only if there is a second field that specifies whether it is a Basionym or not, and a third that indicates whether it is the current accepted name: in the latter case, the NameAsStringId can be used as the TaxonId. So in any case, a triplet has to be managed; otherwise, if you give an id alone, you cannot know what it is beyond a simple NameAsString.

But to be honest, for clarity I prefer to have 3 different codes. It is what we have in FishBase: a SpecCode as TaxonId, a SynCode as NameAsStringId, and we use the Catalog of Fishes (CofF) CAS_SPC as the BasionymId – so we do exactly what Rod suggests, we use the acknowledged nomenclator. Strictly speaking, and referring to my remark above, CofF has ids for all basionyms, whether they are linked to valid taxa or synonym taxa. What CofF does not have is an id for the new combinations and misspellings.
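
As a rough illustration of the triplet, here is a minimal Python sketch. The field names echo the FishBase/CofF columns mentioned above (SynCode, SpecCode, CAS_SPC), but the record layout, the numeric values, and the placeholder names “Aus cus”, “Xus cus” and “Aus bus” are assumptions made for the example, not actual FishBase or CofF content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    syn_code: int             # NameAsStringId (FishBase: SynCode)
    spec_code: int            # TaxonId (FishBase: SpecCode)
    cas_spc: Optional[int]    # BasionymId (CofF: CAS_SPC); None if unknown
    name: str
    is_basionym: bool         # is this string the original combination?
    is_accepted: bool         # is this the current accepted name of the taxon?


# Hypothetical rows: one taxon (spec_code 1) with three name strings.
records = [
    NameRecord(101, 1, 9001, "Aus cus", is_basionym=False, is_accepted=True),
    NameRecord(102, 1, 9001, "Xus cus", is_basionym=True, is_accepted=False),
    NameRecord(103, 1, 9002, "Aus bus", is_basionym=True, is_accepted=False),
]


def accepted_name(spec_code, rows):
    """Return the accepted name string for a taxon, or None."""
    for r in rows:
        if r.spec_code == spec_code and r.is_accepted:
            return r.name
    return None


def combinations_of(cas_spc, rows):
    """All name strings that share one basionym (different combinations)."""
    return sorted(r.name for r in rows if r.cas_spc == cas_spc)
```

With the three ids kept apart, “which strings are combinations of the same basionym” and “which strings point to the same taxon” are separate questions: here “Aus cus” and “Xus cus” share a basionym, while all three strings share a taxon.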

Where FishBase is not completely consistent is that we also use a SynCode for misidentifications and misapplied names, while these relationships should be managed in a table other than the table of NameAsStrings. Also, as Markus described, the current accepted NameAsString is repeated in both our Taxon and our NameAsString tables. These were compromises to simplify the structure of the database.

BW
Nicolas.

From: tdwg-content-bounces at lists.tdwg.org On Behalf Of Roderic Page
Sent: Wednesday, March 19, 2014 1:38 PM
To: Hilmar Lapp
Cc: TDWG Content Mailing List
Subject: Re: [tdwg-content] synonyms in DwC Archives

Hi Hilmar,

I think the phrase “simply names” summarises part of the problem here. Many taxon databases seem determined to conflate names and taxa, and treat names as dumb strings rather than entities of interest (or worse, as taxa, rather than names of taxa). Many synonyms are relationships between names (e.g., names that change because we move species to different genera are the “same” name), others are relationships between taxa (e.g., this set of things and that set of things are the members of the same set of things).

We could model things much more cleanly if we distinguished between taxa and names for taxa, and made use of the fact that we have databases of names (the nomenclators) that have been serving name data and stable identifiers for almost a decade. It is a pity that most taxonomic databases make little or no use of this. Map names to those ids, have ids for taxa, and link all the names to the taxa to which they’ve been applied. Instead we have a flat Darwin Core with row upon row of things that we call “taxa” but which, in many cases, aren’t.

I get why we like flat Darwin Core, but sometimes the world isn’t flat.

Regards

Rod

On 19 Mar 2014, at 02:44, Hilmar Lapp <hlapp at nescent.org> wrote:

Tony - your example matches pretty much what GBIF does.

John - yes indeed, if using ResourceRelationship terms, it would have to be in an extension and not the core taxon file.

Markus and all - yes, I realized after I emailed how GBIF does this. I agree that this has advantages. However, this way of doing synonyms requires that there is an identifier for the synonym. For the core use case I'm interested in, synonyms are metadata of taxon records and do not have their own identifier. For example, synonyms in NCBI don't have identifiers, and they don't in Catalog of Fishes. (I'm not sure they do in PaleoDB.)

One could of course invent identifiers on behalf of the taxonomy providers in these cases, but that's a hack. I think if there is an extension for vernacularNames, there ought to be one as well for synonyms that are simply names.

   -hilmar

On Tue, Mar 18, 2014 at 6:35 PM, Markus Döring <m.doering at mac.com> wrote:
Hi Hilmar,

GBIF, Catalogue of Life and others have produced guidelines for how to express taxonomies with synonyms, and these have already been in widespread use for over a year. I will forward links tomorrow when I'm back at my desk.

The common idea is to include synonyms together with accepted taxa in the core file. This allows one to also add extension data to synonyms, for example bibliographic references, types data, etc. The term acceptedNameUsageID is used to link to the accepted record in the core file (targeting taxonID), originalNameUsageID for the basionym, and taxonomicStatus to declare a specific type of synonym such as homo/heterotypic or later/junior synonym. The scientificName is used both for accepted and synonym records.
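
A minimal Python sketch of reading such a core file follows. The column names are the DwC terms named above; the three rows, their identifiers, the placeholder names, and the choice to leave acceptedNameUsageID empty on the accepted record are assumptions made for illustration and are not taken from any published checklist.

```python
import csv
import io

# Hypothetical tab-delimited core file: accepted taxon and synonyms side by side.
CORE = (
    "taxonID\tscientificName\ttaxonomicStatus\tacceptedNameUsageID\toriginalNameUsageID\n"
    "t1\tAus cus\taccepted\t\tt2\n"
    "t2\tXus cus\thomotypic synonym\tt1\tt2\n"
    "t3\tAus bus\theterotypic synonym\tt1\tt3\n"
)

rows = list(csv.DictReader(io.StringIO(CORE), delimiter="\t"))
by_id = {r["taxonID"]: r for r in rows}


def synonyms_of(accepted_id):
    """Names of all records whose acceptedNameUsageID points at accepted_id."""
    return [r["scientificName"] for r in rows
            if r["acceptedNameUsageID"] == accepted_id]


def accepted_for(taxon_id):
    """Follow acceptedNameUsageID; an empty value means the record is accepted."""
    target = by_id[taxon_id]["acceptedNameUsageID"] or taxon_id
    return by_id[target]["scientificName"]
```

Because every synonym is a full row with its own taxonID, extension rows (references, type data) can point at synonyms just as they point at accepted taxa.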

You should be able to find many dwca examples in the gbif dataset search when filtered for checklists: http://www.gbif.org/dataset/search?type=CHECKLIST

For example try these:
http://data.canadensys.net/ipt/archive.do?r=vascan
http://ipt.speciesfile.org:8080/archive.do?r=orthoptera


Cheers,
Markus

On 18.03.2014 at 22:44, Chuck Miller <Chuck.Miller at mobot.org> wrote:
Hilmar,
Sticking strictly to Darwin Core and not adding RDF, I think there are a couple of DwC terms that are attributes that can be used to identify a synonym:

taxonomicStatus - The status of the use of the scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a taxon. Rules of priority then are used to define the taxonomic status of the nomenclature contained in that scope, combined with the expert's opinion. It must be linked to a specific taxonomic reference that defines the concept. Recommended best practice is to use a controlled vocabulary. Examples: "invalid", "misapplied", "homotypic synonym", "accepted".

relationshipOfResource - The relationship of the resource identified by relatedResourceID to the subject (optionally identified by the resourceID). Recommended best practice is to use a controlled vocabulary. Examples: "duplicate of", "mother of", "endoparasite of", "host to", "sibling of", "valid synonym of", "located within".

There’s also acceptedNameUsage and acceptedNameUsageID, which, if used, imply that the name the terms are associated with is a synonym of the accepted name.

But, so far there is no guideline for how to organize synonyms in a Darwin Core Archive.  They can be embedded in the core file using relationshipOfResource from a synonym name to an accepted name in the same file.  Or they can be in an extension file, where the extension file may be called Synonyms and thus define a one-to-many “synonym relationship” from the taxonID in the core file to synonym names in the extension file.  There are probably other ways.  RDF adds the ability to be more explicit about the relationships.
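
The extension-file option can be sketched in Python roughly as follows. This is only an illustration of the one-to-many layout: the Synonyms extension and its columns synonymName and synonymType are hypothetical (they are not existing DwC terms), and the rows use placeholder names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical core file: accepted taxa only.
CORE = "taxonID\tscientificName\nt1\tAus cus\n"

# Hypothetical "Synonyms" extension: many rows may share one core taxonID.
SYNONYMS = (
    "taxonID\tsynonymName\tsynonymType\n"
    "t1\tXus cus\thomotypic\n"
    "t1\tAus bus\theterotypic\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


taxa = {row["taxonID"]: row["scientificName"] for row in read_tsv(CORE)}

# Group extension rows under the core record they belong to.
synonyms_by_taxon = defaultdict(list)
for row in read_tsv(SYNONYMS):
    synonyms_by_taxon[row["taxonID"]].append(row["synonymName"])
```

In this layout the synonyms carry no identifier of their own; they exist only as attributes of the core taxon record.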

Rich Pyle has lectured prolifically on this so I’m sure he has good advice to offer.

Chuck

From: tdwg-content-bounces at lists.tdwg.org On Behalf Of Hilmar Lapp
Sent: Tuesday, March 18, 2014 2:55 PM
To: TDWG Content Mailing List
Cc: Dan Leehr
Subject: [tdwg-content] synonyms in DwC Archives

I'm looking for recommendations on how best to put synonyms for taxon records into DwC Archive format.

I'm assuming that these would go into an extension file. Do I have this right? What I'm having more trouble with is determining the right column term. There's dwc:vernacularName, which is also in the examples, but what about synonyms of different types that come with taxonomies (such as NCBI's) or that result from merging taxonomies? There isn't an obvious candidate in DwC, and the list at http://rs.gbif.org/core/dwc_taxon.xml doesn't have a suggestion either that would seem pertinent.

Any suggestions, pointers to documentation or examples?

  -hilmar

--
Hilmar Lapp -:- http://informatics.nescent.org/wiki -:- http://lappland.io/

_______________________________________________

tdwg-content mailing list
tdwg-content at lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content




---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:      r.page at bio.gla.ac.uk
Tel:        +44 141 330 4778
Fax:        +44 141 330 2792
Skype:      rdmpage
Facebook:   http://www.facebook.com/rdmpage
LinkedIn:   http://uk.linkedin.com/in/rdmpage
Twitter:    http://twitter.com/rdmpage
Blog:       http://iphylo.blogspot.com
Home page:  http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Wikipedia:  http://en.wikipedia.org/wiki/Roderic_D._M._Page
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ORCID:      http://orcid.org/0000-0002-7101-9767
