[tdwg-content] Question about "vocabulary" in Darwin Core Archives assistant

Steve Baskauf steve.baskauf at vanderbilt.edu
Sat May 7 22:12:23 CEST 2011


David,
I have had a bit of time to read the document that you linked below and 
to try out what you suggested.  I have a couple of clarifying 
questions.  But before I ask them, I'd like to give just a bit of 
feedback on the GBIF Darwin Core page 
(http://www.gbif.org/informatics/standards-and-tools/publishing-data/data-standards/darwin-core-archives/).  
The font and color of the hyperlinks on that page are so similar to the 
default text font that I had difficulty knowing they were there.  I 
guess they are blue and the base font is black, but to my (admittedly 
aging) eyes, they were barely distinguishable.  I had to mouse over the 
text and watch the cursor to find the links.  This was the same on all 
five of the major web browsers.  Also, you might put a link on that page 
to the XML Descriptor file document you linked in the message below.  I 
don't think I would have found it if I hadn't emailed you.

OK, so here's my first question.  I understood the explanation of how to 
generate a field's values by using a variable in a static mapping.  So 
when the actual text data file is created, do you just leave the 
statically mapped column empty?  E.g., if I have something like:

        <field index="11" term="http://purl.org/dc/terms/source"/>
        <field index="12"
               default="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:{11}"
               term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>

would I just populate column 11 with uBio's locally unique ID number and 
leave column 12 blank (i.e. have two consecutive delimiters with no 
characters between)?
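
A minimal Python sketch of the behaviour being asked about may make the question concrete. It is an illustration under stated assumptions, not GBIF's implementation: it assumes that `{11}` refers to zero-based column 11, that an empty cell falls back to the field's default, and that the file is tab-delimited. The names `DEFAULTS`, `PLACEHOLDER`, and `expand_row` are hypothetical.

```python
import re

# Hypothetical mapping mirroring the two <field> elements above:
# column index -> default template (None when the column has no default).
DEFAULTS = {
    11: None,
    12: "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:{11}",
}

# Matches a column reference such as {11} inside a default template.
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def expand_row(line, defaults=DEFAULTS, delimiter="\t"):
    """Split one data line and fill empty cells from their default templates."""
    cells = line.rstrip("\n").split(delimiter)
    raw = list(cells)  # placeholders always read the values as written in the file
    for index, template in defaults.items():
        # Skip columns without a default, missing columns, and non-empty cells.
        if template is None or index >= len(cells) or cells[index] != "":
            continue
        cells[index] = PLACEHOLDER.sub(lambda m: raw[int(m.group(1))], template)
    return cells


# Columns 0-10 are filler; column 11 holds the local uBio ID and column 12 is
# left blank (two consecutive delimiters with nothing between them).
row = "\t".join(["x"] * 11 + ["456216", ""])
print(expand_row(row)[12])
```

Under those assumptions, the blank column 12 is filled with the full uBio URI built from the number in column 11, while column 11 itself is left unchanged.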

The second question is a little trickier.  The main purpose that I would 
want to use this for is to enable the use of GUIDs.  Using GUIDs rather 
than local IDs seems to be considered a better practice (or at least an 
allowed practice) in both the DwC term descriptions (e.g. 
http://rs.tdwg.org/dwc/terms/index.htm#taxonID) and in the GBIF Darwin 
Core Archives web page I complained about above.  However, it is 
difficult for me to see how one would accomplish this with the Darwin 
Core Archives format, at least if one wishes to cut down on file size 
(and transmission time) using the "variable in a static mapping" 
shortcut.  The problem is what to use for the field names for the 
locally unique identifiers.  In your example below, you suggested 
dc:source.  That's actually not an option listed in the Darwin Core 
Archives Assistant.  Is any allowable Dublin Core term valid as a DwC-A 
field if one just jury-rigs the XML file after generating it with the 
Assistant?  The metafile documentation also uses dcterms:identifier, 
which is not listed in the DwC-A Assistant.

This becomes trickier still if one wants to use GUIDs for several fields 
in the datafile.  For example, I want to have dwc:taxonID, 
dwc:scientificNameID, and dwc:nameAccordingToID be fields in my taxon 
file.  I can't use dcterms:source for all three of the locally unique 
identifiers that I'd like to use as variables in the static mappings for 
the xxxID terms.  Also, I'm not sure about this, but from the examples, 
it looks like the ID field in the core Taxon file is assumed to be 
dwc:taxonID.  At least taxonID doesn't show up on the DWC-A assistant 
list and the ID field is labeled as "taxonID" in the examples.  But I'm 
going to run into problems if I actually want dwc:taxonID to be a GUID 
rather than my locally unique identifier.  I can't have the base ID and 
another column containing a static mapping having the base ID as a 
variable both be identified as dwc:taxonID.

It seems like in the interest of facilitating GUIDs, it would be 
beneficial to allow the DWC-A to include fields that are identified 
simply as local variables to be used in static mappings without 
requiring that each field map to DwC or Dublin Core.  I haven't 
actually picked my way through the XML schema or tried doing this and 
then validating the XML file to see if I could get away with it, however.
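
The proposal can be sketched in Python as follows. This is a hypothetical illustration only, not something the DwC-A format is known to support. It assumes a descriptor in which some columns carry no term at all and exist solely as variables for templates. The `example.org` prefixes, the local IDs 17 and 42, and the names `LOCAL_COLUMNS`, `GENERATED`, and `build_record` are all invented for the example; only the uBio prefix, the ID 456216, and "Acer rubrum L." come from this thread.

```python
import re

UBIO = "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:"
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical descriptor: columns 0-2 are unmapped local variables (term is
# None); column 3 is an ordinary mapped column.
LOCAL_COLUMNS = {0: None, 1: None, 2: None, 3: DWC + "scientificName"}

# The three ID terms have no column of their own; each is built from a
# template. The example.org prefixes are placeholders, not real resolvers.
GENERATED = {
    DWC + "taxonID": "http://example.org/taxon/{0}",
    DWC + "scientificNameID": UBIO + "{1}",
    DWC + "nameAccordingToID": "http://example.org/reference/{2}",
}


def build_record(cells):
    """Return a term -> value record; unmapped local columns are dropped."""
    record = {term: cells[i] for i, term in LOCAL_COLUMNS.items() if term is not None}
    for term, template in GENERATED.items():
        record[term] = re.sub(r"\{(\d+)\}", lambda m: cells[int(m.group(1))], template)
    return record


record = build_record(["17", "456216", "42", "Acer rubrum L."])
for term, value in record.items():
    print(term, "=", value)
```

In this sketch each xxxID term gets its own local variable, so nothing has to be mapped to dcterms:source more than once, and the local IDs never appear in the output under a DwC or Dublin Core term.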

Steve

David Remsen (GBIF) wrote:
> Hi Steve,
>
> There is a way to do what you ask but not exactly the way you specified.
>
> The way to do it is via a template that refers to a particular column.  
> So if you put the ubio integer ID into dwc:scientificName you could 
> put the following into, for example, dc:source
>
> http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:<scientificName>
>
> as the default and set the column to a global.
>
> You can see more in this document on the XML Descriptor file.
>
> http://links.gbif.org/gbif_dwc-a_metafile_en_v1/
>
> The vocabularies option was, as far as I know, intended to provide a 
> URI for a vocabulary so that we might be able to validate values 
> against the vocabulary items.
>
> Best,
> David
>
> ----------------------------------------------------------------------------
> David Remsen, Senior Programme Officer
> Electronic Catalog of Names of Known Organisms
> Global Biodiversity Information Facility Secretariat
> Universitetsparken 15, DK-2100 Copenhagen, Denmark
> Tel: +45-35321472   Fax: +45-35321480
> Mobile +45 28751472
> Skype: dremsen
> ----------------------------------------------------------------------------
>
>
>
>
> On 29 Apr 2011, at 22:21, Steve Baskauf wrote:
>
>> I am playing around with Darwin Core Archives, in particular the DwC-A
>> Assistant (http://tools.gbif.org/dwca-assistant/).  One thing that I am
>> not exactly clear about is how to use the "Vocabulary" column in the
>> assistant.  The description that comes up when you mouse over the column
>> heading says that it should ideally be a URI that identifies the
>> vocabulary and resolves to some machine readable form like RDF.  So what
>> I'm wondering is whether I can put what effectively amounts to a
>> namespace in that spot.
>>
>> For example, a URI for the name "Acer rubrum L." that actually resolves
>> to RDF is:
>> http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:456216
>> I think that would qualify as a valid HTTP URI guid because it's the
>> proxied form of an LSID.  So I would like to use it as a value for the
>> dwc:scientificNameID column in a DwC-A taxon record.  However, the only
>> part of the identifier that makes the string unique within uBio's domain
>> is the last number - if I'm always using a uBio guid, the first
>> approximately 75 characters will be the same for all of the guids.  So
>> can I just put
>> "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:" 
>>
>> in the Vocabulary column and then just put the locally unique numbers
>> (e.g. "456216") in the column for dwc:scientificNameID?  Should an
>> application using a DwC-A file be smart enough to append the
>> "vocabulary" string on the front of the actual value in the text file?  
>> Or is that not how the "Vocabulary" column is intended to be used?
>>
>> Steve
>>
>> -- 
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu
>>
>

