[tdwg-content] Question about "vocabulary" in Darwin Core Archives assistant

Steve Baskauf steve.baskauf at vanderbilt.edu
Mon May 16 20:51:10 CEST 2011


Thank you, David.  Yes!  This (below) is exactly what I had in mind.  In 
many cases involving GUIDs, at least part of the GUID string originating 
from a single institution will be the same for all records in a 
particular field.  The provider isn't going to need to store or transmit 
those constant characters.  Being able to supply that constant part of 
the GUID to be concatenated by the receiver could reduce the size of the 
transmitted file significantly. 
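
A rough sketch of what that receiver-side expansion might look like, assuming the "{n}" placeholder syntax from David's message quoted below. The helper names (expand_default, expand_row) and the column layout are made up for illustration and are not part of any existing DwC-A tool. Leaving empty values empty is also just an assumption on my part.

```python
import re

# Matches a "{n}" placeholder, where n is a column index.
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def expand_default(template: str, row: list[str]) -> str:
    """Replace each {n} in the template with the raw value in row[n]."""
    return PLACEHOLDER.sub(lambda m: row[int(m.group(1))], template)


# Assumed layout: column 0 = taxonID, column 1 = parentNameUsageID,
# both stored in the data file as local integers.
TEMPLATES = {
    0: "urn:lsid:ubio.org:namebank:{0}",
    1: "urn:lsid:ubio.org:namebank:{1}",
}


def expand_row(row: list[str]) -> list[str]:
    """Expand every templated column; leave other columns and empty values alone."""
    return [
        expand_default(TEMPLATES[i], row) if i in TEMPLATES and value else value
        for i, value in enumerate(row)
    ]


# taxonID=100 with parent ID=99, as in the example quoted below.
expanded = expand_row(["100", "99"])
print(expanded)
```

The provider only ever stores and transmits "100" and "99"; the constant part of the identifier lives once in the descriptor.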

Steve

David Remsen (GBIF) wrote:
>
> I realise now that the example I gave won't work for this.   As I read 
> it now you would like to use local integer identifiers in your 
> database but expand them in the output file using a "template" that 
> would conform to the template I used in my globals example.
> In this case, we don't want to refer to a different element; we want 
> the current element to replace the local identifier it contains 
> with the fully expanded template.  In other words, if your data file 
> says that taxonID=100 has a parent taxon with an ID = 99, you want to 
> expand each integer into the more complete GUID following the 
> template.   This currently isn't something we have discussed 
> supporting, but I think we could by allowing for value substitution via 
> a template placed in the default value.   We could, for example, 
> support it using the example below.  Note in this case the substitute 
> variable IS the value itself.
>
> <id index="0" default="urn:lsid:ubio.org:namebank:{0}"/>  # we 
> don't need to assign a term here.  It is implied. See next comment below.
> <field index="1" default="urn:lsid:ubio.org:namebank:{1}" 
> term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
>
>
...
> I think you have made the case and I think we could accommodate it by 
> simply interpreting the default in the way I specified.  Otherwise I 
> could imagine we would have to add a "template" attribute to the field 
> element.  However, I don't think this is needed.   I guess I'd like 
> feedback from Tim, John W or Markus on this.
>
>
>
>>
>> Steve
>>
>> David Remsen (GBIF) wrote:
>>> Hi Steve,
>>>
>>> There is a way to do what you ask but not exactly the way you specified.
>>>
>>> The way to do it is via a template that refers to a particular column.  
>>> So if you put the ubio integer ID into dwc:scientificName you 
>>> could put the following into, for example, dc:source
>>>
>>> http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:<scientificName>
>>>
>>> as the default and set the column to a global.
>>>
>>> You can see more in this document on the XML Descriptor file.
>>>
>>> http://links.gbif.org/gbif_dwc-a_metafile_en_v1/
>>>
>>> The vocabularies option was, as far as I know, intended to provide a 
>>> URI for a vocabulary so that we might be able to validate values 
>>> against the vocabulary items.
>>>
>>> Best,
>>> David
>>>
>>> ----------------------------------------------------------------------------
>>> David Remsen, Senior Programme Officer
>>> Electronic Catalog of Names of Known Organisms
>>> Global Biodiversity Information Facility Secretariat
>>> Universitetsparken 15, DK-2100 Copenhagen, Denmark
>>> Tel: +45-35321472   Fax: +45-35321480
>>> Mobile +45 28751472
>>> Skype: dremsen
>>> ----------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> On 29 Apr 2011, at 22:21, Steve Baskauf wrote:
>>>
>>>> I am playing around with Darwin Core Archives, in particular the DwC-A
>>>> Assistant (http://tools.gbif.org/dwca-assistant/).  One thing that I am
>>>> not exactly clear about is how to use the "Vocabulary" column in the
>>>> assistant.  The description that comes up when you mouse over the column
>>>> heading says that it should ideally be a URI that identifies the
>>>> vocabulary and resolves to some machine readable form like RDF.  So what
>>>> I'm wondering is whether I can put what effectively amounts to a
>>>> namespace in that spot.
>>>>
>>>> For example, a URI for the name "Acer rubrum L." that actually resolves
>>>> to RDF is:
>>>> http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:456216
>>>> I think that would qualify as a valid HTTP URI guid because it's the
>>>> proxied form of an LSID.  So I would like to use it as a value for the
>>>> dwc:scientificNameID column in a DwC-A taxon record.  However, the only
>>>> part of the identifier that makes the string unique within uBio's domain
>>>> is the last number - if I'm always using a uBio guid, the first
>>>> approximately 75 characters will be the same for all of the guids.  So
>>>> can I just put
>>>> "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:"
>>>> in the Vocabulary column and then just put the locally unique numbers
>>>> (e.g. "456216") in the column for dwc:scientificNameID?  Should an
>>>> application using a DwC-A file be smart enough to prepend the
>>>> "vocabulary" string to the actual value in the text file?
>>>> Or is that not how the "Vocabulary" column is intended to be used?
>>>>
>>>> Steve
>>>>
>>>> -- 
>>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>>> Vanderbilt University Dept. of Biological Sciences
>>>>
>>>> postal mail address:
>>>> VU Station B 351634
>>>> Nashville, TN  37235-1634,  U.S.A.
>>>>
>>>> delivery address:
>>>> 2125 Stevenson Center
>>>> 1161 21st Ave., S.
>>>> Nashville, TN 37235
>>>>
>>>> office: 2128 Stevenson Center
>>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>>> http://bioimages.vanderbilt.edu
>>>>
>>>> _______________________________________________
>>>> tdwg-content mailing list
>>>> tdwg-content at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>>
>>>
>>
>>     
>
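
For comparison, David's first suggestion (a global default whose template refers to another column by name) could be sketched roughly as follows. This assumes the "<columnName>" placeholder form shown in his message; fill_global and the record layout are hypothetical names used only for illustration.

```python
import re

# Matches a "<columnName>" reference inside a global default value.
COLUMN_REF = re.compile(r"<(\w+)>")


def fill_global(template: str, record: dict[str, str]) -> str:
    """Replace each <columnName> in the template with that column's value."""
    return COLUMN_REF.sub(lambda m: record[m.group(1)], template)


# Global default for dc:source, referring to the scientificName column.
SOURCE_TEMPLATE = (
    "http://www.ubio.org/authority/metadata.php"
    "?lsid=urn:lsid:ubio.org:namebank:<scientificName>"
)

# The data file carries only the uBio integer ID in scientificName.
record = {"scientificName": "456216"}
record["source"] = fill_global(SOURCE_TEMPLATE, record)
print(record["source"])
```

The difference from the "{n}" form is that here the expanded value lands in a different column (dc:source), while the column holding the integer is left unchanged.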

