[tdwg-content] Question about "vocabulary" in Darwin Core Archives assistant

Gregor Hagedorn g.m.hagedorn at gmail.com
Mon May 16 21:27:27 CEST 2011


I believe the equivalent in XML of what you are looking for would be
entities. Best known are the character entities like &quot;, but XML
allows arbitrary ones to be defined in any XML file. Their use is common
in RDF, where some of the URI prefixes can be namespaces, whereas
others (the ones in values) would not be expanded.

I realize that DwCArchive is not bound to XML, but I think providing an
entity-array storage, and otherwise referring to the same XML
terminology and rules, would be beneficial.
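
To illustrate what XML entities do, here is a minimal Python sketch using
the standard library parser. The entity name "ubio" and the element names
are invented for this example and are not part of the DwC-A format; the
point is only that the parser expands the entity both in attribute values
and in element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity name "ubio" and the element names
# "records"/"record" are illustrative only, not Darwin Core Archive terms.
DOC = """<!DOCTYPE records [
  <!ENTITY ubio "urn:lsid:ubio.org:namebank:">
]>
<records>
  <record id="&ubio;456216">&ubio;456216</record>
</records>"""

# The parser expands internal entities declared in the DOCTYPE.
root = ET.fromstring(DOC)
record = root.find("record")

print(record.get("id"))  # entity expanded inside an attribute value
print(record.text)       # entity expanded inside element text
```

The file stores the short form, and any conforming XML parser delivers the
full identifier to the consumer.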

Gregor

On 16 May 2011 20:51, Steve Baskauf <steve.baskauf at vanderbilt.edu> wrote:
> Thank you, David.  Yes!  This (below) is exactly what I had in mind.  In
> many cases involving GUIDs, at least part of the GUID string originating
> from a single institution will be the same for all records in a particular
> field.  The provider isn't going to need to store or transmit those constant
> characters.  Being able to supply that constant part of the GUID to be
> concatenated by the receiver could reduce the size of the transmitted file
> significantly.
>
> Steve
>
> David Remsen (GBIF) wrote:
>
> I realise now that the example I gave won't work for this.   As I read it
> now you would like to use local integer identifiers in your database but
> expand them in the output file using a "template" that would conform to the
> template I used in my globals example.
> In this case, we don't want to refer to a different element; we want the
> current element to substitute the local identifier it contains with the more
> inflated template.  In other words, if your data file says that taxonID=100
> has a parent taxon with an ID = 99, you want to expand the integer into the
> more complete GUID following the template.   This currently isn't something
> we have discussed supporting, but I think we could by allowing for value
> substitution via a template placed in the default value.   We could, for
> example, support it using the example below.  Note in this case the
> substitute variable IS the value itself.
> <id index="0" default="urn:lsid:ubio.org:namebank:{0}"/>  # we don't
> need to assign a term here.  It is implied. See next comment below.
> <field index="1" default="urn:lsid:ubio.org:namebank:{1}"
> term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
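
A rough sketch of how a consumer might apply such templated defaults, in
Python. This is only an illustration of the proposal quoted above, not
behaviour of any existing DwC-A reader; the function name expand_row and
the rule that empty values stay empty are assumptions made for the example.

```python
import re

# Matches placeholders such as {0} or {1}; the number is a column index.
PLACEHOLDER = re.compile(r"\{(\d+)\}")

def expand_row(row, templates):
    """Return a copy of row with templated defaults applied.

    templates maps a column index to a default string such as
    "urn:lsid:ubio.org:namebank:{0}". Each "{N}" is replaced by row[N].
    Assumption: a column whose own value is empty is left empty, so a
    root taxon without a parent does not get a dangling identifier.
    """
    expanded = list(row)
    for index, template in templates.items():
        if row[index] == "":
            continue
        expanded[index] = PLACEHOLDER.sub(
            lambda match: row[int(match.group(1))], template
        )
    return expanded

templates = {
    0: "urn:lsid:ubio.org:namebank:{0}",  # the id column
    1: "urn:lsid:ubio.org:namebank:{1}",  # parentNameUsageID
}

# taxonID=100 with parent taxon ID=99, as in the text above.
print(expand_row(["100", "99"], templates))
```

The data file keeps the short local integers, and the full identifiers are
only built when the row is read.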
>
> ...
>
> I think you have made the case and I think we could accommodate it by simply
> interpreting the default in the way I specified.  Otherwise I could imagine
> we would have to add a "template" attribute to the field element.  However,
> I don't think this is needed.   I guess I'd like feedback from Tim, John W
> or Markus on this.
>
>
>
> Steve
>
> David Remsen (GBIF) wrote:
>
> Hi Steve,
> There is a way to do what you ask but not exactly the way you specified.
> The way to do it is via a template that refers to a particular column.
> So if you put the ubio integer ID into dwc:scientificName you could
> put the following into, for example, dc:source
> http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:<scientificName>
> as the default and set the column to a global.
> You can see more in this document on the XML Descriptor file.
> http://links.gbif.org/gbif_dwc-a_metafile_en_v1/
> The vocabularies option was, as far as I know, intended to provide a URI for
> a vocabulary so that we might be able to validate values against the
> vocabulary items.
> Best,
> David
> ----------------------------------------------------------------------------
> David Remsen, Senior Programme Officer
> Electronic Catalog of Names of Known Organisms
> Global Biodiversity Information Facility Secretariat
> Universitetsparken 15, DK-2100 Copenhagen, Denmark
> Tel: +45-35321472   Fax: +45-35321480
> Mobile +45 28751472
> Skype: dremsen
> ----------------------------------------------------------------------------
>
>
>
> On 29 Apr 2011, at 22:21, Steve Baskauf wrote:
>
> I am playing around with Darwin Core Archives, in particular the DwC-A
> Assistant (http://tools.gbif.org/dwca-assistant/).  One thing that I am
> not exactly clear about is how to use the "Vocabulary" column in the
> assistant.  The description that comes up when you mouse over the column
> heading says that it should ideally be a URI that identifies the
> vocabulary and resolves to some machine readable form like RDF.  So what
> I'm wondering is whether I can put what effectively amounts to a
> namespace in that spot.
>
> For example, a URI for the name "Acer rubrum L." that actually resolves
> to RDF is:
> http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:456216
> I think that would qualify as a valid HTTP URI guid because it's the
> proxied form of an LSID.  So I would like to use it as a value for the
> dwc:scientificNameID column in a DwC-A taxon record.  However, the only
> part of the identifier that makes the string unique within uBio's domain
> is the last number - if I'm always using a uBio guid, the first
> approximately 75 characters will be the same for all of the guids.  So
> can I just put
> "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:"
> in the Vocabulary column and then just put the locally unique numbers
> (e.g. "456216") in the column for dwc:scientificNameID?  Should an
> application using a DwC-A file be smart enough to append the
> "vocabulary" string on the front of the actual value in the text file?
> Or is that not how the "Vocabulary" column is intended to be used?
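
The behaviour asked about here can be sketched in a few lines of Python.
This shows only what the question describes; it does not imply that the
"Vocabulary" column is meant to work this way. The function name
read_with_prefix and the comma-separated sample data are assumptions made
for the illustration.

```python
import csv
import io

# Constant part of the identifier, taken from the question above.
PREFIX = ("http://www.ubio.org/authority/metadata.php"
          "?lsid=urn:lsid:ubio.org:namebank:")

def read_with_prefix(text, column, prefix):
    """Parse delimited text and prepend prefix to non-empty values in column."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if record[column]:
            record[column] = prefix + record[column]
        rows.append(record)
    return rows

# Hypothetical one-row data file holding only the locally unique number.
DATA = "scientificName,scientificNameID\nAcer rubrum L.,456216\n"

rows = read_with_prefix(DATA, "scientificNameID", PREFIX)
print(rows[0]["scientificNameID"])
print(len(PREFIX), "characters saved per value")
```

The transmitted file carries six characters for this identifier instead of
the full URI, which is the size saving at issue.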
>
> Steve
>
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
>
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
>
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
>
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
> http://bioimages.vanderbilt.edu
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>



-- 


---------------------------------
Dr. G. Hagedorn
Heinrich-Seidel-Str. 2, 12167 Berlin, Germany
Tel. +49-(0)30-831 5785
http://www.linkedin.com/in/gregorhagedorn

This communication (including all attachments) is sent on a personal
basis. It is intended only for the person(s) to whom it is addressed.
Redistributing or publishing it without permission is a violation of
privacy rights and copyright.

