Question about "vocabulary" in Darwin Core Archives assistant
I am playing around with Darwin Core Archives, in particular the DwC-A Assistant (http://tools.gbif.org/dwca-assistant/). One thing that I am not exactly clear about is how to use the "Vocabulary" column in the assistant. The description that comes up when you mouse over the column heading says that it should ideally be a URI that identifies the vocabulary and resolves to some machine-readable form like RDF. So what I'm wondering is whether I can put what effectively amounts to a namespace in that spot.
For example, a URI for the name "Acer rubrum L." that actually resolves to RDF is: http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:4... I think that would qualify as a valid HTTP URI guid because it's the proxied form of an LSID. So I would like to use it as a value for the dwc:scientificNameID column in a DwC-A taxon record. However, the only part of the identifier that makes the string unique within uBio's domain is the last number - if I'm always using a uBio guid, the first approximately 75 characters will be the same for all of the guids. So can I just put "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:" in the Vocabulary column and then just put the locally unique numbers (e.g. "456216") in the column for dwc:scientificNameID? Should an application using a DwC-A file be smart enough to append the "vocabulary" string on the front of the actual value in the text file? Or is that not how the "Vocabulary" column is intended to be used?
Steve
Hi Steve,
There is a way to do what you ask but not exactly the way you specified.
The way to do it is via a template that refers to a particular column. So if you put the uBio integer ID into dwc:scientificName you could put the following into, for example, dc:source
http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:<scientificName>
as the default and set the column to a global.
You can see more in this document on the XML Descriptor file.
http://links.gbif.org/gbif_dwc-a_metafile_en_v1/
The vocabularies option was, as far as I know, intended to provide a URI for a vocabulary so that we might be able to validate values against the vocabulary items.
Best, David
---------------------------------------------------------------------------- David Remsen, Senior Programme Officer Electronic Catalog of Names of Known Organisms Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321472 Fax: +45-35321480 Mobile +45 28751472 Skype: dremsen ----------------------------------------------------------------------------
On 29 Apr 2011, at 22:21, Steve Baskauf wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
David, I have had a bit of time to read the document that you linked below and to play around with what you suggested below. I had a couple clarifying questions. But before I ask them, I'd like to give just a bit of feedback on the GBIF Darwin Core page (http://www.gbif.org/informatics/standards-and-tools/publishing-data/data-sta...). The font and color of the hyperlinks on that page are so similar to the default text font that I had difficulty knowing they were there. I guess they are blue and the base font is black, but to my (admittedly aging) eyes, they were barely distinguishable. I had to mouse over the text and watch the cursor to find the links. This was the same on all five of the major web browsers. Also, you might put a link on that page to the XML Descriptor file document you linked in the message below. I don't think I would have found it if I hadn't emailed you.
OK, so here's my first question. I understood the explanation of how to make a field be generated by using a variable in a static mapping. So when the actual text datafile is created, do you just leave the statically mapped column empty? E.g. if I have something like:
<field index="11" term="http://purl.org/dc/terms/source"/>
<field index="12" default="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:{11}" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
would I just populate column 11 with uBio's locally unique ID number and leave column 12 blank (i.e. have two consecutive delimiters with no characters between)?
The second question is a little trickier. The main purpose that I would want to use this for is to enable the use of GUIDs. Using GUIDs rather than local IDs seems to be considered a better practice (or at least an allowed practice) in both the DwC term descriptions (e.g. http://rs.tdwg.org/dwc/terms/index.htm#taxonID) and in the GBIF Darwin Core Archives web page I complained about above. However, it is difficult for me to see how one would accomplish this with the Darwin Core Archives format, at least if one wishes to cut down on file size (and transmission time) using the "variable in a static mapping" shortcut. The problem is what to use for the field names for the locally unique identifiers. In your example below, you suggested dc:source. That's actually not an option listed in the Darwin Core Archives Assistant. Is any allowable Dublin Core term valid as a DWC-A field if one just jury-rigs the XML file after generating it with the Assistant? In the metafile documentation dcterms:identifier is also used and not listed in DWC-A Assistant.
This becomes trickier still if one wants to use GUIDs for several fields in the datafile. For example, I want to have dwc:taxonID, dwc:scientificNameID, and dwc:nameAccordingToID be fields in my taxon file. I can't use dcterms:source for all three of the locally unique identifiers that I'd like to use as variables in the static mappings for the xxxID terms. Also, I'm not sure about this, but from the examples, it looks like the ID field in the core Taxon file is assumed to be dwc:taxonID. At least taxonID doesn't show up on the DWC-A assistant list and the ID field is labeled as "taxonID" in the examples. But I'm going to run into problems if I actually want dwc:taxonID to be a GUID rather than my locally unique identifier. I can't have the base ID and another column containing a static mapping having the base ID as a variable both be identified as dwc:taxonID.
It seems that, in the interest of facilitating GUIDs, it would be beneficial to allow the DwC-A to include fields that are identified simply as local variables to be used in static mappings, without requiring each field to map to DwC or Dublin Core. I haven't actually picked my way through the XML schema or tried doing this and then validating the XML file to see if I could get away with it, however.
Steve
On 08/05/2011, at 6:12 AM, Steve wrote:
The second question is a little trickier. The main purpose that I would want to use this for is to enable the use of GUIDs. Using GUIDs rather than local IDs seems to be considered a better practice (or at least an allowed practice) in both the DwC term descriptions (e.g.
Not sure if this is relevant at all, but RFC 4122 (http://www.iana.org/go/rfc4122) defines a URN namespace for UUIDs (= GUIDs).
If the problem is that GUIDs "look like" local identifiers because they are not URIs, a correct way to convert a GUID into a URI is by prefixing it with "urn:uuid:". Note that there is no resolution service or anything like that for these URNs - it's just a semweb-compatible namespace.
(I think that it would be reasonable, for any LSID that just uses a GUID as the object ID (and does not have a version component), to declare it to be owl:sameAs the UUID URN. Heck - the same thing applies to Linked Data http: URIs if a GUID alone potentially identifies the resource. After all, the whole point is that these things should not collide. Admittedly, it's a bit of a "solution looking for a problem". I'm not sure what the utility of doing this would be.)
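A minimal sketch of the "urn:uuid:" prefixing described above, using Python's standard uuid module. The GUID in the example is made up for illustration and does not come from any real dataset.

```python
import uuid

def guid_to_urn(guid: str) -> str:
    """Return the RFC 4122 URN form of a GUID/UUID string."""
    # uuid.UUID accepts surrounding braces and mixed case and normalises them;
    # it raises ValueError if the string is not a well-formed UUID.
    return uuid.UUID(guid).urn

# Hypothetical GUID, chosen only for illustration.
example_urn = guid_to_urn("{6F9619FF-8B86-D011-B42D-00C04FC964FF}")
print(example_urn)
```

Going through uuid.UUID rather than plain string concatenation also rejects strings that are not actually UUIDs, which seems a reasonable safeguard before minting a URN.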
Steve - Sorry for the delay. Comments inline
On 7 May 2011, at 22:12, Steve Baskauf wrote:
OK, so here's my first question. I understood the explanation of how to make a field be generated by using a variable in a static mapping. So when the actual text datafile is created, do you just leave the statically mapped column empty? E.g. if I have something like:
<field index="11" term="http://purl.org/dc/terms/source"/>
<field index="12" default="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:{11}" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
would I just populate column 11 with uBio's locally unique ID number and leave column 12 blank (i.e. have two consecutive delimiters with no characters between)?
The example I gave is slightly different. First, if you had just the uBio integer identifier in the scientificNameID field
<field index="11" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
You could assign it to the dc:source term as a global value. Note that by making this a global value it doesn't get an index number, as it doesn't map to a column in your data file. It is as if you are creating a new column to store this new concatenated value.
<field index="11" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
<field term="http://purl.org/dc/terms/source" default="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:{11}"/>
I realise now that this won't work for you in what you asked. It works once but it doesn't work for all the
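As a rough illustration of the global-value mapping described above, here is how a consumer might build the extra dc:source value from a column reference. This is only a sketch: the {N} placeholder syntax follows the thread's example, reading N as a zero-based column index is an assumption, and the two-column data file and the expand_default helper are hypothetical (the sketch uses column 1 rather than column 11 to keep the sample small).

```python
import csv
import io
import re
from typing import Sequence

# "{N}" placeholder as written in the thread; N is assumed to be a
# zero-based column index into the data row.
PLACEHOLDER = re.compile(r"\{(\d+)\}")

def expand_default(template: str, row: Sequence[str]) -> str:
    """Replace each {N} in template with the value of column N in row."""
    return PLACEHOLDER.sub(lambda m: row[int(m.group(1))], template)

SOURCE_TEMPLATE = (
    "http://www.ubio.org/authority/metadata.php"
    "?lsid=urn:lsid:ubio.org:namebank:{1}"
)

# Hypothetical data file: a local row ID and a uBio integer ID per line.
data = io.StringIO("1,456216\n2,456217\n")
for row in csv.reader(data):
    # The expanded value acts like a new, computed column for this row.
    print(row[0], expand_default(SOURCE_TEMPLATE, row))
```

The data file itself carries only the short integers; the long constant prefix lives once in the template.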
The second question is a little trickier. The main purpose that I would want to use this for is to enable the use of GUIDs. Using GUIDs rather than local IDs seems to be considered a better practice (or at least an allowed practice) in both the DwC term descriptions (e.g. http://rs.tdwg.org/dwc/terms/index.htm#taxonID) and in the GBIF Darwin Core Archives web page I complained about above. However, it is difficult for me to see how one would accomplish this with the Darwin Core Archives format, at least if one wishes to cut down on file size (and transmission time) using the "variable in a static mapping" shortcut. The problem is what to use for the field names for the locally unique identifiers. In your example below, you suggested dc:source. That's actually not an option listed in the Darwin Core Archives Assistant.
dc:source is the "source" element in both the Taxon and Occurrence Core definitions in the DWC-A Asst.
Is any allowable Dublin Core term valid as a DWC-A field if one just jury-rigs the XML file after generating it with the Assistant? In the metafile documentation dcterms:identifier is also used and not listed in DWC-A Assistant.
dc:identifier is not used in the core definitions but is used in the Alternative Identifiers extension.
This becomes trickier still if one wants to use GUIDs for several fields in the datafile. For example, I want to have dwc:taxonID, dwc:scientificNameID, and dwc:nameAccordingToID be fields in my taxon file. I can't use dcterms:source for all three of the locally unique identifiers that I'd like to use as variables in the static mappings for the xxxID terms.
I realise now that the example I gave won't work for this. As I read it now, you would like to use local integer identifiers in your database but expand them in the output file using a template like the one in my globals example. In this case we don't want to refer to a different element; we want the current element to replace the local identifier it contains with the fuller templated form. In other words, if your data file says that taxonID=100 has a parent taxon with ID=99, you want to expand each integer into the more complete GUID following the template. This isn't something we have discussed supporting so far, but I think we could do so by allowing value substitution via a template placed in the default value. We could, for example, support it as shown below. Note that in this case the substitution variable IS the value itself.
<id index="0" default="urn:lsid:ubio.org:namebank:{0}"/> <!-- we don't need to assign a term here; it is implied. See next comment below. -->
<field index="1" default="urn:lsid:ubio.org:namebank:{1}" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
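To make the proposal above concrete, here is a hedged sketch of how a reader of the archive could apply such self-referencing defaults. It does not describe existing DwC-A behaviour: the load_templates and inflate helpers, the <core> wrapper element, and the choice to leave empty cells empty are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the proposal above. The <core> wrapper is only there
# to give the sketch a single root element to parse.
META = """
<core>
  <id index="0" default="urn:lsid:ubio.org:namebank:{0}"/>
  <field index="1" default="urn:lsid:ubio.org:namebank:{1}"
         term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
</core>
"""

def load_templates(meta_xml):
    """Map column index -> template for defaults that mention their own column."""
    templates = {}
    for el in ET.fromstring(meta_xml):
        index, default = el.get("index"), el.get("default")
        if index is not None and default and "{%s}" % index in default:
            templates[int(index)] = default
    return templates

def inflate(row, templates):
    """Return a copy of row with each self-referencing template applied."""
    out = list(row)
    for i, template in templates.items():
        if out[i]:  # assumption: an empty cell (e.g. no parent) stays empty
            out[i] = template.replace("{%d}" % i, out[i])
    return out

# taxonID=100 with parent ID=99, as in the example in the text.
print(inflate(["100", "99"], load_templates(META)))
```

Only columns whose default contains their own index are treated as templates here, so an ordinary default value on another field would be left untouched by this sketch.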
Also, I'm not sure about this, but from the examples, it looks like the ID field in the core Taxon file is assumed to be dwc:taxonID.
That is correct. It is implied that the core ID is dwc:taxonID for Taxon and dwc:occurrenceID for Occurrence.
At least taxonID doesn't show up on the DWC-A assistant list and the ID field is labeled as "taxonID" in the examples. But I'm going to run into problems if I actually want dwc:taxonID to be a GUID rather than my locally unique identifier. I can't have the base ID and another column containing a static mapping having the base ID as a variable both be identified as dwc:taxonID.
It seems that, in the interest of facilitating GUIDs, it would be beneficial to allow the DwC-A to include fields that are identified simply as local variables to be used in static mappings, without requiring each field to map to DwC or Dublin Core. I haven't actually picked my way through the XML schema or tried doing this and then validating the XML file to see if I could get away with it, however.
I think you have made the case, and I think we could accommodate it by simply interpreting the default in the way I specified. Otherwise I could imagine we would have to add a "template" attribute to the field element. However, I don't think this is needed. I guess I'd like feedback from Tim, John W or Markus on this.
Thank you, David. Yes! This (below) is exactly what I had in mind. In many cases involving GUIDs, at least part of the GUID string originating from a single institution will be the same for all records in a particular field. The provider isn't going to need to store or transmit those constant characters. Being able to supply that constant part of the GUID to be concatenated by the receiver could reduce the size of the transmitted file significantly.
Steve
David Remsen (GBIF) wrote:
I realise now that the example I gave won't work for this. As I read it now, you would like to use local integer identifiers in your database but expand them in the output file using a template like the one in my globals example. In this case we don't want to refer to a different element; we want the current element to replace the local identifier it contains with the fuller templated form. In other words, if your data file says that taxonID=100 has a parent taxon with ID=99, you want to expand each integer into the more complete GUID following the template. This isn't something we have discussed supporting so far, but I think we could do so by allowing value substitution via a template placed in the default value. We could, for example, support it as shown below. Note that in this case the substitution variable IS the value itself.
<id index="0" default="urn:lsid:ubio.org:namebank:{0}"/> <!-- we don't need to assign a term here; it is implied. See next comment below. -->
<field index="1" default="urn:lsid:ubio.org:namebank:{1}" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
I believe the equivalent in XML of what you are looking for would be entities. Best known are the character entities like &quot;, but XML allows arbitrary ones to be defined in any XML file. Their use is common in RDF, where some of the URI prefixes can be namespaces, whereas others (the ones in values) would not be expanded.
I realize that DwC Archive is not bound to XML, but I think providing an entity-array storage, and otherwise referring to the same XML terminology and rules, would be beneficial.
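For readers unfamiliar with the mechanism, the sketch below shows an XML entity standing in for the constant part of an identifier. It illustrates XML entities only, not anything DwC-A supports: the records and scientificNameID element names and the ubio entity name are hypothetical, and Python's standard ElementTree parser is used simply because it expands entities declared in the document's internal DTD subset.

```python
import xml.etree.ElementTree as ET

# The entity "ubio" is declared once and holds the constant prefix;
# each record then carries only the locally unique number.
DOC = """<!DOCTYPE records [
  <!ENTITY ubio "urn:lsid:ubio.org:namebank:">
]>
<records>
  <scientificNameID>&ubio;456216</scientificNameID>
</records>
"""

root = ET.fromstring(DOC)
# The parser has already replaced &ubio; with the declared text.
expanded = root.find("scientificNameID").text
print(expanded)
```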
Gregor
On 16 May 2011 20:51, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Thank you, David. Yes! This (below) is exactly what I had in mind. In many cases involving GUIDs, at least part of the GUID string originating from a single institution will be the same for all records in a particular field. The provider isn't going to need to store or transmit those constant characters. Being able to supply that constant part of the GUID to be concatenated by the receiver could reduce the size of the transmitted file significantly.
Steve
David Remsen (GBIF) wrote:
I realise now that the example I gave won't work for this. As I read it now you would like to use local integer identifiers in your database but expand them in the output file using a "template" that would conform to the template I used in my globals example. In this case, we don't want to refer to a different element we want the current element to substitute the local identifier it contains with the more inflated template. In other words if your data file says that taxonID=100 has a parent taxon with an ID = 99 you want to conflate the integer with the more complete GUID following the template. This currently isn't something we have discussed supporting but I think we could by allowing for value substitution via a template placed in the default value. We could for example, support it using the example below. Note in this case the substitute variable IS the value itself. <id index="0"/ default="urn:lsid:ubio.org:namebank:{0} "> # we dont need to assign a term here. It is implied. See next comment below. <field index="1" default="urn:lsid:ubio.org:namebank:{1} " term="http://rs.tdwg.org/dwc/terms/parentNameUsageID%22/%3E
...
I think you have made the case and I think we could accommodate it by simple interpreting the default in the way I specified. Otherwise I could imagine we would have to add a "template" attribute to the field element. However, I don't think this is needed. I guess I'd like feedback from Tim, John W or Markus on this.
Steve
David Remsen (GBIF) wrote:
Hi Steve, There is a way to do what you ask but not exactly the way you specified. The way to do is via a template that refers to a particular column. So if you put the ubio integer ID into dwc:scientificName you could could put the following into, for example, dc:source http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:<scientificName> as the default and set the column to a global. You can see more in this document on the XML Descriptor file. http://links.gbif.org/gbif_dwc-a_metafile_en_v1/ The vocabularies option was, as far as I know, intended to provide a URI for a vocabulary so that we might be able to validate values against the vocabulary items. Best, David
David Remsen, Senior Programme Officer Electronic Catalog of Names of Known Organisms Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321472 Fax: +45-35321480 Mobile +45 28751472 Skype: dremsen
On 29 Apr 2011, at 22:21, Steve Baskauf wrote:
I am playing around with Darwin Core Arcives, in particular the DwC-A Assistant (http://tools.gbif.org/dwca-assistant/). One thing that I am not exactly clear about is how to use the "Vocabulary" column in the assistant. The description that comes up when you mouse over the column heading says that it should ideally be a URI that identifies the vocabulary and resolves to some machine readable form like RDF. So what I'm wondering is whether I can put what effectively amounts to as a namespace in that spot.
For example, a URI for the name "Acer rubrum L." that actually resolves to RDF is: http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:4... I think that would qualify as a valid HTTP URI guid because it's the proxied form of an LSID. So I would like to use it as a value for the dwc:scientificNameID column in a DwC-A taxon record. However, the only part of the identifier that makes the string unique within uBio's domain is the last number - if I'm always using a uBio guid, the first approximately 75 characters will be the same for all of the guids. So can I just put "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:" in the Vocabulary column and then just put the locally unique numbers (e.g. "456216") in the column for dwc:scientificNameID? Should an application using a DwC-A file be smart enough to append the "vocabulary" string on the front of the actual value in the text file? Or is that not how the "Vocabulary" column is intended to be used?
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
However, it is difficult for me to see how one would accomplish this with the Darwin Core Archives format, at least if one wishes to cut down on file size (and transmission time) using the "variable in a static mapping" shortcut. The problem is what to use for the field names for the locally unique identifiers.
I realise now that the example I gave won't work for this. As I read it now, you would like to use local integer identifiers in your database but expand them in the output file using a "template" that conforms to the one I used in my globals example. In this case we don't want to refer to a different element; we want the current element to replace the local identifier it contains with the fuller templated value. In other words, if your data file says that taxonID=100 has a parent taxon with ID=99, you want to expand each integer into the more complete GUID following the template.
If you are worried about transmission time, compression by its nature squishes down redundant things such as prefixes that occur repeatedly. If every single column has
http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:%...
on the front of it, it's not really an issue as far as file size goes.
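The claim about compression is easy to check for oneself. Below is a rough sketch, assuming gzip as the compressor and a column of made-up six-digit identifiers (not real uBio IDs); it prints the raw and compressed sizes of the same column written once as bare integers and once with the full resolver prefix on every value.

```python
import gzip
import random

PREFIX = "http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:"


def build_column(ids, prefix=""):
    """Serialise one identifier per line, optionally with a shared prefix."""
    return "\n".join(f"{prefix}{i}" for i in ids).encode("utf-8")


random.seed(0)  # fixed seed so repeated runs use the same made-up identifiers
ids = [random.randint(100000, 999999) for _ in range(10000)]

bare = build_column(ids)
full = build_column(ids, PREFIX)

for label, data in (("bare integers", bare), ("full URIs", full)):
    print(f"{label}: {len(data)} bytes raw, {len(gzip.compress(data))} bytes gzipped")
```

The raw files differ by more than a factor of ten, while the gzipped files should come out at a similar order of magnitude, which is the point being made here. Exact numbers depend on the data and the compressor.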
I am surprised, however, that this string contains the http address of the lsid resolver. If all you are doing is passing ids around, the LSID urn:lsid:ubio.org:namebank:{11} stands as a valid URI in its own right.
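To illustrate the difference between the proxied HTTP form and the bare LSID, here is a small sketch. It assumes the resolver prefix quoted earlier in the thread and the usual LSID layout urn:lsid:authority:namespace:object; the helper names strip_resolver and parse_lsid are made up for this example, and the ID 456216 is the one Steve used as an illustration.

```python
RESOLVER_PREFIX = "http://www.ubio.org/authority/metadata.php?lsid="


def strip_resolver(uri):
    """Return the bare LSID if uri is the proxied HTTP form, else uri unchanged."""
    if uri.startswith(RESOLVER_PREFIX):
        return uri[len(RESOLVER_PREFIX):]
    return uri


def parse_lsid(lsid):
    """Split an LSID into its authority, namespace and object parts."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {"authority": parts[2], "namespace": parts[3], "object": parts[4]}


proxied = RESOLVER_PREFIX + "urn:lsid:ubio.org:namebank:456216"
lsid = strip_resolver(proxied)

print(lsid)              # urn:lsid:ubio.org:namebank:456216
print(parse_lsid(lsid))
```

Only the object part is local to the provider; the authority and namespace are what make the identifier globally unique without any reference to a particular resolver.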
On 30/04/2011, at 6:21 AM, Steve Baskauf wrote:
I'm wondering whether I can put what effectively amounts to a namespace in that spot.
At a guess:
OWL supports the idea of an 'Ontology' object. Usually this is expressed in the RDF as the object with no fragment identifier. Thus, when you fetch
You get RDF that describes the OWL "ontology" object whose URI is
As well as RDF that describes other objects
http://example.org/voc/Test#TestCase http://example.org/voc/Test#hasTest
and so on.
So ... a namespace is not necessarily the same thing as an ontology object, but it's usually done that way.
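The reason the two usually coincide is that a client drops everything from the "#" onwards before fetching, so all terms sharing a base URI are served from one document. A minimal sketch using the two example term URIs from this message (nothing is actually fetched, and the variable names are only illustrative):

```python
from urllib.parse import urldefrag

term_uris = [
    "http://example.org/voc/Test#TestCase",
    "http://example.org/voc/Test#hasTest",
]

for uri in term_uris:
    # urldefrag splits a URI into the part that is fetched and the fragment.
    document, fragment = urldefrag(uri)
    print(f"{uri} -> fetch {document}, local name {fragment}")

# Every term resolves to the same document, which is where an ontology
# object without a fragment identifier would conventionally live.
documents = {urldefrag(uri).url for uri in term_uris}
print(documents)
```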
participants (4)
- David Remsen (GBIF)
- Gregor Hagedorn
- Paul Murray
- Steve Baskauf