Hi all,
Yesterday I had a meeting with Jens Fitzke from the Lat/Lon company.
This company made the reference implementation of the WMS standard of
OGC. He gave me a lot of nice information about how things work in
OGC. I am happy to have him as a partner in my project.
One thing he pointed out to me was a conformance test that the OGC
community created to see whether the different implementations of the
standard were interoperable.
They first used a commercial package (I don't remember which one, but
I suppose it can be found). The problem was that, because of the
generic philosophy of that software, they spent a lot of person-months
getting it to run. Now they are developing open source software to do
the same thing, but less generic.
What they already have is two services for testing WMS and WFS services.
You can find some information here:
http://150.128.82.168/webACEGIS/ACEGIS_Test_Engines.htm
Maybe we can consider something similar for TAPIR, but it must be
much simpler because we don't have their resources. There are two
things we have to test:
-Implementation interoperability: things like what happens when
some parameters are missing, etc. Purely technical.
-Content test: set up the same database behind different
implementations and check that we get what we expect.
Maybe during the week someone (me?) can work on this. I hope we can
reuse software from OGC to do that.
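
As a rough illustration of the two kinds of test listed above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the parameter names, the two stand-in providers and the tiny shared data set are hypothetical and are not taken from TAPIR or from the OGC test engines.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Response = Dict[str, object]
Provider = Callable[[Request], Response]

# Hypothetical required parameters and shared test data set.
REQUIRED = ("operation", "resource")
DATABASE = {"birds": ["rec-1", "rec-2"]}


def strict_provider(req: Request) -> Response:
    """Stand-in implementation that validates its parameters."""
    missing = [p for p in REQUIRED if p not in req]
    if missing:
        return {"status": "error", "missing": missing}
    return {"status": "ok", "records": list(DATABASE.get(req["resource"], []))}


def sloppy_provider(req: Request) -> Response:
    """Stand-in implementation that never rejects a request."""
    return {"status": "ok", "records": list(DATABASE.get(req.get("resource", ""), []))}


def interoperability_failures(provider: Provider) -> List[str]:
    """Technical test: dropping any required parameter must yield an error."""
    failures = []
    for param in REQUIRED:
        req = {"operation": "search", "resource": "birds"}
        del req[param]
        if provider(req).get("status") != "error":
            failures.append(param)
    return failures


def content_matches(provider: Provider, expected: List[str]) -> bool:
    """Content test: the same data set must give the expected records."""
    resp = provider({"operation": "search", "resource": "birds"})
    return resp.get("records") == expected
```

Both stand-ins pass the content test, but only the first passes the technical one, which is why the two tests are worth keeping separate.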
Cheers.
Javier.
Hi Rod,
I was just pointing out that if you include CollectionCode in your
example then you would not have the duplication of records that occurs
in the example. The combination of InstitutionCode, CollectionCode, and
CatalogNumber should provide a GUID to a specimen record. So to
slightly modify your example,
DiGIR provider URL : resource : CollectionCode : specimen code
will generally be sufficient, but in some cases a single server
resource may offer records from several institutions, hence:
DiGIR provider URL : resource : InstitutionCode : CollectionCode :
specimen code
would be unique. It would be a simple matter to extend DiGIR slightly
to support direct resolution of such an identifier. Perhaps something like:
http://some.server/digir.php?id=resource/InstitutionCode/CollectionCode/Cat…
would be sufficient to identify a single record and retrieve its digital
representation as well.
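
A minimal sketch of how such a composite key might be turned into the slash-separated path used in the suggested `id=` parameter, and parsed back again. The `SpecimenKey` type and both helper functions are hypothetical; DiGIR itself does not define them, and percent-encoding the parts is an assumption added here so that a slash inside a catalogue number does not break the path.

```python
from typing import NamedTuple
from urllib.parse import quote, unquote


class SpecimenKey(NamedTuple):
    """Hypothetical container for the four parts after the provider URL."""
    resource: str
    institution_code: str
    collection_code: str
    catalog_number: str


def to_path(key: SpecimenKey) -> str:
    # Percent-encode each part so "/" inside a value cannot be mistaken
    # for a separator.
    return "/".join(quote(part, safe="") for part in key)


def from_path(path: str) -> SpecimenKey:
    parts = [unquote(p) for p in path.split("/")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {path!r}")
    return SpecimenKey(*parts)


# Illustrative values only; the resource and collection names are made up.
key = SpecimenKey("res", "INST", "Coll", "A/1")
path = to_path(key)  # "res/INST/Coll/A%2F1"
```

The round trip `from_path(to_path(key)) == key` holds for any four strings, which is the property a resolver would rely on.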
regards,
Dave V.
Roderic Page wrote:
> My point is that it isn't always done (and the MVZ example concerns
> totally different specimens, rather than preparations of the same
> specimen). My aim is not to criticise DiGIR and Darwin Core
> specifically (although the absence of a GUID is a major weakness), but
> simply to provide a concrete example where digital records for totally
> different specimens are not clearly distinguished. In the MVZ example,
> one could retrieve the record for the desired specimen if one searched
> on the taxonomic name, but this is cumbersome -- ideally I want a GUID
> that can be resolved to the appropriate specimen independent of any
> other information. DiGIR can do this, so long as DiGIR providers use
> different resource names for different collections.
>
> Regards
>
> Rod
>
>
>
> On 20 Oct 2005, at 23:11, Dave Vieglais wrote:
>
>> Hi Roderic,
>> In general, for records retrieved from data sources exposed using the
>> Darwin Core one should be able to combine InstitutionCode,
>> CollectionCode and CatalogNumber to provide unique identifiers for
>> those
>> records. This is not always the case however, the most common example
>> of which is probably the presence of records for different preparations
>> of the same specimen.
>>
>> regards,
>> Dave V.
>>
>> Roderic Page wrote:
>>
>>> As a consumer of specimen GUIDs, I've found specimens to be
>>> frustrating
>>> to deal with as individual collections don't guarantee uniqueness of
>>> identifiers (Donald's point 2 below). For example, in the absence of
>>> specimen GUIDs (such as LSIDs) I'd hoped to use a three part
>>> identifier
>>> based on the DiGIR provider, e.g.
>>>
>>> DiGIR provider URL : resource : specimen code
>>>
>>> Hence,
>>>
>>> digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
>>> \-----------------------------------/ \---------/ \--------/
>>> provider resource specimen
>>>
>>> identifies specimen FMNH 158106 of Tatera robusta at the Field Museum
>>> in
>>> Chicago. The idea behind this crude hack is that the identifier can be
>>> resolved (there's enough information in the identifier to retrieve the
>>> record, see for example
>>> http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
>>>
>>> To my horror, if I do this for MVZ 148946, I get three specimens back,
>>> one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana
>>> cascadae. This is an instance where the same specimen code is being
>>> used
>>> in three different collections (mammals, birds, and herps). I guess
>>> MVZ
>>> could have avoided this by using a different name for the 'resource'
>>> field for each collection.
>>>
>>> I offer this as an example of where GUIDs are vital if we are to avoid
>>> linking to the wrong information, and also where individual providers
>>> need to ensure that the identifiers they generate are unique.
>>>
>>> Regards
>>>
>>> Rod
>>>
>>>
>>> Professor Roderic D. M. Page
>>> Editor, Systematic Biology
>>> DEEB, IBLS
>>> Graham Kerr Building
>>> University of Glasgow
>>> Glasgow G12 8QP
>>> United Kingdom
>>>
>>> Phone: +44 141 330 4778
>>> Fax: +44 141 330 2792
>>> email: r.page(a)bio.gla.ac.uk
>>> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>>>
>>> Subscribe to Systematic Biology through the Society of Systematic
>>> Biologists Website: http://systematicbiology.org
>>> Search for taxon names at
>>> http://darwin.zoology.gla.ac.uk/~rpage/portal/
>>>
>>>
>>>
>>>
>>
>>
Hi Roderic,
In general, for records retrieved from data sources exposed using the
Darwin Core one should be able to combine InstitutionCode,
CollectionCode and CatalogNumber to provide unique identifiers for those
records. This is not always the case however, the most common example
of which is probably the presence of records for different preparations
of the same specimen.
regards,
Dave V.
Roderic Page wrote:
> As a consumer of specimen GUIDs, I've found specimens to be frustrating
> to deal with as individual collections don't guarantee uniqueness of
> identifiers (Donald's point 2 below). For example, in the absence of
> specimen GUIDs (such as LSIDs) I'd hoped to use a three part identifier
> based on the DiGIR provider, e.g.
>
> DiGIR provider URL : resource : specimen code
>
> Hence,
>
> digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
> \-----------------------------------/ \---------/ \--------/
> provider resource specimen
>
> identifies specimen FMNH 158106 of Tatera robusta at the Field Museum in
> Chicago. The idea behind this crude hack is that the identifier can be
> resolved (there's enough information in the identifier to retrieve the
> record, see for example
> http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
>
> To my horror, if I do this for MVZ 148946, I get three specimens back,
> one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana
> cascadae. This is an instance where the same specimen code is being used
> in three different collections (mammals, birds, and herps). I guess MVZ
> could have avoided this by using a different name for the 'resource'
> field for each collection.
>
> I offer this as an example of where GUIDs are vital if we are to avoid
> linking to the wrong information, and also where individual providers
> need to ensure that the identifiers they generate are unique.
>
> Regards
>
> Rod
Hi Everyone,
I am nervous at being the first to post to this most esteemed list but
here goes.
I have just added two pages to the wiki concerning minor changes that
could be made to the protocol to make it easier to implement 'Lite'
versions of Tapir providers.
http://ww3.bgbm.org/protocolwiki/TapirLite
http://ww3.bgbm.org/protocolwiki/SimpleFiltering
Please read and add your support or reservations to the wiki or discuss
it here.
All the best,
Roger
--
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger(a)tdwg.org
+44 1578 722782
-------------------------------------
My point is that it isn't always done (and the MVZ example concerns
totally different specimens, rather than preparations of the same
specimen). My aim is not to criticise DiGIR and Darwin Core
specifically (although the absence of a GUID is a major weakness), but
simply to provide a concrete example where digital records for totally
different specimens are not clearly distinguished. In the MVZ example,
one could retrieve the record for the desired specimen if one searched
on the taxonomic name, but this is cumbersome -- ideally I want a GUID
that can be resolved to the appropriate specimen independent of any
other information. DiGIR can do this, so long as DiGIR providers use
different resource names for different collections.
Regards
Rod
On 20 Oct 2005, at 23:11, Dave Vieglais wrote:
> Hi Roderic,
> In general, for records retrieved from data sources exposed using the
> Darwin Core one should be able to combine InstitutionCode,
> CollectionCode and CatalogNumber to provide unique identifiers for
> those
> records. This is not always the case however, the most common example
> of which is probably the presence of records for different preparations
> of the same specimen.
>
> regards,
> Dave V.
>
> Roderic Page wrote:
>> As a consumer of specimen GUIDs, I've found specimens to be
>> frustrating
>> to deal with as individual collections don't guarantee uniqueness of
>> identifiers (Donald's point 2 below). For example, in the absence of
>> specimen GUIDs (such as LSIDs) I'd hoped to use a three part
>> identifier
>> based on the DiGIR provider, e.g.
>>
>> DiGIR provider URL : resource : specimen code
>>
>> Hence,
>>
>> digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
>> \-----------------------------------/ \---------/ \--------/
>> provider resource specimen
>>
>> identifies specimen FMNH 158106 of Tatera robusta at the Field Museum
>> in
>> Chicago. The idea behind this crude hack is that the identifier can be
>> resolved (there's enough information in the identifier to retrieve the
>> record, see for example
>> http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
>>
>> To my horror, if I do this for MVZ 148946, I get three specimens back,
>> one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana
>> cascadae. This is an instance where the same specimen code is being
>> used
>> in three different collections (mammals, birds, and herps). I guess
>> MVZ
>> could have avoided this by using a different name for the 'resource'
>> field for each collection.
>>
>> I offer this as an example of where GUIDs are vital if we are to avoid
>> linking to the wrong information, and also where individual providers
>> need to ensure that the identifiers they generate are unique.
>>
>> Regards
>>
>> Rod
As a consumer of specimen GUIDs, I've found specimens to be frustrating
to deal with as individual collections don't guarantee uniqueness of
identifiers (Donald's point 2 below). For example, in the absence of
specimen GUIDs (such as LSIDs) I'd hoped to use a three part identifier
based on the DiGIR provider, e.g.
DiGIR provider URL : resource : specimen code
Hence,
digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
\-----------------------------------/ \---------/ \--------/
provider resource specimen
identifies specimen FMNH 158106 of Tatera robusta at the Field Museum
in Chicago. The idea behind this crude hack is that the identifier can
be resolved (there's enough information in the identifier to retrieve
the record, see for example
http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
To my horror, if I do this for MVZ 148946, I get three specimens back,
one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana
cascadae. This is an instance where the same specimen code is being
used in three different collections (mammals, birds, and herps). I
guess MVZ could have avoided this by using a different name for the
'resource' field for each collection.
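
The collision described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names and the resource name used for the MVZ records are assumptions, not what the MVZ provider actually returns.

```python
from collections import defaultdict


def parse_identifier(identifier: str):
    """Split 'provider:resource:specimen' from the right, so that any
    colon inside the provider part (for example a port number) is kept."""
    provider, resource, specimen = identifier.rsplit(":", 2)
    return provider, resource, specimen


def ambiguous_codes(records):
    """Return the (resource, specimen code) pairs that match more than
    one record, together with the taxa they resolve to."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["resource"], rec["specimen_code"])].append(rec["taxon"])
    return {key: taxa for key, taxa in groups.items() if len(taxa) > 1}


# Hypothetical records; only the taxa and specimen numbers come from the
# message above.
records = [
    {"resource": "MammalsDwC2", "specimen_code": "FMNH158106", "taxon": "Tatera robusta"},
    {"resource": "MVZ", "specimen_code": "MVZ148946", "taxon": "Chaetodipus baileyi baileyi"},
    {"resource": "MVZ", "specimen_code": "MVZ148946", "taxon": "Calidris mauri"},
    {"resource": "MVZ", "specimen_code": "MVZ148946", "taxon": "Rana cascadae"},
]
```

Running `ambiguous_codes(records)` flags only the MVZ key, since the Field Museum key maps to a single record.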
I offer this as an example of where GUIDs are vital if we are to avoid
linking to the wrong information, and also where individual providers
need to ensure that the identifiers they generate are unique.
Regards
Rod
[ Another topic for comments. Please keep the Topic number in responses ]
Topic 2: GUIDs for Collections and Specimens
Identifiers to assist with management of collection data have been at the
centre of TDWG's GUID investigation from the beginning. A primary
motivation for this work has been the need to recognise where two data
providers are offering access to information on the same specimen. Two very
basic scenarios for specimen identifiers are described on the wiki at
http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUIDUseCases. However we do
need to make sure we understand the actual scenarios requiring such
identifiers across the range of biological collections. I am therefore
looking for descriptions of the situations in which your current processes,
systems and applications already use identifiers for specimens (and where
perhaps genuinely globally unique identifiers may help), and of any policies
and processes around collection management which might affect how we are
able to assign, manage or resolve identifiers.
When I speak of 'specimens' I am primarily thinking of organisms (living or
dead, including subsamples) held in collections (including zoos, aquaria,
culture collections and seed banks), but I am also very interested in
parallel situations involving the assignment of identifiers to observation
events in the field.
Some more specific questions to try to shape discussion:
1. What identifiers (how many per specimen) get assigned to specimens
in your organisation or domain (field numbers, catalogue numbers, etc.)?
2. What is the scope of uniqueness for each of these identifiers
(notebook page, collector, database, institution, global, etc.)?
3. Can you explain the life cycle of each of these identifiers (who
assigns them, how they are subsequently tracked)?
4. Can you give examples of how these identifiers are used to retrieve
the specimen and/or information on the specimen?
5. Would there be any social or technical roadblocks to replacing these
identifiers with a single identifier that was guaranteed to be unique?
6. In the case of subsamples from a specimen, can you identify issues
around associating the sample and associated information with the source
specimen and associated information?
The subject of specimen identifiers is somewhat linked to that of collection
identifiers, since Darwin Core and the ABCD Schema have used institution and
collection codes together with catalogue numbers to identify specimens in
the absence of GUIDs. It would also be useful here to collect information
on the following:
7. How are your specimens organised into larger identifiable sets
(collections, named collections, databases, institutions, etc.)?
8. What identifiers get assigned to each of these sets in your
organization or domain (institution codes, collection codes, Index Herbariorum
acronyms, etc.)?
9. Can you explain the life cycle of each of these identifiers (who
assigns them, how they are subsequently tracked)?
10. Can you give examples of how these identifiers are used to locate
the set and/or information on the set?
11. Would there be any social or technical roadblocks to replacing these
identifiers with a single identifier that was guaranteed to be unique?
To help you a little, my aim is to use this information to develop
additional scenarios as use cases which will complement those already on the
wiki (and yes, I do realise that the existing "use case" pages are not
formal use cases!). If you feel able simply to add pages to the wiki which
describe scenarios for using identifiers to manage specimen and collection
data, please go ahead (and include links to your new scenarios from the
GUIDUseCases page).
Thanks,
Donald
---------------------------------------------------------------
Donald Hobern (dhobern(a)gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------
My thoughts...
First problem - any GUIDs currently in place that are not using the decided upon system will need to be "resolved"/"translated" from the existing system to the decided system and vice versa - just something that will need to be done for any data provider wishing to buy into the TDWG GUID system.
Second - one of the reasons I feel GUIDs should only really represent digital information and not real world objects is exactly the problem described by Jerry. It is quite difficult to determine whether two digital objects actually refer to the same real world object, and hence very error prone (equivalence of objects is an application-specific problem and shouldn't be mixed up with the GUID mechanism). E.g. trying to work out whether two digital representations of a person refer to the same person has been a big problem for a long time.
Third - the relationship problem between different data objects is solved in the LSID world by the use of metadata, i.e. the LSID metadata for an object could describe the format the data object is served up in and its relationships to other LSIDs. The idea with LSIDs is that the data returned for an LSID should only be the data for that object; any related objects are referred to in the metadata. I.e. the objects that an LSID refers to would need to be fairly low level, e.g. a Name object from the TCS schema. Higher level objects could be represented by LSIDs, but it gets harder to maintain these and guarantee the 'bit for bit' consistency of a data object.
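
To make the data/metadata split concrete, here is a small Python sketch. It parses the `urn:lsid:authority:namespace:object[:revision]` form and keeps an object's bytes separate from the metadata that points at related LSIDs. The in-memory `STORE`, the example LSIDs under `example.org` and the metadata keys are all hypothetical; `get_data` and `get_metadata` are only named after the corresponding LSID resolution operations and do not implement them.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Parse urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


# Hypothetical store: the data is only the object itself, while related
# objects are listed in the metadata rather than embedded in the data.
STORE = {
    "urn:lsid:example.org:names:1": {
        "data": b"Aus bus",
        "metadata": {
            "format": "text/plain",
            "related": ["urn:lsid:example.org:publications:7"],
        },
    },
}


def get_data(lsid: str) -> bytes:
    return STORE[lsid]["data"]


def get_metadata(lsid: str) -> dict:
    return STORE[lsid]["metadata"]
```

Keeping the relationships in the metadata means the bytes returned by `get_data` can stay unchanged even when links to other objects are added later.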
Kevin
>>> CooperJ(a)LANDCARERESEARCH.CO.NZ 17/10/2005 4:29 p.m. >>>
Hi all,
I don't really feel qualified to get involved but I have a number of issues that worry me. They are probably non-issues and result from my ignorance and lack of time to read the background info but I thought I'd chuck them into the debating pot anyway. Feel free to tell me to buzz-off and read some more...
First is that many of us are already using GUIDs to identify objects that have equivalents elsewhere, and some of us are reluctant to give them up for good reason. So for example we have our name GUIDs in our names catalogue and IndexFungorum has its name GUIDs for the same name strings (same real world entity but different 'bit for bit' digital representation). What we need is the ability to handle the fact that multiple digital object GUIDs may reference the same real-world entity; sometimes some of those 'duplicates' will need deprecating (but not losing), and some 'synonyms' we will just have to live with, and resolve.
Second is an issue already touched on, and that is the 'bit equivalence' of the digital object being referenced. In many cases the real-world entities being referenced by different object GUIDs will be identical but their 'bit for bit' digital object representations may not be, i.e. the need for a 'synonymy' of GUIDs may have a real-world basis.
Third is a worry I have about the lack of definition of the scope of the entity being referenced. The more that is included in a particular scope, the more need there is for versioning of digital objects representing the real-world entities, and the more likely will be 'bit for bit' discrepancies in the various extant digital objects represented by extant GUIDs. And then surely many (most) objects for which a GUID is required will be composed of sub-elements which may require their own GUIDs. How do you represent and resolve a potential cascading chain of GUIDs where each GUID has the potential for many synonyms? This just re-invents the 'taxon concept' problem in GUID-space and highlights the fact that nested layers of indirection do not actually solve the problem.
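
One way to picture a 'synonymy' of GUIDs in which duplicates are deprecated but not lost is a union-find structure over identifiers. The sketch below is an assumption about how this could be modelled, not a description of any existing system; the class name and the example GUID strings are made up.

```python
class GuidSynonyms:
    """Tracks which GUIDs refer to the same real-world entity."""

    def __init__(self):
        self._parent = {}
        self.deprecated = set()

    def _find(self, guid):
        # Register unknown GUIDs as their own representative.
        self._parent.setdefault(guid, guid)
        root = guid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[guid] != root:
            self._parent[guid], guid = root, self._parent[guid]
        return root

    def declare_same(self, a, b):
        """Record that two GUIDs reference the same entity."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def deprecate(self, guid):
        """Mark a GUID as deprecated; it stays resolvable."""
        self._find(guid)
        self.deprecated.add(guid)

    def same_entity(self, a, b):
        return self._find(a) == self._find(b)

    def synonyms(self, guid):
        root = self._find(guid)
        return sorted(g for g in list(self._parent) if self._find(g) == root)


syn = GuidSynonyms()
syn.declare_same("catalogue-a:name:1", "catalogue-b:name:9")
syn.declare_same("catalogue-b:name:9", "catalogue-c:name:4")
syn.deprecate("catalogue-c:name:4")
```

A deprecated GUID still appears in `synonyms(...)`, so old references keep resolving. The sketch says nothing about who is entitled to declare two GUIDs equivalent, which is the harder question raised above.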
Jerry
Jerry Cooper PhD
Research Leader: Biodiversity Informatics
Landcare Research
PO Box 40, Lincoln 8152
New Zealand
+64 3 325 6701 ext 3734
CooperJ(a)LandcareResearch.CO.NZ
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WARNING: This email and any attachments may be confidential and/or
privileged. They are intended for the addressee only and are not to be read,
used, copied or disseminated by anyone receiving them in error. If you are
not the intended recipient, please notify the sender by return email and
delete this message and any attachments.
The views expressed in this email are those of the sender and do not
necessarily reflect the official views of Landcare Research.
Landcare Research
http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
James Ytow wrote:
> Here we see a typical trouble with identifier and identity. Do you
> mean identity of an object (a unique thing, so we don't need
> identifier because it is the thing) or equivalence of data (there can
> be multiple data objects having the same value)? Where we need GUID
> we can't rely on identity, in my understanding.
The DOI Handbook (http://dx.doi.org/10.1000/186) addresses this issue. The
DOI approach centres on the use of Application Profiles. Each DOI can be
associated with a profile which defines the metadata elements with which it
is to be associated (and therefore by implication which pieces of
information must remain stable for two records to be considered copies of
the same object).
On page 17, the Handbook states:
A common question is: if I identify entity A with a DOI, and then I
adapt it in some way to create entity B, should I assign a new DOI to
entity B?
The answer is: there can be no general rule which applies to all cases
and each must be treated in context. If a registrant finds it useful to
do so, they may. The rules of Application profiles, and business rules
of Registration Agencies, will help in deciding for DOIs registered in
Application Profiles. The key point is that one should precisely specify
what A is and what B is; two digital entities are never the same in any
absolute sense and can be considered copies of each other only in the
context of some defined purpose.
For a more detailed explanation of this fundamental topic, see the
article "On Making and Identifying a Copy"
http://dx.doi.org/10.1045/january2003-paskin
This is a very important point, and one that we will need to bear in mind as
we consider the application of GUIDs to digital representations of real
world objects such as specimen records. Two completely differently
formatted records may be considered as copies of the same object if they
serve the same purpose for some user. This is one area which the DOI
infrastructure particularly tries to address.
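
The idea that two records are copies only "in the context of some defined purpose" can be sketched as a comparison restricted to the fields a profile names. This is a loose illustration only: the profile below and the record fields are hypothetical and are not taken from the DOI Handbook or any real Application Profile.

```python
def same_under_profile(record_a: dict, record_b: dict, profile_fields) -> bool:
    """Treat two records as copies if every field named by the profile
    is present in both and has the same value."""
    return all(
        f in record_a and f in record_b and record_a[f] == record_b[f]
        for f in profile_fields
    )


# Hypothetical profile: which pieces of information must remain stable.
SPECIMEN_PROFILE = ("institution_code", "collection_code", "catalog_number")

# Two differently shaped records and one from another collection.
provider_record = {
    "institution_code": "MVZ", "collection_code": "Mammals",
    "catalog_number": "148946", "taxon": "Chaetodipus baileyi baileyi",
}
portal_record = {
    "institution_code": "MVZ", "collection_code": "Mammals",
    "catalog_number": "148946", "source": "portal cache",
}
other_record = {
    "institution_code": "MVZ", "collection_code": "Birds",
    "catalog_number": "148946", "taxon": "Calidris mauri",
}
```

Under this profile the provider and portal records count as copies even though their other fields differ, while the bird record does not. A different profile could reasonably give a different answer for the same pair.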
Donald
-----Original Message-----
From: Taxonomic Databases Working Group GUID Project
[mailto:TDWG-GUID@LISTSERV.NHM.KU.EDU] On Behalf Of Nozomi Ytow
Sent: 12 October 2005 18:40
To: TDWG-GUID(a)LISTSERV.NHM.KU.EDU
Subject: Re: What do we mean by GUID?
Matt Jones wrote:
> The term GUID is one we started using in SEEK when looking for a
> solution to the identity and resolution problems that we saw looming for
> the Taxonomic Concept Standard. Dave Thau's presentation on this
> (linked on the GUID wiki) defines this pretty well and explores the
> issues.
Here we see a typical trouble with identifier and identity. Do you
mean identity of an object (a unique thing, so we don't need
identifier because it is the thing) or equivalence of data (there can
be multiple data objects having the same value)? Where we need GUID
we can't rely on identity, in my understanding.
> "globally unique" means simply that an identifier that is issued can
> only have one valid interpretation across all possible systems.
What do you mean by valid? Suppose there is a data object in a data
provider's database. A GBIF portal holds a copy made when a user last
accessed the data object. The data provider then changes its contents
for some reason after that last access through the GBIF portal. What is
the valid interpretation of these data objects? The provider's one?
> Regardless of the mechanism used to resolve the identifier, the object
> that the id 'identifies' will be bit-for-bit identical.
So you mean equivalence, not identity. If it is bit-for-bit
equivalence, why do you need GUID? The contents IS the GUID
you defined.
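
The point that bit-for-bit equivalence makes the contents themselves the identifier can be illustrated with a content hash. This is a sketch under assumptions: SHA-256 is chosen arbitrarily, and the record bytes are invented for the example.

```python
import hashlib


def content_identifier(data: bytes) -> str:
    """Derive an identifier from nothing but the bytes of the object."""
    return hashlib.sha256(data).hexdigest()


# Invented record bytes: the provider's object, the portal's cached copy,
# and the provider's object after a later edit.
provider_copy = b"catalog=148946;taxon=Rana cascadae"
portal_copy = bytes(provider_copy)
edited_copy = b"catalog=148946;taxon=Rana cascadae;note=redetermined"

same = content_identifier(provider_copy) == content_identifier(portal_copy)
changed = content_identifier(provider_copy) != content_identifier(edited_copy)
```

Identical bytes always give the same identifier, and once the provider edits the record the identifier changes, so the portal's cached copy and the provider's current record no longer share one.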
> There are some tricky issues
> dealing with granularity of the identifier for digital data (does the
> identifier point at a tuple in an entity, or at a whole entity, or at
> multiple entities).
Do you mean your bit-for-bit GUID also requires a scope disambiguator?
Isn't it assigned to a data object, i.e. a unit to be handled as a
chunk?
It may be better to use other words such as global disambiguator
or distinguisher, because we do not mean identity by identifier.
Cheers,
James