- tdwg-tag - lists.tdwg.org

[tdwg-tapir] Conformance test
by Javier de la Torre 21 Oct '05

Hi all,

Yesterday I had a meeting with Jens Fitzke from the Lat/Lon company. This company made the reference implementation of the OGC WMS standard. He gave me a lot of useful information about how things work in OGC, and I am happy to have him as a partner in my project.

One thing he pointed out to me was a conformance test that the OGC community created to check whether the different implementations of the standard were interoperable. They first used commercial software (I don't remember which product, but I suppose it can be found), and the problem was that, because of the generic philosophy of that software, they spent a lot of person-months making it run. Now they are developing open source software to do the same job, but less generic. What they already have is two services for testing WMS and WFS services. You can find some information here:

http://150.128.82.168/webACEGIS/ACEGIS_Test_Engines.htm

Maybe we can consider something similar for TAPIR, but it must be much simpler because we don't have their resources. There are two things we have to test:

- Implementation interoperability: things like what happens when some parameters are missing, etc. Purely technical.
- Content test: set up the same database and try different implementations to see whether we get what we expect.

Maybe during the week someone (me?) can work on this. I hope we can reuse software from OGC to do that.

Cheers.
Javier.
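
The two kinds of test described above could be sketched roughly as follows. This is only an illustration, not TAPIR or OGC code: the two "providers" are hypothetical in-memory functions standing in for different implementations, and the `name` parameter and record fields are assumptions made for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# Assumption: a "provider" is any callable that takes request parameters
# and returns a list of records. Real implementations would be remote services.
Provider = Callable[[Dict[str, str]], List[Record]]

# The same test database is loaded into every implementation under test.
SHARED_DB: List[Record] = [
    {"id": "1", "name": "Tatera robusta"},
    {"id": "2", "name": "Calidris mauri"},
]


def provider_a(params: Dict[str, str]) -> List[Record]:
    """Hypothetical implementation A."""
    if "name" not in params:
        raise ValueError("missing parameter: name")
    return [r for r in SHARED_DB if r["name"] == params["name"]]


def provider_b(params: Dict[str, str]) -> List[Record]:
    """Hypothetical implementation B: different code path, same contract."""
    name = params.get("name")
    if name is None:
        raise ValueError("missing parameter: name")
    return sorted((r for r in SHARED_DB if r["name"] == name),
                  key=lambda r: r["id"])


def content_test(providers: List[Provider], params: Dict[str, str]) -> bool:
    """Content test: every implementation returns the same records."""
    results = [sorted(p(params), key=lambda r: r["id"]) for p in providers]
    return all(r == results[0] for r in results)


def rejects_missing_parameter(provider: Provider) -> bool:
    """Interoperability test: a request without parameters is refused."""
    try:
        provider({})
    except ValueError:
        return True
    return False
```

A real harness would replace the two functions with HTTP calls to each implementation and would need many more cases; the point here is only that both test types reduce to comparing observed behaviour against a shared expectation.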

Re: Topic 2: GUIDs for Collections and Specimens
by Dave Vieglais 21 Oct '05

Hi Rod,

I was just pointing out that if you include CollectionCode in your example then you would not have the duplication of records that occurs in the example. The combination of InstitutionCode, CollectionCode, and CatalogNumber should provide a GUID to a specimen record.

So, to slightly modify your example,

DiGIR provider URL : resource : CollectionCode : specimen code

will generally be sufficient, but in some cases a single server resource may offer records from several institutions, hence:

DiGIR provider URL : resource : InstitutionCode : CollectionCode : specimen code

would be unique.

It would be a simple matter to extend DiGIR slightly to support direct resolution of such an identifier. Perhaps something like:

http://some.server/digir.php?id=resource/InstitutionCode/CollectionCode/Cat…

would be sufficient to identify a single record and retrieve its digital representation as well.

regards,
Dave V.

Roderic Page wrote:
> My point is that it isn't always done (and the MVZ example concerns
> totally different specimens, rather than preparations of the same
> specimen). My aim is not to criticise DiGIR and Darwin Core
> specifically (although the absence of a GUID is a major weakness), but
> simply to provide a concrete example where digital records for totally
> different specimens are not clearly distinguished. In the MVZ example,
> one could retrieve the record for the desired specimen if one searched
> on the taxonomic name, but this is cumbersome -- ideally I want a GUID
> that can be resolved to the appropriate specimen independent of any
> other information. DiGIR can do this, so long as DiGIR providers use
> different resource names for different collections.
>
> Regards
>
> Rod
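
As a rough illustration of why adding CollectionCode removes the duplication, the sketch below counts how many records share a key with and without it. The collection code values ("Mammals", "Birds", "Herps") are invented for the example, and `as_path` only joins the three Darwin Core elements in the order named above; it does not reconstruct the truncated URL.

```python
from collections import Counter
from typing import NamedTuple


class SpecimenKey(NamedTuple):
    institution_code: str
    collection_code: str   # illustrative values, not taken from MVZ's data
    catalog_number: str

    def as_path(self) -> str:
        """Join the three parts with '/', in the order used in the message."""
        return "/".join(self)


# Three different specimens that share the catalogue number 148946.
records = [
    SpecimenKey("MVZ", "Mammals", "148946"),
    SpecimenKey("MVZ", "Birds", "148946"),
    SpecimenKey("MVZ", "Herps", "148946"),
]

# Without CollectionCode the three records collapse onto one key.
without_collection = Counter(
    (r.institution_code, r.catalog_number) for r in records
)

# With CollectionCode every record has its own key.
with_collection = Counter(records)
```

Here `without_collection[("MVZ", "148946")]` is 3, while every count in `with_collection` is 1. Whether this holds in practice still depends on providers populating CollectionCode consistently.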

Re: Topic 2: GUIDs for Collections and Specimens
by Dave Vieglais 21 Oct '05

Hi Roderic,

In general, for records retrieved from data sources exposed using the Darwin Core, one should be able to combine InstitutionCode, CollectionCode and CatalogNumber to provide unique identifiers for those records. This is not always the case, however; the most common example is probably the presence of records for different preparations of the same specimen.

regards,
Dave V.

Roderic Page wrote:
> As a consumer of specimen GUIDs, I've found specimens to be frustrating
> to deal with as individual collections don't guarantee uniqueness of
> identifiers (Donald's point 2 below). For example, in the absence of
> specimen GUIDs (such as LSIDs) I'd hoped to use a three-part identifier
> based on the DiGIR provider, e.g.
>
> DiGIR provider URL : resource : specimen code
>
> Hence,
>
> digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
> \-----------------------------------/ \---------/ \--------/
>               provider                 resource    specimen
>
> identifies specimen FMNH 158106 of Tatera robusta at the Field Museum in
> Chicago. The idea behind this crude hack is that the identifier can be
> resolved (there's enough information in the identifier to retrieve the
> record, see for example
> http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).
>
> To my horror, if I do this for MVZ 148946, I get three specimens back,
> one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana
> cascadae. This is an instance where the same specimen code is being used
> in three different collections (mammals, birds, and herps). I guess MVZ
> could have avoided this by using a different name for the 'resource'
> field for each collection.
>
> I offer this as an example of where GUIDs are vital if we are to avoid
> linking to the wrong information, and also where individual providers
> need to ensure that the identifiers they generate are unique.
>
> Regards
>
> Rod

[tdwg-tapir] TapirLite and SimpleFiltering pages
by Roger Hyam 21 Oct '05

Hi Everyone,

I am nervous at being the first to post to this most esteemed list, but here goes.

I have just added two pages to the wiki concerning minor changes that could be made to the protocol to make it easier to implement 'Lite' versions of Tapir providers.

http://ww3.bgbm.org/protocolwiki/TapirLite
http://ww3.bgbm.org/protocolwiki/SimpleFiltering

Please read and add your support or reservations to the wiki or discuss it here.

All the best,

Roger

--
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger(a)tdwg.org
+44 1578 722782
-------------------------------------

Re: Topic 2: GUIDs for Collections and Specimens
by Roderic Page 20 Oct '05

My point is that it isn't always done (and the MVZ example concerns totally different specimens, rather than preparations of the same specimen). My aim is not to criticise DiGIR and Darwin Core specifically (although the absence of a GUID is a major weakness), but simply to provide a concrete example where digital records for totally different specimens are not clearly distinguished.

In the MVZ example, one could retrieve the record for the desired specimen if one searched on the taxonomic name, but this is cumbersome -- ideally I want a GUID that can be resolved to the appropriate specimen independent of any other information. DiGIR can do this, so long as DiGIR providers use different resource names for different collections.

Regards

Rod

On 20 Oct 2005, at 23:11, Dave Vieglais wrote:

> Hi Roderic,
> In general, for records retrieved from data sources exposed using the
> Darwin Core one should be able to combine InstitutionCode,
> CollectionCode and CatalogNumber to provide unique identifiers for
> those records. This is not always the case however, the most common
> example of which is probably the presence of records for different
> preparations of the same specimen.
>
> regards,
> Dave V.

Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page(a)bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/

Re: Topic 2: GUIDs for Collections and Specimens
by Roderic Page 20 Oct '05

As a consumer of specimen GUIDs, I've found specimens to be frustrating to deal with as individual collections don't guarantee uniqueness of identifiers (Donald's point 2 below). For example, in the absence of specimen GUIDs (such as LSIDs) I'd hoped to use a three-part identifier based on the DiGIR provider, e.g.

DiGIR provider URL : resource : specimen code

Hence,

digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106
\-----------------------------------/ \---------/ \--------/
              provider                 resource    specimen

identifies specimen FMNH 158106 of Tatera robusta at the Field Museum in Chicago. The idea behind this crude hack is that the identifier can be resolved (there's enough information in the identifier to retrieve the record, see for example http://darwin.zoology.gla.ac.uk/~rpage/hacks/2/index.html ).

To my horror, if I do this for MVZ 148946, I get three specimens back, one each for Chaetodipus baileyi baileyi, Calidris mauri, and Rana cascadae. This is an instance where the same specimen code is being used in three different collections (mammals, birds, and herps). I guess MVZ could have avoided this by using a different name for the 'resource' field for each collection.

I offer this as an example of where GUIDs are vital if we are to avoid linking to the wrong information, and also where individual providers need to ensure that the identifiers they generate are unique.

Regards

Rod

Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page(a)bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
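
For readers who want to experiment with the three-part identifier, a minimal parsing sketch follows. It is not the code behind the hack linked above; `DigirId` and `parse_digir_id` are hypothetical names, and the sketch assumes that the resource name and specimen code never contain a colon (the provider part may, for instance when it carries a port number).

```python
from typing import NamedTuple


class DigirId(NamedTuple):
    provider: str   # DiGIR provider URL
    resource: str   # resource name on that provider
    specimen: str   # specimen code


def parse_digir_id(identifier: str) -> DigirId:
    """Split 'provider:resource:specimen' into its three parts.

    Splitting from the right keeps any colons inside the provider URL
    (for example a port number) within the provider part.
    """
    parts = identifier.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected provider:resource:specimen, got {identifier!r}")
    return DigirId(*parts)


example = parse_digir_id(
    "digir.fieldmuseum.org/digir/DiGIR.php:MammalsDwC2:FMNH158106"
)
```

Parsing only recovers the parts; as the MVZ case shows, it says nothing about whether the combination is actually unique on the provider.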

Topic 2: GUIDs for Collections and Specimens
by Donald Hobern 20 Oct '05

[ Another topic for comments. Please keep the Topic number in responses ]

Topic 2: GUIDs for Collections and Specimens

Identifiers to assist with management of collection data have been at the centre of TDWG's GUID investigation from the beginning. A primary motivation for this work has been the need to recognise where two data providers are offering access to information on the same specimen. Two very basic scenarios for specimen identifiers are described on the wiki at http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUIDUseCases. However, we do need to make sure we understand the actual scenarios requiring such identifiers across the range of biological collections.

I am therefore looking for descriptions of the situations in which your current processes, systems and applications already use identifiers for specimens (and where perhaps genuinely globally unique identifiers may help), and of any policies and processes around collection management which might affect how we are able to assign, manage or resolve identifiers. When I speak of 'specimens' I am primarily thinking of organisms (living or dead, including subsamples) held in collections (including zoos, aquaria, culture collections and seed banks), but I am also very interested in parallel situations involving the assignment of identifiers to observation events in the field.

Some more specific questions to try to shape discussion:

1. What identifiers (how many per specimen) get assigned to specimens in your organisation or domain (field numbers, catalogue numbers, etc.)?
2. What is the scope of uniqueness for each of these identifiers (notebook page, collector, database, institution, global, etc.)?
3. Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?
4. Can you give examples of how these identifiers are used to retrieve the specimen and/or information on the specimen?
5. Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
6. In the case of subsamples from a specimen, can you identify issues around associating the sample and associated information with the source specimen and associated information?

The subject of specimen identifiers is somewhat linked to that of collection identifiers, since Darwin Core and the ABCD Schema have used institution and collection codes together with catalogue numbers to identify specimens in the absence of GUIDs. It would also be useful here to collect information on the following:

7. How are your specimens organised into larger identifiable sets (collections, named collections, databases, institutions, etc.)?
8. What identifiers get assigned to each of these sets in your organisation or domain (institution codes, collection codes, Index Herbariorum acronyms, etc.)?
9. Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?
10. Can you give examples of how these identifiers are used to locate the set and/or information on the set?
11. Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?

To help you a little, my aim is to use this information to develop additional scenarios as use cases which will complement those already on the wiki (and yes, I do realise that the existing "use case" pages are not formal use cases!). If you feel able simply to add pages to the wiki which describe scenarios for using identifiers to manage specimen and collection data, please go ahead (and include links to your new scenarios from the GUIDUseCases page).

Thanks,

Donald

---------------------------------------------------------------
Donald Hobern (dhobern(a)gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------

Re: I probably don't know what I'm talking about but ...
by Kevin Richards 17 Oct '05

My thoughts...

First problem: any GUIDs currently in place that do not use the decided-upon system will need to be "resolved"/"translated" from the existing system to the decided system and vice versa. This is just something that will need to be done for any data provider wishing to buy into the TDWG GUID system.

Second: one of the reasons I feel GUIDs should only really represent digital information and not real world objects is exactly the problem described by Jerry. It is quite difficult, and hence very error prone, to determine whether two digital objects actually refer to the same real world object (equivalence of objects is an application-specific problem and shouldn't alter with the GUID mechanism). E.g. trying to work out whether two digital representations of a person refer to the same person has been a big problem for a long time.

Third: the relationship problem between different data objects is solved in the LSID world by the use of metadata, i.e. the LSID metadata for an object could describe the format the data object is served up in and the relationships to other LSIDs. The idea with LSIDs is that the data returned for an LSID should only be the data for that object; any related objects are referred to in the metadata. I.e. the objects that an LSID refers to would need to be fairly low level, e.g. a Name object from the TCS schema. Higher level objects could be represented by LSIDs, but it gets harder to maintain these and guarantee the 'bit for bit' consistency of a data object.

Kevin

>>> CooperJ(a)LANDCARERESEARCH.CO.NZ 17/10/2005 4:29 p.m. >>>

Hi all,

I don't really feel qualified to get involved but I have a number of issues that worry me. They are probably non-issues and result from my ignorance and lack of time to read the background info, but I thought I'd chuck them into the debating pot anyway. Feel free to tell me to buzz off and read some more...

First is that many of us are already using GUIDs to identify objects that have equivalents elsewhere, and some of us are reluctant to give them up for good reason. So, for example, we have our name GUIDs in our names catalogue and IndexFungorum has its name GUIDs for the same name strings (same real-world entity but different 'bit for bit' digital representation). What we need is the ability to handle the fact that multiple digital object GUIDs may reference the same real-world entity; sometimes some of those 'duplicates' will need deprecating (but not losing), and some 'synonyms' we will just have to live with, and resolve.

Second is an issue touched on, and that is the 'bit equivalence' of the digital object being referenced. In many cases the real-world entities being referenced by different object GUIDs will be identical but their 'bit for bit' digital object representations may not be, i.e. the need for a 'synonymy' of GUIDs may have a real-world basis.

Third is a worry I have about the lack of definition of the scope of the entity being referenced. The more that is included in a particular scope, the more need there is for versioning of digital objects representing the real-world entities, and the more likely will be 'bit for bit' discrepancies in the various extant digital objects represented by extant GUIDs. And then surely many (most) objects for which a GUID is required will be composed of sub-elements which may require their own GUIDs. How do you represent and resolve a potential cascading chain of GUIDs where each GUID has the potential for many synonyms? This just re-invents the 'taxon concept' problem in GUID-space and highlights the fact that nested layers of indirection do not actually solve the problem.

Jerry
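
The split between an object's data and its metadata can be sketched as below. This is an illustrative in-memory model only, not an LSID resolver: the `urn:lsid:authority:namespace:object[:revision]` layout follows the LSID syntax, but the `example.org` identifiers, the store, the metadata keys and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Parse urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(*parts[2:])


@dataclass
class Entry:
    data: bytes  # the object itself; must stay bit-for-bit stable
    metadata: Dict[str, List[str]] = field(default_factory=dict)


# Hypothetical store: one low-level name object and its metadata.
STORE: Dict[str, Entry] = {
    "urn:lsid:example.org:names:1": Entry(
        data=b"Tatera robusta",
        metadata={
            "format": ["text/plain"],
            "relatedTo": ["urn:lsid:example.org:references:7"],
        },
    ),
}


def get_data(lsid: str) -> bytes:
    """Return only the bytes of the identified object."""
    parse_lsid(lsid)
    return STORE[lsid].data


def get_metadata(lsid: str) -> Dict[str, List[str]]:
    """Return format information and links to related LSIDs."""
    parse_lsid(lsid)
    return STORE[lsid].metadata
```

Keeping relationships in the metadata means the data bytes never need to change when a new related object appears, which is what makes the 'bit for bit' guarantee easier for low-level objects.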

I probably don't know what I'm talking about but ...
by Jerry Cooper 17 Oct '05

Hi all,

I don't really feel qualified to get involved but I have a number of issues that worry me. They are probably non-issues and result from my ignorance and lack of time to read the background info, but I thought I'd chuck them into the debating pot anyway. Feel free to tell me to buzz off and read some more...

First is that many of us are already using GUIDs to identify objects that have equivalents elsewhere, and some of us are reluctant to give them up for good reason. So, for example, we have our name GUIDs in our names catalogue and IndexFungorum has its name GUIDs for the same name strings (same real-world entity but different 'bit for bit' digital representation). What we need is the ability to handle the fact that multiple digital object GUIDs may reference the same real-world entity; sometimes some of those 'duplicates' will need deprecating (but not losing), and some 'synonyms' we will just have to live with, and resolve.

Second is an issue touched on, and that is the 'bit equivalence' of the digital object being referenced. In many cases the real-world entities being referenced by different object GUIDs will be identical but their 'bit for bit' digital object representations may not be, i.e. the need for a 'synonymy' of GUIDs may have a real-world basis.

Third is a worry I have about the lack of definition of the scope of the entity being referenced. The more that is included in a particular scope, the more need there is for versioning of digital objects representing the real-world entities, and the more likely will be 'bit for bit' discrepancies in the various extant digital objects represented by extant GUIDs. And then surely many (most) objects for which a GUID is required will be composed of sub-elements which may require their own GUIDs. How do you represent and resolve a potential cascading chain of GUIDs where each GUID has the potential for many synonyms? This just re-invents the 'taxon concept' problem in GUID-space and highlights the fact that nested layers of indirection do not actually solve the problem.

Jerry

Jerry Cooper PhD
Research Leader: Biodiversity Informatics
Landcare Research
PO Box 40, Lincoln 8152
New Zealand
+64 3 325 6701 ext 3734
CooperJ(a)LandcareResearch.CO.NZ

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WARNING: This email and any attachments may be confidential and/or privileged. They are intended for the addressee only and are not to be read, used, copied or disseminated by anyone receiving them in error. If you are not the intended recipient, please notify the sender by return email and delete this message and any attachments.
The views expressed in this email are those of the sender and do not necessarily reflect the official views of Landcare Research.
Landcare Research http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
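
One way to picture the "deprecate but do not lose" requirement for synonymous GUIDs is a small registry that groups identifiers for the same real-world entity. The sketch below is an assumption-laden illustration, not a proposal from the thread: `GuidRegistry` and its methods are hypothetical, the GUID strings used with it would be made up, and it deliberately ignores the harder question of who decides that two GUIDs are synonyms.

```python
from typing import Dict, Set


class GuidRegistry:
    """Groups synonymous GUIDs and remembers which ones are deprecated."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}   # union-find forest over GUIDs
        self._deprecated: Set[str] = set()  # deprecated but still resolvable

    def _find(self, guid: str) -> str:
        """Return the representative GUID of the group containing guid."""
        self._parent.setdefault(guid, guid)
        while self._parent[guid] != guid:
            guid = self._parent[guid]
        return guid

    def add_synonym(self, a: str, b: str) -> None:
        """Record that a and b reference the same real-world entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def deprecate(self, guid: str) -> None:
        """Mark a GUID as deprecated without removing it from its group."""
        self._find(guid)
        self._deprecated.add(guid)

    def same_entity(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def preferred(self, guid: str) -> Set[str]:
        """All non-deprecated GUIDs in the same group as guid."""
        root = self._find(guid)
        return {g for g in list(self._parent)
                if self._find(g) == root and g not in self._deprecated}
```

A deprecated GUID still resolves to its group, so old references keep working. The cascading case raised in the message (sub-elements with their own synonym sets) would need one such grouping per level, which is exactly where the bookkeeping starts to multiply.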

Re: What do we mean by GUID?
by Donald Hobern 13 Oct '05

James Ytow wrote:
> Here we see a typical trouble with identifier and identity. Do you
> mean identity of an object (a unique thing, so we don't need
> identifier because it is the thing) or equivalence of data (there can
> be multiple data objects having the same value)? Where we need GUID
> we can't rely on identity, in my understanding.

The DOI Handbook (http://dx.doi.org/10.1000/186) addresses this issue. The DOI approach centres on the use of Application Profiles. Each DOI can be associated with a profile which defines the metadata elements with which it is to be associated (and therefore by implication which pieces of information must remain stable for two records to be considered copies of the same object). On page 17, the Handbook states:

A common question is: if I identify entity A with a DOI, and then I adapt it in some way to create entity B, should I assign a new DOI to entity B? The answer is: there can be no general rule which applies to all cases and each must be treated in context. If a registrant finds it useful to do so, they may. The rules of Application profiles, and business rules of Registration Agencies, will help in deciding for DOIs registered in Application Profiles. The key point is that one should precisely specify what A is and what B is; two digital entities are never the same in any absolute sense and can be considered copies of each other only in the context of some defined purpose. For a more detailed explanation of this fundamental topic, see the article "On Making and Identifying a Copy" http://dx.doi.org/10.1045/january2003-paskin

This is a very important point, and one that we will need to bear in mind as we consider the application of GUIDs to digital representations of real world objects such as specimen records. Two completely differently formatted records may be considered as copies of the same object if they serve the same purpose for some user. This is one area which the DOI infrastructure particularly tries to address.

Donald

---------------------------------------------------------------
Donald Hobern (dhobern(a)gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------

-----Original Message-----
From: Taxonomic Databases Working Group GUID Project [mailto:TDWG-GUID@LISTSERV.NHM.KU.EDU] On Behalf Of Nozomi Ytow
Sent: 12 October 2005 18:40
To: TDWG-GUID(a)LISTSERV.NHM.KU.EDU
Subject: Re: What do we mean by GUID?

Matt Jones wrote:
> The term GUID is one we started using in SEEK when looking for a
> solution to the identity and resolution problems that we saw looming for
> the Taxonomic Concept Standard. Dave Thau's presentation on this
> (linked on the GUID wiki) defines this pretty well and explores the issues.

Here we see a typical trouble with identifier and identity. Do you mean identity of an object (a unique thing, so we don't need identifier because it is the thing) or equivalence of data (there can be multiple data objects having the same value)? Where we need GUID we can't rely on identity, in my understanding.

> "globally unique" means simply that an identifier that is issued can
> only have one valid interpretation across all possible systems.

What do you mean by valid? Suppose there is a data object in a data provider's database. A GBIF portal holds a copy of it from the last time a user accessed the data object. The data provider changes its contents for some reason after that last access through the GBIF portal. What is the valid interpretation of these data objects? The provider's one?

> Regardless of the mechanism used to resolve the identifier, the object
> that the id 'identifies' will be bit-for-bit identical.

So you mean equivalence, not identity. If it is bit-for-bit equivalence, why do you need GUID? The contents IS the GUID you defined.

> There are some tricky issues
> dealing with granularity of the identifier for digital data (does the
> identifier point at a tuple in an entity, or at a whole entity, or at
> multiple entities).

Do you mean your bit-for-bit GUID requires a scope disambiguator also? Isn't it assigned to a data object, i.e. a unit to be handled as a chunk?

It may be better to use other words such as global disambiguator or distinguisher, because we do not mean identity by identifier.

Cheers,
James
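
The two notions of "same object" discussed in this thread can be contrasted in a few lines. This is a sketch under stated assumptions, not the DOI system's mechanism: a "profile" is modelled here simply as a list of field names that must agree, and the two sample serialisations and their field names are invented for the example.

```python
import hashlib
from typing import Dict, Iterable


def content_id(data: bytes) -> str:
    """Bit-for-bit identity: the identifier is derived from the content."""
    return hashlib.sha256(data).hexdigest()


def same_under_profile(a: Dict[str, str], b: Dict[str, str],
                       profile_fields: Iterable[str]) -> bool:
    """Purpose-defined equivalence: only the profile's fields must agree."""
    return all(a.get(f) == b.get(f) for f in profile_fields)


# Two differently formatted serialisations of one (invented) specimen record.
as_xml = b"<specimen><inst>FMNH</inst><cat>158106</cat></specimen>"
as_csv = b"FMNH,158106\n"

# The same two records after parsing; the extra field differs on purpose.
parsed_xml = {"institution": "FMNH", "catalog": "158106", "format": "xml"}
parsed_csv = {"institution": "FMNH", "catalog": "158106", "format": "csv"}

# Assumed profile: these fields must remain stable for two copies to match.
PROFILE = ["institution", "catalog"]

bitwise_equal = content_id(as_xml) == content_id(as_csv)
profile_equal = same_under_profile(parsed_xml, parsed_csv, PROFILE)
```

With these inputs `bitwise_equal` is false and `profile_equal` is true: a content-derived identifier distinguishes the two serialisations, while a profile treats them as copies of one object for the stated purpose.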
