- tdwg-tag - lists.tdwg.org

Re: What do we mean by GUID?
by Patricia Mergen 12 Oct '05

I agree with Chuck that we should clarify the issues he points to in his statement below. My first mail was also intended to address, more generally, the point of identifying versus localizing, not necessarily in relation to the ARK, LSID or DOI initiatives specifically.

Patricia

--- Chuck Miller <Chuck.Miller(a)MOBOT.ORG> wrote:

> I have been pondering this question of what exactly is meant by a GUID since Donald's first call for input.
>
> First, is it correct that GUID stands for Globally Unique Identifier?
>
> Second,
> What do we mean by Globally?
> What do we mean by Unique?
> What do we mean by Identifier? Or, specifically what are we identifying?
>
> I believe we are far from consensus on all three of those definitions, even the meaning of globally. And, I believe we will be going around in circles on LSIDs, ARNs, and such until we get the expectation more clearly defined.
>
> What I hear in most of the discussions so far are descriptions of a GLID - Globally Locatable IDentifier. In the Internet world, GLIDs started with the URL - Universal Resource Locator which has evolved to the URI - Universal Resource Identifier concept. Another form of URI is the URN - Uniform Resource Name which enables a persistent name, independent of server location. This is the kind of thing I think we want and are discussing in this GUID thread.
>
> I think we should draw a distinction between GUID and GLID.
>
> An identifier of a thing can be globally unique without stating its location. But, again, it raises the question of what the definition of unique is. An ISBN number identifies a book "uniquely", but there may be millions of "unique" copies of it. Similarly, duplicate sheets of a collected plant specimen are all from the same "unique" organism and may each even be referred to by the same "unique" collector and number. But, each sheet itself is also unique. We need a clear definition.
>
> An identifier of a place can also be globally unique, like a URL. But, being able to go to that place requires a global infrastructure to handle the addressing.
>
> Where it gets really messy is when we want an identifier of a thing that is unique but can move around to different places, like a URN. The addressing has to work like an administrative assistant who keeps tabs on where the staff is currently located so she can direct phone calls to them. Without the administrative assistant, people who move around can't be contacted. It looks like a lot of what LSID, ARN, and such seem to be about is "administrative assistant" addressing schemes, how to navigate to the entity through layers of address abstraction. But, in each case it raises the issue of who/where is the administrative assistant, on top of the question of the addressing scheme itself.
>
> Shouldn't we get these definitions and expectations nailed down first? Then look at solutions?
>
> Chuck Miller
> Chief Information Officer
> Missouri Botanical Garden
> 4344 Shaw Boulevard
> Saint Louis, Missouri 63119
> Phone: 1-314-577-9419
> Cell: 1-314-614-6952

versioning for data versus metadata
by Matt Jones 12 Oct '05

In my previous email I referred to problems in distinguishing data and metadata. I'd like to elaborate here.

Traditional definitions of metadata are 'data about data' or 'data documentation'. These are good definitions from a pragmatic standpoint, but they become less helpful when trying to build real working systems that use both data and metadata and try to preserve replicability of analyses through versioning. A simple example will illustrate.

Sometimes people record repeating information about data as separate metadata (for example, the date on which data were collected). Other times, they might include that information directly in their data model. Take two entities, A and B:

Entity A:
---------
Metadata:
  AttributeLabels = Site,Date,Abundance
Data:
  Foo 20041010 19
  Bar 20050712 20
  Foo 20051010 20

Entity B:
---------
Metadata:
  AttributeLabels = Site,Abundance
  CollectionDate = 20011002
Data:
  Foo 24.3
  Bar 21.3
  Baz 20.4

Note that both entities contain the same kinds of information, but the second places the date of collection in a metadata property, while the first puts it in the data model. If one were to integrate these data entities to produce a time-series plot of abundance by site, one would need to extract the CollectionDate information from the metadata of entity B before proceeding. Thus, for the purpose of an integrated analysis, "CollectionDate" is really data. Which way people model the information is a somewhat arbitrary decision, but it typically comes down to looking at 1) the rate of change of the information across tuples, and 2) the intended use of the final data.

So, to bring this back to the identifier issue: if one were to assign an identifier to Entity A and another to Entity B, resolving the identifier should allow one to retrieve the data. But in these two cases the data that is returned will have different schemas and different dependencies on the metadata.
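
The integration step described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout and the helper name `to_records` are hypothetical, and the values are the ones given for Entity A and Entity B in the message.

```python
# Hypothetical in-memory form of the two entities from the message.
entity_a = {
    "metadata": {"AttributeLabels": ["Site", "Date", "Abundance"]},
    "data": [("Foo", "20041010", 19), ("Bar", "20050712", 20), ("Foo", "20051010", 20)],
}
entity_b = {
    "metadata": {"AttributeLabels": ["Site", "Abundance"], "CollectionDate": "20011002"},
    "data": [("Foo", 24.3), ("Bar", 21.3), ("Baz", 20.4)],
}


def to_records(entity):
    """Return (site, date, abundance) tuples.

    The date comes from the data row when the entity has a Date column,
    and from the CollectionDate metadata property otherwise.
    """
    labels = entity["metadata"]["AttributeLabels"]
    records = []
    for row in entity["data"]:
        fields = dict(zip(labels, row))
        date = fields.get("Date", entity["metadata"].get("CollectionDate"))
        records.append((fields["Site"], date, fields["Abundance"]))
    return records


# One combined series, ordered by site and then by date.
time_series = sorted(to_records(entity_a) + to_records(entity_b),
                     key=lambda record: (record[0], record[1]))
```

Once the two entities are combined, every row of Entity B depends on a value that lived in its metadata, which is the sense in which "CollectionDate" is really data.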
To do integrated analyses of the two entities, one really needs to be able to use both the metadata and the data together and be assured that both are consistent. LSIDs require that the data retrieved for an LSID never change, to guarantee replicability and persistence, but allow the metadata to change. Clearly, if the 'CollectionDate' metadata for entity B were changed, any analyses that were performed using the original metadata could no longer be replicated. This causes a lot of trouble for analytical systems that emphasize provenance and lineage for derived data products.

It indicates that there is a strong case to be made that metadata should be versioned as well, and that both the metadata and the data associated with an identifier really should be immutable with respect to the identifier. Any change to data or metadata should require an update to the identifier revision.

Matt

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones(a)nceas.ucsb.edu
Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
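
The proposal in this message, that a change to either data or metadata should force a new identifier revision, could be sketched roughly as below. This is not how any LSID authority is known to work; the `Registry` class, the digest scheme and the `example.org` identifier are all assumptions made for illustration. The only thing taken from LSID itself is the convention of a trailing revision component.

```python
import copy
import hashlib
import json


def content_digest(data, metadata):
    """Hash data and metadata together so a change to either is detected."""
    canonical = json.dumps({"data": data, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical registry: (base identifier, revision) -> frozen snapshot."""

    def __init__(self):
        self._revisions = {}  # base id -> list of (digest, data, metadata)

    def publish(self, base_id, data, metadata):
        """Store a snapshot and return the revisioned identifier for it."""
        revisions = self._revisions.setdefault(base_id, [])
        digest = content_digest(data, metadata)
        # A new revision is issued only when data or metadata actually changed.
        if not revisions or revisions[-1][0] != digest:
            revisions.append((digest, copy.deepcopy(data), copy.deepcopy(metadata)))
        return f"{base_id}:{len(revisions)}"

    def resolve(self, identifier):
        """Return the (data, metadata) pair frozen under this revision."""
        base_id, _, revision = identifier.rpartition(":")
        _, data, metadata = self._revisions[base_id][int(revision) - 1]
        return copy.deepcopy(data), copy.deepcopy(metadata)


registry = Registry()
base = "urn:lsid:example.org:entity:B"  # made-up identifier
rows = [["Foo", 24.3], ["Bar", 21.3], ["Baz", 20.4]]
v1 = registry.publish(base, rows, {"CollectionDate": "20011002"})
v2 = registry.publish(base, rows, {"CollectionDate": "20011003"})  # metadata edit
```

Here the metadata correction yields a second revision, while the first revision still resolves to the original CollectionDate, so an analysis recorded against `v1` stays replicable.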

Re: What do we mean by GUID?
by Matt Jones 12 Oct '05

From my perspective several of these issues are pretty clear. Let me attempt answers below and see if we're all in agreement.

Chuck Miller wrote:
> I have been pondering this question of what exactly is meant by a GUID since Donald's first call for input.

The term GUID is one we started using in SEEK when looking for a solution to the identity and resolution problems that we saw looming for the Taxonomic Concept Standard. Dave Thau's presentation on this (linked on the GUID wiki) defines this pretty well and explores the issues.

> First, is it correct that GUID stands for Globally Unique Identifier?
>
> Second,
> What do we mean by Globally?
> What do we mean by Unique?

"Globally unique" means simply that an identifier that is issued can have only one valid interpretation across all possible systems. Regardless of the mechanism used to resolve the identifier, the object that the id 'identifies' will be bit-for-bit identical. However, note that a given object can legitimately have more than one identifier.

> What do we mean by Identifier? Or, specifically what are we identifying?

This is a bit trickier. We will clearly be identifying several different types of objects, some of which are physical (e.g., specimens), and some of which are digital (e.g., observation data). On the specimen side, resolving the identifier and retrieving the 'data' makes no sense because the 'data' is a physical object that cannot be electronically transported. On the 'digital' side, it makes sense to resolve and retrieve the data. There are some tricky issues dealing with the granularity of the identifier for digital data (does the identifier point at a tuple in an entity, at a whole entity, or at multiple entities?). In addition you still have the very thorny issue of what is data and what is metadata. I'll write another note regarding this issue.

Matt

Re: Topic 1: What do we mean by "GUID"?
by Matt Jones 12 Oct '05

I agree with all of Rod's points. I'd like to add some details about LSID resolution as well.

In current practice, LSID relies on the DNS to locate the authority that is used to resolve (localize) an LSID. This can be considered a minor weakness (the DNS name is part of the identifier), but also a major strength (the DNS is by far the most robust system we have for persistent naming). For an LSID issued by, e.g., gbif.org, a change in gbif.org's ability to maintain the authority can be fixed by simply pointing gbif.org's SRV record for the lsid service to a new authority. This is commonly recognized and used, and represents a distributed resolution mechanism.

In addition, however, the LSID spec describes the use of the DDDS (Dynamic Delegation Discovery System) to allow the use of a centralized registry of resolvers. The DDDS uses NAPTR records to associate the NID "lsid" with a particular DNS server, which is then queried for more NAPTR records that specify rewriting rules to obtain the DNS name of the authority that should be used for a given LSID. For example, if the authority in an LSID is set to 'gbif.org', the rewriting rules might turn that into the authority 'gbif.org.lsid.lsidauthority.org'. This allows the central IANA-registered owner of the "lsid" NID to create a service that overrides the DNS in particular cases to provide an alternative authority in a centralized manner. It also allows the use of non-DNS-based authority strings (e.g., myauthority).

This approach to resolution is centralized in exactly the same manner as the centralized DOI and ARK registries. However, as far as I can tell there are no LSID resolvers that use this capability, and I don't know whether the "lsid" NID has been registered with a NAPTR record or not. Nevertheless, the LSID spec provides for both distributed and centralized authority resolution, and so offers a superset of the capabilities in DOI and ARK.
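
To make the distributed (SRV-based) path above concrete, here is a minimal sketch of the first step a client might take: splitting an LSID into its parts and deriving the DNS name to query. The `_lsid._tcp.` prefix is an assumption based on the usual SRV naming convention and should be checked against the LSID specification; the example LSID and the helper names are made up, and no DNS lookup is actually performed.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def srv_query_name(lsid: LSID) -> str:
    """DNS name whose SRV record would point at the authority service.

    Assumption: the service label is '_lsid' over '_tcp'.
    """
    return f"_lsid._tcp.{lsid.authority}"


example = parse_lsid("urn:lsid:gbif.org:specimens:12345")  # hypothetical LSID
query_name = srv_query_name(example)
```

Because only the SRV target changes when an authority moves, the LSID string itself stays the same, which is the persistence property the message relies on.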
It also has the advantage of being Internet standards-based for all of its resolution mechanisms.

Matt

Roderic Page wrote:
> I think this is a very nice statement of the issues.
>
> My own view is that ARK is interesting, but I'm not sure ARK is the best way forward. Persistence is a (perhaps the) key issue, and it is a social one not a technological one, as the DOI people make very clear. DOIs only work because the publishing industry has invested in the infrastructure to support them.
>
> In some ways, DOIs and ARK are very similar. If I use the DOI resolver to resolve a DOI
>
> http://dx.doi.org/10.1086/303303
>        \--------/ \-----/ \----/
>            |         |       |
>            |         |      Name
>            |     Name Assigning
>            |     Authority Number (NAAN)
>      Name Mapping Authority
>      Hostport (NMAH)
>
> then I have a URL very like an ARK, where the authority assigning the name (such as a publisher, in this case the University of Chicago) is different from the authority that makes the identifier actionable (doi.org). One could imagine that if DOI.org were to fall over, one could substitute another authority, such as doi.reborn.org. Indeed some publishers almost do essentially this, for example http://www.journals.uchicago.edu/cgi-bin/resolve?id=doi:10.1086/303303 (although this will only resolve local DOIs). ARK simply makes this possibility explicit. LSIDs are more strongly tied to the DNS (the uniqueness of an LSID is partly guaranteed by using Internet domain names), although they do have limited support for foreign authorities (other providers that can serve metadata for objects that those providers don't actually own).
>
> ARK also adds the ability to retrieve a statement of commitment. I'm less impressed by this, as a statement is all very well, but will service providers actually honour it? I guess this is an issue of trust. I suspect that users' ratings of service providers will be much more accurate than a rating provided by a service provider.
>
> One issue not on this list is who generates GUIDs? ARKs and DOIs require some degree of centralisation because both require unique identifiers for organisations providing data (e.g., 10.1086 identifies the University of Chicago Press). This in itself requires some degree of service commitment. LSIDs are decentralised, in that the unique identifier for an organisation is provided by the DNS. If, for example, GBIF took on the role of providing unique identifiers for organisations, but then closed due to funding issues (heaven forbid), then we have a problem. If the DNS goes belly up, then we will have much more pressing issues to worry about...
>
> Regards
>
> Rod
>
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
>
> Phone: +44 141 330 4778
> Fax: +44 141 330 2792
> email: r.page(a)bio.gla.ac.uk
> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>
> Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
> Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/

Topic 1: What do we mean by "GUID"?
by Donald Hobern 11 Oct '05

[ I will be trying to provide some structure to discussions in this mailing list by raising specific topics and looking for comments. Please keep the Topic number in responses ]

Topic 1: What do we mean by GUID?

The most fundamental thing that we need to establish as we consider a GUID implementation is a definition for "GUID" in this context. We have been using a number of terms to describe the identifiers we need (unique, resolvable, persistent, etc.).

I've been spending some time following up on Rod Page's recommendation that we consider the use of Archival Resource Keys (ARK) from the California Digital Library (see http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK). The CDL web site includes an excellent overview of this GUID model, which also serves as an excellent introduction to the issues involved. I would urge you all to read this document (it's only nine pages long!):

http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf

This document arrives at the following problem definition for persistent, actionable identifiers:

1. The goal: long-term actionable identifiers.
   a. Requirement: that identifiers deliver you to objects (where feasible).
   b. Requirement: that identifiers deliver you to object metadata.
   c. Desirable: each object should wear its own identifier.
   d. Requirement: that identifiers deliver you to statements of commitment.
2. The problem: URLs break for some objects (that is, associations between URLs and objects are not maintained), and we have no way to tell which ones will or won't break.
3. Why URLs break: because objects are moved, removed, and replaced - completely normal activities - and the provider in each case demonstrates insufficient commitment to update indirection tables, or to plan identifier assignment carefully. Persistence is in the mission of few organizations.
4. Conventional hypothesis: use indirect names (PURLs, URNs, Handles) instead of URLs; what worked for DNS should work for digital object references. Wrong. Indirection is spectacularly successful and elegant in DNS, but it's a side issue in the provision of digital object persistence.

This document clearly identifies issues around provider service commitments as the key problem that needs solving. The construction of ARKs seeks to address this in a couple of ways. It separates the role of Name Assigning Authority (i.e. who initially assigns the identifier) from that of the Name Mapping Authority (i.e. who is able to map the identifier to the data object at any particular time). It also defines a simple standard relationship between three things: the data object, the metadata for the object, and a commitment statement from the provider as to what aspects of persistence are guaranteed.

ARK is a technology that we have not really considered up to this point. My question for discussion is what, if anything, is missing or wrong about the problem definition provided in this document? If we agree that it provides a crisp definition of what we need, that in itself will be a major step forward.

Please provide your thoughts.

Donald

---------------------------------------------------------------
Donald Hobern (dhobern(a)gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483  Mobile: +45-28751483  Fax: +45-35321480
---------------------------------------------------------------
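
The relationship between object, metadata and commitment statement described in this message can be illustrated with a small sketch. It assumes the ARK convention, as described in the CDL document, that appending `?` to an ARK URL asks for metadata and `??` asks for the provider's commitment statement; the helper `ark_urls` and the host, NAAN and name values are placeholders invented for the example.

```python
def ark_urls(nmah: str, naan: str, name: str) -> dict:
    """Build the three related URLs for one ARK.

    nmah: Name Mapping Authority Hostport (who resolves the identifier now)
    naan: Name Assigning Authority Number (who assigned it originally)
    name: the name assigned by that authority
    """
    base = f"http://{nmah}/ark:/{naan}/{name}"
    return {
        "object": base,             # the data object itself
        "metadata": base + "?",     # assumed metadata inflection
        "commitment": base + "??",  # assumed commitment-statement inflection
    }


# Placeholder values only; none of these identify a real object.
urls = ark_urls("example.org", "12345", "x54xz321")

# Moving to a new mapping authority changes the host but not 'ark:/NAAN/Name'.
moved = ark_urls("mirror.example.net", "12345", "x54xz321")
```

The split between `nmah` and `naan` is the separation of Name Mapping Authority from Name Assigning Authority that the message highlights: only the former appears in the part of the URL that is allowed to change.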

Re: Topic 1: What do we mean by "GUID"?
by Patricia Mergen 11 Oct '05

Dear Donald and colleagues from the GUID group,

Please find some comments below.

There are two concepts which may be kept separate:

1. Identification of an item (an object or a representation of the object), which should be unique and stable, and which can be understood as a GUID.
2. Localisation of an object or its representation, which may be, and often is, not stable.

Many of the tools and systems suggested so far mix up these two concepts and intend to identify and localise at the same time. Problems then often occur because localisation is not stable and is therefore apparently not workable as a GUID.

Maybe, to solve the problem, we could try for GBIF/TDWG to keep these two concepts separate and look for a system which is most appropriate to identify the unit-level data uniquely and in a stable way. This can be an accession number with no meaning, like those used for sequence data. Localisation of the object can be dealt with as information about the object that can change over time like any other.

Best regards

Patricia Mergen

guids
by Kevin Richards 27 Sep '05

A few thoughts...

To me, the only way I can see GUIDs being used effectively is to assign a GUID to every data object that will be exposed outside the local system where it is stored, e.g. database records, image files, document files. I can't see how you could assign GUIDs to specimens themselves, as the specimen itself cannot be returned via a query (unless it is a loan request?).

This will result in a lot of GUIDs, distributed through a lot of systems. This may seem a concern, because objects in different systems may refer to the "same" object but have different GUIDs. But it is not really a problem - all we have achieved at this stage is giving every data object in all systems a globally unique ID, and one that is hopefully resolvable. Whether they refer to the same "entity"/"concept" as another object in the system does not matter at this point. (By "concept" I don't mean "taxon concept".)

Objects that refer to the same "concept", such as the same taxonomic name represented in two different systems, need to be "cross referenced" in some other way. One way this could be done is by building up a table of mappings between systems, where the qualified person in each of those systems has decided that the objects refer to the same "concept". These mappings can then be returned as metadata for either of the GUIDs in the two systems, as "other data objects that refer to the same thing". This will result in a fairly complex global network of mappings, and will be difficult to maintain.

Another way to do this is to have a central authoritative table of "concepts" with GUIDs that all other synonymic objects in other systems point to. Then when an object is accessed via its GUID, part of the metadata will be the central "concept" GUID, which can therefore be compared with that of other objects in other systems. This will also allow queries to be executed against the different systems, for example "give me all objects with the central concept GUID x".
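
The second option above, a central table of "concept" GUIDs that local objects point to, might look roughly like the following. This is only a sketch: `ConceptIndex`, its method names and all identifiers in the usage lines are invented for illustration and do not correspond to any existing service.

```python
from collections import defaultdict


class ConceptIndex:
    """Hypothetical central table linking object GUIDs to concept GUIDs."""

    def __init__(self):
        self._concept_of = {}             # object GUID -> concept GUID
        self._members = defaultdict(set)  # concept GUID -> object GUIDs

    def link(self, object_guid, concept_guid):
        """Record that an object in some system refers to a central concept."""
        previous = self._concept_of.get(object_guid)
        if previous is not None:
            self._members[previous].discard(object_guid)
        self._concept_of[object_guid] = concept_guid
        self._members[concept_guid].add(object_guid)

    def concept_for(self, object_guid):
        """The concept GUID that would be returned as part of the metadata."""
        return self._concept_of.get(object_guid)

    def objects_for(self, concept_guid):
        """'Give me all objects with the central concept GUID x'."""
        return sorted(self._members.get(concept_guid, ()))

    def same_concept(self, guid_a, guid_b):
        """True when both objects point at the same central concept."""
        concept = self._concept_of.get(guid_a)
        return concept is not None and concept == self._concept_of.get(guid_b)


index = ConceptIndex()
# Made-up identifiers for the same name held in two different systems.
index.link("urn:lsid:system-a.example:names:101", "urn:lsid:central.example:concepts:7")
index.link("urn:lsid:system-b.example:names:9", "urn:lsid:central.example:concepts:7")
```

Compared with pairwise mapping tables, each system maintains one link per object rather than one per pair of systems, at the cost of depending on whoever runs the central table.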
It seems DOIs and LSIDs are the preferred options at this stage. I personally prefer LSIDs for several reasons, including their standard URN format and the fixed 1:1 mapping of objects and IDs. I see that DOIs, through the Handle System, allow you to query for "different" objects of an entity depending on what you want. E.g., for a specimen, you could query for the image of that specimen. This allows multiple objects to be accessed through a single GUID (this also seems to be how a lot of people I have talked to expect a GUID system to work). I can see this being useful in the biodiversity informatics world, but I can also see problems it may cause: each of the objects accessed through the single DOI should have its own GUID, otherwise it will end up being referenced from somewhere "through" another object, and the maintenance of the GUIDs then becomes more challenging.

My main concern with LSIDs is the "must return the exact same set of bytes every time" requirement. This can be overcome, however, by providing all the data in the metadata of the LSID (which seems a bit backward) and returning as the data of the LSID only a "label/name" that will never change. Otherwise the versioning component of LSIDs can be used to handle changes within the data.

Kevin Richards
Landcare Research
http://www.landcareresearch.co.nz


Re: Status of LSID
by Roderic Page 27 Sep '05

The SourceForge site (http://lsid.sourceforge.net/ ) does contain quite a bit of background on the project, and some examples of how to implement clients and servers.

At least some of the code is in production use. The web resolver provided by Biopathways.org (http://lsid.biopathways.org/resolver/ ) uses IBM's Perl code. The LTER LSID authority (http://lsid.limnology.wisc.edu/ ) builds on IBM's Java code. I'm assuming that BioMoby and Taverna are built on IBM's code. I also use LSIDs in my Taxonomic Search Engine.

One can also gauge activity by browsing the CVS repository at SourceForge (http://cvs.sourceforge.net/viewcvs.py/lsid/ ). Most commits happened two years ago, but there are more recent commits (e.g., 7 days ago). This suggests IBM is still doing some work on it.

Lastly, LSIDs are mentioned in a recent article on the Semantic Web and life sciences (http://dx.doi.org/10.1126/stke.2832005pe22).

Regards

Rod

On 27 Sep 2005, at 09:44, Robert Huber wrote:

> Dear all,
>
> I wonder a bit about the status of LSID. Some links lead to a
> SourceForge site, but there is little or no information there.
> Some googling brought me to a former IBM site and to some
> projects (these are also at the WIKI) which seem to promote and/or use
> LSIDs.
>
> I am now a bit confused and would like to know how you judge the
> current status of the LSID system. Did it ever reach production
> status, is it still maintained, and did this system ever leave
> the project status?
>
> best regards,
>
> Dr. Robert Huber
> WDC-MARE / PANGAEA - www.pangaea.de, www.wdc-mare.org
> Stratigraphy.net - www.stratigraphy.net
> _____________________________________________
> MARUM - Institute for Marine Environmental Sciences (location)
> University Bremen
> Leobener Strasse
> POP 330 440
> 28359 Bremen
> Phone ++49 421 218-65593, Fax ++49 421 218-65505
> e-mail rhuber(a)wdc-mare.org, robert.huber(a)stratigraphy.net

Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page(a)bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/


Status of LSID
by Robert Huber 27 Sep '05

Dear all,

I wonder a bit about the status of LSID. Some links lead to a SourceForge site, but there is little or no information there. Some googling brought me to a former IBM site and to some projects (these are also at the WIKI) which seem to promote and/or use LSIDs.

I am now a bit confused and would like to know how you judge the current status of the LSID system. Did it ever reach production status, is it still maintained, and did this system ever leave the project status?

best regards,

Dr. Robert Huber
WDC-MARE / PANGAEA - www.pangaea.de, www.wdc-mare.org
Stratigraphy.net - www.stratigraphy.net
_____________________________________________
MARUM - Institute for Marine Environmental Sciences (location)
University Bremen
Leobener Strasse
POP 330 440
28359 Bremen
Phone ++49 421 218-65593, Fax ++49 421 218-65505
e-mail rhuber(a)wdc-mare.org, robert.huber(a)stratigraphy.net


Re: interest in guid workshop
by Ricardo Scachetti Pereira 22 Sep '05

Matt,

Many thanks for expressing your interest in our discussion and workshop on GUIDs. Also thanks for adding your project information to our wiki.

As far as I'm concerned, there is nothing that would exclude the data items you mention from the discussion (except for some wording on the wiki - sorry). As a matter of fact, one of the aspects of the discussion here has been not only to assign and resolve globally unique identifiers for data items but also to cross-reference them in a meaningful and useful way. Certainly field observations and experiments are part of that web.

Again, welcome to our group.

Regards,

Ricardo

Matt Jones wrote:

>I sent an email the other day expressing my interest in this workshop,
>but it didn't seem to go through to the list. In the meantime, I
>created a wiki page with the answers to your questions about why I am
>interested at:
>
>http://wiki.gbif.org/guidwiki/wikka.php?wakka=MattJones
>
>I also added descriptions of GUID issues for the SEEK EcoGrid and Kepler
>projects and some details about the TCS. I also will eventually add in
>information about use of LSIDs in the SEEK Taxonomic Object Server (TOS)
>that complements the TCS work.
>
>http://wiki.gbif.org/guidwiki/wikka.php?wakka=ExistingProjects
>
>I guess my first question is to what extent GBIF and the members of this
>list think that this discussion should include use cases outside
>specimen collections? Much of my work with LSIDs has been focused on
>archiving field observations and experiments done by ecologists and
>environmental biologists. It seems like this should be in scope for
>GBIF, but thus far the limited discussion on this list has focused on
>specimen collections and taxonomic nomenclature/concept issues. Should
>we broaden it?
>
>Thanks, and looking forward to the discussion.
>
>Matt
>--
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Matt Jones
>jones(a)nceas.ucsb.edu
>National Center for Ecological Analysis and Synthesis (NCEAS)
>UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
