I agree with Chuck that we should clarify the issues
he raises in his statement below.
My first mail was also intended to address, more
generally, the point of identifying versus localizing,
not necessarily in relation to the ARK, LSID
or DOI initiatives specifically.
Patricia
--- Chuck Miller <Chuck.Miller(a)MOBOT.ORG> wrote:
>
> I have been pondering this question of what exactly is meant by a GUID
> since Donald's first call for input.
>
> First, is it correct that GUID stands for Globally Unique Identifier?
>
> Second,
> What do we mean by Globally?
> What do we mean by Unique?
> What do we mean by Identifier? Or, specifically, what are we identifying?
>
> I believe we are far from consensus on all three of those definitions,
> even the meaning of globally. And, I believe we will be going around in
> circles on LSIDs, ARNs, and such until we get the expectation more
> clearly defined.
>
> What I hear in most of the discussions so far are descriptions of a GLID
> - Globally Locatable IDentifier. In the Internet world, GLIDs started
> with the URL - Uniform Resource Locator, which has evolved to the URI -
> Uniform Resource Identifier concept. Another form of URI is the URN -
> Uniform Resource Name, which enables a persistent name, independent of
> server location. This is the kind of thing I think we want and are
> discussing in this GUID thread.
>
> I think we should draw a distinction between GUID and GLID.
>
> An identifier of a thing can be globally unique without stating its
> location. But, again, it raises the question of what the definition of
> unique is. An ISBN number identifies a book "uniquely", but there may
> be millions of "unique" copies of it. Similarly, duplicate sheets of a
> collected plant specimen are all from the same "unique" organism and may
> each even be referred to by the same "unique" collector and number. But,
> each sheet itself is also unique. We need a clear definition.
>
> An identifier of a place can also be globally unique, like a URL. But,
> being able to go to that place requires a global infrastructure to
> handle the addressing.
>
> Where it gets really messy is when we want an identifier of a thing that
> is unique but can move around to different places, like a URN. The
> addressing has to work like an administrative assistant who keeps tabs
> on where the staff is currently located so she can direct phone calls to
> them. Without the administrative assistant, people who move around
> can't be contacted. It looks like a lot of what LSID, ARN, and such seem
> to be about is "administrative assistant" addressing schemes: how to
> navigate to the entity through layers of address abstraction. But, in
> each case it raises the issue of who/where is the administrative
> assistant, on top of the question of the addressing scheme itself.
>
> Shouldn't we get these definitions and expectations nailed down first?
> Then look at solutions?
>
> Chuck Miller
> Chief Information Officer
> Missouri Botanical Garden
> 4344 Shaw Boulevard
> Saint Louis, Missouri 63119
> Phone: 1-314-577-9419
> Cell: 1-314-614-6952
>
In my previous email I referred to problems in distinguishing data and
metadata. I'd like to elaborate here.
Traditional definitions of metadata are 'data about data' or 'data
documentation'. These are good definitions from a pragmatic standpoint,
but become somewhat less than helpful when trying to build real working
systems that utilize both data and metadata and try to preserve
replicability of analyses through versioning. A simple example will
illustrate.
Sometimes people record repeating information about data as separate
metadata (for example, the date on which data were collected). Other
times, they might include that information directly in their data model.
Take two entities, A and B:
Entity A:
---------
Metadata:
AttributeLabels = Site,Date,Abundance
Data:
Foo 20041010 19
Bar 20050712 20
Foo 20051010 20
Entity B:
-----------
Metadata:
AttributeLabels = Site,Abundance
CollectionDate = 20011002
Data:
Foo 24.3
Bar 21.3
Baz 20.4
Note that both entities contain the same kinds of information, but the second
places the date of collection in a metadata property, while the first
puts it in the data model. If one were to integrate these data entities
to produce a time-series plot of abundance by site, one would need to
extract the CollectionDate information from the metadata of entity B
before proceeding. Thus, for the purpose of an integrated analysis,
"CollectionDate" is really data.
Which way people model the information is a somewhat arbitrary
decision, but it typically comes down to looking at 1) the rate of change of
the information across tuples, and 2) the intended use of the final data.
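A minimal Python sketch of the integration step described above, assuming the two entities are held as plain dictionaries. The dictionary layout and the helper name `to_rows` are hypothetical and chosen only for illustration; the values are those of Entity A and Entity B. The point it shows is that 'CollectionDate' has to be promoted from metadata into each row before the two entities can be combined.

```python
# Hypothetical in-memory form of Entity A and Entity B (layout is an assumption).
entity_a = {
    "metadata": {"AttributeLabels": ["Site", "Date", "Abundance"]},
    "data": [("Foo", "20041010", 19), ("Bar", "20050712", 20), ("Foo", "20051010", 20)],
}
entity_b = {
    "metadata": {"AttributeLabels": ["Site", "Abundance"], "CollectionDate": "20011002"},
    "data": [("Foo", 24.3), ("Bar", 21.3), ("Baz", 20.4)],
}


def to_rows(entity):
    """Return (site, date, abundance) tuples, whichever way the date was modelled."""
    meta = entity["metadata"]
    labels = meta["AttributeLabels"]
    rows = []
    for record in entity["data"]:
        values = dict(zip(labels, record))
        # If Date is not a column, fall back to the metadata property.
        date = values.get("Date", meta.get("CollectionDate"))
        rows.append((values["Site"], date, values["Abundance"]))
    return rows


# Combined series, ordered by site and then by date.
time_series = sorted(to_rows(entity_a) + to_rows(entity_b), key=lambda r: (r[0], r[1]))
for row in time_series:
    print(row)
```

Any change to the 'CollectionDate' value in Entity B's metadata would change three rows of this result, which is why the metadata cannot be treated as incidental to the data.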
So, to bring this back to the identifier issue: if one were to assign
an identifier to Entity A and another to Entity B, resolving the
identifier should allow one to retrieve the data. But in these two
cases the data that is returned will have different schemas and will
have different dependencies on the metadata. To do integrated analyses
of the two entities, one really needs to be able to utilize both the
metadata and the data together and be assured that both are consistent.
LSIDs require that the data retrieved from an LSID never change, to
guarantee replicability and persistence, but allow the metadata to
change. Clearly, if the 'CollectionDate' metadata for entity B were to
be changed any analyses that were performed using the original metadata
could no longer be replicated. This causes a lot of trouble for
analytical systems that emphasize provenance and lineage for derived
data products. It indicates that there is a strong case to be made that
metadata should be versioned as well and that both the metadata and data
associated with an identifier really should be immutable with respect to
the identifier. Any changes to data or metadata should require updates
to the identifier revision.
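A rough Python sketch of that policy, under stated assumptions: the class `RevisionedStore`, the integer revision counter, and the authority `example.org` are all hypothetical and are not taken from the LSID specification (which only provides an optional revision field at the end of the identifier). The sketch freezes data and metadata together, and issues a new revision whenever either one changes.

```python
import copy
import hashlib
import json


class RevisionedStore:
    """Toy store in which each identifier revision freezes data AND metadata."""

    def __init__(self, authority, namespace):
        self.prefix = f"urn:lsid:{authority}:{namespace}"
        self._revisions = {}  # object id -> list of (digest, data, metadata)

    @staticmethod
    def _digest(data, metadata):
        payload = json.dumps({"data": data, "metadata": metadata}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def put(self, object_id, data, metadata):
        """Store a state; the revision only advances if data or metadata changed."""
        history = self._revisions.setdefault(object_id, [])
        digest = self._digest(data, metadata)
        if not history or history[-1][0] != digest:
            history.append((digest, copy.deepcopy(data), copy.deepcopy(metadata)))
        return f"{self.prefix}:{object_id}:{len(history)}"

    def get(self, identifier):
        """Resolve a revisioned identifier to the exact state it was issued for."""
        object_id, revision = identifier.split(":")[-2:]
        _, data, metadata = self._revisions[object_id][int(revision) - 1]
        return copy.deepcopy(data), copy.deepcopy(metadata)


store = RevisionedStore("example.org", "datasets")
v1 = store.put("entityB", [["Foo", 24.3]], {"CollectionDate": "20011002"})
v2 = store.put("entityB", [["Foo", 24.3]], {"CollectionDate": "20011003"})
print(v1)
print(v2)
```

An analysis that recorded `v1` as its input can still be replicated after the metadata correction, because `v1` continues to resolve to the original data and the original 'CollectionDate'.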
Matt
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones(a)nceas.ucsb.edu Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From my perspective several of these issues are pretty clear. Let me
attempt answers below and see if we're all in agreement.
Chuck Miller wrote:
>
> I have been pondering this question of what exactly is meant by a GUID
> since Donald's first call for input.
The term GUID is one we started using in SEEK when looking for a
solution to the identity and resolution problems that we saw looming for
the Taxonomic Concept Standard. Dave Thau's presentation on this
(linked on the GUID wiki) defines this pretty well and explores the issues.
>
> First, is it correct that GUID stands for Globally Unique Identifier?
>
> Second,
> What do we mean by Globally?
> What do we mean by Unique?
"Globally unique" means simply that an identifier that is issued can
only have one valid interpretation across all possible systems.
Regardless of the mechanism used to resolve the identifier, the object
that the id 'identifies' will be bit-for-bit identical. However, note
that a given object can legitimately have more than one identifier.
> What do we mean by Identifier? Or, specifically what are we identifying?
This is a bit trickier. We will clearly be identifying several
different types of objects, some of which are physical (e.g.,
specimens), and some of which are digital (e.g., observation data). On
the specimen side, resolution of the identifier and retrieving the
'data' makes no sense because the 'data' is a physical object that
cannot be electronically transported. On the 'digital' side, it makes
sense to resolve and retrieve the data. There are some tricky issues
dealing with granularity of the identifier for digital data (does the
identifier point at a tuple in an entity, or at a whole entity, or at
multiple entities). In addition you still have the very thorny issue of
what is data and what is metadata. I'll write another note regarding
this issue.
Matt
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones(a)nceas.ucsb.edu Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I agree with all of Rod's points. I'd like to add some details about
LSID resolution as well.
In current practice, LSID relies on the DNS to locate the authority that
is to be used to resolve (localize) an LSID. This can be considered a
minor weakness (in that the DNS name is part of the identifier), but
also a major strength (the DNS is by far the most robust system we have
for persistent naming). For an LSID that might be issued by, e.g.,
gbif.org, a change in gbif.org's ability to maintain the authority can
be fixed by simply pointing gbif.org's SRV record for the lsid service
to a new authority. This is commonly recognized and used, and
represents a distributed resolution mechanism.
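A small Python sketch of the first half of that lookup, with no network access: it splits an LSID into its parts and builds the DNS name that a client would query for the SRV record. The `_lsid._tcp.` service prefix reflects my understanding of the LSID resolution conventions and should be checked against the specification; the example LSID and the helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def srv_query_name(lsid: LSID) -> str:
    """DNS name whose SRV record points at the current authority server (assumed prefix)."""
    return f"_lsid._tcp.{lsid.authority}"


example = parse_lsid("urn:lsid:gbif.org:specimens:12345")  # hypothetical LSID
print(srv_query_name(example))
```

Because the identifier only carries the domain name, moving the authority to a new host means changing the SRV record, not reissuing identifiers.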
However, in addition, in the LSID spec the use of the DDDS (Dynamic
Delegation Discovery System) is also described to allow the use of a
centralized registry of resolvers. The DDDS uses NAPTR records to
associate the NID "lsid" with a particular DNS server, which is then
queried for more NAPTR records that specify rewriting rules to obtain
the DNS name of the authority that should be used for a given LSID. For
example, if the authority in an LSID is set to 'gbif.org', the rewriting
rules might turn that into the authority
'gbif.org.lsid.lsidauthority.org'. This allows the central
IANA-registered owner of the "lsid" NID to create a service that
overrides the DNS in particular cases to provide an alternative
authority in a centralized manner. It also allows the use of non-DNS
based authority strings (e.g., myauthority). This approach to resolution
is centralized in exactly the same manner that the centralized DOI and
ARK registries are. However, as far as I can tell there are no LSID
resolvers that utilize this capability, and I don't know if the "lsid"
NID has been registered with a NAPTR record or not. But nevertheless,
the LSID spec provides for both distributed and centralized authority
resolution, and so is a superset of the capabilities in DOI and ARK.
It also has the advantage of being Internet standards-based for all of
its resolution mechanisms.
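To contrast the two routes, here is a simplified Python sketch. It does not query DNS or read real NAPTR records; it only imitates the effect of a regular-expression rewriting rule of the kind a NAPTR record can carry. The rule itself, the helper names, and the example LSIDs are hypothetical; the target 'lsid.lsidauthority.org' is simply the example used above.

```python
import re

# Hypothetical central rewrite rule, in the spirit of a NAPTR regexp field.
CENTRAL_RULE = (
    re.compile(r"^urn:lsid:([^:]+):.*$", re.IGNORECASE),
    r"\1.lsid.lsidauthority.org",
)


def distributed_authority(lsid: str) -> str:
    """Distributed route: the authority string is itself the DNS name."""
    return lsid.split(":")[2]


def centralized_authority(lsid: str) -> str:
    """Centralized route: a registry-owned rule rewrites the LSID to a DNS name."""
    pattern, replacement = CENTRAL_RULE
    rewritten, count = pattern.subn(replacement, lsid)
    if count == 0:
        raise ValueError(f"no rewrite rule matches {lsid!r}")
    return rewritten


lsid = "urn:lsid:gbif.org:specimens:12345"
print(distributed_authority(lsid))
print(centralized_authority(lsid))
# A non-DNS authority string only makes sense on the centralized route.
print(centralized_authority("urn:lsid:myauthority:things:1"))
```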
Matt
Roderic Page wrote:
> I think this is a very nice statement of the issues.
>
> My own view is that ARK is interesting, but I'm not sure ARK is the best
> way forward. Persistence is a (perhaps the) key issue, and it is a
> social one not a technological one, as the DOI people make very clear.
> DOIs only work because the publishing industry has invested in the
> infrastructure to support them.
>
> In some ways, DOIs and ARK are very similar. If I use the DOI resolver
> to resolve a DOI
>
> http://dx.doi.org/10.1086/303303
>        \--------/ \-----/ \----/
>            |         |       |
>            |        Name    Name
>      Name Mapping   Assigning
>      Authority      Authority Number (NAAN)
>      Hostport (NMAH)
>
> then I have a URL very like an ARK, where the authority assigning the
> name (such as a publisher, in this case the University of Chicago) is
> different from the authority that makes the identifier actionable (doi.org).
> One could imagine that if DOI.org were to fall over, one could
> substitute another authority, such as doi.reborn.org. Indeed some
> publishers almost do essentially this, for example
> http://www.journals.uchicago.edu/cgi-bin/resolve?id=doi:10.1086/303303
> (although this will only resolve local DOIs). ARK simply makes this
> possibility explicit. LSIDs are more strongly tied to the DNS (the
> uniqueness of an LSID is partly guaranteed by using Internet domain
> names), although they do have limited support for foreign authorities
> (other providers that can serve metadata for objects that those
> providers don't actually own).
>
> ARK also adds the ability to retrieve a statement of commitment. I'm
> less impressed by this, as a statement is all very well, but will
> service providers actually honour it? I guess this is an issue of trust.
> I suspect that users' ratings of service providers will be much more
> accurate than a rating provided by a service provider.
>
> One issue not on this list is who generates GUIDs? ARKs and DOIs require
> some degree of centralisation because both require unique identifiers
> for organisations providing data (e.g., 10.1086 identifies the
> University of Chicago Press). This in itself requires some degree of
> service commitment. LSIDs are decentralised, in that the unique
> identifier for an organisation is provided by the DNS. If, for example,
> GBIF took on the role of providing unique identifiers for organisations,
> but then closed due to funding issues (heaven forbid), then we have a
> problem. If the DNS goes belly up, then we will have much more pressing
> issues to worry about...
>
>
> Regards
>
> Rod
>
> On 11 Oct 2005, at 15:37, Donald Hobern wrote:
>
>
> [ I will be trying to provide some structure to discussions in this
> mailing list by raising specific topics and looking for comments.
> Please keep the Topic number in responses ]
>
> Topic 1: What do we mean by GUID?
>
> The most fundamental thing that we need to establish as we consider
> a GUID implementation is a definition for “GUID” in this context.
> We have been using a number of terms to describe the identifiers we
> need (unique, resolvable, persistent, etc.).
>
> I’ve been spending some time following up on Rod Page’s
> recommendation that we consider the use of Archival Resource Keys
> (ARK) from the California Digital Library (see
> http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK). The CDL web
> site includes an excellent overview of this GUID model, which also
> serves as an excellent introduction to the issues involved. I would
> urge you all to read this document (it’s only nine pages long!):
>
> http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf
>
> This document arrives at the following problem definition for
> persistent, actionable identifiers:
>
> 1 The goal: long-term actionable identifiers.
> a Requirement: that identifiers deliver you to objects (where
> feasible).
> b Requirement: that identifiers deliver you to object metadata.
> c Desirable: each object should wear its own identifier.
> d Requirement: that identifiers deliver you to statements of
> commitment.
> 2 The problem: URLs break for some objects (that is, associations
> between URLs and objects are not maintained), and we have no way to
> tell which ones will or won’t break.
> 3 Why URLs break: because objects are moved, removed, and replaced –
> completely normal activities – and the provider in each case
> demonstrates insufficient commitment to update indirection tables,
> or to plan identifier assignment carefully. Persistence is in the
> mission of few organizations.
> 4 Conventional hypothesis: use indirect names (PURLs, URNs, Handles)
> instead of URLs; what worked for DNS should work for digital object
> references. Wrong. Indirection is spectacularly successful and
> elegant in DNS, but it’s a side issue in the provision of digital
> object persistence.
>
> This document clearly identifies issues around provider service
> commitments as the key problem that needs solving. The construction
> of ARKs seeks to address this in a couple of ways. It separates the
> role of Name Assigning Authority (i.e. who initially assigns the
> identifier) from that of the Name Mapping Authority (i.e. who is
> able to map the identifier to the data object at any particular
> time). It also defines a simple standard relationship between three
> things: the data object, the metadata for the object, and a
> commitment statement from the provider as to what aspects of
> persistence are guaranteed.
>
> ARK is a technology that we have not really considered up to this
> point. My question for discussion is what, if anything, is missing
> or wrong about the problem definition provided in this document? If
> we agree that it provides a crisp definition of what we need, that
> in itself will be a major step forward.
>
> Please provide your thoughts.
>
> Donald
>
> ---------------------------------------------------------------
> Donald Hobern (dhobern(a)gbif.org)
> Programme Officer for Data Access and Database Interoperability
> Global Biodiversity Information Facility Secretariat
> Universitetsparken 15, DK-2100 Copenhagen, Denmark
> Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
> ---------------------------------------------------------------
>
>
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
>
> Phone: +44 141 330 4778
> Fax: +44 141 330 2792
> email: r.page(a)bio.gla.ac.uk
> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>
> Subscribe to Systematic Biology through the Society of Systematic
> Biologists Website: http://systematicbiology.org
> Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
>
>
>
>
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones(a)nceas.ucsb.edu Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ I will be trying to provide some structure to discussions in this mailing
list by raising specific topics and looking for comments. Please keep the
Topic number in responses ]
Topic 1: What do we mean by GUID?
The most fundamental thing that we need to establish as we consider a GUID
implementation is a definition for "GUID" in this context. We have been
using a number of terms to describe the identifiers we need (unique,
resolvable, persistent, etc.).
I've been spending some time following up on Rod Page's recommendation that
we consider the use of Archival Resource Keys (ARK) from the California
Digital Library (see http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK).
The CDL web site includes an excellent overview of this GUID model, which
also serves as an excellent introduction to the issues involved. I would
urge you all to read this document (it's only nine pages long!):
http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf
This document arrives at the following problem definition for persistent,
actionable identifiers:
1. The goal: long-term actionable identifiers.
a. Requirement: that identifiers deliver you to objects (where
feasible).
b. Requirement: that identifiers deliver you to object metadata.
c. Desirable: each object should wear its own identifier.
d. Requirement: that identifiers deliver you to statements of
commitment.
2. The problem: URLs break for some objects (that is, associations
between URLs and objects are not maintained), and we have no way to tell
which ones will or won't break.
3. Why URLs break: because objects are moved, removed, and replaced -
completely normal activities - and the provider in each case demonstrates
insufficient commitment to update indirection tables, or to plan identifier
assignment carefully. Persistence is in the mission of few organizations.
4. Conventional hypothesis: use indirect names (PURLs, URNs, Handles)
instead of URLs; what worked for DNS should work for digital object
references. Wrong. Indirection is spectacularly successful and elegant in
DNS, but it's a side issue in the provision of digital object persistence.
This document clearly identifies issues around provider service commitments
as the key problem that needs solving. The construction of ARKs seeks to
address this in a couple of ways. It separates the role of Name Assigning
Authority (i.e. who initially assigns the identifier) from that of the Name
Mapping Authority (i.e. who is able to map the identifier to the data object
at any particular time). It also defines a simple standard relationship
between three things: the data object, the metadata for the object, and a
commitment statement from the provider as to what aspects of persistence are
guaranteed.
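The separation of the two roles can be sketched in a few lines of Python. This assumes the usual ARK URL shape, http://NMAH/ark:/NAAN/Name; the host names, the NAAN 12345, and the helper names are hypothetical. The '?' and '??' suffixes for requesting metadata and the commitment statement follow my reading of the ARK proposal and should be checked against the CDL document.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ArkParts(NamedTuple):
    nmah: str  # Name Mapping Authority Hostport: who resolves the name today
    naan: str  # Name Assigning Authority Number: who issued the name
    name: str  # the name assigned by that authority


def split_ark(url: str) -> ArkParts:
    """Split http://NMAH/ark:/NAAN/Name into its three parts."""
    parts = urlsplit(url)
    prefix = "/ark:/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not an ARK URL: {url!r}")
    naan, _, name = parts.path[len(prefix):].partition("/")
    if not naan or not name:
        raise ValueError(f"incomplete ARK: {url!r}")
    return ArkParts(parts.netloc, naan, name)


def remap(url: str, new_nmah: str) -> str:
    """Keep the assigned name, swap only the mapping authority."""
    ark = split_ark(url)
    return f"http://{new_nmah}/ark:/{ark.naan}/{ark.name}"


def metadata_url(url: str) -> str:
    return url + "?"   # assumed inflection: ask for the object's metadata


def commitment_url(url: str) -> str:
    return url + "??"  # assumed inflection: ask for the commitment statement


original = "http://example.org/ark:/12345/x1y2z3"  # hypothetical ARK
print(split_ark(original))
print(remap(original, "mirror.example.net"))
```

The NAAN and Name survive the move to a new host unchanged, which is the sense in which the identifier outlives any particular Name Mapping Authority.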
ARK is a technology that we have not really considered up to this point. My
question for discussion is what, if anything, is missing or wrong about the
problem definition provided in this document? If we agree that it provides
a crisp definition of what we need, that in itself will be a major step
forward.
Please provide your thoughts.
Donald
---------------------------------------------------------------
Donald Hobern (dhobern(a)gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------
Dear Donald and colleagues from the GUID group
Please find some comments below.
There are two concepts which may be kept separate:
Identification of an item (an object or a representation of the
object), which should be unique and stable, and which can be
understood as a GUID.
Localisation of an object or its representation, which
may be, and often is, unstable.
Many of the tools and systems suggested so far
mix up these two concepts and try to identify and
localise at the same time. Problems then often occur
because localisation is not stable and therefore not
workable as a GUID.
To solve the problem, GBIF/TDWG could try
to keep these two concepts separate and
look for the system that is most appropriate for
identifying the unit-level data uniquely and stably.
This could be an accession number with no meaning,
like those used for sequence data.
Localisation of the object can then be dealt with as
information about the object that can change over time,
like any other.
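As an illustration only, the separation could look like this in Python. The record layout, the accession number "AB000001", and the URLs are hypothetical. The accession is issued once and never changes, while the list of locations is ordinary, editable information about the unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnitRecord:
    accession: str                                      # opaque and stable
    locations: List[str] = field(default_factory=list)  # free to change over time


registry: Dict[str, UnitRecord] = {}


def register(accession: str, location: str) -> UnitRecord:
    """Issue an accession number once; reissuing the same one is an error."""
    if accession in registry:
        raise ValueError(f"{accession} already issued")
    registry[accession] = UnitRecord(accession, [location])
    return registry[accession]


def relocate(accession: str, old: str, new: str) -> None:
    """Update where the unit can be found without touching its identifier."""
    record = registry[accession]
    record.locations = [new if loc == old else loc for loc in record.locations]


register("AB000001", "http://old.example.org/specimens/1")
relocate("AB000001", "http://old.example.org/specimens/1", "http://new.example.org/units/1")
print(registry["AB000001"])
```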
Best regards
Patricia Mergen
A few thoughts...
To me, the only way I can see GUIDs being used effectively is to assign a GUID to every data object that will be exposed outside the local system where it is stored, e.g. database records, image files, document files. I can't see how you could assign GUIDs to specimens themselves, as the specimen itself cannot be returned via a query (unless it is a loan request?). This will result in a lot of GUIDs, distributed through a lot of systems. This may seem a concern, because objects in different systems may refer to the "same" object but have different GUIDs. But it is not really a problem: all we have achieved at this stage is giving every data object in all systems a globally unique ID, and one that is hopefully resolvable. Whether an object refers to the same "entity"/"concept" as another object in the system does not matter at this point. (By "concept" I don't mean "taxon concept".)
Objects that refer to the same "concept", such as the same taxonomic name represented in two different systems, need to be "cross referenced" in some other way. One way this could be done is by building up a table of mappings between systems, where the qualified person in each of those systems has decided that the objects refer to the same "concept". These mappings can then be returned as metadata for either of the GUIDs in the two systems, as "other data objects that refer to the same thing". This will result in a fairly complex global network of mappings, and will be difficult to maintain. Another way to do this is to have a central authoritative table of "concepts" with GUIDs that all other synonymic objects in other systems point to. Then, when an object is accessed via its GUID, part of the metadata will be the central "concept" GUID, which can therefore be compared with that of objects in other systems. This will also allow queries to be executed against the different systems, for example "give me all objects with the central concept GUID x".
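A small Python sketch of the second option, the central table of "concepts". The class name `ConceptIndex`, its methods, and all the GUIDs are hypothetical; it only shows the two operations mentioned above: returning the central concept GUID as part of an object's metadata, and answering "give me all objects with the central concept GUID x".

```python
from collections import defaultdict


class ConceptIndex:
    """Toy central table linking object GUIDs to one authoritative concept GUID."""

    def __init__(self):
        self._concept_of = {}                # object GUID -> concept GUID
        self._objects_of = defaultdict(set)  # concept GUID -> object GUIDs

    def link(self, object_guid, concept_guid):
        """Record that an object refers to a concept, replacing any earlier link."""
        previous = self._concept_of.get(object_guid)
        if previous is not None:
            self._objects_of[previous].discard(object_guid)
        self._concept_of[object_guid] = concept_guid
        self._objects_of[concept_guid].add(object_guid)

    def metadata(self, object_guid):
        """Metadata fragment: the concept GUID plus the other objects sharing it."""
        concept = self._concept_of.get(object_guid)
        same = self._objects_of.get(concept, set()) - {object_guid}
        return {"concept": concept, "same_as": sorted(same)}

    def objects_for(self, concept_guid):
        """All object GUIDs that point at the given central concept GUID."""
        return sorted(self._objects_of.get(concept_guid, set()))


index = ConceptIndex()
concept_x = "urn:lsid:central.example.org:concepts:x"
index.link("urn:lsid:a.example.org:names:101", concept_x)
index.link("urn:lsid:b.example.org:taxa:77", concept_x)
print(index.metadata("urn:lsid:a.example.org:names:101"))
print(index.objects_for(concept_x))
```

The first option (pairwise mappings between systems) would need one entry per pair of equivalent objects, whereas here each object carries a single link to the central table.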
It seems DOIs and LSIDs are the preferred options at this stage. I personally prefer LSIDs for several reasons, including their standard URN format and the fixed 1:1 mapping of objects and IDs. I see that DOIs, through the Handle System, allow you to query for "different" objects of an entity depending on what you want. E.g. for a specimen, you could query for the image of that specimen. This allows multiple objects to be accessed through a single GUID (this also seems to be how a lot of people I have talked to "view" a GUID system as working). I can see this being useful in the biodiversity informatics world, but I can also see problems it may cause: each of the objects accessed through the single DOI should have its own GUID, otherwise they will end up being referenced from somewhere "through" another object, and the maintenance of the GUIDs then becomes more challenging.
My main concern with LSIDs is the "must return the exact same set of bytes every time" requirement. This can be overcome, however, by providing all the data in the metadata of the LSID (which seems a bit backward) and returning, as the data of the LSID, only a "label/name" that will never change. Otherwise the versioning component of LSIDs can be used to handle changes within the data.
Kevin Richards
Landcare Research
http://www.landcareresearch.co.nz
The SourceForge site (http://lsid.sourceforge.net/ ) does contain quite
a bit of background on the project, and some examples of how to
implement clients and servers.
At least some of the code is in production use. The web resolver
provided by Biopathways.org (http://lsid.biopathways.org/resolver/ )
uses IBM's Perl code. The LTER LSID authority
(http://lsid.limnology.wisc.edu/ ) builds on IBM's Java code. I'm
assuming that BioMoby and Taverna are built on IBM's code. I also use
LSIDs in my Taxonomic Search Engine.
One can also gauge activity from browsing the CVS repository at
SourceForge (http://cvs.sourceforge.net/viewcvs.py/lsid/ ). Most
commits happened two years ago, but there are more recent commits
(e.g., 7 days ago). This suggests IBM is still doing some work on it.
Lastly, LSIDs are mentioned in a recent article on the Semantic Web and
life sciences (http://dx.doi.org/10.1126/stke.2832005pe22).
Regards
Rod
On 27 Sep 2005, at 09:44, Robert Huber wrote:
> Dear all,
>
> I wonder a bit about the status of LSID. Some links lead to a
> SourceForge site, but there
> is little or no information there. Some googling brought me to a former
> IBM site and to some
> projects (these are also at the WIKI) which seem to promote and/or use
> LSIDs.
>
> I am now a bit confused and would like to know how you judge the
> current status of the LSID
> system? Did it ever reach production status, is it still maintained
> and did this system ever leave
> the project status?
>
> best regards,
>
> Dr. Robert Huber
> WDC-MARE / PANGAEA - www.pangaea.de, www.wdc-mare.org
> Stratigraphy.net - www.stratigraphy.net
> _____________________________________________
> MARUM - Institute for Marine Environmental Sciences (location)
> University Bremen
> Leobener Strasse
> POP 330 440
> 28359 Bremen
> Phone ++49 421 218-65593, Fax ++49 421 218-65505
> e-mail rhuber(a)wdc-mare.org, robert.huber(a)stratigraphy.net
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page(a)bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Dear all,
I wonder a bit about the status of LSID. Some links lead to a SourceForge
site, but there is little or no information there. Some googling brought me
to a former IBM site and to some projects (these are also on the wiki) which
seem to promote and/or use LSIDs.
I am now a bit confused and would like to know how you judge the current
status of the LSID system. Did it ever reach production status, is it still
maintained, and did this system ever leave the project status?
best regards,
Dr. Robert Huber
WDC-MARE / PANGAEA - www.pangaea.de, www.wdc-mare.org
Stratigraphy.net - www.stratigraphy.net
_____________________________________________
MARUM - Institute for Marine Environmental Sciences (location)
University Bremen
Leobener Strasse
POP 330 440
28359 Bremen
Phone ++49 421 218-65593, Fax ++49 421 218-65505
e-mail rhuber(a)wdc-mare.org, robert.huber(a)stratigraphy.net
Matt,
Many thanks for expressing your interest in our discussion and
workshop on GUIDs. Also thanks for adding your project information to
our wiki.
As far as I'm concerned there is nothing that would exclude the data
items you mention from the discussion (except for some wording on the
wiki - sorry).
As a matter of fact, one of the aspects of the discussion here has
been not only to assign and resolve globally unique identifiers to data
items but also to cross-reference them in a meaningful and useful way.
Certainly field observations and experiments are part of that web.
Again, welcome to our group.
Regards,
Ricardo
Matt Jones wrote:
>I sent an email the other day expressing my interest in this workshop,
>but it didn't seem to go through to the list. In the meantime, I
>created a wiki page with the answers to your questions about why I am
>interested at:
>
>http://wiki.gbif.org/guidwiki/wikka.php?wakka=MattJones
>
>I also added descriptions of GUID issues for the SEEK EcoGrid and Kepler
>projects and some details about the TCS. I also will eventually add in
>information about use of LSIDs in the SEEK Taxonomic Object Server (TOS)
>that complements the TCS work.
>
>http://wiki.gbif.org/guidwiki/wikka.php?wakka=ExistingProjects
>
>I guess my first question is to what extent GBIF and the members of this
>list think that this discussion should include use cases outside
>specimen collections? Much of my work with LSIDs has been focused on
>archiving field observations and experiments done by ecologists and
>environmental biologists. It seems like this should be in scope for
>GBIF, but thus far the limited discussion on this list has focused on
>specimen collections and taxonomic nomenclature/concept issues. Should
>we broaden it?
>
>Thanks, and looking forward to the discussion.
>
>Matt
>--
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Matt Jones
>jones(a)nceas.ucsb.edu
>National Center for Ecological Analysis and Synthesis (NCEAS)
>UC Santa Barbara http://www.nceas.ucsb.edu/ecoinformatics
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>