Different reasons for different GUIDs

Wed Sep 14 08:31:41 CEST 2005

Rod,

> Of course, an electronic version of a paper, and its abstract stored in
> another database are not the same thing (at least, not electronically).
> This is an example where a simple cross reference can link the two GUIDs
> (in many cases, the PubMed record provides this cross reference).

There is a thin line between entities and instances, and depending on the
application some 'things' that were previously regarded as instances might be
raised to the entity level. The Object Database Management Group (ODMG) has
created a nice standard for that [Cattell R & Barry D (1997). The Object
Database Standard: ODMG 2.0. Morgan Kaufmann Publishers], which is regarded as
the 'standard model'
for object-oriented design. It uses slightly different terminology: a
'first-class object' corresponds to an 'entity' (an object carrying identity; in
other words, one that has a unique identifier assigned to it), whereas a
'literal' is an object without identity (an instance that does not need
identity, because tracking and tracing occurrences of the object is not
considered necessary). However, at some point some applications might need to
track and trace literals, which are then turned into first-class objects by
assigning identity to them and finding instances of their occurrence. This is
exactly the task ahead of us: turning life science literals (e.g. pure names
that had no previous tracking capabilities) into first-class objects (with a
GUID assigned to them).
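
To make the distinction concrete, here is a minimal Python sketch. It is my own
illustration and is not taken from the ODMG standard: the class names
NameLiteral and NameObject, the promote() helper and the document identifiers
are all hypothetical. A literal is compared purely by value, while a first-class
object gets a GUID and can record where it occurs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameLiteral:
    """A literal: compared by value only, with no identity of its own."""
    text: str


@dataclass
class NameObject:
    """A first-class object: carries a GUID, so occurrences can be tracked."""
    text: str
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurrences: list = field(default_factory=list)


def promote(literal, documents):
    """Turn a literal into a first-class object.

    `documents` is an assumed mapping of document id -> document text.
    A fresh GUID is assigned and every document mentioning the name is
    recorded as an occurrence.
    """
    obj = NameObject(text=literal.text)
    obj.occurrences = [
        doc_id for doc_id, body in documents.items() if literal.text in body
    ]
    return obj
```

Two literals with the same text are indistinguishable, whereas each call to
promote() yields an object with its own GUID, even for identical text. Whether
identical names should share one GUID is a policy decision that the sketch
deliberately leaves open.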

In the example you mention, my intuition is that there is no need to assign two
different GUIDs, one to the full-text version and one to the abstract version of
the paper. What we need here, I think, is what is now commonly called multiple
resolution: a single identifier (in this case a GUID referencing a 'publication'
object) is made actionable in such a way that several services can be attached
to it. In the case you mentioned, there would be one service that directs the
user to the full version of the article (if the user is authorized to access it)
and a second service that directs the user to the abstract version of the
paper.

How these services are used should be decided only at the application level.
One possibility would be to give the user a list of all available services
(e.g. do you want the full version or the abstract only?). Another would be to
check whether the user is authorized to see the full version of the paper: if
so, use the full-version service; otherwise, use the abstract-only service.
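
As a rough sketch of the idea in Python, assuming a simple in-memory registry:
one identifier maps to several services, and the application decides which one
to use. Everything here is hypothetical (the identifier, the example.org URLs,
and the names Service, REGISTRY, resolve, list_services and best_service); it
is not how the DOI system or any existing resolver is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    url: str
    requires_authorization: bool = False


# Hypothetical registry: one GUID, several attached services,
# listed in order of preference.
REGISTRY = {
    "guid:example-publication-1": [
        Service("full-text", "https://example.org/full/1",
                requires_authorization=True),
        Service("abstract", "https://example.org/abstract/1"),
    ],
}


def resolve(guid):
    """Multiple resolution: return every service attached to the GUID."""
    return list(REGISTRY.get(guid, []))


def list_services(guid):
    """Application policy 1: show the user all available services."""
    return [service.name for service in resolve(guid)]


def best_service(guid, user_is_authorized):
    """Application policy 2: pick the first service the user may access."""
    for service in resolve(guid):
        if user_is_authorized or not service.requires_authorization:
            return service
    return None
```

Note that the resolver itself knows nothing about the user; both policies live
in the application layer and could be replaced without touching the identifier
or the registry.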

This issue is termed 'making an identifier actionable' in the DOI Handbook,
which I recommend as mandatory reading for everyone in this discussion group.
It not only deals with technical issues around the introduction of GUIDs, but
also tackles social and legal (IPR) implications. The business plan of DOI is
also well documented.

> Regarding mapping between GUIDs, I suspect the mapping may get very
> complicated, and a simple tree might not be appropriate. Attached is an
> example of relationships between names in MOBOT, based on RDF I
> generate for LSIDs for MOBOT names.

This is a nice example of an entity-relationship diagram, where the
relationships between entities can indeed be much more complicated than a
tree-like structure. What I was trying to explain in a previous mail was
merely a system to manage the identity (uniqueness) of entities alone. The
example you have shown reminds me of the work done by George Garrity on the
taxonomy of bacteria, for which he has made a nicely animated presentation that
can be downloaded from

http://www.cpdr.ucl.ac.be/bioinf/papers/bioinf/BrusselsGarrity0710052b.ppt

It also shows the complex interrelationships that have been created during
the establishment of new taxon concepts and names.

Peter Dawyndt