We need to decide under which circumstances we want this to be a bidirectional implication, i.e. that:
B) O1 == O2 --> I1 == I2
My 30-second reaction to this is that it assumes that the metadata attached to I1 and I2 is the same, which may often not be the case.
Rod is of course right. The example I gave was a major oversimplification. I would say that we need to distinguish several more levels in order to understand what we mean by "the same data object".
Let me expand a little with some extra symbols.
O1 and O2 are the (real-world) objects or events to which we wish to refer in our computer systems. Examples would be a specimen in a collection, or a nomenclatural act.
D1 and D2 are digital representations of O1 and O2. This immediately raises a large number of questions which are probably external to the development of a basic GUID infrastructure, but which need to be addressed in applicability statements for different subdomains. We need to be sure that we agree on what is a data representation for the object/event and the extent to which it must have a standard form (including whether or not identity of byte streams matters to us). In some cases it may be hard for us to identify anything that we regard as essential to a canonical digital representation of an object/event. (This has already been identified as a problem that we would need to address with LSIDs.) It is also worth noting that there are cases in which distinguishing between objects and events and their digital representations is difficult or perhaps meaningless. Some objects of interest to us may not exist except in a digital form.
M1 and M2 are representations of metadata that describe D1 and D2. Again the distinction between digital representations and associated metadata is rarely completely clear (applicability statements again needed).
We then have the identifiers I1 and I2. In different situations these may serve as identifiers for O1 and O2 or for D1 and D2 or for D1+M1 and D2+M2. None of these is necessarily right or wrong. Appropriate practices must be defined in each case.
The meaning of the identity I1 == I2 will vary according to these defined practices. If I1 == I2 --> O1 == O2 (and nothing more), we may retrieve different data records and metadata as alternative resolutions of the same identifier.
Returning full circle to my original point, the inequality I1 != I2 may tell us that only the metadata (M1 and M2) differ, or that the metadata and the data (D1 and D2) differ, or that the metadata, the data and the objects themselves (O1 and O2) all differ, or in other cases it may imply none of the above. We need to be sure which of these inferences we need to support for each subdomain.
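As a rough illustration of these levels, here is a minimal Python sketch. All names in it (Scope, Record, consistent) are hypothetical and not part of any GUID or LSID specification. It also assumes that an identifier defined for D pins down O as well, and that one defined for D+M pins down all three; a subdomain could reasonably decide otherwise.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """What an identifier is defined to identify (hypothetical names)."""
    OBJECT = "O"                # the real-world object or event
    DATA = "D"                  # its digital representation
    DATA_AND_METADATA = "D+M"   # the representation plus its metadata


@dataclass(frozen=True)
class Record:
    obj: str        # stands in for O, e.g. a specimen
    data: bytes     # stands in for D
    metadata: str   # stands in for M


def consistent(records, scope):
    """Check whether records resolved from ONE identifier agree on
    everything that the given scope says the identifier pins down."""
    same_obj = len({r.obj for r in records}) <= 1
    same_data = len({r.data for r in records}) <= 1
    same_meta = len({r.metadata for r in records}) <= 1
    if scope is Scope.OBJECT:
        return same_obj
    if scope is Scope.DATA:
        # Assumption: identifying D also fixes O.
        return same_obj and same_data
    # Assumption: identifying D+M fixes O, D and M.
    return same_obj and same_data and same_meta


# Two alternative resolutions of the same identifier: same specimen,
# but different data records and different metadata.
a = Record("specimen-1", b"record as served by provider A", "provider A")
b = Record("specimen-1", b"record as served by provider B", "provider B")

object_level_ok = consistent([a, b], Scope.OBJECT)
data_level_ok = consistent([a, b], Scope.DATA)
```

Under the object-level practice the two resolutions are acceptable; under the data-level practice the same pair would be a violation, which is why the practice has to be stated for each subdomain.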
On top of this we also need to consider whether the same identifier should be used for D1 and for D1', where D1' is a later version of D1 (with some corrections or modifications of the data elements).
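One way to keep both options open can be sketched as follows, assuming a purely hypothetical identifier made of a base part and a version number (this is an illustration, not a proposal for an actual identifier syntax):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedId:
    """Hypothetical identifier: a base name plus a version number."""
    base: str
    version: int

    def same_lineage(self, other):
        # True when both identifiers name versions of the same record.
        return self.base == other.base


d1 = VersionedId("specimen-record-42", 1)        # identifier for D1
d1_prime = VersionedId("specimen-record-42", 2)  # identifier for D1'

strictly_equal = d1 == d1_prime            # compares base AND version
same_record = d1.same_lineage(d1_prime)    # compares base only
```

Strict equality treats D1 and D1' as different things, while same_lineage treats them as versions of one record. Which comparison counts as "the same identifier" is exactly the decision that still has to be made.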
I hope this makes some things clearer for someone. It helps me with thinking about the problems.
Thanks,
Donald
---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483  Mobile: +45-28751483  Fax: +45-35321480
---------------------------------------------------------------