Different reasons for different GUIDs
Peter.Dawyndt at UGENT.BE
Wed Sep 14 10:37:52 CEST 2005
Roderic wrote: "I say 'same' because, leaving taxon-concepts aside, the presence
of homonyms between the codes makes mapping names as strings problematic"
Richard replied: "Indeed! One of the main reasons for using GUIDs instead of
text strings in the first place!!"
There are two things involved in getting rid of homonymy. The first is entity
identification, which is indeed, as Richard said, what the assignment of unique
identifiers is about. The second is name resolution, i.e. mapping the name to
the correctly identified entity. And as I said, the latter might either need
user-intervention (question raised from the resolution system) or
context-dependent resolution. Here's a simple example that has nothing to do
Suppose there are three people called "Roderic Page" on this planet, one in the
UK, one in Hawaii and one in Belgium. Let's assume we can take their social
security number (SSN) as a globally unique identifier, so each of the three will
have been assigned a different SSN. If we have their SSN, then their identity is
guaranteed to be unique. If we look up their identity by means of their name
(knowing that there is homonymy), then we will come up with three possible
people.
So we will need additional information to choose between the options. Either we
can ask about their country of origin, which would be sufficient in this case
for resolution, or, if the name appears in a text, we could extract enough
contextual information to narrow the original name down to one of the possible
entities. This is what the human brain does to resolve names to concepts
it knows, and this is exactly what computers do when performing
In the case of taxonomic names, apart from the name itself, the original
publication, the authors of the publication, or whatever other information is
available could be considered contextual information that helps resolve the
name. What context information would be sufficient (in the end there should be
exactly one entity left after resolution) and necessary (we do not want to
require too much context information, otherwise resolution might become
impractical) for resolving taxonomic names, I leave to the experts in the field.
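To make the distinction between identification and resolution concrete, here is
a minimal sketch of the three-people example in Python. Everything in it is an
assumption made for illustration: the identifiers are made up, the registry is
a toy list, and "location" stands in for whatever context turns out to be
sufficient. It is not a proposal for how taxonomic names should be resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    guid: str       # made-up identifier, plays the role of the SSN
    name: str
    location: str   # the only piece of context in this toy registry


# Hypothetical registry: three distinct entities sharing one name (homonyms).
REGISTRY = [
    Entity("id-0001", "Roderic Page", "UK"),
    Entity("id-0002", "Roderic Page", "Hawaii"),
    Entity("id-0003", "Roderic Page", "Belgium"),
]


def resolve(name, **context):
    """Return every entity that matches the name and all supplied context."""
    candidates = [e for e in REGISTRY if e.name == name]
    for key, value in context.items():
        candidates = [e for e in candidates if getattr(e, key) == value]
    return candidates


# Name alone: all three homonyms remain, so more context is needed.
ambiguous = resolve("Roderic Page")

# Name plus sufficient context: exactly one entity is left.
unique = resolve("Roderic Page", location="Belgium")
```

When more than one candidate comes back, the system either has to ask the user
or look for more context; when exactly one comes back, the context was
sufficient.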
As was pointed out before by someone: the GUIDs help to 'remember' the outcome
of the resolution, so that each instance of the name needs to be resolved only
once and is replaced (at least through the eyes of the information system) by
the corresponding GUID.
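A rough sketch of this "remembering" in Python, under the assumption that each
occurrence of a name can be given some key (the key format, the GUID value and
the function names below are all hypothetical):

```python
resolved = {}   # maps an occurrence of a name to the GUID it was resolved to
calls = []      # records each time the costly resolution step actually runs


def expensive_resolution(occurrence):
    """Stand-in for user intervention or context analysis (hypothetical)."""
    calls.append(occurrence)
    return "id-0003"  # made-up GUID


def guid_for(occurrence):
    """Resolve an occurrence once, then answer from the stored GUID."""
    if occurrence not in resolved:
        resolved[occurrence] = expensive_resolution(occurrence)
    return resolved[occurrence]


first = guid_for("doc-17: Roderic Page")
second = guid_for("doc-17: Roderic Page")   # served from the stored mapping
```

The second lookup never reaches the resolution step: from the system's point of
view, the name occurrence has been replaced by its GUID.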
One also has to take into account that once there is a clearly established
system of globally unique identifiers that is accepted by a community, then all
of these problems are resolved for new entities, provided the assignment of
identifiers is carefully organized. All we are stuck with is the legacy of the
past: we still have to assign identifiers to the occurrences of names that
carried no identifiers with them.