Hi all,
Are we moving too fast into the mechanism for resolving the GUID before we have defined what the GUID will look like? Sorry if I am jumping on this train too fast in my remarks below.
I think what Arthur and Roger were touching on is that we already have various production mechanisms for resolving metadata object relations. Many organizations already have systems established and populated with metadata, and logic to resolve information and relations for the objects that we will assign GUIDs to. I believe this is in part why we are interested in coming up with an improved method to address this task.
I know that we in the European seed bank community have even discussed the option of receiving and redirecting requests for living biological material (seeds) from crop metadata portals. The portals themselves do not own (or control) either the metadata or the physical seed samples. There are also services providing additions or corrections to the published metadata on the seed samples. All these activities struggle to achieve a stable link back to the original physical sample and would be significantly improved by a stable GUID mechanism to resolve this link. There are methods and logic implemented to address this problem, but the link back is often lost, and a great deal of manual work often goes into curating the link between these objects.
So I think there would be a transition period to migrate the logic to RDF and to re-engineer the mechanisms that resolve the links between the metadata objects, and between the metadata and the physical sample. I am not well versed in the theory of RDF and would need to read up before I could start migrating the logic we have established for resolving relationships in the seed bank community to RDF. I would, however, be happy to do the reading if this is the standard GUID representation we will go for.
I hope, however, that we will not build the GUID resolver so complex that the knowledge needed to use the GUID becomes too much of a limitation. For example, a GUID that you could paste directly into a browser to get information about the object would set a lower threshold for users than one that requires installing some kind of RDF client to be able to "use" the GUID. We should perhaps start with a very basic GUID solution and remember to think about how it can be integrated with existing mechanisms. Not all existing solutions involve schemas described in XML, and I believe that migrating even logic described in XML to RDF would present us with some challenges. This is why the call for use cases by Donald is important.
Cheers Dag Terje
Quoting Roderic Page r.page@BIO.GLA.AC.UK:
On 25 Nov 2005, at 15:34, Roger Hyam wrote:
But wait... there is more. As Arthur points out we already have most of this stuff defined. TCS encapsulates a whole load of semantics about nomenclatural relationship (types of type etc) and TaxonConcept relationship (child taxon of, hybrid parent of etc) and ABCD has similar knowledge about collections. A great deal of re-engineering and transition is involved. We mustn't go throwing any babies out with the bath water.
Maybe I'm missing something (since I've avoided XML schema like the plague), but given that RDF is (usually) expressed in XML, shouldn't we be able with a minimum of fuss to "port" the relevant bits from an XML schema to an RDF schema? Maybe I'm being naive but I don't see why this is such an engineering nightmare.
Also serving this stuff may be problematic....
Serving what stuff?
So yes GUID+RDF seems to solve most every problem just at the moment.
Roger
Arthur Chapman wrote:
Rod
This is a neat solution and may well work. I like it!
It is somewhat akin to the "Relation" element in Dublin Core, which has generally not been implemented with a controlled vocabulary, as was recommended at the Canberra meeting of Dublin Core in about 1996 or 1997.
It was implemented in the Australian Government Locator Service (AGLS) as Australian Standard AS5044 with a controlled vocabulary.
The vocabulary is not what we would need, but gives a parallel example:
isVersionOf, hasVersion, isReplacedBy, replaces, isRequiredBy, requires, isPartOf, isReferencedBy, isFormatOf, hasFormat, isBasisFor, isBasedOn
http://www.naa.gov.au/recordkeeping/gov_online/agls/AGLS_reference_description.pdf
Cheers
Arthur
From Roderic Page r.page@BIO.GLA.AC.UK on 25 Nov 2005:
These relationships would be specified in the metadata attached to the GUIDs, not the GUIDs themselves (they are simply unique identifiers).
For example, if we think of your tax number/Social Security Number/National Insurance Number (insert whatever identifier your government attaches to you here), then you could have two GUIDs such as
JE 5679434A
and
JH 5679434B
The metadata for JE 5679434A could contain a statement that the individuals are related, e.g. something like
<rdf:Description rdf:about="JE 5679434A">
  <isMarriedTo rdf:resource="JH 5679434B" />
</rdf:Description>
In other words, the person identified by "JE 5679434A" is married to the person identified by "JH 5679434B".
One can develop ontologies that specify these relationships, and enable us to deduce other facts. For example, if X is married to Y, then Y is married to X, and if Z is a child of Y, then Y is the parent of Z, and so on. What is nice is that you wouldn't have to explicitly state that Y is the parent of Z in the metadata for Y; it can be inferred from the relationship Z is a child of Y.
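The kind of inference described above can be illustrated with a minimal Python sketch. It is only an illustration and not how an RDF reasoner is actually implemented: the predicate names (isMarriedTo, isChildOf, isParentOf) and the two rule tables are hypothetical, standing in for whatever vocabulary would eventually be agreed.

```python
# Hypothetical relationship names; a real vocabulary would have to be agreed on.
SYMMETRIC = {"isMarriedTo"}                # X rel Y implies Y rel X
INVERSE = {"isChildOf": "isParentOf",      # Z isChildOf Y implies Y isParentOf Z
           "isParentOf": "isChildOf"}


def infer(triples):
    """Return the stated triples plus those implied by the two rule tables."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SYMMETRIC:
            inferred.add((obj, predicate, subject))
        if predicate in INVERSE:
            inferred.add((obj, INVERSE[predicate], subject))
    return inferred


# Only these two statements are stored in the metadata.
stated = {
    ("JE 5679434A", "isMarriedTo", "JH 5679434B"),
    ("Z", "isChildOf", "Y"),
}
facts = infer(stated)

for fact in sorted(facts):
    print(fact)
```

Only two statements are stored, yet the result also contains the reverse marriage statement and "Y isParentOf Z". Neither had to be written into the metadata for the other object.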
I use RDF here because these are the kind of things it handles nicely. All (!) you'd need is a consistent vocabulary to describe the relationships. RDF already has some basic ones ("sameAs", "subPropertyOf", etc.). In the examples you provide, I guess you'd want "part of", "extracted from", "hosted by", "parent of", "mother of", etc.
Does this help?
Regards
Rod
On 25 Nov 2005, at 11:18, Arthur Chapman wrote:
Below I have placed two scenarios that show some of the cross-discipline problems I believe we face with GUIDs. They don't provide the answers, alas!
It would appear to me that each of these separate entities needs a GUID, but that each needs to show some relationship (nearly a genealogy or pedigree line): child of (i.e. derived from); brother of (duplicate collection); sister of (wet collection); part of (genetic study), etc. Can these be built into a GUID?
Let us just look at the simplest problem, where a herbarium makes a collection and sends out duplicates to other herbaria. More often than not, the duplicates are distributed prior to receiving a catalogue number in the originating institution. We can thus only identify duplicates using collector name and number, but these are not always unique, and not all collectors use numbers. We can't use the lat/long coordinates, as these are often put on after distribution and are often different (one collection I looked at in 5 different herbaria was given 4 different lat/longs). The resolution of many of these duplicates will need to be a human problem, possibly helped by parsing routines similar to those being developed for location information in the BioGeomancer project, and possibly some artificial intelligence (to sort out collectors' names used in different ways, e.g. initials first/surname first, etc.).
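As a rough illustration of the kind of parsing routine mentioned above, the Python sketch below normalises collector names written "initials first" or "surname first" into one matching key and groups records that share it. Everything in it is an assumption made for illustration: the record fields, the collector name "Smith" and the matching rules are hypothetical, and the output is only a list of candidate duplicates for a human to check, since collector names and numbers are not always unique.

```python
import re
from collections import defaultdict


def collector_key(name, number):
    """Build a crude matching key: (surname, initials, collector number)."""
    name = name.strip()
    if "," in name:
        # "Surname, J.R." form
        surname, rest = [part.strip() for part in name.split(",", 1)]
    else:
        # "J.R. Surname" form: assume the last token is the surname
        tokens = name.replace(".", ". ").split()
        surname, rest = tokens[-1], " ".join(tokens[:-1])
    # First letter of each remaining word, so "John R." and "J.R." both give "jr"
    initials = "".join(word[0] for word in re.findall(r"[A-Za-z]+", rest)).lower()
    return (surname.lower(), initials, str(number).strip())


def group_candidates(records):
    """Group herbarium records by key; keep only keys seen in more than one record."""
    groups = defaultdict(list)
    for record in records:
        key = collector_key(record["collector"], record["number"])
        groups[key].append(record["herbarium"])
    return {key: herbaria for key, herbaria in groups.items() if len(herbaria) > 1}


# Hypothetical records of one gathering as databased by different herbaria.
records = [
    {"herbarium": "CANB", "collector": "Smith, J.R.", "number": "123"},
    {"herbarium": "NY", "collector": "J.R. Smith", "number": "123"},
    {"herbarium": "MO", "collector": "J. R. Smith", "number": "124"},
]
candidates = group_candidates(records)
print(candidates)
```

A key of this kind will still miss surname-first names written without a comma, misspellings and collectors who use no numbers, which is why the final resolution remains a human task.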
I wish I could supply the answers!
These scenarios don't show up all that well in text, so I have also attached a Word document.
PLANT
- Collector makes collection
  a. Provides collector number (not always unique) <Fred 123>
    i. Submits collection to Herbarium
- Herbarium supplies collection number <Index Herbarium-CANB12345>
- and a name <TCS-123454>
  a. Herbarium distributes collections to other herbaria
    i. New herbaria supply collection numbers <IH-NY65432; IH-MO34562; IH-K98765>
=== message truncated ==
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782
------------------------------------------------------------------------
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species at http://ispecies.org