Two Scenarios

Fri Nov 25 18:44:16 CET 2005

Hi all,

Are we moving too fast into the mechanism for resolving the GUID before
we have defined what the GUID will look like? Sorry if I am jumping on
this train too fast in my remarks below.

I think that what Arthur and Roger were touching on is that we already
have various production mechanisms implemented for resolving relations
between metadata objects. Many organizations already have systems
established and populated with metadata, and with logic to resolve
information and relations for the objects that we will assign GUIDs to.
I believe this is in part why we are interested in coming up with an
improved method to address this task.

I know that we in the European seed bank community have even discussed
the option of receiving and redirecting requests for living biological
material (seeds) from crop metadata portals. The portals themselves
neither own nor control the metadata or the physical seed samples.
There are also services providing additions or corrections to the
published metadata on the seed samples. All these activities struggle
to achieve a stable link back to the original physical sample and would
be significantly improved by a stable GUID mechanism to resolve this
link. There are methods and logic implemented to address this problem,
but often the link back is lost, and often extensive manual work is
needed to curate the link between these objects.

So I think there would be a transition period to migrate the logic to
RDF and to re-engineer the mechanisms that resolve the links between
the metadata objects and between the metadata and the physical sample.
I am not very well versed in the theory of RDF and need to read up
before I would be able to start migrating the logic we have established
for resolving relationships in the seed bank community to RDF. I would
however be happy to do the reading if this is the standard GUID
representation we will go for. I hope however that we will not make the
GUID resolver so complex that the knowledge needed to use a GUID
becomes too much of a limitation. For example, a GUID that you could
paste directly into a browser to get information about the object would
have a lower user threshold than one that requires installing some kind
of RDF client to be able to "use" the GUID. We should perhaps start
with a very basic GUID solution and remember to think about how it can
be integrated with existing mechanisms. Not all existing solutions
involve schemas described in XML, and I also believe that migrating
even logic described in XML to RDF would present us with some
challenges. This is why the call for use cases by Donald is important.
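
To make the browser-pasteable idea a bit more concrete, here is a
minimal sketch in Python. It assumes a hypothetical resolver at
http://guid.example.org/resolve/ and a made-up accession identifier;
neither exists in any of the systems mentioned above. It only shows
that an opaque identifier can be wrapped in a URL and recovered again
without any RDF tooling on the user's side.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical resolver base URL, for illustration only.
RESOLVER_BASE = "http://guid.example.org/resolve/"


def guid_to_url(guid):
    """Wrap an opaque identifier in a URL that a plain browser can open."""
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return RESOLVER_BASE + quote(guid, safe="")


def url_to_guid(url):
    """Recover the original identifier from a resolver URL."""
    path = urlsplit(url).path
    prefix = urlsplit(RESOLVER_BASE).path
    if not path.startswith(prefix):
        raise ValueError("not a resolver URL: " + url)
    return unquote(path[len(prefix):])


example_guid = "SEEDBANK:ACC 12345"  # made-up accession identifier
example_url = guid_to_url(example_guid)
```

What the resolver returns at that address (a web page, RDF, or a
redirect to an existing system) is a separate decision that this
sketch leaves open.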

Cheers
Dag Terje

Quoting Roderic Page <r.page at BIO.GLA.AC.UK>:

> On 25 Nov 2005, at 15:34, Roger Hyam wrote:
>
> >
> >  But wait... there is more. As Arthur points out we already have most
> > of this stuff defined. TCS encapsulates a whole load of semantics
> > about nomenclatural relationship (types of type etc) and TaxonConcept
> > relationship (child taxon of, hybrid parent of etc) and ABCD has
> > similar knowledge about collections.  A great deal of re-engineering
> > and transition is involved. We mustn't go throwing any babies out with
> > the bath water.
>
> Maybe I'm missing something (since I've avoided XML schema like the
> plague), but given that RDF is (usually) expressed in XML, shouldn't we
> be able with a minimum of fuss to "port" the relevant bits from an XML
> schema to an RDF schema? Maybe I'm being naive but I don't see why this
> is such an engineering nightmare.
>
> >
> >  Also serving this stuff may be problematic....
>
> Serving what stuff?
>
>
> >
> >  So yes GUID+RDF seems to solve most every problem just at the moment.
> >
> >  Roger
> >
> >
> >
> >  Arthur Chapman wrote:
> >> Rod
> >>
> >> This is a neat solution and may well work.  I like it!
> >>
> >> It is somewhat akin to the "Relation" element in Dublin Core but
> >> which has generally not been implemented with a controlled vocabulary
> >> as was recommended at the Canberra meeting of Dublin Core in about
> >> 1996 or 1997.
> >>
> >> It was implemented in the Australian Government Locator Service
> >> (AGLS) as Australian Standard AS5044 with a controlled vocabulary.
> >> The vocabulary is not what we would need, but gives a parallel
> >> example
> >>
> >> isVersionOf
> >> hasVersion
> >> isReplacedBy
> >> replaces
> >> isRequiredBy
> >> requires
> >> isPartOf
> >> isReferencedBy
> >> isFormatOf
> >> hasFormat
> >> isBasisFor
> >> isBasedOn
> >>
> >> http://www.naa.gov.au/recordkeeping/gov_online/agls/AGLS_reference_description.pdf
> >>
> >> Cheers
> >>
> >> Arthur
> >>
> >> From Roderic Page <r.page at BIO.GLA.AC.UK> on 25 Nov 2005:
> >>
> >>
> >>> These relationships would be specified in the metadata attached to
> >>> the GUIDs, not the GUIDs themselves (they are simply unique
> >>> identifiers).
> >>>
> >>> For example, if we think of your tax number/Social Security
> >>> Number/National Insurance Number (insert whatever identifier your
> >>> government attaches to you here), then you could have two GUIDs such
> >>> as
> >>>
> >>> JE 5679434A
> >>>
> >>> and
> >>>
> >>> JH 5679434B
> >>>
> >>> The metadata for JE 5679434A could contain a statement that the
> >>> individuals are related, e.g. something like
> >>>
> >>> <rdf:Description rdf:about="JE 5679434A">
> >>>      <isMarriedTo rdf:resource ="JH 5679434B" />
> >>> </rdf:Description>
> >>>
> >>> In other words, the person identified by "JE 5679434A" is married to
> >>> the person identified by "JH 5679434B".
> >>>
> >>> One can develop ontologies that specify these relationships, and
> >>> enable us to deduce other facts. For example, if X is married to Y,
> >>> then Y is married to X, but if Z is a child of Y, Y is the parent of
> >>> Z, and so on. What is nice is that you wouldn't have to explicitly
> >>> state that Y is the parent of Z in the metadata for Y; it can be
> >>> inferred from the relationship Z is a child of Y.
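
As a rough illustration of this kind of inference, and not a real RDF
reasoner, the sketch below keeps statements as plain Python
(subject, property, object) tuples. The property names isChildOf and
isParentOf are assumptions made up for the example; only isMarriedTo
appears in the message above. In OWL the same two rules would normally
be declared with owl:SymmetricProperty and owl:inverseOf rather than
hard-coded.

```python
# Illustrative property names; not taken from any published vocabulary.
SYMMETRIC = {"isMarriedTo"}
INVERSE_OF = {"isChildOf": "isParentOf", "isParentOf": "isChildOf"}


def infer(triples):
    """Return the stated triples plus those implied by the two rules."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        if prop in SYMMETRIC:
            # X isMarriedTo Y implies Y isMarriedTo X.
            inferred.add((obj, prop, subject))
        if prop in INVERSE_OF:
            # Z isChildOf Y implies Y isParentOf Z, and vice versa.
            inferred.add((obj, INVERSE_OF[prop], subject))
    return inferred


stated = {
    ("JE 5679434A", "isMarriedTo", "JH 5679434B"),
    ("Z", "isChildOf", "Y"),
}
closure = infer(stated)
```

Only the two stated facts are stored; the reverse marriage statement
and the parent statement are derived when needed.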
> >>>
> >>> I use RDF here because these are the kind of things it handles
> >>> nicely. All (!) you'd need is a consistent vocabulary to describe the
> >>> relationships. RDF already has some basic ones ("sameAs",
> >>> "subPropertyOf", etc.). In the examples you provide, I guess you'd
> >>> want "part of", "extracted from", "hosted by", "parent of", "mother
> >>> of", etc.
> >>>
> >>> Does this help?
> >>>
> >>> Regards
> >>>
> >>> Rod
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 25 Nov 2005, at 11:18, Arthur Chapman wrote:
> >>>
> >>>
> >>>> Below I have placed two scenarios that show some of the
> >>>> cross-discipline problems I believe we face with GUIDs. They don't
> >>>> provide the answers, alas!
> >>>>
> >>>> It would appear to me that each of these separate entities needs a
> >>>> GUID; but that each needs to show some relationship (nearly a
> >>>> genealogy or pedigree line) - child of (i.e. derived from); brother
> >>>> of (duplicate collection); sister of (wet collection); part of
> >>>> (genetic study) etc.  Can these be built into a GUID?
> >>>>
> >>>> Consider just the simplest problem, where a herbarium makes a
> >>>> collection and sends out duplicates to other herbaria.  More often
> >>>> than not, the duplicates are distributed prior to receiving a
> >>>> catalogue number in the originating institution.  We can thus only
> >>>> identify duplicates using collector name and number, but these are
> >>>> not always unique, and not all collectors use numbers. We can't use
> >>>> the lat/long coordinates as these are often put on after
> >>>> distribution and are often different (one collection I looked at in
> >>>> 5 different herbaria was given 4 different lat/longs). The
> >>>> resolution of many of these duplicates will need to be a human
> >>>> problem - possibly helped by parsing routines similar to those being
> >>>> developed for location information in the BioGeomancer project, and
> >>>> possibly some artificial intelligence (to sort out collector's names
> >>>> used in different ways, etc. - initials first/surname first, etc.).
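
A very crude sketch of such a parsing routine is given below. It is
not the BioGeomancer approach, just an assumption-laden illustration:
the longest word in a collector string is taken to be the surname, the
remaining words are reduced to initials, and records sharing the
normalised name and collector number are flagged as candidate
duplicates for a human to check. The collector names are invented, and
the catalogue numbers are only reused from the scenario as sample
values.

```python
import re
from collections import defaultdict


def collector_key(name):
    """Crude normalisation: treat the longest word as the surname and
    reduce every other word to its lower-case initial."""
    words = re.findall(r"[^\W\d_]+", name)  # runs of letters only
    if not words:
        return ("", ())
    surname = max(words, key=len)
    others = list(words)
    others.remove(surname)  # drop one occurrence of the surname
    initials = tuple(sorted(word[0].lower() for word in others))
    return (surname.lower(), initials)


def group_candidate_duplicates(records):
    """Group (catalogue_id, collector, number) records that share a
    normalised collector name and collector number."""
    groups = defaultdict(list)
    for catalogue_id, collector, number in records:
        groups[(collector_key(collector), number.strip())].append(catalogue_id)
    # Keep only groups with more than one record: these need human review.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    ("CANB12345", "Smith, F.", "123"),
    ("NY65432", "F. Smith", "123"),
    ("MO34562", "Smith F", "123"),
    ("K98765", "F. Jones", "123"),
]
candidates = group_candidate_duplicates(records)
```

The heuristic will misfire on short surnames and says nothing about
collectors who use no numbers, so it can only narrow down the manual
work, not replace it.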
> >>>>
> >>>> I wish I could supply the answers!
> >>>>
> >>>> These scenarios don't show up all that well in text, so I have
> >>>> also attached a Word document.
> >>>>
> >>>> ---------------------
> >>>> PLANT
> >>>> 1.  Collector Makes collection
> >>>>  a.  Provides collector number (not always Unique) <Fred 123>
> >>>>   i.  Submits collection to Herbarium
> >>>>    1.  Herbarium supplies collection number <Index Herbarium-CANB12345>
> >>>>    2.  and a name <TCS-123454>
> >>>>     a.  Herbarium distributes collections to other herbaria
> >>>>      i.  New herbaria supply collection numbers <IH-NY65432;
> >>>> IH-MO34562; IH-K98765>
> >>>>
> >> === message truncated ==
> >>
> >>
> >
> > --
> >
> > -------------------------------------
> >  Roger Hyam
> >  Technical Architect
> >  Taxonomic Databases Working Group
> > -------------------------------------
> >  http://www.tdwg.org
> >  roger at tdwg.org
> >  +44 1578 722782
> > -------------------------------------
> >
> >
>
> ------------------------------------------------------------------------
>
> ----------------------------------------
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
>
> Phone:    +44 141 330 4778
> Fax:      +44 141 330 2792
> email:    r.page at bio.gla.ac.uk
> web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>
> Subscribe to Systematic Biology through the Society of Systematic
> Biologists Website:  http://systematicbiology.org
> Search for taxon names at
> http://darwin.zoology.gla.ac.uk/~rpage/portal/
> Find out what we know about a species at http://ispecies.org
>