Two Scenarios

Roger Hyam roger at TDWG.ORG
Fri Nov 25 15:34:46 CET 2005


Adding to what Rod said, the important thing to point out is that with
GUID+RDF you can reason about resources you don't own. So a herbarium
that discovers it has a duplicate of something in another herbarium can
publish that fact without involving the other herbarium.

In fact I can make assertions I have only just invented about things in
two collections that have never heard of me.
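
A minimal sketch of that idea, assuming each party publishes its own list of
(subject, predicate, object) statements keyed on shared GUIDs. The identifiers
are borrowed from Arthur's scenario below; the predicate names
("hasCollectorNumber", "isDuplicateOf", "sameGatheringAs") and the publisher
labels are hypothetical, not from any agreed vocabulary.

```python
from collections import namedtuple

# A statement is a (subject, predicate, object) triple plus who published it.
Statement = namedtuple("Statement", "subject predicate obj source")

def statements_about(guid, *graphs):
    """Collect every statement that mentions guid, whoever published it."""
    return [s for g in graphs for s in g if guid in (s.subject, s.obj)]

# Hypothetical data: each list is published independently of the others.
canb_graph = [Statement("IH-CANB12345", "hasCollectorNumber", "Fred 123", "CANB")]
ny_graph = [Statement("IH-NY65432", "isDuplicateOf", "IH-CANB12345", "NY")]
third_party = [Statement("IH-NY65432", "sameGatheringAs", "IH-CANB12345", "Roger")]

# Anyone can merge the lists; CANB never had to accept or store the others.
found = statements_about("IH-CANB12345", canb_graph, ny_graph, third_party)
sources = sorted({s.source for s in found})
```

The merge works only because all three parties use the same identifier for the
CANB sheet, which is the job the GUID is doing here.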

But wait... there is more. As Arthur points out, we already have most of
this stuff defined. TCS encapsulates a whole load of semantics about
nomenclatural relationships (types of type, etc.) and TaxonConcept
relationships (child taxon of, hybrid parent of, etc.), and ABCD has similar
knowledge about collections. A great deal of re-engineering and
transition is involved. We mustn't go throwing any babies out with the
bath water.

Also serving this stuff may be problematic....

So yes GUID+RDF seems to solve most every problem just at the moment.

Roger



Arthur Chapman wrote:
> Rod
>
> This is a neat solution and may well work.  I like it!
>
> It is somewhat akin to the "Relation" element in Dublin Core, which has generally not been implemented with a controlled vocabulary, although one was recommended at the Canberra meeting of Dublin Core in about 1996 or 1997.
>
> It was implemented in the Australian Government Locator Service (AGLS) as Australian Standard AS5044 with a controlled vocabulary.  The vocabulary is not what we would need, but it gives a parallel example:
>
> isVersionOf
> hasVersion
> isReplacedBy
> replaces
> isRequiredBy
> requires
> isPartOf
> isReferencedBy
> isFormatOf
> hasFormat
> isBasisFor
> isBasedOn
>
> http://www.naa.gov.au/recordkeeping/gov_online/agls/AGLS_reference_description.pdf
>
> Cheers
>
> Arthur
>
> From Roderic Page <r.page at BIO.GLA.AC.UK> on 25 Nov 2005:
>
>
>> These relationships would be specified in the metadata attached to the
>> GUIDs, not the GUIDs themselves (they are simply unique identifiers).
>>
>> For example, if we think of your tax number/Social Security
>> Number/National Insurance Number (insert whatever identifier your
>> government attaches to you here), then you could have two GUIDs such as
>>
>> JE 5679434A
>>
>> and
>>
>> JH 5679434B
>>
>> The metadata for JE 5679434A could contain a statement that the
>> individuals are related, e.g. something like
>>
>> <rdf:Description rdf:about="JE 5679434A">
>>      <isMarriedTo rdf:resource ="JH 5679434B" />
>> </rdf:Description>
>>
>> In other words, the person identified by "JE 5679434A" is married to the
>> person identified by "JH 5679434B".
>>
>> One can develop ontologies that specify these relationships, and enable
>> us to deduce other facts. For example, if X is married to Y, then Y is
>> married to X, and if Z is a child of Y, Y is the parent of Z, and so
>> on. What is nice is that you wouldn't have to explicitly state that Y
>> is the parent of Z in the metadata for Y; it can be inferred from the
>> relationship Z is a child of Y.
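
The inference described above can be sketched as a single pass over the stated
triples, assuming the vocabulary declares "isMarriedTo" as symmetric and
"isChildOf"/"isParentOf" as inverses of each other. These declarations, the
"isChildOf"/"isParentOf" names and the placeholder subject "Z" are illustrative
assumptions; a real system would take them from an ontology and use a reasoner.

```python
# Hypothetical vocabulary declarations: which predicates are symmetric,
# and which pairs of predicates are inverses of each other.
SYMMETRIC = {"isMarriedTo"}
INVERSE = {"isChildOf": "isParentOf", "isParentOf": "isChildOf"}

def infer(triples):
    """Return the stated triples plus those implied by the declarations."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SYMMETRIC:
            inferred.add((o, p, s))           # X p Y  implies  Y p X
        if p in INVERSE:
            inferred.add((o, INVERSE[p], s))  # Z childOf Y implies Y parentOf Z
    return inferred

stated = {
    ("JE 5679434A", "isMarriedTo", "JH 5679434B"),
    ("Z", "isChildOf", "JH 5679434B"),
}
closure = infer(stated)
```

Only the two stated triples need to be published; the reverse statements fall
out of the vocabulary rather than the metadata.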
>>
>> I use RDF here because these are the kind of things it handles nicely.
>> All (!) you'd need is a consistent vocabulary to describe the
>> relationships. Some basic ones already exist (OWL's "sameAs", RDF
>> Schema's "subPropertyOf", etc.). In the examples you provide, I guess
>> you'd want "part of", "extracted from", "hosted by", "parent of",
>> "mother of", etc.
>>
>> Does this help?
>>
>> Regards
>>
>> Rod
>>
>>
>>
>>
>>
>>
>> On 25 Nov 2005, at 11:18, Arthur Chapman wrote:
>>
>>
>>> Below I have placed two scenarios that show some of the
>>> cross-discipline problems I believe we face with GUIDs. They don't
>>> provide the answers, alas!
>>>
>>> It would appear to me that each of these separate entities needs a
>>> GUID, but that each needs to show some relationship (nearly a
>>> genealogy or pedigree line) - child of (i.e. derived from); brother
>>> of (duplicate collection); sister of (wet collection); part of (genetic
>>> study) etc.  Can these be built into a GUID?
>>>
>>> Just look at the simplest problem, where a herbarium makes a
>>> collection and sends out duplicates to other herbaria.  More often
>>> than not, the duplicates are distributed prior to receiving a
>>> catalogue number in the originating institution.  We can thus only
>>> identify duplicates using collector name and number, but these are
>>> not always unique, and not all collectors use numbers.  We can't use
>>> the lat/long coordinates as these are often put on after distribution
>>> and are often different (one collection I looked at in 5 different
>>> herbaria was given 4 different lat/longs). The resolution of many of
>>> these duplicates will need to be a human problem - possibly helped by
>>> parsing routines similar to those being developed for location
>>> information in the BioGeomancer project, and possibly some artificial
>>> intelligence (to sort out collectors' names used in different ways,
>>> e.g. initials first/surname first, etc.).
>>>
>>> I wish I could supply the answers!
>>>
>>> These scenarios don't show up all that well in text, so I have also
>>> attached a Word document.
>>>
>>> ---------------------
>>> PLANT
>>> 1.  Collector Makes collection
>>>  a.  Provides collector number (not always Unique) <Fred 123>
>>>   i.  Submits collection to Herbarium
>>>    1.  Herbarium supplies collection number <Index Herbarium-CANB12345>
>>>    2.  and a name <TCS-123454>
>>>     a.  Herbarium distributes collections to other herbaria
>>>      i.  New herbaria supply collection numbers <IH-NY65432;
>>> IH-MO34562; IH-K98765>
>>>
>
> === message truncated ==
>
>

--

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------



