Two Scenarios

Roderic Page r.page at BIO.GLA.AC.UK
Fri Nov 25 11:46:34 CET 2005


These relationships would be specified in the metadata attached to the  
GUIDs, not the GUIDs themselves (they are simply unique identifiers).

For example, if we think of your tax number/Social Security
Number/National Insurance Number (insert whatever identifier your
government attaches to you here), then you could have two GUIDs such as

JE 5679434A

and

JH 5679434B

The metadata for JE 5679434A could contain a statement that the  
individuals are related, e.g. something like

<rdf:Description rdf:about="JE 5679434A">
     <isMarriedTo rdf:resource="JH 5679434B" />
</rdf:Description>

In other words, the person identified by "JE 5679434A" is married to the
person identified by "JH 5679434B".

One can develop ontologies that specify these relationships and enable
us to deduce other facts. For example, if X is married to Y, then Y is
married to X; and if Z is a child of Y, then Y is the parent of Z, and
so on. What is nice is that you wouldn't have to explicitly state that
Y is the parent of Z in the metadata for Y; it can be inferred from the
relationship Z is a child of Y.
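
As a rough illustration of that kind of inference, here is a minimal
sketch in plain Python rather than a real RDF reasoner. The predicate
names ("isChildOf", "isParentOf") and the SYMMETRIC and INVERSE_OF
tables are assumptions made up for the example; in practice an ontology
would declare them.

```python
# Hypothetical predicate names; a real vocabulary would be agreed on
# by the community and declared in an ontology.
SYMMETRIC = {"isMarriedTo"}
INVERSE_OF = {"isChildOf": "isParentOf", "isParentOf": "isChildOf"}


def infer(triples):
    """Return the stated triples plus those implied by symmetry and inverses."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SYMMETRIC:
            # X isMarriedTo Y implies Y isMarriedTo X.
            inferred.add((obj, predicate, subject))
        if predicate in INVERSE_OF:
            # Z isChildOf Y implies Y isParentOf Z.
            inferred.add((obj, INVERSE_OF[predicate], subject))
    return inferred


# Only these two statements are made explicitly in the metadata.
stated = {
    ("JE 5679434A", "isMarriedTo", "JH 5679434B"),
    ("Z", "isChildOf", "Y"),
}
facts = infer(stated)
```

Only two statements are written down, yet `facts` also contains the
reverse marriage statement and ("Y", "isParentOf", "Z").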

I use RDF here because these are the kind of things it handles nicely.
All (!) you'd need is a consistent vocabulary to describe the
relationships. RDF and its companion vocabularies already have some
basic ones ("sameAs" in OWL, "subPropertyOf" in RDF Schema, etc.). In
the examples you provide, I guess you'd want "part of", "extracted
from", "hosted by", "parent of", "mother of", etc.
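
To make the idea concrete, here is a small sketch (again plain Python,
not RDF tooling) that keeps the relationships in metadata attached to
each GUID and follows them back to the source collection. The predicate
names in VOCABULARY and the helper functions are hypothetical, and the
links between specimens are just one reading of the plant scenario
quoted below.

```python
# Hypothetical controlled vocabulary of relationship names.
VOCABULARY = {"duplicateOf", "extractedFrom", "partOf"}


def add(metadata, subject, predicate, obj):
    """Attach one relationship to the metadata of `subject`."""
    if predicate not in VOCABULARY:
        raise ValueError(f"unknown relationship: {predicate}")
    metadata.setdefault(subject, []).append((predicate, obj))


def lineage(metadata, guid):
    """List the GUIDs reachable from `guid`, nearest first."""
    seen = []
    queue = [guid]
    while queue:
        current = queue.pop(0)
        for _, target in metadata.get(current, []):
            if target != guid and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


# The GUIDs themselves stay opaque; the relationships live in the metadata.
metadata = {}
add(metadata, "IH-NY65432", "duplicateOf", "Index Herbarium-CANB12345")
add(metadata, "IH-NY-wet-65432", "extractedFrom", "IH-NY65432")

ancestors = lineage(metadata, "IH-NY-wet-65432")
```

Here `ancestors` traces the wet collection back through the New York
duplicate to the original herbarium sheet, without any of that history
being encoded in the identifiers.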

Does this help?

Regards

Rod






On 25 Nov 2005, at 11:18, Arthur Chapman wrote:

> Below I have placed two scenarios that show some of the  
> cross-discipline problems I believe we face with GUIDs. They don't  
> provide the answers, alas!
>
> It would appear to me that each of these separate entities needs a
> GUID, but that each needs to show some relationship (nearly a
> genealogy or pedigree line): child of (i.e. derived from); brother of
> (duplicate collection); sister of (wet collection); part of (genetic
> study), etc.  Can these be built into a GUID?
>
> Take just the simplest problem, where a herbarium makes a
> collection and sends out duplicates to other herbaria.  More often
> than not, the duplicates are distributed prior to receiving a
> catalogue number in the originating institution.  We can thus only
> identify duplicates using collector name and number, but these are not
> always unique, and not all collectors use numbers.  We can't use the
> lat/long coordinates as these are often put on after distribution and
> are often different (one collection I looked at in 5 different
> herbaria was given 4 different lat/longs). The resolution of many of
> these duplicates will need to be a human problem, possibly helped by
> parsing routines similar to those being developed for location
> information in the BioGeomancer project, and possibly some artificial
> intelligence (to sort out collectors' names used in different ways,
> e.g. initials first/surname first, etc.).
>
> I wish I could supply the answers!
>
> These scenarios don't show up all that well in text, so I have also
> attached a Word document.
>
> ---------------------
> PLANT
> 1.  Collector Makes collection
>  a.  Provides collector number (not always Unique) <Fred 123>
>   i.  Submits collection to Herbarium
>    1.  Herbarium supplies collection number <Index Herbarium-CANB12345>
>    2.  and a name <TCS-123454>
>     a.  Herbarium distributes collections to other herbaria
>      i.  New herbaria supply collection numbers <IH-NY65432;  
> IH-MO34562; IH-K98765>
>     ii.  New herbarium creates wet collection in alcohol  
> <IH-NY-wet-65432>
>    b.   Herbarium takes tissue samples and cultivates and propagates  
> in living collection
>     i.   Living collections supplies a number <BotGard-ANBG99-24-12-18>
>      1.   Living collection makes a new collection for herbarium  
> <IH-CANB35556>
>    c.   University researcher takes pollen sample and carries out  
> analysis.
>     i.   Stores microscope slide of pollen grains <ANUxxxx342567587>
>      1.   carries out genetic study
>       a.   sends collection to GenBank <Genbank 456783>
>    ii.   Submits seeds to Seed Bank
>      1.   Seed bank supplies a number <CSIRO-Euc34547>
>       a.   Seed bank supplies seed to China where it is cultivated.
>        i.   China takes sample for chemical assay <XXXX3435646763>
>         1.   Develops drug and applies for patent <USA Pat xxxxxxxx>
> -------------------
>
> ZOO ANIMAL
> 2.   Collector catches two animals
>  a.   Probably doesn’t give a number
>  b.   Sells animals to two different zoos
>   i.   Zoo A documents and gives animal a number <ZOOA-xxxxx22323>
>    1.   Animal was pregnant and gives birth to 2 young (parent unknown)
>    2.   Animals given numbers <ZOOA-xxxxx22323-1; ZOOA-xxxxx22323-2>
>     a.   Young (1) sold to Zoo B
>      i.   Zoo B gives a number <ZOOB-yyy34562>
>      ii.   Young (1) develops a parasite
>        1.   Parasite sent to Lab 1  <LAB1-2222342>
>         a.   Lab identifies parasite and sends specimen to Museum  
> <Museum-FL12322>
>         b.   Young (2) dies
>          i.   Specimen sent to Museum <Museum XY2222234>
>           1.   Tissue sample taken and stored <Museum XY2222234-a>
>           2.   Skin sample sent to different museum <Museum BT4563721>
>   ii.   Zoo B documents and gives animal number <ZOOB-yyy12345>
>     1.   Zoo B crosses ZOOB-yyy12345 with ZOOB-yyy34562
>      a.   Offspring born and given numbers <ZOOB-yyy673221;  
> ZOOB-yyy673222>
>       i.   Animal ZOOB-yyy673221 sold to Zoo C
>        1.   Zoo C supplies number <ZOOC-22245>  needs to link pedigree.
>      ii.   Animal ZOOB-yyy673222 etc.
> etc.
>
> Hope this helps and just doesn't confuse the issue even more.
>
> Arthur Chapman
> Toowoomba, Australia
>
>
------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org



