<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    Hi all, <br>

    <br>

    Having just joined this list, I think it is a great idea to have an
    RDF Task Group.<br>
    I am going to publish a small species catalog for the Federal
    Environment Agency in Germany as Linked Data, and I am looking for
    the best way to express it in RDF.<br>

    <br>

    In the Linked Data cloud [1] I found several related contributions,
    such as GeoSpecies, TaxonConcept, and EUNIS.<br>
    Comparing these approaches, I prefer the idea of reusing SKOS [2]
    labels and hierarchical relations, as in the GeoSpecies example [3].

    <br>

    <br>

    It might also be a good idea to apply the SKOS-XL extension to model
    the taxon name properties in more detail.<br>
    Finally, I would add the taxon ranks as a distinct concept scheme
    and link them to the taxon concepts with a SKOS mapping relation.<br>

    <br>
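A minimal sketch of how a single taxon might look under this proposal, written as a small Python function that prints Turtle. It assumes hypothetical base URIs under example.org, a hypothetical genus concept taxa:puma as the skos:broader parent, and skos:relatedMatch as the mapping relation to a rank concept in its own scheme; only the SKOS and SKOS-XL namespace URIs are the real W3C ones.<br>
<pre>
```python
# Sketch only: the example.org URIs, the concept identifiers and the use of
# skos:relatedMatch for the taxon-to-rank link are hypothetical choices.
PREFIXES = [
    ("skos", "http://www.w3.org/2004/02/skos/core#"),
    ("skosxl", "http://www.w3.org/2008/05/skos-xl#"),
    ("taxa", "http://example.org/taxa/"),    # hypothetical taxon concept scheme
    ("ranks", "http://example.org/ranks/"),  # hypothetical rank concept scheme
]


def turtle_literal(text):
    """Quote a plain string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def taxon_turtle(taxon_id, name, broader_id, rank_id):
    """Return Turtle for one taxon concept with a SKOS-XL label and a rank link."""
    header = "".join(f"@prefix {p}: <{uri}> .\n" for p, uri in PREFIXES)
    body = [
        f"taxa:{taxon_id} a skos:Concept ;",
        "    skos:inScheme taxa:scheme ;",
        f"    skos:prefLabel {turtle_literal(name)} ;",
        f"    skosxl:prefLabel taxa:{taxon_id}-label ;",  # label as a resource (SKOS-XL)
        f"    skos:broader taxa:{broader_id} ;",          # hierarchical relation
        f"    skos:relatedMatch ranks:{rank_id} .",       # mapping into the rank scheme
        "",
        f"taxa:{taxon_id}-label a skosxl:Label ;",
        f"    skosxl:literalForm {turtle_literal(name)} .",
        "",
        f"ranks:{rank_id} a skos:Concept ;",
        "    skos:inScheme ranks:scheme .",
    ]
    return header + "\n" + "\n".join(body) + "\n"


print(taxon_turtle("puma-concolor", "Puma concolor", "puma", "species"))
```
</pre>
Keeping the ranks in their own concept scheme means the rank list (species, genus, family, etc.) can be published and reused independently of the taxon hierarchy.<br>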

    Certainly this is not the only way to go, but it is rather simple
    and easy to understand, since SKOS is widely used in the Linked
    Data community. This might be called "Simple Darwin Core" and would
    leave room for more complex ontology approaches beyond it.<br>

    <br>

    Looking forward to the discussion,<br>

    Thomas<br>

    <br>

    [1] <a class="moz-txt-link-freetext" href="http://lod-cloud.net">http://lod-cloud.net</a><br>

    [2] <a class="moz-txt-link-freetext"

      href="http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/">http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/</a><br>

    [3] <a class="moz-txt-link-freetext"

      href="http://lod.geospecies.org/ses/v6n7p.rdf">http://lod.geospecies.org/ses/v6n7p.rdf</a><br>

    <br>

    <br>

    Peter J. DeVries<br>

    <br>

    On 07.10.2010 19:29, Blum, Stan wrote:

    <blockquote cite="mid:C8D3518D.B789%25sblum@calacademy.org"

      type="cite">


      <title>Re: [tdwg-content] Idea for Discussion,Differentiating

        between "type's" of identifiers</title>

      <font face="Calibri, Verdana, Helvetica, Arial"><span

          style="font-size: 11pt;">Hi Steve,<br>

          <br>

          Sorry, I missed your message below (as well as your response

          to Roger) before I sent my reply about the utility of an RDF

          guide for DwC.  Obviously, I think it’s a great idea.  Within
          the “normal” TDWG process, this should be done as
          a Task Group.  I could help you draft a charter for that,
          which would then need to be reviewed by the TAG and Exec.
          Once approved, we would put the charter up on the web site
          and do our best to provide any other resources that would help
          speed the task.  I don’t mean to slow you down.  The charter
          doesn’t have to be elaborate.  Its function is to let others

          in TDWG and beyond know that this task is proceeding, who to

          contact, how to get involved, etc.  It also gives you the

          backing of the TDWG community.<br>

          <br>

          Let me know if you’d like to pursue this.<br>

          <br>

          -Stan<br>

          <br>

          <br>

          On 10/7/10 7:41 AM, "Steve Baskauf" &lt;<a

            moz-do-not-send="true" href="mailto:steve.baskauf@vanderbilt.edu">steve.baskauf@vanderbilt.edu</a>&gt;

          wrote:<br>

          <br>

        </span></font>

      <blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span

            style="font-size: 11pt;">I agree that it is best to avoid a

            proliferation of terms and I agree that it is best to keep

            Darwin Core technology independent to the maximum extent

            possible.  However, I think that the case of facilitating

            HTTP URIs is a special one because of the requirements of

            GUIDs/Persistent Identifiers.  Both the TDWG and GBIF

            guidelines, as they currently stand, say that GUIDs must

            be resolvable, that in their resolution they must return

            RDF, and that the RDF has to be in an XML format.  Like it

            or not, that is what we have.  Given the amount of time that

            it seems to have taken to settle on that much, I think it is

            best for us to decide to live with it, warts and all, rather

            than re-opening the discussion and delaying the

            implementation of GUIDs for another five years.  <br>

            <br>
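One common way for a client to ask a resolvable GUID for RDF/XML is HTTP content negotiation. The following Python sketch assumes that mechanism: it builds a GET request with an Accept header of application/rdf+xml. Whether a given resolver honours that header is not guaranteed, and the example.org URI is a placeholder.<br>
<pre>
```python
import urllib.request

# Assumes the GUID resolver supports HTTP content negotiation.
RDF_XML = "application/rdf+xml"


def rdf_request(guid_uri):
    """Build a GET request that asks the GUID's server for RDF/XML."""
    return urllib.request.Request(guid_uri, headers={"Accept": RDF_XML})


def fetch_rdf(guid_uri, timeout=10):
    """Dereference an HTTP GUID and return (final URL, content type, body).

    urlopen follows redirects (e.g. a 303 See Other to an RDF document),
    so the final URL may differ from the GUID itself.
    """
    with urllib.request.urlopen(rdf_request(guid_uri), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get_content_type(), resp.read()
```
</pre>
A client could compare the returned content type with application/rdf+xml before parsing; fetch_rdf performs a real network request.<br>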

            Given that assumption, there needs to be within Darwin Core

            some way to support this particular "technology" (Linked

            Data, RDF/XML) even if we don't do "special" things to

            support other technologies such as LSID, DOI, etc.  The

            point is well taken that most of those other technologies

            have mechanisms for turning their identifiers into URIs and

            the aforementioned guidelines lay out how owl:sameAs can be

            used within the RDF to associate the non-HTTP-resolvable

            forms with the URIs.  Based on my admittedly limited

            experience with trying to write RDF using Darwin Core terms,

            I think that in most cases appropriate terms already exist
            for getting the job done.  What may be lacking are
            concrete examples and community consensus on what terms to

            use for what.  I also think that there are probably some

            "ID" terms where it isn't really very important (from an RDF

            point of view) that there exist both a URI form and a text

            string form.  I'm thinking of something like

            dwc:identificationID, which is most likely to be needed to

            allow a machine to make a connection between some resource

            and its identification.  The machine isn't going to care if

            there is a human-readable version.  In contrast, something

            like dwc:collectionID is likely to need both a URI version

            (e.g. a proxied version of the BCI LSID) for the machines and

            a string version (the name of the collection as it would be

            displayed) for humans.  I think that trying to make

            example/template RDF for various types of resources will

            help make it clear in which cases one version (URI), the

            other (string), or both are actually necessary.<br>

            <br>
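As a sketch of the pattern described above (an HTTP URI for machines, owl:sameAs pointing to the non-HTTP form, and a human-readable string), the following Python snippet builds the RDF/XML with the standard library's ElementTree. The proxy URI, LSID and collection name are hypothetical placeholders, not real identifiers.<br>
<pre>
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def describe(http_uri, other_ids, label):
    """RDF/XML for a resource: HTTP URI as subject, owl:sameAs to other
    identifier forms, and an rdfs:label holding the human-readable string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": http_uri})
    for other in other_ids:
        ET.SubElement(desc, f"{{{OWL}}}sameAs", {f"{{{RDF}}}resource": other})
    ET.SubElement(desc, f"{{{RDFS}}}label").text = label
    return ET.tostring(root, encoding="unicode")


# Hypothetical identifiers for a collection record.
lsid = "urn:lsid:example.org:collection:1234"
print(describe("http://lsid-proxy.example.org/" + lsid, [lsid], "Example Herbarium"))
```
</pre>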

            I "volunteered" a couple of weeks ago to have a go at writing

            an RDF guide for Darwin Core.  I am still willing to do

            this, although I'm still catching up at work after

            being at the TDWG meeting.  However, next week we have fall

            break and I will make it a priority to come up with a draft

            which can be the subject of discussion.  As a part of this

            process, I think it would be good to create one or more

            "boilerplate" RDF files for the various kinds of resources

            that are likely to be identified with GUIDs (e.g.

            Occurrences, Taxa, etc.).  This can also be a subject of

            discussion and I think it will help to clarify what will

            meet the actual needs that we have discussed in this thread.

             I have a pretty clear picture of what I think Occurrence

            RDF should look like. I'm going to have to depend on Pete

            and others to deal with the taxonomy part.<br>

            <br>

            Steve<br>

            <br>

            Markus Döring wrote: <br>

          </span></font>

        <blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span

              style="font-size: 11pt;"> <br>

              Steve, Pete,<br>

              <br>

              I'd like to draw your attention to a basic Darwin Core
              design pattern. DwC has the goal of being technology
              independent by simply providing a list of abstract terms
              one can use in various arenas such as XML, RDF, XHTML, CSV,
              etc. And even within those there might be various ways of
              using them (e.g. we have a normalised and a simple flat
              XML schema). That's why we should have a guideline for each
              of them on how to use the terms. We currently lack such a
              guideline for RDF, hence this debate.<br>

              <br>

              Whether scientificName is a literal string or a complex
              object shouldn't matter; it is defined to be a scientific
              name. In RDF, such a DwC property could hold either a literal
              string or the URI of a name resource given via rdf:resource
              (potentially with an rdfs:label).<br>

              <br>
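A consumer following this convention has to accept both forms. The Python sketch below (standard-library ElementTree only) reads every dwc:scientificName in an RDF/XML document and reports whether it is a literal or an rdf:resource; for a resource it also picks up an rdfs:label when the target is described in the same document. The convention is the one proposed in this message, not an adopted guideline, and for brevity the sketch only looks at rdf:Description nodes.<br>
<pre>
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DWC = "http://rs.tdwg.org/dwc/terms/"


def scientific_names(rdf_xml):
    """Return (kind, value, label) for each dwc:scientificName element.

    kind is "literal" for text content and "resource" for an rdf:resource
    reference; label is the rdfs:label of that resource when the same
    document describes it (rdf:Description nodes only), else None.
    """
    root = ET.fromstring(rdf_xml)
    labels = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        label = desc.find(f"{{{RDFS}}}label")
        if about is not None and label is not None:
            labels[about] = label.text
    results = []
    for prop in root.iter(f"{{{DWC}}}scientificName"):
        target = prop.get(f"{{{RDF}}}resource")
        if target is not None:
            results.append(("resource", target, labels.get(target)))
        else:
            results.append(("literal", (prop.text or "").strip(), None))
    return results
```
</pre>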

              In my mind, we have already diluted that idea a little by
              introducing many ID terms. We could just as well have used
              scientificName in XML to hold an identifier for that name.
              URIs tell you what kind of identifier they are by their
              prefix (though not necessarily how to resolve them), so you
              can easily detect a UUID, LSID, http(s) URL, FTP URL, or DOI
              and apply the conventional resolution mechanism. The hardest
              problem is local IDs and other plain identifiers; it was
              mainly for those that we created the ID terms (at least in
              my mind). I feel rather uncomfortable discussing the
              introduction of specific DwC terms for each type of ID.
              Maybe we should remove all ID terms from DwC and use the
              technology-specific guidelines to specify them? If you
              really think having all those ID terms for RDF is a good
              thing, I would feel much more comfortable going down that
              route than diluting DwC by adding more and more rather
              redundant terms. The abstract concept is key to a DwC term,
              not the actual data type forced by the technology you are
              using it with. Would you want several date terms for various
              date formats? In fact, we already do that to some degree
              (eventDate, eventTime, year, month, day, verbatimEventDate),
              and I have always felt this is not a good idea. There are
              also a number of verbatimXXX terms in DwC that contradict
              this pattern. <br>

              <br>
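The prefix-based detection described above is simple to sketch. The Python function below is an illustration with simplified rules (for example, a bare DOI is matched only by the common 10.NNNN/suffix pattern); anything it cannot recognise falls through to "local", which is exactly the hard case mentioned above.<br>
<pre>
```python
import re
import uuid

# Simplified rules for illustration; not a complete identifier grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+$")


def classify_identifier(value):
    """Guess an identifier's type from its prefix or syntax."""
    ident = value.strip()
    low = ident.lower()
    if low.startswith("urn:lsid:"):
        return "lsid"
    if low.startswith("urn:uuid:"):
        return "uuid"
    if low.startswith(("http://", "https://")):
        return "http"
    if low.startswith("ftp://"):
        return "ftp"
    if low.startswith("doi:") or DOI_PATTERN.match(ident):
        return "doi"
    try:
        uuid.UUID(ident)  # bare UUID: 32 hex digits, with or without hyphens
        return "uuid"
    except ValueError:
        return "local"  # no recognisable scheme: a local or plain identifier
```
</pre>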

              Speaking of new DwC terms: in the examples given,
              properties like "hasScientificName" are not strictly
              correct DwC terms; the term is simply scientificName. I think
              it would be fine for the RDF guidelines to establish the
              convention of using hasDwcTerm instead of dwcTerm; this is
              exactly what an RDF guideline is for. On the other hand, I
              am sure this only applies to some terms; recordedBy, for
              example, is likely to remain as it is. It's unclear to me
              what is really best to do. Always stick to the original
              DwC terms? Refine them through some RDFS or OWL schema and
              define the relation to the original term? Should we still
              use the same namespace in that case?<br>

              <br>

              As an RDF beginner, even after a few years of exposure, I
              wonder if we can't simply stick to the non-ID terms and use
              them either as literals or with a URI pointer. Since in the
              RDF world a resolvable HTTP URI is really required for
              resource relations to work, why not simply mandate this in
              the guidelines? If you only happen to have non-resolvable
              URIs like LSIDs or DOIs, the guidelines should ask you to
              use proxied versions, knowing that otherwise RDF frameworks
              and LOD conventions will break. On the resolving side, one
              could always include such URNs with owl:sameAs (or something
              similar), I believe. But how many non-resolvable IDs with no
              matching HTTP counterpart are really out there yet?<br>

              <br>
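A minimal sketch of the "proxied version" idea: map an LSID or DOI to an HTTP URI that RDF tools can dereference, while the original string can still be attached with owl:sameAs. https://doi.org/ is the standard DOI resolver; the LSID proxy base is a hypothetical placeholder, since the thread does not name one.<br>
<pre>
```python
from urllib.parse import quote

# Hypothetical LSID proxy base; substitute whatever HTTP proxy is actually used.
LSID_PROXY = "http://lsid-proxy.example.org/"
DOI_RESOLVER = "https://doi.org/"


def resolvable_uri(identifier, lsid_proxy=LSID_PROXY):
    """Return an HTTP(S) URI for an identifier, proxying LSIDs and DOIs."""
    ident = identifier.strip()
    low = ident.lower()
    if low.startswith(("http://", "https://")):
        return ident  # already resolvable as-is
    if low.startswith("urn:lsid:"):
        return lsid_proxy + ident
    if low.startswith("doi:"):
        # Percent-encode characters that are unsafe in a URL path, keep "/".
        return DOI_RESOLVER + quote(ident[4:], safe="/")
    raise ValueError(f"no known HTTP form for {identifier!r}")
```
</pre>
The returned URI would serve as the rdf:about subject, with the original LSID or DOI kept in an owl:sameAs statement on the resolving side.<br>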

              - Markus<br>

              <br>

              <br>

              On Oct 6, 2010, at 9:02, Peter DeVries wrote:<br>

              <br>

                <br>

               <br>

            </span></font>

          <blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span

                style="font-size: 11pt;"> <br>

                Hi Steve,<br>

                <br>

                You are probably right that it might be best to use
                rdfs:label, but I am thinking we might be able to get
                the same result by defining the string variants as
                subproperties of rdfs:label.<br>
                <br>
                This would make them rdfs:label properties, but a special
                kind of rdfs:label.<br>
                <br>
                This is one of those things that I would test with
                Sindice and URIburner to see if they interpret these
                correctly.<br>
                <br>
                This would require a live vocabulary that Sindice could
                look at to determine that hasScientificName is to be
                treated as an rdfs:label.<br>

                <br>

                - Pete<br>

                <br>
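The subproperty idea relies on two RDFS entailment rules: rdfs:subPropertyOf is transitive (rdfs5), and a triple using a subproperty also holds for its superproperty (rdfs7). The naive forward-chaining sketch below, in Python over plain (subject, predicate, object) tuples, shows the effect. The hasScientificName URI is hypothetical, and whether a particular service applies these rules is exactly what the message proposes to test.<br>
<pre>
```python
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace for the proposed term; not an existing Darwin Core URI.
HAS_SCIENTIFIC_NAME = "http://example.org/terms/hasScientificName"


def rdfs_subproperty_closure(triples):
    """Apply rdfs5 (transitivity) and rdfs7 (inheritance) until nothing new appears."""
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBPROPERTY_OF}
        # rdfs5: p subPropertyOf q and q subPropertyOf r give p subPropertyOf r
        new = {(p, SUBPROPERTY_OF, r) for p, q in sub for q2, r in sub if q == q2}
        # rdfs7: s p o and p subPropertyOf q give s q o
        new |= {(s, q, o) for s, p, o in graph for p2, q in sub if p == p2}
        if new.issubset(graph):
            return graph
        graph |= new


vocabulary = {(HAS_SCIENTIFIC_NAME, SUBPROPERTY_OF, RDFS_LABEL)}
data = {("http://example.org/occurrence/1", HAS_SCIENTIFIC_NAME, "Puma concolor")}
closed = rdfs_subproperty_closure(vocabulary | data)
print(("http://example.org/occurrence/1", RDFS_LABEL, "Puma concolor") in closed)
```
</pre>
A production system would use an RDF library and its reasoner rather than this loop; the point is only that a generic client sees an rdfs:label once the vocabulary declaring the subproperty is available.<br>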

                On Mon, Oct 4, 2010 at 10:41 AM, Steve Baskauf &lt;<a
                  moz-do-not-send="true"
                  href="mailto:steve.baskauf@vanderbilt.edu">steve.baskauf@vanderbilt.edu</a>&gt;
                 wrote:<br>

                Although this specific example deals with taxonomic name

                identifiers, it is related to a previous discussion on

                this list about how we should use the dwc:xxxxxID terms

                and other terms (such as recordedBy and identifiedBy)

                that could have either a string (literal) or URI form.

                 While I don't really want to see an unnecessary
                proliferation of Darwin Core terms, I think that in the
                interest of clarity (particularly where RDF is involved)
                there should either be multiple terms that make it clear
                what form of identifier is expected, or else an
                understanding that in RDF the default for such a term is
                a URI, which would then have an rdfs:label property
                holding the string form.  I think the former
                would be preferable to the latter.  <br>

                <br>

                I came to this opinion when trying to write RDF

                describing an herbarium specimen.  The collector should

                be the dwc:recordedBy property of the specimen.

                 Optimally, there would be a database in which known

                collectors were assigned URIs so that "Glen N. Montz",

                "Glen Montz", "G. N. Montz", etc. would all be different

                labels for the same resource.  However, realistically,

                I'm not going to drop what I'm doing to set up such a

                database (even if I were capable of doing it, which I'm

                not).  So I ended up just writing it as

                &lt;dwc:recordedBy&gt;Glen N.

                Montz&lt;/dwc:recordedBy&gt; even though I knew it
                probably wasn't the best choice.  In a large Occurrence

                database that was compiled from the RDF created by a lot

                of people, there might end up being a mixture of strings

                and URIs for dwc:recordedBy properties of the specimens.

                 It seems to me like it would be better to have

                properties like dwc:recordedBy for strings and

                dwc:recordedByURI for a corresponding URI (and I suppose
                dwc:recordedByLSID if anyone wants to use it).  Of course, this

                would require a number of term additions to DwC and

                clarification in the DwC documentation that the generic

                version was intended for strings.  <br>

                <br>
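If the string/URI split were adopted, an aggregator receiving mixed dwc:recordedBy values could route them with a simple syntactic test. In this Python sketch, recordedByURI is the term proposed in this message (hypothetical, not part of Darwin Core), and only absolute http(s) URIs are treated as URIs.<br>
<pre>
```python
from urllib.parse import urlparse


def split_recorded_by(values):
    """Route raw recordedBy values to a string bucket or a URI bucket.

    "recordedByURI" is the proposed (hypothetical) term from this thread.
    """
    routed = {"recordedBy": [], "recordedByURI": []}
    for raw in values:
        value = raw.strip()
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            routed["recordedByURI"].append(value)
        else:
            routed["recordedBy"].append(value)  # e.g. "Glen N. Montz"
    return routed
```
</pre>
This does nothing about the harder problem that "Glen N. Montz", "Glen Montz" and "G. N. Montz" remain three different strings; that still needs a shared database of collector URIs.<br>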

                With respect to the example:<br>

                <br>

                &lt;dwc:hasScientificNameLSID

rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/&gt;<br>

                I think you are right that (with the possible exception

                of rdfs:seeAlso) there is an expectation that an

                rdf:resource attribute will be a resolvable URI that

                produces RDF.  So <br>

&lt;dwc:hasScientificNameLSID&gt;urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010&lt;/dwc:hasScientificNameLSID&gt;<br>

                is probably better.<br>

                <br>
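The expectation described here can be checked mechanically. This Python sketch lists every rdf:resource value in an RDF/XML document that is not an http(s) URI, skipping rdfs:seeAlso as the possible exception mentioned above. The set of exempt properties is an assumption and could be extended (for example to owl:sameAs).<br>
<pre>
```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
RDFS_SEE_ALSO = "{http://www.w3.org/2000/01/rdf-schema#}seeAlso"


def non_http_resources(rdf_xml, exempt=(RDFS_SEE_ALSO,)):
    """Return (property, target) pairs whose rdf:resource is not http(s).

    The exempt tuple (rdfs:seeAlso by default) is an assumption.
    """
    flagged = []
    for element in ET.fromstring(rdf_xml).iter():
        target = element.get(RDF_RESOURCE)
        if target is None or element.tag in exempt:
            continue
        if not target.lower().startswith(("http://", "https://")):
            flagged.append((element.tag, target))
    return flagged
```
</pre>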

                Steve<br>

                <br>

                <br>

                Peter DeVries wrote:<br>

                    <br>

                 <br>

              </span></font>

            <blockquote><font face="Calibri, Verdana, Helvetica, Arial"><span

                  style="font-size: 11pt;"> <br>

                  I have been thinking about the following pattern, in
                  part after looking at the GBIF vocabulary.<br>

                  <br>

                  I am not sure if it is even a good idea but might be

                  worth some discussion.<br>

                  <br>

                  For those fields that have both a string and an "ID"
                  form, the following pattern might be useful:<br>

                  <br>

                  hasScientificName = string form<br>

                  hasScientificNameURI = Resolvable LOD compliant

                  identifier<br>

                  hasScientificNameLSID = LSID identifier, which could be
                  made resolvable once you add the "http:proxy" prefix, etc.<br>

                  <br>

                  This allows all three forms to be included if desired;
                  it also provides a hint as to how the field should be
                  interpreted or resolved.<br>

                  <br>

                  One group could also provide a mapping service, so that
                  each record would not need to include all three forms
                  while still allowing systems to find the matching LSID
                  for a given URI or vice versa.<br>

                  <br>

                  My concern was that it would be difficult to infer how

                  a scientificNameID should be interpreted by other

                  systems.<br>

                  <br>

                  Is it an LSID, a URI, a UUID, etc.?<br>

                  <br>

                  This impacts the structure of the RDF.<br>

                  <br>

                  * Note that the actual identifiers might not be
                  correct; the example below is more about the form of
                  the RDF.<br>
                  * For instance, I don't think it is correct to treat
                  the COL LSID as just a name string.<br>
                  * Also, in this example the GNI name does not exactly
                  match the string name.<br>

                  <br>

                  &lt;dwc:hasScientificName&gt;Puma concolor (Linnaeus

                  1771)&lt;/dwc:hasScientificName&gt;<br>

                  &lt;dwc:hasScientificNameURI rdf:resource="<a
                    moz-do-not-send="true"
href="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8">http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8</a>"
                  /&gt;<br>

                  &lt;dwc:hasScientificNameLSID

rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/&gt;<br>

                  <br>

                  Some systems may choke on the LSID form, assuming that
                  an rdf:resource value can be resolved with a standard
                  (HTTP) resolution mechanism.<br>
                  <br>
                  So it might be best to use this form:<br>

                  <br>

&lt;dwc:hasScientificNameLSID&gt;urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010&lt;/dwc:hasScientificNameLSID&gt;<br>

                  <br>
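A Python sketch that emits the three-property pattern above with ElementTree, writing the URI form as rdf:resource and the LSID form as a literal. hasScientificName, hasScientificNameURI and hasScientificNameLSID are proposals from this thread rather than existing Darwin Core terms, and the occurrence URI is a hypothetical placeholder.<br>
<pre>
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"  # namespace used in the thread's examples
ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)


def name_properties(subject_uri, name=None, name_uri=None, name_lsid=None):
    """RDF/XML with whichever of the three proposed (hypothetical) name forms exist."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject_uri})
    if name is not None:
        ET.SubElement(desc, f"{{{DWC}}}hasScientificName").text = name
    if name_uri is not None:
        # Resolvable HTTP URI: safe to expose as rdf:resource.
        ET.SubElement(desc, f"{{{DWC}}}hasScientificNameURI", {f"{{{RDF}}}resource": name_uri})
    if name_lsid is not None:
        # LSID kept as a literal so RDF tools do not try to dereference it.
        ET.SubElement(desc, f"{{{DWC}}}hasScientificNameLSID").text = name_lsid
    return ET.tostring(root, encoding="unicode")


print(name_properties(
    "http://example.org/occurrence/1",  # hypothetical subject URI
    name="Puma concolor (Linnaeus 1771)",
    name_uri="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8",
    name_lsid="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010",
))
```
</pre>
Omitting an argument simply leaves that property out, so a record can carry one, two or all three forms.<br>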

                  - Pete<br>

                  <br>

----------------------------------------------------------------<br>

                  Pete DeVries<br>

                  Department of Entomology<br>

                  University of Wisconsin - Madison<br>

                  445 Russell Laboratories<br>

                  1630 Linden Drive<br>

                  Madison, WI 53706<br>

                  TaxonConcept Knowledge Base / GeoSpecies Knowledge

                  Base<br>

                  About the GeoSpecies Knowledge Base<br>

------------------------------------------------------------<br>

                        <br>

                   <br>

                </span></font></blockquote>

            <font face="Calibri, Verdana, Helvetica, Arial"><span

                style="font-size: 11pt;"> <br>

              </span></font></blockquote>

        </blockquote>

      </blockquote>


    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Thomas Bandholtz, <a class="moz-txt-link-abbreviated" href="mailto:thomas.bandholtz@innoq.com">thomas.bandholtz@innoq.com</a>, <a class="moz-txt-link-freetext" href="http://www.innoq.com">http://www.innoq.com</a> 

innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany

Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491

</pre>

  </body>

</html>