Re: [tdwg-content] [tdwg-tag] Idea for Discussion, Differentiating between "type's" of identifiers

19 Oct 2010

I agree with Bob that solutions that might work well in the LOD Cloud might
not be optimal for some uses.

In one of my examples I have two links to a particular image because for
some services I need to use foaf:depicts to get the image to display
properly.

For instance this example:
http://sig.ma/search?pid=88b8af943c0734ae3c2322e0d6787417

What I would prefer is a widely accepted solution that allowed me to tie a
photograph, as supporting documentation, to a specimen and an occurrence
record.

A lot of the issues with FOAF etc. are being worked out, and there is enough
overlap between the LOD community and the TDWG community that it often makes
sense to ask them, "This is what we need; how would you recommend we do
this?"

That was the procedure I followed to get an "Area" solution that supports a
radius and works with the existing geo vocabulary.

You might get suggestions that are not exactly what you are looking for. You
might get conflicting opinions.

However, the end result is usually a set of well informed suggestions.

What Bob may not be appreciating is that by working within the greater LOD
community you get the same kinds of benefits that you often get from other
open source projects - data sets, tools and documentation that can be
extremely valuable. I would also argue that some of the techniques and
technologies used by the LOD community scale much better, enable more
efficient data harvesting, and are, in many ways, easier for small groups to
implement than other technologies.

I mentioned this to Steve in a separate email, but to some extent the first
step is to determine the kinds of queries you want to be able to make and
then use that as a guide for how to design the RDF.

I believe that I can design something that works well for the queries that I
need and works well in the LOD cloud. What is not clear is whether TDWG will
choose to adopt any of this.

Respectfully,

- Pete

On Tue, Oct 19, 2010 at 8:21 AM, Bob Morris <morris.bob@gmail.com> wrote:
...
It is very important to keep the words "Linked Data" and "Linked Open
Data" clearly in the conversation and properly used.  Neither of them
is equivalent to the less well-defined "Semantic Web".
Whether temporarily or not, neither LD nor LOD addresses some specific
use cases that are important for other semantic applications. As an
example, some of the recommended practices for LOD do not currently
support tractable reasoning either fundamentally or with current
reasoners. Intractable reasoning means, among other things, that it is
possible to launch queries that will never complete, and for which it
is not possible to know in advance whether that is the case or not.
The current conversation is treading on rather deep issues, some of
which are on the bleeding edge of Knowledge Representation research.
Premature decisions or KR-naive decisions will likely revisit the
history of Darwin Core itself. That is, a tremendously useful solution
will go a long way, provoke misuse along the way,  and then come up
against a stone wall requiring a major multi-year re-architecture
effort, perhaps with huge expense to retrofit to the previous uses.
For some insight into what the problems are,  one could do worse than
read the thread that begins(?) with
http://lists.w3.org/Archives/Public/public-lod/2010Jul/0330.html That
thread addresses the current wobbly state of the FOAF ontology, which
to my knowledge still remains without an agreed upon form that
guarantees tractable reasoning.
Also,  see especially the bullet points on "Most Notably"  in the
Objectives of 1st Workshop on Knowledge Injection into and Extraction
from Linked Data
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=10142&copyownerid=11212
Bob Morris
On Fri, Oct 8, 2010 at 8:07 AM, Thomas Bandholtz
<thomas.bandholtz@innoq.com> wrote:
Hi all,
having just joined this list, I find it a great idea to have such an RDF
Task Group.
I am going to publish a little species catalog for the Federal Environment
Agency in Germany as Linked Data, and I am looking for the best way to
express it in RDF.
In the Linked Data cloud [1] I find several related contributions, such as
Geospecies, TaxonConcept, EUNIS, and more.
Comparing these approaches I prefer the idea of reusing SKOS [2] labels and
hierarchical relations, as in the Geospecies example [3].
It might be a good idea to apply the SKOS XL extension as well to go deeper
into the taxon name properties.
Finally, I would add the taxon ranks as a distinct concept scheme and link
them to the taxon concepts with a mapping relation.
Certainly this is not the only way to go, but it is rather simple and will
be easily understood as SKOS is quite common in the Linked Data community.
This might be called "Simple Darwin Core" and give room for more complex
ontology approaches beyond that.
Looking forward to discussion,
Thomas
[1] http://lod-cloud.net
[2] http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
[3] http://lod.geospecies.org/ses/v6n7p.rdf
Peter J. DeVries
On 07.10.2010 at 19:29, Blum, Stan wrote:
Hi Steve,
Sorry, I missed your message below (as well as your response to Roger)
before I sent my reply about the utility of an RDF guide for DwC.
Obviously, I think it’s a great idea. To do this within the “normal” TDWG
process, this should be done as a Task Group. I could help you draft a
charter for that, which would then need to be reviewed by the TAG and Exec.
Once approved, we would put the charter up on the web site, and do our best
to provide any other resources that would help speed the task. I don’t mean
to slow you down. The Charter doesn’t have to be elaborate. Its function
is to let others in TDWG and beyond know that this task is proceeding, who
to contact, how to get involved, etc. It also gives you the backing of the
TDWG community.
Let me know if you’d like to pursue this.
-Stan
On 10/7/10 7:41 AM, "Steve Baskauf" <steve.baskauf@vanderbilt.edu>
wrote:
I agree that it is best to avoid a proliferation of terms and I agree that
it is best to keep Darwin Core technology independent to the maximum extent
possible. However, I think that the case of facilitating HTTP URIs is a
special one because of the requirements of GUIDs/Persistent Identifiers.
Both the TDWG and GBIF guidelines, such as they currently stand, say that
GUIDs must be resolvable, that in their resolution they must return RDF, and
that the RDF has to be in an XML format. Like it or not, that is what we
have. Given the amount of time that it seems to have taken to settle on
that much, I think it is best for us to decide to live with it, warts and
all, rather than re-opening the discussion and delaying the implementation
of GUIDs for another five years.
Given that assumption, there needs to be within Darwin Core some way to
support this particular "technology" (Linked Data, RDF/XML) even if we don't
do "special" things to support other technologies such as LSID, DOI, etc.
The point is well taken that most of those other technologies have
mechanisms for turning their identifiers into URIs and the aforementioned
guidelines lay out how owl:sameAs can be used within the RDF to associate
the non-HTTP-resolvable forms with the URIs. Based on my admittedly limited
experience with trying to write RDF using Darwin Core terms, I think that in
most cases there already exist appropriate terms for getting the job done.
What may be lacking is concrete examples and community consensus on what
terms to use for what. I also think that there are probably some "ID" terms
where it isn't really very important (from an RDF point of view) that there
exist both a URI form and a text string form. I'm thinking of something
like dwc:identificationID, which is most likely to be needed to allow a
machine to make a connection between some resource and its identification.
The machine isn't going to care if there is a human-readable version. In
contrast, something like dwc:collectionID is likely to need both a URI
version (e.g. proxied version of the BCI LSID) for the machines and a string
version (the name of the collection as it would be displayed) for humans. I
think that trying to make example/template RDF for various types of
resources will help make it clear in which cases one version (URI), the
other (string), or both are actually necessary.
I "volunteered" a couple weeks ago to have a go at writing an RDF guide for
Darwin Core. I am still willing to do this, although I'm still getting
caught up at work from being at the TDWG meeting. However, next week we
have fall break and I will make it a priority to come up with a draft which
can be the subject of discussion. As a part of this process, I think it
would be good to create one or more "boilerplate" RDF files for the various
kinds of resources that are likely to be identified with GUIDs (e.g.
Occurrences, Taxa, etc.). This can also be a subject of discussion and I
think it will help to clarify what will meet the actual needs that we have
discussed in this thread. I have a pretty clear picture of what I think
Occurrence RDF should look like. I'm going to have to depend on Pete and
others to deal with the taxonomy part.
Steve
Markus Döring wrote:
Steve, Pete,
I'd like to draw your attention to a basic DarwinCore design pattern. DwC
has the goal of being technology independent by simply providing a list of
abstract terms one can use in various arenas such as xml, rdf, xhtml, csv
etc. And even within those there might be various ways of using them (e.g.
we have a normalised and a simple flat xml schema); that's why we should
have a guideline for each of them on how to use them. We are missing such a
guideline for rdf currently, hence this debate.
Whether scientificName is a literal string or some complex object shouldn't
matter - it's defined to be a scientific name. Such a dwc rdf property could
either hold a literal string or a url to some name rdf:resource (potentially
with an rdfs:label).
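As a rough illustration of the point above, a consumer can be written to accept either form of the same property. The sketch below uses only Python's standard library. The example.org URIs and the sample record are invented for illustration, and the function names are hypothetical rather than part of any Darwin Core tooling.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Invented sample: the same term used once as a literal and once as a pointer.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dwc="{DWC_NS}">
  <rdf:Description rdf:about="http://example.org/occurrence/1">
    <dwc:scientificName>Puma concolor</dwc:scientificName>
    <dwc:scientificName rdf:resource="http://example.org/name/puma-concolor"/>
  </rdf:Description>
</rdf:RDF>"""


def read_property(element):
    """Return ("uri", value) if the element points at a resource,
    otherwise ("literal", text)."""
    resource = element.get(f"{{{RDF_NS}}}resource")
    if resource is not None:
        return ("uri", resource.strip())
    return ("literal", (element.text or "").strip())


def scientific_names(rdf_xml):
    """Collect every dwc:scientificName value in document order."""
    root = ET.fromstring(rdf_xml)
    tag = f"{{{DWC_NS}}}scientificName"
    return [read_property(el) for el in root.iter(tag)]


if __name__ == "__main__":
    for kind, value in scientific_names(SAMPLE):
        print(kind, value)
```

Whether a guideline should allow both forms under one term is exactly the open question in this thread; the sketch only shows that telling them apart is mechanically simple in RDF/XML.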
With the introduction of many ID terms we have diluted that idea a little
already in my mind. We could have as well used scientificName in xml to hold
some identifier for that name. All URNs tell you what they are by their urn
prefix (not necessarily how to resolve them), so you can easily detect a
UUID, LSID, http(s) url, ftp, doi and apply the conventional resolution
mechanism. The hardest problem is the local ids and other plain identifiers.
For those mainly we created the ID terms (at least in my mind). I am feeling
rather uncomfortable discussing the introduction of specific dwc terms for
each type of id. Maybe we should remove all id terms in dwc and use the
specific guidelines to specify these? At least if you really think having
all those id terms for rdf is a good thing I would feel much more
comfortable going down this route instead of diluting dwc by adding more and
more rather redundant terms. The abstract concept is key to a dwc term, not
the actual data type forced by the technology you are using it with. Would
you want several date terms for various date formats? In fact we do that
already to some degree (eventDate, eventTime, year, month, day,
verbatimEventDate) and I always felt this is not a good idea. There are also
a number of verbatimXXX terms in dwc which also contradict this pattern.
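The prefix-based detection described above could look something like the following minimal Python sketch. The category names, the accepted DOI prefixes, and the rule that anything unrecognised counts as a "local" identifier are assumptions made for illustration, not an agreed TDWG convention.

```python
import re
from urllib.parse import urlsplit

# A bare UUID such as 6c3dc35f-d901-5cc5-b9c8-ad241069b9f8.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def identifier_kind(value):
    """Guess the kind of identifier from its prefix alone."""
    text = value.strip()
    lowered = text.lower()
    if lowered.startswith("urn:lsid:"):
        return "lsid"
    if lowered.startswith("urn:uuid:") or UUID_RE.match(text):
        return "uuid"
    if lowered.startswith("doi:") or lowered.startswith("info:doi/"):
        return "doi"
    scheme = urlsplit(text).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "ftp":
        return "ftp"
    # Anything else is treated as a local or plain identifier,
    # which is the hard case mentioned above.
    return "local"


if __name__ == "__main__":
    for example in (
        "urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010",
        "http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8",
        "Glen N. Montz",
    ):
        print(identifier_kind(example), example)
```

Note that this only says what an identifier claims to be, not how (or whether) it resolves.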
Talking about new dwc terms - in the examples given, a property like
"hasScientificName" is not strictly the correct dwc term, which is simply
scientificName. I think it would be fine to have the convention in the rdf
guidelines to use hasDwcTerm instead of dwcTerm; this is exactly what an rdf
guideline is for. On the flip side I am sure this only applies to some
terms; recordedBy for example is likely to remain as it is. It's unclear to
me what is best to do really. Always stick to the original dwc terms? Refine
them through some rdfs or owl schema and define the relation to the original
term? Should we still use the same namespace in this case?
As an rdf beginner, even after a few years of exposure, I wonder if we can't
simply stick to the non ID terms and use them either as literals or with a
uri pointer. As in the rdf world a resolvable http uri is really required
for resource relations to work, why not simply mandate this in the
guidelines? If you only happen to have non resolvable uris like lsids or
dois the guidelines should be asking you to use proxied versions, knowing it
will break rdf frameworks and lod conventions otherwise. On the resolving
side one could always include such urns with owl:sameAs (or something
similar), I believe. But how many non resolvable ids with no matching http
counterpart are really out there yet?
- Markus
On Oct 6, 2010, at 9:02, Peter DeVries wrote:
Hi Steve,
You are probably right that it might be best to use rdfs:label, but I am
thinking we might be able to get the same result by defining the string
variants as subproperties of rdfs:label.
This would make them an rdfs:label but a special kind of rdfs:label.
This is one of those things that I would test with Sindice and URIburner to
see if they interpret these correctly.
This would require a live vocabulary that Sindice could look at to determine
that hasScientificName is to be treated as an rdfs:label.
- Pete
On Mon, Oct 4, 2010 at 10:41 AM, Steve Baskauf
<steve.baskauf@vanderbilt.edu> wrote:
Although this specific example deals with taxonomic name identifiers, it is
related to a previous discussion on this list about how we should use the
dwc:xxxxxID terms and other terms (such as recordedBy and identifiedBy) that
could have either a string (literal) or URI form. Although I don't really
want to see an unnecessary proliferation of Darwin Core terms, I think that
in the interest of clarity (particularly where RDF is involved) there either
should be multiple terms that make it clear what form of identifier is
expected, or else there should be an understanding that in RDF the default
for such a term is a URI which would then have an rdfs:label property which
was the string form. I think the former would be preferable to the latter.
I came to this opinion when trying to write RDF describing an herbarium
specimen. The collector should be the dwc:recordedBy property of the
specimen. Optimally, there would be a database in which known collectors
were assigned URIs so that "Glen N. Montz", "Glen Montz", "G. N. Montz",
etc. would all be different labels for the same resource. However,
realistically, I'm not going to drop what I'm doing to set up such a
database (even if I were capable of doing it, which I'm not). So I ended up
just writing it as <dwc:recordedBy>Glen N. Montz</dwc:recordedBy> even
though I knew it probably wasn't the best thing. In a large Occurrence
database that was compiled from the RDF created by a lot of people, there
might end up being a mixture of strings and URIs for dwc:recordedBy
properties of the specimens. It seems to me like it would be better to have
properties like dwc:recordedBy for strings and dwc:recordedByURI for a
corresponding URI (and I suppose dwc:recordedByLSID if anyone wants to use
it). Of course, this would require a number of term additions to DwC and
clarification in the DwC documentation that the generic version was
intended for strings.
With respect to the example
<dwc:hasScientificNameLSID rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
I think you are right that (with the possible exception of rdfs:seeAlso)
there is an expectation that an rdf:resource attribute will be a resolvable
URI that produces RDF. So
<dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
is probably better.
Steve
Peter DeVries wrote:
I have been thinking about the following pattern, in part after looking at
the GBIF vocabulary.
I am not sure if it is even a good idea, but it might be worth some
discussion.
For those fields that have both a string and "ID" form maybe the following
pattern might be useful:
hasScientificName = string form
hasScientificNameURI = Resolvable LOD compliant identifier
hasScientificNameLSID = LSID identifier which could be resolvable once you
add the "http:proxy" etc.
This allows all three forms to be included if desired; it also provides a
hint as to how the field should be interpreted or resolved.
One group could also provide a mapping service so that each record does not
need to include all three forms, but would allow systems
to find the matching LSID for a given URI or vice versa.
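To make the last two ideas concrete, here is a minimal Python sketch of proxying an LSID and of a two-way URI/LSID lookup. The proxy base http://proxy.example.org/ is a placeholder, since no specific proxy is named here, and both function names are hypothetical. The URI and LSID are the ones used as examples in this thread and are paired only for illustration, not because they are known to denote the same thing.

```python
# Placeholder proxy base; substitute whatever proxy a guideline settles on.
PROXY_BASE = "http://proxy.example.org/"

COL_LSID = "urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"
GNI_URI = "http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8"


def proxied_lsid(lsid, proxy_base=PROXY_BASE):
    """Prefix an LSID with an HTTP proxy base so it can be dereferenced."""
    lsid = lsid.strip()
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return proxy_base.rstrip("/") + "/" + lsid


def build_lookup(pairs):
    """Build both directions of a URI <-> LSID mapping from (uri, lsid) pairs."""
    pairs = list(pairs)
    uri_to_lsid = {uri: lsid for uri, lsid in pairs}
    lsid_to_uri = {lsid: uri for uri, lsid in pairs}
    return uri_to_lsid, lsid_to_uri


if __name__ == "__main__":
    uri_to_lsid, lsid_to_uri = build_lookup([(GNI_URI, COL_LSID)])
    print(proxied_lsid(uri_to_lsid[GNI_URI]))
    print(lsid_to_uri[COL_LSID])
```

A real mapping service would sit behind a web interface and a shared data store; the dictionaries here only stand in for that lookup.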
My concern was that it would be difficult to infer how a scientificNameID
should be interpreted by other systems.
Is this an LSID, is it a URI, is it a UUID etc.?
This impacts the structure of the RDF.
* Note that the actual identifiers might not be correct; the example below
is more about the form of the RDF
* For instance, I don't think it is correct to see the COL LSID as
just a namestring
* Also in this example the GNI name does not exactly match the string name
<dwc:hasScientificName>Puma concolor (Linnaeus 1771)</dwc:hasScientificName>
<dwc:hasScientificNameURI rdf:resource="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8"/>
<dwc:hasScientificNameLSID rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
Some systems may choke on the LSID form, assuming that it uses a standard
resolution mechanism.
So it might be best to use this form:
<dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
- Pete
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------
--
Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)
-- 
----------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
Knowledge Base <http://lod.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
------------------------------------------------------------