RE: tdwg-tag Digest, Vol 22, Issue 5
Gregor wrote: "That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated."
I don't really mind what we call our terms but if they are coming from a non-hierarchical, controlled vocabulary (a class?) (http://rs.tdwg.org/ontology/voc/SPMInfoItems.rdf) don't we leave open the possibility that these may eventually be arranged into an OWL ontology?
Gregor wrote: "That would mean an InfoItem = tagged free-form text, and a Distribution and OrganismInteraction class, which could be similar, but don't have to be, and may grow into different directions."
So, both Gregor and Marcus are suggesting reducing the number of properties for an InfoItem to the following four (related properties in parentheses):
- organismInteraction (associatedTaxon)
- subjectTag (category, context, contextValue)
- region (contextOccurrence)
- info (hasContent, hasValue)
I like the simplicity and do not think we are losing too much.
Éamonn
-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of tdwg-tag-request@lists.tdwg.org
Sent: 10 October 2007 12:00
To: tdwg-tag@lists.tdwg.org
Subject: tdwg-tag Digest, Vol 22, Issue 5
Send tdwg-tag mailing list submissions to tdwg-tag@lists.tdwg.org
To subscribe or unsubscribe via the World Wide Web, visit http://lists.tdwg.org/mailman/listinfo/tdwg-tag or, via email, send a message with subject or body 'help' to tdwg-tag-request@lists.tdwg.org
You can reach the person managing the list at tdwg-tag-owner@lists.tdwg.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of tdwg-tag digest..."
Today's Topics:
1. Re: RE: tdwg-tag Digest, Vol 22, Issue 2 (Gregor Hagedorn)
2. Re: RE: tdwg-tag Digest, Vol 22, Issue 1 (Gregor Hagedorn)
----------------------------------------------------------------------
Message: 1
Date: Wed, 10 Oct 2007 09:48:47 +0200
From: "Gregor Hagedorn" g.m.hagedorn@gmail.com
Subject: Re: [tdwg-tag] RE: tdwg-tag Digest, Vol 22, Issue 2
To: tdwg-tag@lists.tdwg.org
Message-ID: 5ebbead70710100048v4de595sf594cd36a3f4830d@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
http://www.w3.org/TR/swbp-vocab-pub/ Gregor will especially like it. :-)
:-)
Indeed, the document convinced me that http+purls+content negotiation (which is the chosen protocol for the semantic web!) would have been a better choice than LSIDs. The document is referenced on several Wiki pages, e.g. http://wiki.tdwg.org/twiki/bin/view/GUID/CommunityPracticesForHttp-basedGUIDs; it helped me understand why the semantic web can be based on http at all and that the perceived problems that initially led us to opt for LSIDs probably do not exist.
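As a rough illustration of the content negotiation mentioned here: a client requests a single HTTP URI and states in the Accept header which representation it would like. The sketch below only builds such a request with Python's standard library and does not send it. The URI is the TaxonOccurrenceInteraction address quoted in this thread, and the helper name build_rdf_request is made up for this example; whether that server actually negotiates content is not confirmed here.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    # The Accept header carries the client's preferred representation;
    # a server doing content negotiation may return RDF to machines
    # and HTML to browsers for the same URI.
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


req = build_rdf_request("http://rs.tdwg.org/ontology/voc/TaxonOccurrenceInteraction")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Sending the request would be a separate step (for instance with urllib.request.urlopen), left out so the sketch runs without network access.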
Gregor
------------------------------
Message: 2
Date: Wed, 10 Oct 2007 10:00:20 +0200
From: "Gregor Hagedorn" g.m.hagedorn@gmail.com
Subject: Re: [tdwg-tag] RE: tdwg-tag Digest, Vol 22, Issue 1
To: tdwg-tag@lists.tdwg.org
Message-ID: 5ebbead70710100100j13d6a70ctd177ea11527fe30d@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On 10/10/07, Eamonn O Tuama eotuama@gbif.org wrote:
Can't we keep the model simple and mandate that we are offering a faceted classification, similar to a general tagging system like folksonomies? So a particular InfoItem might have content which pertains to both genetics and ecology and when tagged with those two categories, the only inference that can be drawn is that the content is relevant to both. In a search, that InfoItem would be returned for either of those categories, but is also very likely to be appropriate to someone interested in "ecological genetics" who would search on the categories "ecology + genetics" or vice versa.
That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated.
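The tagging behaviour described in this exchange can be sketched in a few lines. This is only an illustration under assumptions: the InfoItem class, the two search helpers and the sample texts are invented here and are not part of SPM. The point is that tags carry no relation to each other; a search either matches any of the requested tags or all of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InfoItem:
    """Hypothetical stand-in for an SPM InfoItem: free text plus flat tags."""
    text: str
    tags: set[str] = field(default_factory=set)


def search_any(items, wanted):
    """Return items that carry at least one of the wanted tags."""
    wanted = set(wanted)
    return [item for item in items if item.tags & wanted]


def search_all(items, wanted):
    """Return items that carry every one of the wanted tags."""
    wanted = set(wanted)
    return [item for item in items if wanted <= item.tags]


items = [
    InfoItem("Gene flow between island populations", {"genetics", "ecology"}),
    InfoItem("Soil preferences", {"ecology"}),
    InfoItem("Chromosome counts", {"genetics"}),
]

# The doubly tagged item is found under either category ...
print([i.text for i in search_any(items, ["ecology"])])
# ... and is the only hit for "ecology + genetics".
print([i.text for i in search_all(items, ["ecology", "genetics"])])
```

Nothing in this structure says whether the provider meant a union, an intersection or a refinement of one tag by the other; that decision sits entirely with the search.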
What worries me is that too much seems to be desired in terms of using the vocabulary, i.e. we have been discussing what the category (and also the context...) could all be used for, and as I tried to show in my talk, sometimes the semantics of what subclasses or modifies what seem to be reversed already in the examples. My goal is to make this clear.
The question remains what SPM is for. As you explain, it is more for finding information than algorithmically aggregating information. I understood the EOL/other taxon page use case to be about automatic aggregation.
Eamonn wrote: In either case, I find it hard to see how anything useful can be derived from such aggregations without human interpretation, e.g., someone preparing a species page for EOL might harvest multiple InfoItems and arrange/edit them as appropriate. Just being able to harvest data through the SPM is surely helpful.
But that would be a one-time copy process. I think it would be important to clarify whether this is the use case or not. If it is, I am completely in favor of Eamonn's "tagging approach".
Eamonn wrote: In certain cases, where the domain might be restricted (an invasive species group), the community may achieve more automated aggregation by agreeing on how to use certain categories, e.g., I can see benefit in being able to list multiple distributions of a particular species one after the other, especially if biologists are tracking this over time and the information is being continually updated.
I think it would be beneficial to define specific kinds of infoitems for these use cases (especially distribution and organism interaction), to make it clear how the information is to be interpreted, and how it is extended.
That would mean an InfoItem = tagged free-form text, and a Distribution and OrganismInteraction class, which could be similar, but don't have to be, and may grow into different directions.
Essentially, this would allow further classes to be added later, with an even more strongly differing structure for more highly structured data (categorical or quantitative data).
How about that?
Gregor
------------------------------
_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
End of tdwg-tag Digest, Vol 22, Issue 5 ***************************************
Gregor wrote: "That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated."
Eamonn wrote: I don't really mind what we call our terms but if they are coming from a non-hierarchical, controlled vocabulary (a class?) (http://rs.tdwg.org/ontology/voc/SPMInfoItems.rdf) don't we leave open the possibility that these may eventually be arranged into an OWL ontology?
I did not mean that, I meant that if *two* tags are added, that the relation of these tags should not be interpreted other than indicated by you. If two terms are added, one provider may intend a union, another an intersection, another a specification of the first by the second. I think we should make a recommendation that SPM is not meant to allow this interpretation detail.
What you say is that one could introduce a new term "ecological genetics" and then define it to be the intersection of ecology and genetics, using OWL statements. I believe you can not define in OWL how the repeated occurrence of a single attribute "Tag" with two values shall be interpreted semantically, right?
Gregor
So, both Gregor and Marcus are suggesting reducing the number of properties for an InfoItem to the following four (related properties in parentheses):
- organismInteraction (associatedTaxon)
- subjectTag (category, context, contextValue)
- region (contextOccurrence)
- info (hasContent, hasValue)
I would prefer to reduce it to a *single* highly generic property like "info", "data", "generics", and introduce the different classes as the possible types of this property:
<SPM>
  <info>
    <Text><content> sdfsdfsdf</content></Text>
  </info>
  <info>
    <OrganismInteraction>
      <interaction resource="pollination"/>
      <taxon resource="first taxon"/>
      <taxon resource="second taxon"/>
      <geoContext resource="Germany"/>
    </OrganismInteraction>
  </info>
  <info>
    <Distribution>
      <status resource="neobiota"/>
      <geoContext resource="Germany"/>
      <geoContext resource="Netherlands"/>
    </Distribution>
  </info>
</SPM>
A single attribute term seems to simplify extensibility in the future (other classes, e.g. a specific class to inform about available GenBank accessions, etc.).
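To make the extensibility argument concrete, here is a minimal sketch of how a consumer might read the example above: every statement sits under the same "info" property and the reader dispatches on the class of the child element. The handler functions and the HANDLERS table are assumptions made for this illustration, not part of any SPM draft.

```python
import xml.etree.ElementTree as ET

SPM_XML = """
<SPM>
  <info><Text><content>sdfsdfsdf</content></Text></info>
  <info>
    <OrganismInteraction>
      <interaction resource="pollination"/>
      <taxon resource="first taxon"/>
      <taxon resource="second taxon"/>
      <geoContext resource="Germany"/>
    </OrganismInteraction>
  </info>
  <info>
    <Distribution>
      <status resource="neobiota"/>
      <geoContext resource="Germany"/>
      <geoContext resource="Netherlands"/>
    </Distribution>
  </info>
</SPM>
"""


def read_text(node):
    return {"content": node.findtext("content")}


def read_interaction(node):
    return {
        "interaction": node.find("interaction").get("resource"),
        "taxa": [t.get("resource") for t in node.findall("taxon")],
        "regions": [g.get("resource") for g in node.findall("geoContext")],
    }


def read_distribution(node):
    return {
        "status": node.find("status").get("resource"),
        "regions": [g.get("resource") for g in node.findall("geoContext")],
    }


# One handler per class; supporting a new class means adding one entry.
HANDLERS = {
    "Text": read_text,
    "OrganismInteraction": read_interaction,
    "Distribution": read_distribution,
}


def read_spm(xml_text):
    records = []
    for info in ET.fromstring(xml_text).findall("info"):
        typed = info[0]  # the single typed child of <info>
        handler = HANDLERS.get(typed.tag)
        # Classes without a handler are kept by name only, so an older
        # consumer is not broken when new classes appear.
        records.append((typed.tag, handler(typed) if handler else None))
    return records


records = read_spm(SPM_XML)
for kind, data in records:
    print(kind, data)
```

A later class, such as one for sequence accessions, would only need another entry in the handler table; the enclosing property stays the same.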
Distribution: I suggest that Occurrence should be limited to the context of species distribution; I think it would be better to reserve Occurrence for actual occurrence as in the other TDWG standards using TaxonOccurrence. Distribution information of a species is a hypothetical, or synthetical information that could be accompanied by estimates of frequency, OccurrenceDensity etc. We could use the same term, of course, but my feeling is that we avoid misunderstandings if we separate terms here.
Furthermore, I would appreciate it if RDF collections were allowed, i.e. the info/data/generics attribute could be directly in SPM or in a collection container (SEQ, BAG). Again, this may be an argument for a single attribute, although a sequence of different attributes may also be possible in RDF, I don't really know.
Gregor
2+ thoughts.
Why is region/geography a special case and not covered like any other kind of context with a subjectTag? It could point to a polygon or TDWG geographic region or whatever.
The same could be argued for associatedTaxon. (I prefer associatedTaxon to organismInteraction - the two taxa concerned may not interact; we may just be saying they occur in the same habitat or that Taxon A has some shared characteristic with taxon B - though of course every atom in the universe does interact with every other in some way I suppose.)
What is the difference between tagging something with a geographic region and tagging it with a taxon?
If this is boiled down far enough we don't need the InfoItem as a container and can use common vocabularies (DublinCore) for most of this stuff.
<tdwg:SpeciesProfile rdf:about="http://my.guid.could.be.lsid.org">
  <tdwg:aboutTaxon rdf:resource="urn:lsid:of:some:taxon" />
  <tdwg:associatedTaxon rdf:resource="urn:lsid:of:some:other:taxon" />
  <dc:description>This is some text about how good this is to eat and other stuff.</dc:description>
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#cooking" />
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#Brazil" />
</tdwg:SpeciesProfile>
If we want to express Taxon-Taxon interactions Kevin Richards and I already came up with something to use for the HerbIMI LSID Authority
http://rs.tdwg.org/ontology/voc/TaxonOccurrenceInteraction
Note that this defines interactions between *occurrences* not taxa and the occurrences provide the context. I am not sure that this is the place to get into defining interactions other than in the most general way.
I feel we are washing around without the metric of a use case to test these ideas against.
The use case we started with was blocks of text that describe a particular aspect of a taxon in a particular context. That can be done really simply, perhaps with just DublinCore and only a couple of our own predicates, but we seem to be getting off the path and starting to overcomplicate things.
If you look at the DublinCore definition of "subject" it says: "The topic of the resource. Typically, the topic will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element." The coverage element says "Spatial characteristics of the intellectual content of the resource."
Could SPM just boil down to a "controlled vocabulary" for DublinCore metadata tags on chunks of text plus a predicate to indicate the taxon we are talking about? We could just do an applicability statement on how to tag them in HTML!
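A minimal sketch of what consuming such a Dublin Core tagged profile might look like, using only Python's standard library. The rdf and dc namespace URIs are the standard ones; the tdwg namespace is a placeholder invented for this example, and the sketch writes property values with rdf:resource, which is an assumption about how the example profile would be serialized.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
TDWG = "http://example.org/tdwg-spm#"  # placeholder namespace (assumption)

PROFILE = f"""
<tdwg:SpeciesProfile xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:tdwg="{TDWG}"
    rdf:about="http://my.guid.could.be.lsid.org">
  <tdwg:aboutTaxon rdf:resource="urn:lsid:of:some:taxon"/>
  <dc:description>This is some text about how good this is to eat.</dc:description>
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#cooking"/>
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#Brazil"/>
</tdwg:SpeciesProfile>
"""


def summarize(xml_text):
    """Pull out the taxon, the text block and the flat list of subject tags."""
    root = ET.fromstring(xml_text)
    resource = f"{{{RDF}}}resource"  # ElementTree spells names as {namespace}local
    return {
        "taxon": root.find(f"{{{TDWG}}}aboutTaxon").get(resource),
        "description": root.findtext(f"{{{DC}}}description"),
        "subjects": [s.get(resource) for s in root.findall(f"{{{DC}}}subject")],
    }


summary = summarize(PROFILE)
print(summary["taxon"])
print(summary["subjects"])
```

The subjects come back as one undifferentiated list: the reader sees "cooking" and "Brazil" side by side, with no statement of how the two relate.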
Roger
On 11 Oct 2007, at 16:28, Eamonn O Tuama wrote:
Gregor wrote: "That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated."
I don't really mind what we call our terms but if they are coming from a non-hierarchical, controlled vocabulary (a class?) (http://rs.tdwg.org/ontology/voc/SPMInfoItems.rdf) don't we leave open the possibility that these may eventually be arranged into an OWL ontology?
Gregor wrote: "That would mean an InfoItem = tagged free-form text, and a Distribution and OrganismInteraction class, which could be similar, but don't have to be, and may grow into different directions."
So, both Gregor and Marcus are suggesting reducing the number of properties for an InfoItem to the following four (related properties in parentheses):
- organismInteraction (associatedTaxon)
- subjectTag (category, context, contextValue)
- region (contextOccurrence)
- info (hasContent, hasValue)
I like the simplicity and do not think we are losing too much.
Éamonn
Roger writes:
Why is region/geography a special case and not covered like any other kind of context with a subjectTag? It could point to a polygon or TDWG geographic region or whatever.
The principal question to me is whether we simply want to have
tag = distribution
text = occurring in higher altitude
value = archibiota
value = visible-only-in-summer
value = high-altitude-distribution
value = italy
that is, the consumer has to figure out what the meaning of the categorical data is and what the relevance of them is with regard to distribution, or whether specific attributes structure this information so that a consumer can easily find information it is interested in, like
tag = distribution
text = occurring in higher altitude
status = archibiota
geoArea = italy
modifier = visible-only-in-summer
altitude = high-altitude-distribution
Structurally the first is clearly possible, but it seems to require a lot of semantic analysis to interpret, and my feeling is that it is liable to misinterpretation.
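The difference between the two variants can be shown from the consumer's side. This is a sketch only: the dictionaries mirror the two listings above, while KNOWN_AREAS and both helper functions are hypothetical and exist just to show where the interpretive work ends up.

```python
# Variant 1: every categorical statement is an undifferentiated "value".
generic = {
    "tag": "distribution",
    "text": "occurring in higher altitude",
    "values": [
        "archibiota",
        "visible-only-in-summer",
        "high-altitude-distribution",
        "italy",
    ],
}

# Variant 2: the role of each value is named by its attribute.
structured = {
    "tag": "distribution",
    "text": "occurring in higher altitude",
    "status": "archibiota",
    "geoArea": "italy",
    "modifier": "visible-only-in-summer",
    "altitude": "high-altitude-distribution",
}

# Hypothetical lookup table that a consumer of variant 1 has to maintain
# (or derive by reasoning) before it can tell an area from a status.
KNOWN_AREAS = {"italy", "germany"}


def areas_generic(record, known_areas):
    """Guess the geographic values by checking each value against a list."""
    return [v for v in record["values"] if v in known_areas]


def areas_structured(record):
    """Read the geographic value directly from its named attribute."""
    return [record["geoArea"]] if "geoArea" in record else []


print(areas_generic(generic, KNOWN_AREAS))
print(areas_structured(structured))
```

With an incomplete lookup table the first variant silently returns nothing, which is one way the misinterpretation mentioned above could arise.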
The same could be argued for associatedTaxon. (I prefer associatedTaxon to organismInteraction - the two taxa concerned may not interact; we may just be saying they occur in the same habitat or that Taxon A has some shared characteristic with taxon B - though of course every atom in the universe does interact with every other in some way I suppose.)
There could be a taxon association, and that is interesting raw information on a specimen level. However, when talking about properties/traits of organisms, we are talking of knowledge, and the fact that somewhere, sometime on earth two organisms have been seen together is rather uninteresting. I want to learn about pathogens, pollinators, not the fox that happened to be present while the plant was pollinated. That is exactly what I would hope to express by using organismInteraction instead.
What is the difference between tagging something with a geographic region and tagging it with a taxon?
Or tagging it with a color code, or tagging it with a nomenclatural code, or tagging it with a museum name? Clearly all are categories, but it seems to me they are of a different kind, i.e. the role for which they are added is different.
If this is boiled down far enough we don't need the InfoItem as a container and can use common vocabularies (DublinCore) for most of this stuff.
<tdwg:SpeciesProfile rdf:about="http://my.guid.could.be.lsid.org">
  <tdwg:aboutTaxon rdf:resource="urn:lsid:of:some:taxon" />
  <tdwg:associatedTaxon rdf:resource="urn:lsid:of:some:other:taxon" />
  <dc:description>This is some text about how good this is to eat and other stuff.</dc:description>
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#cooking" />
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#Brazil" />
</tdwg:SpeciesProfile>
My problem with this is that it is unclear whether the taxon occurs in Brazil, occurs in Argentina and is imported and cooked in Brazil, or whether it occurs and is cooked in Germany, but the cooking recipe originates from Brazil.
In contrast, having
<dc:distribution rdf:resource="http://my.controlled.list.of.terms#Brazil" />
seems to make this clear, provided semantics are defined for distribution. I am mostly interested in analyzing taxon-specific information for identification and phylogenetics, and it seems to me that the first kind of communication would be worthless for such purposes.
If we want to express Taxon-Taxon interactions Kevin Richards and I already came up with something to use for the HerbIMI LSID Authority
http://rs.tdwg.org/ontology/voc/TaxonOccurrenceInteraction
Note that this defines interactions between *occurrences* not taxa and the occurrences provide the context. I am not sure that this is the place to get into defining interactions other than in the most general way.
The values of interaction kind should be defined elsewhere. However, I would prefer the concept that such values exist visible and defined. I consider placing them as tags on the interaction class, but other solutions are possible. I propose to have a special type for it because we have an interaction of
I don't want to reject Roger's idea of remerging classes. Separating out datatypes to me is a vehicle to be able to come up with sharper definitions of the semantics of the various class attributes, expressing which is a measurement which an aggregation, what is aggregable, what is a context and what is scope under which aggregation was performed, what is a subclassing of the aggregation concept, what is subclassing of value concept, what is a frequency, what a probability of a statement. This is the major concern I have about the generality of SPM 0.2. The list was full of ideas for which purposes value and context could be used but when receiving the data no generic decoding seemed to be possible to me.
In SDD we distinguished between original measurements (SampleData) and aggregations (SummaryData - which often already occur on the single specimen level), between an aggregation Scope, and a Modification (subclassing) of values and characters. The solution chosen is heavily skewed towards acceptance. Originally we had frequency, probability, value modification and character modification, all as values and text. However, it was considered too complex so that now the modifier is overloaded with all these (but the modifier concepts carry a classification that allows making these distinctions in the end). The solutions in SDD are particular, and it would be good to make them more general as a result of the current discussion - but I do think the issues we tried to solve exist.
All this is largely irrelevant for free-form text, but what we are discussing here is simply not free-form text, but exactly this.
If you look at the DublinCore definition of "subject" it says: "The topic of the resource. Typically, the topic will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element." The coverage element says "Spatial characteristics of the intellectual content of the resource."
Could SPM just boil down to a "controlled vocabulary" for DublinCore metadata tags on chunks of text plus a predicate to indicate the taxon we are talking about? We could just do an applicability statement on how to tag them in HTML!
In DublinCore the taxon would simply be a value of subject. The problem that results with this is we would end up with:
subject = pathogen, pollinator, taxon1, taxon2, taxon3
coverage = Germany, Italy, UK, summer, 1950-2007
which is OK for roughly finding something that might be interesting to read (which is what DC is good for) but almost worthless if you want to figure out what pathogens taxon2 has in Germany. That is what we need the container / envelope for, keeping things together. Furthermore, my intuition is that it is significantly easier to process data if in advance I know something is a status value, a geographic area, a taxon, a tag - all of which may come from an external rather than TDWG vocabulary - rather than having to figure this out using OWL.
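The "what pathogens does taxon2 have in Germany" question can be used as a small test of the two shapes. In this sketch the flat record copies the subject/coverage lists above; the Interaction tuple, its field names and the three sample statements are invented for illustration and do not come from any TDWG vocabulary.

```python
from typing import NamedTuple

# Flat Dublin Core style record: all terms in two unordered bags.
flat = {
    "subject": ["pathogen", "pollinator", "taxon1", "taxon2", "taxon3"],
    "coverage": ["Germany", "Italy", "UK", "summer", "1950-2007"],
}


class Interaction(NamedTuple):
    """Hypothetical container that keeps one statement's parts together."""
    kind: str
    subject_taxon: str
    partner_taxon: str
    region: str


# Invented example statements, one container per statement.
interactions = [
    Interaction("pathogen", "taxon2", "taxon1", "Germany"),
    Interaction("pollinator", "taxon2", "taxon3", "Italy"),
    Interaction("pathogen", "taxon3", "taxon1", "UK"),
]


def pathogens_of(taxon, region, records):
    """Answer the question directly from the structured statements."""
    return [
        r.partner_taxon
        for r in records
        if r.kind == "pathogen" and r.subject_taxon == taxon and r.region == region
    ]


# The flat record can only confirm that the terms co-occur somewhere.
print("pathogen" in flat["subject"] and "Germany" in flat["coverage"])
# The structured records can answer the actual question.
print(pathogens_of("taxon2", "Germany", interactions))
```

The flat record matches the query terms but cannot say which taxon is the pathogen of which, or in which country; the container is what keeps those parts attached to each other.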
But that is a question of principle. In the current SPM I found it very hard to figure out which is the context and what context means. Context in an actual observation / specimen is quite clear, but I find it difficult to have "invasive" as value and "Germany" as context. Others might want to have "frequently" and "rarely" as values and "Germany" as context. Or both...
I cannot say whether in the future software will simply effortlessly figure out what kind of category a value is (taxon, geoarea, frequency, distributionstatus, conservationstatus, etc.) and analyse the implications. But the Brazilian cooking example to me indicates that without some guidance, drastically different interpretations are possible.
What I am after with defining multiple classes like FreeFormText, Markuptext, Distribution, Interaction, QuantitativeMeasurement, CategoricalMeasurement, MolecularSequence is to give enough guidance to be able to explain how in a particular case the attributes relate to each other - and provide an appropriate context for extension with further attributes. My current understanding is that if we do not explain this outside of RDF/OWL we would be forced to model it through reification, which we all seem to strive to avoid...
Gregor
I did wonder about region too but then assumed that both it and associatedTaxon - 'where' and 'what' - were key properties of an InfoItem that could be concisely defined. The main 'what', of course, being the 'aboutTaxon'. All that was missing was a 'when' property.
But Roger's model below is even simpler - I like this. He is right that we should keep our prime use case in mind - "The use case we started with was blocks of text that describe a particular aspect of a taxon in a particular context." - the simpler the model, the more likely the uptake, and if we can make use of the popular Dublin Core then lots of clients should be able to understand our tagging system.
Éamonn
Roger Hyam wrote:
2+ thoughts.
Why is region/geography a special case and not covered like any other kind of context with a subjectTag? It could point to a polygon or TDWG geographic region or whatever.
The same could be argued for associatedTaxon. (I prefer associatedTaxon to organismInteraction - the two taxa concerned may not interact; we may just be saying they occur in the same habitat or that Taxon A has some shared characteristic with taxon B - though of course every atom in the universe does interact with every other in some way I suppose.)
What is the difference between tagging something with a geographic region and tagging it with a taxon?
If this is boiled down far enough we don't need the InfoItem as a container and can use common vocabularies (DublinCore) for most of this stuff.
<tdwg:SpeciesProfile rdf:about="http://my.guid.could.be.lsid.org">
  <tdwg:aboutTaxon rdf:resource="urn:lsid:of:some:taxon" />
  <tdwg:associatedTaxon rdf:resource="urn:lsid:of:some:other:taxon" />
  <dc:description>This is some text about how good this is to eat and other stuff.</dc:description>
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#cooking" />
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#Brazil" />
</tdwg:SpeciesProfile>
If we want to express Taxon-Taxon interactions Kevin Richards and I already came up with something to use for the HerbIMI LSID Authority
http://rs.tdwg.org/ontology/voc/TaxonOccurrenceInteraction
Note that this defines interactions between *occurrences* not taxa and the occurrences provide the context. I am not sure that this is the place to get into defining interactions other than in the most general way.
I feel we are washing around without the metric of a use case to test these ideas against.
The use case we started with was blocks of text that describe a particular aspect of a taxon in a particular context. That can be done really simply, perhaps with just DublinCore and only a couple of our own predicates, but we seem to be getting off the path and starting to overcomplicate things.
If you look at the DublinCore definition of "subject" it says: "The topic of the resource. Typically, the topic will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element." The coverage element says "Spatial characteristics of the intellectual content of the resource."
Could SPM just boil down to a "controlled vocabulary" for DublinCore metadata tags on chunks of text plus a predicate to indicate the taxon we are talking about? We could just do an applicability statement on how to tag them in HTML!
Roger
On 11 Oct 2007, at 16:28, Eamonn O Tuama wrote:
Gregor's wrote: "That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated."
I don't really mind what we call our terms but if they are coming from a non-hierarchical, controlled vocabulary (a class?) (http://rs.tdwg.org/ontology/voc/SPMInfoItems.rdf) don't we leave open the possibility that these may eventually be arranged into an OWL ontology?
Gregor's wrote: "That would mean an InfoItem = tagged free-form text, and a Distribution and OrganismInteraction class, which could be similar, but don't have to be, and may grow into different directions."
So, both Gregor and Marcus are suggesting reducing the number of properties for an InfoItem to the following four (related properties in parentheses):
- organismInteraction (associatedTaxon)
- subjectTag (category, context, contextValue)
- region (contextOccurrence)
- info (hasContent, hasValue)
I like the simplicity and do not think we are losing too much.
Éamonn
-----Original Message----- From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of tdwg-tag-request@lists.tdwg.org Sent: 10 October 2007 12:00 To: tdwg-tag@lists.tdwg.org Subject: tdwg-tag Digest, Vol 22, Issue 5
Send tdwg-tag mailing list submissions to tdwg-tag@lists.tdwg.org
To subscribe or unsubscribe via the World Wide Web, visit http://lists.tdwg.org/mailman/listinfo/tdwg-tag or, via email, send a message with subject or body 'help' to tdwg-tag-request@lists.tdwg.org
You can reach the person managing the list at tdwg-tag-owner@lists.tdwg.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of tdwg-tag digest..."
Today's Topics:
- Re: RE: tdwg-tag Digest, Vol 22, Issue 2 (Gregor Hagedorn)
- Re: RE: tdwg-tag Digest, Vol 22, Issue 1 (Gregor Hagedorn)
Message: 1
Date: Wed, 10 Oct 2007 09:48:47 +0200
From: "Gregor Hagedorn" g.m.hagedorn@gmail.com
Subject: Re: [tdwg-tag] RE: tdwg-tag Digest, Vol 22, Issue 2
To: tdwg-tag@lists.tdwg.org
Message-ID: 5ebbead70710100048v4de595sf594cd36a3f4830d@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
http://www.w3.org/TR/swbp-vocab-pub/ Gregor will especially like it. :-)
:-)
Indeed, the document convinced me that http+purls+content negotiation (which is the chosen protocol for the semantic web!) would have been a better choice than LSIDs. The document is referenced on several Wiki pages, e.g. http://wiki.tdwg.org/twiki/bin/view/GUID/CommunityPracticesForHttp-basedGUIDs; it helped me understand why the semantic web can be based on http at all and that the perceived problems that initially led us to opt for LSIDs probably do not exist.
Gregor
Message: 2
Date: Wed, 10 Oct 2007 10:00:20 +0200
From: "Gregor Hagedorn" g.m.hagedorn@gmail.com
Subject: Re: [tdwg-tag] RE: tdwg-tag Digest, Vol 22, Issue 1
To: tdwg-tag@lists.tdwg.org
Message-ID: 5ebbead70710100100j13d6a70ctd177ea11527fe30d@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On 10/10/07, Eamonn O Tuama eotuama@gbif.org wrote:
> Can't we keep the model simple and mandate that we are offering a faceted classification, similar to a general tagging system like folksonomies? So a particular InfoItem might have content which pertains to both genetics and ecology and when tagged with those two categories, the only inference that can be drawn is that the content is relevant to both. In a search, that InfoItem would be returned for either of those categories, but is also very likely to be appropriate to someone interested in "ecological genetics" who would search on the categories "ecology + genetics" or vice versa.
That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated.
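The search behaviour under discussion can be sketched as follows. This is an illustration under stated assumptions, not an SPM implementation: the item identifiers, the tag sets, and both function names are hypothetical. Tags are treated as a flat set, so the only inference available is set membership.

```python
from typing import Dict, List, Set

# Hypothetical InfoItems, each reduced to an identifier and its flat set of tags.
ITEMS: Dict[str, Set[str]] = {
    "item-1": {"ecology", "genetics"},
    "item-2": {"ecology"},
    "item-3": {"genetics"},
}


def search_any(items: Dict[str, Set[str]], tag: str) -> List[str]:
    """Return the ids of all items that carry the given tag."""
    return sorted(item_id for item_id, tags in items.items() if tag in tags)


def search_all(items: Dict[str, Set[str]], wanted: Set[str]) -> List[str]:
    """Return the ids of all items that carry every tag in `wanted`."""
    return sorted(item_id for item_id, tags in items.items() if wanted <= tags)


by_ecology = search_any(ITEMS, "ecology")
by_both = search_all(ITEMS, {"ecology", "genetics"})
```

Here `item-1` is returned for "ecology", for "genetics", and for the combined query, but nothing in the data says how its two tags relate to each other, which is the sense in which multiple tags remain uninterpretable.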
What worries me is that too much seems to be expected of the vocabulary: we have been discussing everything the category (and also the context...) could be used for, and, as I tried to show in my talk, the semantics of what subclasses or modifies what sometimes seems to be reversed already in the examples. My goal is to make this clear.
The question remains what SPM is for. As you explain, it is more for finding information than algorithmically aggregating information. I understood the EOL/other taxon page use case to be about automatic aggregation.
> In either case, I find it hard to see how anything useful can be derived from such aggregations without human interpretation, e.g., someone preparing a species page for EOL might harvest multiple InfoItems and arrange/edit them as appropriate. Just being able to harvest data through the SPM is surely helpful.
But that would be a one-time copy process. I think it would be important to clarify whether this is the use case or not. If it is, I am completely in favor of Eamonn's "tagging approach".
> In certain cases, where the domain might be restricted (an invasive species group), the community may achieve more automated aggregation by agreeing on how to use certain categories, e.g., I can see benefit in being able to list multiple distributions of a particular species one after the other, especially if biologists are tracking this over time and the information is being continually updated.
I think it would be beneficial to define specific kinds of infoitems for these use cases (especially distribution and organism interaction), to make it clear how the information is to be interpreted, and how it is extended.
That would mean an InfoItem = tagged free-form text, and a Distribution and OrganismInteraction class, which could be similar, but don't have to be, and may grow into different directions.
Essentially, this would allow us to add further classes with even more strongly differentiated structure for more highly structured data (categorical or quantitative data).
How about that?
Gregor
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
End of tdwg-tag Digest, Vol 22, Issue 5
Two points:
a) I think "subject" is much better than subjectTag.
b) Simple DublinCore is great if indeed it matches the use case, that is, if what you want are mini-publications of free-form text and you do not want to use them for automatic reasoning and data aggregation (which the SPM examples always seemed to me to imply). The EOL / species page use case seemed to me to imply the desire to automatically aggregate data. Eamonn says this is not the case; if the use case is that humans find information and then rework it, I am all in favor of using DC.
Gregor
participants (4)
- Eamonn O Tuama
- Gregor Hagedorn
- Roger Hyam
- Éamonn Ó Tuama