On 10/10/07, Eamonn O Tuama <eotuama@gbif.org> wrote:
Can't we keep the model simple and mandate that we are offering a faceted classification, similar to a general tagging system like folksonomies? So a particular InfoItem might have content which pertains to both genetics and ecology and when tagged with those two categories, the only inference that can be drawn is that the content is relevant to both. In a search, that InfoItem would be returned for either of those categories, but is also very likely to be appropriate to someone interested in "ecological genetics" who would search on the categories "ecology + genetics" or vice versa.
That may be an excellent idea indeed. But then we should call it a "tag" and not "category" or "class", and make clear that adding multiple tags will remain uninterpretable - other than as you indicated.
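A minimal sketch of the tagging semantics described above, in Python. The InfoItem class here is only a hypothetical stand-in (free-form text plus a flat set of tags), and the search function and example texts are illustrative assumptions, not part of SPM. The only thing that can be derived from multiple tags is set membership:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoItem:
    """Hypothetical stand-in for an InfoItem: free-form text plus flat tags."""
    text: str
    tags: frozenset = frozenset()


def search(items, *wanted):
    """Return the items that carry every requested tag.

    This is plain set containment: no hierarchy, no ordering, and no
    meaning is attached to a combination of tags beyond "all are present".
    """
    wanted = frozenset(wanted)
    return [item for item in items if wanted <= item.tags]


# Illustrative content only.
items = [
    InfoItem("Text relevant to both fields", frozenset({"genetics", "ecology"})),
    InfoItem("Text relevant to genetics only", frozenset({"genetics"})),
    InfoItem("Text relevant to ecology only", frozenset({"ecology"})),
]

genetics_hits = search(items, "genetics")
ecology_hits = search(items, "ecology")
both_hits = search(items, "ecology", "genetics")
```

The doubly tagged item is returned for either tag alone and for the combination, and the order in which the tags are given makes no difference.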
What worries me is that too much seems to be expected of the vocabulary. We have been discussing everything that the category (and also the context...) could be used for, and, as I tried to show in my talk, the semantics of what subclasses or modifies what already seem to be reversed in some of the examples. My goal is to make this clear.
The question remains what SPM is for. As you explain, it is more for finding information than algorithmically aggregating information. I understood the EOL/other taxon page use case to be about automatic aggregation.
In either case, I find it hard to see how anything useful can be derived from such aggregations without human interpretation, e.g., someone preparing a species page for EOL might harvest multiple InfoItems and arrange/edit them as appropriate. Just being able to harvest data through the SPM is surely helpful.
But that would be a one-time copy process. I think it would be important to clarify whether this is the use case or not. If it is, I am completely in favor of Eamonn's "tagging approach".
In certain cases, where the domain might be restricted (an invasive species group), the community may achieve more automated aggregation by agreeing on how to use certain categories, e.g., I can see benefit in being able to list multiple distributions of a particular species one after the other, especially if biologists are tracking this over time and the information is being continually updated.
I think it would be beneficial to define specific kinds of InfoItems for these use cases (especially distribution and organism interaction), to make it clear how the information is to be interpreted and how it can be extended.
That would mean an InfoItem = tagged free-form text, plus a Distribution class and an OrganismInteraction class, which could be similar, but don't have to be, and may grow in different directions.
Essentially, this would allow further classes to be added later, with even more distinct structures for more highly structured data (categorical or quantitative data).
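To make the proposal concrete, here is a rough Python sketch. Only the three class names come from the text above; every field (taxon, region, observed, partner, kind), the helper function, and the sample records are assumptions made up for illustration. The classes are deliberately kept independent of each other so that each can grow in its own direction:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoItem:
    """Tagged free-form text; nothing is inferred beyond the tags."""
    taxon: str
    text: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Distribution:
    """Structured distribution record (all fields are hypothetical)."""
    taxon: str
    region: str
    observed: str  # ISO 8601 date, so string order is chronological


@dataclass(frozen=True)
class OrganismInteraction:
    """Structured interaction record (all fields are hypothetical)."""
    taxon: str
    partner: str
    kind: str


def distributions_over_time(records, taxon):
    """List the Distribution records for one taxon, oldest first."""
    hits = [r for r in records
            if isinstance(r, Distribution) and r.taxon == taxon]
    return sorted(hits, key=lambda r: r.observed)


# Placeholder records, not real data.
records = [
    Distribution("Species A", "Region 2", "2007-06-01"),
    InfoItem("Species A", "Free-form notes", frozenset({"ecology"})),
    Distribution("Species A", "Region 1", "2005-03-15"),
    OrganismInteraction("Species A", "Species B", "predation"),
    Distribution("Species B", "Region 3", "2006-01-01"),
]

timeline = distributions_over_time(records, "Species A")
regions = [d.region for d in timeline]
```

Because a Distribution has a defined structure, the records for one species can be listed one after the other without human interpretation, which is not possible for the free-form InfoItem text.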
How about that?
Gregor