[tdwg-content] Another example of non-overlapping concepts

Peter DeVries pete.devries at gmail.com
Wed May 18 02:36:09 CEST 2011


Sorry,

I meant http://www.freebase.com/view/en/puma.

Sort of, I have an "all different" in my people ontology.

DBpedia has the same link with freebase. I wonder if that freebase URI has
changed recently.

- Pete



On Tue, May 17, 2011 at 5:17 PM, Jim Croft <jim.croft at gmail.com> wrote:

> hmmm... a  http://www.freebase.com/view/en/cougar seems to be an
> 'armoured fighting vehicle'...
>
> which might be part of the point you are making... wondering if
> NotEvenRemotelyTheSameAs is a valid relationship type... ;)
>
> jim
>
> On Wed, May 18, 2011 at 8:00 AM, Peter DeVries <pete.devries at gmail.com>
> wrote:
> > Hi Matt,
> > It took me a while to ponder your question. There is a long answer, which
> > is complex and easily misinterpreted, and there is a shorter answer.
> > For now I think the "shorter" answer set in a historical context is best.
> > The best use of my abilities seems to be recognizing an "ability gap" and
> > figuring out a technical solution or tool to address it.
> > The most visible of these were involving microscopy and visualization
> tools
> > to make complex ideas understandable.
> > My interest in the species problem dates back to when I had the
> opportunity
> > to talk with E.O. Wilson in 1991/1992.
> > At that time he said that if you have a knack for computers we need all
> this
> > information in databases so it is accessible.
> > *One of his former Ph.D. students is on my committee.
> > Years later I had the opportunity to work on questions like this and
> started
> > to think about how to connect all these disparate facts about species
> > together in a usable queryable knowledge base.
> > I noticed that several groups and individuals were marking up data sets
> > including observations with different scientific names even though they
> were
> > clearly meaning the same "species".
> > * These groups would agree that they were communicating about the same
> > species, but not always agree on the name
> > This prevents large scale data integration and analysis which in part is
> > described here: http://about.geospecies.org/
> > With the advent of the web, and the semantic web in particular, this
> > "database" could be global and almost infinitely scalable.
> > I began lobbying TDWG in 2006 for two things:
> > 1) A GUID for the "species" that was not tied to a particular name string
> > 2) A system that followed semantic web best practices which LSID etc. do
> > not.
> > Since my TDWG efforts were not successful, I started GeoSpecies and based
> on
> > comments from a semantic web expert modified these somewhat into what is
> now
> > TaxonConcept.org
> > The TCS is an XML standard for transmitting information about taxon
> > concepts that I think maps best to a "name use concept." (Rich's TNUs)
> > The TaxonConcepts are identified with semantic web GUIDs that follow
> > semantic web best practices and resolve to informative documents.
> > In their current form these documents are not ideal because they do not
> do a
> > good enough job clearing up what would be the best concept match for a
> given
> > individual or specimen.
> > They do however have most of the plumbing for this in that they allow
> > semantic web links to name uses, specimens, occurrence records, images,
> DNA,
> > authors and publications including the original description.
> > They also link to similar entities that are on the semantic web, most
> > notably DBpedia, Uniprot, Freebase, Bio2RDF etc.
> > This linking may not seem valuable to humans, but it is valuable for machines
> > that need to determine what entities are similar and what entities are
> > different.
> > This also increases the "findability" of these other data sets.
> > I see my current set of about 105,000 species as an example set that
> people
> > can use to try out these models.
> > In their final form these should be authored by editors that determine
> what
> > specimens and other data are good examples of instances of these
> concepts.
> > The editors will be linked via a URI so it is easy to track attribution.
> > The final concepts do not have to be in one place, they could be
> distributed
> > but to avoid the kinds of nomenclatural differences that have occurred
> > between zoology / botany etc it would be best to have one code base for
> now.
> > They don't have to have the same underlying stack, which now is based on
> > Ruby on Rails, but could be ported to anything.
> > What they do need is a common structure and a common understanding as to
> > what each attribute means and how it can be appropriately used.
> > For some use cases it is appropriate to consider the following the same
> > "thing"
> >  http://lod.taxonconcept.org/ses/v6n7p#Species
> >  http://purl.uniprot.org/taxonomy/9696
> >  http://www.freebase.com/view/en/cougar
> > http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA
> > http://www.bbc.co.uk/nature/species/Cougar#species
> >
> > For other use cases, this sameAs is not appropriate.
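One way to picture the use-case-dependent sameAs described above is to keep the equivalence set under a named context rather than asserting it globally. This is only a rough sketch: the context names and the helper `same_thing` are hypothetical and not part of TaxonConcept. The URIs are the ones listed in the message, with the Freebase URI as corrected at the top of this message.

```python
# Hypothetical sketch: equivalence sets scoped to a use case,
# instead of one global owl:sameAs assertion.
COUGAR_URIS = frozenset({
    "http://lod.taxonconcept.org/ses/v6n7p#Species",
    "http://purl.uniprot.org/taxonomy/9696",
    "http://www.freebase.com/view/en/puma",  # corrected from .../en/cougar
    "http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA",
    "http://www.bbc.co.uk/nature/species/Cougar#species",
})

# Context name (made up for illustration) -> groups of URIs treated as one thing.
EQUIVALENCES = {
    "species_level_lookup": [COUGAR_URIS],
    "strict_identity": [],  # no cross-dataset equivalence is accepted here
}

def same_thing(uri_a, uri_b, context):
    """Return True if the two URIs count as the same thing in this context."""
    if uri_a == uri_b:
        return True
    groups = EQUIVALENCES.get(context, [])
    return any(uri_a in group and uri_b in group for group in groups)
```

A query that only needs to find "everything about this species" could use the looser context, while a query that cares about exactly which dataset made a claim could use the strict one.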
> > Wikipedia is very valuable, but if someone changes the article title then
> > the URI changes in DBpedia.
> > Uniprot and Bio2RDF are useful in that they link to lots of related data
> but
> > they don't really give you any information about what specimens are
> > instances of that concept and they only have those species which have
> NCBI
> > ID's.
> > What I want is a set of GUID's that resolve to a human readable HTML page
> > and an RDF representation that people can use to "tag" their data.
> > For instance:
> > I am going to assert that what I have under the microscope is an instance
> of
> > the concept described on this page. I do not tie this assertion to a
> > particular name or classification hierarchy.
> > Because it makes no sense to replicate the functionality of the
> Encyclopedia
> > of Life etc., I am mainly concentrating on the RDF representations and
> > testing if they behave as expected in SPARQL queries.
> > * The HTML pages are not really pretty or as informative as the RDF or as
> > the concept as viewed in the knowledge base.
> > I have been working with the Encyclopedia of Life and GNI groups for a
> while
> > exploring how these may or may not be useful to them.
> > During my visit to Woods Hole I said that I have no interest in building
> > an empire; I just want to build a solution and would like to partner with
> > them and GBIF.
> > Although I remain active on TDWG I find the most valuable suggestions
> seem
> > to come from the LOD community since we seem to have a common goal - that
> is
> > creating something that works in a reasonable amount of time.
> > Also, in the LOD cloud every linked data set increases the value of all
> the
> > other data sets.
> > This is probably more than your question required, but it provides some
> > explanation as to what these are and why I have implemented them in the
> way
> > I have.
> > Respectfully,
> > - Pete
> >
> >
> > On Fri, May 13, 2011 at 4:14 PM, Matt Jones <jones at nceas.ucsb.edu>
> wrote:
> >>
> >> Hi Peter,
> >> Does your idea of #ObjectiveSpeciesModel correspond 1:1 with the TCS
> >> standard's idea of a Nominal Concept (i.e., <TaxonConcept type="nominal">)?
> >>  Can you outline how your concept types differ from TCS concept types?
> >> Thanks,
> >> Matt
> >> On Fri, May 13, 2011 at 12:41 PM, Peter DeVries <pete.devries at gmail.com
> >
> >> wrote:
> >>>
> >>> Hi Nico,
> >>> Thanks for posting this.
> >>> I have something in the concept model to indicate the basis for the
> >>> species concept.
> >>> For now I have three types. An individual species concept can have a
> >>> combination of one, two or all three
> >>> In the RDF they look like this
> >>> <txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>
> >>>
> >>> The first is what I call the #ObjectiveSpeciesModel - this indicates
> that
> >>> it is a species concept because we say it is.
> >>> All the species concepts are at least an #ObjectiveSpeciesModel
> >>> *This is in part a way to handle things like the domestic cat which you
> >>> want to be seen as different from the African Wildcat.
> >>> There are also tags for
> >>> txn:PhylogeneticSpeciesModel
> >>> txn:BiologicalSpeciesModel
> >>> For now I don't have these other models set in the example data, but the
> >>> fields are in the database and the code so that an editor could state
> >>> the basis for the model.
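The "one, two or all three" combination of bases described above can be sketched as a set of flags. This is an illustration only, not the TaxonConcept code (which the thread says is Ruby on Rails); the class `SpeciesModel` and the helper `concept_basis` are hypothetical names, and the members stand for txn:ObjectiveSpeciesModel, txn:PhylogeneticSpeciesModel and txn:BiologicalSpeciesModel.

```python
from enum import Flag, auto

class SpeciesModel(Flag):
    """Hypothetical stand-ins for the three txn: species model tags."""
    OBJECTIVE = auto()     # txn:ObjectiveSpeciesModel
    PHYLOGENETIC = auto()  # txn:PhylogeneticSpeciesModel
    BIOLOGICAL = auto()    # txn:BiologicalSpeciesModel

def concept_basis(*extra):
    """Combine bases; every concept is at least an ObjectiveSpeciesModel."""
    basis = SpeciesModel.OBJECTIVE
    for model in extra:
        basis |= model
    return basis

# A concept for which an editor has also stated a biological basis.
cougar_basis = concept_basis(SpeciesModel.BIOLOGICAL)
print(SpeciesModel.OBJECTIVE in cougar_basis)     # membership test on the combined flags
print(SpeciesModel.PHYLOGENETIC in cougar_basis)
```

Starting every concept from the objective flag mirrors the statement that all the species concepts are at least an #ObjectiveSpeciesModel.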
> >>> I can think of a couple of different ways to handle the issue of
> >>> alternative species concepts.
> >>> * Note that the identifications as proposed by DarwinCore don't seem to
> >>> indicate what kind of model the identifications were based on.
> >>>   So it is not clear to me if a straight DarwinCore data set would
> allow
> >>> the analysis above.
> >>> Instead of having multiple different statements like
> >>> txn:occurrenceHasSpeciesConcept <> in the record for each occurrence
> >>> one could use different predicates to link to different kinds of
> species
> >>> concepts.
> >>> txn:occurrenceHasUniprotConcept =>
> >>> <http://purl.uniprot.org/taxonomy/9696>
> >>> This would allow someone to query for the occurrences
> >>> of <http://purl.uniprot.org/taxonomy/9696>
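As a rough illustration of the predicate-per-concept-type idea above, the sketch below stores statements as plain (subject, predicate, object) tuples and filters on predicate and object. The occurrence identifiers (`ex:occurrence1` and so on), the `http://example.org/other-concept` URI and the helper `occurrences_of` are made up for the example; in practice this would be a SPARQL query against the RDF.

```python
UNIPROT_COUGAR = "http://purl.uniprot.org/taxonomy/9696"
TXN_COUGAR = "http://lod.taxonconcept.org/ses/v6n7p#Species"

# Each statement is (subject, predicate, object). Occurrence IDs are invented.
triples = [
    ("ex:occurrence1", "txn:occurrenceHasSpeciesConcept", TXN_COUGAR),
    ("ex:occurrence1", "txn:occurrenceHasUniprotConcept", UNIPROT_COUGAR),
    ("ex:occurrence2", "txn:occurrenceHasUniprotConcept", UNIPROT_COUGAR),
    ("ex:occurrence3", "txn:occurrenceHasUniprotConcept",
     "http://example.org/other-concept"),
]

def occurrences_of(statements, predicate, concept_uri):
    """Return the sorted subjects linked to concept_uri by the given predicate."""
    return sorted({s for s, p, o in statements
                   if p == predicate and o == concept_uri})

print(occurrences_of(triples, "txn:occurrenceHasUniprotConcept", UNIPROT_COUGAR))
# prints ['ex:occurrence1', 'ex:occurrence2']
```

Because the kind of concept is carried by the predicate, one occurrence can point at several kinds of concept without the links being confused with each other.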
> >>> That said, it is not clear to me what people mean by different
> >>> identifications.
> >>> Is the intent to have identifications with different homotypic synonyms
> >>> to be an identification of the same thing or not?
> >>> The way it works now in many data sets is that Felis concolor, Puma
> >>> concolor and Puma conncolor are treated as identifications of different
> >>> things.
> >>> This is another way of saying is the namestring the concept?
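A small sketch of why this matters for counting: tallying by name string splits the records three ways, while tallying by a name-independent concept identifier keeps them together. The mapping below is an assumption made for illustration (it ties all three strings, including the misspelling, to the concept URI used elsewhere in this thread), and the record list is invented.

```python
from collections import Counter

CONCEPT = "http://lod.taxonconcept.org/ses/v6n7p#Species"

# Assumed mapping from name strings to one concept identifier.
NAME_TO_CONCEPT = {
    "Felis concolor": CONCEPT,
    "Puma concolor": CONCEPT,
    "Puma conncolor": CONCEPT,  # misspelled variant, same intended species
}

# Invented identifications as they might appear in separate data sets.
records = ["Felis concolor", "Puma concolor", "Puma conncolor", "Puma concolor"]

by_name = Counter(records)
by_concept = Counter(NAME_TO_CONCEPT[name] for name in records)

print(len(by_name))     # number of distinct "things" when the name string is the key
print(len(by_concept))  # number of distinct "things" when the concept is the key
```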
> >>> My understanding of the eBird project is that it allows citizen
> >>> scientists to contribute their own observations. This creates a much
> larger
> >>> data set for analysis etc.
> >>> They have created a curated list of species and a ~6 letter code for
> >>> each. This serves as a guide for observers on how to encode their
> >>> observations.
> >>> I think their progress would be inhibited, the occurrence coding
> >>> inconsistent, and contributors frustrated, if they had a list that
> >>> included many overlapping species concepts.
> >>> Thanks again for your comments,
> >>> - Pete
> >>>
> >>> On Fri, May 13, 2011 at 3:05 AM, Nico Franz <nico.franz at upr.edu>
> wrote:
> >>>>
> >>>> Hello Pete (et al.):
> >>>>
> >>>>    For birds, Town Peterson at KU and colleagues have published these
> >>>> papers showing how alternative bird taxonomies affect the ranking of
> >>>> conservation priorities.
> >>>>
> >>>>
> >>>>
> http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_1999.pdf
> >>>>
> >>>>
> http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_2004.pdf
> >>>>
> >>>>
> http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_2006.pdf
> >>>>
> >>>>    Here's the abstract of the 1999 paper:
> >>>>
> >>>> Analysis of geographic concentrations of endemic taxa is often used to
> >>>> determine priorities for conservation
> >>>> action; nevertheless, assumptions inherent in the taxonomic authority
> >>>> list used as the basis for
> >>>> analysis are not always considered. We analyzed foci of avian endemism
> >>>> in Mexico under two alternate species
> >>>> concepts. Under the biological species concept, 101 bird species are
> >>>> endemic to Mexico and are concentrated
> >>>> in the mountains of the western and southern portions of the country.
> >>>> Under the phylogenetic species
> >>>> concept, however, total endemic species rises to 249, which are
> >>>> concentrated in the mountains and lowlands
> >>>> of western Mexico. Twenty-four narrow endemic biological species are
> >>>> concentrated on offshore islands, but
> >>>> 97 narrow endemic phylogenetic species show a concentration in the
> >>>> Transvolcanic Belt of the mainland and
> >>>> on several offshore islands. Our study demonstrates that conservation
> >>>> priorities based on concentrations of
> >>>> endemic taxa depend critically on the particular taxonomic authority
> >>>> employed and that biodiversity evaluations
> >>>> need to be developed in collaboration or consultation with practicing
> >>>> systematic specialists.
> >>>>
> >>>>    There was a debate recently on Taxacom that was started and
> >>>> subsequently neatly summarized by Fabian Haas. The topic was "let's
> >>>> summarize reasons why 'donors' seem to not fund taxonomy". One point
> from
> >>>> the summary was this:
> >>>>
> >>>> 3) Taxonomy is over-accurate for most applications
> >>>>
> >>>> Most (not all) decisions in e.g. modelling and conservation are done
> and
> >>>> can be done without complete knowledge of taxa. As it is, decisions
> for
> >>>> conservation areas are often based on flagship species (e.g.
> elephants), on
> >>>> taxa which have an excellent research background, e.g. birds (IBAs),
> on
> >>>> availability of land (e.g. land with a high Tsetse burden), importance
> as
> >>>> corridor and other factors, but never on a complete view on an all
> >>>> biodiversity in a specific area. Even if an inventory existed, it
> would be
> >>>> an illusion that we could collect data on ecological requirements and
> >>>> population dynamics for most of the species necessary for informed
> >>>> decisions. A complete inventory does not seem to provide an advantage
> for
> >>>> conservation.
> >>>>
> >>>>    I personally think there's some truth to that. I also think that,
> >>>> while it's understandable that an accurate representation of the
> >>>> (sometimes) fleetingness of taxonomic consensus is not a priority for
> >>>> applied ecological projects, if taxonomists themselves don't find better
> >>>> ways to document and link these alternative perspectives, then it's not
> >>>> the best science we can do. That would be fine too if adopted outright as
> >>>> a pragmatic stance.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Nico
> >>>>
> >>>>
> >>>> On 5/13/2011 1:08 AM, Peter DeVries wrote:
> >>>>
> >>>> I thought that I would also mention that in addition to The Plant List,
> >>>> the eBird project also uses non-overlapping concepts in its bird list (it
> >>>> does have concepts for common hybrids).
> >>>> What is clear to me is that you cannot create graphs like these if every
> >>>> observation can have X number of species (especially those that overlap)
> >>>> without any indication of which is the most appropriate one.
> >>>> eBird Occurrence Maps Northern Cardinal
> >>>>
> http://ebird.org/content/ebird/about/occurrence-maps/northern-cardinal
> >>>>
> >>>> NCBI is also similar.
> >>>> Perhaps a member of the consensus committee can comment?
> >>>> -- Pete
> >>>>
> >>>>
> ------------------------------------------------------------------------------------
> >>>> Pete DeVries
> >>>> Department of Entomology
> >>>> University of Wisconsin - Madison
> >>>> 445 Russell Laboratories
> >>>> 1630 Linden Drive
> >>>> Madison, WI 53706
> >>>> Email: pdevries at wisc.edu
> >>>> TaxonConcept  &  GeoSpecies Knowledge Bases
> >>>> A Semantic Web, Linked Open Data  Project
> >>>>
> >>>>
> --------------------------------------------------------------------------------------
> >>>>
> >>>> _______________________________________________
> >>>> tdwg-content mailing list
> >>>> tdwg-content at lists.tdwg.org
> >>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> >
>
>
>
> --
> _________________
> Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~ http://about.me/jrc
> 'A civilized society is one which tolerates eccentricity to the point
> of doubtful sanity.'
>  - Robert Frost, poet (1874-1963)
>
> Please send URLs, not attachments:
> http://www.gnu.org/philosophy/no-word-attachments.html
>



-- 
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries at wisc.edu
TaxonConcept <http://www.taxonconcept.org/>  &
GeoSpecies<http://about.geospecies.org/> Knowledge
Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
--------------------------------------------------------------------------------------