I should clarify that I used rdfs:seeAlso specifically because (as I understand it) it is one of the few RDF predicates where it is acceptable for the object not to be another RDF resource. See http://www.w3.org/TR/rdf-schema/#ch_seealso and various other discussions on the web as to why seeAlso was defined this way. Thus, it isn't clear to me that creating a "hasPDF" predicate is a good idea, because it might send an uninformed client on a wild-goose chase for non-existent RDF. I believe that the idea behind seeAlso was that clients were on their own in figuring out the meaning of whatever they get as the result of following the seeAlso link.

Steve
Peter DeVries wrote:
I guess the question is: do you want to use a generic seeAlso, which most crawlers follow, or a more specific predicate that says "here is the PDF"?
My reluctance was more about minting my own vs. finding some other vocabulary which has a similar predicate.
With the *hasPDF* predicate it would be pretty easy to query for all species concepts that have a linked original-description PDF, etc.
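For illustration, a minimal sketch of the kind of query a specific predicate enables, with triples modeled as plain tuples rather than a real triple store. The hasPDF URI and example subjects below are assumptions for the sketch, not TaxonConcept's actual identifiers; a rough SPARQL equivalent is shown in a comment:

```python
# Sketch only: triples as (subject, predicate, object) tuples.
# The hasPDF URI is illustrative; the real TaxonConcept ontology
# namespace may differ.
HAS_PDF = "http://lod.taxonconcept.org/ontology/txn.owl#hasPDF"  # assumed
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

triples = [
    ("ex:speciesA", SEE_ALSO, "http://example.org/docs/a.html"),
    ("ex:speciesA", HAS_PDF, "http://example.org/pdfs/a-description.pdf"),
    ("ex:speciesB", SEE_ALSO, "http://example.org/pdfs/b.pdf"),
]

def concepts_with_pdf(triples):
    """Return (subject, pdf) pairs linked via the specific predicate.

    Rough SPARQL equivalent:
        SELECT ?concept ?pdf WHERE { ?concept txn:hasPDF ?pdf }
    """
    return [(s, o) for (s, p, o) in triples if p == HAS_PDF]

print(concepts_with_pdf(triples))
```

Note that speciesB's PDF, linked only via the generic seeAlso, is invisible to this query, which is the trade-off being discussed.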
I suspect that some standard predicate will eventually become accepted since it is very useful to have something more specific than foaf:Document.
Respectfully,
- Pete
On Wed, Jan 5, 2011 at 6:54 PM, Paul Murray <pmurray@anbg.gov.au> wrote:
On 06/01/2011, at 7:48 AM, Peter DeVries wrote:
Also, although I like a lot of what Steve says, I think that most existing crawlers expect that a seeAlso link points to some HTML, XML, or RDF type of thing and will not be able to handle a multi-megabyte PDF. This is why I reluctantly minted the predicate "hasPDF".
Hmm. This is an issue with linked data: when you fetch a URI while crawling the semantic web, if it redirects, then it's an "other resource" and you get RDF. If not, then you are potentially pulling a multi-megabyte "information resource" across the wire.

A solution is to use an HTTP "HEAD" request when you do the initial URI fetch. If it's an "other resource", the HEAD response will be a 303 and will carry the redirect target you want in the "Location" header, and that's all you need. If not, the 200 response will contain the content type and possibly even the size, which is what you need to know before you GET it.

So .. the problem that "hasPDF" is meant to address might be addressable by the crawlers just being a bit smarter about how they browse the semweb.
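Paul's HEAD-first strategy might be sketched like this (a hypothetical crawler helper, not anyone's actual code; the size cutoff is invented for illustration). The decision function looks only at the HEAD status and headers, so it can be exercised without touching the network:

```python
from http.client import HTTPConnection
from urllib.parse import urlparse

MAX_BYTES = 5_000_000  # arbitrary illustrative cutoff, not a standard

def decide(status, headers):
    """Decide what a crawler should do after a HEAD request.

    303 -> an "other resource": follow the Location header to the RDF.
    200 -> an information resource: inspect Content-Type and
           Content-Length before committing to a full GET.
    """
    if status == 303:
        return ("follow", headers.get("Location"))
    if status == 200:
        ctype = headers.get("Content-Type", "")
        try:
            size = int(headers.get("Content-Length", "0"))
        except ValueError:
            size = 0
        if "rdf" in ctype or "turtle" in ctype:
            return ("get", None)          # looks like RDF: fetch it
        if size > MAX_BYTES:
            return ("skip-large", ctype)  # e.g. a multi-megabyte PDF
        return ("get", None)
    return ("skip", None)

def head(url):
    """Issue a real HEAD request (network access required)."""
    parts = urlparse(url)
    conn = HTTPConnection(parts.netloc)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    return resp.status, dict(resp.getheaders())
```

This is only a sketch of the idea; a production crawler would also handle 301/302 redirects, missing Content-Length headers (servers are not required to send one on HEAD), and content negotiation via the Accept header.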
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base: http://www.taxonconcept.org/
GeoSpecies Knowledge Base: http://lod.geospecies.org/
About the GeoSpecies Knowledge Base: http://about.geospecies.org/