Re: [tdwg-guid] Immutability data [SEC=UNCLASSIFIED]

17 Jul 2007

Extending the specification might seem like the honorable thing to do, 
but who is going to implement it?  How are we going to build it into the 
list of services that providers are already obliged to support if they 
wish to participate?  Why would we want to do it when we already have a 
robust protocol for delivering data using any format in TAPIR services?

The fact that LSID was a purpose-built, off-the-shelf solution was a 
strong attractor in the initial selection process.  If now we find that 
it is just not suitable then we should probably re-evaluate that 
decision before we rush off to roll our own.  Especially now that we 
know that LSIDs are not the hook into the semantic web that we thought 
they might be.

How many of us believe in a database federation based on 
tdwgLSID.getSemanticallyEquivalentData(LSID, format) calls?

The important issues here are not ones of resolution - which has come as 
a consequence of choosing LSID and duplicates what we have elsewhere - 
but in delivering on our requirement to establish provenance, manage 
uniqueness and support persistence. We will find LSIDs *already* 
embedded in tdwgFORMAT records and be expected to believe that they 
represent globally unique keys for these data objects and their 
relationships.

Isn't getSemanticallyEquivalentData(LSID, format) already handled by
http://tapir_provider/?op=search&model=formatSpec&LSID=...
along with op=metadata, op=capabilities, op=inventory, etc?
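
A minimal sketch of how a client might assemble the search request quoted
above. The host, the op/model/LSID parameter names and the formatSpec
value are taken from the URL in this message and have not been checked
against the TAPIR specification; the LSID value and the function name are
hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def tapir_search_url(base_url, model, lsid):
    """Build a search URL of the shape quoted in the message above."""
    # urlencode percent-escapes the colons in the LSID so it survives
    # as a single query-string value.
    query = urlencode({"op": "search", "model": model, "LSID": lsid})
    return f"{base_url}?{query}"


url = tapir_search_url(
    "http://tapir_provider/",               # placeholder host from the message
    "formatSpec",                           # placeholder model name from the message
    "urn:lsid:example.org:specimens:1234",  # hypothetical LSID
)
print(url)

# Round-trip check: the provider side would see the original LSID again.
print(parse_qs(urlsplit(url).query)["LSID"][0])
```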

greg

Ricardo Pereira wrote:
...
Guys,
It looks like we are converging towards extending the LSID 
specification to add a new method that returns semantically equivalent 
data.
If we all agree, I'll write up a specification for this new method 
and add it to the TDWG LSID Applicability Statement.
Also, if you agree, I'd like to move on to discuss the next issue.
What do you think?
Cheers,
Ricardo
Dave Vieglais wrote:
...
Returning XML data in the getMetadata() operation is probably ok (not 
illegal, perhaps borderline), but what about those other data types 
that may have different byte streams but still identical content?  
Expressing those in the getMetadata() operation may be unwieldy.  
Would the data be returned as an attachment to getMetadata()?  Always 
returning the data in the getMetadata operation may also be 
inefficient (toss up between the number of calls to a service and the 
volume of data returned for each call).
The simplest and cleanest mechanism to do this seems to be through a 
new method.  The signature might be like the getMetadata() operation:
bytes getSemanticallyEquivalentData(LSID lsid, string[] accepted_formats)
where accepted_formats is an optional parameter specifying a list of 
acceptable MIME types for the data (in order of preference).  A list of 
the MIME types supported by the service (there may be only one) can be 
expressed in the metadata.
The new method can be defined simply by extending the WSDL document 
that describes the data retrieval services.
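
A rough sketch of how the proposed method might pick a representation,
assuming an in-memory store keyed by LSID and MIME type. Only the method
name and the accepted_formats parameter come from the signature above;
STORE, choose_format, the LSID and the payloads are hypothetical, and the
fallback to the service's first supported type is one possible reading of
"optional".

```python
# Hypothetical store: LSID -> {MIME type: bytes of that representation}.
STORE = {
    "urn:lsid:example.org:specimens:1234": {
        "application/xml": b"<record/>",
        "text/plain": b"specimen 1234",
    }
}


def choose_format(accepted_formats, supported_formats):
    """Return the first client-preferred MIME type the service supports."""
    if not accepted_formats:
        # Parameter omitted: fall back to the service's first type.
        return supported_formats[0]
    for mime in accepted_formats:  # already in order of preference
        if mime in supported_formats:
            return mime
    return None


def get_semantically_equivalent_data(lsid, accepted_formats=None):
    """Return the bytes of the best acceptable representation."""
    representations = STORE[lsid]
    mime = choose_format(accepted_formats, list(representations))
    if mime is None:
        raise LookupError("no acceptable format for " + lsid)
    return representations[mime]


lsid = "urn:lsid:example.org:specimens:1234"
print(get_semantically_equivalent_data(lsid, ["image/png", "text/plain"]))
print(get_semantically_equivalent_data(lsid))
```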
On Jul 17, 2007, at 05:50, P. Bryan Heidorn wrote:
...
I do not know if I stated my position on the issue of getData() 
immutability. There is an installed base of applications "expecting" 
that data returned by getData() will always have the same bit 
pattern. Because of that and the existing definition of getData() in 
the LSID spec we should not mess with that contract. That leaves two 
options for "semantically immutable" data. Either call it metadata 
and return it in getMetadata() or, as I would prefer, extend the 
LSID spec to allow a new method, getMimeData() or getflexData(). We can 
argue for a long time about the name, but this method can validly 
return XML, RDF or other data types that may have semantically 
equivalent representations with different byte sequences. With this 
solution we would not need to support illegal activity.
On Jul 16, 2007, at 12:10 PM, Bob Morris wrote:
...
My last escaped prematurely. I meant:
There is no way that an application that passes an LSID to another
application can know that the second program will abide by some
non-standard TDWG-defined contract about something called an LSID. Any
program that passes a uri beginning with urn:lsid with an implicit or
explicit request for a getData() call cannot be assured of anything
about the chain of custody except what is in the LSID spec.
I wholly \agree/ with the need to have semantically persistent
services, together with agreed-upon, named algorithms which establish
the identity of two data streams for that purpose. What I don't agree
with is calling the hook urn:lsid and the method getData().
Since the infrastructure at TDWG and elsewhere is in place for LSID, I
think I would address this issue not by defining a new standard that
is a clone of LSID except with a different definition of getData(),
but rather think about whether there can be stuff in the getMetadata()
calls and returns that permit an assertion by the callee that some bit
of stuff has been provided under the semantic persistence contract.
Yes, this will lead to needing a call to getMetadata() for stuff that
some people insist is data (and also insist there is a difference).
This is the cost of doing robust business. Yes, some people will write
non-compliant getData() services. Yes, applications that deal with
those will sometimes break. As Bruce Stein said in a breakout group
last week in the Observation Modeling workshop: "You can't legislate
against illegal activity."
On 7/16/07, Bob Morris <morris.bob@gmail.com> wrote:
...
There is no way to guarantee that a particular application which
passes an LSID to another application can expect anything other
...
I am not sure if I follow completely Bob but I think you are
...
out an important issue for "semantic immutability" versus
"byte/bit-level immutability". If a client retrieves data from two
different sources under a byte-level immutability contract, a simple
equivalence test should be able to verify the byte-level equivalence.
Under the semantic immutability contract, a more complex test for
equivalence would be required, fitted for example to the MIME type.
In practice I do not think this is an issue. If clients act under
blind faith under either contract they would not test the
equivalence. In fact they would usually only retrieve a particular
LSID from one service. The blind faith client would process the data
as if the data provider is following the contract and no more. The
client could not assume byte-level immutability when there is only
semantic immutability because it may indeed break the client code.
A byte-level representation cached from one call cannot be compared
byte for byte with semantically immutable data. If XML is carried in
the data, all operations must be consistent with XML operations. I do
not see this as a problem.
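
To make the two contracts concrete, here is a small sketch of both
tests for XML payloads. Using XML canonicalization (C14N 2.0 from
Python's standard library, 3.8 or later) with whitespace stripping as
the "more complex test" is an assumption made for illustration, not an
algorithm anyone in this thread has agreed on; the function names and
sample records are hypothetical.

```python
from xml.etree.ElementTree import canonicalize


def byte_equivalent(a: bytes, b: bytes) -> bool:
    """Byte-level contract: the two payloads must be identical."""
    return a == b


def xml_semantically_equivalent(a: bytes, b: bytes) -> bool:
    """Semantic contract (one possible definition): the payloads are
    equal after canonicalization, which normalizes attribute order;
    strip_text=True also drops indentation-only whitespace."""
    return (canonicalize(a.decode("utf-8"), strip_text=True)
            == canonicalize(b.decode("utf-8"), strip_text=True))


# Same record, serialized differently by two hypothetical services.
first = b'<record id="1" source="notebook"><time>2.5</time></record>'
second = b'<record source="notebook" id="1">\n  <time>2.5</time>\n</record>'

print(byte_equivalent(first, second))
print(xml_semantically_equivalent(first, second))
```

A client that cached `first` and later received `second` would signal an
error under the byte-level test but accept the data under this semantic
one, which is the gap the two contracts leave between them.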
Since in the biodiversity community LSID data payloads would be about
a large variety of objects, clients would always need to check the
data types before most processing operations. The data type
information would be encoded in the metadata but could also be
segregated by service provider (but even there, for good form, the
metadata should encode the data type). The metadata needs to encode
both the physical layout of the bits and the "use" (there must be a
better word). For example, the data could be Darwin Core records,
Dublin Core records or SDD. All are XML, but the legal operations over
that XML are different depending on the "use". Some clients could
just pass the data through without being concerned about this, but
other clients would need to process accordingly, perhaps ignoring types
they know nothing about.
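
As an illustration of checking both the physical layout and the "use"
before processing, a client might dispatch along these lines. The
metadata keys (mime_type, use), the use labels and the handlers are all
hypothetical; nothing here reflects an agreed TDWG vocabulary.

```python
import xml.etree.ElementTree as ET


def handle_darwin_core(root):
    return f"Darwin Core record with {len(root)} fields"


def handle_dublin_core(root):
    return f"Dublin Core record with {len(root)} fields"


# Hypothetical mapping from the declared "use" to the operations that
# are legal over that XML.
HANDLERS = {
    "DarwinCore": handle_darwin_core,
    "DublinCore": handle_dublin_core,
}


def process(payload: bytes, metadata: dict):
    """Process a payload only when its declared layout and use are known."""
    if metadata.get("mime_type") != "text/xml":
        return None  # physical layout this client cannot read
    handler = HANDLERS.get(metadata.get("use"))
    if handler is None:
        return None  # unknown use: ignore rather than guess
    return handler(ET.fromstring(payload))


payload = b"<record><a/><b/></record>"
print(process(payload, {"mime_type": "text/xml", "use": "DarwinCore"}))
print(process(payload, {"mime_type": "text/xml", "use": "SDD"}))
```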
------
Unrelated to Bob's comment, I would like to add a point about
digital-from-birth vs. made-digital data.
What is data and what is metadata has no relation to being digital or
not. There was data and metadata long before there were computers.
Galileo, studying the time objects take to move down an inclined plane,
collected data: the time, distance, angle and mass of the objects. At
least the time and the distance recorded in his notebooks are data.
If we re-represent his data from the notebook in digital format in
2007 so we can process it in an Excel spreadsheet, it is still the
same data. If we just take a photo of the book we might have a
different beast, but as long as we leave his numbers as numbers it is
the same data. The metadata about the inclined plane experiment would
include information about the apparatus used. For example, he might
have bells that ring at different locations/distances along the
inclined plane; it might be made of a wooden frame with brass rails. All
...
metadata tells us about the data, it is data about the data. Similar
arguments can be made about specimens. A digital representation of a
specimen is still data. No one is arguing that the specimen is a
species or a species concept. A specimen glued to paper or in a
...
can be assigned to a species concept, meaning someone has said this
is an X. As such we can treat it as an exemplar of X. If it is a type
we can even say it is a very good example of X, but it does not cover
the entire concept of X. The image of the specimen can be data. We
need not treat it as metadata just because it is digital or because
there is an object or event in the world that is the primary
representation. Galileo's numbers also existing in the notebook do
not make the numbers in the computer any less data. We will want to
add metadata to the digital numbers to tell the user that they came
from Galileo's notebook.
Bryan
--
--------------------------------------------------------------------
   P. Bryan Heidorn
   Graduate School of Library and Information Science
   University of Illinois at Urbana-Champaign
   pheidorn@uiuc.edu
   (V)217/ 244-7792     (F)217/ 244-3302
   http://www.uiuc.edu/goto/heidorn
   Online Calendar: http://www.uiuc.edu/goto/heidorncalendar
On Jul 16, 2007, at 9:01 AM, Bob Morris wrote:
> On 7/16/07, Ricardo Pereira <ricardo@tdwg.org> wrote:
>>
> One thing that is wrong with it is that if a conforming client
> acquires the data with a getData call from two different
...
> they return different byte strings, then the client is
On 7/16/07, P. Bryan Heidorn <pheidorn@uiuc.edu> wrote:
...
> signal an error and possibly break an application that exercises a
> blind faith in the power of "semantic immutability".
>
>
>>  b) Some may claim that caching of LSIDs and the associated data
>> would be impossible. But since the data is always "semantically
>> immutable", what's wrong with caching it?
>>
>
> --
> Robert A. Morris
> Professor of Computer Science
> UMASS-Boston
> ram@cs.umb.edu
> http://bdei.cs.umb.edu/
> http://www.cs.umb.edu/~ram
> http://www.cs.umb.edu/~ram/calendar.html
> phone (+1)617 287 6466
> _______________________________________________
> tdwg-guid mailing list
> tdwg-guid@lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-guid
-- 

Australian Centre for Plant BIodiversity Research<------------------+
National            greg whitBread             voice: +61 2 62509 482
Botanic Integrated Botanical Information System  fax: +61 2 62509 599
Gardens                      S........ I.T. happens.. ghw@anbg.gov.au
+----------------------------------------->GPO Box 1777 Canberra 2601


Greg Whitbread