Hi Rich,
Richard Pyle wrote:
With hard drive sizes spiraling skyward, and $/GB ($/TB) spiraling downward.... I'm wondering whether or not the "distributed" system that serves us best might be "distributed mirror copies", rather than distributed complementary data.
Rather than being mutually exclusive, the two approaches you mention are complementary. The "distributed complementary data" approach is a fundamental part of the infrastructure needed to build the "distributed mirror copies" you propose. That approach is essentially what we have in place now, with DiGIR/BioCase/Tapir as the harvesting protocols and GBIF and other institutions as harvesters. The only missing piece of software we need to really have "an automated and robust synchronization protocol" is, in my opinion, some kind of push mechanism to trigger updates in the caches.
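Just to make the idea concrete, here is a minimal sketch of such a push mechanism, assuming the harvester exposes a notification endpoint (the URL and payload shape below are my invention, not part of any existing DiGIR/BioCase/Tapir specification):

    import json
    import urllib.request

    # Hypothetical endpoint where a cache listens for change
    # notifications; URL and payload are assumptions for illustration.
    HARVESTER_NOTIFY_URL = "http://cache.example.org/notify"

    def notify_update(record_guid, change="modified"):
        """Tell a downstream cache that one record changed, so it can
        re-harvest just that record instead of waiting for a full crawl."""
        payload = json.dumps({"guid": record_guid, "change": change})
        request = urllib.request.Request(
            HARVESTER_NOTIFY_URL,
            data=payload.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return response.status  # 200 = cache queued a re-harvest

    # e.g. notify_update("urn:lsid:example.org:specimens:12345")

The point is only that providers push a small "this GUID changed" message, and each cache decides when and how to re-harvest.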
However, I don't think it is very useful to try to standardize how the distributed mirror copies should be built and organized, nor do I think that having exact copies of all metadata everywhere is very helpful. With an RDF approach to metadata, it may not be feasible or desirable to harvest every single triple there is. A (socially) decentralized scheme would work better in this case: data providers would make available the data they create and hold under their direct custody, and individual harvesters would be free to look at the metadata being served and build their own caches, selectively harvesting only the information relevant to the services they intend to provide.
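As an illustration of what such selective harvesting could look like, here is a small sketch using rdflib (the provider URL and the choice of predicates are assumptions for the example):

    from rdflib import Graph, Namespace

    DC = Namespace("http://purl.org/dc/elements/1.1/")

    # Hypothetical provider document, for illustration only.
    full = Graph()
    full.parse("http://provider.example.org/metadata.rdf")

    # Selective harvest: keep only the predicates this cache cares
    # about (here titles and creators), ignoring all other triples.
    wanted = {DC.title, DC.creator}
    cache = Graph()
    for s, p, o in full:
        if p in wanted:
            cache.add((s, p, o))

    cache.serialize(destination="local_cache.ttl", format="turtle")

Each harvester would pick its own "wanted" set according to the services it intends to provide.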
One important point in the GUID discussion is to make sure that IPR statements are part of the metadata and that ownership is unambiguous, regardless of whether or not a harvester leaves that information intact. As long as the aggregator keeps a reference to the original record (using its GUID), any of the GUID technologies under evaluation lets you get back to the ownership information.
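A tiny example of that traceability, again with rdflib (the GUIDs and rights text are made up):

    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")

    # Hypothetical GUIDs: the provider's original record and the
    # aggregator's cached copy of it.
    original = URIRef("urn:lsid:provider.example.org:specimens:12345")
    copy = URIRef("http://aggregator.example.org/records/98765")

    g = Graph()
    # The copy keeps a pointer back to the original record...
    g.add((copy, DC.source, original))
    # ...while the provider's metadata carries the IPR statement.
    g.add((original, DC.rights,
           Literal("(c) Example Museum, all rights reserved")))

    # Even if the aggregator dropped the rights text from its copy,
    # following the source link back to the original GUID recovers
    # the ownership information.
    for _, _, src in g.triples((copy, DC.source, None)):
        for _, _, rights in g.triples((src, DC.rights, None)):
            print(copy, "derives from", src, "-", rights)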
Regards,
Ricardo