Re: RDF query and inference in a distributed environment
Kevin Richards wrote:
> Coming from an IT background rather than a taxonomic background, I have never understood the strong "ownership" that people/scientists feel toward their data. This seems rather short-sighted to me: surely people with these concerns must have given some thought to how to maintain/expose/use "their" data in the long-term future? I can understand their concerns, but there must be a solution; otherwise their data will be no one's concern in 50 years' time, when it disappears from existence.
Philosophically, I agree with you 100%. As a scientist myself, I strongly believe that scientific data (ESPECIALLY from government-funded research) belongs in the public domain. But there are socio-political realities that must be dealt with -- and, as pointed out by Chuck, Patricia, and others, dealt with carefully and with sensitivity. It's really the main reason we're not further along than we currently are.
> Another thought I had about data caching systems. Say you want to search the cached/centralised copy of the data (e.g. a GBIF cache). A list of results is returned; you then decide you want to view more details of one of the results, so you follow a link off to the associated data (theoretically using the GUID system we are discussing). This would show the details of the selected record at the location the GUID resolves to, which would always be the same location, as a GUID resolves to only a single location.
I'm confused. I thought that a GUID resolves to a single data record. The location of that data record seems to me to be an issue of resolution, not (necessarily) an intrinsic component of the GUID<->data relationship per se. I don't understand why -- from a purely technological perspective -- a GUID must be resolved through a single designated resolution service, as opposed to any of a multitude of resolution services that conform to a standard protocol. The main issue would be confidence that any one of these services would resolve the same GUID to exactly the same data, which comes back to the robustness/reliability of the automated synchronization protocols.
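To make the "many resolvers, one answer" idea concrete, here is a minimal sketch, assuming that each resolution service exposes the same `resolve(guid)` call (standing in for the shared standard protocol) and that "the exact same data" can be checked by hashing a canonical encoding of each returned record. `Resolver`, `fingerprint`, `consistent`, and the `urn:example:` GUID are all hypothetical names for illustration; the check only detects disagreement and is not itself a synchronization protocol.

```python
import hashlib
import json
from typing import Dict, List, Optional


class Resolver:
    """Hypothetical resolution service holding its own copy of some records."""

    def __init__(self, name: str, records: Dict[str, dict]) -> None:
        self.name = name
        self._records = records

    def resolve(self, guid: str) -> Optional[dict]:
        # Every resolver exposes this same method; it stands in for the
        # shared standard protocol (an assumption, not a real spec).
        return self._records.get(guid)


def fingerprint(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so equal records hash equally."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consistent(guid: str, resolvers: List[Resolver]) -> bool:
    """True only if at least one resolver knows the GUID and all that do agree."""
    answers = [r.resolve(guid) for r in resolvers]
    prints = {fingerprint(a) for a in answers if a is not None}
    return len(prints) == 1


if __name__ == "__main__":
    guid = "urn:example:specimen:BPBM-35041"  # hypothetical GUID syntax
    record = {"catalog": "BPBM 35041"}
    primary = Resolver("primary", {guid: record})
    mirror = Resolver("mirror", {guid: dict(record)})
    stale = Resolver("stale", {guid: {}})  # an out-of-date, empty copy
    print(consistent(guid, [primary, mirror]))         # True
    print(consistent(guid, [primary, mirror, stale]))  # False
```

Under this framing, which service answers matters only if the copies have drifted apart; keeping them from drifting is the job of the synchronization protocols.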
> Is this correct, or would the intention here be to view the cached details of the selected record (which would require a separate ID for every cached record)?
I guess it comes back to what the GUID is attached to (i.e., defined as representing). I view the GUID as a universally adopted surrogate representation of a specified collection of data (information). A more technically rigorous interpretation of a GUID would be as a data-record *instance* identifier. In the former case, the taxon-name data object "Centropyge boylei Pyle & Randall 2001 [etc...]" and the specimen data object "BPBM 35041 [etc...]" would each be represented by a single GUID, regardless of how many instances of the bit-equivalent data records existed on various servers around the world (e.g., my laptop computer, the Catalog of Fishes, GBIF, ITIS, FishBase, Species2000, etc.). In the latter case, each of these two data objects (the taxon name and the specimen) would have a different GUID for each database/server that a copy of the record for the data objects lived on.
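The difference between the two interpretations can be sketched by counting GUIDs. This is an illustration only, assuming name-based UUIDs (`uuid.uuid5`) purely so the output is reproducible; a real scheme would more likely mint each GUID once and store it. `Copy`, `object_level`, and `instance_level` are hypothetical names, and the namespace choice is arbitrary.

```python
import uuid
from dataclasses import dataclass
from typing import List

# Arbitrary namespace for deterministic name-based UUIDs (an assumption).
NAMESPACE = uuid.NAMESPACE_URL


@dataclass(frozen=True)
class Copy:
    """One copy of a data object as held by one server."""
    server: str
    guid: str


def object_level(obj: str, servers: List[str]) -> List[Copy]:
    # Former interpretation: one GUID for the data object,
    # shared by every bit-equivalent copy on every server.
    guid = str(uuid.uuid5(NAMESPACE, obj))
    return [Copy(s, guid) for s in servers]


def instance_level(obj: str, servers: List[str]) -> List[Copy]:
    # Latter interpretation: a distinct GUID for each server's copy.
    return [Copy(s, str(uuid.uuid5(NAMESPACE, f"{s}/{obj}"))) for s in servers]


if __name__ == "__main__":
    servers = ["laptop", "Catalog of Fishes", "GBIF", "ITIS", "FishBase", "Species2000"]
    for label, scheme in (("object", object_level), ("instance", instance_level)):
        copies = scheme("BPBM 35041", servers)
        print(label, len({c.guid for c in copies}))  # object 1, then instance 6
```

With object-level GUIDs, any server holding a copy can answer for the GUID, which is what makes multiple resolution services possible; with instance-level GUIDs, the identifier is effectively tied to one location, which matches the behaviour described in the quoted question.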
Aloha, Rich
Richard Pyle