My initial thought is that a FAN-type system is probably intended for the uncommon, unhandled case where a third party needs to add annotations to someone else's LSIDs. I would imagine that in most cases linkages between data stored at different locations/authorities would be specified in the RDF of each LSID. E.g., one LSID's metadata might have a field indicating that it is a "correctedSpellingOf" another LSID.
Kevin
Steven Perry smperry@ku.edu 05/19/06 4:14 PM >>>
I've been thinking about annotation, attribution, and LSID Foreign Authority Notification and wanted to continue the discussions from last month.
While I like the idea of notification of annotations, I'm not a fan of FAN. In the first paragraph of the FAN proposal it states "An example authority might return metadata service endpoints that contain metadata (for example annotations) about LSIDs for which it is not the authority".
I'll explore this idea using two example LSID authorities, A and B. Assume that A has published some metadata identified by an LSID it assigned (A is the original source) and that B wants to annotate it. B could be doing so for one of three reasons: to suggest a revision to A's metadata (correction), to propose additional information that supplements A's metadata (annotation), or to intentionally confuse clients and disrupt the system (spam).
A close reading of the FAN proposal makes me think that, according to the authors, B will create assertions directly about A's metadata (meaning B will use A's LSIDs as the subject of assertions). FAN provides a method for B to notify A that it has done so and suggests a method for telling clients of A's resolver to directly contact B (using a modified form of resolution) to get additional information about A's LSID. This leads me to believe that the authors think that a single metadata object could be split between multiple authorities.
Because of this I see three problems with FAN. It increases the burden on LSID resolution clients and perhaps on the LSID resolution system as a whole, it makes attribution of assertions about an LSID indirect, and it makes it easier to spam the network.
FAN-aware resolution is more intensive than normal resolution and requires the following process: The client knows an LSID. It performs normal resolution by querying DNS for the service record of the authority, contacting the authority to find the correct endpoint, and calling getMetadata() on that endpoint. Next it must parse this metadata and look for the special predicate which indicates that a foreign authority contains more information about an LSID. Since the client does not know if it has retrieved the entire metadata object (some pieces may exist on other authorities), it must perform a modified getMetadata() call for each foreign authority. To do so, the proposal suggests that the client again queries DNS for the foreign authority (because only the authority name is returned in the first resolution), gets its endpoint, and calls getMetadata() with the original LSID (though it was assigned by a different authority). The implied final step is that the metadata from the foreign authority is merged with the metadata from the authoritative source before being handed back to calling code.
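To make the cost of that process concrete, here is a rough Python sketch of it. Everything in it is an assumption for illustration: the in-memory AUTHORITIES table stands in for DNS and the remote endpoints, the predicate name "foreignAuthority" is made up (the proposal's actual predicate is not quoted here), and triples are plain tuples rather than RDF. It only counts the round trips a FAN-aware client would make.

```python
# Hypothetical stand-in for DNS plus remote authorities (not a real LSID client).
# Triples are (subject, predicate, object) tuples.
FOREIGN_AUTHORITY = "foreignAuthority"  # assumed predicate name, for illustration only

AUTHORITIES = {
    "A": {"urn:lsid:A:names:1": [
        ("urn:lsid:A:names:1", "label", "metadata published by A"),
        ("urn:lsid:A:names:1", FOREIGN_AUTHORITY, "B"),
    ]},
    "B": {"urn:lsid:A:names:1": [
        ("urn:lsid:A:names:1", "comment", "annotation held by B"),
    ]},
}

def authority_of(lsid):
    # LSID layout: urn:lsid:<authority>:<namespace>:<object>
    return lsid.split(":")[2]

def resolve_at(authority, lsid, log):
    # One resolution against one authority costs three steps.
    log.append(("dns_lookup", authority))
    log.append(("find_endpoint", authority))
    log.append(("getMetadata", authority))
    return list(AUTHORITIES.get(authority, {}).get(lsid, []))

def fan_resolve(lsid):
    log = []
    # Step 1: normal resolution against the issuing authority.
    merged = resolve_at(authority_of(lsid), lsid, log)
    # Step 2: look for the predicate that names foreign authorities.
    foreign = [o for (_s, p, o) in merged if p == FOREIGN_AUTHORITY]
    # Step 3: repeat resolution at every foreign authority and merge.
    for name in foreign:
        merged.extend(resolve_at(name, lsid, log))
    return merged, log

triples, log = fan_resolve("urn:lsid:A:names:1")
print(len(log), "steps for", len(triples), "triples")
```

In this toy model the number of steps is three times (1 + the number of foreign authorities), whether or not the caller wanted the foreign data.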
The assumption that a metadata object *could* be split across multiple authorities forces the client to query all foreign authorities, even if those authorities only contain annotations that the client is not interested in. To look at an example case of names and specimens, it might be that we want notification of usage of names by specimens so we can immediately retrieve all specimens for a name. Assume A issues a name and B issues a specimen. Under FAN, the B authority could add an assertion to their data store that says [urn:lsid:A:names:1 hasSpecimen urn:lsid:B:specimens:3]. I consider this an annotation (rather than a revision, modification, or correction) because the fact that B linked a specimen to a name does not change the semantic meaning of the name object (it merely provides additional information). If you're interested in just the name and you don't know by resolving it against A that you've got the complete set of name metadata, you're forced to get a bunch of information about specimens that you're not interested in (and there could be a great deal of this kind of information).
To complicate matters, C might want to propose a correction to A. C might believe that A misspelled the name. Consider this simple example:
A's original assertion: urn:lsid:A:names:1 fullScientificName "Salmo saler, Linnaeus, 1758"
C's proposed modification: urn:lsid:A:names:1 fullScientificName "Salmo salar, Linnaeus, 1758"
Under FAN, C would propose its modification and notify A that it did so. When a FAN-aware client goes to resolve A:names:1, it will get back some data that changes the meaning of A's original object (C's assertion) and a lot of data that's not directly about it (all the specimen assertions). What's worse, it now has both of these assertions in the same model:
urn:lsid:A:names:1 fullScientificName "Salmo saler, Linnaeus, 1758"
urn:lsid:A:names:1 fullScientificName "Salmo salar, Linnaeus, 1758"
If the names class is defined with fullScientificName as a functional property (can have only one assertion about the property on an instance) we're in trouble and we no longer have a valid OWL instance. So the FAN merge process has to know both that this is a problem, by understanding the ontology for names, and how to fix it, by deciding whether to trust C or not and select one or the other assertion but not both.
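As a rough illustration of the check the merge process would need, here is a small Python sketch. It is not an OWL reasoner: triples are plain tuples, and the FUNCTIONAL_PROPERTIES set is an assumption standing in for whatever the names ontology would actually declare. It only detects the conflict; choosing between A and C is a separate trust decision.

```python
from collections import defaultdict

# Assumed to be derived from the names ontology; hard-coded here for illustration.
FUNCTIONAL_PROPERTIES = {"fullScientificName"}

def functional_violations(triples):
    """Return {(subject, predicate): [values]} wherever a functional
    property carries more than one distinct value on the same subject."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in FUNCTIONAL_PROPERTIES:
            values[(s, p)].add(o)
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}

# The merged model from the example above (sources noted in comments only).
merged = [
    ("urn:lsid:A:names:1", "fullScientificName", "Salmo saler, Linnaeus, 1758"),  # from A
    ("urn:lsid:A:names:1", "fullScientificName", "Salmo salar, Linnaeus, 1758"),  # from C
    ("urn:lsid:A:names:1", "hasSpecimen", "urn:lsid:B:specimens:3"),              # from B
]

conflicts = functional_violations(merged)
for (subject, predicate), vals in conflicts.items():
    print(subject, predicate, "has", len(vals), "values:", vals)
```

Note that the check needs ontology knowledge (which properties are functional) that a low-level resolution client would not otherwise have to carry.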
How does the client decide to trust or distrust C? The FAN proposal suggests that the client gets a semantic description of C and uses that to make the decision. I think it more likely that the code that calls the client on behalf of the user should decide whether or not to trust C.
So to fix all this we could propose that the FAN-aware client has to be triply smart. In addition to having to know how to deal with foreign authorities it should provide 1.) some mechanism for getting information about a foreign authority and deciding whether to trust it or not, 2.) some mechanism for conditionally merging the foreign metadata it cares about (proposed modifications for example) while discarding other information (specimens), and 3.) the ability to know whether or not its merge process violates the integrity of the metadata. This seems to be quite a burden. I think it will add not only complexity to LSID resolution, but also slow it down tremendously. LSID resolution is a low-level data identification and access mechanism. Since so many other services will build upon it, it ought to be fast.
That brings up the idea of attribution. In FAN, the attribution of an assertion is not directly tied to the authority of the LSID that acts as the subject of the assertion. Instead, since foreign authorities are allowed to make statements directly about an LSID issued by someone else, the resolution client has to deduce attribution of each assertion during the merge process (using the foreign authority predicate as evidence). After the merge process is complete, downstream processes will have lost this attribution information.
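A small Python sketch of how the attribution gets lost, under the simplifying assumption that a merged model is just a set of (subject, predicate, object) tuples. The "attributed" variant, which tags each triple with its source, is my own illustration of what the client would have to preserve; it is not something the FAN proposal specifies.

```python
def authority_of(lsid):
    # LSID layout: urn:lsid:<authority>:<namespace>:<object>
    return lsid.split(":")[2]

from_a = {("urn:lsid:A:names:1", "fullScientificName", "Salmo saler, Linnaeus, 1758")}
from_c = {("urn:lsid:A:names:1", "fullScientificName", "Salmo salar, Linnaeus, 1758")}

# Plain merge: once the sets are combined, nothing records who said what.
merged = from_a | from_c

# The only authority a downstream process can read off the triples is the
# one embedded in the subject LSID, so C's assertion looks like A's.
apparent_sources = {authority_of(s) for (s, _p, _o) in merged}

# To keep attribution, the client would have to carry the source alongside
# each triple (hypothetical structure, for illustration only).
attributed = {(t, "A") for t in from_a} | {(t, "C") for t in from_c}
actual_sources = {source for (_t, source) in attributed}

print(sorted(apparent_sources), sorted(actual_sources))
```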
Finally, FAN opens the door to spammers because attribution is not directly tied to the authority name. If I trust A and I resolve the name id above I could get back information from C. It could easily be the case that I don't trust C (because I believe it to be a spammer) but that my LSID resolution client doesn't know this fact. The end user of the client may be using a FAN-enabled resolver or not and never know. The FAN proposal suggests that clients be notified, but what is your average piece of software that calls a resolution client on behalf of a user to do with this information?
I propose we take an alternative approach. I like the notification system in FAN, but I dislike the assumption about distributed objects and indirect attribution. I would rather we agree on an annotation and linking system that holds the following to be true:
1.) An authority is never allowed to make an assertion whose subject is an LSID that it is not authoritative for.
2.) LSIDs authoritatively attribute an assertion to the organization that owns the fully qualified authority domain (this includes sub domains to support the case of hosted authority services).
3.) Objects are not distributed across authorities but instead link to other first-class objects that may exist within other authorities.
4.) Annotations, corrections, etc. are valuable, and as such should be first class objects that are attributed to their issuers by the normal process of 2 above. This is annotation as linking; the subject of the annotation is the LSID of the annotation, it is attributed to the authority of the issuer, and an agreed upon predicate is used to define the object that acts as domain of an annotation (what it applies to). This guarantees correct attribution of annotations throughout the network.
5.) We cannot stop spammers from publishing malicious assertions but we can validate a given assertion acquired from some source external to the LSID system by resolving the LSID and checking the resulting metadata against the assertion. Spammers will find it difficult to circumvent this method because it requires hijacking the DNS system. If you trust that a source gave you a complete and full copy of an object you don't have to resolve its LSID.
6.) We will often have to use systems that are external to LSID in order to find the data/metadata we're interested in (at the very least to identify LSIDs to resolve). These other systems have to understand how to attribute assertions to their owners and can do so using the rules above without relying on metadata about a foreign authority.
7.) We don't have to assume (as FAN does) that an authority is responsible for providing all information in the universe that relates in any way to its LSIDs.
8.) If you get back invalid metadata from an authority (not incorrect, but metadata that, for example, violates a functional property restriction), you automatically know who is at fault (the organization that owns the authority name).
9.) A FAN style notification system could still be used for logging and tracking the use of one's data or for proposing corrections to an issuer who might want to approve and incorporate those suggestions into their published data.
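To show what rules 1, 4, and 5 might look like in practice, here is a rough Python sketch. The names are assumptions: "annotates" stands in for whatever predicate we would agree on, "proposedFullScientificName" and the annotation LSID are invented for the example, and the PUBLISHED table plays the role of real authorities reached through normal resolution.

```python
ANNOTATES = "annotates"  # hypothetical agreed-upon predicate (rule 4)

# Stand-in for what each authority publishes; a real client would resolve LSIDs.
PUBLISHED = {
    "A": [("urn:lsid:A:names:1", "fullScientificName", "Salmo saler, Linnaeus, 1758")],
    "C": [
        # C's correction is its own first-class object that links to A's name.
        ("urn:lsid:C:annotations:7", ANNOTATES, "urn:lsid:A:names:1"),
        ("urn:lsid:C:annotations:7", "proposedFullScientificName",
         "Salmo salar, Linnaeus, 1758"),
    ],
}

def authority_of(lsid):
    # LSID layout: urn:lsid:<authority>:<namespace>:<object>
    return lsid.split(":")[2]

def rule_1_violations(authority, triples):
    """Rule 1: every subject must be an LSID issued by the asserting authority."""
    return [t for t in triples if authority_of(t[0]) != authority]

def resolve(lsid):
    """Toy resolution: only the issuing authority is ever consulted."""
    return [t for t in PUBLISHED.get(authority_of(lsid), []) if t[0] == lsid]

def is_valid(assertion):
    """Rule 5: an assertion obtained elsewhere is valid only if the
    authority of its subject LSID actually publishes it."""
    return assertion in resolve(assertion[0])

# A FAN-style assertion, as a spammer (or C) might circulate it.
foreign_claim = ("urn:lsid:A:names:1", "fullScientificName", "Salmo salar, Linnaeus, 1758")

print(rule_1_violations("C", PUBLISHED["C"]))   # C's annotations obey rule 1
print(rule_1_violations("C", [foreign_claim]))  # the FAN-style claim does not
print(is_valid(foreign_claim))                  # and A does not back it up
```

Under this scheme A's model never contains two values for fullScientificName, and anything whose subject is a C LSID is attributable to C simply by reading the LSID.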
So we might use a FAN-like system, but only for foreign annotation notification, not foreign authority notification.
Any comments? Have I misunderstood the assumptions behind FAN?
-Steve