Hi Javier,
Thanks for your points. My responses are below, including a bit about GML that should really be in a different thread!
Javier de la Torre wrote:
Hi,
There are many reasons why I think data providers should be more capable than just simple interfaces to indexers. Some of them have already been pointed out by Markus and Bob, but I would like to add a non-technical reason that is closely related to what TDWG is good for.
While making the biological collection databases available for GBIF indexing, I think we are also helping them in their daily work. Some databases are setting up their own web interface based on BioCASe, there are projects to help them georeference their specimens based on the provider software, they have the possibility to export their collection database and import it into another collection management software, and many other useful possibilities will hopefully appear in the coming years. These solutions are possible because the software installed on their servers is capable of doing searches and queries. So my argument here is that by setting up a query level and installing capable software directly on the providers we are improving these databases.
But surely these people can search their own database already. If you are providing a cheap and easy web interface for them then that is a tangible benefit, but it could equally be done centrally with branding for the institution. It doesn't physically have to reside with them and be maintained by them - though that may be best for many organisations.
I have used this argument for a while already when convincing data providers: by joining GBIF they are not only making their data available to the community but will also benefit from the tools that are appearing for them based on TDWG standards. I think it is a good deal: make your data available and we will help you to improve it with standard tools from the community at no cost.
I believe this is just as available if the indexing is outsourced from the data owner, but it is difficult to discuss in the abstract here.
There are also many people who do not want to share their data, especially researchers, and who can also benefit from our software and standards without having to participate in any network. If we create good and useful software they might consider using it to handle their data and at some point maybe open it to the public.
That might be a really nice side effect of our activities, but as it is based on serendipity (or good karma perhaps!) it is not something that we can plan for.
This is also somewhat related to what I call the OAI "model" versus the OGC "model". The OAI helps and promotes access to data in distributed databases, while the OGC is an organisation that just promotes the interoperability of applications. While the OAI is focused on making the data as accessible as possible (to set up value-added services on top of cache databases), the OGC community is working on making software interoperable, extensible and open to new uses that they might not know about now.
I am not sure that either approach works in isolation. If I want to define a GML application schema that contains a definition of a bridge, for example, I am not sure how I relate this to all the other people who have defined (or may define in the future) bridge-like things. How do I write an application that will 'understand' not only my bridge feature but also any other features out there that are bridge-like but that I am not aware of just now? I can see how we get interoperability if we all agree to use the same Application Schema, and I can see that we can use the same software for multiple Application Schemas, but we want *data* interoperability, not just *software* interoperability.
Here is an example: The British Ordnance Survey have their own GML Application Schema, and it defines a "FerryLink" that extends their own abstract feature type and, through it, the GML feature. So I can treat it like a GML feature in an application - very useful - this is software interoperability. The trouble is that I have no way of knowing that it is to do with ferries and water, or anything else useful about it, unless I can read English - which machines can't. (Incidentally, there is no documentation in the schema, so I can't retrieve it and display it to the user automatically either.) Presumably another mapping agency is also encoding ferry links. In fact, the instances of ferry routes that are encoded using this schema may join the UK to countries that use different GML Application Schemas to define the *same* physical ferry links!
My understanding is that the GML model does not give me a way to discover this or express it once I know that there are two ways of talking about the same physical objects.
You can get the OS application schemas here:
http://www.ordnancesurvey.co.uk/oswebsite/xml/schema/index.html
My knowledge of OGC standards is limited, so I may be wrong on this and stand to be corrected.
We are building a global system so we have to be able to reconcile different encodings of the same object types. GML does not solve this problem but might be useful in other ways. The OAI standards may be useful for finding stuff.
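To make the FerryLink point a little more concrete, here is a rough sketch in Python rather than GML. It is only an illustration under stated assumptions: apart from "FerryLink", every name in it (Feature, LiaisonMaritime, Bridge, SHARED_CONCEPT, draw, same_kind) is hypothetical and taken from no real schema or library. The common base class stands in for the GML feature, which gives us software interoperability. The lookup table stands in for the extra agreement we would need, outside the schemas, to get data interoperability.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Stand-in for a generic GML feature: an id and some coordinates."""
    feature_id: str
    coordinates: list


@dataclass
class FerryLink(Feature):
    """Stand-in for the Ordnance Survey type mentioned above."""


@dataclass
class LiaisonMaritime(Feature):
    """Hypothetical type from another agency's application schema."""


@dataclass
class Bridge(Feature):
    """Hypothetical type that nobody has mapped to a shared concept."""


def draw(feature: Feature) -> str:
    """Software interoperability: works for any subtype of Feature."""
    return f"{feature.feature_id}: line through {len(feature.coordinates)} points"


# Data interoperability needs something the type hierarchy does not give us:
# an explicit, shared statement that two type names mean the same thing.
# This table is an assumption made for illustration only.
SHARED_CONCEPT = {
    "FerryLink": "ferry-route",
    "LiaisonMaritime": "ferry-route",
}


def same_kind(a: Feature, b: Feature) -> bool:
    """True only if both types are mapped to the same shared concept."""
    concept_a = SHARED_CONCEPT.get(type(a).__name__)
    concept_b = SHARED_CONCEPT.get(type(b).__name__)
    return concept_a is not None and concept_a == concept_b
```

Both ferry types can be passed to draw() without any extra work, but same_kind() can only say they describe the same sort of thing because somebody wrote the mapping by hand. An unmapped type such as Bridge is drawable but otherwise opaque, which is the position I think we are in with independent Application Schemas.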
I like to think that GBIF is like the OAI and that, instead of creating its own technology to achieve its goals, it is using TDWG's. On the other hand, I tend to think of TDWG as more like the OGC. So, GBIF is just one user of TDWG's work.
So my vote goes for more sophisticated data providers that allow us to construct more things on top of them without having to consider GBIF at all. TAPIR looks fine to me for this task, even more so if complemented with the TAPIR "Lite" idea for providers that just want to contribute to GBIF.
I am sure we need both fat and thin providers but I also think we need to define the roles played by different actors within the network more formally - which I think we will in the near future.
Some good points,
Roger
Best regards,
Javier.