Umm...there is a distinguishable class of data consumers, namely <span style="font-style: italic;">applications</span>, and so a distinguishable constituency whose burden is relevant, namely <span style="font-style: italic;">

application writers.</span> Some applications may well be motivated to query providers directly for a number of reasons, including:<br><ul><li>the data indexers' currency policies may be unsuitable</li><li>the

data indexers may aggregate in undesirable ways [the present model

seems to be that indexer==portal, but I doubt that is general]

</li><li>the data indexers may index too promiscuously or not

promiscuously enough for the application's taste [this might be a

non-issue if there were a way for a machine to understand what exactly

the indexing strategy is and perhaps how to induce the indexer to alter

it, but that sounds hard]

</li><li>portals, and maybe indexers---indeed, <span style="font-style: italic;">any</span>

processor of the data---can intentionally or inadvertently hide

assumptions about how the data will be used, making it unsuited for

uses that don't meet these assumptions. Put another way, it is probably

difficult to ensure that a machine-enforceable contract is possible

between aggregators and applications that assures the application that

records obtained from the aggregator are identical to those available

from the provider. I think it is even a deep problem to have&nbsp;

machine-understandable &quot;fitness for use&quot; metadata that would allow a

machine to understand what fitness contract the aggregator is actually

offering.

<br></li></ul>In general it should never be <span style="font-style: italic;">harder</span>

to query providers than aggregators, especially if it is difficult for

a machine to understand what, if any, point of view the aggregator has

imposed on the view they offer of the aggregated data.
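<br><br>A minimal sketch of the easy half of such a contract: an application compares a digest of the record it got from the aggregator with a digest of the record served by the provider. Everything here is hypothetical and only for illustration; the record layout, the field names, the example LSID and the helpers <code>record_digest</code> and <code>same_record</code> are assumptions, not part of any TDWG protocol.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Return a SHA-256 hex digest of a record in a canonical JSON form.

    Sorting the keys makes the digest independent of field order.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_record(provider_record: dict, aggregator_record: dict) -> bool:
    """True only if the aggregator's copy is identical to the provider's."""
    return record_digest(provider_record) == record_digest(aggregator_record)


# Hypothetical records; the LSID and field names are made up.
provider = {
    "lsid": "urn:lsid:example.org:specimens:1",
    "taxon": "Quercus alba",
    "country": "US",
}
# Same content, different field order: should still match.
faithful_copy = {
    "country": "US",
    "taxon": "Quercus alba",
    "lsid": "urn:lsid:example.org:specimens:1",
}
# The aggregator has coarsened the taxon: should not match.
altered_copy = {
    "lsid": "urn:lsid:example.org:specimens:1",
    "taxon": "Quercus",
    "country": "US",
}

print(same_record(provider, faithful_copy))
print(same_record(provider, altered_copy))
```

A check like this can only report that a copy differs. It says nothing about why it differs or whether the altered copy is still fit for a given use, which is the harder problem.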

<br><br>People are no doubt tired of hearing this from me, but my

position is always that modeling data consumers as humans is

dangerously constricting. Humans are too smart and readily deal with

lots of violations of the principle of least amazement, whereas

machines don't. In point of fact, except for those on paper, stone,

clay tablets and the like, there is no such thing as a database

accessed by a human. They all have software between the human and the

data provision service.&nbsp; From this I conclude that in your trinity

below, reduction of the burden on humans actually falls to the

applications, and so I think TAG's requirement is to reduce the burden

on application writers (including those of TDWG itself, but also all

others in the world) in <span style="font-style: italic;">their</span> quest to reduce the

burden on human data consumers. My intuition is that this will lead to

a different analysis than thinking about humans as consumers, but at

the moment I have no specific examples to offer.

<br><br>A little more is interspersed below.<br><br><br><div style="direction: ltr;"><span class="q"><span class="gmail_quote">On 3/1/06, <b class="gmail_sendername">Roger Hyam</b> &lt;<a href="mailto:roger@tdwg.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

roger@tdwg.org</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div style="direction: ltr;">

<pre>This is a little more of a controversial question that has been suggested:<br><br>&quot;Why should data providers supply search and query services?&quot;<br></pre>

<ul><li>We have many potential data providers (potentially every

collection and institution).</li><li>We have many potential data consumers (potentially every

researcher with a laptop).</li><li>We have a few potential data indexers (GBIF, ORBIS , etc + others

to come).</li></ul>

<pre>The implementation burden should therefore be:<br></pre>

<ul><li>Light for the providers - whose role is to conserve data and

physical objects.</li><li>Light for the consumer - whose role is to do research, not mess

with data handling.<br>

  </li><li>Heavy for the indexers - whose core business is making the data

accessible.</li></ul>

Data providers should give the objects they curate GUIDs. This is

important because it stamps their ownership (and responsibility) on

that piece of data. They then need to run an LSID service that serves

the

(meta)data for the objects they own. <b>Their work should stop at this

point!</b>

They should not have to implement search and query services. They

should not anticipate what people will require by way of data access -

that is a separate function.<br>

<br>

Data consumers should be able to access indexing services that pool

information from multiple data providers. They should not have to run

federated queries across multiple data providers or have to discover

providers as this is complex and

difficult (though they may want to browse round data providers like

they would browse links on web pages). Once they have retrieved the

GUIDs of the objects they are interested

in from the indexers they may want to call the data providers for more

detailed information.<br>

<br>

Data indexers should crawl the data exposed by the providers and index

them in thematic ways. e.g. provide geographic or taxon focused

services. This is a complex job as it involves doing clever, innovative

things with data and optimization of searches etc.<br>

<br>

Currently we are trying to make every data provider support searching

and querying when the consumers aren't really interested in querying or

searching individual providers - they want to search thematically

across

providers.</div></blockquote></span></div><div style="direction: ltr;"><div><br>Restated,

this sentence may fall in my class of questions forbidden to software

architects, namely that class of questions that begin with the words

&quot;Why would anybody ever want to ...&quot; <br></div></div><div style="direction: ltr;"><span class="q"><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div style="direction: ltr;">

If a big data provider wants to provide search and query then

they can set themselves up as both a provider and an

indexer - which is more or less what everyone is forced to do now - but

the functions are separate.<br>

<br>

Data providers would have to implement a little more than just an LSID

resolver service for this to work. They would need to provide a single

web service

method (URL call) that allowed indexers to get lists of LSIDs they hold

that have had their (meta)data modified since a certain date but this

would be a relatively simple thing compared with providing arbitrary

query facilities.<br>
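<br>A minimal sketch of the logic behind such a method, leaving out the web service wrapper. The holdings table, the example LSIDs and the function name <code>modified_since</code> are hypothetical, and treating "since" as "on or after the given date" is an assumption; the message does not say which is meant.

```python
from datetime import date

# Hypothetical provider holdings: each LSID is mapped to the date
# its (meta)data was last modified.
HOLDINGS = {
    "urn:lsid:example.org:specimens:1": date(2006, 1, 15),
    "urn:lsid:example.org:specimens:2": date(2006, 2, 20),
    "urn:lsid:example.org:specimens:3": date(2006, 3, 1),
}


def modified_since(holdings: dict, since: date) -> list:
    """Return, sorted, the LSIDs whose (meta)data changed on or after `since`."""
    return sorted(lsid for lsid, changed in holdings.items() if changed >= since)


# An indexer that last harvested on 1 February 2006 asks what has changed.
for lsid in modified_since(HOLDINGS, date(2006, 2, 1)):
    print(lsid)
```

The indexer would then resolve each returned LSID through the provider's ordinary LSID service, so the provider never has to evaluate an arbitrary query.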

<br>

I believe (though I haven't done a thorough analysis of log data) that

this is more or less the situation now. Data providers implement

complete DiGIR or BioCASE protocols but are only queried in a limited

way by portal engines. Consumers go directly to portals for their data

discovery. So why implement full search and query at the data provider

nodes of the network (possibly the hardest thing we have to do) when it

may not be used?<br>

<br>

This may be controversial. What do you think?</div></blockquote></span></div><div><br><br>I'm

not sure about controversial, but I am pretty sure that what you are

pointing at is a warehouse model. I don't know if I am prepared to

agree that all possible present and future concerns of TDWG can be

answered by data warehouses.&nbsp; In particular, if you analyse log data of

a warehouse, it won't be too surprising if the conclusion is that users

are behaving as though they mainly need a warehouse. [To data consumers

a warehouse and a portal are indistinguishable. I think.]

<br><br>Bob Morris<br></div><br><div style="direction: ltr;"><span class="q">

Roger<br>

<br>

<pre cols="72">-- <br><br>-------------------------------------<br> Roger Hyam<br> Technical Architect<br> Taxonomic Databases Working Group<br>-------------------------------------<br> <a href="http://www.tdwg.org/" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

<br>http://www.tdwg.org</a><br> <a href="mailto:roger@tdwg.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">roger@tdwg.org</a><br> +44 1578 722782<br>-------------------------------------<br></pre>

</span></div><div style="direction: ltr;">

</div><br>______________________________<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">_________________<br>Tdwg-tag mailing list<br><a href="mailto:Tdwg-tag@lists.tdwg.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

Tdwg-tag@lists.tdwg.org</a><br><a href="http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org</a><br></blockquote><br>