[tdwg-ncd] RE: NCD toolkit

Mon Apr 23 11:16:58 CEST 2007

Hi Ruud,

Many thanks for providing the URL to the test site. Don't worry about its looks; we are all aware that the functionality needs to be in place first.

For your questions below:

  - We decided to drop <PercentageDatabased> as being an unhelpful element. It is one that would need constant updating, which we thought would not happen. The URLs in the Related Materials Group could alert a user to the fact that a database exists, and the user could then go direct to it. As for EAD, Barbara Mathé at the AMNH has prepared a mapping between EAD and NCD which you may find helpful once it reaches its final form.

  - Yes, this approach sounds good. I did try to put some dummy data into the test site, but I got thrown out with an SQL error. No matter: having an organisation and a person as a precursor to a collection description sounds perfectly reasonable and is similar to the approach taken by NoDIT.

  - Yes, the lists will be in English, but perhaps we could get equivalents in other European languages from GEMET? One for us to discuss with Roger.

  - There may be more than one agent per site that will be creating and editing records. At my own place there may be a dozen or more authors.

Perhaps we should have an "open issues" page on the TDWG wiki? Otherwise it is going to become difficult to keep track of queries, responses and decisions made by email.

All best wishes,
Neil

-----Original Message-----
From: Ruud Altenburg [mailto:ruud at eti.uva.nl] 
Sent: 20 April 2007 15:19
To: Neil Thomson
Cc: tdwg-ncd at lists.tdwg.org
Subject: Re: NCD toolkit

Hallo Neil,

I don't think I'm on the list you cc'ed, so if this bounces could
you please forward my reply? I have combined both of today's answers
into one mail.

But before I get to that I have an important announcement: I have  
made our NLBIF metadatabase maintenance site available on a public  
server, so you and other NCD participants can have a look. Please  
note that my first concern has been the functionality, not its looks!
This should easily be improved with style sheets, but frankly I have never
heard complaints from our ETI people about its 'Spartan look' ;)

The system is available at http://www.nlbif.nl/ncd-preview/

It currently contains no data, but anyone can try adding,
updating and deleting records. The login is test/test. A final note: I
would not play too much with the "Send mailing" option, as this may
seriously annoy people you have entered as test persons! :)

Now for the Q&A revisited. I am only replying to issues that have remained
open or that I feel need more discussion.

> -----------
> 3. The contract mentions import from NoDIT databases as a requirement
> (plus the EAD standard which I'm not familiar with). To cater for
> this we have added several fields to the metadatabase, otherwise we
> would not have been able to migrate all data. However, strictly
> speaking the metadatabase therefore does not match the NCD standard
> one-to-one. How do we deal with this?
>
> ## NHT: Would you be able to let me have a list of the elements that
>   do not match, please? Then we can evaluate whether they should be
>   added to NCD or whether it does not matter. Since NCD was derived
>   from the schema that underlies NoDIT they should be very similar.
>   Markus may be able to advise on this, since he built NoDIT for
>   BioCASE.

These are elements such as PercentageDatabased. We could maintain  
them in the database (so they are not lost) but omit them from the  
web service.
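One way to keep such migration-only fields in the database while leaving them out of the web service is to strip them whenever a record is serialised. A minimal sketch, assuming records come out of the database as Python dictionaries; `LOCAL_ONLY_FIELDS`, `to_ncd_record` and the example key "Title" are hypothetical, and only PercentageDatabased comes from this thread:

```python
from typing import Any

# Hypothetical set of fields kept locally (e.g. migrated from NoDIT)
# that are not NCD elements and therefore are not published.
LOCAL_ONLY_FIELDS = frozenset({"PercentageDatabased"})


def to_ncd_record(db_row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of db_row without local-only fields.

    The stored row itself is not modified, so nothing is lost locally.
    """
    return {key: value for key, value in db_row.items()
            if key not in LOCAL_ONLY_FIELDS}


# Illustrative row; "Title" is a made-up field name.
row = {"Title": "Example collection", "PercentageDatabased": 40}
print(to_ncd_record(row))  # {'Title': 'Example collection'}
print(row)  # {'Title': 'Example collection', 'PercentageDatabased': 40}
```

Filtering at the output stage means the database schema can keep growing for local needs without the published records drifting from NCD.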

> ------------
> 4. Which fields in the NCD standard are required and which are not? I
> assume ones which have a closed box in XML Spy, but this may be too
> loose for a database setup. In the NLBIF metadatabase, a collection
> must be connected to either a person or an organisation, otherwise it
> would be orphaned. If I interpret the NCD schema correctly, information
> about a collection can be entered without providing information about
> the organisation or person to which/whom it belongs.
>
> ## NHT: You are right in your interpretation of the schema. We have
>   tried to keep the number of required elements to a minimum to
>   encourage data entry - there are currently only 8 required elements
>   and 5 of these are about who created the entry and when. The
>   <FamilyName> of a person is required if a person is entered. It would
>   be easy enough (and makes sense) to make an institution required, and
>   it should not be any hardship for data entry folk since, for a session
>   at least, the institution details will be the same and so entered just
>   once.

This IMO needs further discussion. A collection can also be related  
to a person, not exclusively to an organisation (private  
collections!). Also, please keep in mind that information on  
collections by themselves may be virtually useless if there is no  
information recorded about where the collection is held (for access)
and whom to address for further information. The virtue of a database  
is that you can search for information, but if only an extremely  
limited set of data is required, this will seriously impede its use.

Please have a look at the setup of the metadatabase maintenance site.  
It's easy to add collections to existing organisations or persons,  
which only have to be created once.
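The rule that a collection must never be orphaned can be enforced when the record is created. A minimal sketch, assuming owners are referenced by numeric IDs; the `Collection` class and its field names are hypothetical, and whether a collection may be linked to both an organisation and a person at the same time is left open here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collection:
    name: str
    organisation_id: Optional[int] = None  # owning organisation, if any
    person_id: Optional[int] = None        # owning person, e.g. a private collection

    def __post_init__(self) -> None:
        # Reject orphaned collections: at least one owner must be given.
        if self.organisation_id is None and self.person_id is None:
            raise ValueError(
                f"collection {self.name!r} is linked to neither an "
                "organisation nor a person")


Collection("Herbarium sheets", organisation_id=1)     # institutional collection
Collection("Private beetle collection", person_id=7)  # private collection
```

Because the organisation or person is only referenced by ID, adding more collections to it does not touch the owner record itself.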

> -----------
> 5. We have copied some value lists from the NoDIT database to our
> database, e.g. keywords describing collections. However, the NCD
> standard has a broader scope than the NoDIT database and the NLBIF
> metadatabase. In the interface, I assume that many values should also
> be selected from pop-up menus, to guarantee a uniform database --
> e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats
> to name but a few. To create such lists, we need to have these values
> stored in the database, so these lists will need to be compiled somehow.
> If multiple languages should be supported, we may need to have these
> values in languages other than English as well.
>
> ## NHT: These terminology lists and their association with the TDWG
>   ontology are the subject of a separate development being undertaken
>   at the Smithsonian Institution by Carol Butler in association with
>   Roger Hyam and Markus. They will result in sets of terms that can be
>   used as pick-lists for those elements that should have just a few
>   terms or for which consistency is important.
>   We should see the first draft of these at the NCD Workshop, if not
>   before. It would be good for anyone interested to make suggestions
>   for such terms and their definition through this list to help Carol
>   to compile them.

This is good news! I could start with the basic set we have compiled  
for NLBIF and replace these with the NCD entries once these have been  
determined. I assume these lists will be in English and will not be  
translated to other languages?
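If the value lists are stored as language-neutral codes with per-language labels, translations (for instance from GEMET) could be added later without touching the records that use them. A minimal sketch, assuming an English label is always present; the codes, labels and the `label` function are hypothetical and are not the NCD vocabularies under development:

```python
# Hypothetical pick-list for one element: code -> {language: label}.
# Records store only the code; English is the reference language.
PICK_LIST: dict[str, dict[str, str]] = {
    "research": {"en": "Research", "nl": "Onderzoek"},
    "education": {"en": "Education"},  # no Dutch label yet
}


def label(code: str, lang: str = "en") -> str:
    """Return the label for code in lang, falling back to English."""
    labels = PICK_LIST[code]
    return labels.get(lang, labels["en"])


print(label("research", "nl"))   # Onderzoek
print(label("education", "nl"))  # Education
```

Swapping the NLBIF starter set for the final NCD terms would then mean replacing the list contents, not the records' structure.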

> ----------
> 2. More or less the same applies to Agents. My idea is that this is
> filled in once when setting up the database and that this is returned
> only when the database is accessed externally. But as apparently more
> than one agent with one or more roles can be responsible for a single
> record, so possibly this solution is incomplete. However, many
> changes have to be made to the current database to implement agents
> to a record level. E.g. consider that a collection is appended to an
> organisation that has previously been created by another agent.
>
> ## NHT: You are right that someone entering new records should only
>   need to record their details once for the session. The agents
>   section of the NCD Header (which is based on the METS header) is
>   intended to record metadata about the record itself, rather than
>   the collection that the record describes. The date of record
>   creation should default to "today" and be fixed. However, an
>   agent can come along later and amend the record, maybe adding to
>   it or correcting it. In this case the role of the person is "editor".
>   Appending a collection to an existing organisation record should not
>   affect the organisation record, since the relationship is
>   established simply by recording the organisation ID in the collection
>   record. Nothing is added to or changed in the organisation record.

Thanks for the explanation. The system tracks which records  
(organisations, collections and persons) have been updated by whom  
(by Users). The Agent information, however, is fixed for each site providing
a web service on its local database. This could be entered once
when setting up the database. Correct me if I'm wrong!
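The distinction between record-level agents and the collection being described could be modelled as a small header object per record. A minimal sketch of the behaviour Neil describes: the creation date defaults to today and cannot be changed afterwards, and later amendments are appended under the role "editor". `RecordHeader`, `AgentEntry` and the role strings are hypothetical and not taken from the NCD or METS schemas:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AgentEntry:
    name: str
    role: str  # "creator" or "editor" (hypothetical role labels)
    on: date


class RecordHeader:
    """Record-level metadata: who created or edited a record, and when."""

    def __init__(self, creator: str, created: Optional[date] = None) -> None:
        # Creation date defaults to today; it lives in a frozen entry, so it stays fixed.
        when = created if created is not None else date.today()
        self._agents = [AgentEntry(creator, "creator", when)]

    @property
    def created(self) -> date:
        return self._agents[0].on

    @property
    def agents(self) -> tuple[AgentEntry, ...]:
        return tuple(self._agents)

    def amend(self, editor: str, on: Optional[date] = None) -> None:
        # A later amendment adds an "editor" entry; earlier entries are untouched.
        when = on if on is not None else date.today()
        self._agents.append(AgentEntry(editor, "editor", when))


header = RecordHeader("Neil", date(2007, 4, 20))
header.amend("Ruud", date(2007, 4, 23))
print([(a.name, a.role) for a in header.agents])  # [('Neil', 'creator'), ('Ruud', 'editor')]
print(header.created)  # 2007-04-20
```

Site-wide agent details entered once at setup could simply supply the default `creator` for every new record in a session.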

> ----------
> 3. Shouldn't PostalAddressText and PhysicalAddressText each have an
> independent zipcode and town? Conversely, is it necessary to attach
> language attributes to towns and regions? We have stored this data in
> a non-language-dependent field.
>
> ## NHT: Not sure about this one - are there cases where an organisation
>   has a postal address in a different town from the organisation? This
>   may be solved, as the annotation text suggests, by having the complete
>   address text in the postal element. The ZIP code and town are intended
>   for your whizzy Google Maps link and for folk to determine what
>   collections they may visit in a particular area. So if we are agreed
>   that only one of these is required, the documentation will need to
>   make it clear that the ZIP code and Town refer to the actual
>   organisation, rather than its postbox.
>   While we are on the topic, does anyone have a preference for
>   "organisation" over "institution", or does it not matter?

Well, normally both towns are the same, but not always. From a
database point of view, it is not desirable to enter data that ideally
should be stored in separate fields into a single field. E.g. it would
probably make Google Maps queries much more complicated, if not
impossible, as addresses can be entered in various forms.
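Keeping ZIP code and town in separate fields for each address can be sketched as follows. A minimal sketch, assuming each organisation has one physical address and an optional postal address that is only filled in when it differs; the class names, field names and example address are hypothetical, and the map query simply URL-encodes the physical address rather than calling any particular Google Maps API:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus


@dataclass
class Address:
    street: str
    zip_code: str
    town: str


@dataclass
class Organisation:
    name: str
    physical: Address                 # where the collection can be visited
    postal: Optional[Address] = None  # only set when it differs (e.g. a PO box)

    def mailing_address(self) -> Address:
        # Fall back to the physical address when no postal address is given.
        return self.postal if self.postal is not None else self.physical

    def map_query(self) -> str:
        # Always use the physical address, never the postbox, and build the
        # query from separate fields so its format is predictable.
        a = self.physical
        return quote_plus(f"{a.street}, {a.zip_code} {a.town}")


org = Organisation(
    "Example Museum",
    physical=Address("Example Street 1", "1234 AB", "Amsterdam"),
    postal=Address("PO Box 99", "5678 CD", "Utrecht"),
)
print(org.map_query())             # Example+Street+1%2C+1234+AB+Amsterdam
print(org.mailing_address().town)  # Utrecht
```

Because the town and ZIP code are never parsed out of free text, a search for collections in a given area can filter on those fields directly.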

Best regards,

Ruud