[tdwg-ncd] RE: NCD toolkit

23 Apr 2007

Hi Ruud,

Many thanks for providing the URL to the test site - don't worry about its looks, we are all aware that the functionality needs to be in place first.

For your questions below:

  - We decided to drop <PercentageDatabased> as being an unhelpful element. It is one that would need constant updating, which we thought would not happen. The URLs in the Related Materials Group could alert a user to the fact that a database exists, and the user could then go directly to it. As to EAD, Barbara Mathé at the AMNH has prepared a mapping between EAD and NCD which you may find helpful once it reaches its final form.

  - Yes, this approach sounds good. I did try to put some dummy data into the test site, but I got thrown out with an SQL error. No matter, having an organisation and a person as a precursor to a collection description sounds perfectly reasonable and is similar to the approach taken by NoDIT.

  - Yes, the lists will be in English, but we may be able to get other European language equivalents from GEMET? One for us to discuss with Roger.

  - There may be more than one agent per site that will be creating and editing records. At my own place there may be a dozen or more authors.

Perhaps we should have an "open issues" page on the TDWG wiki, otherwise it is going to become difficult to keep track of queries, responses and decisions on emails?

All best wishes,
Neil

-----Original Message-----
From: Ruud Altenburg [mailto:ruud@eti.uva.nl] 
Sent: 20 April 2007 15:19
To: Neil Thomson
Cc: tdwg-ncd@lists.tdwg.org
Subject: Re: NCD toolkit

Hallo Neil,

I don't think I'm on the list you cc'ed, so if this bounces could  
you please forward my reply? I have combined both of your answers of  
today into one mail.

But before I get to that I have an important announcement: I have  
made our NLBIF metadatabase maintenance site available on a public  
server, so you and other NCD participants can have a look. Please  
note that my first concern has been the functionality, not its looks!  
This should easily be improved with style sheets but frankly I never  
heard complaints from our ETI people about its 'Spartan look' ; )

The system is available at http://www.nlbif.nl/ncd-preview/

It currently contains no data, but anyone can have a try at
adding/updating/deleting records. The login is test/test. A final note: I  
would not play too much with the "Send mailing" option, as this may  
seriously annoy people you have entered as test persons! :)

Then the Q&A revisited. I only reply to issues that have remained  
open or that I feel need some more discussion.
...
-----------
3. The contract mentions import from NoDIT databases as a requirement
(plus the EAD standard which I'm not familiar with). To cater for
this we have added several fields to the metadatabase, otherwise we
would not have been able to migrate all data. However, strictly
speaking the metadatabase therefore does not match the NCD standard
one-on-one. How do we deal with this?
## NHT: Would you be able to let me have a list of the elements that
  do not match, please? Then we can evaluate whether they should be
  added to NCD or whether it does not matter. Since NCD was derived
  from the schema that underlies NoDIT they should be very similar.
  Markus may be able to advise on this, since he built NoDIT for
  BioCASE.
These are elements such as PercentageDatabased. We could maintain  
them in the database (so they are not lost) but omit them from the  
web service.
...
------------
4. Which fields in the NCD standard are required and which are not? I
assume ones which have a closed box in XML Spy, but this may be too
loose for a database setup. In the NLBIF metadatabase, a collection
must be connected to either a person or an organisation, otherwise it
would be orphaned. If I interpret the NCD schema correctly, information
about a collection can be entered without providing information about
the organisation or person to which/whom it belongs.
## NHT: You are right in your interpretation of the schema. We have
  tried to keep the number of required elements to a minimum to
  encourage data entry - there are currently only 8 required elements
  and 5 of these are about who created the entry and when. The
  <FamilyName> of a person is required if a person is entered. It would
  be easy enough (and makes sense) to make an institution required and
  should not be any hardship for data entry folk since for a session, at
  least, the institution details will be the same and so entered just
  once.
This IMO needs further discussion. A collection can also be related  
to a person, not exclusively to an organisation (private  
collections!). Also, please keep in mind that information on  
collections by themselves may be virtually useless if there is no  
information stored about where the collection is stored (for access)  
and whom to address for further information. The virtue of a database  
is that you can search for information, but if only an extremely  
limited set of data is required, this will seriously impede its use.

Please have a look at the setup of the metadatabase maintenance site.  
It's easy to add collections to existing organisations or persons,  
which only have to be created once.
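
As a rough illustration of the rule described above (a collection must be linked to an organisation or a person, so that it is never orphaned), here is a minimal Python sketch. The class and field names are hypothetical; they are not taken from the NCD schema or from the NLBIF metadatabase, and the check assumes that "at least one owner" is the intended rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collection:
    """Hypothetical collection record; field names are illustrative only."""
    title: str
    organisation_id: Optional[int] = None  # set when an organisation owns it
    person_id: Optional[int] = None        # set for private collections

    def __post_init__(self) -> None:
        # Reject "orphaned" collections: at least one owner must be recorded.
        if self.organisation_id is None and self.person_id is None:
            raise ValueError(
                "a collection must be linked to an organisation or a person"
            )
```

Because the owner is referenced by ID only, the organisation or person has to be entered just once and can then be reused for any number of collections.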
...
-----------
5. We have copied some value lists from the NoDIT database to our
database, e.g. keywords describing collections. However, the NCD
standard has a broader scope than the NoDIT database and the NLBIF
metadatabase. In the interface, I assume that many values should also
be selected from popup menus, to guarantee a uniform database --
e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats
to name but a few. To create such lists, we need to have these values
stored in the database, so somehow these lists should be compiled. If
multiple languages should be supported, we may need to have these
values in languages other than English as well.
## NHT: These terminology lists and their association with the TDWG
  ontology are the subject of a separate development being undertaken
  at the Smithsonian Institution by Carol Butler in association with
  Roger Hyam and Markus. They will result in sets of terms that can be
  used as pick-lists for those elements that should have just a few
  terms or for which consistency is important.
  We should see the first draft of these at the NCD Workshop, if not
  before. It would be good for anyone interested to make suggestions for
  such terms and their definition through this list to help Carol to
  compile them.
This is good news! I could start with the basic set we have compiled  
for NLBIF and replace these with the NCD entries once these have been  
determined. I assume these lists will be in English and will not be  
translated to other languages?
...
----------
2. More or less the same applies to Agents. My idea is that this is
filled in once when setting up the database and that this is returned
only when the database is accessed externally. But apparently more
than one agent with one or more roles can be responsible for a single
record, so possibly this solution is incomplete. However, many
changes have to be made to the current database to implement agents
at a record level. E.g. consider that a collection is appended to an
organisation that has previously been created by another agent.
## NHT: You are right that someone entering new records should only
  need to record their details once for the session. The agents
  section of the NCD Header (which is based on the METS header) is
  intended to record metadata about the record itself, rather than
  the collection that the record describes. The date of record
  creation should default to "today" and be fixed. However, an
  agent can come along later and amend the record, maybe adding to
  it or correcting it. In this case the role of the person is "editor".
  Appending a collection to an existing organisation record should not
  affect the organisation record, since the relationship is
  established simply by recording the organisation ID in the collection
  record. Nothing is added to or changed in the organisation record.
Thanks for the explanation. The system tracks which records  
(organisations, collections and persons) have been updated by whom  
(by Users). The Agent information e.g. is fixed for each site providing  
a web service on their local database. This could be entered once  
when setting up the database. Correct me if I'm wrong!
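
To make the record-level behaviour discussed above more concrete (creation date defaulting to "today" and staying fixed, later amendments recorded with the role "editor"), here is a small Python sketch. All names are hypothetical; this is not the NCD Header or the METS header itself, only one possible way to model what is described.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class RecordHeader:
    """Hypothetical metadata about a record, not about the collection."""
    creator: str
    # The creation date defaults to "today" and is never changed afterwards.
    created: date = field(default_factory=date.today)
    # Later amendments as (agent, role, date) tuples.
    amendments: List[Tuple[str, str, date]] = field(default_factory=list)

    def amend(self, agent: str, when: Optional[date] = None) -> None:
        # Anyone amending an existing record is logged with the role "editor";
        # the original creator and creation date stay untouched.
        self.amendments.append((agent, "editor", when or date.today()))
```

A site with a dozen authors would simply accumulate entries in the amendments list, while the creator and creation date remain as first recorded.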
...
----------
3. Shouldn't PostalAddressText and PhysicalAddressText each have an
independent zipcode and town? Conversely, is it necessary to use
language attributes for towns and regions? We have stored this data in
a non-language-dependent field.
## NHT: Not sure about this one - are there cases where an organisation
  has a postal address in a different town from the organisation? This
  may be solved, as the annotation text suggests, by having the complete
  address text in the postal element. The ZIP code and town are intended
  for your wizzy Google Maps link and for folk to determine what
  collections they may visit in a particular area. So if we are agreed
  that only one of these is required, the documentation will need to
  make it clear that the ZIP code and Town refer to the actual
  organisation, rather than its postbox.
  While we are on the topic, does anyone have a preference for
  "organisation" over "institution", or does it not matter?
Well, normally both towns are the same, but not always. From a  
database point of view, it is not desirable to enter data that ideally  
should be stored in separate fields into one field. E.g. it would  
probably make Google Maps queries much more complicated if not  
impossible, as addresses can be entered in various forms.
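
A minimal Python sketch of the point above: with street, zipcode and town in separate fields for both the physical and the postal address, a map lookup can be built without parsing free text. The class names, field names and the "q" query parameter are assumptions made for illustration; they do not come from the NCD schema or from any particular map service.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class Address:
    """Hypothetical structured address with independent zipcode and town."""
    street: str
    zipcode: str
    town: str


@dataclass
class Organisation:
    name: str
    physical: Address                 # where the collection can be visited
    postal: Optional[Address] = None  # only needed when it differs


def map_query(org: Organisation) -> str:
    """Build a URL query string from the physical address fields."""
    a = org.physical
    return urlencode({"q": ", ".join([a.street, a.zipcode, a.town])})
```

The postal address never enters the lookup, so a post box in another town cannot send a visitor to the wrong place.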

Best regards,

Ruud