[tdwg-ncd] RE: NCD toolkit

Fri Apr 20 11:47:20 CEST 2007

Hi Ruud,

Thank you for your mailing - I have interspersed my responses to your
questions below. As mentioned, I am also copying this out to the NCD
mail-list so that folk can be informed of developments and can offer
additional advice, or correct my responses where I am talking nonsense.

The toolkit is highly anticipated, with several institutions already
looking to make use of it - including my own - so it is important that
we get things right as early as possible. The first step is to agree on
a stable version of NCD that we can all work with. A new version is
required anyway, since the current v0.50 has had some of its elements
"un-typed" by XMLSpy and I'm grateful to Markus for pointing this out.

So, on with the Q&A session, number 1 ...

-----Original Message-----
From: Ruud Altenburg [mailto:ruud at eti.uva.nl] 
Sent: 18 April 2007 12:24
To: Neil Thomson
Subject: NCD toolkit

Hallo Neil,

this is Ruud Altenburg from ETI. My forthcoming project is to  
participate in the creation of the NCD toolkit. As you know, ETI has  
prepared a "metadatabase" for NLBIF which was based on the NCD 0.3  
standard and added some fields from the NoDIT database which we  
considered essential (we had to migrate the data from NoDIT to the  
new database). I have sent you the scheme of the metadatabase some  
time ago.

According to Wouter (Addink, ETI), the toolkit should be based on  
v0.5 of the NCD scheme. I have compared that to our metadatabase  
schema and noticed some changes. This implies that we need to update  
our database schema, which will boil down to the addition of several  
new fields. This of course should be fairly easy to implement.  
However, there are some points we need to address before I can really  
start with this project.

## NHT: Version v0.50 was developed in response to the schema that you
  sent to me and the presentation that Wouter gave to TDWG at the end
  of last year. We will need to bear in mind that there will be
  differences between the NCD data standard, which is intended as a
  data aggregation and interchange standard, and the implementation
  of it as a database. We have noted before that there are database
  fields that are required to make the database work that are not
  required for the interchange of data. This should not be a problem, 
  though.

-----------
1. The contract states that the NCD toolkit should be multi-lingual.  
Does this refer to the web interface (the entry tool), the contents  
of the database, or both? I propose to have the interface in English  
and only to store the data in several languages.

## NHT: This refers primarily to the contents of the database. Most,
  if not all, text-oriented elements are now repeatable and have a
  language attribute so that entries may be made both in the local
  language and in a second language, such as English. The exception
  is the <CollectionUniformName>, which is expected to be in English.
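
To make the idea of repeatable, language-tagged elements concrete, here is a minimal Python sketch of reading such a record. Only <CollectionUniformName> is named in this thread; the <Collection> wrapper, the <Description> element and the "lang" attribute name are assumptions made for illustration and may not match the NCD schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record. Only <CollectionUniformName> comes from the thread;
# <Collection>, <Description> and the "lang" attribute are assumed names.
RECORD = """
<Collection>
  <CollectionUniformName>Herbarium of vascular plants</CollectionUniformName>
  <Description lang="nl">Herbarium van vaatplanten</Description>
  <Description lang="en">Herbarium of vascular plants</Description>
</Collection>
"""


def texts_by_language(record_xml, tag):
    """Map language code -> text for every repetition of `tag` under the root."""
    root = ET.fromstring(record_xml.strip())
    return {el.get("lang"): (el.text or "").strip() for el in root.findall(tag)}


descriptions = texts_by_language(RECORD, "Description")
print(descriptions.get("nl"))
```

The same lookup can serve a report form: ask for the local language first and fall back to the second language when no local entry exists.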

-----------
2. About the contents: how many languages should the interface cater  
for? Currently our metadatabase caters for the entry of two languages  
(English and Dutch). Data have to be entered in two languages at  
once. The reason for this is that you need to avoid that all  
organisations and collections are described in English but only half  
of them in Dutch. So in our setup, when someone enters new data, he  
or she is forced to do this in two languages at once. I propose a  
similar strategy for the NCD toolkit, i.e. to restrict the contents  
to English plus a native language if the user is not from an English  
speaking territory.

## NHT: This sounds like a good strategy. I'm not sure that 
  we can enforce English except where it is useful for sorting and
  searching, but restricting input to two languages should be ok.
  I would expect (but I may be wrong here) that providing local
  versions of the interface would only involve re-labelling the
  input form and report form elements? In which case, provided
  guidance is given, it would be reasonable to expect that those
  implementing the toolkit in a different language could do that.
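
The two-language entry rule discussed in this item could be checked along the following lines. This is only a sketch: the function name, the dictionary layout and the default language pair ("en", "nl") are assumptions, not part of NCD or of the existing metadatabase.

```python
def missing_languages(entry, required=("en", "nl")):
    """Return the required language codes for which `entry` has no text.

    `entry` maps a language code to the text typed in for that language.
    Whitespace-only text counts as missing.
    """
    return [code for code in required if not entry.get(code, "").strip()]


complete = {"en": "Bird skins", "nl": "Vogelhuiden"}
partial = {"en": "Bird skins", "nl": "  "}

print(missing_languages(complete))
print(missing_languages(partial))
```

An entry form would refuse to save while the returned list is non-empty, which is what prevents a collection from being described in one language only.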

-----------
3. The contract mentions import from NoDIT databases as a requirement  
(plus the EAD standard which I'm not familiar with). To cater for  
this we have added several fields to the metadatabase, otherwise we  
would not have been able to migrate all data. However, strictly  
speaking the metadatabase therefore does not match the NCD standard  
one-on-one. How do we deal with this?

## NHT: Would you be able to let me have a list of the elements that
  do not match, please? Then we can evaluate whether they should be
  added to NCD or whether it does not matter. Since NCD was derived
  from the schema that underlies NoDIT they should be very similar.
  Markus may be able to advise on this, since he built NoDIT for 
  BioCASE.

------------
4. Which fields in the NCD standard are required and which are not? I  
assume ones which have a closed box in XML Spy, but this may be too  
loose for a database setup. In the NLBIF metadatabase, a collection  
must be connected to either a person or an organisation, otherwise it  
would be orphaned. If I interpret the NCD schema well, information  
about a collection can be entered without providing information about  
the organisation or person to which/whom it belongs.

## NHT: You are right in your interpretation of the schema. We have 
  tried to keep the number of required elements to a minimum to 
  encourage data entry - there are currently only 8 required elements
  and 5 of these are about who created the entry and when. The 
  <FamilyName> of a person is required if a person is entered. It would
  be easy enough (and makes sense) to make an institution required and
  should not be any hardship for data entry folk since for a session, at
  least, the institution details will be the same and so entered just
  once.
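
The two rules raised in this item (a collection must not be orphaned, and <FamilyName> is required once a person is entered) might be enforced in the database layer roughly as below. The record layout, key names and error messages are hypothetical; only FamilyName is taken from the schema discussion above.

```python
def validation_errors(collection):
    """Return a list of problems that should block saving this collection."""
    errors = []
    persons = collection.get("persons", [])

    # Rule from the NLBIF metadatabase: no orphaned collections.
    if not collection.get("institution") and not persons:
        errors.append("collection is not linked to an institution or a person")

    # Rule from the NCD schema: FamilyName is required if a person is entered.
    for index, person in enumerate(persons):
        if not person.get("FamilyName"):
            errors.append(f"person {index}: FamilyName is required")

    return errors


orphan = {"name": "Bird skins"}
linked = {"name": "Bird skins", "persons": [{"FamilyName": "Altenburg"}]}

print(validation_errors(orphan))
print(validation_errors(linked))
```

If an institution becomes a required element, the first check simply tightens to testing the institution alone.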

-----------
5. We have copied some value lists from the NoDIT database to our  
database, e.g. keywords describing collections. However, the NCD  
standard has a broader scope than the NoDIT database and the NLBIF  
metadatabase. In the interface, I assume that many values too should  
be selected from pop-up menus, to guarantee a uniform database --  
e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats  
to name but a few. To create such lists, we need to have these values  
stored in the database, so somehow these lists should be compiled. If  
multiple languages should be supported, we may need to have these  
values in languages other than English as well.

## NHT: These terminology lists and their association with the TDWG
  ontology are the subject of a separate development being undertaken
  at the Smithsonian Institution by Carol Butler in association with
  Roger Hyam and Markus. They will result in sets of terms that can be
  used as pick-lists for those elements that should have just a few
  terms or for which consistency is important.
  We should see the first draft of these at the NCD Workshop, if not
  before. It would be good for anyone interested to make suggestions for
  such terms and their definition through this list to help Carol to 
  compile them.
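
As a rough picture of how such pick-lists could be stored and offered in more than one language, consider the Python sketch below. The terms and their Dutch labels are placeholders invented for the example, not entries from the lists being compiled; the structure and function names are likewise assumptions.

```python
# Placeholder terms only: the real lists are still to be compiled.
PICK_LISTS = {
    "AgentRole": {
        "curator": {"en": "Curator", "nl": "Conservator"},
        "collector": {"en": "Collector", "nl": "Verzamelaar"},
        "donor": {"en": "Donor"},  # no Dutch label yet
    },
}


def menu_options(element, lang, fallback="en"):
    """Return (code, label) pairs for a pop-up menu, sorted by label.

    Falls back to the `fallback` language when a term has no label in `lang`.
    """
    terms = PICK_LISTS[element]
    pairs = [(code, labels.get(lang, labels[fallback])) for code, labels in terms.items()]
    return sorted(pairs, key=lambda pair: pair[1].lower())


def is_allowed(element, code):
    """True if `code` is one of the controlled terms for `element`."""
    return code in PICK_LISTS.get(element, {})


print(menu_options("AgentRole", "nl"))
```

Storing a language-neutral code and showing only the label keeps the database uniform regardless of the language used for data entry.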

-----------
I hope you have the time to discuss these items, and probably a few  
more that will pop up when I dive deeper into the subject!

## NHT: Sure thing - I see that there is another one awaiting attention 
  and will get on to that very soon.
  Thank you very much for your work on this and please do not hesitate
  to ask any further questions of me or anyone on what is now the TDWG
  Collections Description Interest Group.

  All best wishes,
  Neil

Best regards,

Ruud