Hi Ruud,
Thank you for your message - I have interspersed my responses to your
questions below. As mentioned, I am also copying this out to the NCD
mail-list so that folk can be informed of developments and can offer
additional advice, or correct my responses where I am talking nonsense.
The toolkit is highly anticipated, with several institutions already
looking to make use of it - including my own - so it is important that
we get things right as early as possible. The first step is to agree on
a stable version of NCD that we can all work with. A new version is
required anyway, since the current v0.50 has had some of its elements
"un-typed" by XMLSpy and I'm grateful to Markus for pointing this out.
So, on with the Q&A session, number 1 ...
-----Original Message-----
From: Ruud Altenburg [mailto:ruud@eti.uva.nl]
Sent: 18 April 2007 12:24
To: Neil Thomson
Subject: NCD toolkit
Hallo Neil,
this is Ruud Altenburg from ETI. My forthcoming project is to
participate in the creation of the NCD toolkit. As you know, ETI has
prepared a "metadatabase" for NLBIF which was based on the NCD 0.3
standard and added some fields from the NoDIT database which we
considered essential (we had to migrate the data from NoDIT to the
new database). I have sent you the scheme of the metadatabase some
time ago.
According to Wouter (Addink, ETI), the toolkit should be based on
v0.5 of the NCD scheme. I have compared that to our metadatabase
schema and noticed some changes. This implies that we need to update
our database schema, which will boil down to the addition of several
new fields. This of course should be fairly easy to implement.
However, there are some points we need to address before I can really
start with this project.
## NHT: Version v0.50 was developed in response to the schema that you
sent to me and the presentation that Wouter gave to TDWG at the end
of last year. We will need to bear in mind that there will be
differences between the NCD data standard, which is intended as a
data aggregation and interchange standard, and the implementation
of it as a database. We have noted before that there are database
fields that are required to make the database work that are not
required for the interchange of data. This should not be a problem,
though.
-----------
1. The contract states that the NCD toolkit should be multi-lingual.
Does this refer to the web interface (the entry tool), the contents
of the database, or both? I propose to have the interface in English
and only to store the data in several languages.
## NHT: This refers primarily to the contents of the database. Most,
if not all, text-oriented elements are now repeatable and have a
language attribute so that entries may be made both in the local
language and in a second language, such as English. The exception
is the <CollectionUniformName>, which is expected to be in English.
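As a rough illustration of how a toolkit might read such repeatable,
language-tagged elements, here is a minimal Python sketch. It assumes
the language attribute is the standard xml:lang; the <Collection> and
<Description> element names and the sample text are invented for the
example and are not taken from the NCD schema.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under this qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical record: only <CollectionUniformName> is named in this thread;
# <Collection>, <Description> and the use of xml:lang are assumptions.
SAMPLE = """
<Collection>
  <CollectionUniformName>Herbarium of Example University</CollectionUniformName>
  <Description xml:lang="nl">Gedroogde planten uit Nederland.</Description>
  <Description xml:lang="en">Dried plants from the Netherlands.</Description>
</Collection>
"""


def texts_by_language(record, tag):
    """Map language code -> text for every repeat of `tag` in `record`."""
    return {el.get(XML_LANG): (el.text or "").strip() for el in record.findall(tag)}


record = ET.fromstring(SAMPLE)
descriptions = texts_by_language(record, "Description")
print(descriptions["en"])
```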
-----------
2. About the contents: how many languages should the interface cater
for? Currently our metadatabase caters for the entry of two languages
(English and Dutch). Data have to be entered in two languages at
once. The reason for this is to avoid the situation where all
organisations and collections are described in English but only half
of them in Dutch. So in our setup, when someone enters new data, he
or she is forced to do this in two languages at once. I propose a
similar strategy for the NCD toolkit, i.e. to restrict the contents
to English plus a native language if the user is not from an English
speaking territory.
## NHT: This sounds like a good strategy. I'm not sure that
we can enforce English except where it is useful for sorting and
searching, but restricting input to two languages should be ok.
I would expect (but I may be wrong here) that providing local
versions of the interface would only involve re-labelling the
input form and report form elements? In which case, provided
guidance is given, it would be reasonable to expect that those
implementing the toolkit in a different language could do that.
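The "two languages at once" rule could be checked on entry along the
following lines. This is only a sketch: the function name, the use of a
dictionary keyed by language code, and the choice of "en" and "nl" as
the required pair are all assumptions made for illustration.

```python
def missing_languages(entries, required=("en", "nl")):
    """Return the required language codes that have no non-empty text.

    `entries` maps a language code to the text entered for one field.
    """
    return [lang for lang in required if not entries.get(lang, "").strip()]


# A description entered in English only would be rejected by the form.
incomplete = missing_languages({"en": "Dried plants from the Netherlands."})
complete = missing_languages({"en": "Dried plants.", "nl": "Gedroogde planten."})
print(incomplete, complete)
```

An entry form would refuse to save a record until this list is empty
for every text field.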
-----------
3. The contract mentions import from NoDIT databases as a requirement
(plus the EAD standard which I'm not familiar with). To cater for
this we have added several fields to the metadatabase, otherwise we
would not have been able to migrate all data. However, strictly
speaking the metadatabase therefore does not match the NCD standard
one-on-one. How do we deal with this?
## NHT: Would you be able to let me have a list of the elements that
do not match, please? Then we can evaluate whether they should be
added to NCD or whether it does not matter. Since NCD was derived
from the schema that underlies NoDIT they should be very similar.
Markus may be able to advise on this, since he built NoDIT for
BioCASE.
------------
4. Which fields in the NCD standard are required and which are not? I
assume ones which have a closed box in XML Spy, but this may be too
loose for a database setup. In the NLBIF metadatabase, a collection
must be connected to either a person or an organisation, otherwise it
would be orphaned. If I interpret the NCD schema correctly, information
about a collection can be entered without providing information about
the organisation or person to which/whom it belongs.
## NHT: You are right in your interpretation of the schema. We have
tried to keep the number of required elements to a minimum to
encourage data entry - there are currently only 8 required elements
and 5 of these are about who created the entry and when. The
<FamilyName> of a person is required if a person is entered. It would
be easy enough (and would make sense) to make an institution required,
and this should not be any hardship for data entry folk since, for a
session at least, the institution details will be the same and so need
to be entered just once.
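To make the two rules concrete, a database layer might validate a
record roughly as below. The sketch follows the rule from the question
(a collection must be linked to a person or an organisation) together
with the <FamilyName> requirement; the class and field names are
hypothetical and do not come from the NCD schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    family_name: str       # corresponds to <FamilyName>, required if a person is entered
    given_names: str = ""  # hypothetical optional field


@dataclass
class Collection:
    uniform_name: str
    institution: Optional[str] = None  # hypothetical: name or identifier of the institution
    person: Optional[Person] = None


def validation_errors(collection: Collection) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if collection.person is not None and not collection.person.family_name.strip():
        errors.append("FamilyName is required when a person is entered")
    if not collection.institution and collection.person is None:
        errors.append("collection must be linked to an institution or a person")
    return errors


print(validation_errors(Collection("Example Herbarium")))
```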
-----------
5. We have copied some value lists from the NoDIT database to our
database, e.g. keywords describing collections. However, the NCD
standard has a broader scope than the NoDIT database and the NLBIF
metadatabase. In the interface, I assume that many values should also
be selected from pop-up menus, to guarantee a uniform database --
e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats
to name but a few. To create such lists, we need to have these values
stored in the database, so somehow these lists should be compiled. If
multiple languages should be supported, we may need to have these
values in languages other than English as well.
## NHT: These terminology lists and their association with the TDWG
ontology are the subject of a separate development being undertaken at the
Smithsonian Institution by Carol Butler in association with Roger
Hyam and Markus. They will result in sets of terms that can be used
as pick-lists for those elements that should have just a few terms or
for which consistency is important.
We should see the first draft of these at the NCD Workshop, if not
before. It would be good for anyone interested to make suggestions for
such terms and their definition through this list to help Carol to
compile them.
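Until those lists exist, the way a toolkit might hold a pick-list with
labels in more than one language can be sketched as follows. The codes
and labels here are invented placeholders for the AgentRole element,
not terms from NCD or the TDWG ontology, and the function name is
hypothetical.

```python
# Hypothetical pick-list for AgentRole: codes and labels are placeholders.
AGENT_ROLE = {
    "curator": {"en": "Curator", "nl": "Conservator"},
    "owner": {"en": "Owner", "nl": "Eigenaar"},
}


def label_for(picklist, code, lang, fallback="en"):
    """Return the display label for `code`, raising if the code is not allowed."""
    if code not in picklist:
        raise ValueError(f"{code!r} is not an allowed term")
    labels = picklist[code]
    # Fall back to English when no label exists in the requested language.
    return labels.get(lang, labels[fallback])


print(label_for(AGENT_ROLE, "owner", "nl"))
```

Storing a language-neutral code and looking up the label at display
time keeps the database uniform while letting each installation show
the terms in its own language.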
-----------
I hope you have the time to discuss these items, and probably a few
more that will pop up when I dive deeper into the subject!
## NHT: Sure thing - I see that there is another one awaiting attention
and will get on to that very soon.
Thank you very much for your work on this and please do not hesitate
to ask any further questions of me or anyone on what is now the TDWG
Collections Description Interest Group.
All best wishes,
Neil
Best regards,
Ruud