[tdwg-ncd] RE: NCD toolkit

Fri Apr 20 17:55:44 CEST 2007

Hi there,
comments inline...
--
Markus

On 20.04.2007, at 11:47, Neil Thomson wrote:

>  Hi Ruud,
>
> Thank you for your mailing - I have interspersed my responses to your
> questions below. As mentioned, I am also copying this out to the NCD
> mail-list so that folk can be informed of developments and can offer
> additional advice, or correct my responses where I am talking nonsense.
>
> The toolkit is highly anticipated, with several institutions already
> looking to make use of it - including my own - so it is important that
> we get things right as early as possible. The first step is to agree
> on a stable version of NCD that we can all work with. A new version is
> required anyway, since the current v0.50 has had some of its elements
> "un-typed" by XMLSpy and I'm grateful to Markus for pointing this out.
>
> So, on with the Q&A session, number 1 ...
>
>
> -----Original Message-----
> From: Ruud Altenburg [mailto:ruud at eti.uva.nl]
> Sent: 18 April 2007 12:24
> To: Neil Thomson
> Subject: NCD toolkit
>
> Hallo Neil,
>
> this is Ruud Altenburg from ETI. My forthcoming project is to
> participate in the creation of the NCD toolkit. As you know, ETI has
> prepared a "metadatabase" for NLBIF which was based on the NCD 0.3
> standard and added some fields from the NoDIT database which we
> considered essential (we had to migrate the data from NoDIT to the
> new database). I have sent you the scheme of the metadatabase some
> time ago.
>
> According to Wouter (Addink, ETI), the toolkit should be based on
> v0.5 of the NCD scheme. I have compared that to our metadatabase
> schema and noticed some changes. This implies that we need to update
> our database schema, which will boil down to the addition of several
> new fields. This of course should be fairly easy to implement.
> However, there are some points we need to address before I can really
> start with this project.
>
> ## NHT: Version v0.50 was developed in response to the schema that you
>   sent to me and the presentation that Wouter gave to TDWG at the end
>   of last year. We will need to bear in mind that there will be
>   differences between the NCD data standard, which is intended as a
>   data aggregation and interchange standard, and the implementation
>   of it as a database. We have noted before that there are database
>   fields that are required to make the database work that are not
>   required for the interchange of data. This should not be a problem,
>   though.
>
> -----------
> 1. The contract states that the NCD toolkit should be multi-lingual.
> Does this refer to the web interface (the entry tool), the contents
> of the database, or both? I propose to have the interface in English
> and only to store the data in several languages.
>
> ## NHT: This refers primarily to the contents of the database. Most,
>   if not all, text-oriented elements are now repeatable and have a
>   language attribute so that entries may be made both in the local
>   language and in a second language, such as English. The exception
>   is the <CollectionUniformName>, which is expected to be in English.
MD: As many other countries are planning to use the toolkit, it would
be great if the interface were multilingual too. I know it's a bit of
extra work, but it would be nice if the toolkit were prepared for
multiple languages, even if it initially ships with English only.
The localized texts should probably be stored separately from the
common HTML templates. Some starting points for PHP that I know of:
http://de3.php.net/gettext
http://tikiwiki.org/MultilingualDev
http://www.mail-archive.com/php-i18n@lists.php.net/msg00801.html
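
To illustrate what "stored separately from the templates" could mean, here
is a minimal sketch in Python rather than PHP. The catalog contents, the
message keys and the {{key}} placeholder syntax are all assumptions made
up for this example; it is not how gettext itself stores messages.

```python
import re

# Hypothetical message catalogs, kept apart from the HTML templates.
# The keys and translations are illustrative only.
CATALOGS = {
    "en": {"collection.name": "Collection name", "save": "Save"},
    "nl": {"collection.name": "Collectienaam"},
}
DEFAULT_LANG = "en"


def translate(key, lang):
    """Return the label for key in lang, falling back to English, then to the key."""
    for code in (lang, DEFAULT_LANG):
        text = CATALOGS.get(code, {}).get(key)
        if text is not None:
            return text
    return key


def render(template, lang):
    """Replace {{key}} placeholders in a template with localized labels."""
    return re.sub(
        r"\{\{([\w.]+)\}\}",
        lambda match: translate(match.group(1), lang),
        template,
    )


if __name__ == "__main__":
    page = "<label>{{collection.name}}</label> <button>{{save}}</button>"
    print(render(page, "nl"))
```

With such a fallback the toolkit could ship with English only, and an
installation would add a language by supplying one more catalog.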

>
> -----------
> 2. About the contents: how many languages should the interface cater
> for? Currently our metadatabase caters for the entry of two languages
> (English and Dutch). Data have to be entered in two languages at
> once. The reason for this is to avoid ending up with all
> organisations and collections described in English but only half
> of them in Dutch. So in our setup, when someone enters new data, he
> or she is forced to do this in two languages at once. I propose a
> similar strategy for the NCD toolkit, i.e. to restrict the contents
> to English plus a native language if the user is not from an English
> speaking territory.
>
> ## NHT: This sounds like a good strategy. I'm not sure that
>   we can enforce English except where it is useful for sorting and
>   searching, but restricting input to two languages should be ok.
>   I would expect (but I may be wrong here) that providing local
>   versions of the interface would only involve re-labelling the
>   input form and report form elements? In which case, provided
>   guidance is given, it would be reasonable to expect that those
>   implementing the toolkit in a different language could do that.
MD: Two languages are fine, I assume. There will be countries like
Belgium, though, that require more than two languages...
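
A rough sketch of how the entry tool might check that a record is complete
in every configured language, without hard-coding the number of languages.
The language codes and element names below are assumptions chosen for
illustration, not taken from the NCD schema.

```python
# Hypothetical setting: English plus one local language. A Belgian
# installation could list more codes here.
REQUIRED_LANGS = ("en", "nl")


def missing_languages(entries, required=REQUIRED_LANGS):
    """Report which required languages are missing per text element.

    entries maps an element name to a dict of {language code: text}.
    Returns {element name: [missing language codes]} for incomplete elements.
    """
    gaps = {}
    for element, texts in entries.items():
        absent = [lang for lang in required if not texts.get(lang, "").strip()]
        if absent:
            gaps[element] = absent
    return gaps


if __name__ == "__main__":
    record = {
        "CollectionName": {"en": "Bird skins", "nl": "Vogelhuiden"},
        "Description": {"en": "Study skins of European birds", "nl": ""},
    }
    print(missing_languages(record))
```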

> -----------
> 3. The contract mentions import from NoDIT databases as a requirement
> (plus the EAD standard which I'm not familiar with). To cater for
> this we have added several fields to the metadatabase, otherwise we
> would not have been able to migrate all data. However, strictly
> speaking the metadatabase therefore does not match the NCD standard
> one-on-one. How do we deal with this?
>
> ## NHT: Would you be able to let me have a list of the elements that
>   do not match, please? Then we can evaluate whether they should be
>   added to NCD or whether it does not matter. Since NCD was derived
>   from the schema that underlies NoDIT they should be very similar.
>   Markus may be able to advise on this, since he built NoDIT for
>   BioCASE.
MD: I would certainly be interested in getting a list of the elements
missing from NCD. If they were removed from NCD on purpose, I don't
think we need to cater for them in the toolkit database any more. A
BioCASE NoDIT import would simply import the data that maps to NCD!

>
> ------------
> 4. Which fields in the NCD standard are required and which are not? I
> assume ones which have a closed box in XML Spy, but this may be too
> loose for a database setup. In the NLBIF metadatabase, a collection
> must be connected to either a person or an organisation, otherwise it
> would be orphaned. If I interpret the NCD schema well, information
> about a collection can be entered without providing information about
> the organisation or person to which/whom it belongs.
>
> ## NHT: You are right in your interpretation of the schema. We have
>   tried to keep the number of required elements to a minimum to
>   encourage data entry - there are currently only 8 required elements
>   and 5 of these are about who created the entry and when. The
>   <FamilyName> of a person is required if a person is entered. It
>   would be easy enough (and makes sense) to make an institution
>   required, and this should not be any hardship for data entry folk
>   since, for a session at least, the institution details will be the
>   same and so entered just once.
MD: Would it make sense to have a configuration file listing the
elements that are required? Every installation could then set up its
own restrictions. Otherwise I agree that more required fields are
needed in the UI than in the schema. But not too many :)
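
Such a configuration file could be as simple as a list of element names
that the entry form checks before saving. A minimal sketch, assuming a
JSON file; the file layout and the element name InstitutionName are
hypothetical, only CollectionUniformName comes from the discussion above.

```python
import json

# Hypothetical per-installation configuration, shown inline as a string.
# In practice this would be read from a file next to the toolkit.
CONFIG_TEXT = '{"required": ["CollectionUniformName", "InstitutionName"]}'


def load_required(config_text):
    """Parse the configuration and return the list of required element names."""
    return list(json.loads(config_text)["required"])


def missing_elements(record, required):
    """Return the required element names that are absent or blank in record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = load_required(CONFIG_TEXT)
    record = {"CollectionUniformName": "Herbarium", "InstitutionName": "  "}
    print(missing_elements(record, required))
```

The schema itself stays permissive; only the UI of a given installation
refuses to save a record while the returned list is not empty.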

>
> -----------
> 5. We have copied some value lists from the NoDIT database to our
> database, e.g. keywords describing collections. However, the NCD
> standard has a broader scope than the NoDIT database and the NLBIF
> metadatabase. In the interface, I assume that many values too should
> be selected from popup menus, to guarantee a uniform database --
> e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats
> to name but a few. To create such lists, we need to have these values
> stored in the database, so somehow these lists should be compiled. If
> multiple languages should be supported, we may need to have these
> values in languages other than English as well.
>
> ## NHT: These terminology lists and their association with the TDWG
>   ontology are the subject of a separate development being undertaken
>   at the Smithsonian Institution by Carol Butler in association with
>   Roger Hyam and Markus. They will result in sets of terms that can be
>   used as pick-lists for those elements that should have just a few
>   terms or for which consistency is important.
>   We should see the first draft of these at the NCD Workshop, if not
>   before. It would be good for anyone interested to make suggestions
>   for such terms and their definition through this list to help Carol
>   to compile them.
MD: Yes. And the ontology can cater for multiple languages too,
although I don't expect them to exist initially. If you want to see
what these term lists will look like, here is a list of taxonomic
ranks:
http://rs.tdwg.org/ontology/voc/TaxonRank.rdf

and here are lists designated for NCD, but without any real terms yet :)
http://rs.tdwg.org/ontology/voc/InstitutionType.rdf
http://rs.tdwg.org/ontology/voc/CollectionType.rdf
http://rs.tdwg.org/ontology/voc/ObjectType.rdf
http://rs.tdwg.org/ontology/voc/PreservationMethodType.rdf

The way we keep multilingual versions is still being discussed.
It would be great to have an admin tool for the toolkit that allows  
one to download the latest list of terms from the ontology and store  
the terms in the database so they become available for the UI.
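
A rough sketch of the storing half of such an admin tool, in Python. The
RDF sample below is made up for this example (rdf:Description nodes with
language-tagged rdfs:label values); the real vocabulary files may well be
structured differently, and the table layout is equally an assumption.
Fetching the file over HTTP is left out so the sketch runs offline.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data; URIs and terms are illustrative only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/voc/CollectionType#herbarium">
    <rdfs:label xml:lang="en">Herbarium</rdfs:label>
    <rdfs:label xml:lang="nl">Herbarium</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/voc/CollectionType#living">
    <rdfs:label xml:lang="en">Living collection</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def parse_terms(rdf_text):
    """Yield (uri, lang, label) for every labelled resource in the RDF/XML text."""
    root = ET.fromstring(rdf_text)
    for node in root.iter(f"{{{RDF}}}Description"):
        uri = node.get(f"{{{RDF}}}about")
        for label in node.findall(f"{{{RDFS}}}label"):
            yield uri, label.get(XML_LANG, ""), (label.text or "").strip()


def store_terms(conn, vocabulary, terms):
    """Insert or refresh the terms of one vocabulary in the local database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term ("
        "vocabulary TEXT, uri TEXT, lang TEXT, label TEXT, "
        "PRIMARY KEY (vocabulary, uri, lang))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO term VALUES (?, ?, ?, ?)",
        [(vocabulary, uri, lang, label) for uri, lang, label in terms],
    )
    conn.commit()


def pick_list(conn, vocabulary, lang):
    """Return the labels of one vocabulary in one language, sorted for a popup menu."""
    rows = conn.execute(
        "SELECT label FROM term WHERE vocabulary = ? AND lang = ? ORDER BY label",
        (vocabulary, lang),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_terms(conn, "CollectionType", parse_terms(SAMPLE))
    print(pick_list(conn, "CollectionType", "en"))
```

Because rows are keyed on vocabulary, URI and language, re-running the
import with a newer term list updates labels in place rather than
duplicating them.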

> -----------
> I hope you have the time to discuss these items, and probably a few
> more that will pop up when I dive deeper into the subject!
>
> ## NHT: Sure thing - I see that there is another one awaiting
>   attention and will get on to that very soon.
>   Thank you very much for your work on this and please do not hesitate
>   to ask any further questions of me or anyone on what is now the TDWG
>   Collections Description Interest Group.
>
>   All best wishes,
>   Neil
>
> Best regards,
>
> Ruud