[tdwg-ncd] RE: NCD toolkit
Hi Ruud,
Thank you for your mailing - I have interspersed my responses to your questions below. As mentioned, I am also copying this out to the NCD mail-list so that folk can be informed of developments and can offer additional advice, or correct my responses where I am talking nonsense.
The toolkit is highly anticipated, with several institutions already looking to make use of it - including my own - so it is important that we get things right as early as possible. The first step is to agree on a stable version of NCD that we can all work with. A new version is required anyway, since the current v0.50 has had some of its elements "un-typed" by XMLSpy and I'm grateful to Markus for pointing this out.
So, on with the Q&A session, number 1 ...
-----Original Message-----
From: Ruud Altenburg [mailto:ruud@eti.uva.nl]
Sent: 18 April 2007 12:24
To: Neil Thomson
Subject: NCD toolkit
Hallo Neil,
This is Ruud Altenburg from ETI. My forthcoming project is to participate in the creation of the NCD toolkit. As you know, ETI has prepared a "metadatabase" for NLBIF which was based on the NCD 0.3 standard, with some fields added from the NoDIT database which we considered essential (we had to migrate the data from NoDIT to the new database). I sent you the schema of the metadatabase some time ago.
According to Wouter (Addink, ETI), the toolkit should be based on v0.5 of the NCD schema. I have compared that to our metadatabase schema and noticed some changes. This implies that we need to update our database schema, which will boil down to the addition of several new fields. This of course should be fairly easy to implement. However, there are some points we need to address before I can really start with this project.
## NHT: Version v0.50 was developed in response to the schema that you sent to me and the presentation that Wouter gave to TDWG at the end of last year. We will need to bear in mind that there will be differences between the NCD data standard, which is intended as a data aggregation and interchange standard, and the implementation of it as a database. We have noted before that there are database fields that are required to make the database work that are not required for the interchange of data. This should not be a problem, though.
----------- 1. The contract states that the NCD toolkit should be multi-lingual. Does this refer to the web interface (the entry tool), the contents of the database, or both? I propose to have the interface in English and only to store the data in several languages.
## NHT: This refers primarily to the contents of the database. Most, if not all, text-oriented elements are now repeatable and have a language attribute so that entries may be made both in the local language and in a second language, such as English. The exception is the <CollectionUniformName>, which is expected to be in English.
----------- 2. About the contents: how many languages should the interface cater for? Currently our metadatabase caters for the entry of two languages (English and Dutch), and data have to be entered in both at once. The reason is to avoid a situation in which all organisations and collections are described in English but only half of them in Dutch. So in our setup, anyone entering new data is forced to do so in both languages. I propose a similar strategy for the NCD toolkit, i.e. to restrict the contents to English plus a native language if the user is not from an English-speaking territory.
## NHT: This sounds like a good strategy. I'm not sure that we can enforce English except where it is useful for sorting and searching, but restricting input to two languages should be ok. I would expect (but I may be wrong here) that providing local versions of the interface would only involve re-labelling the input form and report form elements? In which case, provided guidance is given, it would be reasonable to expect that those implementing the toolkit in a different language could do that.
----------- 3. The contract mentions import from NoDIT databases as a requirement (plus the EAD standard which I'm not familiar with). To cater for this we have added several fields to the metadatabase, otherwise we would not have been able to migrate all data. However, strictly speaking the metadatabase therefore does not match the NCD standard one-on-one. How do we deal with this?
## NHT: Would you be able to let me have a list of the elements that do not match, please? Then we can evaluate whether they should be added to NCD or whether it does not matter. Since NCD was derived from the schema that underlies NoDIT they should be very similar. Markus may be able to advise on this, since he built NoDIT for BioCASE.
------------ 4. Which fields in the NCD standard are required and which are not? I assume the ones which have a closed box in XMLSpy, but this may be too loose for a database setup. In the NLBIF metadatabase, a collection must be connected to either a person or an organisation, otherwise it would be orphaned. If I interpret the NCD schema correctly, information about a collection can be entered without providing information about the organisation or person to which/whom it belongs.
## NHT: You are right in your interpretation of the schema. We have tried to keep the number of required elements to a minimum to encourage data entry - there are currently only 8 required elements and 5 of these are about who created the entry and when. The <FamilyName> of a person is required if a person is entered. It would be easy enough (and makes sense) to make an institution required, and this should not be any hardship for data entry folk since, for a session at least, the institution details will be the same and so are entered just once.
----------- 5. We have copied some value lists from the NoDIT database to our database, e.g. keywords describing collections. However, the NCD standard has a broader scope than the NoDIT database and the NLBIF metadatabase. In the interface, I assume that many values should also be selected from pop-up menus, to guarantee a uniform database -- e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats to name but a few. To create such lists, we need to have these values stored in the database, so somehow these lists should be compiled. If multiple languages are to be supported, we may need to have these values in languages other than English as well.
## NHT: These terminology lists and their association with the TDWG ontology are the subject of a separate development being undertaken at the Smithsonian Institution by Carol Butler in association with Roger Hyam and Markus. They will result in sets of terms that can be used as pick-lists for those elements that should have just a few terms or for which consistency is important. We should see the first draft of these at the NCD Workshop, if not before. It would be good for anyone interested to make suggestions for such terms and their definition through this list to help Carol to compile them.
----------- I hope you have the time to discuss these items, and probably a few more that will pop up when I dive deeper into the subject!
## NHT: Sure thing - I see that there is another one awaiting attention and will get on to that very soon. Thank you very much for your work on this and please do not hesitate to ask any further questions of me or anyone on what is now the TDWG Collections Description Interest Group.
All best wishes, Neil
Best regards,
Ruud
Hi there, comments inline... -- Markus
On 20.04.2007, at 11:47, Neil Thomson wrote:
- The contract states that the NCD toolkit should be multi-lingual. Does this refer to the web interface (the entry tool), the contents of the database, or both? I propose to have the interface in English and only to store the data in several languages.
## NHT: This refers primarily to the contents of the database. Most, if not all, text-oriented elements are now repeatable and have a language attribute so that entries may be made both in the local language and in a second language, such as English. The exception is the <CollectionUniformName>, which is expected to be in English.
MD: As many other countries are planning to use the toolkit, it would be great if the interface were multilingual too. I know it's a bit of extra work, but it would be nice if the toolkit were prepared for multiple languages, even if it initially ships with English only. The localized texts should probably be stored separately from the common HTML templates. Some starting points for PHP that I know of:
http://de3.php.net/gettext
http://tikiwiki.org/MultilingualDev
http://www.mail-archive.com/php-i18n@lists.php.net/msg00801.html
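A minimal sketch of the idea of keeping localized texts apart from the templates, written in Python purely for illustration (the toolkit itself may well be PHP with gettext). The catalog contents, the keys and the function names `translate` and `render` are all hypothetical; the point is only the fallback to English when a translation is missing.

```python
# Hypothetical message catalogs, stored separately from the HTML templates.
CATALOGS = {
    "en": {"collection_name": "Collection name", "save": "Save"},
    "nl": {"collection_name": "Collectienaam"},  # "save" not yet translated
}


def translate(key, lang, default_lang="en"):
    """Return the label for key in lang, else in the default language, else the key."""
    for code in (lang, default_lang):
        text = CATALOGS.get(code, {}).get(key)
        if text is not None:
            return text
    return key


class _Labels(dict):
    """Mapping that resolves every placeholder through translate()."""

    def __init__(self, lang):
        super().__init__()
        self.lang = lang

    def __missing__(self, key):
        return translate(key, self.lang)


def render(template, lang):
    """Fill a template whose {placeholders} are catalog keys."""
    return template.format_map(_Labels(lang))


TEMPLATE = "<label>{collection_name}</label> <button>{save}</button>"
print(render(TEMPLATE, "nl"))
```

With this layout, shipping a new interface language only means adding one more catalog; the templates stay untouched.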
- About the contents: how many languages should the interface cater for? Currently our metadatabase caters for the entry of two languages (English and Dutch), and data have to be entered in both at once. The reason is to avoid a situation in which all organisations and collections are described in English but only half of them in Dutch. So in our setup, anyone entering new data is forced to do so in both languages. I propose a similar strategy for the NCD toolkit, i.e. to restrict the contents to English plus a native language if the user is not from an English-speaking territory.
## NHT: This sounds like a good strategy. I'm not sure that we can enforce English except where it is useful for sorting and searching, but restricting input to two languages should be ok. I would expect (but I may be wrong here) that providing local versions of the interface would only involve re-labelling the input form and report form elements? In which case, provided guidance is given, it would be reasonable to expect that those implementing the toolkit in a different language could do that.
MD: 2 languages are fine, I assume. There will be countries like Belgium, though, that require more than 2 languages...
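One way to reconcile "two languages at once" with countries that need more is to make the list of mandatory languages a per-installation setting. The sketch below is an assumption about how that could look, not part of any agreed design; `REQUIRED_LANGUAGES` and `missing_languages` are hypothetical names.

```python
# Hypothetical per-installation setting; a Belgian install might use
# ["en", "nl", "fr", "de"] instead.
REQUIRED_LANGUAGES = ["en", "nl"]


def missing_languages(values, required=REQUIRED_LANGUAGES):
    """Return the required language codes that have no text yet.

    values maps a language code to the text entered for one repeatable,
    language-tagged element.
    """
    return [code for code in required if not values.get(code, "").strip()]


entry = {"en": "Bird skin collection", "nl": ""}
print(missing_languages(entry))
```

The entry form would refuse to save while the returned list is non-empty.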
- The contract mentions import from NoDIT databases as a requirement (plus the EAD standard which I'm not familiar with). To cater for this we have added several fields to the metadatabase, otherwise we would not have been able to migrate all data. However, strictly speaking the metadatabase therefore does not match the NCD standard one-on-one. How do we deal with this?
## NHT: Would you be able to let me have a list of the elements that do not match, please? Then we can evaluate whether they should be added to NCD or whether it does not matter. Since NCD was derived from the schema that underlies NoDIT they should be very similar. Markus may be able to advise on this, since he built NoDIT for BioCASE.
MD: I am certainly interested in getting a list of the elements missing from NCD. If they were removed from NCD on purpose, I don't think we need to cater for them any more in the toolkit database. A BioCASE NoDIT import would simply import the data that maps to NCD!
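As a rough sketch of an import that keeps only what maps to NCD: the NoDIT field names on the left of the mapping are invented for illustration, and apart from CollectionUniformName the NCD names on the right are placeholders too. Unmapped fields are reported rather than silently dropped, which would also yield the list of non-matching elements Neil asked for.

```python
# Hypothetical mapping from NoDIT field names to NCD element names.
NODIT_TO_NCD = {
    "coll_name": "CollectionUniformName",
    "coll_keywords": "Keyword",
}


def import_record(nodit_record, mapping=NODIT_TO_NCD):
    """Split one NoDIT record into mapped NCD data and skipped field names."""
    ncd, skipped = {}, []
    for field, value in nodit_record.items():
        target = mapping.get(field)
        if target is None:
            skipped.append(field)  # no NCD equivalent: report, do not import
        else:
            ncd[target] = value
    return ncd, sorted(skipped)


record = {"coll_name": "Herbarium X", "coll_keywords": "plants", "legacy_flag": "1"}
ncd, skipped = import_record(record)
print(ncd, skipped)
```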
- Which fields in the NCD standard are required and which are not? I assume the ones which have a closed box in XMLSpy, but this may be too loose for a database setup. In the NLBIF metadatabase, a collection must be connected to either a person or an organisation, otherwise it would be orphaned. If I interpret the NCD schema correctly, information about a collection can be entered without providing information about the organisation or person to which/whom it belongs.
## NHT: You are right in your interpretation of the schema. We have tried to keep the number of required elements to a minimum to encourage data entry - there are currently only 8 required elements and 5 of these are about who created the entry and when. The <FamilyName> of a person is required if a person is entered. It would be easy enough (and makes sense) to make an institution required, and this should not be any hardship for data entry folk since, for a session at least, the institution details will be the same and so are entered just once.
MD: Would it make sense to have a configuration file with a list of elements that need to be required? Every installation could then set up its own restrictions. Otherwise I agree that more required fields are needed in the UI than in the schema. But not too many :)
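Such a configuration file could be as small as the sketch below. The JSON layout, the "Institution" element name and both function names are assumptions made for illustration; only CollectionUniformName is taken from the schema discussed here.

```python
import json

# Hypothetical contents of a per-installation configuration file.
CONFIG_TEXT = '{"required": ["CollectionUniformName", "Institution"]}'


def load_required(config_text):
    """Read the list of element names this installation treats as required."""
    return list(json.loads(config_text).get("required", []))


def missing_required(record, required):
    """Return the required element names that are absent or empty in record."""
    return [name for name in required if record.get(name) in (None, "", [])]


required = load_required(CONFIG_TEXT)
record = {"CollectionUniformName": "Bird skin collection"}
print(missing_required(record, required))
```

The schema stays permissive for interchange, while each installation tightens its own entry form.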
- We have copied some value lists from the NoDIT database to our database, e.g. keywords describing collections. However, the NCD standard has a broader scope than the NoDIT database and the NLBIF metadatabase. In the interface, I assume that many values should also be selected from pop-up menus, to guarantee a uniform database -- e.g. AgentRole, CollectionPurpose, UnitOfMeasure, DigitalFileFormats to name but a few. To create such lists, we need to have these values stored in the database, so somehow these lists should be compiled. If multiple languages are to be supported, we may need to have these values in languages other than English as well.
## NHT: These terminology lists and their association with the TDWG ontology are the subject of a separate development being undertaken at the Smithsonian Institution by Carol Butler in association with Roger Hyam and Markus. They will result in sets of terms that can be used as pick-lists for those elements that should have just a few terms or for which consistency is important. We should see the first draft of these at the NCD Workshop, if not before. It would be good for anyone interested to make suggestions for such terms and their definition through this list to help Carol to compile them.
MD: Yes. And the ontology can cater for multiple languages too, although I don't expect them to exist initially. If you want to see what these term lists will look like, here is a list of taxonomic ranks: http://rs.tdwg.org/ontology/voc/TaxonRank.rdf
and here are lists designated for NCD, but without any real terms yet :)
http://rs.tdwg.org/ontology/voc/InstitutionType.rdf
http://rs.tdwg.org/ontology/voc/CollectionType.rdf
http://rs.tdwg.org/ontology/voc/ObjectType.rdf
http://rs.tdwg.org/ontology/voc/PreservationMethodType.rdf
The way we keep multilingual versions is still being discussed. It would be great to have an admin tool for the toolkit that allows one to download the latest list of terms from the ontology and store the terms in the database so they become available for the UI.
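A very rough sketch of what the storing half of such an admin tool might do. The RDF sample below is invented: it assumes each term is an rdf:Description with language-tagged rdfs:label children, which may not match the real vocabulary files, and the example.org URIs, table layout and function names are all hypothetical. The download step is left out; the text could be fetched from one of the URLs above before being passed to `parse_terms`.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample; the structure of the real term lists is an assumption.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/voc/CollectionType#herbarium">
    <rdfs:label xml:lang="en">Herbarium</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/voc/CollectionType#living">
    <rdfs:label xml:lang="en">Living collection</rdfs:label>
    <rdfs:label xml:lang="nl">Levende collectie</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def parse_terms(rdf_text):
    """Return (uri, lang, label) tuples for every labelled term."""
    terms = []
    for node in ET.fromstring(rdf_text):
        uri = node.get(f"{{{RDF_NS}}}about")
        if uri is None:
            continue
        for label in node.findall(f"{{{RDFS_NS}}}label"):
            lang = label.get(XML_LANG, "en")  # untagged labels treated as English
            terms.append((uri, lang, (label.text or "").strip()))
    return terms


def store_terms(conn, terms):
    """Insert or refresh terms so the UI can build its pick-lists from them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term ("
        "uri TEXT, lang TEXT, label TEXT, PRIMARY KEY (uri, lang))"
    )
    conn.executemany("INSERT OR REPLACE INTO term VALUES (?, ?, ?)", terms)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_terms(conn, parse_terms(SAMPLE))
print(conn.execute("SELECT label FROM term WHERE lang = 'en' ORDER BY label").fetchall())
```

Because the primary key is (uri, lang), re-running the import after the ontology changes refreshes existing labels instead of duplicating them.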
I hope you have the time to discuss these items, and probably a few more that will pop up when I dive deeper into the subject!
## NHT: Sure thing - I see that there is another one awaiting attention and will get on to that very soon. Thank you very much for your work on this and please do not hesitate to ask any further questions of me or anyone on what is now the TDWG Collections Description Interest Group.
All best wishes, Neil
Best regards,
Ruud
_______________________________________________
tdwg-ncd mailing list
tdwg-ncd@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-ncd