[tdwg-ncd] Re: NCD status summary

Tue Jun 5 13:52:33 CEST 2007

Neil,
my comments inline...
Some documentation of the latest schema changes is at the very bottom, as
well as a summary of the decisions we still need to take.
Please comment!
--
Markus

> Neil,
> I have been working on the schema and the ontology lately. They don't
> match 100%. In my opinion it would be best to focus on one
> specification only instead of trying to maintain and keep in sync two
> specifications for different target platforms (XML+RDF). If we need
> to do that, I strongly suggest doing the real modelling in an
> abstract form with UML and deriving a schema and ontology from that!
> You can also create a suitable database schema from UML automatically
> if you want. I'm not sure how to go about this decision. Maybe it's
> best to discuss this during the workshop?
>
> ## NT: We should complete the schema v1.0 at least before  
> transferring to another technology. Maybe you could cover the  
> reasons for moving and how it will affect future developments in  
> your session at the workshop?
Roger is presenting the reasons why TDWG aims at RDF rather than XML schemas.
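To make the "model once, derive both" idea concrete, here is a minimal sketch in which a single abstract class (standing in for the UML model) drives both an XML serialization and a set of RDF triples. The class, its attributes and both namespace URIs are hypothetical illustrations, not the actual NCD schema or TDWG ontology.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

# Hypothetical namespaces, for illustration only (not the real NCD/TDWG URIs).
NCD_XML_NS = "http://example.org/ncd/schema"
NCD_RDF_NS = "http://example.org/ncd/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Collection:
    """Stand-in for one class of the abstract (UML) model."""
    identifier: str
    title: str
    extent: str


def to_xml(obj) -> str:
    """Derive the XML form: one child element per model attribute."""
    root = ET.Element(f"{{{NCD_XML_NS}}}{type(obj).__name__}")
    for f in fields(obj):
        ET.SubElement(root, f"{{{NCD_XML_NS}}}{f.name}").text = getattr(obj, f.name)
    return ET.tostring(root, encoding="unicode")


def to_triples(obj) -> list[tuple[str, str, str]]:
    """Derive the RDF form: the identifier becomes the subject and every
    other attribute becomes one property of it."""
    subject = obj.identifier
    triples = [(subject, RDF_TYPE, NCD_RDF_NS + type(obj).__name__)]
    for f in fields(obj):
        if f.name != "identifier":
            triples.append((subject, NCD_RDF_NS + f.name, getattr(obj, f.name)))
    return triples


c = Collection("urn:lsid:example.org:collections:1", "Herbarium X", "2 rooms")
print(to_xml(c))
for t in to_triples(c):
    print(t)
```

Adding an attribute to the model class changes both outputs at once, which is the point of keeping the modelling in one place.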

> -----------------------
>
> In regard to the PHP toolkit I think this decision doesn't have a very
> strong impact as long as the toolkit doesn't implement import/export via
> XML or RDF yet. I suggest working with the schema for now because it
> is 99% complete. On the other hand I believe the RDF version is the
> preferred TDWG format, so we should use this ontology as the
> reference. And as I said, it's not exactly the same as the NCD schema,
> as it is part of the TDWG ontology and uses external ontologies too
> (like vcard).
>
> ## NT: ok, that's good in that it should not hold Ruud up too much.
> ---------
>
> Detailed issues that still need to be resolved are:
>
> *** SCHEMA ***
> - what IDs should exist? One NCD GUID (LSID or whatever) plus many
> others, each with its source. Is this correct?
>
> ##NT: Only the NCDID needs to be generated, probably as an LSID.  
> All other identifiers are taken from elsewhere (i.e. the "source").
So my assumption is correct: one NCD ID plus any number of associated
IDs that NCD software doesn't need to understand. That's what the current
schema reflects.
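A minimal sketch of that identifier structure, assuming the NCD ID is an LSID (urn:lsid:authority:namespace:object with an optional revision) and that external IDs are stored as opaque (source, id) pairs. The class names and example values are hypothetical.

```python
import re
from dataclasses import dataclass, field

# LSID syntax: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(r"^urn:lsid:[^:]+:[^:]+:[^:]+(:[^:]+)?$", re.IGNORECASE)


@dataclass(frozen=True)
class SourceIdentifier:
    """An identifier minted elsewhere; NCD stores it verbatim but never interprets it."""
    source: str        # who issued it, e.g. an external registry
    id_in_source: str  # opaque value


@dataclass
class NcdRecord:
    ncd_id: str  # the only identifier NCD generates itself
    other_ids: list[SourceIdentifier] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not LSID_PATTERN.match(self.ncd_id):
            raise ValueError(f"NCD id is not an LSID: {self.ncd_id}")


# Hypothetical example values.
record = NcdRecord(
    "urn:lsid:example.org:collections:1234",
    [SourceIdentifier("Index Herbariorum", "ABC"), SourceIdentifier("BioCASE", "coll-42")],
)
```

Only ncd_id is validated; everything in other_ids is passed through untouched, matching the rule that NCD software does not need to understand foreign identifiers.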

> ------------------
>
> - are multiple parent collections & parent institutions allowed?
> Currently they are not.
>
> ##NT: Yes, they should be allowed
Both multiple institutions and multiple parent collections? I know
we had this in BioCASE, but not many people made use of it. And this
is the kind of change in data structure which affects our software a
lot. With a single hierarchy you have a nice tree; with multiple
parents you get a nasty graph which is hard to present. If this is
needed in only 1% of the cases I actually think we should ignore it;
people should rather indicate this in notes, I believe. But if it's an
important and frequent problem we have to deal with it, sure. What do
you reckon?
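To show why multiple parents make presentation harder, here is a small sketch that renders a child-to-parents map as an indented outline. With single parents the output is a clean tree; with multiple parents the same node appears under each parent, and a general graph also needs a cycle guard. The collection names are hypothetical.

```python
from collections import defaultdict


def is_tree(parents: dict[str, list[str]]) -> bool:
    """True if every node has at most one parent (the current NCD rule)."""
    return all(len(ps) <= 1 for ps in parents.values())


def render(parents: dict[str, list[str]]) -> list[str]:
    """Indented outline of the hierarchy. Every node must appear as a key.
    A node with N parents is printed N times, once under each parent."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)

    lines: list[str] = []

    def walk(node: str, depth: int, ancestors: frozenset) -> None:
        if node in ancestors:  # a general graph may contain cycles; stop there
            lines.append("  " * depth + node + " (cycle)")
            return
        lines.append("  " * depth + node)
        for child in sorted(children[node]):
            walk(child, depth + 1, ancestors | {node})

    for root in sorted(n for n, ps in parents.items() if not ps):
        walk(root, 0, frozenset())
    return lines


single = {"Museum": [], "Botany": ["Museum"], "Herbarium": ["Botany"]}
multi = {"Museum": [], "University": [], "Herbarium": ["Museum", "University"]}
print("\n".join(render(multi)))
```

With the multi example, "Herbarium" is listed twice, once under each parent, which is exactly the duplication an editor or browser interface would have to handle.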

> ------------------
>
> - person roles are not fixed. Should we at least suggest a core set?
> I think we need this.
>
> ##NT: Yes, we do need this
So I will use the ones below.

> ------------------
>
> - what is "richRecordSearchString" in contrast to
> "collectionDatabaseURL"? what about "objectAccessService" and simply
> "furtherInformation" ?
>
> ##NT: The "collectionDatabaseURL" is the URL of the database that  
> holds item-level data for the collection being described in this  
> NCD record. The "richRecordSearchString" is what should be used in  
> that database to get directly to the relevant record. To take an  
> example, there will be an NCD record describing a herbarium. The  
> "collectionDatabaseURL" will be to the Index Herbariorum  
> WebDatabase and the "richRecordSearchString" will contain the coden  
> for the herbarium so that the full description of the herbarium can  
> be found by adding the two together. I have no objection to  
> "objectAccessService" for the latter, but the former is more  
> explicit than "furtherInformation".
Now I'm confused. I assumed we were dealing with two different things here.

A) A link to a service that gives access to individual collection items.
This doesn't necessarily have to be a database, which is why I
suggest renaming collectionDatabaseURL to collectionObjectAccess
(or collectionItemAccess, objectLevelAccess?).

B) Further, possibly more detailed, information about this collection
metadata record. That's why I suggested furtherInformation (or its Dublin
Core counterpart).

If I understand you correctly, the richRecordSearchString will be used
together with the database URL? Isn't that mixing the item level with
the collection metadata level?
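The two readings can be contrasted in a short sketch. Reading 1 (Neil's) assembles the record URL from the database URL plus the search string, which forces us to assume how each database takes its query; the parameter name "q" below is purely hypothetical. Reading 2 (mine) stores two complete, independent URIs, so no assembly rule is needed. All URLs are placeholders.

```python
from dataclasses import dataclass
from urllib.parse import urlencode, urlsplit


def rich_record_url(collection_database_url: str, rich_record_search_string: str,
                    param: str = "q") -> str:
    """Reading 1: base database URL + search string -> URL of the full record.
    The query parameter name is an assumption; every target database differs."""
    sep = "&" if urlsplit(collection_database_url).query else "?"
    return collection_database_url + sep + urlencode({param: rich_record_search_string})


@dataclass
class AccessLinks:
    """Reading 2: two independent, complete URIs, no assembly needed."""
    collection_object_access: str  # service giving item-level access
    further_information: str       # fuller description of this collection record


links = AccessLinks("http://example.org/items", "http://example.org/ih/search?q=ABC")
```

The need for a per-database assembly rule in reading 1 is one argument for storing full URIs instead.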

> -------------------
>
> - I have replaced all address information with vcard elements, that
> is vc:N for person names and vc:ADR for addresses. Some existing NCD
> elements do not exist in vcard though. How shall we treat them? Right
> now I've simply left them as they were, in addition to the vcard
> elements. It looks a bit weird, but it's ok I believe. In particular
> that is byear and deathyear for persons (full birthday exists in
> vcard) and fax, email, logourl for institutional addresses. Maybe we
> can remove the email and fax and add a website instead? The
> collections already have real contacts, so institutions don't
> really need this, do they?
>
> ##NT: Yes, institutions do require contact details too so I would  
> prefer that these elements remain. If they are not present as vc:  
> elements then can they just be ncd: elements?
OK. But isn't fax plus email a pretty limited set of potential contact
methods? What about telephone and websites? Aren't those even more
important?
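A sketch of what the mixed markup could look like: the name uses vCard elements from the W3C vCard-in-RDF namespace, while byear and deathyear (element names taken from this thread) stay in an ncd: namespace. The NCD namespace URI and the Person wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET

VC = "http://www.w3.org/2001/vcard-rdf/3.0#"  # W3C vCard-in-RDF namespace
NCD = "http://example.org/ncd#"               # hypothetical NCD namespace
ET.register_namespace("vc", VC)
ET.register_namespace("ncd", NCD)


def person_element(family: str, given: str, byear=None, deathyear=None) -> ET.Element:
    person = ET.Element(f"{{{NCD}}}Person")
    n = ET.SubElement(person, f"{{{VC}}}N")
    ET.SubElement(n, f"{{{VC}}}Family").text = family
    ET.SubElement(n, f"{{{VC}}}Given").text = given
    # No vCard equivalent for these, so they remain ncd: elements.
    if byear is not None:
        ET.SubElement(person, f"{{{NCD}}}byear").text = str(byear)
    if deathyear is not None:
        ET.SubElement(person, f"{{{NCD}}}deathyear").text = str(deathyear)
    return person


xml = ET.tostring(person_element("Linnaeus", "Carl", 1707, 1778), encoding="unicode")
print(xml)
```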

> -------------------
>
> - many elements cover the same ideas that Dublin Core covers. Should
> we use the explicit Dublin Core elements for title, description,
> other resources, keyword, rights, modified, created and citation? I
> would be in favor of doing so. It would mean that NCD is mainly about
> the extra metadata bits that Dublin Core doesn't cover!
>
> ##NT: We certainly could do this and Doug has produced an NCD <-->  
> DC mapping. Since this would just be a re-labelling exercise we can  
> have this as a discussion item at the workshop. NCD would then be  
> an application profile, re-using elements from mets:, dc: and vc:  
> along with its own.
I think that's a good approach. Please let's put this on the agenda for
the workshop.
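As a rough illustration of the "re-labelling" idea, here is a sketch of an application profile mapping. It is not Doug's mapping; the NCD-side keys are hypothetical labels for the concepts listed above, and the Dublin Core targets (e.g. dc:relation for "other resources", dcterms:bibliographicCitation for "citation") are my guesses. The NCD namespace URI is also hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
NCD = "http://example.org/ncd#"  # hypothetical NCD namespace

# Hypothetical NCD concept -> Dublin Core term mapping, for discussion only.
NCD_TO_DC = {
    "title": DC + "title",
    "description": DC + "description",
    "otherResources": DC + "relation",
    "keyword": DC + "subject",
    "rights": DC + "rights",
    "modified": DCTERMS + "modified",
    "created": DCTERMS + "created",
    "citation": DCTERMS + "bibliographicCitation",
}


def relabel(record: dict[str, str]) -> dict[str, str]:
    """Re-label fields as DC terms; fields without a DC counterpart keep NCD names."""
    return {NCD_TO_DC.get(key, NCD + key): value for key, value in record.items()}


out = relabel({"title": "Herbarium X", "extent": "2 rooms"})
```

Whatever is left in the NCD namespace after re-labelling is the part of NCD that Dublin Core does not cover.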

> -------------------
>
> *** ONTOLOGY ***
> - collection extent is a pure integer without a unit definition. I
> know this is bad, but having complex data (i.e. with multiple
> elements or attributes) in an ontology means I would have to define a
> new class just for this. And Roger is very keen on keeping the number
> of classes low.
>
> ##NT: In this case the single attribute should be a string not an  
> integer. Describing a collection extent as "2" is spectacularly  
> meaningless. If the string is "2 shelves" or "2 cupboards" or "2  
> rooms" it means a little bit more. Not much, but enough for someone  
> to assess whether it is worth a visit or not.
Totally right. I'll do that.
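A minimal sketch of the change in RDF terms: the extent value is emitted as an xsd:string literal, so "2 cupboards" survives intact instead of being reduced to the integer 2. The extent property URI is hypothetical, and the escaping handles only backslashes and quotes.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
NCD = "http://example.org/ncd#"  # hypothetical NCD namespace


def literal(value: str, datatype: str = XSD + "string") -> str:
    """Format an N-Triples typed literal (minimal escaping of backslashes and quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def extent_triple(collection_uri: str, extent: str) -> str:
    # Extent is free text such as "2 cupboards", so it is typed xsd:string, not xsd:integer.
    return f"<{collection_uri}> <{NCD}extent> {literal(extent)} ."


print(extent_triple("http://example.org/coll/1", "2 cupboards"))
```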

> --------------------
> - all keywords lack the strength flag because I have created a single
> keyword class that is used by all keyword types, and the strength
> flag didn't exist for all keywords. Should it, or should we drop it?
>
> ##NT: I would like to keep the strength flag if possible for those
> elements where it makes sense - it would be good to indicate where
> the collection owner believes the collection to be particularly
> strong.
Well, I have now added strength to the "DefinedTerm" class so it applies
to all keywords (). But that might be incorrect, because the class
describes terms that can be traced back to where they come from
(source, idinsource). If we also add strength there, then strength
really is a property of the collection (or of the term with regard to
the collection), not of the term itself. To model this more simply, I
will create two properties for each strength-bearing keyword of the
collection class, e.g. TaxonCoverage and TaxonCoverageStrength. Then
both properties can use the DefinedTerm class (which only describes a
term, no matter where it's used). Looks as if this problem is solved!
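A sketch of the two-property approach, assuming TaxonCoverageStrength lists the terms for which the collection is particularly strong. DefinedTerm stays collection-independent and the same term object can be referenced by both properties. Term sources, IDs and the collection ID are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinedTerm:
    """A term traced back to where it comes from; says nothing about any collection."""
    source: str
    id_in_source: str
    label: str


def coverage_triples(collection: str, covered: list[DefinedTerm],
                     strong: list[DefinedTerm]) -> list[tuple[str, str, DefinedTerm]]:
    """Strength is asserted by the collection through a second property,
    so DefinedTerm itself carries no strength flag."""
    triples = [(collection, "TaxonCoverage", t) for t in covered]
    triples += [(collection, "TaxonCoverageStrength", t) for t in strong]
    return triples


def strong_labels(triples) -> list[str]:
    return [t.label for _, prop, t in triples if prop == "TaxonCoverageStrength"]


aves = DefinedTerm("hypothetical-taxonomy", "1", "Aves")
insecta = DefinedTerm("hypothetical-taxonomy", "2", "Insecta")
triples = coverage_triples("coll-1", covered=[aves, insecta], strong=[aves])
```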

> --------------------
>
> - a persons role in regard to a specific collection is missing. I
> think we need it, but then again we need at least a basic controlled
> vocabulary
>
> ##NT: Yes, we do need this. Example terms would be owner,
> administrator, collector, preparator etc.
OK, I'll start with these roles then. How many roles do you expect? If
it's fewer than 10, we can avoid new classes and create separate
properties for the different roles, just like I did with strength in
the keywords. I've added those 4 person types to the ontology now.

> -------------------
>
> I think that's all there is. It looks like a lot, but most issues are
> of a cosmetic nature and do not change the abstract standard. Some,
> like multiple parents, do of course.
>
> I would be glad if you could reply to these questions as soon as
> possible, even if only partly, because I hope to find some time
> over the weekend to settle some of them.
>
> ##NT: Thanks again for looking at these details. It was always  
> known that changes would be needed for the toolkit to work and  
> others may well come to light during testing. We can also talk that  
> through at the workshop, since I think that some of the RAVNS are  
> keen to help with the testing.
>
> Is everyone in agreement with the changes proposed above? We need
> to move quickly to v1.0 so that Ruud can strut his stuff. Apologies
> again for my unintended absence recently.

+++++++++++++++++++++++++++++++++++++
I think we still need to take decisions on:

  - multiple parent hierarchy. Maybe we leave it single and discuss
it at the workshop?

  - wording and meaning of "richRecordSearchString" vs
"collectionDatabaseURL". Can we agree to use "collectionObjectAccess"
and "furtherDetails", both of which are URIs?

  - CONTACTS. Wouldn't it be easier if we had a single contact class
(complexType), i.e. a vcard contact, that can be used for institution
contacts as well as collection contacts? I would prefer this option
to a strict selection of relevant contact fields for different
purposes. The editor interface can still restrict this if we think
that's needed. But NCD would be able to exchange an entire vcard
record for a contact no matter where it is being used.

+++++++++++++++++++++++++++++++++++++
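A sketch of the single-contact-type option: one vCard-like Contact type shared by institutions and collections, with any per-context restriction done only in the editor view. All field names and the EDITOR_FIELDS selection are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """One vCard-like contact type shared everywhere (field names are hypothetical)."""
    formatted_name: str
    emails: list[str] = field(default_factory=list)
    telephones: list[str] = field(default_factory=list)
    faxes: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)


@dataclass
class Institution:
    name: str
    contacts: list[Contact] = field(default_factory=list)


@dataclass
class Collection:
    title: str
    contacts: list[Contact] = field(default_factory=list)


# The exchange format carries full contacts; only the editor UI restricts fields.
EDITOR_FIELDS = {
    Institution: {"formatted_name", "emails", "telephones", "faxes", "urls"},
    Collection: {"formatted_name", "emails", "telephones"},
}


def editor_view(owner, contact: Contact) -> dict:
    """What an editor might show: the full contact filtered by owner type."""
    allowed = EDITOR_FIELDS[type(owner)]
    return {k: v for k, v in vars(contact).items() if k in allowed}


office = Contact("Herbarium office", emails=["info@example.org"])
inst = Institution("Example Institute", [office])
coll = Collection("Example Herbarium", [office])
```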

changes to the schema when cleaning up:

about an institution:
- PostalAddressText and PhysicalAddressText removed. An institution
can now have multiple vcard addresses, each of which can specify a "type"
such as postal, parcel, home, work, pref.
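A sketch of how a consumer might use typed addresses in place of the two removed text fields: each address carries a set of type flags from the list above, and a lookup prefers an address also flagged "pref". The class, function and example addresses are hypothetical.

```python
from dataclasses import dataclass, field

# Address types listed above (vCard ADR TYPE values).
ADDRESS_TYPES = {"postal", "parcel", "home", "work", "pref"}


@dataclass
class Address:
    street: str
    locality: str
    country: str
    types: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        unknown = self.types - ADDRESS_TYPES
        if unknown:
            raise ValueError(f"unknown address type(s): {sorted(unknown)}")


def address_for(addresses: list[Address], wanted: str):
    """Pick an address of the wanted type, preferring one also marked 'pref'."""
    matches = [a for a in addresses if wanted in a.types]
    preferred = [a for a in matches if "pref" in a.types]
    return (preferred or matches or [None])[0]


street = Address("1 Main Street", "Sometown", "Somecountry", {"postal", "work"})
box = Address("PO Box 5", "Sometown", "Somecountry", {"postal", "pref"})
```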