Hi all,
Let me share some more thoughts on this subject, using Wouter's original message as a reference.
dc:language is mandatory. What should we do with data that is not language-specific? Example: we are going to use TAPIR for sharing lists of scientific names. Should the language be Latin in that case? We are thinking of specifying English (eng) as the default in that case. The recommendation is to use IANA language subtags. It is probably better to recommend the languages from ethnologue.org (3-letter abbreviations), because the data can be in many more languages than the IANA languages cover, for instance common names in extinct languages. This is different from the xml:lang attribute, which is primarily intended for application development.
The fact that part of the data exposed by a service consists of scientific names doesn't mean that a user needs to understand Latin to make use of this content. In my opinion, dc:language should only be used to indicate that users need to know one or more specific languages to understand the content being served. The best example in our case would probably be species description data. In that case, dc:language should definitely be used to indicate the language in which the species are described.
If a service exposes only pure taxonomic data or just names, without any kind of description, I would probably not specify any language as part of the TAPIR metadata, even if the content includes common names in the most unusual languages, because names are essentially identifiers used to designate entities.
However, when exposing common names associated with a taxon, I certainly agree that it's desirable to specify the language. But dc:language would not be appropriate here, since it only describes the content of the service as a whole. We would need a specific concept to indicate the language of each common name, and the values of this concept could be IANA codes, Ethnologue codes, or any other option.
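To make the distinction concrete, here is a minimal sketch, assuming a hypothetical record layout (the names `CommonName`, `Taxon` and `names_in_language` are illustrative only, not part of TAPIR or any TDWG schema). The scientific name carries no language at all, while each common name carries its own language code, independent of any service-level dc:language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommonName:
    """A vernacular name tagged with its own language (hypothetical layout)."""
    name: str
    language: Optional[str]  # e.g. an ISO 639-2 or Ethnologue code; None if unknown


@dataclass
class Taxon:
    scientific_name: str  # treated as an identifier, so no language is attached
    common_names: List[CommonName] = field(default_factory=list)


def names_in_language(taxon: Taxon, code: str) -> List[str]:
    """Return the common names of a taxon tagged with the given language code."""
    return [cn.name for cn in taxon.common_names if cn.language == code]


puma = Taxon("Puma concolor", [
    CommonName("cougar", "eng"),
    CommonName("puma", "spa"),
    CommonName("mountain lion", "eng"),
])
print(names_in_language(puma, "eng"))  # ['cougar', 'mountain lion']
```

Because the language lives on each name rather than on the service, the choice of code list (IANA, Ethnologue or another) only affects the values of this per-name field.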
So now I think I agree with Markus that we could keep the existing approach and enforce a specific language standard through the spec. This standard could certainly be IANA, unless we expect services to provide content (descriptions, explanations, etc.) in really unusual languages.
By the way, even when the service content is not associated with any particular language, we could keep dc:language as a mandatory element, since I've just discovered that the IANA code "zxx" means "No linguistic content".
Would everybody be OK with keeping dc:language as a mandatory element, but making it unbounded (repeatable), and using the spec to enforce IANA codes?
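A minimal sketch of what this proposal would mean for a validator, assuming a hypothetical `check_dc_language` helper and a hard-coded subset of 3-letter codes standing in for whichever official list the spec ends up mandating (a real check would load the full list). dc:language is mandatory (at least one value) and unbounded (any number of values), and "zxx" covers services with no linguistic content.

```python
from typing import List

# Hypothetical subset of ISO 639-2 codes (English, Spanish, Portuguese,
# Latin, "no linguistic content"); a placeholder, not the full official list.
ALLOWED_CODES = {"eng", "spa", "por", "lat", "zxx"}


def check_dc_language(values: List[str]) -> List[str]:
    """Return the problems found in a service's dc:language values.

    dc:language is treated as mandatory (at least one value) and
    unbounded (any number of values), as proposed above.
    """
    problems = []
    if not values:
        problems.append("dc:language is mandatory: at least one value is required")
    for value in values:
        # Codes are compared case-insensitively.
        if value.lower() not in ALLOWED_CODES:
            problems.append(f"unknown language code: {value!r}")
    return problems


print(check_dc_language(["zxx"]))         # []
print(check_dc_language(["eng", "spa"]))  # []
print(check_dc_language([]))              # ['dc:language is mandatory: at least one value is required']
print(check_dc_language(["eng", "xx"]))   # ["unknown language code: 'xx'"]
```

A names-only service would then simply declare "zxx", while a service with descriptions in several languages would list each of them.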
Best regards,
--
Renato
I think that may not be enough. ISO 639-2 (3-letter codes) lists about 500 languages, if I am right; Ethnologue lists about 7000. The data can be in any language or dialect, especially common names or herbal information. The Ethnologue 3-letter code list has the advantage of linking languages to countries, although the ISO country list they use is not completely up to date. Usually I prefer ISO standards, but in this case I am not sure.
Wouter
----- Original Message -----
From: "Döring, Markus" <m.doering@BGBM.org>
To: "Wouter Addink" <wouter@eti.uva.nl>; tdwg-tapir@lists.tdwg.org
Sent: Wednesday, July 04, 2007 12:08 PM
Subject: Re: [tdwg-tapir] tapir metadata issues
Isn't RFC 3066, as used by XML Schema, enough? Any arguments against it?
RFC 3066 specifies that the primary language subtag is an ISO 639 code: the 2-letter ISO 639-1 code where one exists, otherwise the 3-letter ISO 639-2 code. The Library of Congress, maintainer of ISO 639-2, publishes the list of registered languages online at:
http://www.loc.gov/standards/iso639-2/langhome.html

See also:
http://www.w3.org/TR/xmlschema-2/#language
http://www.ietf.org/rfc/rfc3066.txt
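For reference, a minimal sketch of the RFC 3066 tag syntax (Primary-subtag = 1*8ALPHA, followed by any number of "-" Subtag, each 1*8 letters or digits), which is also the pattern XML Schema uses for xs:language. The function names are hypothetical, and the check is syntax only: it does not look codes up in any registry.

```python
import re

# RFC 3066: Primary-subtag = 1*8ALPHA; Subtag = 1*8(ALPHA / DIGIT).
LANGUAGE_TAG = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """Check only the syntax of a language tag (no registry lookup)."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


def primary_subtag_kind(tag: str) -> str:
    """Classify the primary subtag following the RFC 3066 rules."""
    primary = tag.split("-", 1)[0].lower()
    if len(primary) == 2:
        return "ISO 639-1 (2-letter)"
    if len(primary) == 3:
        return "ISO 639-2 (3-letter)"
    if primary == "i":
        return "IANA-registered"
    if primary == "x":
        return "private use"
    return "other"


print(is_rfc3066_tag("en-GB"))          # True
print(is_rfc3066_tag("en_GB"))          # False
print(primary_subtag_kind("zxx"))       # ISO 639-2 (3-letter)
print(primary_subtag_kind("x-klingon")) # private use
```

Note that any 3-letter code, including one taken from Ethnologue, passes the syntax check and is read by RFC 3066 as an ISO 639-2 code, which is exactly the ambiguity raised in this thread.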