Thanks James, Good questions all. As with all standards, the language question is an important one, and not one to be taken lightly. My current opinion (and only my personal one) is that we should decide on the standard elements in one language (English appears to be the one that continues to surface, whether we all like that or not) and then figure out how to make those work in all languages (and, as a technology matter, in all character sets, Unicode not yet being universally available). I also personally believe that it is important to include the source language and characters and to translate (or transliterate) those into other language(s), whether we choose a single language or go for the UN languages.
However, the latter is a long range goal and we rely on those of you who are fluent in those languages to assist us, now, and in making the standards work in other languages (and probably metadata traditions).
ISO dates are an interesting question. For taXMLit, we decided that we should capture dates as they are in the original publication and that any conversion/interpretation belongs in another layer (a later one, still to be developed--see the explanation document). For other standards, we might consider a different solution, but for a full content standard, I believe it best to record what is there and then to add one or more interpretations, according to other standard(s) or opinions, somewhere else (which is closely linked and equally easily acceptable).
Cheers, Anna
Nozomi Ytow nozomi@biol.tsukuba.ac.jp 09-Feb-2006 2:11:16 PM >>>
Dear all,
Thanks for the documents, Anna.
The following questions came to my mind. These are relevant to Level 1, but some are also relevant to the contents of higher levels.
Do we assume ASCII only for the human-readable portion, or do we allow non-ASCII, including east European, Cyrillic and Asian characters? If we allow non-ASCII, do we expect character normalisation (sensu Unicode)? Two or more code point sequences for a single character are not uncommon in Unicode. If we allow non-ASCII, do we need to restrict some fields to ASCII, even if the field values are non-ASCII in the original literature?
Do we use standards such as ISO to display dates?
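To make the normalisation question concrete, here is a minimal sketch in Python using the standard unicodedata module. The choice of NFC as the normal form and the helper names are assumptions for illustration only, not something the group has decided.

```python
import unicodedata

# The same visible character "é" written two ways:
precomposed = "\u00e9"    # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # two code points: "e" + COMBINING ACUTE ACCENT


def normalise(text: str) -> str:
    """Return the NFC (canonical composed) form of text.

    NFC is an assumed choice here; NFD would serve equally well
    as long as everyone uses the same form.
    """
    return unicodedata.normalize("NFC", text)


def is_ascii_only(text: str) -> bool:
    """True if a field value could go into an ASCII-restricted field."""
    return text.isascii()


# Without normalisation the two spellings compare as different strings.
print(precomposed == decomposed)
# After normalisation they compare as equal.
print(normalise(precomposed) == normalise(decomposed))
```

If fields are compared or indexed without such a step, two records holding the same title could fail to match.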
Cheers, James
_______________________________________________ TDWG-Lit mailing list TDWG-Lit@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-lit_lists.tdwg.org
Dear Anne and all,
ASCII is a safe subset of UTF-8, an encoding method for Unicode characters. So having one mandatory representation in ASCII is a technically reasonable approach if a source language field is supported, e.g.
GUID01+Flora of Japan ... + ja_JP
GUID01+Nihon shokubutushi ... + ja_JP
GUID01+(Nihon shokubutushi in Kanji)...
The first one is a translation, the second one is a transliteration, and the last one is the original. The last part, ja_JP, tells that these are non-original ones. I suggest adding source language as an optional item in Level 1. The field may hold a list of languages if the publication is written in two or more languages, e.g. the ICZN itself.
Note that here I gave the same GUID to all three representations of the single publication. It implies that the GUID is an ID for the literature object, not for (meta) data objects. If we need GUIDs for data objects also, we would need two types of GUID.
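The three-record example above can be sketched as a small Python data structure. The class name TitleRecord, its field names and the "kind" labels are hypothetical, chosen only for illustration; the Kanji title is left as the placeholder used in the message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TitleRecord:
    guid: str                          # ID of the publication, shared by all renderings
    kind: str                          # "translation", "transliteration" or "original"
    title: str
    source_languages: Tuple[str, ...]  # one or more tags, e.g. ("ja_JP",)


records: List[TitleRecord] = [
    TitleRecord("GUID01", "translation", "Flora of Japan", ("ja_JP",)),
    TitleRecord("GUID01", "transliteration", "Nihon shokubutushi", ("ja_JP",)),
    # Placeholder text, exactly as in the message; the real value would be non-ASCII.
    TitleRecord("GUID01", "original", "(Nihon shokubutushi in Kanji)", ("ja_JP",)),
]


def by_kind(recs: List[TitleRecord], kind: str) -> TitleRecord:
    """Return the first record of the requested kind."""
    return next(r for r in recs if r.kind == kind)


# ASCII text is byte-for-byte identical when encoded as UTF-8,
# which is why an ASCII field is safe inside a UTF-8 document.
ascii_title = by_kind(records, "translation").title
same_bytes = ascii_title.encode("ascii") == ascii_title.encode("utf-8")
```

Using a tuple for source_languages leaves room for publications written in two or more languages.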
I meant ISO dates only for "date published (as corrected)". I agree with you on the date issue; in Level 1, date published as cited should be as is and shouldn't be interpreted.
Cheers, James
Note that here I gave the same GUID to all three representations of the single publication. It implies that the GUID is an ID for the literature object, not for (meta) data objects. If we need GUIDs for data objects also, we would need two types of GUID.
I would encourage everyone to have a look at the report from the recent TDWG/GBIF Workshop on GUIDs:
http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID1Report
One of the outcomes of this workshop was the need to clarify the distinction between GUIDs assigned to data objects, vs. GUIDs assigned to conceptual entities. I was a member of a breakout group that discussed this distinction, and we were tasked with creating a page or set of pages on the Wiki to clarify this distinction -- particularly in the context of LSIDs (the GUID technology which seemed to be strongly preferred by the workshop participants). It's a subtle distinction with very significant implications (it's also a topic muddled with unfortunate semantics). Our breakout group concluded that GUIDs for conceptual entities serve an important role in biodiversity informatics, if for nothing else than to serve as a "hub" around which multiple data-object GUIDs may cross-reference each other.
So...James...in your example, there would probably be multiple data-object GUIDs for the different digital "renderings"; and perhaps a separate data-less GUID assigned to the "concept" of the specific publication instance, to which all of the alternative data-GUIDs may be "hubbed".
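The "hub" arrangement described above might be sketched as follows in Python. All identifiers here (PUB-CONCEPT-01, DATA-01-...) and the function names are made up for illustration; they are not LSIDs and do not follow any syntax agreed at the workshop.

```python
from collections import defaultdict
from typing import Dict, Set

# Concept ("hub") GUID -> set of data-object GUIDs attached to it.
hub_index: Dict[str, Set[str]] = defaultdict(set)


def link(hub_guid: str, data_guid: str) -> None:
    """Attach a data-object GUID to a data-less concept GUID."""
    hub_index[hub_guid].add(data_guid)


def siblings(data_guid: str) -> Set[str]:
    """Other data-object GUIDs that share a hub with data_guid."""
    found: Set[str] = set()
    for members in hub_index.values():
        if data_guid in members:
            found |= members - {data_guid}
    return found


# Three digital renderings of one publication, hubbed to one concept GUID.
link("PUB-CONCEPT-01", "DATA-01-translation")
link("PUB-CONCEPT-01", "DATA-01-transliteration")
link("PUB-CONCEPT-01", "DATA-01-original")

print(sorted(siblings("DATA-01-translation")))
```

The point of the hub is that no rendering needs to know about the others directly; each one only points at the concept GUID.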
This group should also take note of point 5 under the "Work plan" section of the Workshop Report, regarding the need for a sort of "Publication Bank".
I meant ISO dates only for "date published (as corrected)". I agree with you on the date issue; in Level 1, date published as cited should be as is and shouldn't be interpreted.
But wouldn't we also want the "corrected" (="interpreted") date (or date range) to be among the attributes -- even at level 1? If for no other reason than estimating chronology?
Aloha, Rich
Hi Rich,
I meant ISO dates only for "date published (as corrected)". I agree with you on the date issue; in Level 1, date published as cited should be as is and shouldn't be interpreted.
But wouldn't we also want the "corrected" (="interpreted") date (or date range) to be among the attributes -- even at level 1? If for no other reason than estimating chronology?
I do not understand this point. Anne's starting-point documents contain both date published as cited and date published as corrected. The latter comes with dd/mm/yyyy as an example. I suggested using the ISO date format for this one, not for the "as cited" one.
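A minimal Python sketch of keeping the two dates side by side, with only the corrected one in ISO 8601 form. The class name PublicationDate, its field names and the sample "as cited" string are hypothetical; only the dd/mm/yyyy pattern comes from the documents under discussion.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PublicationDate:
    as_cited: str                       # verbatim from the publication, never interpreted
    as_corrected: Optional[str] = None  # ISO 8601 (YYYY-MM-DD), if an interpretation exists


def to_iso(ddmmyyyy: str) -> str:
    """Convert a dd/mm/yyyy string to ISO 8601 (YYYY-MM-DD).

    Raises ValueError if the input does not match dd/mm/yyyy.
    """
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").date().isoformat()


# The "as cited" value is an invented example of a verbatim string.
example = PublicationDate(as_cited="9 Feb. 2006", as_corrected=to_iso("09/02/2006"))
print(example.as_corrected)
```

Because ISO 8601 dates sort correctly as plain strings, the corrected field can be used for estimating chronology while the cited field stays untouched.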
Cheers, JMS
participants (3)
- Anna Weitzman
- Nozomi Ytow
- Richard Pyle