[tdwg-tapir] tapir metadata issues

Döring, Markus m.doering at BGBM.org
Mon Jul 2 16:05:35 CEST 2007


Hi there,
This really is weird.
I was confident that the XML Schema language data type (xs:language) was
used. It defines natural-language identifiers as specified by RFC 3066.
http://www.w3.org/TR/xmlschema-2/#language
http://www.ietf.org/rfc/rfc3066.txt
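For reference, here is a minimal Python sketch of what an xs:language check actually covers. It assumes the lexical pattern given in XML Schema Part 2 (second edition), `[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*`. The function name is illustrative only. The check validates the shape of a tag, not whether its subtags are registered codes.

```python
import re

# Lexical pattern of xs:language as given in XML Schema Part 2 (second edition).
# It checks only the *shape* of a tag, not whether the subtags are registered codes.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xs_language(value: str) -> bool:
    """Return True if value matches the xs:language lexical pattern."""
    return XS_LANGUAGE.fullmatch(value) is not None


for tag in ["en", "en-GB", "sw", "swuaheli", "en_GB", ""]:
    print(f"{tag!r}: {is_xs_language(tag)}")
```

Note that "swuaheli" is accepted, because it is a single run of eight letters. A shape check on its own cannot tell a registered code from a misspelled language name.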

But when I looked at the TAPIR schema, I found that it uses the Dublin Core
schema, which states the following:

  <xs:element name="language" substitutionGroup="any"/>
  <xs:element name="any" type="SimpleLiteral" abstract="true"/>
  <xs:complexType name="SimpleLiteral">
   <xs:complexContent mixed="true">
    <xs:restriction base="xs:anyType">
     <xs:sequence>
      <xs:any processContents="lax" minOccurs="0" maxOccurs="0"/>
     </xs:sequence>
     <xs:attribute ref="xml:lang" use="optional"/>
    </xs:restriction>
   </xs:complexContent>
  </xs:complexType>

So you can put anything in the dc:language element AND tag it with an
optional xml:lang attribute. That's weird:

<dc:language xml:lang="en">swuaheli</dc:language>
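The following short Python sketch uses the standard-library xml.etree.ElementTree. It assumes the usual Dublin Core 1.1 namespace URI, which the excerpt above does not show. The sketch shows that the two values in this example are independent. The xml:lang attribute describes the language of the element's own text, while the text is what is supposed to name the resource's language.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"          # assumed Dublin Core 1.1 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"     # namespace bound to the xml: prefix

snippet = f'<dc:language xmlns:dc="{DC_NS}" xml:lang="en">swuaheli</dc:language>'

elem = ET.fromstring(snippet)
content_lang = elem.get(f"{{{XML_NS}}}lang")  # language the element's text is written in
declared_lang = elem.text                     # value meant to describe the resource

print(elem.tag)       # {http://purl.org/dc/elements/1.1/}language
print(content_lang)   # en
print(declared_lang)  # swuaheli
```

Nothing in the SimpleLiteral type relates the attribute to the content, so a schema validator accepts this element as it stands.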



Markus




On 29.06.2007 15:52, "Jim Graham" <jim at nrel.colostate.edu> wrote:

> Greetings,
> 
> I understand making dc:language optional, but I'd be really concerned
> about allowing the language code to come from different standards. The
> "SW" example Wouter points out would be a real concern. Can we have
> multiple language elements, each of which is tied to a specific language
> code standard? That way we cannot make the kind of mistake where "SW" is
> misinterpreted as Swedish or Swahili.
> 
> Thanks,
> Jim
> 
> Jim Graham
> Natural Resource Ecology Laboratory
> Colorado State University
> Fort Collins, CO 80524
> jim at nrel.colostate.edu
> 970-491-0410
> 
> 
> -----Original Message-----
> From: tdwg-tapir-bounces at lists.tdwg.org
> [mailto:tdwg-tapir-bounces at lists.tdwg.org] On Behalf Of Wouter Addink
> Sent: Friday, June 29, 2007 2:45 AM
> To: tdwg-tapir at lists.tdwg.org
> Subject: Re: [tdwg-tapir] tapir metadata issues
> 
> Renato,
> thanks for your comments.
> 
>> - Make dc:language an optional element.
>> - Change the cardinality of dc:language to "unbounded".
>> - Change the recommendation about the content of dc:language by
>> including Ethnologue codes as another option (probably the main
>> option). Note that it will still be just a recommendation, not a normative
>> statement.
>> 
> OK. Perhaps we should also add an optional attribute for specifying which
> code standard is used, if any? I think that should not affect current
> implementations. The problem is that you cannot do anything with an
> abbreviation if you do not know what it means, and making assumptions can
> be dangerous. For instance, you could assume that "SW" means Swedish, or
> that it means Swahili. If you know that it is an IANA subtag, you can use
> it, and you can also raise an error if an abbreviation is not present in
> the declared standard.
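Wouter's proposal can be illustrated with a minimal Python sketch. The `standard` argument stands in for the proposed optional attribute. The attribute itself, the standard identifiers "iana" and "iso639-3", and the two-entry code tables are all hypothetical and illustrative. A real implementation would load the full IANA Language Subtag Registry or the ISO 639-3 code table.

```python
from typing import Optional

# Hypothetical, deliberately tiny excerpts of two code lists (illustration only).
CODE_TABLES = {
    "iana": {"sv": "Swedish", "sw": "Swahili"},
    "iso639-3": {"swe": "Swedish", "swh": "Swahili"},
}


class UnknownLanguageCode(ValueError):
    """Raised when a code cannot be interpreted unambiguously."""


def resolve_language(code: str, standard: Optional[str]) -> str:
    """Map a language code to a name, using the code standard declared with it.

    `standard` plays the role of the proposed optional attribute; its name and
    the identifiers used in CODE_TABLES are hypothetical.
    """
    if standard is None:
        # No standard declared: refuse to guess whether "SW" is Swedish or Swahili.
        raise UnknownLanguageCode(f"cannot interpret {code!r}: no code standard declared")
    table = CODE_TABLES.get(standard)
    if table is None:
        raise UnknownLanguageCode(f"unsupported code standard {standard!r}")
    name = table.get(code.lower())  # compare case-insensitively, as language tags are
    if name is None:
        # The abbreviation is not part of the declared standard: report an error.
        raise UnknownLanguageCode(f"{code!r} is not a {standard} code")
    return name


print(resolve_language("SW", "iana"))       # Swahili
print(resolve_language("swe", "iso639-3"))  # Swedish
```

With this design, "SW" is read as Swahili only when the declared code list says so. If no standard is declared, or if the code is not in the declared standard, the caller gets an explicit error instead of a guess.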
> 
> Another comment about the TAPIR metadata: when giving courses on installing
> TapirLink, I noticed that none of the roughly 10 (Dutch) students could
> figure out on their own what 'relatedEntity' means; they all needed help
> with it. Perhaps the documentation of that element should be expanded?
> 
> Cheers,
> Wouter
> 
> 
> 
> _______________________________________________
> tdwg-tapir mailing list
> tdwg-tapir at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-tapir




More information about the tdwg-tag mailing list