[tdwg-guid] Need for citation information in GUID metadata
Dear Greg,
The issue you describe is indeed much more general. It all boils down to defining metadata standards for images used in our community. In other words, if we had metadata standards for images, we could use them to address the requirement you described.
The (informal) TDWG Imaging Interest Group (http://www.tdwg.org/activities/img/) has the remit of developing such a metadata profile, but as far as I know, it hasn't done so yet.
In the absence of such an image metadata profile, we, the GUID group, or the TAG, could spec out a temporary solution that could later be superseded by a more complete solution devised by the Imaging group. I believe that this is what you are proposing.
As an initial, temporary solution to let page generators build links to images on the web using LSID metadata, I would suggest using the relevant Dublin Core metadata terms, such as title, description, and format. Other DC terms could be used as well. These terms are described on the web page below.
http://dublincore.org/documents/dcmi-terms/#H2
The problem is that there is no term for pointing to a thumbnail. There is the Image vocabulary type, but that doesn't let a client distinguish between the actual, full-size image and a thumbnail (not sure if that's a helpful distinction anyway). To accommodate that, we should create our own term for referring to thumbnails.
Does that sound like a feasible approach to solve this issue? Does the imaging group have any better solution to this problem?
Cheers,
Ricardo
Greg Riccardi wrote:
The LSID Applicability statement includes brief coverage, in Section 10, of citation styles that describe how to format the LSID for inclusion in pages. The statement does not include how to use LSIDs to create appropriate citations in Web pages or other documents that refer to the referenced digital object.
For example, a Web application has an LSID reference to a Morphbank image to be used in generating a Web page. The value of the LSID is of secondary importance to the image itself and its metadata. The page needs to include a version of the image, or a description of the image, that is supported by a URL that presents more detail about the referenced object. A typical presentation is a thumbnail of the image and some descriptive text. Clicking on the image or text directs the user to more detail about the image. The LSID is not directly useful for this application. The page generator will use the LSID to fetch the metadata and then use the metadata to generate the citations. The metadata should include text to be used in the citation and URLs that provide standard views of the object. In the case of images, a URL for a thumbnail image to be used in an image tag and a URL for a standard presentation should be included.
The issue is much more general. The presentation to a user of a reference to a digital object needs to be useful to the user. The LSID has no semantics and so is not useful. The general problem occurs in any digital repository and is a subject of study within the digital library community. The repository needs to provide standard language to be used in making a readable citation to one of its LSIDs.
As a producer of LSIDs, the Morphbank system will provide a suggestion as to the appropriate citation text, URLs that are appropriate for use in image tags, the URL of the standard presentation of the object, and copyright information. The page generation application can rely on finding standard metadata tags for use in its pages.
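The page-generation step described above can be sketched as follows. This is only an illustration, assuming the LSID has already been resolved to a simple dictionary; the key names (`citation_text`, `thumbnail_url`, `presentation_url`, `rights`) and the example URLs are hypothetical and are not defined by any TDWG or Morphbank standard.

```python
from html import escape


def render_citation(meta: dict) -> str:
    """Build an HTML snippet: a linked thumbnail plus linked citation text."""
    # escape() also escapes quotes by default, so values are safe in attributes.
    thumb = escape(meta["thumbnail_url"])
    page = escape(meta["presentation_url"])
    text = escape(meta["citation_text"])
    rights = escape(meta.get("rights", ""))

    parts = [
        f'<a href="{page}"><img src="{thumb}" alt="{text}"></a>',
        f'<a href="{page}">{text}</a>',
    ]
    if rights:
        parts.append(f"<small>{rights}</small>")
    return "\n".join(parts)


# Hypothetical metadata, as it might look after resolving an LSID.
example = {
    "citation_text": "Morphbank image 1234: flea on a dog",
    "thumbnail_url": "http://example.org/thumb/1234.jpg",
    "presentation_url": "http://example.org/show/1234",
    "rights": "Copyright example holder",
}

print(render_citation(example))
```

The point of the sketch is that the generator never displays the LSID itself; it only needs the citation text and the two URLs to be present under predictable names.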
I advocate that the applicability document include metadata standards for presentation of (at least) the suggested citation text, and the URL for the standard presentation.
Greg
Greg Riccardi Professor of the College of Information riccardi@ci.fsu.edu Florida State University 850-644-2869 Tallahassee, FL 32306-2100 http://www.ci.fsu.edu/riccardi
tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid
The problem that nobody will take a position on is this:
Is the metadata on an image file, or on an image?
Even if, or especially if, you stick to DC, you have a problem deciding what belongs in a description. If the metadata is about the file, then it is reasonable to express, e.g., that it has 1200x800 pixels and is encoded as JPEG, but perhaps not that it is a picture of a flea biting a dog. If the image is being described, the reverse might hold.
Even if you are content to have folksonomies, i.e. tags (which is probably about the best you can hope for in dc:description), you would only very rarely find it useful to search for "description contains '1200x800'". On the other hand, rendering clients probably desperately need the pixel size and also information about where to find other sizes of the "same" image.
This particular example is a little forced since most digital image formats actually encode the pixel size of the image within the file to aid decoding it for rendering, but the point remains.
I've put this whining in http://wiki.tdwg.org/twiki/bin/view/Image/ImageOrImageFile
The problem that nobody will take a position on is this:
Is the metadata on an image file, or on an image?
This comes back to the more general question:
Does the LSID identify a digital instance of an object (database record, binary file), or does the LSID identify the "abstract" object (specimen, image, etc.) that the digital object serves as surrogate for?
In the context of images, my thinking is this:
The "abstract" object is the set of photons that struck a planar surface inside a camera (over a given period of time) after passing through a series of lenses. From this object, there might be a derivative physical object (e.g., a frame of celluloid film, which might have its own set of properties such as film type, etc.), and there might be a derivative digital object (e.g., a binary RAW file, or whatever "most original" set of 1s and 0s extracted from the camera representing a digitally interpreted version of the set of photons that struck the planar surface; I henceforth refer to this as the "RAW" file for simplicity's sake, even though it might actually be a TIFF or a JPEG or a NEF or whatever). Either of these two derivative objects (the celluloid or the "RAW" file) might have derivatives of its own (e.g., dupes, prints, and scans for the celluloid object; crops, color corrections, resizes, and other digital image formats for the "RAW" file).
I see a world where each of these things (at least the ones that have value in terms of information presentation) gets its own LSID, and part of the metadata for each primary and subsequent derivative would be a pointer to the LSID that identifies the "top-level" non-physical, non-digital "abstract" image object (i.e., set of photons striking the planar surface).
Once at that stage, the big questions become:
1) What metadata from derived objects gets inherited "upstream" to the "master" abstract LSID;
2) What metadata from the "master" object gets inherited "downstream" to the various derivatives; and
3) Which of the LSIDs from the various derivatives become incorporated into the metadata of the "master" abstract LSID?
Number 3 above is slightly different from number 1: number 1 is more about inheritance of metadata content, whereas number 3 is more about cross-linking among the various LSIDs.
One would assume that all LSIDs of derived image-objects would have as part of their metadata pointers to the "master" abstract LSID from which the former were derived.
In this model, it becomes relatively straightforward which LSIDs get which metadata.
The problem, of course, is how to structure the inheritance/flow of metadata from one LSID to another (i.e., "master" to derived, and vice versa).
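One way to picture the model above is as a chain of objects, each pointing at the thing it was derived from. The sketch below is only an illustration: the class name `ImageObject`, its fields, and the rule that a derivative's own values override inherited ones are all assumptions. Which metadata should actually flow in which direction is exactly the open question.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageObject:
    lsid: str
    kind: str  # e.g. "abstract", "raw", "resize" (labels are illustrative)
    derived_from: Optional["ImageObject"] = None
    metadata: dict = field(default_factory=dict)

    def master(self) -> "ImageObject":
        """Follow derived_from pointers up to the top-level abstract object."""
        node = self
        while node.derived_from is not None:
            node = node.derived_from
        return node

    def effective_metadata(self) -> dict:
        """Inherit metadata downstream; an object's own values win on conflict."""
        inherited = (
            self.derived_from.effective_metadata() if self.derived_from else {}
        )
        return {**inherited, **self.metadata}


abstract = ImageObject(
    "urn:lsid:example.com:image:1", "abstract",
    metadata={"subject": "flea biting a dog"},
)
raw = ImageObject(
    "urn:lsid:example.com:image:2", "raw", abstract,
    {"format": "image/tiff"},
)
thumb = ImageObject(
    "urn:lsid:example.com:image:3", "resize", raw,
    {"format": "image/jpeg", "dimensions": "120x80"},
)

print(thumb.master().lsid)
print(thumb.effective_metadata())
```

Here the thumbnail inherits the subject from the abstract image but reports its own format and dimensions. Upstream inheritance and cross-links from the master to its derivatives would need further rules that this sketch does not attempt.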
Just some random thoughts....
Rich
Please see my comments in line below.
Bob Morris wrote:
The problem that nobody will take a position on is this:
Is the metadata on an image file, or on an image?
I don't want to dismiss this as a simple problem. We've been trying to knock it down for a long time now. However, I keep wondering why we can't just include information about both (image and image file) in the metadata by using different predicates in each case. See an example below.
Even if, or especially if, you stick to DC, you have a problem deciding what belongs in a description. If the metadata is about the file, then it is reasonable to express, e.g., that it has 1200x800 pixels and is encoded as JPEG, but perhaps not that it is a picture of a flea biting a dog. If the image is being described, the reverse might hold.
Couldn't we say the following about an image?
<rdf:RDF>
  <tdwg:Image rdf:about="urn:lsid:example.com:image:1234">
    <dc:title>Picture of my dog Scratchy</dc:title>
    <dc:subject>A picture of a flea biting my dog.</dc:subject>
    <dc:description>A description of a flea biting my dog. You get the idea, but an image is worth a thousand words...</dc:description>
    <dc:identifier>urn:lsid:example.com:image:1234</dc:identifier>
    <dc:format>image/jpeg</dc:format>
    <tdwg:imageDimensions>1200x800</tdwg:imageDimensions>
  </tdwg:Image>
</rdf:RDF>
Even though I bet the RDF isn't valid, I hope you get the point that each predicate refers to either the file or the image, but not both.
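To make the "one predicate, one target" idea concrete, here is a rough sketch of a client that reads such a record and sorts the predicates into those about the image and those about the file. The RDF and Dublin Core namespace URIs are the standard ones; the `tdwg` namespace URI is a placeholder, and the choice of which predicates count as file-level is an assumption made for the example, not something agreed in this thread.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
TDWG = "http://example.org/tdwg#"  # placeholder, not a real TDWG namespace

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:tdwg="{TDWG}">
  <tdwg:Image rdf:about="urn:lsid:example.com:image:1234">
    <dc:title>Picture of my dog Scratchy</dc:title>
    <dc:subject>A picture of a flea biting my dog.</dc:subject>
    <dc:format>image/jpeg</dc:format>
    <tdwg:imageDimensions>1200x800</tdwg:imageDimensions>
  </tdwg:Image>
</rdf:RDF>"""

# Assumed classification: these predicates describe the file, the rest the image.
FILE_PREDICATES = {f"{{{DC}}}format", f"{{{TDWG}}}imageDimensions"}


def split_predicates(xml_text):
    """Return (about_image, about_file) dictionaries keyed by local name."""
    root = ET.fromstring(xml_text)
    image = root.find(f"{{{TDWG}}}Image")
    about_image, about_file = {}, {}
    for child in image:
        local_name = child.tag.split("}", 1)[1]
        target = about_file if child.tag in FILE_PREDICATES else about_image
        target[local_name] = child.text
    return about_image, about_file


about_image, about_file = split_predicates(DOC)
print("image:", about_image)
print("file:", about_file)
```

A rendering client would read only the file-level dictionary, while a search interface would index only the image-level one.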
If some of these predicates aren't suitable, we can always use some other vocabularies (EXIF?). If you want to refer to what's in the picture, we can somehow point to our familiar biodiversity information objects: taxon name, observation, specimen, etc.
Is there a case where this can't be done?
... rendering clients probably desperately need the pixel size and also information about where to find other sizes of the "same" image.
That's a different problem. We had agreed that LSIDs can't be used if the number of representations of an image is infinite or just very large. Should we be looking at OpenURL or just Web services (and WSDL)?? But that's a little advanced for our simple discussion thread, isn't it?
So, is this a feasible solution, or is there a class of counter examples that I'm missing completely?
Cheers,
Ricardo
I thoroughly agree with Ricardo - surely this is what any normal user would expect. They would get the technical metadata indicating size, etc. AND information on the significance of the image concerned - potentially referencing ontologies (if for example the image serves as a depiction of a character state), taxon concepts (via LSID), specimen records (via LSID), etc. as well as including a text description of what the image represents.
All of this information should be supported - we need the applicability statements to document how DC and other standards should be used to make this possible.
Thanks,
Donald
and not forgetting that illustrations could be types of names, although the digital representation of the 'ink on paper' type has no status under the Code ... even though it might be more useful and more accessible ... ;-)
Paul
-----Original Message----- From: tdwg-guid-bounces@lists.tdwg.org [mailto:tdwg-guid-bounces@lists.tdwg.org] On Behalf Of Donald Hobern Sent: 07 November 2007 09:40 To: Ricardo Scachetti Pereira Cc: tdwg-guid@lists.tdwg.org Subject: Re: [tdwg-guid] Need for citation information in GUID metadata
-- ------------------------------------------------------------ Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ------------------------------------------------------------
My thought is that the LSID should apply to the image (not the file). If you want to say something about the file that is separate from the image (its size), you can either refer to it by a plain old URL-type URI, because it could be just a dynamically generated thing, or issue an LSID for the file itself, if you want to archive it and that would be useful.
The LSID should be a hook on which services (different renderings) are hung.
I actually created a class called DigitalImage
http://rs.tdwg.org/ontology/voc/DigitalImage
in the vocabulary. I don't believe it is used by anyone and it (or me) should be taken out and shot as it clearly goes against what I say above by including 'Digital'. The class should be able to represent an image for which we don't have a digital representation at all.
Roger
That is basically what Rich Pyle and I argue for at http://wiki.tdwg.org/twiki/bin/view/Image/ImageOrImageFile, derived and expanded from his posting here.
There I ask rhetorically whether you have established any semantics for the relationship between the image and the files that are asserted to represent it. If not, I am willing to have a go at proposing such. As others have observed, this is not only about media files but about anything that needs to support discovery, identity, and discourse about "abstract" things and digital representations of them. It is just that it is easy and probably uncontroversial to identify and address some common issues for currently encoded media. (But even in that case there are some interesting issues to consider. What exactly is a plastic skull of a T. rex rendered from a mold produced by a 3D printer from an X-ray CAT scan of a T. rex skull dug up from the ground? Is it like an image file? Is the mold like an image file? Is either of them like the "ink on paper" type that Paul wishes(?) were codified?)
[Ironically, in this context, by "abstract" we almost always mean "concrete" in the conversational sense, whereas the digital representations are pretty abstract to most people.]
Bob
I tend to think of the problem in terms of resources that have different quality levels. Such a model is implemented e.g. in http://www.diversityworkbench.net/Portal/wiki/ResourcesModel_v1.3. We distinguish between resources that are derived from other resources with the intent of creating new resources (e.g. adding labeling or cropping the image or sound such that only part of the previous content remains) and conversions that can in principle be automated. The latter we model as quality levels (compression levels of audio or images, different resolutions of images, etc.)
We treat the analog original (a tape recording, photo print, etc.) as one of the quality levels.
Clearly there is a gray area between quality levels and explicitly derived works. However, I believe that expressing the intention behind a step (introducing semantics for it, as the model does) is very useful. This may be what Bob has in mind.
We distinguish identical copies for files by providing a similar mechanism for providers. Identical copies can be at different providers. Backup copies are handled through backup providers. Thus the model has AbstractResource (which may be derived from another abstract resource) and ResourceInstances.
The maximum number of resource instances is the product of AbstractResources x Providers x QualityLevels (there can be fewer).
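As a rough illustration of that upper bound, here is a minimal Python sketch. The class and field names are hypothetical and are not taken from the Diversity Workbench model itself; they only mirror the three dimensions named above.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class AbstractResource:
    # Hypothetical stand-in for an abstract resource in the model.
    name: str
    derived_from: Optional["AbstractResource"] = None


@dataclass(frozen=True)
class ResourceInstance:
    # One concrete copy: a resource, at a provider, at a quality level.
    resource: AbstractResource
    provider: str
    quality_level: str


def max_instances(resources, providers, quality_levels):
    """Upper bound on instances: resources x providers x quality levels."""
    return len(resources) * len(providers) * len(quality_levels)


def all_possible_instances(resources, providers, quality_levels):
    """Enumerate every combination; a real collection may hold fewer."""
    return [
        ResourceInstance(r, p, q)
        for r, p, q in product(resources, providers, quality_levels)
    ]


# Illustrative data only.
original = AbstractResource("flea_on_dog")
cropped = AbstractResource("flea_on_dog_cropped", derived_from=original)
resources = [original, cropped]
providers = ["primary", "backup"]
quality_levels = ["analog_original", "full_resolution", "thumbnail"]

upper_bound = max_instances(resources, providers, quality_levels)
```

With two abstract resources, two providers and three quality levels, the bound is twelve instances. The cropped image is a separate abstract resource because it was derived with intent, whereas the thumbnail is only a quality level.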
Gregor
Disclaimer: I don't fully understand all of the issues involved here, as I've only been looking at the biology standards for a few months. I may be misinterpreting some of the points being made. However, I have a good understanding of related standards in the library world, so I hope my comments may be of use.
In my opinion, if you try to put too many external semantics on DC data, you're going to run into many problems in the future, when you interact with groups that have "regular" DC data. It is possible to solve these issues with explicit metadata relationships using existing metadata standards. Here is the metadata for an image object in a repository I built recently:
http://fedora.dlib.indiana.edu:8080/fedora/get/iudl:20008/METADATA
At first glance it looks long and complex, but it's relatively easy to pick apart. The outer layer of metadata follows the METS schema, which is a wrapper format for collecting together different types of metadata.
The first inner object is a MODS record. MODS holds essentially the same type of information as DC, but it allows for more detailed descriptions. This record always describes the artifact in the image, not the image itself. A big advantage of MODS is that it allows specification of the thumbnail URL that Greg was originally asking about (in the <mods:url access="preview"> element). Note: It is possible to include a DC representation as well as a MODS representation within a single METS document.
After the MODS record are MIX records containing detailed technical information about each of the image files.
Finally, the mets:fileSec and mets:structMap sections specify relationships between the metadata sections and the actual files. In this case, the hrefs are relative URLs, but they could easily be full URLs or LSIDs.
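To make the layering concrete, here is a minimal Python sketch of how a page generator might pull the preview URL and the file locations out of such a record. The inline document is a hypothetical, heavily abridged fragment (no MIX records, no structMap) and is not the record at the URL above; only the METS, MODS and XLink namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, abridged record for illustration only.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>Flea biting a dog</mods:title></mods:titleInfo>
          <mods:location>
            <mods:url access="preview">thumbs/1234.jpg</mods:url>
          </mods:location>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="F1" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="images/1234.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def preview_url(root):
    """Return the text of the first mods:url marked access="preview", if any."""
    node = root.find(".//mods:url[@access='preview']", NS)
    return node.text if node is not None else None


def file_locations(root):
    """Return the xlink:href of every mets:FLocat in the file section."""
    href = "{%s}href" % NS["xlink"]
    return [loc.get(href) for loc in root.iterfind(".//mets:FLocat", NS)]


root = ET.fromstring(SAMPLE_METS)
thumbnail = preview_url(root)
files = file_locations(root)
```

The descriptive metadata (MODS) and the file pointers (fileSec) are read independently, which is the separation being argued for here.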
Now, I'm not advocating that you dump RDF in favor of METS. My main point is that explicitly separating the different types of metadata may be useful. If you would like more information about the specifics, let me know.
--- Ryan Scherle --- Digital Data Repository Architect --- NESCent
On Nov 6, 2007, at 8:48 PM, Ricardo Scachetti Pereira wrote:
Please see my comments in line below.
Bob Morris wrote:
The problem that nobody will take a position on is this:
Is the metadata on an image file, or on an image?
I don't want to dismiss this as a simple problem. We've been trying to knock it down for a long time now. However, I keep wondering why we can't just include information about both (image and image file) in the metadata by using different predicates in each case. See an example below.
Even (or especially) if you stick to DC, you have a problem about what things are part of a description. If the metadata is about the file, then it is reasonable to express, e.g., that it has 1200x800 pixels and is encoded as JPEG, but perhaps not that it is a picture of a flea biting a dog. If the image is being described, the reverse might hold.
Couldn't we say the following about an image?
<rdf:RDF>
  <tdwg:Image rdf:about="urn:lsid:example.com:image:1234">
    <dc:title>Picture of my dog Scratchy</dc:title>
    <dc:subject>A picture of a flea biting my dog.</dc:subject>
    <dc:description>A description of a flea biting my dog. You get the idea, but an image is worth a thousand words...</dc:description>
    <dc:identifier>urn:lsid:example.com:image:1234</dc:identifier>
    <dc:format>image/jpeg</dc:format>
    <tdwg:imageDimensions>1200x800</tdwg:imageDimensions>
  </tdwg:Image>
</rdf:RDF>
Even though I bet the RDF isn't valid, I hope you get the point that each predicate refers to either the file or the image, but not both.
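The idea that each predicate targets either the file or the image can be sketched in a few lines of Python. This is only an illustration: the triples are written as plain tuples rather than parsed RDF, and the choice of which predicates count as file-level (FILE_PREDICATES) is an assumption made for the example, not an agreed TDWG vocabulary decision.

```python
IMAGE_LSID = "urn:lsid:example.com:image:1234"

# (subject, predicate, value) statements, mirroring the example record.
triples = [
    (IMAGE_LSID, "dc:title", "Picture of my dog Scratchy"),
    (IMAGE_LSID, "dc:subject", "A picture of a flea biting my dog."),
    (IMAGE_LSID, "dc:identifier", IMAGE_LSID),
    (IMAGE_LSID, "dc:format", "image/jpeg"),
    (IMAGE_LSID, "tdwg:imageDimensions", "1200x800"),
]

# Assumption for this sketch: these predicates describe the file,
# and everything else describes the image (what the picture shows).
FILE_PREDICATES = {"dc:format", "tdwg:imageDimensions"}


def split_by_target(statements):
    """Partition statements into image-level and file-level metadata."""
    about_image, about_file = {}, {}
    for _subject, predicate, value in statements:
        target = about_file if predicate in FILE_PREDICATES else about_image
        target[predicate] = value
    return about_image, about_file


about_image, about_file = split_by_target(triples)
```

A rendering client would read about_file for pixel size and encoding, while a citation generator would read about_image for the title and subject.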
If some of these predicates aren't suitable, we can always use some other vocabularies (EXIF?). If you want to refer to what's in the picture, we can somehow point to our familiar biodiversity information objects: taxon name, observation, specimen, etc.
Is there a case where this can't be done?
... rendering clients probably desperately need the pixel size and also information about where to find other sizes of the "same" image.
That's a different problem. We had agreed that LSIDs can't be used if the number of representations of an image is infinite or just very large. Should we be looking at OpenURL or just Web services (and WSDL)? But that's a little advanced for our simple discussion thread, isn't it?
So, is this a feasible solution, or is there a class of counter examples that I'm missing completely?
Cheers,
Ricardo
Ryan,
My get-out is that I don't know much about METS/MODS, but I'll try to express why we are not *just* picking them up, or any other XML-based format. I hope this doesn't come across as a flame; I am just running over old arguments that I have probably made too often. I appreciate that you didn't suggest we use METS, but I think our position needs justifying again.
We could use METS for digital objects, embed MODS for bibliographic stuff, and make up our own schemas for each of our domains (entomology, botany, molecular phylogenetics, functional ecology, you name it). We would then have integration of data at the application level but not at the semantic level. Effectively, each of our domains would have its own XML silo, and mixing stuff together would be a complete pain. The attraction of RDF is that it allows the mixing of concepts across domains, so we only define things once, at a very fine level, and can be explicit about what we "mean".
I'll see if I can illustrate this in a naive way by picking one element from the example you give:
<mods:identifier displayLabel="Acquisition number" type="local">27309</mods:identifier>
Do different displayLabel attribute values affect the meaning of the value in the element (i.e. where I put it in my database or calculation), or does the value only ever mean "mods:identifier" no matter what is in the attribute? If I put displayLabel="National Insurance Number" or displayLabel="Barcode", my application may do something different with 27309. And how do we handle multiple languages for the displayLabel?
Expanded by concatenating namespace and local name, the QName mods:identifier from the document would be
http://www.loc.gov/mods/v3identifier
which doesn't resolve. There would normally be a slash or hash on the end of the namespace so that we would get
http://www.loc.gov/mods/v3/ http://www.loc.gov/mods/v3/identifier
but neither of these resolve to anything useful either.
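The concatenation behind those URIs can be shown in a short Python sketch. The helper name expand_qname is hypothetical; the rule it applies (namespace URI followed directly by the local name) is the RDF-style expansion being discussed here, and the sketch says nothing about whether any of the resulting URIs resolve.

```python
def expand_qname(namespace_uri, local_name):
    """Expand a QName RDF-style: namespace URI + local name, no separator added."""
    return namespace_uri + local_name


MODS_NS = "http://www.loc.gov/mods/v3"

# As declared in the document: no trailing slash or hash on the namespace.
as_declared = expand_qname(MODS_NS, "identifier")

# What one would normally expect: a namespace ending in "/" or "#".
with_slash = expand_qname(MODS_NS + "/", "identifier")
with_hash = expand_qname(MODS_NS + "#", "identifier")

print(as_declared)
print(with_slash)
print(with_hash)
```

Without a trailing separator the local name is fused onto the version segment ("v3identifier"), which is why namespaces intended for this kind of expansion usually end in a slash or hash.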
All this may be in MODS documentation but only humans read documentation and then only rarely! Each time we come across a new XML standard some poor human has to go off and read all the PDFs involved before we can get started.
In a semantic-web-type world, all the elements should resolve to their definitions, and at that point we can define things like the relationship of this concept to other things, display labels in different languages, etc. There is an outside chance that a machine could do something "meaningful" with the information.
Really, all XML bought the world is the ability to parse transfer files easily. In the old days, when things were space-delimited, one would have to write a parser to get the documents into memory. Now we can use a generic parser to get them into memory. But XML does not tell us what to do once it is in memory. XML is just a serialization. All the interesting problems are in what is serialized. This is why we lean towards RDF/OWL.
I hope this enlightens without putting you off. I expect/hope Bob will have a correction somewhere in what I have said :)
All the best,
Roger
On 8 Nov 2007, at 18:27, Ryan Scherle wrote:
[...]
participants (9)
- Bob Morris
- Donald Hobern
- Greg Riccardi
- Gregor Hagedorn
- Paul Kirk
- Ricardo Scachetti Pereira
- Richard Pyle
- Roger Hyam
- Ryan Scherle