lib-ir Archive
Date: Fri Jun 17 06:00:32 2005

Re: lib-ir: Bitstream vs content preservation



I think we agree essentially, perhaps using different terminology
to express the same concepts.

Carol

At 04:08 PM 6/15/2005, you wrote:
I agree with Carol that preserving intellectual access is a vital goal.  But
I'm a pragmatist, and don't think we know how to preserve meaning, so we
should be putting most of our limited resources into preserving the
bitstreams and metadata.  If we're doing a good job of preserving the
bitstreams -- and the technical metadata needed to understand what they are
-- then our successors will always have a better chance of taking the steps
needed to make the works more accessible to their patrons.  And if we're
doing a good job of guiding the original content creator with best practices
then we'll minimize the amount of effort librarians need to put into
preservation a generation from now.

The analog is traditional library materials in natural languages.  Many
libraries have collections of works in classical Greek and Latin, and don't
feel that they must necessarily translate these works into the vernacular.
They realize that the translation will always be imperfect, and so it is
important to retain a copy of the original.  Indeed, they realize that the
original artifact is itself important to some scholars who won't be
satisfied with even a visually identical copy.

For some sorts of digital work it's clear that we can potentially do
translations (e.g. format conversions) that preserve almost all of the
content that future scholars are going to care about.  If I submit a Word
document that contains a preprint I've written, what I really care about is
the text and some of the surface formatting.  I probably don't even care
about most of the formatting -- choice of fonts, line and page breaks, etc.
I definitely don't think my readers need to see the Word document properties
[though someone trying to track a plagiarism claim might find it interesting
that my Word document had an "Author" property set to "John Smith", and my
biographer might be interested in the fact that I had done my editing on a
Mac].  So if we know how to do it, I have no objection in such cases to
doing forward migration or emulation, perhaps to the new MS Word XML
document format.  I do think that in those cases it's desirable/essential
also to retain the original bitstream as submitted, and to treat any
migrated copy as a surrogate.
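
As a rough illustration of the "retain the original, treat the migrated copy
as a surrogate" idea, here is a minimal Python sketch.  Everything in it (the
Bitstream record, the ingest/add_surrogate/verify helpers, the field names)
is hypothetical and not taken from any particular repository system; it only
assumes that fixity is tracked with a SHA-256 checksum and that a surrogate
points back at the checksum of the bitstream it was derived from.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bitstream:
    """Hypothetical record for one stored bitstream plus minimal technical metadata."""
    name: str
    data: bytes
    format_hint: str             # e.g. a MIME type supplied at submission
    sha256: str                  # fixity value computed at ingest
    role: str                    # "original" or "surrogate"
    derived_from: Optional[str]  # checksum of the original, for surrogates only


def ingest(name: str, data: bytes, format_hint: str) -> Bitstream:
    """Store the submitted bytes untouched and record their checksum."""
    digest = hashlib.sha256(data).hexdigest()
    return Bitstream(name, data, format_hint, digest, "original", None)


def add_surrogate(original: Bitstream, name: str, data: bytes,
                  format_hint: str) -> Bitstream:
    """Record a migrated copy that points back at the original it came from."""
    digest = hashlib.sha256(data).hexdigest()
    return Bitstream(name, data, format_hint, digest, "surrogate", original.sha256)


def verify(bitstream: Bitstream) -> bool:
    """Re-compute the checksum to confirm the stored bytes are unchanged."""
    return hashlib.sha256(bitstream.data).hexdigest() == bitstream.sha256


# Usage: the submitted file is kept as-is; the migrated copy is only a surrogate.
original = ingest("preprint.doc", b"original Word bytes", "application/msword")
surrogate = add_surrogate(original, "preprint.xml", b"<doc>migrated text</doc>",
                          "application/xml")
```

The point of the design is that the migration never replaces anything: the
original bitstream and its checksum stay on record, and the surrogate can be
discarded and regenerated if a better conversion comes along later.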

The problem is much more acute for multimedia and dynamic works, where the
medium really is often the message.  In such cases I really don't think we
know which attributes of a work need to be conserved.

As a practical matter, I think we have a chance to do "supported" for plain
text and a moderately large subset of PDF (something comparable to PDF 1.2).
Maybe large subsets of some 2D image formats such as JPEG and GIF.  Not
HTML.  It's too early to tell for most XML formats.  Certainly not the many
specialized formats that we may someday get and that I think we want to
encourage our faculty to contribute.  And certainly the documentation in
"supporting" a format needs to be substantial and procedural, not just a
"yes" in a table of supported formats.  I think MIT is being unrealistic in
claiming to support without qualification the wide range of formats they do,
and that they may be doing a disservice to their authors by claiming more
than they can deliver.  I'd rather deliver more than we claim.

Anyway, I don't think Carol and I really disagree much, except maybe in
emphasis.