lib-ir Archive
Date: Mon Aug 18 15:16:19 2003
lib-ir: some IR notes
We obtained permission from Economics to post copies of their RePEc
items on our repository. Andrew and Carol have been working hard
getting the materials transferred. See community "Department of
Economics", collection "Economics Working Papers."
Jennifer Freyd agreed to post (or have me post) copies of a set of her
reprints and working papers that she maintains on her website. See
collection "Psychology Working Papers."
In posting the Freyd materials, I've learned quite a bit about what
works and what doesn't in DSpace. One overall observation is that
there's a conceptual problem connected with archives -- the classic one
of not maintaining the context of the original. It's worse in DSpace
than in a museum, though. In a museum it may be important to know that
the painting you see was designed to be displayed on a particular wall,
but at least that can be stated in the notes. For an article posted on
DSpace, much of the intellectual content may be in the form of
hyperlinks to related articles, but relative hyperlinks break when an
item is moved between servers, because they are resolved against the
item's new location. For HTML documents, much of the intellectual
content may also be in related images, referenced from the main document
using relative URLs (e.g. <img src="fig1.gif">).
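As a rough way to see how exposed a given HTML file is to this problem before posting it, one could list the references in it that have no scheme and no host, since those are the ones that depend on the original server location. The sketch below uses only the Python standard library; the names RelativeRefFinder and find_relative_refs are made up for illustration, and it only looks at href and src attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RelativeRefFinder(HTMLParser):
    """Collect href/src values that carry no scheme and no host."""

    def __init__(self):
        super().__init__()
        self.relative_refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            # Fragment-only links (#top) stay valid inside the same file.
            if value.startswith("#"):
                continue
            parts = urlparse(value)
            if not parts.scheme and not parts.netloc:
                self.relative_refs.append(value)


def find_relative_refs(html_text):
    """Return the relative references found in html_text, in document order."""
    finder = RelativeRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.relative_refs


sample = (
    '<a href="http://example.org/a">absolute</a>'
    '<img src="fig1.gif">'
    '<a href="ch2.html">next chapter</a>'
)
print(find_relative_refs(sample))  # ['fig1.gif', 'ch2.html']
```

Anything this reports is a link or image that will need attention once the file is moved.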
Concrete example: if I post an HTML document on ir.uoregon.edu, the
process will probably result in a document where hyperlinks to other
items don't work and where images don't display. There are workarounds,
but they are ugly. The problem is a bit less acute for PDF, but it can
still arise there, since PDF files can contain hyperlinks (they just
aren't very common). The problem is most
acute with large compound documents, for instance a set of HTML files
representing chapters of a thesis or the slides in a lecture, or a big
multimedia project that may consist of thousands of separate
interrelated files.
My temporary solution: for PDFs, ignore the problem. For large
compound documents, create a .zip file and archive that. For simple HTML
documents, open the main .html file in your web browser and use "Save
As...Web Page Complete". This creates a file on your hard disk, say
mydoc.htm, plus a directory containing support files, say mydoc_files.
It also makes many changes to mydoc.htm, so the saved file is
internally quite different from the original on the web server; one
of these changes, normally bad but desirable in this context, is to
rewrite most relative links as absolute. Now create a new item in
DSpace and attach several bitstreams to it: mydoc.htm (labelled "main
document") and all of the files in the mydoc_files directory. Images in
the main document still won't display correctly, but at least all of the
content will be there.
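For the .zip approach to large compound documents, the key point is to store each file under its path relative to the document's top directory, so that relative links between the files still resolve after someone downloads and unpacks the archive. Here is a minimal sketch using Python's standard zipfile module; the function name zip_compound_document and the "thesis" paths in the usage line are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_compound_document(src_dir, zip_path):
    """Zip every file under src_dir, keeping paths relative to src_dir.

    Returns the list of names stored in the archive.
    """
    src = Path(src_dir)
    target = Path(zip_path).resolve()
    # Gather the file list before creating the archive, and skip the
    # archive itself in case it is written inside src_dir.
    files = [
        p for p in sorted(src.rglob("*"))
        if p.is_file() and p.resolve() != target
    ]
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            arcname = path.relative_to(src).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


# Hypothetical usage:
# zip_compound_document("thesis", "thesis.zip")
```

The resulting single .zip file can then be attached to the item as one bitstream.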
JQ Johnson                      Office: 115F Knight Library
Academic Education Coordinator  e-mail: jqj@darkwing.uoregon.edu
1299 University of Oregon       1-541-346-1746 (v); -3485 (fax)
Eugene, OR 97403-1299           http://darkwing.uoregon.edu/~jqj