Since the new version of DSpace (up in IRtest thanks to Corey's work) includes a subject browse and some form of authority control, this study seems particularly timely.
Up to this point, we haven't had any easy way of checking the keywords (mapped to Dublin Core "subject") that are assigned to items in Scholars' Bank. And, since the archive was originally modelled on author self-submission, I haven't worried about it. Because textual documents are also full-text searchable (in most cases, when the Media Filter software works as expected), I have increasingly opted not to supply keywords to items in the IR. And, when we have supplied keywords, the source vocabularies have varied, when there was a source vocabulary at all.
Read the following and see what you think. And if you want to take a look at the Subject Browse in test, go to: https://irtest.uoregon.edu/dspace/
Carol
Date: Thu, 9 Mar 2006 00:37:44 +0000
From: Leslie Carr <lac@ECS.SOTON.AC.UK>
Subject: Use of Navigational Tools in a Repository
To: JISC-REPOSITORIES@JISCMAIL.AC.UK
A recent discussion between some colleagues on the utility (or otherwise) of subject classification in repositories prompted me to undertake a brief investigation whose results I present here. (I'll also send this to AMSCI, so apologies for any duplicate copies that you see.) The discussion has broadly been between computer scientists and librarians over whether subject classification schemes offer advantages over Google-style text retrieval; the study below looks at the evidence as demonstrated in the usage of one particular repository. As such it doesn't address the intrinsic value of classification, but it does offer some insight into the effectiveness of navigational tools (including subject classification) in the context of a repository.
----------------

The University of Southampton Institutional Repository has been in operation for a number of years and has been an official (rather than experimental or pilot) part of the University's infrastructure for just over a year. Its capabilities include a list of the most recently deposited material, various kinds of searches, a subject tree based on the upper levels of the Library of Congress Classification scheme, an organisational tree listing the various Faculties, Schools and Research Groups in the University, and a list of articles broken down by year of publication. These all provide what we hope are useful facilities for helping researchers find papers (ie by time, subject, affiliation or content).
Over a period of some 29.5 hours from 0400 GMT on March 7th 2006, 1978 "abstract" pages (ie eprints records) were downloaded from the repository (ignoring all crawlers, bots and spiders).
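The message does not say how crawlers, bots and spiders were excluded. A minimal sketch of one common approach, assuming each log record has already been parsed into a (User-Agent, path) pair, is to drop any request whose User-Agent contains one of a short, hypothetical list of crawler keywords:

```python
import re

# Hypothetical keyword list: real crawler detection needs a longer,
# regularly maintained list (and cannot catch bots that pose as browsers).
CRAWLER_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a crawler keyword."""
    return bool(CRAWLER_PATTERN.search(user_agent))


def human_requests(records):
    """Keep only (user_agent, path) records that do not look automated."""
    return [(ua, path) for ua, path in records if not is_crawler(ua)]


if __name__ == "__main__":
    sample = [
        ("Mozilla/5.0 (Windows; U; Windows NT 5.1)", "/18364/"),
        ("Googlebot/2.1 (+http://www.google.com/bot.html)", "/18364/"),
    ]
    print(human_requests(sample))  # only the browser request survives
```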
Of the 1978 downloaded pages, the following URL sources (referrers, in web log speak) were responsible:

 439  -  (direct URL, perhaps cut and pasted into a browser or clicked on from an email client)
 225  EPRINTS SOTON pages
  25  OTHER SOTON WEB pages
1264  EXTERNAL SEARCH ENGINES
  21  EXTERNAL WEB PAGES
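The message does not describe how each referrer was assigned to a category. A rough sketch, assuming the repository host is eprints.soton.ac.uk and using a small, hypothetical list of search-engine host fragments, might classify the raw Referer field like this:

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed host names and search-engine hints; the message does not say
# how referrers were actually matched to these categories.
REPOSITORY_HOST = "eprints.soton.ac.uk"
INSTITUTION_DOMAIN = "soton.ac.uk"
SEARCH_ENGINE_HINTS = ("google.", "yahoo.", "msn.")


def classify_referrer(referrer: str) -> str:
    """Map a raw Referer log field to one of the five categories listed above."""
    if referrer in ("", "-"):
        # No referrer: typed or pasted URL, or a link clicked in an email client.
        return "DIRECT"
    host = (urlparse(referrer).hostname or "").lower()
    if host == REPOSITORY_HOST:
        return "EPRINTS SOTON"
    if host == INSTITUTION_DOMAIN or host.endswith("." + INSTITUTION_DOMAIN):
        return "OTHER SOTON WEB"
    if any(hint in host for hint in SEARCH_ENGINE_HINTS):
        return "EXTERNAL SEARCH ENGINE"
    return "EXTERNAL WEB PAGE"


if __name__ == "__main__":
    referrers = ["-", "http://www.google.co.uk/search?q=eprints", "http://eprints.soton.ac.uk/"]
    print(Counter(classify_referrer(r) for r in referrers))
```

The order of the checks matters: the repository host is tested before the wider institutional domain, so links from the repository's own pages are not counted as "OTHER SOTON WEB".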
ie the local repository facilities, including subject views and searches, led to only 225/1978 = 11% of all downloads.
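The percentages quoted here are simple proportions of the 1978 downloads. As a quick check, a small sketch can compute each referrer category's share from the counts reported above:

```python
# Referrer counts for non-crawler abstract-page downloads, as reported above.
REFERRER_COUNTS = {
    "direct URL (no referrer)": 439,
    "EPRINTS SOTON pages": 225,
    "OTHER SOTON WEB pages": 25,
    "EXTERNAL SEARCH ENGINES": 1264,
    "EXTERNAL WEB PAGES": 21,
}
# Stated total; the five listed categories themselves sum to 1974.
TOTAL_DOWNLOADS = 1978


def share(count: int, total: int = TOTAL_DOWNLOADS) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for source, count in REFERRER_COUNTS.items():
        print(f"{source:26s} {count:5d} {share(count):5.1f}%")
```

Run against these figures, the repository's own pages come out at about 11.4% and external search engines at about 63.9%, which round to the 11% and 64% quoted in the text.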
From that we can tell that the repository navigation and search facilities affect little of the ultimate repository usage. (This may be a depressing message for a repository administrator such as myself, because it highlights how little control I have over my repository's users either to help or manipulate them!)
Of the 225 local repository links, the following breakdown applies:

 13  Latest Deposits page
103  Searches (both simple and advanced)
 57  Browse by Schools and Groups Hierarchy
 17  Browse by Subjects Hierarchy
  0  Browse by Year of Publication
 33  Directly linked from other abstracts (or reloads)
 12  Misc infrastructure
ie 11% of the downloaded records are accounted for by use of the local repository. 8% of that usage is caused by the subjects tree (ie 0.86% of all eprint downloads are caused by the subject tree). For what it's worth, a breakdown of papers by school and research group is three times more popular than the subjects list, but it is still only involved in 3% of the downloads. Local search accounts for 5%, but it still isn't very significant! The result is even more gloomy for the breakdown by "Year of Publication", which didn't lead to any eprint downloads whatsoever!
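The "0.86%" figure follows from chaining two proportions: the subject tree's share of local links multiplied by the local links' share of all downloads, which simplifies to 17/1978. A small sketch of that calculation, using only the counts given above:

```python
def overall_share(count_in_subset: int, subset_total: int, grand_total: int) -> float:
    """Share of all downloads, as (share within the subset) x (subset's share of all)."""
    within_subset = count_in_subset / subset_total
    subset_of_all = subset_total / grand_total
    return within_subset * subset_of_all


LOCAL_LINKS = 225    # downloads reached via the repository's own pages
ALL_DOWNLOADS = 1978

if __name__ == "__main__":
    for label, count in [("Subjects tree", 17), ("Schools and Groups", 57), ("Searches", 103)]:
        within = 100 * count / LOCAL_LINKS
        overall = 100 * overall_share(count, LOCAL_LINKS, ALL_DOWNLOADS)
        print(f"{label:18s} {within:5.1f}% of local links, {overall:5.2f}% of all downloads")
```

With these counts the subject tree accounts for about 7.6% of local links and 0.86% of all downloads, and the schools and groups tree is used 57/17, or roughly 3.4 times, as often.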
The majority of repository use, if I can equate eprint downloads with repository use, is due to external web search engines (64%).
This may be because, of the 1978 downloads, only 131 (or 7%) came from Southampton University IP addresses. In other words, the behaviour of external users dominates the repository usage.
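Separating local from external traffic amounts to checking each client address against the institution's network ranges. A minimal sketch using Python's standard ipaddress module; the network below is a reserved documentation range used as a placeholder, since the University's actual address blocks are not given in the message:

```python
import ipaddress

# Placeholder: 192.0.2.0/24 is a reserved documentation range (RFC 5737)
# standing in for the University's real address blocks, which are not given here.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_local(ip: str) -> bool:
    """Return True if the client address falls inside any campus network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


def local_fraction(ips) -> float:
    """Fraction of requests whose client address is on campus."""
    ips = list(ips)
    if not ips:
        return 0.0
    return sum(is_local(ip) for ip in ips) / len(ips)


if __name__ == "__main__":
    print(local_fraction(["192.0.2.10", "198.51.100.7", "203.0.113.5", "192.0.2.99"]))  # 0.5
```

Applied to the reported figures, the equivalent ratio is 131/1978, or about 6.6%.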
If you look only at the local users from the above data (the downloads that came from Southampton IP addresses), the breakdown is as follows:

39  (direct URL, perhaps cut and pasted into a browser or clicked on from an email client)
 1  Directly linked from other abstracts (or reloads)
10  Latest Deposits page
71  Local Repository Searches
 1  Browse by Schools and Groups Hierarchy
10  External Search Engines
These numbers are small, and a longer sampling period is needed before we can be confident, but it appears that for local users the repository's own search (71 downloads) is much more popular than external search engines (10). The browse options by year, subject and school are all largely ignored.
Taking a different approach and looking at all of the page requests for the repository that came from University of Southampton users (not just eprint downloads, but also the home page, all search requests and browsing pages, while ignoring icons, stylesheets and javascript), in the same period there were 1025 requests coming from 52 uniquely identifiable users:

 72  Home Page
 52  Latest Deposits
122  Search
  2  List of Browse Choices
 25  Browse by Group
  6  Browse by Subjects
  2  Browse by Year
132  Download Eprint Records (abstracts page)
 26  Download EPrints Files (full texts)
544  User Login, Deposit and Admin
 14  OAI-PMH
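A tally like this can be produced by bucketing each request path by prefix, skipping static assets, and counting distinct users. The sketch below is only illustrative: the URL prefixes are hypothetical stand-ins, since the message does not describe the repository's URL layout, and requests are assumed to arrive as (user, path) pairs:

```python
from collections import Counter

# Hypothetical URL prefixes; more specific prefixes must come before "/view/".
CATEGORY_PREFIXES = [
    ("/view/subjects/", "Browse by Subjects"),
    ("/view/groups/", "Browse by Group"),
    ("/view/year/", "Browse by Year"),
    ("/view/", "List of Browse Choices"),
    ("/perl/search", "Search"),
    ("/perl/latest", "Latest Deposits"),
    ("/perl/users/", "User Login, Deposit and Admin"),
    ("/perl/oai2", "OAI-PMH"),
]
# Static assets that the study ignored (icons, stylesheets, javascript).
IGNORED_SUFFIXES = (".png", ".gif", ".jpg", ".ico", ".css", ".js")


def categorise(path: str):
    """Return a category label for a request path, or None if it is ignored."""
    if path.endswith(IGNORED_SUFFIXES):
        return None
    if path == "/":
        return "Home Page"
    for prefix, label in CATEGORY_PREFIXES:
        if path.startswith(prefix):
            return label
    return "Other"


def tally(requests):
    """Count (user, path) requests per category and the number of distinct users."""
    counts = Counter()
    users = set()
    for user, path in requests:
        label = categorise(path)
        if label is None:
            continue
        counts[label] += 1
        users.add(user)
    return counts, len(users)


if __name__ == "__main__":
    sample = [("u1", "/"), ("u1", "/view/subjects/QA.html"),
              ("u2", "/style/auto.css"), ("u2", "/perl/search?q=dspace")]
    print(tally(sample))
```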
Once again we can see that local search overwhelms the use of local browse categories (whether by subject, group or year).
Conclusions
===========

External users dominate repository usage. External search engines (including OAI search engines) are the primary mechanism for finding papers. Local users show a somewhat greater tendency to use local search facilities. Neither external nor local users appear much influenced by subject listings or other browse categories.
This study seems fairly conclusive, but its results may not be typical. Further study is being undertaken to compare these results with other types of repository and to determine the repository features (if indeed there are any) that can best help readers in the task of finding relevant material (resource discovery).

---
Les Carr