Friday, November 14, 2014

Week 11 Reading: Blast from the Past

The reading for this week took us on a grand tour of last decade's thinking about how radically changing technology influences scholarly communication, as well as a short explanation of how search engines work (hint: it's not magic and/or wizards; it just seems that way).
We'll begin with the Paepcke, Garcia-Molina, and Wesley piece, cleverly titled "Dewey Meets Turing."  They sketch a brief history of the uneasy relationship between librarians and computer scientists in developing what we now all take for granted: the digital library.  Apparently the librarians were frustrated that the computer scientists weren't as thorough in organizing the digitized content (what about preservation, after all?!), and the computer scientists saw the librarians as stodgy traditionalists who slowed down development with their endless, boring organization.  While this low-level eye-rolling was happening, the World Wide Web blew the roof off of everyone's plans.  Instead of crafting beautiful but closed digital systems for libraries, everyone quickly realized that the public nature of the Web was the future, and the future was messy.  At the time this article was written (2003), Open Access wasn't as prominent an idea as it is today, but it addresses many of the concerns the article raises.  In fact, I imagine it was concerns like these (in a high-pitched "what do we do what do we do what do we do?!" kind of mentality) that drove the growth of OA technologies and mindsets.  My favorite point from this article is that change comes very slowly in the LIS field, driven by librarians "spending years arguing over structures."  Get it together, everyone, or the train will leave without us.
Still more than a decade in the past, though more progressive in its thinking, ARL laid out an argument for digital repositories that has mostly come to fruition here in the second decade of the 21st century.  Institutional repositories are necessary for long-term digital preservation of scholarly material.  The digital migration is a healthy and empowering movement, but preservation at the institutional level is necessary for knowledge maintenance.  Moreover, building a strong institutional repository can reflect well on the institution's prestige; it's something to be proud of.  This paper presages the development of green Open Access: a digital repository at the institutional level that collects, organizes, preserves, and distributes scholarly material beyond just the articles accepted and published in journals.  Instead, it allows access to a greater body of work, such as data sets, algorithms, theses and dissertations, and other knowledge objects outside the traditional purview of peer review, organized in such a way as to enable new forms of discovery and connection in a networked environment.  The article warns against absolutely requiring scholars to self-archive their material, although this seems to be a painless and productive practice where it happens today.  "Open Access is gonna be great, you guys!" seems to be the theme of the article.
Moving on to the Hawking article about the structure of web search engines.  He describes the mechanics of web crawlers ("bots" designed to fetch and index web content. Like…all of it. Or most of it; tip of the hat to the black hats and white hats): be fast, be polite, only fetch what the queue tells you to, avoid multiple copies of the same material at different URLs, never stop, and stay strong against spam.  Indexing algorithms then make all of that content searchable, no mean feat, as the amount of information on the available Web is mind-bendingly huge.  They create cross-searchable tables (inverted indexes) keyed on searchable descriptors, and results are ranked with respect to relevance and popularity (for instance, how heavily a page is linked to).  Really slick systems that manage this at speed (through skipping, early termination, "clever assignment of document numbers", and caching) get famous, like Google's.  It's fast, simple, and flexible.
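To make the indexing idea a little more concrete, here is a minimal sketch in Python. The three-document corpus and the query are made up, and nothing below comes from Hawking's article; it just illustrates the general technique of building an inverted index, intersecting posting lists, and ranking by term frequency.

```python
from collections import defaultdict

# Toy corpus; the doc IDs stand in for URLs a crawler would have fetched.
docs = {
    1: "librarians and computer scientists build digital libraries",
    2: "web crawlers fetch pages and index web content",
    3: "search engines rank indexed pages for each query",
}

# Build the inverted index: term -> {doc_id: term frequency}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

def search(query):
    """Return docs containing every query term, ranked by summed term frequency."""
    terms = query.lower().split()
    postings = [set(index.get(t, {})) for t in terms]
    if not postings:
        return []
    matches = set.intersection(*postings)
    # Real engines fold in link popularity, anchor text, caching, and so on.
    return sorted(matches, key=lambda d: -sum(index[t][d] for t in terms))

print(search("web index"))  # -> [2]
```

The real engines differ mostly in scale and in the ranking signals they add on top; the table-of-postings idea is the same.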

The final article was about the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a protocol that is much touted in the other articles as well.  It allows for interdisciplinary, interoperable searching of diverse types of content that find themselves suddenly close together in a digital world.  Though a wide variety of organizational systems previously existed across disciplines, the exacting use of XML, Dublin Core, and other useful metadata structures lets digital scholarly content be described in a common, machine-readable way.  The OAI protocol gives different institutional repositories a way to communicate with one another to create larger collections freely accessible to anyone with an internet connection (a toy harvesting request is sketched at the end of this post).  In addition, as is so important with regard to Open Access, metadata must be in place to track the provenance of a knowledge object.  Knowledge for the people.  Right on, OAI-PMH, right on.  Of course, this article came from 2005, a simpler time.  As we approach XML and metadata schemes in this course, it seems to me that these protocols don't simplify anything so much as keep things organized until they change. Again.  Which isn't a bad thing, of course, and is in fact baseline necessary.  The tone in 2005, however, seems to be one of simplification.  Moving toward a controlled and universal vocabulary for organizing and providing Open Access is more of a pipe dream; the best we can manage so far is pointing toward a language, and then using it.  We've come a long way since 2005, but still no wizards.  Dang it.
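As a postscript, for anyone curious what harvesting actually looks like on the wire, here is a rough sketch of an OAI-PMH request in Python. The verb (ListRecords), the oai_dc metadata prefix, and the XML namespaces come from the protocol itself; the repository base URL is hypothetical, so this is a sketch rather than something you can run as-is against a specific site.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint; substitute any OAI-PMH-compliant base URL.
BASE_URL = "https://repository.example.edu/oai"

# Ask the repository for records expressed in simple Dublin Core (oai_dc).
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Pull the Dublin Core title and creator out of each harvested record.
for record in tree.findall(".//oai:record", NS):
    title = record.find(".//dc:title", NS)
    creator = record.find(".//dc:creator", NS)
    print(title.text if title is not None else "(no title)", "|",
          creator.text if creator is not None else "(no creator)")
```

Real harvests also have to follow resumptionToken elements to page through large result sets, but the request/response shape is just this: a verb, a metadata prefix, and predictable XML coming back.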
