Friday, November 21, 2014

Week 11 Muddiest Point

Okay, two things, both reading-related.

I'm living on the wild side here, so things may go horribly wrong.  That is, more grades were posted on BlackBoard recently, and it appears that I've fulfilled 20/20 reading blog points.  Is that just a minimum, or is that the completed portion of the course?  While I will certainly continue to do the reading, is it all right not to post a blog about it?  This is crunch time, and I would greatly value the extra time to put toward other projects.  That's why, though I completed the readings, I didn't post about the Web 2.0 content (wild side!).

That said, I would like to make absolutely sure about the reading for the remainder of the course.  I've completed the web 2.0 reading (for Nov 25).  Am I reading BlackBoard correctly that there will not be reading for our December 2 meeting, the one after the Thanksgiving break?  And then the final set of reading will concern Web Security and the Cloud, correct?

I think I could have stated that a little more clearly, but I hope you get the point.  If not, don't hesitate to contact me.  Thank you so much!

Friday, November 14, 2014

Week 10 Muddiest Point

To be perfectly honest, I would just like to thank Dr. Oh for reiterating the CSS slides with examples.  It was incredibly helpful, and gave me more confidence to tackle my own style sheet, which had been haunting me for the past few weeks.

If last week's Muddiest Point was "Everything! AAaaaHhHh!", then this week there isn't one because the re-do of the lecture was so helpful.  Thanks for all the examples, which were illuminating in a way that a simple explanation of the principles can't possibly be!

Thanks again!

Week 11 Reading: Blast from the Past

The reading for this week took us on a grand tour of last decade’s thinking about how radically changing technology influences scholarly education, as well as a short explanation of how search engines work (hint: it’s not magic and/or wizards, it just seems that way).
We’ll begin with the Paepcke, Garcia-Molina, and Wesley piece, cleverly titled Dewey Meets Turing.  They sketch a brief history of the uneasy relationship between librarians and computer scientists in developing what we all now take for granted: the digital library.  Apparently the librarians were frustrated that the computer scientists weren't as thorough in organizing the digitized content (what about preservation, after all?!), and the computer scientists saw the librarians as stodgy traditionalists who slowed down development with their endless, boring organization.  While this low-level eye-rolling was happening, the World Wide Web blew the roof off of everyone’s plans.  Instead of crafting beautiful but closed digital systems for libraries, everyone quickly realized that the public nature of the Web was the future, and the future was messy.  At the time this article was written (2003), Open Access wasn't as prominent an idea as it is today, and it addresses many of the concerns this article raises.  In fact, I imagine it was concerns like these (in a kind of high-pitched “what do we do what do we do what do we do?!” mentality) that drove the growth of OA technologies and mindsets.  My favorite point from this article is that change comes very slowly in the LIS field, driven by librarians “spending years arguing over structures”.  Get it together, everyone, or the train will leave without us.
Still more than a decade in the past, though more progressive in their thinking, ARL laid out an argument for digital repositories that has come mostly to fruition here in the second decade of the 21st century.   Institutional repositories are necessary for long-term digital preservation of scholarly material.  The digital migration is a healthy and empowering movement, but preservation at the institutional level is necessary for knowledge maintenance.  Moreover, building a strong institutional repository can reflect strongly on the institution’s prestige; it’s something to be proud of.  This paper presages the development of green Open Access: a digital repository at the institutional level that collects, organizes, preserves, and distributes scholarly material beyond just the articles accepted and published in journals.  Instead, it allows access to a greater body of work, such as data sets, algorithms, theses and dissertations, and other knowledge objects outside the traditional purview of peer review, organized in such a way as to enable new forms of discovery and connection in a networked environment.  The article warns against absolutely requiring scholars to self-archive their material, although this seems to be a painless and productive practice where it happens today.  “Open Access is gonna be great, you guys!” seems to be the theme of the article.
Moving on to the Hawking article about the structure of web search engines.  He describes the mechanics of web crawlers (“bots” designed to index web content.  Like…all of it.  Or most of it; tip of the hat to the black hats and white hats): be fast, be polite, only look at what the queue tells you to, avoid multiple copies of the same material at different URLs, never stop, and stay strong against spam.  Indexing algorithms then make all of this content searchable, no mean feat, as the amount of information on the available Web is mind-bendingly huge.  They build inverted indexes, tables that map each searchable term to the documents containing it, and then rank the results (roughly, by how popular a page is).  Really slick algorithms that seem to infer meaning (done through skipping, early termination, “clever assignment of document numbers”, and caching) get famous, like Google’s.  It’s fast, simple, and flexible.
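Since the indexing part finally clicked for me, here is a toy version of the inverted-index idea in Python; the "documents" are ones I made up, and a real engine obviously does vastly more, but the basic mechanics are the same:

```python
# A toy version of the inverted-index idea from the Hawking article: map each
# term to the documents that contain it, then intersect the posting lists to
# answer a query.  The "documents" below are invented for illustration.
from collections import defaultdict

docs = {
    1: "dewey meets turing librarians and computer scientists",
    2: "computer scientists build web search engines",
    3: "librarians spend years arguing over structures",
}

index = defaultdict(set)              # term -> set of document ids (a posting list)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return the ids of documents containing every term in the query."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("computer scientists"))    # {1, 2}
print(search("librarians structures"))  # {3}
```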

The final article was about the Open Archives Initiative Protocol for Metadata Harvesting, a protocol that is much touted in the other articles, as well.  It allows for interdisciplinary, interoperable searching of diverse types of content that find themselves suddenly close together in a digital world.  Though a wide variety of organizational systems already existed across disciplines, the exacting use of XML, Dublin Core, and other useful metadata structures lets digital scholarly content be described in a way that other systems can understand.  The OAI protocol gives different institutional repositories a way to communicate with one another to create larger collections freely accessible to anyone with an internet connection.  In addition, as is so important with regards to Open Access, metadata must be in place to track the provenance of a knowledge object.    Knowledge for the people.  Right on, OAI Protocol for Metadata Harvesting, right on.  Of course, this article came from 2005, a simpler time.  As we approach XML and metadata schemes in this course, it seems to me that these protocols don’t simplify anything; instead they manage to keep things organized until they change.  Again.  Which isn't a bad thing, of course, and is in fact baseline necessary.  The tone in 2005, however, seems to be one of simplification.  Moving toward a controlled and universal vocabulary for organizing and providing Open Access is more of a pipe dream; the best we can manage so far is pointing toward a language, and then using it.  We've come a long way since 2005, but still no wizards.  Dang it.
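And just to make the protocol less abstract for myself, here is roughly what a harvest request looks like in Python. The repository URL is a made-up placeholder, but the verb and metadataPrefix parameters are the ones OAI-PMH actually defines:

```python
# A rough sketch of what an OAI-PMH harvest request looks like on the wire.
# The repository URL is a made-up placeholder; the verb and metadataPrefix
# query parameters are the ones the protocol actually defines.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://example.org/oai"    # hypothetical repository endpoint

params = {
    "verb": "ListRecords",              # other verbs: Identify, ListIdentifiers, GetRecord...
    "metadataPrefix": "oai_dc",         # ask for records expressed in simple Dublin Core
}
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)

# Uncomment to actually harvest; the response is an XML document of records.
# with urlopen(request_url) as response:
#     print(response.read()[:500])
```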

Friday, November 7, 2014

Week 9 Muddiest Point

Greetings!

Now that my head has stopped spinning from the barrage of information from the CSS lecture (rereading the slides is helpful, but sometimes I feel like we rush through things too quickly in class, and I don't retain any of it.  Relying on the slides makes me a bit uncomfortable), I was wondering more about creating a navigation bar, as is hinted at in the description for Assignment 5.  Is that where creating universal attributes controlled by "#" (id selectors) and "." (class selectors) comes in?  Because you can slip the selector into the HTML only where you need to?  Or am I way off?

Also, in mentioning the selectors, what has been most useful for me in understanding these concepts hasn't been the slides that give, in loving and painstaking detail, the descriptions of the elements.  Those are necessary, for sure, but I feel like we could benefit from more examples.  I couldn't understand how universal attributes could be useful until I saw an example, which I feel like we rushed through.

Thanks!  See you next week!

Week 10 Reading: What the What?

I have to be perfectly honest here and say that all this XML is quite confusing to me, and this whole thing is going to read like one giant Muddy Point.

XML, as opposed to HTML, is less about defining the individual structure of a document and more about a document's ability to connect to others.  In addition, XML does not rely on a standardized set of terms.  This leads to increased flexibility in determining how different parts of a document relate to other parts, and it is therefore an inherently more explicit, dynamic set of semantics.  As opposed to HTML, there are no predefined tags, but the language used can refer back to a namespace that serves as a reference point.  Using a namespace, as opposed to fixed tags, allows for greater interoperability between readable documents.  This interoperability makes for an easier time when it comes to exchanging information across formats (maybe? Unsure).

An XML document is composed of entities with elements that have attributes.  This concept is familiar.  How they are created and manipulated is a little more confusing.

In the introductory statement of a piece of XML (the infuriatingly misspelled prolog), you can introduce the type of "grammar" you are going to use; you make up the tags out of your own, reasonably rational imagination!  Having defined the grammar, you can fill in the syntax with elements (like BIB or BOOK), and refine those elements with attributes (BOOK gets attributes like AUTHOR and TITLE).  This involves creating a document type definition (DTD).  There are very many rules about how to go about organizing the document, most of which boggled my mind.  The DTD, as an ancillary, external document, reminds me a little of how CSS relates to HTML, but, again, I'm probably way off on that because they serve different purposes.  The downside of a DTD is that you have to write it yourself.  Then again, maybe that's not a downside, as it provides firm control over the specific document you're creating.  However, because XML is based on exchange and connection, a tag you've created may mean something within your particular DTD, but may mean something else to the entity that's reading the code.  Enter the namespace, which essentially defines the vocabulary your XML grammar will be working with, so the computer on the receiving end of the document can use the namespace as a reference point, or dictionary.
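To convince myself that the made-up-tags part isn't magic, here is a little Python sketch that builds a BIB/BOOK document along the lines of the reading's example (the names and values are just illustrations):

```python
# Making up my own tags: build a tiny BIB/BOOK document along the lines of the
# reading's example (the element names, attributes, and values here are just
# illustrations) and print it with an XML declaration.
import xml.etree.ElementTree as ET

bib = ET.Element("BIB")
book = ET.SubElement(bib, "BOOK", AUTHOR="Doe, J.", TITLE="An Example Book")
book.text = "Whatever content my made-up grammar allows goes here."

ET.indent(bib)   # pretty-printing helper (Python 3.9+)
print(ET.tostring(bib, encoding="unicode", xml_declaration=True))
```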

The rules for linking (XLink, XPointer, and XPath) are tightly wound and involve what seem to me to be sub-languages.  You assign an address to the location of each object you want linked together using the XLink namespace, and that's where I get completely, utterly lost.  Where does the XPointer point?  As far as I can tell, XPointer identifies a specific spot inside a target document, and it leans on XPath expressions to walk the document's tree to get there.  Even the helpful, Explain-Like-I'm-Five W3C tutorials couldn't get me to fully understand it.
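For my own sanity, here is the closest I could get to watching the pointing happen: Python's ElementTree understands a small subset of XPath, and this sketch (with an invented BIB document) pulls out one title by walking a path:

```python
# ElementTree understands a small subset of XPath, which is enough to see what
# "pointing" into a document means.  The BIB snippet is invented.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<BIB>
  <BOOK AUTHOR="Austen"><TITLE>Persuasion</TITLE></BOOK>
  <BOOK AUTHOR="Shelley"><TITLE>Frankenstein</TITLE></BOOK>
</BIB>
""")

# Full XPath would write //BOOK[@AUTHOR='Shelley']/TITLE; ElementTree's
# mini-dialect is close enough to get the idea across.
title = doc.find(".//BOOK[@AUTHOR='Shelley']/TITLE")
print(title.text)    # Frankenstein
```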

What I do understand, however, is that the flexibility and hierarchical structure of XML documents are good for storage in databases.  This makes them pretty important.  I very much look forward to next week's lecture for a little clarity.



Friday, October 31, 2014

Muddiest Point: Week 8 (10/31/14)

So, this week's lecture was a lot, I feel like.  Lots of slides, a lot of concepts, and a lot of fine-toothed work in Lab.  I wish we would have had a better explanation of and more experience with FileZilla.

I am also struggling with the concept of absolute vs relative linking.  Is that why, in FileZilla's remote window, inside the public folder, there is an entry titled ".."?  So it makes it easier to build linkable web pages for the My 2600 project?  I would like a little more clarification on it.

As always, thanks!

Week 9 Reading: Get Hexadecimal

Mary is trying to understand!

As with the previous week's "readings", I had a lot of fun playing around with W3C's tutorials and extremely helpful examples. A cascading style sheet is a more efficient way of formatting the visual elements in an HTML document than using HTML itself to dictate the style. The value of the cascading style sheet is that the properties you manipulate are "inherited" throughout the document, so making a single change to the CSS changes everything controlled by the element you changed. That saves a lot of time if, for instance, the company you're working for rebrands itself and wants blue to be the dominant color instead of, say, green. The designer only has to make the necessary adjustments on the style sheet, and the adjustment cascades throughout the elements controlled by the original code. There are pre-formatted style sheets out there, but I understand that it's important to know how to write CSS (and HTML, of course) by hand. It gives me more power over the design of the web-based elements I will (hopefully!) be creating as I move back into the workforce.

A piece of CSS that describes a change in style (a "rule") can be split into several different parts. The selector defines which element of a document will be modified; this can be a header, a paragraph, etc. The declaration does just that: it declares what qualities to display in relation to the selector. The declaration has two parts, a property and a value, which point specifically to discrete variables that can affect the appearance of the selector. In other words, if I want to change the appearance of my first header (the selector), my declaration might be color: navy, where color is the property and navy is the value. These rules are governed by discrete semantics, which makes it seem pretty simple.

But it's not!

Formatting a coherent CSS requires a lot of abstract design before writing the style sheet. You (well, I) have to hold a design in your (my) mind the whole time we're dealing with brackets and curly brackets and hex codes and alignment buffers. I can start to understand what people mean when they call a piece of code "beautiful".

Moving on, though.

The brilliant (but conceptually difficult) thing about CSS is that you can design the entire document in one go. That is, using the specified language provided by, say, W3C's cheat sheets, you can create an entire, consistent thematic design for a website without having to go in and manually change each and every element by hand. If you want your first three headings to be bright pink and in 36 pt Comic Sans, you can specify that in the CSS by simply listing h1, h2, and h3 as the selectors in your rule. No one but no one likes Comic Sans, though, so I don't advise it. Alternatively, you can control the design elements of individual headers as their own selectors, and can nest commands like HTML. This is where I see it becoming really difficult to track changes across a long, plain text document.

So, you've manipulated your selectors, declarations, values, and properties into a document that you're rather proud of. What next? You can stick your CSS right into the head of the HTML document and open it in a browser. Apparently, however, some (but not all) older browsers won't know how to read the kind of CSS you've used, so it's important to tell the browser which language you're using. If some jerk browser is giving your CSS trouble, you can apparently wrap the rules in an HTML "comment" so it just skips them.

Also, I've gone and hit "Preview", and it turns out that all this CSS I've been trying to do hasn't affected the blog post. I tried to do the "Inspect Element" option, but while I can change the text color, it doesn't seem to stick. Can you see what I was trying to do here? I put some CSS into the HTML head, but I suppose the Blogger code trumps mine. After much tinkering around and swearing under my breath, it worked when I plugged it into the W3C "Try It!" window, which was comforting. I wanted a pinkish background with lovely yellow letters in the headers. All I got was centered headers. Better than nothing, I suppose. I can see how designing with CSS, combined with proper HTML, can be simultaneously satisfying and infuriating.
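For the record, here is roughly what I was trying to do, minus Blogger: a small Python script that writes the page to a local file with the CSS sitting in the head (the pink-and-yellow scheme is just my experiment, and the file name is arbitrary):

```python
# Write the page to a local file with the CSS sitting in the head, then open
# it in whatever browser is handy.  The file name is arbitrary.
import pathlib
import webbrowser

page = """<!DOCTYPE html>
<html>
<head>
  <style>
    body       { background-color: pink; }
    h1, h2, h3 { color: yellow; text-align: center; }  /* one rule, three selectors */
  </style>
</head>
<body>
  <h1>Mary Is Great!</h1>
  <p>The header above should be yellow and centered on a pink page.</p>
</body>
</html>
"""

path = pathlib.Path("css_test.html")
path.write_text(page)
webbrowser.open(path.resolve().as_uri())
```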

Friday, October 24, 2014

Muddiest Point, 10/21/14 (week 7 again, apparently)

I have a question about Koha.  It's open source, right?  Do I have to be a part of an institution to use it to create a catalogue?

I've always wanted to create a catalogue for my personal collection, and am not quite savvy enough with Excel.

Or would it be smarter to use Access?

Week 8 Reading and Fun Times: Did you know you can't put HTML in a post's title?

I found that out because I wanted the title to be <Week 8 Reading> </Week 8 Reading>.  Oh, well.

The World Wide Web Consortium (W3C) is a regulating body that oversees more digital standards than I knew even existed.  Indeed, it is they at the W3C who are the overlords of HTML, which I did not know had such a master.  Hypertext Markup Language is a set of tags that communicate the formatting of web sites.  It is, it turns out, a rather simple, concrete set of semantic commands set within brackets that describe a conceptual layout of visual web material.  All HTML, or Extensible HTML (XHTML), documents are satisfyingly balanced with opening and (for the most part) closing tags.  Additional information about the look and design of a website (CSS), as well as metadata that can grab the attention of search engines (<meta> tags), may be embedded within an HTML document.  The great thing about HTML is that it's simple but flexible.  In addition, in conjunction with the markup language, the use of a cascading style sheet means that changes can be made across a whole website (say you wanted to change your site's font, or background color, or the height of a logo) just by making a single change in the CSS.  In addition to text and style elements, HTML can be designed to display data in a variety of ways with the <table> tag.  HTML documents can also be designed to be more interactive than just turning a link purple by clicking on it.  A form's action attribute and its input elements can enable a site to collect user-submitted information.
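To keep myself honest about the <table> and form bits, here is a throwaway page written out by a little Python script; everything in it, including the form's action URL, is a placeholder:

```python
# A throwaway page with a <table> and a form, written out by Python; the
# form's action URL is a placeholder and wouldn't actually go anywhere.
import pathlib

page = """<!DOCTYPE html>
<html>
<body>
  <table border="1">
    <tr><th>Title</th><th>Status</th></tr>
    <tr><td>An Example Book</td><td>Checked in</td></tr>
  </table>

  <form action="/search" method="get">
    <input type="text" name="q">
    <input type="submit" value="Search">
  </form>
</body>
</html>
"""

pathlib.Path("table_and_form.html").write_text(page)  # open the file in any browser
```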

HTML is a widespread and useful markup language.  But why, you ask, should a would-be librarian even care about this?  The answer, of course, is because web sites matter, and understanding and working with dynamic digital content is integral to an information professional's work. Duh. Why did you even ask that question.  Developing these basic web design skills (and I am aware that these are very basic HTML skills, like the finger painting of HTML) provides the librarian with a certain level of autonomy in creating and managing online content.  However, as the librarians at Georgia State found out, variable skill levels between staff members can result in an inconsistent and incomprehensible mess.  Enter Content Management Systems (CMS)!  SpringShare and the like provide slick, coherent templates that are easily tailored to fit a wide variety of needs such as course guides, library instruction support, or database navigation; these guides can then be shared and altered, as fits with the collaborative nature of 21st century information behavior.  I've developed a couple of course and subject guides using the SpringShare CMS, and I can't quite imagine getting the same (beautiful! amazing! useful!) results with my piddling HTML skillset.

The problem with SpringShare is that it's a commercial enterprise, and so it is out of reach for libraries with limited budgets.  The solution ought to be an open source option, such as Drupal or Alfresco, but there is little material specifically designed for libraries.  Or, as GSU did, simply train your information techs, programmers, and librarians to be the baddest CMS designers around; this is probably the smartest bet to get that all-powerful ROI.  The tables for their CMS, and the description of how the Active Server Page system pulls information out of them, were intimidating, but not incomprehensible.

(But it won't take you anywhere) (But it's still a button I made!)

The W3C HTML Tutorial is a lot of fun. Because we can format in HTML in Blogger, here is something that I wrote in HTML.

Mary Is Great!

Here is a link to a picture of a pine marten. Or, if you don't want to click the link, here is the picture itself:

[picture of a pine marten]

Aren't they adorable but just a little scary?  Just like HTML.

Friday, October 17, 2014

Muddiest Point for Week 7

My Muddiest Point is actually about Muddy Points in general.

Was one due last Friday (the week we had class)?  I bet it was.  And I missed it.  Shoot.

Also: Peer-to-peer networks.  I'm not sure I understand those very well.  P2P networks connect computers directly to one another, but not to any regulating mechanism?  Is this how the "dark web" and Tor operate?

Week 7 Reading: NAPS are Great, Google is Greater, and Get Your Act Together, ILS

We all know that the internet is a series of magical voodoo tubes manipulated by powerful eWizards that live in our wireless routers, right?  Well, no (but that would be totally cool).  The truth is simultaneously more mundane and complex than that.  The internet itself is an interconnected series of smaller networks (though highly intelligent, the computer engineers who name these things are no poets).  If we zoom into the "smallest" network, we see our old friend the LAN linking machines together within a small radius.  The LAN then connects to a larger network at the Point of Presence (POP!) that covers a larger area or multiple LANs (right? I think so. Is this right?), which then feeds into the greater Network Access Point (NAP), which connects to others of its kind worldwide, thus enabling a connection from every computer anywhere to every other computer anywhere (that has network capabilities).

With all these machines sending data along these complex layers of networks, something must act as a traffic light to control the flow of information.  This is the job of the router.  Thanks, little guys.  Once a router gives the go-ahead for information to be sent along its path, this global (and sometimes low-orbit) network then requires vast amounts of physical cable (fiber optics: a series of cables, not a series of tubes) to get the data where it's going; companies that own these "backbone" cables all work together to maintain the fiber optic system, as mutual destruction is certainly assured if part of the system doesn't work.  So, our data, having been released properly by a friendly neighborhood router, has left its little LAN home and is travelling along the fiber optic spine that stretches across the whole world.   But it gets a little more complex.

The internet, though seemingly a wild splatter of information smeared across the globe, is actually organized according to Internet Protocol (IP).  Every computer connected to the internet has an IP address, a series of four numbers (octets, each made of eight bits) that uniquely identifies a location of origin...at least until we surpass 4.3 billion addresses, and then I guess we'll figure something else out.  Since IP addresses are meant to be read by a machine, they look a little strange to human eyes, so we assign text to each IP address through the Domain Name System (DNS), which is familiar to us as, for instance, www.pitt.edu.  Note that there are three components to that name: "www", "pitt", and "edu" (the whole address being known as the Uniform Resource Locator, or URL).  The pieces are resolved hierarchically: one set of servers knows who is responsible for "edu", the "edu" servers know where to find "pitt.edu", and Pitt's own name servers know where "www" lives; if the lookup fails at any step, you get an error page instead of a cat picture.
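If you want to watch the DNS part happen, Python will do the lookup in one line (the address that comes back will vary depending on where and when you ask):

```python
# Ask the resolver which IP address the name www.pitt.edu maps to right now.
# The answer varies by network and over time, so there's no point hard-coding it.
import socket

print(socket.gethostbyname("www.pitt.edu"))
```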

We owe a lot to servers, the physical machines on which much of the information on the internet is stored, and which provide us (clients) with access to the material we're looking for (www.pitt.edu).  They have static IP addresses; the networks know exactly where they are at all times.  "My" IP changes as I connect to the internet through different wifi networks.  My machine doesn't have a permanent, public IP address of its own; the address the wider internet sees belongs to whatever modem or router I happen to be connected through.

Meanwhile, the above-mentioned complexity of connections results in a user experience that is delightfully easy to use.  Information can be linked together like never before, and libraries are therefore having to deal with the increased expectation for user ease.  Our largely proprietary integrated systems did not anticipate this change in the nature of information consumption, and are scrambling to readjust and remain relevant.  This has involved smashing other services next to existing services without having them integrate their work flows.  For instance, in my experience with the III product Millennium, searching the circulation catalogue for an eBook would not produce any results.  Instead, we'd have to go into a separate OverDrive OPAC designed specifically to search our electronic resource collection.  This was clunky and confusing to patrons who then did not use it, thus keeping circulation of our investment in electronic material low.  Not cost effective, and offensive to the ROI gods.

What we did, therefore, was switch to III's newer platform, Sierra, which has electronic material integrated into both its staff workflows and its sleek, redesigned OPAC.  This was an expensive move to be sure, but the ease of use, expanded search capabilities (narrow results down by material type, for instance, or subject heading, year published, author, etc.), and clean visual design are much more attentive to modern user expectations; it allows greater search flexibility and user satisfaction.  I think the hand-wringing in the "Dismantling the ILS" article is uncalled for, actually.  Or, perhaps, the Carnegie Library of Pittsburgh should stand as a positive model of rebuilding an ILS.

I denounce the hand-wringing because instead of staring at our own navels as we get washed out on a tide of change, we should be looking ahead to the horizon (nice little nautical metaphor there) where Google is.  Their mission to help develop a healthy, educated user base by delivering information to users for free sounds awfully familiar, doesn't it?  Except they seem to be taking our customers.  Excuse me, patrons.  Perhaps we need to look at their business model, which, first of all, fosters an informal and collaborative culture of highly trained staff.  Many of their successful applications (Gmail, Docs, Blogger, Maps, etc.) were developed by allowing their engineers to pursue projects that interested them.  There is also a fabulous list of failed Google applications (Google Answers, which was mentioned in the video, HA!; Buzz; Notebook; etc.).  That allowance for failure is refreshing.

Sometimes I get the feeling that librarians are possessive of the knowledge we purport to keep safe, and that can develop into an elitist attitude that may stifle innovation when we need it most.  That doesn't mean I want advertisements in libraries (although that'd be an interesting method of sustainable funding), but I do want us to be more amenable to the Google way of doing things.  Adventurous and world-dominating!  Gone are the days of austere buns and shushing in the stacks!  The idea that a library can organize its information in isolation has gone the way of the card catalogue.  We're not special anymore, Google's made sure of it; we're just the awesome, most amazing specialists.  We need to weigh anchor (to continue our ocean metaphor) and catch up if we can.


Wednesday, October 1, 2014

Week 5 Muddiest Point (09/30/14)

I'm not sure I understand the concept of metadata "schema".  Is it equivalent to MARC codes?  I understand that there is no consistency or standard, but I don't quite understand what...it is.  Is it semantic?  Or is there a semantic way to connect disconnected schemata?  I'm not sure those questions even make sense, as I don't quite understand the concept of "schema".

Thanks!

Week 6 Reading: A Series of Tubes

I want to look at an amusing cat picture on the internet.  This is an easy thing for me to do.  However, there is a lot of work going on behind the scenes that enables me to do it.

Beginning with the military in the '50s, scientists and engineers have built a connected series of data networks that provide us with pictures of cats, among other, less-important things like email, online scholarly journals, Yelp! reviews, interactive maps, etc.  There are many different kinds of computer networks, the most famous being the Internet (maybe you have heard of it, I don't know). These days, data is sent from a source to an end user by packaging the information with source and destination information.  These packets of data can travel through physical and virtual topologies in order to reach their destination.

The physical structures of computer networks include different kinds of cables (twisted pair for telecommunications, coaxial cable for TVs, optical fiber for lots of cat pictures travelling under the ocean, etc); wireless technologies send data through earth-bound terrestrial microwaves (this week's Term of the Week) or up into space via satellites, the ubiquitous cell phone towers blinking in the night, and radio frequencies.

Once the cat picture has raced across a physical structure, it encounters a node, which is a kind of door into the destination.  These doors are network interface cards (NICs), each with its own hardware address (the MAC address), which lets the cat picture locate the correct destination and obtain access; once it finds the right place, the data gets "scrubbed" by a repeater.  Sending the cat picture from one network, say a WAN, to another (LAN, or HAN?  Maybe?  This is conceptually difficult for me, I admit) requires that these networks communicate with one another; routers and modems regulate this process, and firewalls protect it.

The digital structure of computer networks is different than the physical topology.  Network designs vary in how many connections there are between participating nodes; the more connections, the better, but more connections means a higher cost.  On top of this, virtual networks can be built on top of existing networks.

The links between these networks are governed by communications protocols such as the Ethernet family, which results in the inability to mooch your neighbor's wifi because it requires a "wireless access key"; Internet Protocol Suite, which the Wikipedia article calls "the foundation of all modern network", and manages to govern information exchanges over an unreliable network; SONET/SDH; and Asynchronous Transfer Mode, which is still relevant for connecting an ISP to someone in their home, trying to find a good cat picture.

The size and scope of networks vary.  They can exist only within a business or between a business and its clients (intranets and extranets), be small enough to fit into your house (PANs that let wireless printers print cat pictures), be city-wide (MAN!  I would love to see Pittsburgh have one of these), or be worldwide, so that your mobile carrier can drop your call (GAN), and then, of course, there's the Local Area Network, or LAN.  The LAN is the type of network most Average Joes like me use most frequently.  It evolved out of a rapidly increasing use of connected computers in close geographic range (the office, the lab, the educational department, etc), and resulted in a standardization of protocols.

Security issues abound when discussing interconnected computer networks, of course.  There is the issue of network security, which (hopefully) prevents unauthorized entities from accessing, modifying, stealing, and otherwise manipulating information.   And unless you've been living under a rock, you know that the issue of network surveillance is very important in this new, interconnected age.  Powerful organizations (governments, law enforcement bodies, the Mafia, etc) monitor the data being exchanged over various networks.  For safety or nefarious purposes?  The debate is widespread, heated, and far from over.

Moving on to RFID, and its potential for use in libraries.

Replacing barcodes with RFID chips could speed up library work considerably.  I've dragged a squeaking cart loaded down with a laser scanner clumsily connected to an aging laptop through stack after stack to perform inventory.  This cumbersome and time-consuming task could be optimized with an RFID scanner.  Circulation could be optimized by RFID scanners in book drops (some patrons already think this happens, that a book gets checked in immediately after they put it in the drop), checking in/out large stacks of items at once, and having security information encoded onto an RFID tag for more efficient loss prevention.  Self-checkout is already in place in many libraries, but it still relies on barcode scanning and (generally) magnetic security screening; trying to teach patrons to "thump" a book after they've checked it out has, in my experience, been a difficult task that leads to library staff (Mary) having to stop them at the door because they've set off the alarms anyway.  Streamlining self-checkout might encourage more use, and finally start paying off the exorbitant price of the self-checkout machines.  This, however, would make the circulation clerks very lonely.

RFID technology permeates more and more of our world.  It is currently more expensive than printing barcodes, but it may be inevitable that libraries will have to move to RFIDs simply because it is the available technology.  This presents some problems.  The chips are relatively easy to remove or veil, which increases the potential of loss.  Smaller items (magazines, thin books, children's books) present challenges for reading individual tags.  Replacing tags can become costly in a way that slapping another barcode on an item never will.

Anyway, in case you were wondering:

[image]

Wednesday, September 24, 2014

Week 5 Reading: Uniquely Identify This

So, metadata.  It's quite a buzzword here at the iSchool and, until this week's reading, I had only a vague notion of what people meant by data-about-data.  Thankfully, this set of articles really helped.

From its humble roots as a data organization system for geospatial data management systems, metadata has taken the information world by storm.  Unlike previous methods of categorization and organization, which mainly supported storage and use, metadata not only describes the content of an object but also its behavior.  That is, well-developed metadata is more than an isolated description of contents and provenance; it describes an object's use, the history of that use, its storage and management across changing media landscapes, and its relationship to the contents and uses of other data across diverse fields.

Different types of metadata can be used to track and organize information at many levels, as well.  If a librarian needs to create a finding aid, she'll use descriptive metadata, as opposed to an archivist looking to document a recent conservation project (he'd manipulate preservation metadata).  The internet generates its own metadata at an alarming rate.  Each of these situations calls for its own classification and encoding systems, which makes it increasingly difficult to achieve the goal of generating metadata in the first place: creating a richer and more accurate body of information, in its complex context, as it interacts with the changing landscape of knowledge.

To add to all of this, technology changes at such a rate that networked information needs to migrate, so metadata "has to exist independently of the system that is currently being used to store and retrieve them" (Gilliland, 2008).  This requires a high level of technical expertise that has resulted in a rising sense of panic that I've been reading about in my other classes.

Different fields of study value different kinds of information, however, and there is no consistency to track their contents across disciplines; enter Dublin Core!

Have you ever read The Hitchhiker's Guide to the Galaxy?  The Dublin Core Metadata Initiative (DCMI) seems like it's trying to build a Babel Fish, which in Hitchhiker's Guide is a little fish you can put in your ear that instantly translates any language you hear; you can understand anyone in their first language, and you can be universally understood.  From what I can tell, the DCMI is trying to create a universal translator.  Working within the Resource Description Framework (RDF), Dublin Core identifies the specific markup language in use and "speaks" in that language.  That is, it pulls up the context-correct dictionary for the data in question, points to a specific definition, and then uses it in the query, what Eric J. Miller calls "modular semantic [vocabulary]".  For instance, if you wanted to know about famous hospitals in the 1800s, the DCMI approach would do the heavy lifting of specifying field-specific classification schemes, and you would get results from systems that use LC, DDC, Medical Subject Headings, and maybe AAT, too.  So, generally, the goal of DCMI is to act as a translator between well-established data models in order to allow for a more flexible interdisciplinary discovery system.  Inter! Opera! Bility!
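To make Dublin Core feel a little less abstract, here is a tiny record built in Python with the real DC element-set namespace; the hospital-themed values are invented, but the element names (title, creator, subject, type) are genuine Dublin Core:

```python
# A tiny record using the real Dublin Core element-set namespace; the
# hospital-themed values are invented for illustration.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
for element, value in [
    ("title", "Photographs of Hospitals, 1800-1899"),
    ("creator", "Unknown"),
    ("subject", "Hospitals -- History"),
    ("type", "Image"),
]:
    ET.SubElement(record, f"{{{DC}}}{element}").text = value

ET.indent(record)   # Python 3.9+
print(ET.tostring(record, encoding="unicode"))
```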


Week 4: Muddiest Point (09/23/14)

To be perfectly honest, this week's class was more illuminating than question-raising for me.  The semantic and logical nature of building data models is somehow satisfying and interesting to me, and I'm looking forward to Assignment #2.

I would like to know what kind of work is done with data stored in an "analytic" database (data warehouse?) as opposed to a retrieval database like the kinds librarians use, but I bet I can go find out about that myself.

Oh! How do you save a database with its attendant query history in Access?  I saved the query we did in lab yesterday, and I saved the database, but they don't seem to have saved the same information.

Thank you!


Thursday, September 18, 2014

Week 4 Reading: Database, My Database, and Nothing is Normal(ized)

My mother-in-law worked for the government in the 1970's.  She has told me tales of the giant drums that were the databases of the era.  They required a lot of physical manipulation and waiting.  Is this a navigational kind of database, wherein the user had to wade through combinations of inflexible, predefined paths of data?  Applications had to sort through linked data sets within a larger network; this required inefficient levels of time, training, and funding.

Enter the relational database, much more flexible and interrelated in that it uses an interrelated series of tables that reference one another on a per-query basis.  I would be assigned a "key" (let's say my first name), and all subsequent data about me in other tables could be accessed using that key.  If all my personal information (address, phone number, birth date, social security number, favorite flavor of ice cream, etc) were listed in separate tables as defined by those categories, a relational database is able to call up the query-relevant data only when it is needed.  A standardized search term like this improves the capacity and fluidity of the database.
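Here is a tiny sketch of the key idea using Python's built-in sqlite3 module. I've used an integer id as the key instead of my first name, which seems to be the more standard habit, and the data is invented:

```python
# One table of people, one of ice cream preferences, joined on a shared key
# only when the query needs it.  All data is invented.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE favorite (person_id INTEGER, flavor TEXT, "
            "FOREIGN KEY (person_id) REFERENCES person(id))")
cur.execute("INSERT INTO person VALUES (1, 'Mary')")
cur.execute("INSERT INTO favorite VALUES (1, 'pistachio')")

cur.execute("""SELECT person.name, favorite.flavor
               FROM person JOIN favorite ON person.id = favorite.person_id""")
print(cur.fetchall())    # [('Mary', 'pistachio')]
```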

Subsequent evolutions of database structures seem to be based on the relational approach (it seems that Moore's Law was the downfall of the "integrated approach").  The question then became one of developing relevant and effective query languages as well as refining schemata for increased and interrelated data.  It seems that SQL remains at the top of the pile in terms of query languages, even though it's been through various permutations for different uses.

With a vast increase in end users in the 1980's (desktop computers! Welcome, the hoi polloi!), users were left the task of manipulating data, and DBMSs would quietly go about their business of decompressing, reading, and recompressing files.  With the advent of end users, data became more tied to a "user", rather than a user being tied to disparate data concerning her: the beginning of the "profile".

These days, there are many types of databases to serve different data needs.  For instance, a database that manages ambulance dispatch would probably be an in-memory database because, as the Wikipedia article states, "response time is critical".  Many people now use cloud databases to have access to their data and information from any physical point that has internet connectivity.  Libraries are increasingly serving as data warehouses.  As scholarly publishing and discourse happen in digital spaces, the warehouses allow for mining and managing data "for further use", which I think means the development of metadata, very much a buzzword in the LIS field today.  In addition, current digital journals or collections thereof, like JSTOR, are hypertext/hypermedia databases, allowing for papers to be linked to research data and referenced works, for instance.  Federated databases are relevant to library work in the creation of, say, the European Digital Library, where disparate institutions share their collections and information.

Designing databases starts with understanding exactly what data is going to be organized for storage and retrieval.  This conceptual modelling informs the actual data structure; the data structure is dependent on the database technology and its attendant DBMS, i.e., the logical data model is not conceptual at all, but is instead informed by the requirements of the database management system.  There are lots of these models and, to be frank, I almost understood them, but not really.  Not really.

Data modelling's first step is normalization; from what I can deduce, normalization is the paring down of raw data until it's lean, mean, and internally consistent enough to be treated as a body of data to be placed into a database (that you are building according to the demands of the DBMS).  Eliminate useless redundancies and similarities (which brings us to the best term of the reading: atomicity) and assign each reduced data set a primary key (see below);  from here, the cached webpage became very difficult for me to understand, as it was so referential to the diagrams, which unfortunately were missing.  From what I can tell, the next step in normalization requires that each primary key (having trouble with the concept of "concatenated keys") is distinct in itself and does not rely on association with any other key; if association occurs, creating a new table with all associated keys is in order.  The last step in normalization is identifying non-key attributes, as they relate to the key attributes being used in a table.  I am also unclear as to the one-to-many, many-to-one, and many-to-many relationships.  Visuals would definitely help.
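Since the one-to-many versus many-to-many business is exactly what trips me up, here is a sketch of what I think the junction-table trick looks like, again in sqlite3 with invented data; if I understand correctly, the junction table turns one many-to-many relationship into two one-to-many ones:

```python
# An author can write many books and a book can have many authors, so a
# junction table holds the (author, book) pairs.  All names are invented.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT);
    -- the junction table turns one many-to-many into two one-to-many links
    CREATE TABLE authorship (author_id INTEGER, book_id INTEGER,
                             PRIMARY KEY (author_id, book_id));
    INSERT INTO author VALUES (1, 'First Author'), (2, 'Second Author');
    INSERT INTO book   VALUES (10, 'A Co-Written Book');
    INSERT INTO authorship VALUES (1, 10), (2, 10);
""")

cur.execute("""SELECT author.name FROM author
               JOIN authorship ON author.id = authorship.author_id
               WHERE authorship.book_id = 10
               ORDER BY author.name""")
print([row[0] for row in cur.fetchall()])    # ['First Author', 'Second Author']
```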

Peter Chen's data modeling system begins with a conceptual mapping of the structure; this concerns dividing things up into the tables that will house the relevant data, and creating a cartography of the relationships between the tables.  After the conceptual framework is set up, tables are populated with data in a way that conforms with the dictates of the DBMS.  In order to best define how pieces of data relate to one another, each entity is treated as a noun, each relationship as the connection between those nouns, and attributes as the qualities that describe an entity.  The unique attribute (or combination of attributes) assigned to an entity creates its "primary key".  The semantics of the entity-relationship model are in grammatical terms, which really helps me keep a handle on the concept.  The types of connection between entities and relationships (one-to-one, one-to-many, and so on) are known as "cardinality constraints", and they can be visually represented in many ways.

Meanwhile, in order to keep a database healthy and happy, it seems that developing redundancy and the ability to replicate older forms (what if a migration goes wrong?) of a database are key.  Back up your data, folks, that's always been rule number one.

This is a lot of information, and I look forward to seeing some visual aids!

Wednesday, September 17, 2014

Muddiest Points: Week 3 (09/16/14)

Greetings, and welcome to my Muddiest Points for LIS 2600, Week 3!

I was wondering if you could integrate RLE within an image that requires more complex compression techniques.  That is, can there be more than one method of compression within a single file?  Or is this illogical?

Also, I am still trying to get my head around binary.  I thought I understood it, but then there was that Foxtrot comic and I got confused.  I made a little chart, and am wondering if it's correct?

So, if there are four bits, there is the potential for 16 values.  I tried to make a chart and would like to know if I got it right.  I know it isn't necessarily relevant to what we were learning in terms of compression, but I didn't understand the explanation very well; now I want to know if I understand the concept.  So, here's a chart of 4 bits (the first four powers of two, right?) and their decimal equivalents.  I apologize that the formatting is a little weird:

2³  2²  2¹  2⁰   Decimal equivalent
0   0   0   0    0
0   0   0   1    1
0   0   1   0    2
0   0   1   1    3
0   1   0   0    4
0   1   0   1    5
0   1   1   0    6
0   1   1   1    7
1   0   0   0    8
1   0   0   1    9
1   0   1   0    10
1   0   1   1    11
1   1   0   0    12
1   1   0   1    13
1   1   1   0    14
1   1   1   1    15
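I also tried double-checking myself with a quick Python loop, and it seems to agree with the chart:

```python
# Every 4-bit pattern and its decimal value, plus one spot check.
for n in range(16):
    print(f"{n:04b}  ->  {n}")

print(int("1011", 2))    # 11, matching the chart
```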

Thanks!
-Mary


Thursday, September 11, 2014

Week 3 Readings : Data Compression Almost De-Mystified But Not Quite, Historic Pittsburgh is the Best Pittsburgh, and Duh, We All Use YouTube

Run Length Encoding (RLE) and Lempel-Ziv (LZ) methods of compression are fashioned to compress repetitive texts; RLE deals better with sequences of identical value, such as AAAAHHH NNOOO!!!, but not something like "Data compression is a conceptual challenge for Mary Jean".  Therefore, RLE is better for compressing low-contrast images; super-long sequences (many pixels of the same color, for instance) can be compressed by sorting by channel.
In the case of multiple patterns of different lengths, LZ handles the information in a more efficient manner; it uses self-referential data gained from its previous iterations.  The idea that we compress speech and text in daily life, as well, was particularly helpful for me in understanding how LZ works.

See?  I did it just there.
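And because I wanted to convince myself that RLE really is that simple, here is a toy encoder/decoder in Python, using the same shouty example from above:

```python
# A toy run-length encoder/decoder.  It shines on "AAAAHHH NNOOO!!!" and does
# nothing useful for ordinary prose.
from itertools import groupby

def rle_encode(text):
    """Collapse runs of identical characters into (character, count) pairs."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("AAAAHHH NNOOO!!!")
print(encoded)   # [('A', 4), ('H', 3), (' ', 1), ('N', 2), ('O', 3), ('!', 3)]
print(rle_decode(encoded) == "AAAAHHH NNOOO!!!")   # True, so it's lossless
```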

Both RLE and LZ are strategies that produce lossless compression, resulting in perfect reproduction of the uncompressed form, as well as allowing for the process to be reversible.  Other lossless compression methods include entropy coding (so that's what encoding means!), which assigns codes to pieces of data based inversely on their statistical probability of occurring in a given set.  In this way, shorter codes for more frequently occurring data ensure a smaller file size.  There was a lot of math here, and it frightened me.  However, lossless compression is integral for compressing programs, as a single "misfire" of information will send the whole finely wrought program down the drain.  The DVD-HQ article also mentions that lossless compression is ideal during "intermediate production stages"; so, I guess lossless is necessary for things that absolutely have to be preserved as they originally appeared.  For example, if you were editing speech tracks for a movie you're making, you'd want to use lossless compression because, even though it takes up more space, you need an exact copy of the original recording to work with.

Lossy compression deals more with the difference between information and data; during compression, certain pieces of data can be sloughed off ("lost", right?) without losing the essential rendering, as opposed to an exact digital copy, of the end product.  Lossy compression is more efficient for audio and video information, which makes sense because that information is much more complex than a static image comprised of shades of color; instead, these are dynamic pieces of information that the "end processor" (a person, for example), well, processes in a more recondite way.  A particularly helpful example for me from Wikipedia was ripping a CD; lossy compression shrinks the file size by eliminating, through the fabulous science of "psychoacoustics", less audible or irrelevant sounds.  The result is an inexact copy of the original data, but a form of consumable information nonetheless.
It's interesting to note that most of the information we receive is lossy.  That is, streaming video, cable, DVDs, mp3s, satellite radio, etc.  Makes me think about what we're missing.

Both Imagining Pittsburgh and  YouTube and Libraries focus on the end-user benefits of integrating information that's been compressed and then decompressed into library services; I imagine they used some form of lossless compression, as the images are static, and largely grayscale.  The Imagining Pittsburgh article delineated the plan for creating the really wonderful Historic Pittsburgh database.  I am a frequent user of this service, both in my professional and personal lives.  As an aside, it feels like it hasn't been refined since they launched the site 10 years ago; searching is difficult and clunky.
The article, meanwhile, highlights the processes three major Pittsburgh institutions went through to create a collaborative space that tells the story of this city through images, maps, and population data.  It's proof positive of why things like data compression matter; it provides an example of practical applications of the skills we're developing in this course.  The capacity for and output of cross-disciplinary, inter-organizational collaboration is so greatly increased on a digital platform.  The article also gives a step-by-step breakdown of how information technology integrates with the goals of organizations.  Imagining Pittsburgh is a fine example of a professional document meant to reflect accountability and expertise in the face of funders and professional organizations.  Meanwhile, none of it would have been possible without data compression.

While the YouTube and Libraries article demonstrated an implementation of lossy compression, it nevertheless seemed, to me, a little hokey; by this point everyone knows the democratizing value of YouTube.  I initially wondered why the author didn't suggest embedding the helpful videos she proposes ("how to find the reference desk") onto the library's own website, but I do understand that the wide popularity of the juggernaut that is YouTube would probably garner more views than the library's site itself.  Overall, though, the article seemed to me to reflect the latent fear of new technology, born from the fear of obsolescence, that is a passing trend among librarians as a new generation steps into the field.

Tuesday, September 9, 2014

Muddiest Points: Week 3 (09/09/14)

Today was a lot of information to take in.  I didn't quite finish the whole Lab worksheet, but I think that's okay because of time constraints from discussing Assignment 1.

I thought I knew a little about computer basics, but there was so much information that I couldn't sort through what, exactly, were the main points.  Lots of slides, as Dr. Oh said.  It muddled my comprehension.  I felt woefully under-prepared to complete the Lab worksheet; for instance, from the worksheet we were (thankfully!) provided in Lab, in the "Hardware-Memory and Storage" row, what's a Card Reader?  Where do we find graphics hardware?  I couldn't seem to find any information.  Did we go over this in lecture?  I might have missed it because of the above-mentioned rapidity of slides.  Also, what are "Inputs and Controls" (in the "Power and Expansion" row)?

I would also like clarification on Moore's Law.  Is it important to understand because technology changes so quickly, and it's good to have a schedule to know when we should "update" our working knowledge?

I was also wondering if it's feasible or even reasonable to replace my hard drive with a solid state.  All my IT friends are telling me to do this, but I'm not sure it's necessary.

Thanks, and I'll see you next week!

Tuesday, September 2, 2014

Week 2 Readings: The Fast Lane, Google Might Be Evil, and PC Deployment Schedules Are a Drag

This week's reading focused mainly on the necessity of bringing librarianship in line with the changing nature of information, the attendant stumbling blocks, both technological and cultural, associated with doing so, and the murky landscape in which it all takes place.  A practical solution is also put forward.

Charles Edward Smith's piece, "A Few Thoughts on the Google Books Library Project", addresses the escalating sense of panic among libraries that has arisen because Google is on the forefront of digitization before the supposed information professionals got on board.  He reassures us, however, that putting literature and knowledge online provides a wider platform for access.  Instead of rendering academic libraries obsolete, the Google Books Library Project serves as a model for the "successful transfer of knowledge".  Building online information resources will actually liberate scholarship and research from traditional barriers of access like physical proximity, scarcity of resources, interlibrary loans, etc.  He stresses that material that is not digitized will no longer count; people rely almost solely on digital resources, and anything that remains only analog will fall by the wayside;  in fact, analog-only material can be said not to exist in the developing world of information technology.  Smith is excited by the possibilities digitization presents, and encourages professionals not to be afraid of it but to embrace it wholly.  The problem I see with our brave new world of information, however, is that it can limit the scope of one's research.  The infinite amount of information available online is governed by search terms defined by the user and his/her preconceptions about the material, and as such can lead to the researcher wearing "blinders", as it were, through specifically tailoring his/her search to exclude "extraneous" results.  This specificity can inhibit exposure to material the researcher might not have thought of as relevant, but may be nonetheless.

Meanwhile, in Europe, librarians fear the America-centric nature of Google.  When trying to build the European Digital Library in order to provide wide access to out-of-print and old texts, libraries have struggled with alliances with the internet giant.  It seems that public funding is sparse or non-existent, and the project has turned toward private funders to get the Library online.  This, of course, involves Google because it has already dealt with publishers and book sellers during its quest to put books online.  Doreen Carvajal's article "European libraries face problems in digitalizing" brings the question of Google's apparent American bias into focus.  I'm not surprised that, as an American, I had absolutely no idea that this was the case.  Google's tendency toward American information would seem to contradict the mission of the European Digital Library; it seems that the company needs to be on board to make anything happen, however.  The relationship between providing a free, online and public library and operating in a world where private money seems to control what goes on is still an unsolved point of tension.

While both of the above-mentioned articles display a little theoretical hand-wringing over what a 21st century library will look like, the University of Nevada, Las Vegas got its hands dirty by building one of its own.  Jason Vaughn's article "Lied Library @ 4 Years: Technology Never Stands Still" breaks down the practical steps necessary to create a connected, efficient, and relevant library space.  From actual PC replacement, building staff competencies on both hardware and software, managing physical space, keeping systems healthy, and preventing loss, to developing an ongoing plan to maintain relevance in a rapidly changing information environment, it seems that UNLV's librarians have been awfully busy.  Some of the technological changes they made back at the beginning of the millennium are now laughably and thankfully obsolete; the prevalence of a "community use" policy makes me think that public libraries weren't as connected then as they are today; widespread wireless connectivity was still a long way away and so they needed to install lots of "hot jacks"; Deep Freeze was a new technology; open source software was almost unthinkable.  However, this article displays how an honest recognition of the field's evolving nature is baseline necessary to implement effective change.  It takes a lot of effort, and a lot of money, but is less expensive than refusing to change at all.