Run Length Encoding (RLE) and Lempel-Ziv (LZ) methods of compression are fashioned to compress repetitive text; RLE deals better with sequences of identical values, such as "AAAAHHH NNOOO!!!", but not with something like "Data compression is a conceptual challenge for Mary Jean". For that reason, RLE is better for compressing low-contrast images; super-long runs (many pixels of the same color, for instance) can be compressed even further by sorting the data by channel.
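To make that contrast concrete, here's a minimal sketch of RLE in Python. The (character, count) output format is my own choice for illustration, not something from the readings:

```python
def rle_encode(text):
    """Collapse runs of identical characters into (character, count) pairs."""
    if not text:
        return []
    pairs = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1
        else:
            pairs.append((current, count))
            current, count = ch, 1
    pairs.append((current, count))
    return pairs

print(rle_encode("AAAAHHH NNOOO!!!"))
# [('A', 4), ('H', 3), (' ', 1), ('N', 2), ('O', 3), ('!', 3)]
# A sentence with no runs, like the Mary Jean example, produces one pair
# per character and actually grows instead of shrinking.
```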
When the text contains multiple repeating patterns of different lengths, LZ handles the information more efficiently; it builds self-referential data from its previous iterations, pointing back at strings it has already seen. The idea that we compress speech and text in daily life as well was particularly helpful for me in understanding how LZ works. (A toy version of those back-references is sketched below.)
See? I did it just there.
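For anyone who, like me, learns better from toy code, here is a crude LZ77-style encoder I sketched to convince myself how those back-references work. It's a simplification, not the exact scheme any of the readings describe:

```python
def lz77_encode(data, window=255):
    """Greedy LZ77-style pass: emit (offset, length, next_char) triples,
    where offset/length point back into text we've already seen."""
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Rebuild the text by copying from what has already been decoded."""
    out = []
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])
        out.append(nxt)
    return "".join(out)

text = "data compression compresses data; compress, compress, compress"
assert lz77_decode(lz77_encode(text)) == text  # lossless round trip
```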
Both RLE and LZ are strategies that produce lossless compression: the process is fully reversible and reproduces the uncompressed form perfectly. Other lossless compression methods include entropy coding (so that's what encoding means!), which assigns each piece of data a code whose length is inversely related to its statistical probability of occurring in a given set. In this way, shorter codes for more frequently occurring data ensure a smaller file size. There was a lot of math here, and it frightened me. However, lossless compression is integral for compressing programs, as a single "misfire" of information will send the whole finely wrought program down the drain. The DVD-HQ article also mentions that lossless compression is ideal during "intermediate production stages"; so, I guess lossless is necessary for things that absolutely have to be preserved as they originally appeared. For example, if you were editing speech tracks for a movie you're making, you'd want to use lossless compression because, even though it takes up more space, you need an exact copy of the original recording to work with.
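The math scared me too, but the core idea of entropy coding fits in a few lines. Here's a sketch of one classic entropy coder, Huffman coding, using Python's standard heapq module; I'm using it as a stand-in for the entropy coding the article discusses, not claiming it's the exact method they had in mind:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Frequent symbols get short bit strings, rare symbols get long ones
    (the 'inverse' relationship between probability and code length)."""
    counts = Counter(text)
    # Each heap entry: [weight, tiebreaker, [symbol, code], [symbol, code], ...]
    heap = [[freq, i, [sym, ""]] for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: one distinct symbol
        return {heap[0][2][0]: "0"}
    tie = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)            # two least frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]         # prepend a bit as we merge upward
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], tie] + lo[2:] + hi[2:])
        tie += 1
    return {sym: code for sym, code in heap[0][2:]}

codes = huffman_codes("data compression is a conceptual challenge")
# The most frequent characters (spaces, 'a', 's') come out with the shortest codes.
print(sorted(codes.items(), key=lambda kv: len(kv[1]))[:3])
```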
Lossy compression deals more with the difference between information and data; during compression, certain pieces of data can be sloughed off ("lost", right?) without losing the essential rendering, as opposed to an exact digital copy, of the end product. Lossy compression is more efficient for audio and video information, which makes sense because that information is much more complex than a static image composed of shades of color; instead, these are dynamic pieces of information that the "end processor" (a person, for example), well, processes in a more recondite way. A particularly helpful example for me from Wikipedia was ripping a CD; lossy compression shrinks the file size by eliminating, through the fabulous term "psychoacoustics", less audible or irrelevant sounds. The result is an inexact copy of the original data, but a form of consumable information nonetheless.
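Real psychoacoustic models are far beyond me, but the basic move of lossy compression, throwing detail away while keeping the shape, can be faked in a few lines. This is a crude stand-in of my own, not how MP3 or any codec from the readings actually works:

```python
import math

def lossy_quantize(samples, levels=16):
    """Toy lossy step: snap each sample onto one of a few evenly spaced
    levels. The discarded detail can never be recovered, but the overall
    shape of the signal (the 'rendering') survives."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (levels - 1)
    return [lo + round((s - lo) / step) * step for s in samples]

original = [math.sin(x / 10) for x in range(100)]   # stand-in "audio" signal
approx = lossy_quantize(original)
# approx != original (data was thrown away), yet plotted side by side the
# two curves look nearly identical: lost data, but preserved information.
```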
It's interesting to note that most of the information we receive is lossy. That is, streaming video, cable, DVDs, mp3s, satellite radio, etc. Makes me think about what we're missing.
Both Imagining Pittsburgh and YouTube and Libraries focus on the end-user benefits of integrating compressed-and-decompressed information into library services; I imagine they used some form of lossless compression, as the images are static and largely grayscale. The Imagining Pittsburgh article delineated the plan for creating the really wonderful Historic Pittsburgh database. I am a frequent user of this service, in both my professional and personal lives. As an aside, it feels like it hasn't been refined since the site launched 10 years ago; searching is difficult and clunky.
The article, meanwhile, highlights the processes three major Pittsburgh institutions went through to create a collaborative space that tells the story of this city through images, maps, and population data. It's proof positive of why things like data compression matter; it provides an example of practical applications of the skills we're developing in this course. The capacity for and output of cross-disciplinary, inter-organizational collaboration is so greatly increased on a digital platform. The article also gives a step-by-step breakdown of how information technology integrates with the goals of organizations. Imagining Pittsburgh is a fine example of a professional document meant to reflect accountability and expertise to funders and professional organizations. Meanwhile, none of it would have been possible without data compression.
While the YouTube and Libraries article demonstrated an implementation of lossy compression, it nevertheless seemed to me a little hokey; by this point everyone knows the democratizing value of YouTube. I initially wondered why the author didn't suggest embedding the helpful videos she proposes ("how to find the reference desk") on the library's own website, but I do understand that the wide popularity of the juggernaut that is YouTube would probably garner more views than the library's site itself would. Overall, though, the article seemed to me to reflect the latent fear of new technology, born from the fear of obsolescence, that is a passing trend among librarians as a new generation steps into the field.