My mother-in-law worked for the government in the 1970s. She has told me tales of the giant drums that were the databases of the era; they required a lot of physical manipulation and waiting. Was this the navigational kind of database, where the user had to wade through combinations of inflexible, predefined paths of data? Applications had to sort through linked data sets within a larger network, which demanded an inordinate amount of time, training, and funding.
Enter the relational database, much more flexible in that it uses an interrelated series of tables that reference one another on a per-query basis. If I were assigned a "key", let's say my first name, all subsequent data about me in other tables could be accessed using that key. If all my personal information (address, phone number, birth date, social security number, favorite flavor of ice cream, etc.) were listed in separate tables defined by those categories, a relational database could call up only the query-relevant data, and only when it is needed. A standardized search term like this improves the capacity and fluidity of the database.
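To make the "key" idea concrete for myself, here is a tiny sketch using Python's built-in sqlite3 module. The person/address/ice cream tables and the numeric ID are my own invention, not from the readings (a real database would key on a guaranteed-unique ID rather than a first name, since names repeat):

```python
# A minimal sketch of the shared-key idea using Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway database that lives only in RAM
cur = conn.cursor()

# One table per category of information, all tied together by person_id.
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, first_name TEXT)")
cur.execute("CREATE TABLE address (person_id INTEGER, street TEXT)")
cur.execute("CREATE TABLE ice_cream (person_id INTEGER, flavor TEXT)")

cur.execute("INSERT INTO person VALUES (1, 'Margaret')")
cur.execute("INSERT INTO address VALUES (1, '12 Elm St')")
cur.execute("INSERT INTO ice_cream VALUES (1, 'pistachio')")

# The query touches only the tables it needs, joined on the shared key.
cur.execute("""
    SELECT person.first_name, ice_cream.flavor
    FROM person JOIN ice_cream ON person.person_id = ice_cream.person_id
""")
print(cur.fetchall())  # [('Margaret', 'pistachio')]
conn.close()
```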
Subsequent evolution of database structures seems to be based on the relational approach (it seems that Moore's Law was the downfall of the "Integrated approach"). The question then became one of developing relevant and effective query languages, as well as refining schemata for ever larger and more interrelated data. SQL seems to remain at the top of the pile among query languages, even though it has been through various permutations for different uses.
With a vast increase in end users in the 1980s (desktop computers! Welcome, the hoi polloi!), users took on the task of manipulating data, while DBMSs would quietly go about their business of decompressing, reading, and recompressing files. With the advent of end users, data became tied to a "user", rather than a user being tied to the disparate data concerning her: the beginning of the "profile".
These days, there are many types of databases to serve different data needs. For instance, a database that manages ambulance dispatch would probably be an in-memory database because, as the Wikipedia article states, "response time is critical". Many people now use cloud databases to have access to their data and information from any physical point with internet connectivity. Libraries are increasingly serving as data warehouses: as scholarly publishing and discourse move into digital spaces, the warehouses allow for mining and managing data "for further use", which I think means the development of metadata, very much a buzzword in the LIS field today. In addition, current digital journals or collections thereof, like JSTOR, are hypertext/hypermedia databases, allowing papers to be linked to research data and referenced works, for instance. Federated databases are relevant to library work in the creation of, say, the European Digital Library, where disparate institutions share their collections and information.
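Out of curiosity, here is a rough sketch of the in-memory idea, again with sqlite3, which can live either in a file on disk or entirely in RAM. The dispatch table and the timing loop are my own toy example, purely illustrative:

```python
# Same API, different home: ":memory:" never touches disk, which is roughly
# the trade-off a dispatch system cares about. Timings are illustrative only.
import sqlite3, time, tempfile, os

def timed_inserts(path):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE dispatch (id INTEGER PRIMARY KEY, note TEXT)")
    start = time.perf_counter()
    for i in range(200):
        conn.execute("INSERT INTO dispatch (note) VALUES (?)", (f"call {i}",))
        conn.commit()  # commit each call, as a dispatch log presumably would
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

disk_file = os.path.join(tempfile.mkdtemp(), "dispatch.db")
print("in memory:", timed_inserts(":memory:"))
print("on disk:  ", timed_inserts(disk_file))
```

The in-memory run should come out noticeably faster, since nothing has to be flushed to disk on each commit; the trade-off, presumably, is that the data vanishes if the power does.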
Designing databases starts with understanding exactly what data is going to be organized for storage and retrieval. This conceptual modelling informs the actual data structure; the data structure is dependent on the database technology and its attendant DBMS, i.e., the logical data model is not conceptual at all, but is instead informed by the requirements of the database management system. There are lots of these models and, to be frank, I almost understood them, but not really. Not really.
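One piece I think I did grasp: the same conceptual fact takes a different logical shape depending on the DBMS. A toy example of my own (not from the readings), assuming SQLite and PostgreSQL as the two systems:

```python
# The concept "a patron has a birth date" stays constant; the logical model
# bends to the DBMS. Both DDL snippets below are my own invented example.
import sqlite3

# SQLite has no DATE type, so its logical model stores an ISO-8601 string:
sqlite_ddl = """
CREATE TABLE patron (
    patron_id  INTEGER PRIMARY KEY,  -- alias for SQLite's auto-incrementing rowid
    birth_date TEXT                  -- e.g. '1948-03-17'
);
"""

# PostgreSQL has a real DATE type and its own auto-increment idiom (SERIAL):
postgres_ddl = """
CREATE TABLE patron (
    patron_id  SERIAL PRIMARY KEY,
    birth_date DATE
);
"""

sqlite3.connect(":memory:").execute(sqlite_ddl)  # the SQLite version runs as-is
```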
Data modelling's first step is normalization; from what I can deduce, normalization is the paring down of raw data until it is lean, mean, and internally consistent enough to be placed into the database you are building according to the demands of the DBMS. Eliminate useless redundancies and similarities (which brings us to the best term of the reading: atomicity) and assign each reduced data set a primary key (see below). From here, the cached webpage became very difficult for me to understand, as it was so referential to the diagrams, which unfortunately were missing. From what I can tell, the next step in normalization requires that each primary key (I'm having trouble with the concept of "concatenated keys") be distinct in itself and not rely on association with any other key; if such an association occurs, creating a new table with all the associated keys is in order. The last step in normalization is identifying non-key attributes as they relate to the key attributes being used in a table. I am also unclear on one-to-many, many-to-one, and many-to-many relationships. Visuals would definitely help.
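To check my own understanding, here is a sketch of my own devising (a book/author example, not from the reading). The flat table crams several authors into one field and repeats the publisher's city on every row; the normalized version keeps each value atomic, states each fact once, and, if I'm reading it right, uses a "concatenated key", a primary key built from more than one column, to express the many-to-many link:

```python
# Normalization in miniature with sqlite3 (my own invented book/author example).
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: "authors" holds several values in one field (not atomic),
# and the publisher's city is repeated on every row (redundancy).
conn.execute("""CREATE TABLE book_flat (
    title TEXT, authors TEXT, publisher TEXT, publisher_city TEXT)""")

# Normalized: atomic values, each fact stored once, keys doing the linking.
conn.executescript("""
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- One-to-many: one publisher, many books, so the "many" side carries the key.
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT,
                     publisher_id INTEGER REFERENCES publisher);

CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);

-- Many-to-many: books have many authors and authors write many books,
-- so the relationship gets its own table with a concatenated (two-column) key.
CREATE TABLE book_author (book_id   INTEGER REFERENCES book,
                          author_id INTEGER REFERENCES author,
                          PRIMARY KEY (book_id, author_id));
""")
conn.close()
```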
Peter Chen's data modeling system begins with a conceptual mapping of the structure; this concerns dividing the data up into tables that will house it, and creating a cartography of the relationships between those tables. After the conceptual framework is set up, the tables are populated with data in a way that conforms to the dictates of the DBMS. In order to define how pieces of data relate to one another, each entity is treated as a noun, described by its attributes; "relationships" are expressed as the connections (verbs) between these nouns. The unique attributes assigned to an entity make up its "primary key". The semantics of the entity-relationship model are cast in grammatical terms, which really helps me keep a handle on the concept. The types of connection between entities and relationships, known as "cardinality constraints", can be represented visually in many ways.
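Here is my attempt to translate that grammar into actual tables, using a small library example of my own rather than anything from Chen: entities become tables, attributes become columns, and the cardinality decides whether a relationship collapses into a foreign key or earns a table of its own:

```python
# Entity-relationship grammar, roughly: entity = noun = table,
# attribute = column, relationship = verb, cardinality = how the keys behave.
# The library schema below is my own invented illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entities (nouns) with their attributes
CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, name TEXT);

-- One-to-many: one branch holds many books, each book sits in one branch,
-- so the "holds" relationship collapses into a foreign key on the many side.
CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT,
                   branch_id INTEGER REFERENCES branch);

-- Many-to-many: a patron borrows many books, and a book is borrowed by many
-- patrons over time, so the "borrows" relationship (verb) gets its own table.
CREATE TABLE borrows (patron_id INTEGER REFERENCES patron,
                      book_id   INTEGER REFERENCES book,
                      borrowed_on TEXT);
""")
conn.close()
```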
Meanwhile, in order to keep a database healthy and happy, it seems that building in redundancy and the ability to restore older versions of a database (what if a migration goes wrong?) are key. Back up your data, folks; that's always been rule number one.
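For what it's worth, even little SQLite has rule number one built in; the file names below are just placeholders of mine:

```python
# sqlite3's backup() copies a live database into another connection,
# and it works even while the source is in use.
import sqlite3

live = sqlite3.connect("catalog.db")           # placeholder name
snapshot = sqlite3.connect("catalog-backup.db")
live.backup(snapshot)                          # page-by-page copy of the whole database
snapshot.close()
live.close()
```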
This is a lot of information, and I look forward to seeing some visual aids!