Historical Perspective and Modern Developments
Databases were originally collections of flat files, manipulated by programs that were typically written in COBOL. Typically, multiple copies of the same data were maintained, each copy sorted in a different way. The invention of the index (a data structure on disk that enabled one to look at the data as though it were sorted on one or more fields) changed all that. A single flat file could have as many indexes (on as many fields) as necessary, making multiple copies unnecessary.
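To make the idea concrete, here is a minimal sketch of an index, assuming a small in-memory list stands in for the flat file. The sample records and the helper names build_index and scan are invented for illustration; a real index would live on disk, usually as a B-tree.

```python
# A "flat file" stored once, in arrival order (hypothetical sample records).
records = [
    {"id": 3, "name": "Abbas", "dept": "Sales"},
    {"id": 1, "name": "Chen", "dept": "Research"},
    {"id": 2, "name": "Baker", "dept": "Sales"},
]

def build_index(rows, field):
    """Return record positions ordered by `field`; the rows are not copied."""
    return sorted(range(len(rows)), key=lambda pos: rows[pos][field])

def scan(rows, index):
    """Yield the rows in index order, as though the file were sorted."""
    for pos in index:
        yield rows[pos]

# Two indexes over the same single copy of the data.
by_id = build_index(records, "id")
by_name = build_index(records, "name")

names_by_id = [row["name"] for row in scan(records, by_id)]
ids_by_name = [row["id"] for row in scan(records, by_name)]
```

Each index is only a list of positions, so adding another one costs far less than keeping another sorted copy of the file.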
Hierarchical Databases (mid 60s)
The first attempt at a database was IBM's IMS, which followed what was termed the hierarchical model. This allowed one-to-many relationships (e.g., one manager, many subordinates) but had a problem with many-to-many relationships. (E.g., if a subordinate reported to more than one manager, you had to duplicate the subordinate's data.)
Having said this, there are certain kinds of data that naturally follow a hierarchical model, notably, documents that are marked up with XML. There are times when you want to manipulate the XML (e.g., performing queries on a single document to extract certain features), and you often want to index the XML content for faster search across documents that are stored in the same database. Most high-end relational databases (see below) now offer XML support to a varying degree.
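As a rough illustration of querying a single XML document, the sketch below uses Python's standard xml.etree.ElementTree module. The document and its element names (report, section, para) are made up for the example; XML support inside a relational database would typically use XPath or XQuery from within SQL instead.

```python
import xml.etree.ElementTree as ET

# A small hierarchical document (hypothetical content).
doc = """
<report>
  <section title="History">
    <para>Flat files came first.</para>
    <para>Indexes removed the need for sorted copies.</para>
  </section>
  <section title="Models">
    <para>Hierarchical, network, relational.</para>
  </section>
</report>
""".strip()

root = ET.fromstring(doc)

# One-to-many: the report has many sections, each section has many paragraphs.
section_titles = [s.get("title") for s in root.findall("section")]

# Extract a feature of the document with a simple path expression.
history_paras = [p.text for p in root.findall("./section[@title='History']/para")]

# Summarize the hierarchy: paragraphs per section.
para_counts = {s.get("title"): len(s.findall("para")) for s in root.iter("section")}
```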
Network Databases (late 60s)
The network model, developed in response to the hierarchical model's difficulty with many-to-many relationships, allowed such relationships directly. Network databases are fast and efficient; they are still used to this day in applications such as satellite communications and airline reservations, where very fast response times are needed. This is partly because they use "hard pointers": references to the contents of one table from another actually refer to the physical address of the relevant data on disk, bypassing the need for indexes.
However, network databases require a lot of programming to use successfully: unlike relational databases, they cannot be used by non-programming power users out of the box. Also, they are inflexible: once the data is organized in a particular way, it is hard to change one's mind. Data reorganization involves the equivalent of major surgery.
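The flavor of "hard pointers" can be sketched in a few lines, assuming in-memory object references stand in for physical disk addresses. The Record class and the link function are hypothetical names used only for this illustration.

```python
class Record:
    """A record that reaches related records by direct reference."""

    def __init__(self, name):
        self.name = name
        self.managers = []      # direct references, standing in for disk addresses
        self.subordinates = []

def link(manager, subordinate):
    """Connect two records in both directions."""
    manager.subordinates.append(subordinate)
    subordinate.managers.append(manager)

alice, bob, carol = Record("Alice"), Record("Bob"), Record("Carol")

# Many-to-many: Carol reports to two managers, yet her data is stored once.
link(alice, carol)
link(bob, carol)

# Navigation follows the references; no lookup by key or index is involved.
carols_managers = [m.name for m in carol.managers]
stored_once = alice.subordinates[0] is bob.subordinates[0]
```

The same sketch hints at the inflexibility: the relationships are wired into the records themselves, so reorganizing the data means rewriting the links.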
Relational Databases (late 60s to present)
Today's de facto standard databases were conceived in the late 60s by Edgar F. Codd at IBM. Codd was a mathematician, and the heavily mathematical tone of his papers was partly responsible for the delayed acceptance of his work. Codd's idea was to define a set of operations that would work on databases the way algebraic operators worked on numbers. The first pilot implementation (System R, built at IBM San Jose in the mid 70s) demonstrated the viability of the concept.
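A toy version of such operations is sketched below, assuming a relation is represented as a list of Python dictionaries. The function names select, project and natural_join follow the usual names of the relational operators, but this is an illustration, not how a database engine implements them.

```python
def select(relation, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Combine rows that agree on every attribute the two relations share."""
    result = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                result.append({**l, **r})
    return result

# Hypothetical sample relations.
employees = [
    {"emp": "Jones", "dept_id": 10},
    {"emp": "Smith", "dept_id": 20},
    {"emp": "Patel", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept": "Sales"},
    {"dept_id": 20, "dept": "Research"},
]

# Operators compose like arithmetic: each takes relations and returns a relation.
sales_staff = project(
    select(natural_join(employees, departments), lambda r: r["dept"] == "Sales"),
    ["emp"],
)
dept_ids = project(employees, ["dept_id"])
```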
In keeping with the ideas of COBOL, it was thought that creating an English-like query language would facilitate the implementation of relational databases. The language created for this purpose was called SQL (Structured Query Language), though it eventually grew in scope to handle other tasks such as editing/modification of data, data definition (e.g., specifying the structure of individual tables) and database security (e.g., specifying the privileges of individual database users for different tables in the database).
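The different roles of SQL can be seen in a short sketch that runs against SQLite through Python's standard sqlite3 module. The employee table and its contents are invented for the example. SQLite has no user accounts, so the security statements appear only in a comment, in the general form used by server databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the structure of a table.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)"
)

# Data modification: insert and update rows.
conn.executemany(
    "INSERT INTO employee (name, salary) VALUES (?, ?)",
    [("Jones", 50000), ("Smith", 62000)],
)
conn.execute("UPDATE employee SET salary = salary + 5000 WHERE name = ?", ("Jones",))

# Query: the original purpose of the language.
well_paid = conn.execute(
    "SELECT name FROM employee WHERE salary >= 60000 ORDER BY name"
).fetchall()
jones_salary = conn.execute(
    "SELECT salary FROM employee WHERE name = ?", ("Jones",)
).fetchone()[0]

# Security (not executed here; SQLite does not implement it). On a server
# database the statement would look roughly like:
#   GRANT SELECT ON employee TO some_user;
```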
Object-Oriented Databases (OODBs) and Object-Relational Databases (late 80s to present)
Object-orientation is a programmer's rather than an end-user's concept. To understand this term (which has been one of the buzzwords in database marketing during the 90s), you must first understand the concept of a class. A class is a "kind of thing" that we want to represent information about: a student, employee, product, etc. To use an over-simplification, it is roughly the equivalent of a table in a relational database, with an important enhancement: the permissible operations on a class ("methods") are also specified along with the class data structure definition. That is, class = data structure + methods. An object is an instance of a class, e.g., Jim Jones is an instance of Employee.
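In code, "class = data structure + methods" might look like the following minimal sketch. The give_raise method and the salary figures are assumptions added for illustration.

```python
class Employee:
    """Data structure (name, salary) plus the operations permitted on it."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary  # whole currency units, to keep the arithmetic exact

    def give_raise(self, percent):
        """A method: a permissible operation defined along with the data."""
        self.salary += self.salary * percent // 100

# An object is an instance of a class.
jim = Employee("Jim Jones", 40000)
jim.give_raise(5)
```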
An Overview of Classes
From the programmer's point of view, classes are a good thing. This is because a well-designed class is reusable: instead of reinventing the wheel, you use other people's work. More important, a properly designed class hides the details of how it works, and lets the user of the class focus on the more important task of getting the job done. All that the user of the class has to know is what the methods of the class are good for, and how to use them. An example of a well-designed set of classes is the set that is bundled with Microsoft Word. While most people use Word only to compose documents, it is also possible to program Word to do pretty interesting and powerful things that would take you a very long time to create if you started from scratch.
For example, there is a method in Word called GetSpellingSuggestions, which takes a series of letters representing a word, and can do one of several things:
It can check that the word is spelled correctly, or if not, return spelling suggestions. "Spelled correctly" only means that the word exists in MS-Word's online dictionary.
It can return the synonyms or antonyms of the word by checking its thesaurus.
It can take a scrambled word and return all anagrams of the word that happen to be words in its dictionary.
The interesting thing about this method is that it takes just a single line of program code to call it: a tremendous amount of work is being done behind the scenes, but the programmer doesn't have to worry about what's happening: only the end-result matters.
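The same principle can be shown with a small stand-in class. This is not Word's API: WordList, is_known and anagrams are hypothetical names, and the five-word dictionary is invented. The point is only that the caller needs one line, while the indexing work stays hidden inside the class.

```python
from collections import defaultdict

class WordList:
    """Hypothetical dictionary component; the caller never sees how it is indexed."""

    def __init__(self, words):
        # Hidden detail: words are grouped by their sorted letters.
        self._by_signature = defaultdict(list)
        for word in words:
            self._by_signature[self._signature(word)].append(word)

    @staticmethod
    def _signature(word):
        return "".join(sorted(word.lower()))

    def is_known(self, word):
        """'Spelled correctly' only means the word exists in this list."""
        return word.lower() in self._by_signature.get(self._signature(word), [])

    def anagrams(self, letters):
        """Return all listed words that use exactly these letters."""
        return sorted(self._by_signature.get(self._signature(letters), []))

dictionary = WordList(["listen", "silent", "enlist", "stone", "notes"])

# The single line the user of the class has to write:
result = dictionary.anagrams("tinsel")
```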
On platforms such as Windows, classes can be designed in such a way that they can be called in a programming language other than that in which they were originally written. For example, though Word is written in a combination of C and assembler, you can call GetSpellingSuggestions from Word's own programming language, Visual Basic for Applications. (For that matter, you could call it from a custom application that you built: such an application could be used as part of a Medical Dictation System, for example, to increase the accuracy of transcription.)
Classes and Databases
Relational databases have traditionally addressed only the data structure aspect of classes. For large and complex systems, bundling the operations along with the data structure enables more robust construction of the system components (because each component can be tested more thoroughly before it is added to the system). Almost all mainstream programming languages (even Visual Basic) have evolved to incorporate object-oriented features, and OODBs have similarly evolved to fill a lacuna.
Unfortunately, many of the early "object-oriented" DBs were little more than repackaged hierarchical or network database engines with a modest programming veneer layered on. Mercifully, market pressures have resulted in considerable attrition. Also, most relational databases (Oracle, Informix) have evolved to add object-oriented features, resulting in the Object-Relational database (ORDB). It appears that all relational systems of the future will in fact be object-relational.
OODBs and ORDBs are a heterogeneous bunch, and the ones that have survived have done so for several reasons:
1. The ability to model and manipulate complex data, such as images, video, fingerprints and so on. The main advantage of OODBs here is that they allow considerably greater control over the storage and manipulation of the contents of a complex data element than most relational systems would. For example, one can think of "methods" for a fingerprint: returning a classification code for a fingerprint of a known or unknown person (so that it may be filed away), and searching the database with the fingerprint of an unknown person to return close matches, which can then be checked by human experts.
2. Support for special operations common to niche markets that are not very well supported by the more mainstream database engines, e.g., support for Computer-Aided Design and Manufacturing (CAD/CAM) and document management. Both of these involve version control.
3. Support for data structures such as arrays, which are useful for waveform data such as EKGs and EEGs.
4. The OODBs with an underlying network architecture use "hard pointers" for speed of retrieval in operations such as telecommunications.
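The fingerprint "methods" mentioned in the first point can be sketched as follows. Everything here is a toy assumption: a real system would derive features from images, whereas this sketch uses short numeric tuples, a made-up classification rule, and plain Euclidean distance for matching.

```python
import math

class Fingerprint:
    """Toy complex datatype: stored data plus the methods that operate on it."""

    def __init__(self, person, features):
        self.person = person      # None when the person is unknown
        self.features = features  # hypothetical numeric feature vector

    def classification_code(self):
        """Hypothetical coarse code: the position of the dominant feature."""
        return max(range(len(self.features)), key=lambda i: self.features[i])

    def distance_to(self, other):
        """Smaller values mean more similar prints."""
        return math.dist(self.features, other.features)

def close_matches(database, unknown, limit=2):
    """Return the nearest filed prints that share the unknown print's code."""
    code = unknown.classification_code()
    candidates = [fp for fp in database if fp.classification_code() == code]
    return sorted(candidates, key=unknown.distance_to)[:limit]

database = [
    Fingerprint("A", (0.9, 0.1, 0.2)),
    Fingerprint("B", (0.8, 0.3, 0.1)),
    Fingerprint("C", (0.1, 0.9, 0.3)),
]
unknown = Fingerprint(None, (0.85, 0.15, 0.2))

# Candidates to hand to a human expert, closest first.
matches = [fp.person for fp in close_matches(database, unknown)]
```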
The pioneering package in this area was Illustra (itself a commercialization of the University of California, Berkeley, POSTGRES project). Illustra was subsequently bought by Informix and incorporated into the Informix Universal Server technology. The essential idea (exemplified by Illustra's marketing slogan, "No BLOBs") is that there should be no difference (with respect to the level of difficulty) between the manipulation of simple datatypes and complex datatypes.
Of course, not everyone has the skills and gumption to program the manipulation of video or sound clips. Therefore, the idea was to allow hardcore developers to create "database extensions". An extension is basically a programming toolkit that has all the functionality for manipulation of a special kind of data (e.g., multimedia, spatial data). Such extensions can then be (and are) made commercially available for less skilled programmers to use.
Oracle, for example, markets extensions for handling spatial data as well as free text data such as bibliographic data. Informix also markets extensions which can handle text in a variety of native word-processing and other formats (e.g., HTML, MS-Word, WordPerfect).
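A very small-scale analogue of an extension is a user-defined function registered with the database, so that it can then be used in ordinary queries. The sketch below does this with SQLite through Python's sqlite3 module; the distance_km function, the clinic table and the coordinates are assumptions for illustration. Commercial extensions go much further, typically bundling new datatypes and index support as well.

```python
import math
import sqlite3

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")

# The "extension": make the spatial function callable from SQL.
conn.create_function("distance_km", 4, distance_km)

conn.execute("CREATE TABLE clinic (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO clinic VALUES (?, ?, ?)",
    [
        ("New Haven", 41.31, -72.92),
        ("Hartford", 41.76, -72.67),
        ("Boston", 42.36, -71.06),
    ],
)

# A less specialized programmer now writes only a query.
nearby = conn.execute(
    "SELECT name FROM clinic WHERE distance_km(lat, lon, ?, ?) < 100 ORDER BY name",
    (41.31, -72.92),
).fetchall()
```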