Smart Computing-Editorial

Reference Series


How Computers Work, Part I
August 2001• Vol.5 Issue 3 Page(s) 108-111 in print issue
Relating To Databases Used Properly, Relational DatabasesAre Powerful Tools For Business

One of the main tools used to control information is the database. Databases arrange information in sets for easy access. With a well-organized database, business can track and obtain reports on invoices, inventory, customers, and employees. For example, you can use a database to store and retrieve telephone numbers for customers.

You find databases outside the computer world, although where you find them won’t immediately come to mind. The contents of a manila folder and a Rolodex work as functional databases. However, the massive storage and tracking capabilities of computers unleashes the true power of the database, particularly relational databases.

On the next few pages we explain relational design principles with the hope that you familiarize yourself with the foundation of a database, including how to design tables for storing data and how to link those tables. Understanding these two points is at the very core of relational database design.

Early relational database design proponents articulated a set of rules for building tables and relating them. The relational database design literature calls the most significant of these rules Normal Forms. We’ll take you through Normal Forms and then introduce you to the different types of relationships for the tables in a database.

We’ll also spell out the basics of SQL (Structured Query Language), a major way to manipulate the contents of relational databases. Even if your relational database manager puts a friendly face on SQL through a graphical user interface, it likely offers a chance to query the database directly through SQL.

Databasics. Database software organizes information about things into tables, records, and fields. A table is a collection of records, and a record is a collection of fields. In our introductory telephone example, we could make a table of customer data—including telephone numbers. Each customer in the table would appear as a record, or row, in the table. The customer’s number will be a field in each record. If a customer has several types of telephone numbers, you might create multiple fields for office, fax, cell, and home telephones.

When people organize data with tables, they use either relational databases or flat database files. The flat-file approach puts all the information into one large table. The relational approach divides data into two or more tables and relates the tables. Novice database users start with flat files because of the ease in putting data into a single table. The relational database rules require segmenting data into multiple tables.

As long as the database remains very small, relational database designs offer no significant advantage. However, as a database grows in terms of the number of records and fields it contains, relational databases lead to faster, easier, and more flexible use of your data.

Failure to apply relational database principles threatens the validity of your database as you (and perhaps others) add, update, and delete records. Nearly all leading database managers, including Microsoft Access, Microsoft SQL Server, Sybase, and Oracle, rely on relational design principles.

Addition Concerns. Putting all your data in one table inevitably creates redundancies. Placing all the data in one table also creates a potential for updating and deletion problems because the table does not recognize the various sources for data in the database application. Following relational database design principles ensures you avoid undesirable effects resulting from faulty design.

To our example: Figure 1 (on the next page) shows a table representing student seminar enrollments. When you fill a database with redundant fields, data entry slows, and it annoys those using a database application when they must re-enter the same fields. Repeating field values also creates opportunities for corrupting a database.

Notice the trainer Charles Hill appears on two rows, but he has different telephone numbers on each row. What if you want to remove student Don Charney from the table? You would have to remove the entire row, that includes course and trainer, in a flat-file format. Since Don Charney is the sole student in VBA Programming, eliminating the student also removes the VBA Programming seminar. But you didn’t want to eliminate the seminar, you just wanted to eliminate that one student. With a relational database and proper table design, you can accomplish your tasks without changing the face of your database’s landscape.

Normal Form Design. Relational database theory has many rules that help with database design, but the first three Normal Form rules for table design are especially popular.

First, Normal Form states that the values in table cells must be atomic, or indivisible. One cell should not contain both a person’s first and last names if there is any chance the database will need to report either name separately.

There are two common violations of First Normal Form. One is to put two distinct fields in the same column. In our Figure 1 example, the Trainer field violates First Normal Form because it contains both first and last names of the trainers. If your database application requires a trainer’s last name alone or last name before first name, database users will have to write some code to process the data in the Trainer column before they can use it. The table removes this difficulty for students by representing first name and last name in separate columns.

A second way to violate First Normal Form is use multiple columns for the same type of data. Novice database designers sometimes use a separate set of columns for each repetition of field values. For example, the table in Figure 1 could have a separate pair of columns (with Cid and Cname) for each seminar. Using this type of design violates First Normal Form. The values in the second and subsequent pairs of columns repeat those in the first pair of columns. This type of table design uses computer storage inefficiently and is slow to process.

Second Normal Form states that all the columns in a table should be about one thing (and the table should be in First Normal Form). Relational databases represent this one thing with the primary key—a column with unique values for each row. The sample table is actually about three entities: students, courses, and trainers. Proper design calls for splitting the sample table into three separate ones. By splitting the one table into three, you gain the ability to drop a sole student taking a course and retain the course in the curriculum.

Third Normal Form requires that all non-key fields in a row be independent of one another (and the table should be in Second Normal Form). For example, you should not have fields for both Order Date and Order Month in the same table. This is because you can derive Order Month from Order Date.

Normalization rules improve your database design, but some situations justify deviation from them. One common scenario involves the inclusion of postal code, state, and city fields in a table. If a table has a postal code column, it does not strictly need city or state columns. This is because these are dependent on ZIP code. However, some smaller database applications may not have the resources to set up and maintain a table with all the equivalencies between postal code, city, and state fields. In addition, reporting city and state fields always requires the joining of a postal code table with the table containing the remainder of the contact data. For a very large table of contact data, this can slow processing.

Oracle, one of the nation's leading database managers, relies on relational design principles. Other top database managers include Microsoft Access, Microsoft SQL Server, and Sybase.

Relationships Between Tables. The first three Normal Form rules specify how to construct individual tables in a database. Subsequent rules deal with how to specify relationships between tables. Three relationships can exist between any pair of tables in a database. They can be in a one-to-one, one-to-many, or many-to-many relationship. When you understand these relationships and specify them properly for the tables in your database, your design will be consistent with relational database principles, including selected higher-level Normal Form rules such as Fourth Normal Form.

One-to-one relationships are uncommon in database design, but they have several distinct advantages. First, a one-to-one keeps your tables in compliance with the First Normal Form and Third Normal Form. You encounter the need for a one-to-one relationship when there are two or more fields that are unique for every row in a table.

Imagine a company policy where each worker can have a company charge card. The employee and charge card tables will be in a one-to-one relationship. Fields appropriate for the employee table, such as picture and department, are not necessary for the charge card table. Likewise, a field for outstanding balance may be appropriate for the charge card table but not the employee table. Link these two tables by placing the primary key from each table in the other table. A primary key from another table is called a foreign key. The foreign keys provide a mechanism for linking the two tables with one another.

The one-to-many relationship is typical in database applications. This relationship is established in the sample table between students and courses. Each student can take more than one course. Similarly, there is a one-to-many relationship between trainers and courses. Each trainer can present more than one course.

As with the one-to-one relationship, the primary key from the table on the one side of the relationship must appear as a foreign key in the other table. After splitting the sample table into three tables, add a copy of the Sid field from the Students table to the Courses table. This makes the Sid field available in the Courses table as a foreign key. It does not serve that role until you actually declare a one-to-many relationship between the Students and Courses tables.

The details for declaring a one-to-many relationship vary according to the specific database package you use. If you use Microsoft SQL Server, open a diagram window. Add the tables for the one and many sides of the relationship. Select the primary key, Sid, from the table on the one side of the relationship. This is the Students table in our example. Drag and drop the Sid field to the foreign key in the many table (Sid in the Courses table). This opens the Create Relationship dialog box. Verify that the primary and foreign keys are correct before clicking OK to create the relationship and close the dialog box.

A many-to-many relationship between tables is appropriate when rows in each table can relate to many instances of rows in the other table. The Students and Trainers tables share a many-to-many relationship. Each student can take courses from more than one trainer. In addition, each trainer can present courses to many students.

This kind of relationship can confuse beginners because it defines a relationship between two tables, but it depends critically on a third table. You might call this third intermediary table a junction box because it is in a one-to-many table with each of the other two tables. In our example, Courses is the junction box table between the Students and Trainers tables.

To complete a many-to-many relationship, add a one-to-many relationship between the Trainers and the Courses tables. Since the Courses table is already in a one-to-many relationship with the Students tables, this completes the many-to-many relationship between the Students and Trainers tables.

Referential integrity is a feature of relational database managers that maintains consistent relationships between the two tables in a one-to-many relationship. It supports three specific outcomes.

First, it does not allow the adding of records to a table on the many side of a relationship if there is no matching primary key on the one side of the relationship. Second, it does not permit a change to a primary key value on the one side of the relationship that results in foreign keys on the many side that have no match. Third, it disallows the deletion of a record on the one side of a relationship if there are any related records on the many side.

Speaking SQL. SQL is the language for representing relational database models. If you work with relational databases long enough, you inevitably encounter it.

SQLcontains three subsets. First, the DML (Data Manipulation Language) enables the processing of existing database objects. A table is an example of a database object. Different database managers may define their objects distinctly, but these objects are the building blocks of a database. Second, the DDL (Data Definition Language) supports the creation of new databases and objects within an existing database. Third, the DCL(Data Control Language) focuses on the administration of permissions and other security-related issues.

Nearly all users of relational databases are likely to encounter DML. This SQL subject includes commands that extract selectable result sets from a database as well as the insertion, update, and deletion of database records. Among the various DML statements, the SELECT statement may offer the greatest variety of options for it permits the specification of a subset from a relational database.

The INSERT, UPDATE, and DELETE statements permit their namesake operation. With the Insert command, you can add one or more rows to a table. The source can be constants or records from another data source (either within or outside of the current database). The UPDATE statement lets you change one or more field values. You can specify the target for the update as fields from one or more tables. The DELETE statement is seductively simple and powerful since it can remove one, a few, or all records of your choosing from a table.

With SELECT statements, you require the SELECT keyword and at least the FROM clause. In the arguments for the SELECT keyword, designate the column names for the fields you want in your return set. The FROM clause arguments designate the tables that contribute values to the SELECT arguments. The SQL statement SELECT au_fname, au_lname FROM authors extracts the first and last name fields from all rows in the authors table for the pubs database that ships with Microsoft and Sybase relational database managers. Use the WHERE clause to select a subset of rows from which to extract results. The WHERE clause also enables joining two or more tables on-the-fly for ad hoc analyses.

Relational Database History. Relational database theory emerged during the 1970s. An article by E. F. Codd in the June 1970 issue of the Communications of the ACM was an early statement of this revolutionary way of representing data on a computer. Before relational databases became popular, it was common for database users to have to know about the physical storage of data on a computer as well as the process they were modeling with data.

Relational database theory disconnected the physical storage of data from how users created and queried databases. The growth of relational databases is a driving factor for the widespread application of database systems in organizations. Today, it is common for database applications to play critical roles in accounting, ordering, manufacturing, human resources, and sales management.

SQL specifically targets tables designed in accordance with relational design principles. This language permits the processing of data in tables without the need for specifying each row within a table. You can designate selecting, inserting, deleting, updating, and other database operations for whole tables. By referencing whole tables instead of the individual rows within a table, SQL simplifies database commands and speeds their operation.

System R was an early predecessor of SQL. Researchers at the IBM San Jose Research facility created System R in the early 1970s to demonstrate its ability to perform common database tasks such as transaction processing. The Multics Relational Data Store appeared circa 1977 as the first commercial database manager. Since then, the ANSI (American National Standards Institute) standardized SQL. Commercial vendors, such as Oracle and Microsoft, augment the ANSI SQL version with custom extensions for their database systems.

Get Organized. Relational database principles represent a powerful, proven technology for modeling data. Its widespread acceptance and durability since the 1970s confirm its power, suitability, and ease of use by many different communities of users. While volumes have been written about the design and use of these powerful tools, we have reviewed just a few of the core concepts here. Consider it a foundation to build upon if your requirements dictate the need for you to drill further down into the topic.

by Rick Dobson

View the graphics that accompany this article.
(NOTE: These pages are PDF (Portable Document Format) files. You will need Adobe Acrobat Reader to view these pages. Download Adobe Acrobat Reader)

Want more information about a topic you found of interest while reading this article? Type a word or phrase that identifies the topic and click "Search" to find relevant articles from within our editorial database.