![]() |
Reference Series |
Table of Contents For This Issue
|
| |
How Computers Work, Part I | |
August 2001 Vol.5 Issue 3 Page(s) 108-111 in print issue | |
Relating To Databases Used Properly, Relational DatabasesAre Powerful Tools For Business |
![]() You find databases outside the computer world, although where you find them won’t immediately come to mind. The contents of a manila folder and a Rolodex work as functional databases. However, the massive storage and tracking capabilities of computers unleashes the true power of the database, particularly relational databases. On the next few pages we explain relational design principles with the hope that you familiarize yourself with the foundation of a database, including how to design tables for storing data and how to link those tables. Understanding these two points is at the very core of relational database design. Early relational database design proponents articulated a set of rules for building tables and relating them. The relational database design literature calls the most significant of these rules Normal Forms. We’ll take you through Normal Forms and then introduce you to the different types of relationships for the tables in a database. We’ll also spell out the basics of SQL (Structured Query Language), a major way to manipulate the contents of relational databases. Even if your relational database manager puts a friendly face on SQL through a graphical user interface, it likely offers a chance to query the database directly through SQL. ![]() When people organize data with tables, they use either relational databases or flat database files. The flat-file approach puts all the information into one large table. The relational approach divides data into two or more tables and relates the tables. Novice database users start with flat files because of the ease in putting data into a single table. The relational database rules require segmenting data into multiple tables. As long as the database remains very small, relational database designs offer no significant advantage. However, as a database grows in terms of the number of records and fields it contains, relational databases lead to faster, easier, and more flexible use of your data. Failure to apply relational database principles threatens the validity of your database as you (and perhaps others) add, update, and delete records. Nearly all leading database managers, including Microsoft Access, Microsoft SQL Server, Sybase, and Oracle, rely on relational design principles. ![]() To our example: Figure 1 (on the next page) shows a table representing student seminar enrollments. When you fill a database with redundant fields, data entry slows, and it annoys those using a database application when they must re-enter the same fields. Repeating field values also creates opportunities for corrupting a database. Notice the trainer Charles Hill appears on two rows, but he has different telephone numbers on each row. What if you want to remove student Don Charney from the table? You would have to remove the entire row, that includes course and trainer, in a flat-file format. Since Don Charney is the sole student in VBA Programming, eliminating the student also removes the VBA Programming seminar. But you didn’t want to eliminate the seminar, you just wanted to eliminate that one student. With a relational database and proper table design, you can accomplish your tasks without changing the face of your database’s landscape. ![]() First, Normal Form states that the values in table cells must be atomic, or indivisible. One cell should not contain both a person’s first and last names if there is any chance the database will need to report either name separately. There are two common violations of First Normal Form. One is to put two distinct fields in the same column. In our Figure 1 example, the Trainer field violates First Normal Form because it contains both first and last names of the trainers. If your database application requires a trainer’s last name alone or last name before first name, database users will have to write some code to process the data in the Trainer column before they can use it. The table removes this difficulty for students by representing first name and last name in separate columns. A second way to violate First Normal Form is use multiple columns for the same type of data. Novice database designers sometimes use a separate set of columns for each repetition of field values. For example, the table in Figure 1 could have a separate pair of columns (with Cid and Cname) for each seminar. Using this type of design violates First Normal Form. The values in the second and subsequent pairs of columns repeat those in the first pair of columns. This type of table design uses computer storage inefficiently and is slow to process. Second Normal Form states that all the columns in a table should be about one thing (and the table should be in First Normal Form). Relational databases represent this one thing with the primary key—a column with unique values for each row. The sample table is actually about three entities: students, courses, and trainers. Proper design calls for splitting the sample table into three separate ones. By splitting the one table into three, you gain the ability to drop a sole student taking a course and retain the course in the curriculum. Third Normal Form requires that all non-key fields in a row be independent of one another (and the table should be in Second Normal Form). For example, you should not have fields for both Order Date and Order Month in the same table. This is because you can derive Order Month from Order Date. Normalization rules improve your database design, but some situations justify deviation from them. One common scenario involves the inclusion of postal code, state, and city fields in a table. If a table has a postal code column, it does not strictly need city or state columns. This is because these are dependent on ZIP code. However, some smaller database applications may not have the resources to set up and maintain a table with all the equivalencies between postal code, city, and state fields. In addition, reporting city and state fields always requires the joining of a postal code table with the table containing the remainder of the contact data. For a very large table of contact data, this can slow processing.
![]() One-to-one relationships are uncommon in database design, but they have several distinct advantages. First, a one-to-one keeps your tables in compliance with the First Normal Form and Third Normal Form. You encounter the need for a one-to-one relationship when there are two or more fields that are unique for every row in a table. Imagine a company policy where each worker can have a company charge card. The employee and charge card tables will be in a one-to-one relationship. Fields appropriate for the employee table, such as picture and department, are not necessary for the charge card table. Likewise, a field for outstanding balance may be appropriate for the charge card table but not the employee table. Link these two tables by placing the primary key from each table in the other table. A primary key from another table is called a foreign key. The foreign keys provide a mechanism for linking the two tables with one another. The one-to-many relationship is typical in database applications. This relationship is established in the sample table between students and courses. Each student can take more than one course. Similarly, there is a one-to-many relationship between trainers and courses. Each trainer can present more than one course. As with the one-to-one relationship, the primary key from the table on the one side of the relationship must appear as a foreign key in the other table. After splitting the sample table into three tables, add a copy of the Sid field from the Students table to the Courses table. This makes the Sid field available in the Courses table as a foreign key. It does not serve that role until you actually declare a one-to-many relationship between the Students and Courses tables. The details for declaring a one-to-many relationship vary according to the specific database package you use. If you use Microsoft SQL Server, open a diagram window. Add the tables for the one and many sides of the relationship. Select the primary key, Sid, from the table on the one side of the relationship. This is the Students table in our example. Drag and drop the Sid field to the foreign key in the many table (Sid in the Courses table). This opens the Create Relationship dialog box. Verify that the primary and foreign keys are correct before clicking OK to create the relationship and close the dialog box. A many-to-many relationship between tables is appropriate when rows in each table can relate to many instances of rows in the other table. The Students and Trainers tables share a many-to-many relationship. Each student can take courses from more than one trainer. In addition, each trainer can present courses to many students. This kind of relationship can confuse beginners because it defines a relationship between two tables, but it depends critically on a third table. You might call this third intermediary table a junction box because it is in a one-to-many table with each of the other two tables. In our example, Courses is the junction box table between the Students and Trainers tables. To complete a many-to-many relationship, add a one-to-many relationship between the Trainers and the Courses tables. Since the Courses table is already in a one-to-many relationship with the Students tables, this completes the many-to-many relationship between the Students and Trainers tables. Referential integrity is a feature of relational database managers that maintains consistent relationships between the two tables in a one-to-many relationship. It supports three specific outcomes. First, it does not allow the adding of records to a table on the many side of a relationship if there is no matching primary key on the one side of the relationship. Second, it does not permit a change to a primary key value on the one side of the relationship that results in foreign keys on the many side that have no match. Third, it disallows the deletion of a record on the one side of a relationship if there are any related records on the many side. ![]() SQLcontains three subsets. First, the DML (Data Manipulation Language) enables the processing of existing database objects. A table is an example of a database object. Different database managers may define their objects distinctly, but these objects are the building blocks of a database. Second, the DDL (Data Definition Language) supports the creation of new databases and objects within an existing database. Third, the DCL(Data Control Language) focuses on the administration of permissions and other security-related issues. Nearly all users of relational databases are likely to encounter DML. This SQL subject includes commands that extract selectable result sets from a database as well as the insertion, update, and deletion of database records. Among the various DML statements, the SELECT statement may offer the greatest variety of options for it permits the specification of a subset from a relational database. The INSERT, UPDATE, and DELETE statements permit their namesake operation. With the Insert command, you can add one or more rows to a table. The source can be constants or records from another data source (either within or outside of the current database). The UPDATE statement lets you change one or more field values. You can specify the target for the update as fields from one or more tables. The DELETE statement is seductively simple and powerful since it can remove one, a few, or all records of your choosing from a table. With SELECT statements, you require the SELECT keyword and at least the FROM clause. In the arguments for the SELECT keyword, designate the column names for the fields you want in your return set. The FROM clause arguments designate the tables that contribute values to the SELECT arguments. The SQL statement SELECT au_fname, au_lname FROM authors extracts the first and last name fields from all rows in the authors table for the pubs database that ships with Microsoft and Sybase relational database managers. Use the WHERE clause to select a subset of rows from which to extract results. The WHERE clause also enables joining two or more tables on-the-fly for ad hoc analyses. ![]() Relational database theory disconnected the physical storage of data from how users created and queried databases. The growth of relational databases is a driving factor for the widespread application of database systems in organizations. Today, it is common for database applications to play critical roles in accounting, ordering, manufacturing, human resources, and sales management. SQL specifically targets tables designed in accordance with relational design principles. This language permits the processing of data in tables without the need for specifying each row within a table. You can designate selecting, inserting, deleting, updating, and other database operations for whole tables. By referencing whole tables instead of the individual rows within a table, SQL simplifies database commands and speeds their operation. System R was an early predecessor of SQL. Researchers at the IBM San Jose Research facility created System R in the early 1970s to demonstrate its ability to perform common database tasks such as transaction processing. The Multics Relational Data Store appeared circa 1977 as the first commercial database manager. Since then, the ANSI (American National Standards Institute) standardized SQL. Commercial vendors, such as Oracle and Microsoft, augment the ANSI SQL version with custom extensions for their database systems. ![]() ![]() by Rick Dobson View the graphics that accompany this article. (NOTE: These pages are PDF (Portable Document Format) files. You will need Adobe Acrobat Reader to view these pages. Download Adobe Acrobat Reader)
|