Access database normalization levels 1, 2, 3; getting normal about it!
The database normalization process was developed by EFCodd, considered the father of relational database theory.
There are several rules that provide theoretical structures and disciplines that are not always practical to follow, but help provide the main objectives which are:
- Eliminate redundant information
- Increase data integrity
- Make systems more efficient
Modern databases should be in BCNF Boyce–Codd normal form which is considered to be in third normal form of which there are considered to be five in total. This article focuses on what I think is considered a good balance for applying some of these rules, and covers up to the third rule of database normalization.
Before first normal form refers to multiple managed tables in smaller units that have a potential relationship and have the following attributes:
- Each table is described as an entity, so an example of an ‘Order Processing’ database might have multiple tables that would be divided into logical units (tables), including Customers and Orders, and would not be stored in one table complete (known as a flat file). ).
- Each record in a table is not duplicated, so a field in a record can be set to unique: primary key.
- The fields (columns) are in no particular order.
- The records (rows) are in no particular order.
The key is the wrench and each table must have a set of primary keys, although this is optional in Access, it is highly recommended that you at least set this type of key for each table you want to relate to.
First normal form (1NF) says that all column values must be atomic. Atomic means indivisible and refers to the row-by-column position where you should have multiple records (or repeating groups) and not all values stored in one record. For example, an order placed for a customer may contain one or more items. You should not store the order in a row for multiple items, as you would have difficulty retrieving information from the record.
By storing each element in a separate record with a common link to the ID field, the principle of First Normal (INF) database normalization is followed. Even if there is no unique ID in the illustration above, you can either add a unique audit trail ID field and set the primary key or consider combining the two fields which will also make it unique i.e. combining the ‘ID of Order’ and ‘Order Item ID fields. This will help write queries later where you can filter by an item and more importantly calculate with the ‘Quantity’ field.
Second normal form (2NF) This follows from the first normal level (1NF) where tables should store data related to only one thing. (entity) and that this entity must be fully described by its primary key.
For example, an ordering database system might contain the following four tables:
- Customers that have a primary key (Customer ID) and the data is related to the customer profile, namely name, address and contact details.
- Orders containing the order header information, including the order reference (as a primary key), date, and shipping information, but not individual item details.
- Order details that has a secondary key (Order ID as defined in the Orders table) and the data is related to the parent order header that shows the items in the order, namely product, price, and quantity.
- Products that have a primary key (Product ID) and the data is related to the product profile, i.e. name, price, and stock levels.
Focusing on the ‘Orders’, ‘Order Details’ and ‘Products’ tables, there is a relationship between them that looks at this level of database normalization where each record is distinct and the data is relative to a single entity. As an order is placed (by the customer), it is assigned a unique order ID which is the primary key. The second table records this unique ID, but it is considered the secondary key of the Order Details table, since it cannot be unique. This is because this order can have multiple items (products) and is stored in the Order Details table along with the price and quantity. Product information comes from the Products table that has a unique value known as the Product ID.
So the ‘Order Detail’ movable table sits between the ‘Order’ and the ‘Product’ so that database normalization allows you to reuse the same order for multiple items and reuse the same product across different orders. and it is only stored as a single record once in the external tables. Having only two or even just one table would denormalize the database and make it very difficult to run reports efficiently.
Note: The only real point to remember when achieving this level of database normalization is how the tables will relate to each other using the simple one to many relationship. One order, many articles and a product used many times. Natural groups of repeating records are the key here, helping to break data sets into smaller, more manageable units.
Third Normal Form (3NF) This is derived from the Second Normal Form of database normalization and deals with the descriptive information of a primary key field.
Using the ‘Customers’ table which has a field ‘Customer ID’ (primary key) has a field called ‘Company Name’ which is the narrative/description that identifies what the customer is known for. I would not store the company name each time in the ‘Orders’ table, thus repeating the data value as the ‘Customer ID’ field controls this. Therefore, if the company changes its name, it is changed once in one place. However, there may be a problem if you need to retain historical records of previous names and an additional move may be required.
The ‘Products’ table also has the same relationship as the ‘Product Name’ field, with the ‘Product ID’ being the unique key and only stored in the ‘Products’ table. This rule also deals with dependency fields that are computed fields. There is no benefit to storing the ‘Order Total’ for a transaction if it can be determined by calculating the ‘Quantity’ times ‘Price’. Store unnecessary extra fields i.e. total order adds extra overhead to a table and can affect performance. Using queries, forms and reports to calculate a new field is the best approach and therefore only raw data must be stored in a table.
Note: Violating this level of database normalization with respect to computed fields can sometimes be a good trade-off between performance and usability.
Having ‘Order Total’ stored as an actual field can help with the query, form and report design process, as it can reduce the need to have multiple tables in a relationship, making development easier. Don’t get too wrapped up in theory. Try and work your way up to a level three database normalization as it makes good sense when designing tables and fields. There are higher levels (four and five) that must have the previous levels applied first as each level continues in the hierarchy. If you think there are deficiencies in the first three levels, move on to the next level (which isn’t covered here, search the web!) Finally, remember the main goal of standardization. I came across an article that I came across many years ago that emphasized these rules that keep the whole theory in perspective…
“The key, all the key and nothing but the key, so help me, Codd.”