After considerable research, it turned out that the contractor’s quality control measures had rejected the drums upon preliminary inspection. Foolishly, they sent them to the dump unused but with their radioactive symbol still prominently displayed causing panic in the neighborhood. Our prospective role was to help them label each drum as soon as it was received using either bar codes or RF Tags so they could track them quickly and easily through their process and to quickly respond with a full history of each drum.
In database design, I too frequently see DB developers make this same mistake; that is, they fail to tag their data or design their schema in a way that facilitates location and retrieval.
In the next few posts, I’ll provide some guidelines that I’ve learned over the past three decades of building and deploying relational databases. As a disclaimer, this treatise is not a database doctrinal statement; but more of a set of guidelines to consider.
One reason the waste contractor had suffered the public-relations nightmare was because he had no quick way of distinguishing one drum from another. To learn the history, he had to query the employees until he found one who know why the drums where there and what their history was. Often, databases have the same problem so developers have to run multiple queries before they can locate the data they’re after.
Guideline One – Every Table Should Have a Primary Key (and that key should be as small as possible and be efficiently sortable).
- A Primary Key (“PK”) is like the barcode identifier on the drums and enables the DB engine to locate a particular record in a given table.
- They are used by indexes to find a particular row and therefore, they must be unique to that column in the table.
- They should be easily and efficiently storable and sortable by the DB engine. This almost always means a single column (even though multiple columns can be joined to form a composite primary key) and almost always means numbers, typically integers.
- GUIDs as keys (even the so called “sequential GUIDs”) are almost never a good idea.
- If you have a simple lookup table, use the smallest integer type you can which will improve query response time. For example, the SQL Server engine can store 8 times as many pointers in the same space when you use a tinyInt over a BigInt (1 byte vs 8 bytes).
- The simplest and often the best PK is to use is an “Identity Key” which the DB will auto-increment for each new record.
Guidelines always have exceptions and here are a couple:
- Since the GUID (see my anecdote below) is globally unique, it is sometimes used in DB design to make sure the values of a table are unique to a table that is distributed. For example, you may have multiple distributed databases and tables holding the same data and use replication to merge the values back later. In this case, the use of a globally unique number can make sure each unique row is replicated to the various copies of the table. As a designer/developer, you should understand though that this design is optimized for replication and not query performance.
- Composite Primary Keys are alright in cross-reference tables. For example, you might have a Patient table and an Encounter table, each with integer primary keys. The cross-reference table, Patient_Encounter might have only two columns that are joined as a composite Primary Key: the PatientId, and EncounterId which would support the one-to-many relationship between Patient and Encounter.
An Anecdote: What’s a GUID?
The formal definition of GUID is that it’s Microsoft’s implementation of the standards based UUID or Universally Unique Identifier. So why does Microsoft call it a Globally Unique Identifier instead of a Universally Unique Id?
Well in fact, Microsoft started out using UUID and still has a Win32 API called UUIDCreate. But the new Star Trek had come out about this time and there was a lively internal and external debate about whether these number were truly “universally” unique or just “globally”. To appease developers both inside and outside the company (who were TOTAL geeks), they changed the moniker to “Globally” Unique ID.
Now, if we can all just agree that it’s pronounced GWID and not GOO’ ID, I’ll be happy.