Posted Jan 31, 2011
Apr 20, 2006 Every RDBMS of which I’m aware offers a feature to make surrogate keys easier by automatically generating the next larger value upon insert. In SQL Server, it’s called an IDENTITY column. In MySQL, it’s called AUTO_INCREMENT. It’s possible to generate the value in SQL, but it’s easier and generally safer to let the RDBMS do it instead.
By Gregory A. Larsen
When designing a database to support applications you need to consider how you are going to handle primary keys. This article explores natural and surrogate keys, and discusses the pros and cons of each, allowing you to determine what makes the best sense in your environment when you are designing your databases.
When designing a database to support applications, you need to consider how you are going to handle primary keys. There are two schools of thought, or maybe three. There are those who say primary keys should always be a made-up key, commonly called a surrogate key. Others say there are good reasons to use real data as a key value; this type of key is known as a natural key. The third group designs databases so that their primary keys are a combination of natural and surrogate keys. In this article, I’m going to explore natural and surrogate keys and discuss the pros and cons of each. This will allow you to determine what makes the best sense in your environment when you are designing your databases.
When you design tables with SQL Server, a table typically has a column or a number of columns that are known as the primary key. The primary key is a unique value that identifies each record. Sometimes the primary key is made up of real data, and these keys are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.
Before I talk about the pros and cons of natural and surrogate keys, let me first expand a little more on each type of key. By doing this you will have a better understanding of each of these two types of keys, and a more solid foundation for determining which type of key you should use in your database design.
A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. When I say “real data” I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record. Here are some examples of natural key values: Social Security Number, ISBN, and TaxId.
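Here is a minimal sketch of a natural key in practice. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the book table, its columns, and the ISBN value are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The ISBN is real-world data, so it doubles as the primary key.
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "INSERT INTO book (isbn, title) VALUES (?, ?)",
    ("978-0-00-000000-0", "Example Title"),  # made-up ISBN
)

# A second row with the same ISBN violates the primary key constraint.
try:
    conn.execute(
        "INSERT INTO book (isbn, title) VALUES (?, ?)",
        ("978-0-00-000000-0", "Another Title"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(duplicate_rejected, row_count)
```

Because the key is the data itself, the database rejects a second record for the same book without any extra constraint.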
A surrogate key, like a natural key, is a column that uniquely identifies a single record in a table. But this is where the similarity stops. Surrogate keys are similar to surrogate mothers. They are keys that don’t have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time right before the record is inserted into a table. It is sometimes also referred to as a dumb key, because there is no meaning associated with the value. Surrogate keys are commonly numeric.
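A generated key of this kind can be sketched as follows. The example again assumes SQLite via Python's sqlite3 as a stand-in; in SQL Server the equivalent column would be declared with the identity property, for example INT IDENTITY(1,1). The customer table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# customer_id carries no business meaning; the engine assigns it on insert.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name        TEXT NOT NULL
    )
    """
)

generated_ids = []
for name in ("Ann", "Bob", "Cy"):
    # The INSERT never mentions customer_id.
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    generated_ids.append(cur.lastrowid)

print(generated_ids)
```

The application never supplies the key value; it only reads back what the database generated.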
Now that you have an understanding of the difference between these two types of keys, I will explore why you might use one key over the other. In the world of data architects, there is much debate over when it is appropriate to use a natural key and when a better solution would be to use a surrogate key. As already stated, there are mainly just two different camps. Some say you should always use a natural key, and others say a surrogate key is best. I suppose there is also a third camp that uses a combination of both natural keys and surrogate keys in their database design. Rather than state my opinion on which is best, I’ll give you the pros and cons of using each and then you can decide which is best for your design.
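The third camp's approach might look something like the sketch below: a generated surrogate key serves as the primary key and join column, while the natural key is kept unique with its own constraint. This assumes SQLite via Python's sqlite3 as a stand-in for SQL Server; the employee and paycheck tables and the tax ID values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        tax_id      TEXT NOT NULL UNIQUE,               -- natural key
        name        TEXT NOT NULL
    );
    CREATE TABLE paycheck (
        paycheck_id INTEGER PRIMARY KEY AUTOINCREMENT,
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        amount      REAL NOT NULL
    );
    """
)

emp_id = conn.execute(
    "INSERT INTO employee (tax_id, name) VALUES (?, ?)", ("00-0000001", "Ann")
).lastrowid
conn.execute(
    "INSERT INTO paycheck (employee_id, amount) VALUES (?, ?)", (emp_id, 1000.0)
)

# Correcting the natural key touches one row; child rows join on employee_id.
conn.execute(
    "UPDATE employee SET tax_id = ? WHERE employee_id = ?", ("00-0000002", emp_id)
)
joined = conn.execute(
    """
    SELECT e.tax_id, p.amount
    FROM paycheck p JOIN employee e ON e.employee_id = p.employee_id
    """
).fetchall()

# The UNIQUE constraint still guards the natural key against duplicates.
try:
    conn.execute(
        "INSERT INTO employee (tax_id, name) VALUES (?, ?)", ("00-0000002", "Bob")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(joined, duplicate_rejected)
```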
A definite design and programming aspect of working with databases is built on the concept that all keys will be supported by the use of surrogate keys. To understand these programming aspects better, review these pros and cons of using surrogate keys.
Pros:
Cons:
Having natural keys as indexes on your tables means you will have different programming considerations when building your applications. You will find the pros and cons for natural keys to be just the opposite of the pros and cons for surrogate keys.
Pros:
Cons:
There is much debate in the world of data modeling over what kind of data should be used to support primary keys. There are some purists who say all primary keys should be surrogate keys, no matter how small the natural key is, and even if the natural key will never be updated. Others say you need to use natural keys because they make coding your application so much easier. When you design your databases, you need to decide what works best in your environment. What kind of database designer are you, and into which design camp do you fall?
As those of you who watched my recent webinar Data Modeling Fundamentals With Sisense ElastiCube might recall, a primary key is a unique identifier given to a record in our database, which we can use when querying the database or in order to join multiple sources. This article will discuss the concept of surrogate keys and show some examples of when and how to apply them using simple SQL.
Before we dive into natural vs. surrogate keys, let’s recall four important rules to follow when selecting a primary key for your data model:
First, let’s go over the difference between these two forms of primary keys:
A natural key is a key that has contextual or business meaning (for example, in a table containing STORE, SALES, and DATE, we might use the DATE field as a natural key when joining with another table detailing inventory).
A natural key can be system-generated, but natural keys are at least partially determined by a manual process. Some natural keys are totally manually generated. One of the most widely recognized uses of a natural key is a stock ticker symbol, e.g. MSFT, AAPL, and GOOGL. Natural keys make great primary keys when contextual meaning is important.
A surrogate key is a key which does not have any contextual or business meaning. It is manufactured “artificially” and only for the purposes of data analysis. The most frequently used version of a surrogate key is an increasing sequential integer or “counter” value (e.g. 1, 2, 3). Surrogate keys can also include the current system date/time stamp, or a random alphanumeric string.
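The three forms of surrogate key mentioned above can be sketched in a few lines of Python. The function names are hypothetical, and the timestamp format and random-string length are arbitrary choices made for this illustration.

```python
import itertools
import secrets
from datetime import datetime, timezone

# 1. Sequential counter: 1, 2, 3, ...
_counter = itertools.count(start=1)

def next_counter_key() -> int:
    return next(_counter)

# 2. System date/time stamp, to the microsecond. Two keys requested in the
#    same microsecond would collide, so this form needs care under load.
def timestamp_key() -> str:
    return datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")

# 3. Random alphanumeric string (hexadecimal characters here).
def random_key(n_bytes: int = 8) -> str:
    return secrets.token_hex(n_bytes)

counter_keys = [next_counter_key() for _ in range(3)]
print(counter_keys, timestamp_key(), random_key())
```

None of these values says anything about the record it identifies, which is exactly what makes them surrogate keys.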
The main advantage of natural keys is in their simplicity and in the fact that the data maintains its original context. They will often be (relatively) easy to recognize to people viewing the data, and relying on natural keys reduces the need to enrich the data using custom SQL. Additionally:
Even though all three records contain a sequential ID of 123, the natural key prefix allows the user to immediately identify different data types.
To create
While it might be tempting and initially easier to rely on existing natural keys, this could prove problematic when scaling the data model, or in a more complex environment, which we will demonstrate using an example of stock tickers:
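One way such a problem can surface is when a ticker symbol changes. The sketch below is only an illustration under stated assumptions: the tickers OLDT and NEWT, the company and trade tables, and the row values are all hypothetical, and SQLite via Python's sqlite3 stands in for the actual data source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (ticker TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE trade   (ticker TEXT NOT NULL, shares INTEGER NOT NULL);
    INSERT INTO company VALUES ('OLDT', 'Example Corp');
    INSERT INTO trade   VALUES ('OLDT', 10), ('OLDT', 25);
    """
)

JOIN_COUNT = "SELECT COUNT(*) FROM trade t JOIN company c ON c.ticker = t.ticker"

# The company changes its symbol, so the natural key changes with it.
conn.execute("UPDATE company SET ticker = 'NEWT' WHERE ticker = 'OLDT'")

# Historical trades still carry the old symbol and no longer join.
matched_before_fix = conn.execute(JOIN_COUNT).fetchone()[0]

# Every referencing row has to be rewritten to restore the relationship.
rows_rewritten = conn.execute(
    "UPDATE trade SET ticker = 'NEWT' WHERE ticker = 'OLDT'"
).rowcount
matched_after_fix = conn.execute(JOIN_COUNT).fetchone()[0]

print(matched_before_fix, rows_rewritten, matched_after_fix)
```

With a surrogate key on the company table, the symbol would be an ordinary attribute and only a single row would need to change.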
As mentioned, a surrogate key sacrifices some of the original context of the data. However, it can be extremely useful for analytical purposes for the following reasons:
Certain business scenarios might require keeping the natural key intact as a means for users to interact with the database. In these cases …
In the following example, we will look at a table containing historical data about product prices. By using a custom SQL expression in the Sisense ElastiCube Manager, we create the surrogate key Prod_Date_Key, which in this case is created by combining the other fields into a single, unique identifier that can easily be queried later.
SELECT DISTINCT
    tostring(ProductID)+'_'+tostring(getyear(Date))+'-'+tostring(getmonth(Date))+'-'+tostring(getday(Date)) AS Prod_Date_Key,
    Date,
    PH.ProductID,
    PH.ListPrice
FROM [ProductListPriceHistory] PH JOIN [AllDates] ON Date between PH.StartDate AND PH.EndDate
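To see what this query produces, here is a rough Python rendering of the same logic. It assumes that AllDates holds one row per calendar day and that tostring writes months and days without zero padding; the sample price history row is invented, and the function names are hypothetical.

```python
from datetime import date, timedelta

def prod_date_key(product_id: int, day: date) -> str:
    # Mirrors ProductID + '_' + year + '-' + month + '-' + day.
    return f"{product_id}_{day.year}-{day.month}-{day.day}"

def expand_price_history(history):
    """Emit one row per product per day within each StartDate..EndDate range."""
    rows = []
    for product_id, start, end, list_price in history:
        day = start
        while day <= end:  # BETWEEN is inclusive on both ends
            rows.append((prod_date_key(product_id, day), day, product_id, list_price))
            day += timedelta(days=1)
    return rows

# Made-up sample: one product at one price over three days.
history = [(707, date(2011, 5, 30), date(2011, 6, 1), 33.64)]
rows = expand_price_history(history)
keys = [row[0] for row in rows]
print(keys)
```

Each resulting key identifies exactly one product on one day, so a single equality comparison is enough to look up a price.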