Mongo Sez Technology Blog: Prevent @@IDENTITY Theft

I've been working on a database design that will be importing multiple feeds and exporting data thru SSIS, and have created various audit fields in my tables to link a record to a specific input file. The overall SSIS solution I've chosen is a topic for another post, but while figuring this out, I ran into a common operation that a developer runs into: returning the identity value of the last inserted record.

Why am I writing about this? For starters, when I began thinking about this, I wanted to refresh my memory on what my options were, and share my findings and thoughts. Secondly, it is imperative that the most applicable method is chosen. If not, the risk of data compromise increases greatly and I wanted to describe my situation and brief "catch-all" situation we might encounter in our day-to-day design and development.

Meat & Potatoes

I already know from my requirements what the frequency of the reports would be and when they would be delivered to us. Therefore, I know I can schedule each import task to run and not compete with any other import; thus, guaranteeing that my last inserted id in the audit table will be unique to that feed.

However, what happens if you cannot guarantee that each insert will be unique to that specific operation - such as a web application with concurrent users or a service that listens for input files coming in at various times or concurrently?

So which method do I choose or which method should you choose? Let's take a look...

For those that are familiar there are three different ways to achieve the last inserted id for an identity column in SQL Server:

The links will take you directly to the SQL Server 2008 Books Online reference in MSDN. I will not cover deep dive details, just a high level understanding with small examples.

@@IDENTITY

"@@IDENTITY returns the last identity value generated for any table in the current session, across all scopes."

Bad news bears. This pretty much says, give me the last identity that was created by anyone and anywhere. I know that I don't need the last identity generated. In my case that will return me the id of the last record I inserted into my data table and not my audit table. In a concurrent user environment this could return you the id from a unrelated table that had a record added by a completely separate user.

To me, this is the least desirable method and runs a much higher risk of compromising data leading to my trademark, "Things That Go FAIL."

SCOPE_IDENTITY()

"SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope."

Ok. This doesn't help me, but may help you. We can now say that I will be returned the last id value for that specific process. Again, in my case this still doesn't work. I will still get the last id from the data table and not the audit table.

In a concurrent application, this will at least provide you with the last id from the current process. However, what if you are doing multiple inserts in one stored procedure? You would have to consider your order of operations to ensure that you are returning the id value you need. This may not be possible because of foreign key relationships and how the inserts will cascade. Or, following the operation you need the id value you can set the output param of the stored procedure to this. Better, but not great.

IDENT_CURRENT('table_name')

"IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope."

I like the sound of this. Soup to nuts: give me the last id value for the table I specify that was created by anyone. Pretty much a SELECT MAX(id_field) of the specified table.

I've finally reached the potato salad in the buffet line. Given my requirements and design I can now update my inserted record with the appropriate id from my audit table.

How does this help in concurrent applications? It really doesn't. You still assume the risk that another user has updated the table after you. I would stick with a SCOPE_IDENTITY solution while being aware of order of operations and/or assignment of the output param following the given insert operation.

Or we can go ahead a use a GUIDs to ensure that we have an impenetrable unique id for every insert operation, and pass those around using parameters.

Things That Go FAIL

All three of these methods can go FAIL. Understanding how each method operates and what you're data means and is supposed to do will lead to a well informed decision.

@@IDENTITY - One simply cannot guarantee that the last id returned is from my specific operation on table it was performed on.
SCOPE_IDENTITY - If not used properly.
IDENT_CURRENT('table_name') - If not used properly.

Mongo Sez

Read up on the three techniques in SQL Server Books Online to gain an understanding of what each one does.
I did not cover the implications of these methods in a transactional or replicated environment. There are other considerations of using identity values in this circumstance.
Understanding your data and how it's being used is imperative - as in any database design. A full understanding will prepare you to make a well informed decision on your approach. Proper design not only helps your data become information, but can greatly impact those down the line from you: other database developers working with your design, application developers consuming your database, and your end users.
I <3 potato salad.

Questions, comments, am I completely out-of-my-mind? Let me know.

Mongo Sez Technology Blog

Tuesday, May 18, 2010

Prevent @@IDENTITY Theft

No comments:

Post a Comment

Who 'dis

Blog Archive

Followers