May 25, 2011

Delete Duplicate Records in Sql Server 2005

In this post, we will learn how to delete duplicate records in SQL Server using CTE.

In the earlier version of SQL Server, if you need to delete duplicate records you had to the following steps
  • Store the distinct of duplicate records in some temp tables
  • Delete the duplicate records from the main table
  • Move the temp table data to main table
But, from SQL Sever 2005 onward, it becomes easy and more efficient to delete duplicate records from the table using CTE and Row_Number.

NOTE: CTE was introduced in SQL Server 2005, so the solution would work on SQL Server 2005 onward

Let's first create a table
CREATE TABLE tblduplicate(
    id       INT,
    VALUE    VARCHAR(10)
)

Now let's populate the table with duplicates records
insert into tblduplicate
select 1, 'value 1' union all
select 1, 'value 1' union all
select 1, 'value 2' union all
select 2, 'value 2' union all
select 2, 'value 2' union all
select 3, 'value 3'

Now let's check the data in the table
select * from tblduplicate
OUTPUT

Now lets write script to delete duplicates records from the table using CTE. In the solution first we would generate a sequential no. for duplicate records using Row_Number.

Suggested article to understand Row_Number in detail: Row_Number in SQL Server

Once row number is generated we would delete all the records where row number is greater than 1 as there are duplicate records.
;with cte AS (
    SELECT  id, VALUE
            , Row_number() OVER (PARTITION BY id, VALUE ORDER BY id) AS rowno
    FROM    tblduplicate
)
DELETE cte WHERE rowno > 1
Now let's again check the data in the table
select * from tblduplicate
OUTPUT

Check the output. Duplicate records are deleted from the table