SQL Points: Delete Duplicate

This blog will demonstrats how to delete duplicate rows by using Row_Number() function in sql server.
there is a Student table which have few duplcate records (highlighted with yellow color)

ID	Code	Address	Created Date
1	CN01	Memphis, TN	7/1/2017
2	CN02	Nashville, TN	7/9/2017
3	CN03	Cordova, TN	7/10/2017
4	CN01	Memphis, TN	7/11/2017

First we need to decide, which row we are going to keep and which one we are going to deleted.
In below example we are going to keep the oldest record.

Step 1 : Get duplicate student records - see sub query which is grouping by code and selecting only those code which are having more than 1 records

SQL :

Select Code, count(Code) [Ct] from vr.Student group by Code

having count(Code) > 1

Step 2: Generating Row Number by using ROW_NUMBER() function for each duplcate records

SQL :

With CTE AS

(

Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from

(

Select Code, count(Code) [Ct] from dbo.Student group by Code

having count(Code) > 1

) y

)

Select * from CTE

Output :

ID	Code	Address	Created Date	RowRank
1	CN01	Memphis, TN	7/1/2017	1
4	CN01	Memphis, TN	7/11/2017	2

in above output, we can see the oldest duplcate row has low rowrank and now we can delete records exception the oldest record (row rank = 1)

With CTE AS

(

Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from

(

Select Code, count(Code) [Ct] from dbo.Student group by Code

having count(Code) > 1

)

)

Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)

SQL Points

Sunday, January 20, 2019

SQL : Delete duplicate row

Contributors