This blog will demonstrats how to delete duplicate rows by using Row_Number() function in sql server.
there is a Student table which have few duplcate records (highlighted with yellow color)
there is a Student table which have few duplcate records (highlighted with yellow color)
ID
|
Code
|
Address
|
Created Date
|
1
|
CN01
|
Memphis, TN
|
7/1/2017
|
2
|
CN02
|
Nashville, TN
|
7/9/2017
|
3
|
CN03
|
Cordova, TN
|
7/10/2017
|
4
|
CN01
|
Memphis, TN
|
7/11/2017
|
First we need to decide, which row we are going to keep and which one we are going to deleted.
In below example we are going to keep the oldest record.
In below example we are going to keep the oldest record.
Step 1 : Get duplicate student records - see sub query which is grouping by code and selecting only those code which are having more than 1 records
SQL :
Select Code, count(Code) [Ct] from vr.Student group by Code
having count(Code) > 1
Step 2: Generating Row Number by using ROW_NUMBER() function for each duplcate records
SQL :
With CTE AS
(
Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
) y
)
Select * from CTE
Output :
ID
|
Code
|
Address
|
Created Date
|
RowRank
|
1
|
CN01
|
Memphis, TN
|
7/1/2017
|
1
|
4
|
CN01
|
Memphis, TN
|
7/11/2017
|
2
|
in above output, we can see the oldest duplcate row has low rowrank and now we can delete records exception the oldest record (row rank = 1)
With CTE AS
(
Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
)
)
)
Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)
Other Links:
SQL Pivot Table
Thanks for Visiting!!