Showing posts with label Delete Duplicate. Show all posts
Showing posts with label Delete Duplicate. Show all posts

Sunday, January 20, 2019

SQL : Delete duplicate row

This blog will demonstrats how to delete duplicate rows by using Row_Number() function in sql server.
there is a Student table which have few duplcate records (highlighted with yellow color)

ID
Code
Address
Created Date
1
CN01
Memphis, TN
7/1/2017
2
CN02
Nashville, TN
7/9/2017
3
CN03
Cordova, TN
7/10/2017
4
CN01
Memphis, TN
7/11/2017


First we need to decide, which row we are going to keep and which one we are going to deleted.
In below example we are going to keep the oldest record.



Step 1 : Get  duplicate student records - see sub query which is grouping by code and selecting only those code which are having more than 1 records

SQL :
Select Code, count(Code) [Ct] from vr.Student group by Code
having count(Code) > 1

Step 2: Generating Row Number by using ROW_NUMBER() function for each duplcate records

SQL :

With CTE AS
(
Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from 
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
) y
)

Select * from CTE

Output : 

ID
Code
Address
Created Date
RowRank 
1
CN01
Memphis, TN
7/1/2017
1
4
CN01
Memphis, TN
7/11/2017
2

in above output, we can see the oldest duplcate row has low rowrank and now we can delete records exception the oldest record (row rank = 1)



With CTE AS
(
  Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank  from 
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
) 

)

Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)


Other Links:
SQL Pivot Table


Thanks for Visiting!!