Sunday, January 20, 2019

SQL : ROW_NUMBER (), RANK () and DENSE_RANK ()

In this blog, we will discuss about ROW_NUMBER (), RANK () and DENSE_RANK () sql function and these sql function is being used to generate RowID for given result set. But these function calculate rowID in different way.

Table:  EmployeeCourse

Select * From EmployeeCourse

Output:  

Employee ID
Course ID
1
1
1
3
2
2
2
1
3
3
3
1


Row_NUMBER ():
This function assign unique ID for each row of result with help of Order by Column.

Select *, Row_Number() Over (Order by EmployeeID)  as [Row Number] From EmployeeCourse

Output:

Employee ID
Course ID
Row Number
1
1
1
1
3
2
2
2
3
2
1
4
3
3
5
3
1
6

RANK() :
This function assigns a unique number for each row and for those rows, which have duplicate value, assign the same ranking. There would be gap in the sequence in case of duplicate value.

Select *, Rank() Over (Order by EmployeeID)  as [Rank] From EmployeeCourse

Output: 
Employee ID
Course ID
Rank
1
1
1
1
3
1
2
2
3
2
1
3
3
3
5
3
1
5

DENSE_RANK() :
This function assigns a unique number for each row and in case of duplicate value, it assigns the same rank but don’t leave any GAP in sequence.

Select *, DENSE_RANK() Over (Order by EmployeeID)  as [Dense Rank]  From EmployeeCourse

Output:

Employee ID
Course ID
Dense Rank
1
1
1
1
3
1
2
2
2
2
1
2
3
3
3
3
1
3

Thanks for visiting!!

SQL : Delete duplicate row

This blog will demonstrats how to delete duplicate rows by using Row_Number() function in sql server.
there is a Student table which have few duplcate records (highlighted with yellow color)

ID
Code
Address
Created Date
1
CN01
Memphis, TN
7/1/2017
2
CN02
Nashville, TN
7/9/2017
3
CN03
Cordova, TN
7/10/2017
4
CN01
Memphis, TN
7/11/2017


First we need to decide, which row we are going to keep and which one we are going to deleted.
In below example we are going to keep the oldest record.



Step 1 : Get  duplicate student records - see sub query which is grouping by code and selecting only those code which are having more than 1 records

SQL :
Select Code, count(Code) [Ct] from vr.Student group by Code
having count(Code) > 1

Step 2: Generating Row Number by using ROW_NUMBER() function for each duplcate records

SQL :

With CTE AS
(
Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from 
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
) y
)

Select * from CTE

Output : 

ID
Code
Address
Created Date
RowRank 
1
CN01
Memphis, TN
7/1/2017
1
4
CN01
Memphis, TN
7/11/2017
2

in above output, we can see the oldest duplcate row has low rowrank and now we can delete records exception the oldest record (row rank = 1)



With CTE AS
(
  Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank  from 
(
Select Code, count(Code) [Ct] from dbo.Student group by Code
having count(Code) > 1
) 

)

Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)


Other Links:
SQL Pivot Table


Thanks for Visiting!!

SQL: Clustered, Non-Clustered and Index on Multiple Columns

This blog will demonstrates how to improve the SQL query performance by adding index on SQL tables and will be discuss the scenarios where we should go with clustered , non-clustered or single index on multiple columns.

Clustered Index should be created on Column which uniquely identify to each row of tables and it defines the physical order of table records and then generally table’s primary key should be have clustered index. Now when you are going to create other index (Non-clustered) which generally help SQL engine to create execution plan and quickly filter the records.
If you creates non-clustered index in very smartly way, then it will be more efficient and more reusable.


Here is an example - an order table which contain company huge order records

Table: Order

Table Columns:

OrderID
PK
Primary Key
OrderDate


OrderNumber


State


City


Zip



On Order table, mostly we make search by OrderNumber and also search by location like State, City and Zip.

In consideration of uniqueness of record, you create clustered index on OrderID which is primary key of table and one more index (non-clustered) you can create on Order Number.
If you are looking search by location query, we are considering three columns (State, City and Zip)

Then mostly time we are searching order by State, City and zip combination
And sometime by state and city and sometime by zip code only.

If you are going to have single index which covers all columns (state, city and zip) then it will helps in all scenarios except search by zip; so in this case we need to have one more index on zip.

So after considering all above scenarios, table should have below indexes

Index Name
Type of Index
Columns
Inx_OrderOrderID
Clustered Index
OrderID
Inx_OrderOrderNumber
Non-Clustred Index
OrderNumber
Inx_OrderStateCityZip
Non-Clustred Index
State, City, ZIP
Inx_OrderZip
Non-Clustred Index
ZIP


Thanks for visiting!!

Thursday, January 17, 2019

SQL Pivot function

SQL Pivot function helps us to combine and compare of larger amount of data and it transfer the data from rows into columns and it will helpful in writing for cross-tabulation output.

Syntax : 

SELECT
Non-PIVOT AND PIVOT COLUMNS
FROM
    (
        PIVOT DATA  eg. Select Query 
    )
    AS   
PIVOT
(
   aggregation function eg. SUM, Avg or Max 

FOR
[]   
    IN ( SETS OF PIVOT COLUMNS)
) AS   
;


Example :

EMP  Table :
It contains employee records
                                                                 
EMP_ID
EMP_NAME
 0001
SLOOPY JOE
0002
BILL GATES        
0003
FRED FLINTSTONE

TASK Table :
It is Task Lookup table

TASK_CD
TASK_DESC
A
ADMIN
D
DESIGN        
C
CODING
F
FUNCTION TEST
Q
QA

EMPLOYEEHOURS  :

This table contains daily employee hours as task wise
  
DATE
EMP_ID
TASK_CD
HOURS
2013-04-01        
0001
D
10
2013-04-02
0001
C
8
2013-04-03
0001
C
8
2013-04-04
0001
F
8
2013-04-05
0001
C
8
2013-04-07
0001
Q
8
2013-04-01        
0002
C
10
2013-04-02
0002
C
8
2013-04-03
0002
C
8
2013-04-04
0002
C
8
2013-04-05
0002
Q
8
2013-04-07
0002
C
8
2013-04-08
0002
C
8
2013-04-01
0003
Q
10



Getting hours information as task wise for each employee

EMP_ID
NAME
DESIGN
CODING
FUNCTIONAL_TEST
QA
0002
BILL GATES              
0
50
0
8
0003
FRED FLINTSTONE         
0
0
0
10
0001
SLOOPY JOE              
10
24
8
8


Here is SQL Pivot Script to calcuate each employee 's  task hours for Design, Coding , functional Testing and QA and generate above tabluar report

Select 
EMP_ID, 
EMP_NAME, 
ISNULL ([DESIGN],0) [DESIGN],
ISNULL([CODING],0) [CODING],
ISNULL([FUNCTION TEST],0[FUNCTION TEST],
ISNULL([QA],0)[QA]
FROM
(
    SELECT 
    e.EMP_ID, 
    e.EMP_NAME, 
    t.TASK_DESC,
    h.HOURS
    FROM EMPLOYEEHOURS h
    inner join EMP e on h.EMP_ID  = e.EMP_ID
    inner join TASK t on h.TASK_CD = t.TASK_CD
) p
PIVOT
(
   sum (HOURS)
   FOR TASK_DESC IN
   ([DESIGN],[CODING],[FUNCTION TEST],[QA])
) AS pvt

ORDER BY 2;

Thanks for Visiting!!

Other SQL Links :