SQL Points

Sunday, January 20, 2019

SQL : ROW_NUMBER (), RANK () and DENSE_RANK ()

In this blog, we will discuss about ROW_NUMBER (), RANK () and DENSE_RANK () sql function and these sql function is being used to generate RowID for given result set. But these function calculate rowID in different way.

Table: EmployeeCourse

Select * From EmployeeCourse

Output:

Employee ID	Course ID
1	1
1	3
2	2
2	1
3	3
3	1

Row_NUMBER ():

This function assign unique ID for each row of result with help of Order by Column.

Select *, Row_Number() Over (Order by EmployeeID) as [Row Number] From EmployeeCourse

Output:

Employee ID	Course ID	Row Number
1	1	1
1	3	2
2	2	3
2	1	4
3	3	5
3	1	6

RANK() :

This function assigns a unique number for each row and for those rows, which have duplicate value, assign the same ranking. There would be gap in the sequence in case of duplicate value.

Select *, Rank() Over (Order by EmployeeID) as [Rank] From EmployeeCourse

Output:

Employee ID	Course ID	Rank
1	1	1
1	3	1
2	2	3
2	1	3
3	3	5
3	1	5

DENSE_RANK() :

This function assigns a unique number for each row and in case of duplicate value, it assigns the same rank but don’t leave any GAP in sequence.

Select *, DENSE_RANK() Over (Order by EmployeeID) as [Dense Rank] From EmployeeCourse

Output:

Employee ID	Course ID	Dense Rank
1	1	1
1	3	1
2	2	2
2	1	2
3	3	3
3	1	3

Thanks for visiting!!

SQL : Delete duplicate row

This blog will demonstrats how to delete duplicate rows by using Row_Number() function in sql server.
there is a Student table which have few duplcate records (highlighted with yellow color)

ID	Code	Address	Created Date
1	CN01	Memphis, TN	7/1/2017
2	CN02	Nashville, TN	7/9/2017
3	CN03	Cordova, TN	7/10/2017
4	CN01	Memphis, TN	7/11/2017

First we need to decide, which row we are going to keep and which one we are going to deleted.
In below example we are going to keep the oldest record.

Step 1 : Get duplicate student records - see sub query which is grouping by code and selecting only those code which are having more than 1 records

SQL :

Select Code, count(Code) [Ct] from vr.Student group by Code

having count(Code) > 1

Step 2: Generating Row Number by using ROW_NUMBER() function for each duplcate records

SQL :

With CTE AS

(

Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from

(

Select Code, count(Code) [Ct] from dbo.Student group by Code

having count(Code) > 1

) y

)

Select * from CTE

Output :

ID	Code	Address	Created Date	RowRank
1	CN01	Memphis, TN	7/1/2017	1
4	CN01	Memphis, TN	7/11/2017	2

in above output, we can see the oldest duplcate row has low rowrank and now we can delete records exception the oldest record (row rank = 1)

With CTE AS

(

Select y.* , ROW_NUMBER() OVER(PARTITION BY y.Code ORDER BY y.Code, y.ID) AS RowRank from

(

Select Code, count(Code) [Ct] from dbo.Student group by Code

having count(Code) > 1

)

)

Delete from dbo.Student where ID IN (Select ID from CTE where CTE.RowRank <> 1)

SQL: Clustered, Non-Clustered and Index on Multiple Columns

This blog will demonstrates how to improve the SQL query performance by adding index on SQL tables and will be discuss the scenarios where we should go with clustered , non-clustered or single index on multiple columns.

Clustered Index should be created on Column which uniquely identify to each row of tables and it defines the physical order of table records and then generally table’s primary key should be have clustered index. Now when you are going to create other index (Non-clustered) which generally help SQL engine to create execution plan and quickly filter the records.

If you creates non-clustered index in very smartly way, then it will be more efficient and more reusable.

Here is an example - an order table which contain company huge order records

Table: Order

Table Columns:

OrderID	PK	Primary Key
OrderDate
OrderNumber
State
City
Zip

On Order table, mostly we make search by OrderNumber and also search by location like State, City and Zip.

In consideration of uniqueness of record, you create clustered index on OrderID which is primary key of table and one more index (non-clustered) you can create on Order Number.

If you are looking search by location query, we are considering three columns (State, City and Zip)

Then mostly time we are searching order by State, City and zip combination

And sometime by state and city and sometime by zip code only.

If you are going to have single index which covers all columns (state, city and zip) then it will helps in all scenarios except search by zip; so in this case we need to have one more index on zip.

So after considering all above scenarios, table should have below indexes

Index Name	Type of Index	Columns
Inx_OrderOrderID	Clustered Index	OrderID
Inx_OrderOrderNumber	Non-Clustred Index	OrderNumber
Inx_OrderStateCityZip	Non-Clustred Index	State, City, ZIP
Inx_OrderZip	Non-Clustred Index	ZIP

Thanks for visiting!!

Thursday, January 17, 2019

SQL Pivot function

SQL Pivot function helps us to combine and compare of larger amount of data and it transfer the data from rows into columns and it will helpful in writing for cross-tabulation output.

Syntax :

SELECT
Non-PIVOT AND PIVOT COLUMNS
FROM
(
  PIVOT DATA eg. Select Query
)
AS
PIVOT
(
   aggregation function eg. SUM, Avg or Max

FOR
[]
IN ( SETS OF PIVOT COLUMNS)
) AS
;

Example :

EMP Table :
It contains employee records

EMP_ID	EMP_NAME
0001	SLOOPY JOE
0002	BILL GATES
0003	FRED FLINTSTONE

TASK Table :
It is Task Lookup table

TASK_CD	TASK_DESC
A	ADMIN
D	DESIGN
C	CODING
F	FUNCTION TEST
Q	QA

EMPLOYEEHOURS :

This table contains daily employee hours as task wise

DATE	EMP_ID	TASK_CD	HOURS
2013-04-01	0001	D	10
2013-04-02	0001	C	8
2013-04-03	0001	C	8
2013-04-04	0001	F	8
2013-04-05	0001	C	8
2013-04-07	0001	Q	8
2013-04-01	0002	C	10
2013-04-02	0002	C	8
2013-04-03	0002	C	8
2013-04-04	0002	C	8
2013-04-05	0002	Q	8
2013-04-07	0002	C	8
2013-04-08	0002	C	8
2013-04-01	0003	Q	10

Getting hours information as task wise for each employee

EMP_ID	NAME	DESIGN	CODING	FUNCTIONAL_TEST	QA
0002	BILL GATES	0	50	0	8
0003	FRED FLINTSTONE	0	0	0	10
0001	SLOOPY JOE	10	24	8	8

Here is SQL Pivot Script to calcuate each employee 's task hours for Design, Coding , functional Testing and QA and generate above tabluar report

Select
EMP_ID,
EMP_NAME,
ISNULL ([DESIGN],0) [DESIGN],
ISNULL([CODING],0) [CODING],
ISNULL([FUNCTION TEST],0) [FUNCTION TEST],
ISNULL([QA],0)[QA]

FROM

(
SELECT
e.EMP_ID,
e.EMP_NAME,
t.TASK_DESC,
h.HOURS

FROM EMPLOYEEHOURS h

inner join EMP e on h.EMP_ID = e.EMP_ID

inner join TASK t on h.TASK_CD = t.TASK_CD

) p

PIVOT

(

sum (HOURS)

FOR TASK_DESC IN

([DESIGN],[CODING],[FUNCTION TEST],[QA])

) AS pvt

ORDER BY 2;

Thanks for Visiting!!

Sunday, January 20, 2019

SQL : ROW_NUMBER (), RANK () and DENSE_RANK ()

SQL : Delete duplicate row

SQL: Clustered, Non-Clustered and Index on Multiple Columns

Thursday, January 17, 2019

SQL Pivot function

Contributors