Sunday, December 17, 2006

How to find duplicate rows in a table? (For SQL Server 2005)

There are two possibilities,

1. Take account of all the rows: Your table doesn’t have any primary key and you want to check for duplicates.
2. Only for selected rows: Your table is having a primary key, so that will always be unique. Here you want to check for other columns.

They query is very easy for the first possibility.
Suppose a table ‘mytable’ has four columns a, b, c, d. (No Primary key)

I.e. select a,b,c,from mytable group by a,b,c,d having count(*) > 1

But, the query is long for the second option.

Suppose a table ‘mytable’ has four columns a, b, c, d (a and b is composite primary key)

Then the query would be,

select T1.a,T1.b,T1.c,T1.d
from
mytab T1, mytab T2
where
(T1.c=T2.c or (T1.c is null and T2.c is null))
and (T1.d=T2.d or (T1.c is null and T2.c is null))
and T1.A != T2.A
and T1.b != T2.b
order by T1.c,T1.d

OR (this is same as above)

select t1.a,t1.b, t1.c,t1.d
from
mytab T1 inner join mytab T2
on
(T1.c=T2.c or (T1.c is null and T2.c is null))
and (T1.d=T2.d or (T1.c is null and T2.c is null))
where
T1.A != T2.A and T1.b != T2.b
order by t1.c,t1.d