Let’s say we have a table that links products to a category:
productID AS INT
categoryID AS INT
For some reason we managed to get dulicate productID’s and categoryID’s. I don’t really care to add a unique ID (UID) field to reference against using this script:
ALTER TABLE productCats ADD RowID INT IDENTITY(1,1) GO DELETE productCats FROM productCats LEFT OUTER JOIN ( SELECT MIN(RowID) AS RowID, productID, categoryID FROM productCats GROUP BY productID, categoryID ) AS KeepRows ON productCats.RowID = KeepRows.RowId WHERE KeepRows.RowId IS NULL GO ALTER TABLE productCats DROP COLUMN RowID GO
Here we group together the duplicates into one row and assign it the minimum RowID for that group. If the left join returns a NULL when comparing all the RowID’s to the new set of groups then it will be deleted.
In MS SQL 2005+ I can use a Common Table Expression (CTE) to solve this issue:
WITH CTE AS (SELECT ROW_NUMBER() OVER (PARTITION BY productID, categoryID ORDER BY ( SELECT 0) ) RN FROM productCats) DELETE FROM CTE WHERE RN > 1
In this example, we give a row number (RN) to each group of duplicates and delete that row if greater than 1.
For more information on CTE’s see http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx