Let’s say we have a table that links products to a category:
productID AS INT
categoryID AS INT
For some reason we managed to get dulicate productID’s and categoryID’s. I don’t really care to add a unique ID (UID) field to reference against using this script:
ALTER TABLE productCats
ADD RowID INT IDENTITY(1,1)
GO
DELETE productCats
FROM productCats
LEFT OUTER JOIN (
SELECT MIN(RowID) AS RowID, productID, categoryID
FROM productCats
GROUP BY productID, categoryID
) AS KeepRows ON productCats.RowID = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
GO
ALTER TABLE productCats
DROP COLUMN RowID
GO
Here we group together the duplicates into one row and assign it the minimum RowID for that group. If the left join returns a NULL when comparing all the RowID’s to the new set of groups then it will be deleted.
In MS SQL 2005+ I can use a Common Table Expression (CTE) to solve this issue:
WITH CTE
AS (SELECT ROW_NUMBER() OVER (PARTITION BY productID, categoryID ORDER BY ( SELECT 0) ) RN
FROM productCats)
DELETE FROM CTE
WHERE RN > 1
In this example, we give a row number (RN) to each group of duplicates and delete that row if greater than 1.
For more information on CTE’s see http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx