Copying MS SQL 2016 Database on AWS RDS

A common task on SQL Server is copying a database into a new database, on either the same server or a different one.

On Amazon Web Services (AWS) Relational Database Service (RDS), this task is a little more complex to plan and execute, for a few reasons:

  1. You don’t have access to the local file system, other than creating a database in the default path.
  2. You do not have the permissions needed to run the “Copy Database Wizard”.
  3. A “native” backup/restore using S3 would create a duplicate “family_guid”, which cannot be restored on your RDS instance without deleting the source database.

If you use MS SQL 2016, you can export/import a “Data-tier Application”, which is stored as a “.bacpac” file. This is a native-format package of schema and data (not human readable). It’s actually a .zip file, so if you open it in something like 7-Zip you can see the package contents. The package leaves out items such as the physical index data (the index configuration is kept intact) and the transaction log, so it tends to be smaller than a native backup.

Keep in mind that all data is transferred from the source machine (AWS RDS) to your workstation. If the database is large, consider doing this from an EC2 instance instead, for a faster connection, more disk workspace and potentially lower network costs.

Here are the steps to export and then import a “.bacpac” file. This example uses AWS RDS, but the same steps apply to almost any environment, including a local server.

  1. Open Microsoft SQL Server Management Studio (MSSMS) 2016 or later and connect to your database server.
  2. Right-click the source database, go to “Tasks” and select “Export Data-tier Application”.
  3. Press “Next” to go to “Export Settings”. Enter a path on your local workstation to save the package to and press “Next”.
  4. Review the summary and press “Finish”. The export then copies the data from the source server to your local machine.
  5. If you wish to import the database to another server, connect to it now in MSSMS.
  6. Right-click the “Databases” folder and select “Import Data-tier Application”. Press “Next”.
  7. Specify the path on your local machine to the “.bacpac” file being imported. Then press “Next”.
  8. Specify a new database name. It cannot be the name of an existing database. Press “Next” and then “Finish”.
  9. The new database will be created and imported.

Update 10/29/2018
It appears the “family_guid” limitation no longer applies, although I have not verified it yet. See https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-rds-for-sql-server-enhances-backup-and-restore-capabilities/
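If that update holds, the native route becomes an option again. As a rough, unverified sketch, a native backup to S3 and a restore under a new name go through Amazon's RDS stored procedures in msdb. The bucket, file and database names below are placeholders, and the instance is assumed to have the SQLSERVER_BACKUP_RESTORE option configured with an IAM role that can reach the bucket.

```sql
-- Placeholder names throughout: SourceDB, SourceDB_Copy and my-backup-bucket are hypothetical.
-- Assumes the SQLSERVER_BACKUP_RESTORE option (and its IAM role) is set up on the instance.
EXEC msdb.dbo.rds_backup_database
    @source_db_name = 'SourceDB',
    @s3_arn_to_backup_to = 'arn:aws:s3:::my-backup-bucket/SourceDB.bak',
    @overwrite_s3_backup_file = 1,
    @type = 'FULL';

-- Backups and restores run as asynchronous tasks; re-run this until the task reports SUCCESS.
EXEC msdb.dbo.rds_task_status @db_name = 'SourceDB';

-- Restore the same file under a new name, the step the family_guid limitation used to block.
EXEC msdb.dbo.rds_restore_database
    @restore_db_name = 'SourceDB_Copy',
    @s3_arn_to_restore_from = 'arn:aws:s3:::my-backup-bucket/SourceDB.bak';
```

Unlike the .bacpac route, this keeps the data on the AWS side (RDS and S3) instead of pulling it down to your workstation.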

#aws, #export, #import, #microsoft, #rds, #sql

Microsoft SQL Server Management Studio Refresh IntelliSense Cache

When programming in Microsoft’s SQL Server Management Studio (2008), IntelliSense tends to flag invalid object names after I add a new table. This has been annoying me for a long time and I finally decided to research it. Thankfully, I found an easy solution:

CTRL+SHIFT+R

This clears the IntelliSense cache, after which it picks up any new objects you’ve created in the database. The same command is available from the menu under Edit > IntelliSense > Refresh Local Cache.

#cache, #intellisense, #sql, #ssms

Multiple Column Dynamic Sorting with T-SQL

I have done a bit of research to find the most efficient way to dynamically sort query results in Microsoft SQL Server 2008. I ran each test twice, changing only the “ORDER BY” clause, and then checked the query plan cache to see whether the plan had been cached only once. Before each test I ran “DBCC FREEPROCCACHE” to empty the plan cache so the results were easy to pick out (it clears the cache for the whole server, so avoid it on production). I then ran the following statement to see the cached query plans:

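SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%FROM CategoriesV2%'
-- Exclude this monitoring query, whose own text also matches the pattern above.
AND st.text NOT LIKE '%dm_exec_cached_plans%'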

The following code caches the two queries as separate Adhoc Compiled Plans, because their query strings are not identical. Note that we can sort using a column alias. This is the situation you get when you switch the ORDER BY clause dynamically in an application server, for example with ColdFusion’s #sortByVar#.

Note that in all the examples, I start the next statement directly after “GO” without a blank line. If there were a blank line, the plan cache would store the second query with a whitespace prefix, so to rule that out as a factor I don’t separate the two queries with an extra line.

DECLARE @HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY COL2 DESC, COL1
GO
DECLARE @HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY NEWID()
GO

The following code also caches the two queries as separate Adhoc Compiled Plans. Ad hoc batches are cached by their exact text, and these two batches differ in the value assigned to @SORT, so they cannot share a plan. Note that I am sorting by the actual column names and not the aliases: SQL Server only resolves a column alias in ORDER BY when the alias appears on its own, not inside an expression such as CASE.

DECLARE @SORT VARCHAR(30) = 'SORT1',
		@HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY
	CASE WHEN @SORT = 'SORT1' THEN label END,
	CASE WHEN @SORT = 'SORT1' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN label END
GO
DECLARE @SORT VARCHAR(30) = 'SORT2',
		@HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY
	CASE WHEN @SORT = 'SORT1' THEN label END,
	CASE WHEN @SORT = 'SORT1' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN label END
GO

In the previous code, each “CASE/WHEN/THEN” expression acts as its own sort key. When @SORT matches the WHEN value, the expression returns the column value; otherwise it returns NULL for every row. A sort key that is NULL for every row is effectively a constant, so it cannot change the order, and only the matching keys (with their own ASC or DESC) decide the sort. I came to that understanding by replacing “ORDER BY” with “SELECT” and running that clause as a statement by itself: the non-matching columns still exist, but contain only NULLs. If anyone has more insight on what’s going on here, please let me know.
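That SELECT experiment can be reproduced in a self-contained sketch. It uses a hypothetical table variable with made-up rows in place of CategoriesV2, and selects each CASE expression from the ORDER BY as its own column so the NULL sort keys are visible.

```sql
-- Hypothetical sample rows standing in for CategoriesV2.
DECLARE @Categories TABLE (categoryID INT, label VARCHAR(50));
INSERT INTO @Categories (categoryID, label)
VALUES (1, 'Belts'), (2, 'Bags'), (3, 'Filters');

DECLARE @SORT VARCHAR(30) = 'SORT2';

-- Each sort key is also selected so you can see which ones are all NULL.
SELECT
    label AS COL1,
    categoryID AS COL2,
    CASE WHEN @SORT = 'SORT1' THEN label END      AS key1_asc,
    CASE WHEN @SORT = 'SORT1' THEN categoryID END AS key2_desc,
    CASE WHEN @SORT = 'SORT2' THEN categoryID END AS key3_desc,
    CASE WHEN @SORT = 'SORT2' THEN label END      AS key4_asc
FROM @Categories
ORDER BY
    CASE WHEN @SORT = 'SORT1' THEN label END,
    CASE WHEN @SORT = 'SORT1' THEN categoryID END DESC,
    CASE WHEN @SORT = 'SORT2' THEN categoryID END DESC,
    CASE WHEN @SORT = 'SORT2' THEN label END;
-- With 'SORT2': rows come back as categoryID 3, 2, 1, and key1_asc/key2_desc are NULL throughout.
```

Keeping label and categoryID in separate CASE expressions matters here. All branches of a single CASE resolve to one data type, so combining the VARCHAR and INT columns in one CASE would try to convert the labels to INT and fail whenever the label branch is taken.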

The following code caches the two queries as separate Prepared Compiled Plans. Although the SQL is built dynamically, the two generated query strings are not identical. Note that we can sort using a column alias again.

DECLARE @SQL NVARCHAR(MAX),
		@SORT VARCHAR(30) = 'COL2 DESC, COL1'

SELECT @SQL = '
SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY ' + @SORT

EXEC sp_executesql @SQL, N'@HAS_PRODUCTS BIT', 1
GO
DECLARE @SQL NVARCHAR(MAX),
		@SORT VARCHAR(30) = 'COL1, COL2 DESC'

SELECT @SQL = '
SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY ' + @SORT

EXEC sp_executesql @SQL, N'@HAS_PRODUCTS BIT', 1
GO

In the above statement, if you were to pass @SORT as an sp_executesql parameter instead of concatenating it into the string, you would receive the error “Msg 1008, Level 16, State 1, Line x: The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.”
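Because the sort list has to be concatenated, a sort value that comes from user input (such as a URL variable) would open the door to SQL injection. One way to guard against that, sketched below under assumed names, is to accept only a known sort key and map it to a fixed ORDER BY list. @SortKey, @OrderBy and the #CategoriesV2 temp table with its sample rows are hypothetical stand-ins.

```sql
-- Hypothetical sample rows standing in for CategoriesV2.
IF OBJECT_ID('tempdb..#CategoriesV2') IS NOT NULL DROP TABLE #CategoriesV2;
CREATE TABLE #CategoriesV2 (categoryID INT, label VARCHAR(50), hasProducts BIT);
INSERT INTO #CategoriesV2 (categoryID, label, hasProducts)
VALUES (1, 'Belts', 1), (2, 'Bags', 1), (3, 'Filters', 1), (4, 'Hoses', 0);

-- The application passes a sort key, never raw SQL (assumed key names).
DECLARE @SortKey VARCHAR(30) = 'SORT2';

-- Map each allowed key to a fixed ORDER BY list; anything else gets a safe default.
DECLARE @OrderBy NVARCHAR(100) =
    CASE @SortKey
        WHEN 'SORT1' THEN N'COL1, COL2 DESC'
        WHEN 'SORT2' THEN N'COL2 DESC, COL1'
        ELSE N'COL1'
    END;

DECLARE @SQL NVARCHAR(MAX) = N'
SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM #CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY ' + @OrderBy;

-- @HAS_PRODUCTS is still passed as a real parameter.
EXEC sp_executesql @SQL, N'@HAS_PRODUCTS BIT', @HAS_PRODUCTS = 1;
-- Returns Filters (3) and Bags (2).
```

This still caches one plan per distinct sort, but the set of possible statements is limited to the keys you list.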

Therefore, from research and testing, I have so far determined that when a dynamic column sort is necessary, a query plan may be cached for each distinct sort value.

If you have found a way to cache a query plan only once, and it is the correct way to accomplish that, please let me know.
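One approach worth testing is moving the CASE-based sort into a stored procedure, so the sort choice arrives as a parameter instead of as part of the batch text. The sketch below assumes a hypothetical procedure name, dbo.GetCategoriesSorted, and reuses CategoriesV2 and its columns from the examples above.

```sql
-- Hypothetical procedure name; CategoriesV2 and its columns come from the examples above.
IF OBJECT_ID('dbo.GetCategoriesSorted', 'P') IS NOT NULL
    DROP PROCEDURE dbo.GetCategoriesSorted;
GO
CREATE PROCEDURE dbo.GetCategoriesSorted
    @SORT VARCHAR(30),
    @HAS_PRODUCTS BIT
AS
BEGIN
    SET NOCOUNT ON;  -- scoped to the procedure; reverts when it returns

    -- Same CASE-based sort as above, but @SORT is a parameter,
    -- so every call can share the procedure's cached plan.
    SELECT TOP 2 label AS COL1, categoryID AS COL2
    FROM CategoriesV2
    WHERE hasProducts = @HAS_PRODUCTS
    ORDER BY
        CASE WHEN @SORT = 'SORT1' THEN label END,
        CASE WHEN @SORT = 'SORT1' THEN categoryID END DESC,
        CASE WHEN @SORT = 'SORT2' THEN categoryID END DESC,
        CASE WHEN @SORT = 'SORT2' THEN label END;
END
GO
```

After calling EXEC dbo.GetCategoriesSorted @SORT = 'SORT1', @HAS_PRODUCTS = 1 and then the same with 'SORT2', the plan cache query at the top of this post should show a single entry with an objtype of Proc and a usecounts of 2. The trade-off is that one plan must serve every sort order, so SQL Server generally cannot use an index to avoid the sort. Adding OPTION (RECOMPILE) to the SELECT produces a tailored plan per call, at the cost of compiling every time.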

#dynamic, #order-by, #sort, #sql, #t-sql

Getting the MS SQL Identity ID With a Complex UPDATE / INSERT Statement

As I posted in my blog entry MSSQL Update/Insert Based Upon Existing Record, it is better to use “IF @@ROWCOUNT=0” rather than “IF EXISTS ()” when updating or inserting a record based upon existence. With the preferred method, the UPDATE doubles as the existence check, so when the record exists the table/index is searched only once, while the latter searches it once for the check and again for the UPDATE or INSERT.

What if you need the Identity Column ID whether the record is updated or inserted? Normally in ColdFusion you can use “queryResult["GENERATEDKEY"]”, but you will find this variable only works on simple INSERT statements. If you try @@IDENTITY or SCOPE_IDENTITY(), you will find they only return the Identity Column ID when the INSERT actually runs; an UPDATE generates no new identity value.

Enter the OUTPUT clause, introduced in MS SQL 2005. We can use the OUTPUT clause to return the Identity Column ID(s) of the affected rows.

If we add OUTPUT directly to both the UPDATE and the INSERT, the problem is that an insert returns two result sets: an empty one from the UPDATE and another containing the Identity Column ID that was inserted. So we use OUTPUT ... INTO a table variable to collect the IDs, and select from it once at the end.

Selecting from the table variable reintroduces a second scan. However, this scan reads a table variable holding very few rows, if not just one, so it is of no real consequence.

So with all this in mind here’s an example that returns the Identity column that was either inserted or updated. Keep in mind that multiple IDs will be returned as separate rows if more than one table row was updated.

<cfquery name="local.qSetCart">			
	DECLARE @T TABLE (
	    cartID INT NOT NULL
	)
	
	UPDATE
		UserCart
	SET
		dateModified = GETDATE(),
		isGift = <cfqueryparam value="#arguments.isGift#" cfsqltype="cf_sql_bit">
	OUTPUT
		INSERTED.cartID INTO @T
	WHERE
		userID = <cfqueryparam value="#arguments.userID#" cfsqltype="cf_sql_integer">
	
	IF @@ROWCOUNT = 0
	
		INSERT
			INTO UserCart ( userID, isGift)
		OUTPUT
			INSERTED.cartID INTO @T
		VALUES (
			<cfqueryparam value="#arguments.userID#" cfsqltype="cf_sql_integer">,
			<cfqueryparam value="#arguments.isGift#" cfsqltype="cf_sql_bit">
		)
	
	SELECT cartID FROM @T
</cfquery>
		
<cfreturn local.qSetCart.cartID>
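To try the pattern outside ColdFusion, here is a plain T-SQL sketch of the same statements. It assumes a hypothetical #UserCart temp table and sample values, and runs the upsert twice so both branches are exercised: the first pass inserts, the second updates, and each returns the cartID in a single result set.

```sql
-- Hypothetical stand-in for the UserCart table used in the cfquery above.
IF OBJECT_ID('tempdb..#UserCart') IS NOT NULL DROP TABLE #UserCart;
CREATE TABLE #UserCart (
    cartID       INT IDENTITY(1,1) PRIMARY KEY,
    userID       INT NOT NULL,
    isGift       BIT NOT NULL DEFAULT 0,
    dateModified DATETIME NOT NULL DEFAULT GETDATE()
);

DECLARE @userID INT = 42,   -- assumed sample values
        @isGift BIT = 1,
        @pass   INT = 1;
DECLARE @T TABLE (cartID INT NOT NULL);
DECLARE @results TABLE (pass INT, cartID INT);

-- Pass 1 takes the INSERT branch; pass 2 finds the row and takes the UPDATE branch.
WHILE @pass <= 2
BEGIN
    DELETE FROM @T;

    UPDATE #UserCart
    SET dateModified = GETDATE(), isGift = @isGift
    OUTPUT INSERTED.cartID INTO @T
    WHERE userID = @userID;

    IF @@ROWCOUNT = 0
        INSERT INTO #UserCart (userID, isGift)
        OUTPUT INSERTED.cartID INTO @T
        VALUES (@userID, @isGift);

    INSERT INTO @results (pass, cartID)
    SELECT @pass, cartID FROM @T;

    SET @pass = @pass + 1;
END

SELECT pass, cartID FROM @results;  -- both passes report the same cartID
```

One caveat: without locking hints such as WITH (UPDLOCK, HOLDLOCK) on the UPDATE, or a SERIALIZABLE transaction around both statements, two concurrent requests for the same user could both take the INSERT branch.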

#coldfusion-2, #identity, #sql

“Element GENERATEDKEY is undefined…” after ColdFusion SQL Statement

This is a follow-up post to “Returning IDENTITY ID via ColdFusion 9’s CFQuery”.

Today I wrote an INSERT statement that needed to return the new identity value. It’s pretty simple, reading the result’s GENERATEDKEY like this:

<cfquery result="myResult">
    INSERT INTO users( username )
    VALUES( 'test' )
</cfquery>
<cfreturn myResult["GENERATEDKEY"]>

The above code would normally return the new identity value created for that record. However, I received a “GENERATEDKEY is undefined…” error. Not sure what to think, I dumped the myResult variable, moved the query into a file of its own and even tried using myResult.IDENTITY instead, to no avail. After spending a while on it, I ended up going home for the day and trying again there a couple of hours later. All of a sudden the standalone test worked!

So I thought, okay, this has to be a setting somewhere, and I was right. I narrowed it down to a query that runs before it during the login process for my website:

SET NOCOUNT ON

UPDATE TABLE1
SET COL1 = 'myVal'

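Notice that I didn’t run “SET NOCOUNT OFF” afterward. Therein lies the issue. Apparently when you set NOCOUNT on and leave it on, ColdFusion doesn’t retrieve the new identity value. SET NOCOUNT stays in effect for the rest of the database connection, and ColdFusion reuses pooled connections, so the setting carries over into later queries on the same connection (which may also explain why the test suddenly worked a couple of hours later, on a fresh connection). The following code fixed my issue: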

SET NOCOUNT ON

UPDATE TABLE1
SET COL1 = 'myVal'

SET NOCOUNT OFF
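If you suspect a leftover setting like this, you can check it from inside a query. A minimal sketch: bit 512 of @@OPTIONS reflects SET NOCOUNT for the current connection, so the expression below reports whether NOCOUNT is currently on.

```sql
-- Bit 512 of @@OPTIONS corresponds to SET NOCOUNT for this connection.
SET NOCOUNT ON;
SELECT CASE WHEN (@@OPTIONS & 512) = 512 THEN 'ON' ELSE 'OFF' END AS nocount_state;  -- ON

SET NOCOUNT OFF;
SELECT CASE WHEN (@@OPTIONS & 512) = 512 THEN 'ON' ELSE 'OFF' END AS nocount_state;  -- OFF
```

Dropping the SELECT (without the SET lines) into a cfquery that runs right before the INSERT should show whether an earlier query left NOCOUNT on for that pooled connection.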

#cfquery, #coldfusion-2, #generatedkey, #identity, #nocount, #sql

ColdFusion CFQuery Zero Records and the “Element x is undefined in x” Error

When you run a CFQuery tag in ColdFusion, you “will always” get a query object back, whether or not any results were found. But recently, after running a complicated query in ColdFusion 9.0.1, I found this not to be true. When there were results, everything worked as expected. However, when there were no results, I received an “Element x is undefined in x.” error when referencing the CFQuery’s name variable. Here’s a simplified example of my code:

<cfquery name="local.products">
SELECT P.product
FROM Products P
JOIN (
    SELECT PC.productID, MIN( PC.price ) AS minPrice
    FROM ProductsChildren PC
    GROUP BY PC.productID
) price ON P.productID = price.productID
</cfquery>
<cfdump var="#local.products#">

After banging my head and doing some research, I came across this thread that helped me resolve the issue: http://forums.adobe.com/message/2595679 . It mentions a couple of bug IDs, but Adobe has since updated their bug base and I can’t find them. So at some point I’ll create a new ticket for this issue.

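The problem lies in the fact that my SQL was generating a warning inside the joined sub-query, and CF wasn’t handling that very well. The warning in question is the one SQL Server raises when MIN() skips a NULL price: “Warning: Null value is eliminated by an aggregate or other SET operation.” One fix is to stop the warning from being generated in the first place by wrapping the column MIN() reads in ISNULL() or COALESCE(). Keep in mind this treats a NULL price as 0, so a product with an unpriced child gets a minimum price of 0. For example: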

<cfquery name="local.products">
SELECT P.product
FROM Products P
JOIN (
    SELECT PC.productID, MIN( COALESCE( PC.price, 0 ) ) AS minPrice
    FROM ProductsChildren PC
    GROUP BY PC.productID
) price ON P.productID = price.productID
</cfquery>
<cfdump var="#local.products#">

The second method would be to add “SET ANSI_WARNINGS OFF;” to the beginning of the SQL statement. For example:

<cfquery name="local.products">
SET ANSI_WARNINGS OFF;

SELECT P.product
FROM Products P
JOIN (
    SELECT PC.productID, MIN( PC.price ) AS minPrice
    FROM ProductsChildren PC
    GROUP BY PC.productID
) price ON P.productID = price.productID;

-- Like NOCOUNT, ANSI_WARNINGS stays set on the connection, so restore the default.
SET ANSI_WARNINGS ON;
</cfquery>
<cfdump var="#local.products#">

The first fix is probably the better choice, although both of these resolved my issue.

There are other workarounds, such as using the JDBC driver, but fixing the SQL seems to be the most practical approach.
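To reproduce the warning outside ColdFusion, here is a small T-SQL sketch using a hypothetical table variable in place of ProductsChildren. It also shows a third option you could test, filtering out NULL prices with WHERE before aggregating, which avoids the warning without turning NULL prices into 0.

```sql
-- Hypothetical stand-in for ProductsChildren: product 2 has one child with no price.
DECLARE @ProductsChildren TABLE (productID INT, price MONEY NULL);
INSERT INTO @ProductsChildren (productID, price)
VALUES (1, 9.99), (1, 4.99), (2, NULL), (2, 12.50);

-- Original form: minimums of 4.99 and 12.50, plus (with ANSI_WARNINGS ON) the message
-- "Warning: Null value is eliminated by an aggregate or other SET operation."
SELECT productID, MIN(price) AS minPrice
FROM @ProductsChildren
GROUP BY productID;

-- COALESCE fix: no warning, but product 2 now reports a minimum of 0.
SELECT productID, MIN(COALESCE(price, 0)) AS minPrice
FROM @ProductsChildren
GROUP BY productID;

-- Filtering NULLs first: no warning, and product 2 keeps 12.50.
SELECT productID, MIN(price) AS minPrice
FROM @ProductsChildren
WHERE price IS NOT NULL
GROUP BY productID;
```

The filtered version behaves differently in one case: a product whose children are all unpriced drops out of the sub-query entirely, and so out of the inner JOIN.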

#coldfusion-2, #join, #query, #sql, #sub-query, #undefined

Removing Duplicates in MS SQL 2005+ Table w/o Adding an UID

Let’s say we have a table that links products to a category:

productID AS INT
categoryID AS INT

For some reason we managed to get duplicate productID/categoryID pairs. I don’t really care to add a unique ID (UID) field just to reference against, as this script does:

ALTER TABLE productCats
ADD RowID INT IDENTITY(1,1)
GO

DELETE productCats
FROM productCats
LEFT OUTER JOIN (
    SELECT MIN(RowID) AS RowID, productID, categoryID
    FROM productCats
    GROUP BY productID, categoryID
) AS KeepRows ON productCats.RowID = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
GO

ALTER TABLE productCats
DROP COLUMN RowID
GO

Here we group the duplicates together into one row per group and assign each group its minimum RowID. When the LEFT JOIN finds no matching group for a row’s RowID (KeepRows.RowId is NULL), that row is a duplicate and gets deleted.

In MS SQL 2005+, I can instead use a Common Table Expression (CTE) to solve this without adding a column:

WITH CTE
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY productID, categoryID ORDER BY ( SELECT 0) ) RN
         FROM productCats)
DELETE FROM CTE
WHERE RN > 1

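In this example, we number the rows (RN) within each group of duplicates and delete every row numbered greater than 1, leaving a single copy of each productID/categoryID pair. The ORDER BY ( SELECT 0 ) is there because ROW_NUMBER() requires an ORDER BY, and it doesn’t matter which copy survives.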
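Here is a self-contained sketch to try the CTE before running it against the real table. It uses a hypothetical #productCats temp table seeded with duplicates; the multi-row VALUES insert needs SQL Server 2008 or later, although the CTE itself works on 2005.

```sql
-- Hypothetical temp table with the two columns from the post, seeded with duplicates.
IF OBJECT_ID('tempdb..#productCats') IS NOT NULL DROP TABLE #productCats;
CREATE TABLE #productCats (productID INT, categoryID INT);

INSERT INTO #productCats (productID, categoryID)
VALUES (1, 10), (1, 10), (1, 20), (2, 10), (2, 10), (2, 10);

-- Number the rows within each productID/categoryID group and keep only RN = 1.
WITH CTE
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY productID, categoryID ORDER BY ( SELECT 0 )) RN
         FROM #productCats)
DELETE FROM CTE
WHERE RN > 1;

SELECT productID, categoryID
FROM #productCats
ORDER BY productID, categoryID;
-- Returns (1, 10), (1, 20) and (2, 10).
```

The statement before WITH must end with a semicolon, otherwise SQL Server reports a syntax error near the CTE.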

For more information on CTE’s see http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

#cte, #duplicates, #sql, #t-sql

Returning Top SQL HierarchyID Ancestor

In an attempt to optimize a query that was taking 10 seconds, I was looking for any way possible to shave off time.

The result populates a SOLR collection, and we narrow results based upon the top category ID. For example, a product in the category “/7/15/92/” would be given a category of “7” in the SOLR collection. That way we could, for example, search for a specific belt across all vacuum parts.

In order to achieve this, our first attempt was the following:

REPLACE( SUBSTRING( HID.ToString(), CHARINDEX( '/', HID.ToString() ), CHARINDEX( '/', HID.ToString(), 1 + CHARINDEX( '/', HID.ToString() ) ) ), '/', '' )

This worked great, but I noticed that it was a bit intensive. So I did a little more research and found a method that shaved a second off the query time:

REPLACE( HID.GetAncestor( HID.GetLevel() - 1 ).ToString(), '/', '' )

I actually couldn’t find this example anywhere and thought it up for this use. If you have a better method, feel free to comment. Thanks!
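To compare the two expressions side by side, here is a small sketch that runs both against a single hierarchyid variable instead of the HID column. The '/7/15/92/' path is the example from above.

```sql
-- Sample path from the post; @HID stands in for the HID column.
DECLARE @HID hierarchyid = hierarchyid::Parse('/7/15/92/');

SELECT
    -- First approach: take the text up to the second slash, then strip the slashes.
    REPLACE(SUBSTRING(@HID.ToString(), CHARINDEX('/', @HID.ToString()),
        CHARINDEX('/', @HID.ToString(), 1 + CHARINDEX('/', @HID.ToString()))), '/', '') AS viaString,
    -- Second approach: climb to the level-1 ancestor, then strip the slashes.
    REPLACE(@HID.GetAncestor(@HID.GetLevel() - 1).ToString(), '/', '') AS viaAncestor;
-- Both columns return '7'.
```

The second form asks the hierarchyid type for the level-1 ancestor directly instead of re-scanning the string several times with CHARINDEX.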

#ancestor, #hierarchyid, #sql