SQL Server Agent Jobs on AWS RDS Multi-AZ

When running Microsoft SQL Server on AWS RDS, you may run into a configuration issue that can trip you up during failover or instance upgrades.

This is taken from the “Microsoft SQL Server Multi-AZ Deployment Notes and Recommendations” section of the “Multi-AZ Deployments for Microsoft SQL Server with Database Mirroring” document:

If you have SQL Server Agent jobs, you need to recreate them in the secondary, as these jobs are stored in the msdb database, and this database can’t be replicated via Mirroring. Create the jobs first in the original primary, then fail over, and create the same jobs in the new primary.

This is one of the weaknesses of the Multi-AZ RDS SQL Server service.

They use mirroring to keep two RDS instances loaded with identical user table data, but they can’t mirror MSDB because it’s a system database.

One of the reasons jobs are so confusing on Multi-AZ SQL Server is that if you start off as Single-AZ and later move to Multi-AZ, all of your jobs are copied as part of the move. That’s because AWS takes a snapshot of all your data (including MSDB) and recreates it on the mirrored instance. This is where it gets confusing: people compare an always-Multi-AZ instance with a “was Single-AZ, now Multi-AZ” instance and see inconsistent behavior in the jobs. But it can all be understood if you apply two rules:

  1. Jobs created when you’re Single-AZ will be copied when you move to Multi-AZ, because AWS takes a snapshot of all databases (including MSDB), but
  2. Other than that, no changes to jobs will ever be copied to the mirror unless the changes are done manually on both servers.
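
In practice, rule 2 means keeping a job-creation script around and re-running it after each failover, so the new primary ends up with the same jobs. Here is a minimal sketch; the job name, step and schedule are hypothetical, but the msdb stored procedures are the standard SQL Server Agent API.

-- Run this on the current primary, then again on the new primary
-- after a failover, so both nodes end up with identical jobs.
USE msdb
GO
EXEC dbo.sp_add_job @job_name = N'NightlyCleanup'

EXEC dbo.sp_add_jobstep
	@job_name = N'NightlyCleanup',
	@step_name = N'Purge stale carts',
	@subsystem = N'TSQL',
	@database_name = N'MyAppDB',
	@command = N'DELETE FROM UserCart WHERE dateModified < DATEADD(DAY, -30, GETDATE())'

EXEC dbo.sp_add_jobschedule
	@job_name = N'NightlyCleanup',
	@name = N'Nightly at 2 AM',
	@freq_type = 4,			-- daily
	@freq_interval = 1,		-- every 1 day
	@active_start_time = 20000	-- 02:00:00

EXEC dbo.sp_add_jobserver @job_name = N'NightlyCleanup'
GO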


Copying MS SQL 2016 Database on AWS RDS

A common task on a SQL Server might be to copy a database into a new database. This could be on the same server or a different server.

On Amazon Web Services’ (AWS) Relational Database Service (RDS), this task becomes a little more complex to plan and execute. There are a few reasons for this:

  1. You don’t have access to the local file system outside of creating a database in the default path.
  2. You do not have the permissions needed to run the “Copy Database Wizard”.
  3. Doing a “native” backup/restore using S3 would create a duplicate “family_guid”, which makes the backup impossible to restore on your RDS instance without deleting the source database.

If you use MS SQL 2016 you can export/import a “Data-tier Application”, which is stored as a “.bacpac” file. This is a native-format package of schema and data (not human readable). In reality it’s a .zip file, so if you open it in something like 7-Zip you can see the package contents. The package lacks items such as the physical indexes (the index configuration is intact) and the transaction logs, so it tends to be smaller than an actual native backup.

Keep in mind all data is transmitted from the source machine (AWS RDS) to your workstation. If this is a larger database, you may wish to consider doing this from an EC2 instance to get a faster connection, a larger drive workspace, and potentially save on network costs.

Here are the steps to export and then import a “.bacpac” file. This example uses AWS RDS, but the same steps apply to nearly any environment, including local.

  1. Open Microsoft SQL Server Management Studio (SSMS) 2016 or better and connect to your database server.
  2. Right-click the source database, go to Tasks and select “Export Data-tier Application”.
  3. Press “Next” to go to “Export Settings”. Enter a path on your local workstation to save the package to and press “Next”.
  4. After you press “Finish”, the copy process from the source machine to your local machine begins.
  5. If you wish to import the database to another server, connect to it now via SSMS.
  6. Right-click the “Databases” group object and select “Import Data-tier Application”. Press “Next”.
  7. Specify the path on your local machine to the “.bacpac” file being imported. Then press “Next”.
  8. Specify a new database name. It cannot be the name of an existing database. Press “Next” and then “Finish”.
  9. The new database will be created and imported.

Update 10/29/2018
It appears the “family_guid” issue is no longer an issue, though I have not yet verified it. See https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-rds-for-sql-server-enhances-backup-and-restore-capabilities/
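
If so, a native backup/restore through S3 becomes an option again. Here is a rough sketch using the RDS stored procedures; the database name, bucket and ARNs below are hypothetical, and the instance needs the SQLSERVER_BACKUP_RESTORE option group attached.

-- Back up the source database to S3 (names and ARN are hypothetical)
exec msdb.dbo.rds_backup_database
	@source_db_name = 'MyAppDB',
	@s3_arn_to_backup_to = 'arn:aws:s3:::my-bucket/MyAppDB.bak'

-- Restore it as a new database, on the same or another RDS instance
exec msdb.dbo.rds_restore_database
	@restore_db_name = 'MyAppDB_Copy',
	@s3_arn_to_restore_from = 'arn:aws:s3:::my-bucket/MyAppDB.bak'

-- Check the progress of either task
exec msdb.dbo.rds_task_status @db_name = 'MyAppDB'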

#aws, #export, #import, #microsoft, #rds, #sql

Microsoft SQL Server Management Studio Refresh IntelliSense Cache

When programming in Microsoft’s SQL Server Management Studio (2008), IntelliSense tends to throw notices about invalid object names after you add a new table. This annoyed me for a long time, and I finally decided to research it. Thankfully, I found an easy solution.

CTRL-SHIFT-R

This clears the IntelliSense cache, and it will now pick up any new objects you’ve created in the database.

 

The same command is also available from the Edit > IntelliSense > Refresh Local Cache menu.

#cache, #intellisense, #sql, #ssms

Multiple Column Dynamic Sorting with T-SQL

I have done a bit of research to find the most efficient way to dynamically sort query results in Microsoft SQL Server 2008. I ran each query test twice, changing the “ORDER BY” clause, then checked the query plan cache to see whether the plan was cached only once. Before each test I ran “DBCC FREEPROCCACHE” to clear the plan cache and make the results easier to read. I then ran the following statement to inspect the cached query plans:

SELECT UseCounts, Cacheobjtype, Objtype, TEXT, query_plan
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE text like '%FROM CategoriesV2%'

The following code caches both queries with separate Adhoc Compiled Plans, because the two queries do not have the exact same query string. Note we can sort using a column alias. This would be the case if you switched the ORDER BY clause dynamically using variables on an application server, such as ColdFusion’s #sortByVar#.

Note that in all the examples, I start the next statement directly after “GO” without an extra line. This is because the query plan would cache the second query with a whitespace prefix if there were an extra line. Therefore, to eliminate the risk of this being a factor, I do not include an extra line separating the two queries.

DECLARE @HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY COL2 DESC, COL1
GO
DECLARE @HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY NEWID()
GO

The following code also caches both queries with separate Adhoc Compiled Plans, because the two batches still differ in their text (the @SORT literal is part of the query string). Note I am sorting by the actual column name and not the alias: a column alias can appear in an ORDER BY only as a bare name, so once the sort is wrapped inside an expression such as CASE, the underlying column must be referenced.

DECLARE @SORT VARCHAR(30) = 'SORT1',
		@HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY
	CASE WHEN @SORT = 'SORT1' THEN label END,
	CASE WHEN @SORT = 'SORT1' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN label END
GO
DECLARE @SORT VARCHAR(30) = 'SORT2',
		@HAS_PRODUCTS BIT = 1

SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY
	CASE WHEN @SORT = 'SORT1' THEN label END,
	CASE WHEN @SORT = 'SORT1' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN categoryID END DESC,
	CASE WHEN @SORT = 'SORT2' THEN label END
GO

In the previous code, the “CASE/WHEN/THEN” switches look mysterious, but each CASE is just an ordinary expression computed per row. When the @SORT value matches, the expression returns the column value for that row; when it doesn’t match, it returns NULL for every row, so that sort key ties across all rows and contributes nothing to the ordering. You can see this for yourself by replacing “ORDER BY” with “SELECT” and running the clause as a statement by itself, as in the snippet below. If anyone has deeper insight on what’s going on here, please let me know.
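
For illustration, here is that same idea as a standalone statement (the sortKey aliases are hypothetical); the non-matching keys come back NULL for every row:

DECLARE @SORT VARCHAR(30) = 'SORT1',
		@HAS_PRODUCTS BIT = 1

SELECT TOP 2
	CASE WHEN @SORT = 'SORT1' THEN label END AS sortKey1,
	CASE WHEN @SORT = 'SORT1' THEN categoryID END AS sortKey2,
	CASE WHEN @SORT = 'SORT2' THEN categoryID END AS sortKey3,	-- NULL for every row
	CASE WHEN @SORT = 'SORT2' THEN label END AS sortKey4		-- NULL for every row
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS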

The following code caches both queries with separate Prepared Compiled Plans, because the two queries, though dynamically created, do not have the exact same query string. Note we can sort using a column alias again.

DECLARE @SQL NVARCHAR(MAX),
@SORT VARCHAR(30) = 'COL2 DESC, COL1'

SELECT @SQL = '
SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY ' + @SORT

EXEC SP_ExecuteSQL @SQL, N'@HAS_PRODUCTS BIT', 1
GO
DECLARE @SQL NVARCHAR(MAX),
@SORT VARCHAR(30) = 'COL1, COL2 DESC'

SELECT @SQL = '
SELECT TOP 2 label AS COL1, categoryID AS COL2
FROM CategoriesV2
WHERE hasProducts = @HAS_PRODUCTS
ORDER BY ' + @SORT

EXEC SP_ExecuteSQL @SQL, N'@HAS_PRODUCTS BIT', 1
GO

In the above statement, if you were to try to inject @SORT via the SP_ExecuteSQL params, you would receive the error “Msg 1008, Level 16, State 1, Line x: The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.”

Therefore, from research and testing, I have so far determined that when a dynamic column sort is necessary, a query plan may be cached for each distinct sort value.

If you have found a way to cache a query plan only once, and it is the correct way to accomplish that, please let me know.

#dynamic, #order-by, #sort, #sql, #t-sql

Getting the MS SQL Identity ID With a Complex UPDATE / INSERT Statement

As I posted in my blog entry MSSQL Update/Insert Based Upon Existing Record, it is better to use “IF @@ROWCOUNT=0” rather than “IF EXISTS ()” when updating or inserting a record based upon existence, because the former does a table/index scan only once, while the latter does it twice.

What if you need the identity column ID whether the record is updated or inserted? Normally in ColdFusion you can use queryResult[“GENERATEDKEY”], but you will find this variable only works on simple insert statements. If you try @@IDENTITY or SCOPE_IDENTITY(), you will find they only return the identity column ID when an insert is triggered.

Enter MS SQL’s OUTPUT clause, introduced in SQL Server 2005. We can use the OUTPUT clause to return the identity column ID(s).

When we add OUTPUT to both the UPDATE and INSERT clauses, the problem we run into is that during an insert the batch returns an empty result set (from the UPDATE) followed by a second set containing the identity column ID that was inserted. Therefore we use a table variable to collect the IDs for us.

The table variable does reintroduce the second table scan. However, that scan touches very few rows, if not just one, and the table variable will generally not need disk access, so the second scan is of no consequence.

So with all this in mind here’s an example that returns the Identity column that was either inserted or updated. Keep in mind that multiple IDs will be returned as separate rows if more than one table row was updated.

<cfquery name="local.qSetCart">
	-- Table variable to collect the identity ID(s)
	DECLARE @T TABLE (
	    cartID INT NOT NULL
	)

	-- Try the update first, capturing the updated ID(s)
	UPDATE
		UserCart
	SET
		dateModified = GETDATE(),
		isGift = <cfqueryparam value="#arguments.isGift#" cfsqltype="cf_sql_bit">
	OUTPUT
		INSERTED.cartID INTO @T
	WHERE
		userID = <cfqueryparam value="#arguments.userID#" cfsqltype="cf_sql_integer">

	-- Nothing was updated, so insert instead
	IF @@ROWCOUNT = 0

		INSERT
			INTO UserCart ( userID, isGift )
		OUTPUT
			INSERTED.cartID INTO @T
		VALUES (
			<cfqueryparam value="#arguments.userID#" cfsqltype="cf_sql_integer">,
			<cfqueryparam value="#arguments.isGift#" cfsqltype="cf_sql_bit">
		)

	SELECT cartID FROM @T
</cfquery>
		
<cfreturn local.qSetCart.cartID>

#coldfusion-2, #identity, #sql

Develop For The Future

Let’s say we start a specialized e-commerce site that sells fishing lures drop-shipped from Henry’s, a supplier. We found a good deal with Henry’s and we’re so excited to start generating revenue and quit our day jobs as pencil pushers. To get this site up and running we create some database tables:

  • HenrysProducts
    • SKU
    • Brand
    • Model
    • Price
  • Orders
    • OrderID
    • Name
    • Address
    • ShippingPrice
    • Tax
    • OrderTotal
    • TrackingNumber
    • ShippingCarrier
    • DateShipped
  • OrderItems
    • OrderID
    • SKU
    • Quantity
    • Price

You slap up a few files:

  • index.cfm
  • products.cfm
  • cart.cfm
  • shippingInfo.cfm
  • paymentInfo.cfm
  • orderConfirmation.cfm
  • contactUs.cfm

Put in some images and “bam” you’re rolling. The lures you’re selling are “hot stuff” and you start bringing in revenue. Yea!

Oh! What’s this? An order for $500 worth of lures? Yes!

The order goes through, but Henry’s says, “Sorry, we don’t have 100 of the 500 lures you ordered. We’ll place those on back-order for you.” You think, “Okay, no problem.” But wait! How are we going to handle that? Time for a new table and some logic to handle emails when things are back-ordered, cancelled and shipped:

  • HenrysOrderFulfilment
    • OrderID
    • SKU
    • QuantityShipped
    • QuantityBackOrdered
    • QuantityCancelled
    • ShipmentID
  • HenrysShipments
    • ShipmentID
    • TrackingNumber
    • DateShipped
    • Carrier

Now that we’ve started getting popular, people really want some items that are only available from the manufacturer. But everything is set up for Henry’s, because we didn’t think ahead. So we have a few options:

  1. Keep using what we have and append a column to tell us where it’s really coming from
  2. Create new tables for the new manufacturer
  3. Refactor the tables to be more generic

And of course, we should opt for #3. This is where we should have started in the first place. Never assume your business model won’t change, that your supplier contract won’t expire, or that your business won’t grow. With that kind of thinking, it becomes much more costly to grow your business, because you have to deal with the side effects of your starting decisions. That growth period also tends to come with a shortage of time, because you’re so busy tending the messy system. Here’s an example of how it should have been built in the first place:

  • Products
    • ProductID
    • Brand
    • Title
  • ProductChildren
    • ProductChildID
    • ProductID
    • ModelTitle
    • SKU
    • UPC
    • Price
    • Weight
  • ProductSupplierLookup
    • ProductChildID
    • SupplierID
    • Cost
    • Priority
  • Suppliers
    • SupplierID
    • Name
  • Orders
    • OrderID
    • Name
    • Address
    • ShippingPrice
    • Tax
    • OrderTotal
    • TrackingNumber
    • ShippingCarrier
    • DateShipped
  • OrderItems
    • OrderItemID
    • OrderID
    • SKU
    • Quantity
    • Price
  • OrderFulfilment
    • OrderItemID
    • SupplierID
    • ItemCost
    • QuantityOrdered
    • QuantityShipped
    • QuantityBackOrdered
    • QuantityCancelled
    • ShipmentID
  • Shipments
    • ShipmentID
    • TrackingNumber
    • DateShipped
    • Carrier
    • Method

With this type of structure you can add multiple suppliers, order from multiple suppliers for each customer order using a priority number, handle back-orders and cancellations, use the most cost-efficient supplier, change, add and remove suppliers, and keep track of multiple shipped packages. A rough sketch of the supplier side follows below.
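
These CREATE TABLE statements are a hypothetical illustration of two of the tables above; the column types are assumptions, not a spec:

CREATE TABLE Suppliers (
	SupplierID INT IDENTITY(1,1) PRIMARY KEY,
	Name VARCHAR(100) NOT NULL
)

CREATE TABLE ProductSupplierLookup (
	ProductChildID INT NOT NULL REFERENCES ProductChildren ( ProductChildID ),
	SupplierID INT NOT NULL REFERENCES Suppliers ( SupplierID ),
	Cost MONEY NOT NULL,
	Priority INT NOT NULL,	-- 1 = first supplier to try
	PRIMARY KEY ( ProductChildID, SupplierID )
)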

Of course this is a fairly basic structure, but it’s a good way of thinking forward whether you’re starting from scratch or enhancing your system.

That touches the SQL aspect of it. But even if you refactored your tables to the structure above, I’m betting your code is complex and messy too. You may have written it a year ago and forgotten where everything is. You may even have a different developer or developers working on the enhancement.

This is why you should always split your view code out into .CSS, .JS and template files that generate the HTML. You should also be using CFCs, if using ColdFusion, and a framework is a must. You then have a much more manageable file structure and can work with your code with much more ease. Also don’t forget your unit testing, so you reduce the chances of accidentally breaking something. I suggest something like FW/1, as it’s lightweight and easy once you get the hang of it.

Removing Duplicates in MS SQL 2005+ Table w/o Adding a UID

Let’s say we have a table that links products to a category:

productID AS INT
categoryID AS INT

For some reason we managed to get duplicate productID/categoryID pairs, and I don’t really want to add a permanent unique ID (UID) field to reference against. One approach is to add the UID temporarily, as in this script:

ALTER TABLE productCats
ADD RowID INT IDENTITY(1,1)
GO

DELETE productCats
FROM productCats
LEFT OUTER JOIN (
    SELECT MIN(RowID) AS RowID, productID, categoryID
    FROM productCats
    GROUP BY productID, categoryID
) AS KeepRows ON productCats.RowID = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
GO

ALTER TABLE productCats
DROP COLUMN RowID
GO

Here we group the duplicates together and assign each group the minimum RowID. Any row whose RowID does not match its group’s minimum gets a NULL from the left outer join and is deleted.

In MS SQL 2005+ I can use a Common Table Expression (CTE) to solve this issue:

WITH CTE
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY productID, categoryID ORDER BY ( SELECT 0) ) RN
         FROM productCats)
DELETE FROM CTE
WHERE RN > 1

In this example, we assign a row number (RN) within each group of duplicates and delete every row whose number is greater than 1.

For more information on CTE’s see http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

#cte, #duplicates, #sql, #t-sql

Returning Top SQL HierarchyID Ancestor

In an attempt to optimize a query that was taking 10 seconds, I was looking for any way possible to shave off time.

The result will populate a SOLR collection, and we narrow results based upon the top category ID. For example, a product in the category “/7/15/92/” would be given a category of “7” in the SOLR collection. Thus we could search for a specific belt inside all vacuum parts, for example.

In order to achieve this, our first attempt was the following:

REPLACE( SUBSTRING( HID.ToString(), CHARINDEX( '/', HID.ToString() ), CHARINDEX( '/', HID.ToString(), 1 + CHARINDEX( '/', HID.ToString() ) ) ), '/', '' )

This worked great, but I noticed that it was a bit intensive. So I did a little more research and found a method that shaved a second off of the result time:

REPLACE( HID.GetAncestor( HID.GetLevel() - 1 ).ToString(), '/', '' )
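
As a quick standalone sanity check (the path literal is just the example category from above):

DECLARE @HID hierarchyid = hierarchyid::Parse('/7/15/92/')

SELECT REPLACE( @HID.GetAncestor( @HID.GetLevel() - 1 ).ToString(), '/', '' )	-- returns 7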

I actually couldn’t find this GetAncestor approach anywhere and thought it up for this use. If you have a better method, feel free to comment. Thanks!

#ancestor, #hierarchyid, #sql

Console Logging of ColdFusion ORM SQL

If you read my post on getting the ColdFusion Builder Console working and use ORM, you may have run into some further questions.

When using ORM, it’s wise to monitor what SQL queries Hibernate is generating for you, both for performance and for debugging. (Wasn’t ORM supposed to make my life easier?)

To start logging ORM’s SQL, set this property in your application.cfc:

<cfset this.ormsettings.logsql="true">

You may notice, however, that the default configuration will not show the DDL queries used for creating or updating tables, nor will it show the parameterized values (just a ?).

To enable these, look at <cf_home>\lib\log4j.properties (in my case it’s C:\ColdFusion9\lib\log4j.properties).

To enable logging of parameterized values, uncomment and change the value for log4j.logger.org.hibernate.type to look like this:

log4j.logger.org.hibernate.type=DEBUG

What this ends up returning is a little overkill: not only does it log your parameterized values, but also what each column returns. I wish I could disable the latter.

To enable logging of exports and updates (DDL), uncomment and change the value for log4j.logger.org.hibernate.tool.hbm2ddl to look like this:

log4j.logger.org.hibernate.tool.hbm2ddl=DEBUG, HIBERNATECONSOLE

I placed an example snippet below. Thanks to Rupesh Kumar for providing this information.
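
Roughly, the relevant section of log4j.properties then looks like this; this is a sketch, and the exact appender lines may differ by ColdFusion version:

# Log Hibernate SQL, bind parameter values and DDL to the console
log4j.logger.org.hibernate.SQL=DEBUG, HIBERNATECONSOLE
log4j.logger.org.hibernate.type=DEBUG
log4j.logger.org.hibernate.tool.hbm2ddl=DEBUG, HIBERNATECONSOLE

# Console appender used by the Hibernate loggers above
log4j.appender.HIBERNATECONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.HIBERNATECONSOLE.layout=org.apache.log4j.PatternLayout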

#sql-queries

SQL to ColdFusion ORMType Reference

I have not been able to find a good reference chart that maps SQL data types to ColdFusion ORM data types; it has always been my best guess. So I’m starting a reference chart here, which I’ll update as I figure more out. If you have any input, please comment and I will update. Thanks!

ORMType            SQL                                  MySQL
big_decimal        DECIMAL, MONEY                       DECIMAL
binary             BINARY, VARBINARY                    TINYBLOB
blob                                                    TINYBLOB
Boolean            [SMALLINT], BIT                      BIT
clob                                                    LONGTEXT
date               DATE                                 DATE
double             DOUBLE, MONEY, NUMERIC               DOUBLE
character, char                                         CHAR
float              REAL, FLOAT                          FLOAT
integer, int       INT                                  INT
long               BIGINT                               BIGINT
serializable                                            TINYBLOB
short              SMALLINT                             SMALLINT
string             CHAR, NCHAR, VARCHAR, NVARCHAR       VARCHAR
text               TEXT, NTEXT                          LONGTEXT
timestamp          DATETIME, SMALLDATETIME, TIMESTAMP   DATETIME
true_false                                              CHAR
yes_no                                                  CHAR
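
As a usage example, here is a hypothetical persistent CFC showing how a few of these ORMTypes might be mapped; the component, table and column names are made up for illustration:

<cfcomponent persistent="true" table="Products">
	<!--- Identity primary key maps to ORMType integer --->
	<cfproperty name="productID" fieldtype="id" generator="identity" ormtype="integer">
	<!--- VARCHAR column maps to ORMType string --->
	<cfproperty name="title" ormtype="string" length="200">
	<!--- MONEY column maps to ORMType big_decimal --->
	<cfproperty name="price" ormtype="big_decimal">
</cfcomponent>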