Cleaning Up After Crawlers: Managing Bot-Generated Sessions in ColdFusion

For many ColdFusion websites, you never have to think about this. But for ColdFusion websites that maintain sessions, keep a bit of data in the session scope, and have a large page count, session memory may become something you need to consider.

When a search engine spider, AI bot, or other bot hits your site, it generally does not keep cookies between requests. To track user sessions in ColdFusion, cookies are generally required. This means that if a request does not have a CFID, CFTOKEN, or JSESSIONID cookie, a new session is created.

Let’s say you have an e-commerce site with 5,000 products. A search engine spider will crawl all 5,000 product detail pages, creating 5,000+ sessions within a relatively short period of time. ColdFusion’s default session timeout is 20 minutes, so each of these sessions stays in memory for 20 minutes after its only request. But we know they are never needed again after that first request. So let’s get rid of them right away instead of racking up memory.
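To put rough numbers on that, here is a small JavaScript sketch of the arithmetic. It assumes a steady crawl rate (the 10 requests per second is a made-up figure, not a measurement) and that every bot request creates a session that lives for the full timeout. Treat the result as a ballpark upper bound, since the server reclaims expired sessions on its own schedule.

```javascript
// Rough upper bound on bot-created sessions held in memory at once.
// Assumptions: steady crawl rate, one new session per request,
// each session lives for exactly the timeout after its only request.
function peakLiveSessions(totalPages, requestsPerSecond, timeoutSeconds) {
	const createdPerTimeoutWindow = Math.ceil(requestsPerSecond * timeoutSeconds);
	// The bot cannot create more sessions than there are pages to crawl.
	return Math.min(totalPages, createdPerTimeoutWindow);
}

const pages = 5000;
const crawlRate = 10; // hypothetical requests per second

console.log(peakLiveSessions(pages, crawlRate, 20 * 60)); // 20-minute timeout: 5000
console.log(peakLiveSessions(pages, crawlRate, 1)); // 1-second timeout: 10
```

Under these assumptions, the 20-minute timeout keeps every crawled page’s session alive at once, while the 1-second timeout keeps only about one second’s worth of requests.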

Your first question might be, why create a session in the first place if we know the visitor is a bot? The answer is that you may have code that depends on the session scope. If you don’t create the session, or you delete it before the rest of the code runs, that code will throw an error. You could wrap every session variable reference in conditional logic, but who wants to do that?

Below is a very basic Application.cfc file that detects if any cookies are defined for your site on the visitor’s browser. Keep in mind that first-time users to your site will have no cookies until the second request if they have cookies enabled. This code will destroy the session after 1 second if no cookies are found. This could be first-time visitors, bots, or spiders. Once cookies are found, it will increase the session timeout to 20 minutes.

Many people approach this by trying to detect keywords in the user-agent header value. While this works much of the time, it may fail down the road if the bot changes the value to something unexpected, or if a bot or spider mimics a browser and is not truthful (or just doesn’t care) about who it is in the user-agent value.
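The difference between the two approaches can be sketched in a few lines. This is an illustration of the logic only, written in JavaScript rather than ColdFusion; the keyword list and the function names are hypothetical, and a real keyword list would be much longer.

```javascript
// Hypothetical keyword list; real bots use many more names and change them.
const BOT_KEYWORDS = ['bot', 'spider', 'crawl'];

// Keyword approach: only works when the bot identifies itself honestly.
function looksLikeBotByUserAgent(userAgent) {
	const ua = (userAgent || '').toLowerCase();
	return BOT_KEYWORDS.some((keyword) => ua.includes(keyword));
}

// Cookie approach: only asks whether the request sent any Cookie header.
function hasNoCookies(cookieHeader) {
	return !cookieHeader || cookieHeader.length === 0;
}

const honestBot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const disguisedBot = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36';

console.log(looksLikeBotByUserAgent(honestBot)); // true
console.log(looksLikeBotByUserAgent(disguisedBot)); // false, the keyword check misses it
console.log(hasNoCookies('')); // true, the cookie check still catches it
```

The cookie check also matches first-time human visitors, which is why the approach here shortens the session timeout instead of blocking the request.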

Important: Some may wonder if this affects everyone globally. This code is request-based and only affects this specific request.

component {
	this.name = hash( getCurrentTemplatePath() );
	this.sessionManagement = true;

	if ( !len( cgi.HTTP_COOKIE ) ) {
		/* No cookies were sent, so give the new session a very short timeout.
		This will be true for first-time users, spiders, and bots.
		We want sessions to always be enabled since our page request might require it. */
		this.sessionTimeout = createTimeSpan( 0, 0, 0, 1 );
	} else {
		// Cookies were found, so use the normal 20-minute timeout.
		this.sessionTimeout = createTimeSpan( 0, 0, 20, 0 );
	}
}

Another way of doing this, if for some reason you prefer not to use sessionTimeout, is to use the undocumented setMaxInactiveInterval() method in the session scope. The argument is a long int, so you may need to use JavaCast for your use case, but a simple “1” will do the job for our use case.

component {
	this.name = hash( getCurrentTemplatePath() );
	this.sessionManagement = true;
	this.sessionTimeout = createTimeSpan( 0, 0, 20, 0 );

	public boolean function onRequestStart( required string targetPage ) {
		// see if cookies are found. Bots usually do not pass cookies which are created by ColdFusion session management.
		if ( !len( cgi.HTTP_COOKIE ) ) {
			/* Change the timeout on the current session scope to 1 second.
			While this invalidates the session for subsequent requests, the memory is not always reclaimed instantly.
			It is reclaimed when the underlying server checks for inactive sessions, which may take a moment.
			*/
			session.setMaxInactiveInterval( 1 );
		}
		return true;
	}
}

To monitor how many sessions are being created and destroyed, you can use FusionReactor’s Sessions dashboard under the UEM menu. Here, you can track applications and how they are creating, destroying, and rejecting sessions within the last 5 seconds, 1 minute, and 1 hour.

Credit: https://docs.fusionreactor.io/Data-insights/Features/UEM/Sessions/

Check out Charlie Arehart’s article on session tracking in FusionReactor.


Handling Expired Sessions via AJAX & FW/1

This is a followup to my “Framework One AJAX Method (FW/1)” post (https://christierney.com/2012/07/14/framework-one-ajax-method-fw1/).

Scenario:

  1. You use the session scope to define if a user is logged in or not
  2. You use jQuery AJAX to pull JSON data from FW/1 action URLs
  3. The user’s session has expired after x minutes of inactivity after login
  4. If the session is expired the user is directed to a login page after trying to navigate

So what happens in this scenario? Instead of the expected JSON data your AJAX call receives the HTML of a login page with a status of 200. Can’t do too much with this.

Here’s a code example that will pass the client a 403 error (Forbidden) in the header and return no content. jQuery will then redirect the user to a login screen when it sees this status code.

First, here’s a simplified FW/1 Application.cfc setupRequest() method:

void function setupRequest() {

	var reqData = getHTTPRequestData();

	if( structKeyExists( reqData.headers, 'X-Requested-With' ) && reqData.headers[ 'X-Requested-With' ] == 'XMLHttpRequest' && !structKeyExists( session, 'user' ) ) {
		getpagecontext().getresponse().setstatus( 403 );
		abort;
	}
}

This code detects whether the call came from an AJAX request ( getHTTPRequestData().headers[ 'X-Requested-With' ] == 'XMLHttpRequest' ) and whether the session still knows about the user. If it is an AJAX request and the user is not known, it sets the status code of the response to 403 and stops processing any more code. If you try to use throw instead of abort, it will overwrite the status code with 500.

The second simple example is the jQuery piece:

$( document ).ready( function() {

	$( this ).ajaxError( function( e, jqXHR, settings, exception ) {
		if( jqXHR.status == 403 ) {
			location.href = '?logout';
			throw new Error( 'Login Required' );
		} else if( jqXHR.statusText != 'abort' && jqXHR.getAllResponseHeaders() ) {
			alert( 'There was an error processing your request.\nPlease try again or contact customer service.\nError: ' + jqXHR.statusText );
		}
	});

});

Here we are globally looking at all AJAX requests. Since the status code 403 is in the client error (4xx) class, jQuery treats the request as failed. The .ajaxError() method picks up this failure and handles it.

If the status code is detected as a 403 (which we set in our ColdFusion code), then we direct the user to a logout page (which in turn directs to a login page) and throw a JS error. The throw statement is supposed to stop all JS processing; however, if you have an error handler attached to the specific AJAX call, that handler will still fire. The error message will only be seen if you are viewing the JS console.

If any other error is caught, the handler first checks whether the request was aborted or whether the user navigated away from the page. In these two cases I don’t want to display an error. If anything else is caught, I display a generic message.
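One way to keep those three outcomes easy to reason about is to pull the decision into a small pure function. The sketch below is only an illustration: classifyAjaxError is a hypothetical helper name, not part of jQuery, and it follows this post’s assumption that an empty response header string means the user navigated away.

```javascript
// Hypothetical helper (not part of jQuery): map a failed request to an action.
function classifyAjaxError(status, statusText, responseHeaders) {
	if (status === 403) {
		return 'redirect'; // the server reported an expired session
	}
	if (statusText === 'abort' || !responseHeaders) {
		return 'ignore'; // request aborted, or the user navigated away
	}
	return 'alert'; // anything else gets the generic message
}

console.log(classifyAjaxError(403, 'Forbidden', 'Content-Length: 0')); // redirect
console.log(classifyAjaxError(0, 'abort', '')); // ignore
console.log(classifyAjaxError(0, 'error', '')); // ignore
console.log(classifyAjaxError(500, 'Internal Server Error', 'Content-Type: text/html')); // alert

// Wire it up only when jQuery is loaded (for example, in the browser).
if (typeof jQuery !== 'undefined') {
	jQuery(document).ajaxError(function (event, jqXHR) {
		const action = classifyAjaxError(jqXHR.status, jqXHR.statusText, jqXHR.getAllResponseHeaders());
		if (action === 'redirect') {
			location.href = '?logout';
		} else if (action === 'alert') {
			alert('There was an error processing your request.\nError: ' + jqXHR.statusText);
		}
	});
}
```

Keeping the decision separate from the redirect and alert calls makes it possible to check the rules without a browser.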
