erichynds

Me!

Welcome to my online development portfolio and blog. I'm Eric Hynds, a 23 year old website developer living outside of Boston, Massachusetts, and I'm passionate about developing functional, standard-compliant, and user-friendly websites.

Archive for August, 2009

Paginating large datasets with MySQL

Thursday, August 27th, 2009

Web developers typically choose one of two ways to paginate data:

  1. Select every row in one query and use server-side programming to handle the pagination and output.
  2. Use the LIMIT clause in the query to only select the amount of rows necessary for that page of results.

Which ever you choose, each can have a huge performance implication particularly when the data set is in the thousands of rows, or if the query to select the data is not optimized.

The performance hit in option #1 is obvious: you’re selecting ALL the rows when only a fraction of them are going to be displayed to the user at one time. Depending on the situation and if the query can be cached, I suppose this is a fine option, but in my opinion, pagination is a job for your database layer.

In option #2 the query is only going to return XX number of rows out of the entire recordset. We still need to know how many total rows exist in order to figure out how many pages there will be, and to display other meta data to users (showing records 1-100 of 4000, for example). Therefore, a second query still needs to be issued using a COUNT() statement, or the exact same query again without the LIMIT clause… not very efficient.

Although it doesn’t seem well known in the CF community, this scenario is the perfect fit for MySQL’s SQL_CALC_FOUND_ROWS and FOUND_ROWS(). Basically, this statement and function used together gives us the ability to retrieve the total number of rows that would have been returned had the LIMIT statement not been used. Two queries still have to be issued, but this time without the performance hit. Assuming our start_record and records_per_page variables are already set, it works like this:

<cftransaction>
	<cfquery name="getData" datasource="mydatasource">
	SELECT SQL_CALC_FOUND_ROWS field1, field2, field3
	FROM ...
	WHERE ...
	LIMIT #start_record#, #records_per_page#
	</cfquery>
 
	<cfquery name="getDataCount" datasource="mydatasource">
	SELECT FOUND_ROWS() AS totalrecords
	</cfquery>
</cftransaction>

The query getData retrieves only the rows we’re going to output, whereas query getDataCount retrieves the number of rows getData would have returned without the LIMIT restriction. Note that you can issue a SELECT FOUND_ROWS() statement at any time whether you’re paginating or not, and it’ll return the recordcount for whatever the last SELECT was. Therefore, in the example above, I’m wrapping both queries in a cftransaction tag so the FOUND_ROWS() will always reference the getData query. If another query is issued at the same time by another user of your website, selecting FOUND_ROWS() will return the recordcount of that query, not from getData. Here’s a full example of how to build the pagination variables (untested!):

<!--- the current page we're on --->
<cfparam name="url.page" default="1">
 
<!--- number of records to display per page / retrieve in the query --->
<cfset records_per_page = 45>
 
<!--- row to start on during our DB retrieve --->
<cfset start_record = url.page * records_per_page>
 
<!--- get our data --->
<cftransaction>
	<cfquery name="getData" datasource="mydatasource">
	SELECT SQL_CALC_FOUND_ROWS field1, field2, field3
	FROM ...
	WHERE ...
	LIMIT #start_record#, #records_per_page#
	</cfquery>
 
	<!--- get total number of rows --->
	<cfquery name="getDataCount" datasource="mydatasource">
	SELECT FOUND_ROWS() AS totalrecords
	</cfquery>
</cftransaction>
 
<!--- total number of pages --->
<cfset total_pages = ceiling(getDataCount.totalrecords / records_per_page)>

In future versions of ColdFusion it would be nice to see some kind of implementation in the “result” attribute of CFQUERY. If your result variable is “r”, for example, it would be cool to be able to reference #r.found_rows#, much like you can with #r.generated_key# during an INSERT. I’m sure MSSQL and other DBMS’s have something similar, but I’ve only had to use this with MySQL. Happy paginating!