erichynds

Welcome to my online development portfolio and blog. I'm Eric Hynds, a 23 year old website developer living outside of Boston, Massachusetts, and I'm passionate about developing functional, standard-compliant, and user-friendly websites.

Large loops = out of memory?

I have a script that loops over a 6MB text file and does a large amount of processing on each line: creating structs, running lots of queries, calling other CFCs, etc. The problem is that this loop eventually uses 100% of its allotted memory and then times out with an “out of memory” error. I cannot figure out whether it is something I am doing wrong or the nature of CF. From what I’ve read on the web, CF is notoriously poor at garbage collection and will not flush its memory until after the request is complete, which leads me to believe there is nothing I can do.

I have looked over the code with others and can 100% confirm that all variables in a function are var scoped, each structure is cleared at the head of the loop, and there are no open loops or areas where memory would obviously be leaking from.

I can also confirm with the following code (found here) that every 1,000 iterations through the loop use another 20MB of memory, which builds up until the heap is maxed out.

<cfset runtime = CreateObject("java","java.lang.Runtime").getRuntime()>
<cfset memoryUsed = (runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024>
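For comparison, the same measurement can be taken directly in Java. This is only a minimal sketch: the class name `HeapSampler`, the sampling interval, and the retained list are assumptions made for illustration, and the list merely stands in for per-line work that stays reachable.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: sample JVM heap usage every N iterations of a long-running loop.
final class HeapSampler {
    private static final double MB = 1024.0 * 1024.0;

    /** Heap currently in use, in megabytes (allocated heap minus free space). */
    static double usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Upper bound the heap is allowed to grow to, in megabytes. */
    static double maxMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        int sampleEvery = 1000; // hypothetical sampling interval
        List<int[]> retained = new ArrayList<>();
        for (int i = 1; i <= 5000; i++) {
            retained.add(new int[256]); // stand-in for work whose results stay reachable
            if (i % sampleEvery == 0) {
                System.out.printf("iteration %d: %.1f MB used of %.1f MB max%n",
                        i, usedMb(), maxMb());
            }
        }
    }
}
```

Comparing the used figure against `maxMemory()` rather than `totalMemory()` shows how close the process is to the configured heap limit, since the allocated heap can still grow until it reaches that limit.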

It was my understanding that, with each loop iteration, recreating the structures built within the loop clears the old ones from memory, but this apparently isn’t the case. I have also tried tapping into Java to manually force garbage collection, which doesn’t do a thing either. My only solution at the moment is to increase the Java heap size to 2500MB (enough for the damn thing to run), but once it has finished and other requests are made, memory is never flushed. I then force kill CF and restart it. This is fine for my development box, but is not going to fly in production, and this script will probably be run a couple of times a month.

So, aside from posting ~500 lines of code across a handful of files, any quick ideas I can investigate? I’m running CF8 on Ubuntu 9.04.

View Comments to “Large loops = out of memory?”

  1. Eric, how are you executing the loop & which version of CF are you using?

    Previous to 8, you’d have to use CFFILE to read the text file into a variable, then CFLOOP over that variable using CRLF (carriage return plus line feed) as the delimiter. This method loads the whole file into memory first and keeps it there until you’re done.

    CF 8 adds a file attribute to CFLOOP, so you can pass in the file’s path. This method abstracts Java’s java.io.BufferedReader functionality, which reads the file line by line, using much less memory over time.

    If you’re on CF 6 or 7, you can still use java.io.BufferedReader, it just takes a bit of code: http://coldfusion.sys-con.com/node/86121

    HTH
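The line-by-line approach described in that comment can be sketched in plain Java as follows. It assumes a UTF-8 text file; the class name `LineByLine` and the handler callback are illustrative names, not taken from the linked article.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Consumer;

// Sketch: stream a file one line at a time instead of loading it whole.
final class LineByLine {

    /** Passes each line to the handler and returns the number of lines read. */
    static long processLines(Path file, Consumer<String> handler) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if the handler throws
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line); // only the current line needs to be held here
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);
        long n = processLines(tmp, line -> { /* per-line work would go here */ });
        System.out.println(n + " lines processed");
        Files.delete(tmp);
    }
}
```

Reading this way keeps the file itself out of the heap, but it does not help if the per-line work retains its own results.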

  2. Eric Hynds says:

    I was looping with the file attribute, yes.

    I’ve narrowed it down to executing a large number of queries per iteration on a datasource; it is not struct related. Hopefully I can come up with an answer soon.

  3. Felix says:

    Hi,
    I had the same problems in the past. I did not find any solution, so I decided to run the garbage collector myself to fix the problem. I do not know if this is the best solution, and it is some time ago that I used this!

    Cheers Felix

    <!--- var declarations must come first in a function body on CF8 and earlier. --->
    <cfset var appRuntime = StructNew()/>
    <cfset var freeMemory = ""/>
    <cfset var totalMemory = ""/>
    <cfset var maxMemory = ""/>

    <!--- Java handles used to inspect the heap and request garbage collection. --->
    <cfset this.runtimeObj = CreateObject("java","java.lang.Runtime").getRuntime()/>
    <cfset this.threadObj = CreateObject("java","java.lang.Thread")/>
    <cfset this.systemObj = CreateObject("java","java.lang.System")/>

    <!--- Heap figures, converted from bytes to MB. --->
    <cfset appRuntime.freeMemory = Round( this.runtimeObj.freeMemory() / 1024 / 1024 )/>
    <cfset appRuntime.totalMemory = Round( this.runtimeObj.totalMemory() / 1024 / 1024 )/>
    <cfset appRuntime.maxMemory = Round( this.runtimeObj.maxMemory() / 1024 / 1024 )/>

    <!--- When free memory drops below 350MB, pause for two seconds and ask the JVM to collect. --->
    <cfif appRuntime.freeMemory LT 350>
        <cfset this.threadObj.sleep(2000)/>
        <cfset this.systemObj.gc()/>
        <cfset this.systemObj.runFinalization()/>
    </cfif>

  4. marc esher says:

    I’m certainly no expert here, but I’d be surprised if gc() would work here. If you look at a heap dump while your long request is running, any created objects (queries, cfcs, etc) all have a root back to the coldfusion.runtime.NeoPageContext. So until that object is marked for collection (i.e. after the request completes), I’m not sure how anything that is a “child” of that object would be collected.

    Again, it’s certainly possible I’m way off here.
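The reachability argument in that comment can be illustrated with a small Java sketch. It does not reproduce ColdFusion's internals: the `requestScope` map is a hypothetical stand-in for a per-request root object, and `ReachabilityDemo` is an illustrative name. The point shown is only that an object with a strong path from a live root is not collected, however often `System.gc()` is called.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Sketch: an object reachable from a live root survives a GC request.
final class ReachabilityDemo {

    /** Returns true if the payload is still alive after a GC request while rooted. */
    static boolean survivesWhileRooted() {
        Map<String, Object> requestScope = new HashMap<>(); // hypothetical per-request root
        Object payload = new byte[1024 * 1024];
        requestScope.put("queryResult", payload);

        // A weak reference lets us observe the payload without keeping it alive.
        WeakReference<Object> probe = new WeakReference<>(payload);
        payload = null; // the only strong path is now through requestScope

        System.gc(); // a request to the JVM, not a guarantee of collection
        boolean alive = probe.get() != null;

        // requestScope is used after the GC request, so it was live during it.
        return alive && requestScope.containsKey("queryResult");
    }

    public static void main(String[] args) {
        System.out.println("alive while rooted: " + survivesWhileRooted());
    }
}
```

Once the root itself becomes unreachable, the payload becomes eligible for collection, although the JVM decides when that actually happens.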

  5. I recently refactored a process similar to what you have described. Here are a few things that I did:

    1. Used BufferedReader instead of cffile action="read". We are still on CF7.
    2. Simplified queries by using Oracle’s merge (aka upsert) operation. A single merge statement is smart enough to do an insert or update based on condition provided.
    3. Instead of running cfquery on each loop iteration, I started using JDBC PreparedStatement as a Batch.

    After these changes were made, a routine that was consistently timing out could be run in 15-20 minutes.

    Thanks
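The third item above, batching through a JDBC PreparedStatement, might look roughly like this. It is a sketch under assumptions: the class name `BatchInsertSketch`, the single string parameter, and the batch size are all illustrative, and the statement is expected to have been prepared elsewhere with one `?` placeholder.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch: queue rows on one PreparedStatement and send them in batches.
final class BatchInsertSketch {

    /**
     * Binds each line to parameter 1, queues it, and flushes every batchSize rows.
     * Returns the number of executeBatch() calls made.
     */
    static int insertInBatches(PreparedStatement ps, Iterable<String> lines, int batchSize)
            throws SQLException {
        int pending = 0;
        int flushes = 0;
        for (String line : lines) {
            ps.setString(1, line);
            ps.addBatch();            // queue the row instead of executing it now
            if (++pending == batchSize) {
                ps.executeBatch();    // one round trip for the whole batch
                pending = 0;
                flushes++;
            }
        }
        if (pending > 0) {            // send whatever is left over
            ps.executeBatch();
            flushes++;
        }
        return flushes;
    }
}
```

A caller would obtain the statement from its own connection, for example `conn.prepareStatement("INSERT INTO import_lines (raw_line) VALUES (?)")`, where `import_lines` and `raw_line` are hypothetical names. Flushing at a fixed size keeps the driver from holding every queued row in memory at once.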

  6. Brian Kotek says:

    Java Strings are immutable, so every time you concatenate a String or update a String, you’re actually creating an entirely new instance of the String. This is why memory consumption can skyrocket when reading or writing large text files. There are a number of Java classes that handle this kind of work more efficiently, such as StringBuffer, BufferedReader, StringWriter, etc. A Google search on these will point you in the right direction.
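A small sketch of the difference, using StringBuilder (the unsynchronized sibling of the StringBuffer mentioned above) as the mutable alternative. The class and method names are illustrative.

```java
// Sketch: repeated String concatenation versus appending to a mutable buffer.
final class ConcatDemo {

    /** Builds the result with +=, allocating a new String on every iteration. */
    static String joinWithConcat(String[] parts) {
        String out = "";
        for (String part : parts) {
            out += part + "\n"; // the previous value of out becomes garbage
        }
        return out;
    }

    /** Builds the same result by appending to a single growing buffer. */
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "alpha", "beta", "gamma" };
        System.out.println(joinWithConcat(parts).equals(joinWithBuilder(parts)));
    }
}
```

Both methods return the same text; the difference is in how many intermediate objects are created along the way, which grows with the number of lines.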

  7. Eric Hynds says:

    Wow, thanks for the good info guys. Next time I’m working on the project I’ll play around with some of these suggestions and report back.

  8. Eric Hynds says:

    It’s turning out to be a Linux thing. The memory leak(s) do not occur when running the same code on a Windows Server 2008 box.

  9. Danny Armstrong says:

    Do you have memory profiling turned on on your dev machine? This can have a huge memory overhead. I had an application that would balloon like crazy on dev (linux) and not on the server (windows). After turning off the monitoring stuff the performance equalled out.
