My buddy Steve just tracked down a Fortran memory error on Windows. I like recording these debugging sessions and thought to share one.
This Fortran90 application couples fluid dynamics and chemistry to do combustion. It uses the MKL version of LAPACK. It failed with a page fault, which means that there was a memory problem. How do you find it?
Fortran90 is unlikely to cause memory errors of its own accord. The language doesn't use C-style pointers, although it does add allocatable arrays to the Fortran77 standard. These arrays shouldn't be capable of corruption under normal circumstances.
The first step for Steve was to use the PageHeap tool. The old-fashioned way to use this is to use gflags.exe to turn on the PageHeap switch in the operating system. I read there is a fancier way to use it in Visual Studio 2005, but gflags.exe will suffice.
Turning on PageHeap made the program crash much sooner than before. It crashed in a particular function during exit from that function. Pseudocode for the function follows.
array = Allocate(4000)
This function only failed when allocate was not called. We knew that this subroutine implicitly calls deallocate when it exits, so it seemed like there might be an invalid array pointer in the heap. That means that the heap corruption discovered during the implicit deallocation was caused by stack corruption which claimed that the array had been allocated.
Steve's friend had the bright idea to use this function as a test to find where the corruption first occurred. He moved it higher and higher in the code until he found a function call to LAPACK. Looking closely at that function call, he realized that he had called LAPACK to process a matrix but had given it the wrong size for the matrix. LAPACK was overwriting past the matrix boundary. Why was it able to do this? The LAPACK libraries are Fortran77, therefore without a header file. Fortran90 was unable to check that the arguments were correct.