Monday, October 09, 2006

Fortran Memory Error

My buddy Steve just tracked down a Fortran memory error on Windows. I like recording these debugging sessions, so here is one.

This Fortran90 application couples fluid dynamics and chemistry to model combustion. It uses the MKL version of LAPACK. It crashed with a page fault, which pointed to a memory problem. How do you find it?

Fortran90 is unlikely to cause memory errors of its own accord. The language has no C-style pointers, although it does add allocatable arrays to what Fortran77 offered. Under normal circumstances, those arrays should not become corrupted.

Steve's first step was to turn on PageHeap, which places heap allocations against inaccessible pages so that a bad heap access faults right away instead of much later. The old-fashioned way to turn it on is to run gflags.exe and set the PageHeap flag for the program. I read that Visual Studio 2005 has a fancier way, but gflags.exe will suffice.

Turning on PageHeap made the program crash much sooner than before. It crashed in one particular function, on exit from that function. Pseudocode for the function follows.

Sub DoWork(array,size)
If size>N
array = Allocate(4000)
end if
End Sub

This function failed only when allocate was not called. We knew that the compiler inserts an implicit deallocate when this subroutine exits, so the array must have been claiming to be allocated while holding a pointer that was not valid in the heap. In other words, the heap corruption discovered during the implicit deallocation was really caused by stack corruption that made the array look allocated.

Steve's friend had the bright idea to use this subroutine as a probe for the corruption. He called it earlier and earlier in the program until he found the call to LAPACK after which it began to fail. Looking closely at that call, they realized that the code asked LAPACK to process a matrix but gave it the wrong size for the matrix, so LAPACK was writing past the matrix boundary. Why was it able to do this? The LAPACK libraries are written in Fortran77, which has no interface blocks or header files, so the Fortran90 compiler could not check that the arguments were correct.
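The same search can be done on purpose: surround a buffer with guard words and check them after each suspect library call. Here is a minimal C++ sketch, assuming you control the allocation of the array you hand to the library. GuardedBuffer and fillVector are hypothetical names; fillVector only stands in for a Fortran77 routine that trusts whatever length it is given, the way LAPACK trusts its size arguments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LAPACK or MKL): a buffer of doubles with
// guard words on both sides, so an overrun by a callee can be detected later.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t n)
        : storage_(n + 2 * kGuard, 0.0), size_(n) {
        // Fill the guard words before and after the usable region.
        for (std::size_t i = 0; i < kGuard; ++i) {
            storage_[i] = kPattern;
            storage_[kGuard + n + i] = kPattern;
        }
    }

    // Pointer to the usable region, as you would pass it to a library routine.
    double* data() { return storage_.data() + kGuard; }
    std::size_t size() const { return size_; }

    // Returns false if any guard word was overwritten.
    bool guardsIntact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (storage_[i] != kPattern || storage_[kGuard + size_ + i] != kPattern) {
                return false;
            }
        }
        return true;
    }

private:
    static constexpr std::size_t kGuard = 8;             // guard words on each side
    static constexpr double kPattern = -9.87654321e300;  // value real code is unlikely to write
    std::vector<double> storage_;
    std::size_t size_;
};

// Stand-in for a Fortran77 routine: it trusts the length n it is given.
void fillVector(double* a, int n) {
    for (int i = 0; i < n; ++i) {
        a[i] = 1.0;
    }
}
```

Calling guardsIntact() after each suspect call narrows down the first call that overruns the buffer, much like moving the failing subroutine earlier in the program. PageHeap catches heap overruns at the moment of the bad access, whereas guard words only report them at the next check, so the two approaches complement each other.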

- Drew

Thursday, July 13, 2006

Finally Fixed VTK .NET Widget Bug

I thought a subtle interaction between .NET and VTK was causing threading problems during user interaction: widgets did not always work, and picking would fail. The problem turned out to be that I had defined an enum for EventIds that was a copy of the VTK enum. VTK itself renumbered these values between version 5.0 and the development version, 5.x, and I was using the 5.x enum with the 5.0 code.

The fix for what I thought was a subtle bug is simply to use integers instead of the EventIds enum. For instance, replace EventIds.EndPickEvent with the number 9. You can look up these values in VTK\Common\vtkCommand.h.

There are no other real bugs in VTK .NET that I know of. I could make debugging easier and wrap HWND values, but that's about it.

That means I can release a version of VTK .NET for VTK 5.0.1. I've already integrated the new release into the patch code, tested the code for the new version, and will package it as soon as I can.

Sunday, June 04, 2006

Compilation of Debugging Tools

Debugging Applications for Microsoft .NET and Microsoft Windows by Robbins has been a great, great help to me. Tonight, I used Visual Studio 2005 to compile several of the utilities supplied with the book and had to make a few changes to upgrade from Visual Studio .NET.

For CrashFinder, change the return type of OnNcHitTest to LRESULT in About.{h,cpp} and Statlink.{h,cpp}. The tricky bit is to add #define _USE_32BIT_TIME_T to stdafx.h. Then you can either update all of the string-handling calls or turn off Treat Warnings as Errors to complete the compilation.

Robbins recommends creating map files with the linker options /MAPINFO:EXPORTS and /MAPINFO:LINES. Visual Studio 2005's linker has dropped the LINES option, but you can recover that information using pdb2map from the book's CD. That's the only way I know to get it.

Thursday, May 11, 2006

VTK .NET to Sourceforge

I posted VTK .NET on 23 March and have had about one hundred visitors to the site each week. I can see that people download the examples, so it seems to run well for them. I have submitted a request for a Sourceforge site under the BSD license, the same license VTK uses. That should take another week or two to process.


Bugs Discovered


  • I did not modify CMakeLists.txt for vtkMy or vtkLocal, which are the standard ways to wrap your own VTK C++ code for wrapper languages.

  • There seems to be a problem with vtk3DWidgets. They mostly work, but a few things do not. For example, annotatePick.cs from the examples does not notify you of new picks.

What People Want Next


  • I saw a comment that the previous version of VTK .NET needed to wrap HWND types so that you could build a Windows Forms Control in C#. I've built one in managed C++, but I see the logic in the request, and I think I can add that translation to the wrapper code.

  • Two people who emailed me have commented that they are working on using VTK .NET within ASP.NET. One wanted to use it to generate images on the server side. The other is interested in the more ambitious project of providing manipulable 3D graphics, something that interests me, too. There are a number of challenges to making this happen, but it would clearly be a great thing to implement.

Friday, April 28, 2006

SciPy distribution for P3 fails - what illegal instruction means

I have been trying to help someone install SciPy on our clusters. The site provides a package for the P4, which works fine, and one for the PIII, which works on my laptop but fails on our test login node. I downloaded CPU-Z to check, and it turns out that my Pentium M (Dothan) laptop supports MMX, SSE, and SSE2, while the ancient dual Pentium III-S test server supports only MMX and SSE.

As a result, the provided installer for Python 2.4 and the Pentium III, scipy-0.4.8.win32-py2.4-pentium3.exe, fails on older P3 machines.

The distribution's test suite, scipy.test(), fails on older P3 chips (such as the PIII-S, Tualatin) in several of the package tests with this Dr. Watson error:
Unhandled exception at 0x6988d3f7 in python.exe:
0xC000001D: Illegal instruction

The supplied binaries for ATLAS on a Windows P3 box do not have this problem. I built a working package by following the online instructions for a MinGW build against the precompiled ATLAS binaries. It is a quick process because you don't have to rebuild ATLAS.

I'm pretty sure we didn't accidentally install the P4 version, because I checked the show_config output at the time, and we did the install twice. I can't prove it, because I didn't save the output. Also, the P4 version fails quickly on the P3, whereas fewer tests fail when the P3 version, which contains SSE2 instructions, runs on an older P3.

On the PIII-S where the install failed, the CPU information is
has_mmx has_sse is_32bit is_Intel is_PentiumIII is686
The ATLAS archdef for both the supplied package and my working build was PIII/gcc/misc and mmdef was PIII/gcc/gemm.

We run Python 2.4.2 on Windows XP Service Pack 2.

The instructions that failed included cvtsi2sd, movsd, and ucomisd. I found them in the MMX section of the Intel assembly specification, but they are listed there as requiring SSE2. To give a sense of what the code looks like, here is the disassembly around the fault:
684C5825  mov       dword ptr [esp+68h],0
684C582D  mov       eax,dword ptr [edx]
684C582F  mov       edx,dword ptr [esi]
684C5831  cmp       eax,edx
684C5833  cmovg     eax,edx
684C5836  cvtsi2sd  xmm0,eax
684C583A  movsd     mmword ptr [esp+14h],xmm0
684C5840  fld       qword ptr [esp+14h]
684C5844  mov       dword ptr [esp+5Ch],0
684C584C  fmul      qword ptr ds:[684C5538h]

EAX = 0000000F EBX = 684C550E ECX = 00902DD0 EDX = 0000000F
ESI = 0021EB6C EDI = 0021EB60 EIP = 684C5836 ESP = 0021E8A4
EBP = 00000001 EFL = 00010216

The instruction pointer is at the cvtsi2sd instruction. For a trap, such as a breakpoint, the reported address is the instruction after the one that raised it. An illegal instruction is instead reported as a fault, which the processor raises before the instruction pointer is incremented, so EIP points at the offending instruction itself.
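The check CPU-Z did by hand can also be done in code before choosing which binary to load. Here is a minimal C++ sketch that decodes the MMX, SSE, and SSE2 bits from the EDX register returned by CPUID leaf 1. It assumes you obtain EDX yourself (for example with __cpuid from <intrin.h> on MSVC or __get_cpuid from <cpuid.h> on GCC); the decoding is kept separate so it runs anywhere. CpuFeatures, decodeLeaf1Edx, and pickBuild are illustrative names, not part of SciPy or ATLAS.

```cpp
#include <cstdint>

// Feature flags reported in EDX by CPUID leaf 1 on x86 processors.
struct CpuFeatures {
    bool mmx;   // EDX bit 23
    bool sse;   // EDX bit 25
    bool sse2;  // EDX bit 26: required by cvtsi2sd, movsd (xmm form), ucomisd
};

constexpr CpuFeatures decodeLeaf1Edx(std::uint32_t edx) {
    return CpuFeatures{
        ((edx >> 23) & 1u) != 0,
        ((edx >> 25) & 1u) != 0,
        ((edx >> 26) & 1u) != 0,
    };
}

// Hypothetical policy for choosing between prebuilt binaries.
constexpr const char* pickBuild(CpuFeatures f) {
    return f.sse2 ? "SSE2 build" : (f.sse ? "SSE-only build" : "generic x86 build");
}
```

A PIII-S reports MMX and SSE but not SSE2, so a launcher using a check like this would pick the SSE-only build instead of hitting the illegal instruction.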

Drew Dolgert

Monday, April 10, 2006

Debugging Mixed Interop Native and Managed Assemblies with Visual Studio 2005

I've been trying to track down a bug in a managed C++ wrapper around native C++ classes, and Visual Studio 2005 won't allow me to step directly from the managed code into the native code. I know it is possible, but this is the only way I see to do it:



  1. Put a System.Diagnostics.Debug.Assert(false) near the start of the application.

  2. Run the application and accept the option to debug it.

  3. Opt to "manually choose debugger," and choose both native and managed.

  4. Visual Studio 2005 then allows me to step into the native code.


Drew

Thursday, March 23, 2006

unmanaged Visual C++ for Visual Studio 2005 causes (soluble) dll loading problems

Hi All,

This message is about where dlls go when they are part of “isolated assemblies.” Everyday C++ users may run into this when they move their code from a development machine to a node on a compute cluster.

1. When you copy an executable to the cluster, it now needs not only its dlls but also their manifest files copied with it, or else we have to install them on the cluster for the users.

2. A flaw in the linker commonly produces an executable that never loads at all. There is a quick fix for this.

Part I:
When compiling unmanaged Visual C++ applications in VS2005, you have to pay attention to how you distribute the dlls. Normally, you just put the dlls beside the exe. That no longer works, for some good reasons. I'm going to outline here where the bodies are buried, pointing out the pertinent directories, tools, and project settings.

Google keywords: Isolated applications, side-by-side assemblies, winsxs
Web pages:
About isolated apps and assemblies
http://msdn2.microsoft.com/en-us/library/ms235342(en-US,VS.80).aspx
blog with details on how to do installation with isolated apps
http://blogs.msdn.com/nikolad/
Windows Installer XML – use xml file to create an msi.
http://sourceforge.net/projects/wix

There is a section in VS under Linker->Manifests that says you can
- embed manifest – yes/no
- allow isolation – yes/no

Visual Studio now distributes the CRT, the debug CRT, and versions of the ATL and MFC libraries in
C:\Program Files\Microsoft Visual Studio 8\VC\redist\x86
C:\Program Files\Microsoft Visual Studio 8\VC\redist\Debug_NonRedist\x86

In the Debug_NonRedist\x86 directory, you find the directory Microsoft.VC80.DebugCRT\ with the files
- Microsoft.VC80.DebugCRT.manifest
- msvcm80d.dll
- msvcp80d.dll
- msvcr80d.dll

Under various conditions, executables can find these dlls in the following directories, as long as the manifest is there with them:
- C:\Windows\WinSXS
- <appdirectory>
- <appdirectory>\<appname>.exe.local
- <appdirectory>\<manifestname>


Part II: The Common Deal-breaker: Oldnames.lib

When you link against libraries that use older, deprecated function names, the linker finds those names in an implicit library called Oldnames.lib. Because of a flaw in the linker, any time it pulls in oldnames.lib, it also pulls in a copy of the wrong CRT: if you link against the debug CRT, it pulls in the release CRT, and vice versa. That alone wouldn't stop the executable from running, except that the manifest the linker embeds names only the intended CRT assembly, so the other CRT cannot be found at load time. For instance, when linking against the debug CRT, the following manifest is embedded in the executable, even though the linker mistakenly linked to the release CRT as well.


<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type='win32' name='Microsoft.VC80.DebugCRT' version='8.0.50608.0' processorArchitecture='x86' publicKeyToken='1fc8b3b9a1e18e3b' />
    </dependentAssembly>
  </dependency>
</assembly>

The solution is to tell the linker to ignore the other CRT. In other words, if you link with msvcrtd.lib, you have to explicitly tell the linker to ignore msvcrt.lib (with /NODEFAULTLIB:msvcrt.lib), and vice versa. With the wrong CRT excluded, the linker keeps looking for the references it needs and comes back around to the correct library.

Note that Microsoft would not consider this problem a linker flaw, because I have only seen it happen when linking to libraries built with versions older than my current Visual Studio, and linking Visual Studio 2005 projects to those of Visual Studio 2003 or Visual Studio 6 is not supported. The problem is probably compounded by the fact that I always link both release and debug builds against a release library, which is also not supported. Does any company really distribute its development libraries as MT, MD, MTd and MDd? I've only seen VRCO do this.

At this point, I've also created an installer for the C runtimes, as described in the blog linked above, so the runtimes now load from that installation. I just run vccrt.msi, and then my application runs. I built the installer with WiX, a bit of open source from Microsoft. The project doesn't appear to be under development any longer, but it still works for now.

Drew

running openmp on cluster requires per-application configuration file

John, Dave, and I were trying to run a Visual Studio 2005 OpenMP code on cluster nodes. We ran into an extra wrinkle with the isolated assembly for Visual Studio’s OpenMP. In addition, a curiously low-tech bug in Visual Studio kept us from discovering we had the correct solution.

Typical deployment of isolated assemblies on a cluster node
We have already covered the typical story for isolated assemblies. If an application uses a dll that was compiled as an isolated assembly, then the application has to contain a manifest that specifies those assemblies. Visual Studio automatically embeds that manifest in the application. When you deploy that application to a new machine, Microsoft recommends that you use an installer to install the assembly as a shared assembly, meaning that the installer places it in the WinSxS folder in the Windows folder, which functions much like the .NET global assembly cache (GAC). Visual Studio even includes a vcredist.exe program to install all of the release versions of the Visual Studio isolated assemblies into the WinSxS folder.

Because we are running on cluster nodes, we want to know how to deploy those isolated assemblies as private assemblies instead. This is usually simple. Find the directory for the isolated assembly, such as Microsoft.VC80.DebugCRT, in C:\Program Files\Microsoft Visual Studio 8\VC\redist\Debug_NonRedist\x86. Copy the whole directory into the directory that holds your executable on the cluster node, and everything should work.

By the way, Dave installed the redistributable release-version isolated assemblies on one of the clusters. We still have to decide whether to install all of the Visual Studio assemblies on the clusters for the users. The problem will remain for other isolated assemblies, although at this point we don't know of any coming from companies other than Microsoft or products other than Visual Studio.

How OpenMP requires another step to deploy

When you try the above technique for OpenMP, the executable still fails to find the assembly on the cluster node. When Visual Studio compiles your OpenMP program, it embeds a manifest that refers to version 8.0.50608.0 of the OpenMP assembly, but the assembly actually shipped is version 8.0.50727.42. Why does this work on your own machine? It works because your computer has a machine-level policy, in %windir%\WinSxS\Policies, which states that if any application asks for version 8.0.50608.0, it is okay to give it 8.0.50727.42.
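To make the policy concrete, here is a simplified C++ model of what one binding redirect does to a requested version. It is only an illustration, not the Windows loader's code: parseVersion and applyRedirect are hypothetical names, and it assumes the four-part version format used in manifests, with each part at most 65535 and oldVersion given either as one version or as a "lo-hi" range.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>

// An assembly version has four 16-bit parts: major.minor.build.revision.
using Version = std::array<unsigned, 4>;

// Parses "a.b.c.d"; returns std::nullopt on anything malformed.
std::optional<Version> parseVersion(const std::string& s) {
    Version v{};
    std::size_t part = 0;
    unsigned long value = 0;
    bool haveDigit = false;
    for (char c : s) {
        if (c >= '0' && c <= '9') {
            value = value * 10 + static_cast<unsigned long>(c - '0');
            if (value > 65535) return std::nullopt;
            haveDigit = true;
        } else if (c == '.' && haveDigit && part < 3) {
            v[part++] = static_cast<unsigned>(value);
            value = 0;
            haveDigit = false;
        } else {
            return std::nullopt;
        }
    }
    if (!haveDigit || part != 3) return std::nullopt;
    v[part] = static_cast<unsigned>(value);
    return v;
}

// Simplified model of one redirect rule: if the requested version falls
// inside oldVersion (single version or "lo-hi"), hand out newVersion instead.
Version applyRedirect(const Version& requested, const std::string& oldVersion,
                      const Version& newVersion) {
    const std::size_t dash = oldVersion.find('-');
    const std::optional<Version> lo = parseVersion(oldVersion.substr(0, dash));
    const std::optional<Version> hi =
        dash == std::string::npos ? lo : parseVersion(oldVersion.substr(dash + 1));
    if (lo && hi && *lo <= requested && requested <= *hi) {
        return newVersion;
    }
    return requested;
}
```

With the numbers from this post, a request for 8.0.50608.0 under the rule oldVersion="8.0.50608.0" comes back as 8.0.50727.42; on the cluster node, without a machine-level policy, nothing performs that mapping unless the application supplies it.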

The version mismatch between the distributed assembly and the assembly required by the compiler is a known problem. It happens because different parts of Visual Studio are frozen at different times before delivery. To make the program run on the cluster, we have to deploy that same policy as an application configuration file.

Application configuration files for native applications are very similar to those for managed applications. If your program is myprog.exe, you make a text file with UTF-8 encoding (with or without a byte-order mark) called myprog.exe.config. In it, you redirect the program's binding to the newer version of the debug OpenMP assembly. Note that when you identify the DebugOpenMP assembly, you leave out the version attribute, because you specify it in the bindingRedirect element.

<configuration>
  <windows>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <assemblyIdentity name="myprog.exe" processorArchitecture="x86" version="1.0.0.1" type="win32"/>
      <dependentAssembly>
        <assemblyIdentity type="win32" name="Microsoft.VC80.DebugOpenMP"
                          processorArchitecture="x86"
                          publicKeyToken="1fc8b3b9a1e18e3b"/>
        <bindingRedirect oldVersion="8.0.50608.0" newVersion="8.0.50727.42"/>
      </dependentAssembly>
    </assemblyBinding>
  </windows>
</configuration>

If we had deployed the OpenMP assembly to WinSxS, so that it was a shared assembly, I think that there would have been a machine-level policy to take care of the version mismatch. As is, we had to make this file for ourselves and deploy it in the same directory as myprog.exe.

How a little bug made this hard to figure out
Yesterday morning, we had figured out that there was a version mismatch and that an application configuration file was necessary. We had copied the debug OpenMP assembly directory into the same directory as our application, but nothing worked. After five hours, we noticed that the debug OpenMP assembly directory was called Micrsoroft.VC80.DebugOpenMP. If you look in your Visual Studio 2005 VC\redist\Debug_NonRedist\x86 directory, you'll find that the directory name is misspelled there.
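A quick layout check before copying files to the cluster would have flagged the misspelled directory. Here is a minimal C++17 sketch, assuming the private-assembly layout described above: a folder named after the assembly, next to the executable, containing a manifest with the same name. hasPrivateAssembly is a hypothetical helper, and note that Windows compares file names case-insensitively, whereas this check follows whatever the host filesystem does.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical pre-deployment check: does appDir contain a private assembly
// laid out as <appDir>/<name>/<name>.manifest?
bool hasPrivateAssembly(const fs::path& appDir, const std::string& name) {
    const fs::path dir = appDir / name;
    const fs::path manifest = dir / (name + ".manifest");
    std::error_code ec;  // treat errors as "missing" instead of throwing
    return fs::is_directory(dir, ec) && fs::is_regular_file(manifest, ec);
}
```

Run against our application directory, hasPrivateAssembly(appDir, "Microsoft.VC80.DebugOpenMP") would have returned false, because the folder on disk was Micrsoroft.VC80.DebugOpenMP.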

Drew Dolgert

Attempting to contribute to VTK open source

We use the Visualization Toolkit a lot around here to make standalone applications or to use in our CAVE. I usually write the applications in Python using wxPython, but sometimes the client needs an installer and a regular Windows application, so I like to use Windows Forms in that case. To that end, I normally wrap VTK code in managed C++. There were, and still are, compiled wrappers provided by a group in Slovenia, but this seems to have been a master's thesis, and they never released the code.

It seemed like a great chance to contribute to the project, so I've created wrappers that could fold into the source tree.
  • Managed C++ wrappers for VTK.
  • Allow use in C#, managed C++, VB, and others.
  • .NET 2.0 only, meaning Visual Studio 2005.
  • Builds within the typical CMake build system for VTK.
  • Includes a .NET Windows Forms Control for drag-and-drop in Visual Studio designer.

We'll have to see how successful these are. Because the CMake build system does not handle managed C++ projects, the build isn't entirely hands-free: just before hitting Build in Visual Studio, the user has to run a macro that converts the projects to managed code. As a result, this wrapper can't be built in the nightly dashboard tests. I'd be willing to extend CMake to handle managed projects if that's what it takes.

Drew Dolgert

Tuesday, March 21, 2006

Welcome

I work on scientific simulations and visualization. Some of my friends plan to chip in. We are driven to start this blog by having spent too much time solving technical problems we are sure others have faced. Welcome.