Monday, October 09, 2006
Fortran Memory Error
This Fortran90 application couples fluid dynamics and chemistry to model combustion. It uses the MKL version of LAPACK. It failed with a page fault, which indicates a memory problem. How do you find it?
Fortran90 is unlikely to cause memory errors of its own accord. The language doesn't use C-style pointers, although it does add allocatable arrays to the Fortran77 standard. These arrays shouldn't be capable of corruption under normal circumstances.
The first step for Steve was to use the PageHeap tool. The old-fashioned way is to run gflags.exe to turn on the PageHeap switch in the operating system. I have read that there is a fancier way to enable it from Visual Studio 2005, but gflags.exe will suffice.
Turning on PageHeap made the program crash much sooner than before. It crashed in a particular function during exit from that function. Pseudocode for the function follows.
Sub DoWork(array, size)
    If size > N
        array = Allocate(4000)
    End If
End Sub
This function failed only when allocate was not called. We knew that this subroutine implicitly calls deallocate when it exits, so it seemed that the implicit deallocation was being handed an invalid array pointer. In other words, the heap error discovered during the implicit deallocation was caused by stack corruption that made the array look as though it had been allocated.
Steve's friend had the bright idea to use this function as a test to find where the corruption first occurred. He moved the call higher and higher in the code until he reached a call to LAPACK. Looking closely at that call, he realized that he had asked LAPACK to process a matrix but had given it the wrong size for the matrix, so LAPACK was writing past the matrix boundary. Why was it able to do this? The LAPACK libraries are Fortran77 and therefore come without a header file (in Fortran90 terms, an explicit interface), so Fortran90 was unable to check that the arguments were correct.
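The idea of using a check to bracket the first corruption can be sketched in Python. This is only an illustration, not the Fortran code from the application: fill_matrix is a hypothetical stand-in for a library routine that trusts the sizes it is given, and the guard values play roughly the role that PageHeap played for us.

```python
GUARD = -999.0  # sentinel value; assumed never to be produced by real work


def make_guarded(n_values, pad=4):
    """Return a flat buffer with `pad` guard values after the payload."""
    return [0.0] * n_values + [GUARD] * pad


def guards_intact(buf, n_values):
    """True if nothing has written past the first n_values entries."""
    return all(v == GUARD for v in buf[n_values:])


def fill_matrix(buf, rows, cols):
    """Hypothetical routine that trusts the caller's sizes, as LAPACK does."""
    for i in range(rows * cols):
        buf[i] = float(i)


n = 3
buf = make_guarded(n * n)

fill_matrix(buf, n, n)            # correct size: stays inside the payload
ok_after_good_call = guards_intact(buf, n * n)

fill_matrix(buf, n, n + 1)        # wrong size: 12 values into room for 9
ok_after_bad_call = guards_intact(buf, n * n)
```

Checking the guards after each suspicious call narrows the search the same way that moving the test function did.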
- Drew
Thursday, July 13, 2006
Finally Fixed VTK .NET Widget Bug
The fix for what I thought was a subtle bug is just to use integers instead of the EventId enum. For instance, replace EventIds.EndPickEvent with the number 9. You can look up these values in VTK\Common\vtkCommand.h.
There are no other real bugs in VTK .NET, that I know of. I could do a few things to make debugging easier and wrap HWND values, but that's about it.
That means I can release a version of VTK .NET for VTK 5.0.1. I've already integrated the new release into the patch code, tested the code for the new version, and will package it as soon as I can.
Sunday, June 04, 2006
Compilation of Debugging Tools
For CrashFinder, change the return value of OnNcHitTest to LRESULT in About.{h,cpp} and Statlink.{h,cpp}. The tricky bit is to add #define _USE_32BIT_TIME_T to stdafx.h. Then you can either modify all of the string handling routines or unselect Treat Warnings as Errors to complete the compilation.
Robbins recommends creating map files with the linker options /MAPINFO:EXPORTS and /MAPINFO:LINES. The LINES option in Visual Studio 2005's linker has been removed, but you can resurrect that information using pdb2map from the book's CD. That's the only way I know to get this information.
Thursday, May 11, 2006
VTK .NET to Sourceforge
I posted VTK .NET on 23 March and have had about one hundred people come to the site each week. I can see that people download examples, so it seems to run well. I have submitted a request for a Sourceforge site, using the BSD license, which is the same that VTK uses. That should take another week or two to process.
Bugs Discovered
- I did not modify CMakeLists.txt for vtkMy or vtkLocal, which are the standard ways to wrap your own VTK C++ code for wrapper languages.
- There seems to be a problem with vtk3DWidgets. They mostly work, but a few things do not work. For example, annotatePick.cs from the examples does not notify you of new picks.
What People Want Next
- I saw a comment that the previous version of VTK .NET needed to wrap HWND types so that you could build a Windows Forms Control in C#. I've built one in managed C++, but I see the logic in the request, and I think I can add that translation to the wrapper code.
- Two people who emailed me have commented that they are working on using VTK .NET within ASP.NET. One wanted to use it to generate images on the server side. The other is interested in the more ambitious project of providing manipulable 3D graphics, something that interests me, too. There are a number of challenges to making this happen, but it would clearly be a great thing to implement.
Friday, April 28, 2006
SciPy distribution for P3 fails - what illegal instruction means
The provided installer for Python 2.4 and Pentium III fails on older P3 machines.
scipy-0.4.8.win32-py2.4-pentium3.exe
The test code from that distribution, scipy.test(), fails on older P3 chips (such as PIII-S, Tualatin) in several of the package tests with a Dr. Watson error of
Unhandled exception at 0x6988d3f7 in python.exe:
0xC000001D: Illegal instruction
The supplied binaries for ATLAS on a Windows P3 box do not have this problem. I built a running package using online instructions for a MinGW build using the precompiled ATLAS binaries. It is a quick process because you don't have to rebuild ATLAS.
I'm pretty sure we didn't just accidentally install the P4 version because I checked the show_config at the time, and we did it twice. I can't prove it because I didn't save the output. Also, the P4 version fails quickly on the P3, but fewer tests fail when using an SSE2 P3 version on an older P3.
On the PIII-S where the install failed, the CPU information is
has_mmx has_sse is_32bit is_Intel is_PentiumIII is686
The ATLAS archdef for both the supplied package and my working build was PIII/gcc/misc and mmdef was PIII/gcc/gemm.
We run on Python 2.4.2, Windows XP Service Pack 2.
The instructions that failed included cvtsi2sd, movsd, and ucomisd. I found them in the MMX section of the Intel assembly specification, but they are listed as requiring SSE2 support. For a sense of what the code looks like, the following shows the assembly snippet.
684C5825 mov dword ptr [esp+68h],0
684C582D mov eax,dword ptr [edx]
684C582F mov edx,dword ptr [esi]
684C5831 cmp eax,edx
684C5833 cmovg eax,edx
684C5836 cvtsi2sd xmm0,eax
684C583A movsd mmword ptr [esp+14h],xmm0
684C5840 fld qword ptr [esp+14h]
684C5844 mov dword ptr [esp+5Ch],0
684C584C fmul qword ptr ds:[684C5538h]
EAX = 0000000F EBX = 684C550E ECX = 00902DD0 EDX = 0000000F
ESI = 0021EB6C EDI = 0021EB60 EIP = 684C5836 ESP = 0021E8A4
EBP = 00000001 EFL = 00010216
The instruction pointer is at the cvtsi2sd instruction. Often the instruction responsible for an exception is the one just before the instruction pointer, but an illegal-instruction exception is a fault: it is raised before the instruction pointer is incremented, so EIP still points at the offending instruction.
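A quick way to cross-check a disassembly against the CPU flags is sketched below in Python. It is a minimal illustration under stated assumptions: the mnemonic set holds only the three instructions named above, the operand check for "xmm" is there because movsd is also the name of an ordinary string-move instruction, and the function names are made up for this example.

```python
# The three SSE2 instructions named in the post, when used with xmm registers.
SSE2_MNEMONICS = {"cvtsi2sd", "movsd", "ucomisd"}


def needs_sse2(line):
    """True if a 'address mnemonic operands' line uses one of the SSE2 forms."""
    parts = line.split(None, 2)
    if len(parts) < 3:
        return False
    _, mnemonic, operands = parts
    return mnemonic in SSE2_MNEMONICS and "xmm" in operands


def cpu_features(flags):
    """Turn 'has_mmx has_sse ...' into {'mmx', 'sse'}."""
    return {f[len("has_"):] for f in flags.split() if f.startswith("has_")}


listing = """\
684C5833 cmovg eax,edx
684C5836 cvtsi2sd xmm0,eax
684C583A movsd mmword ptr [esp+14h],xmm0
684C5840 fld qword ptr [esp+14h]"""

features = cpu_features("has_mmx has_sse is_32bit is_Intel is_PentiumIII is686")
offending = [l.split()[0] for l in listing.splitlines() if needs_sse2(l)]
will_fault = bool(offending) and "sse2" not in features
```

With the flags reported for the PIII-S, the first offending address is the one where the debugger stopped.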
Drew Dolgert
Monday, April 10, 2006
Debugging Mixed Interop Native and Managed Assemblies with Visual Studio 2005
I've been trying to track down a bug in a managed C++ wrapper around native C++ classes, and Visual Studio 2005 won't allow me to step directly from the managed code into the native code. I know it is possible, but this is the only way I see to do it:
- Put a System.Diagnostics.Debug.Assert(false) near the start of the application.
- Run the application and accept the option to debug it.
- Opt to "manually choose debugger," and choose both native and managed.
- Visual Studio 2005 then allows me to step into the native code.
Drew
Thursday, March 23, 2006
unmanaged Visual C++ for Visual Studio 2005 causes (soluble) dll loading problems
This message is about where dlls go when they are part of “isolated assemblies.” Everyday C++ users may run into this when they move their code from a development machine to a node on a compute cluster.
1. When you copy an executable to the cluster, it now needs not only dlls but also manifest files copied with it, or we have to install them on the cluster for the users.
2. There is a common occurrence where a failure in the linker can prevent an executable from ever loading. There is a quick fix for this.
Part I:
When compiling unmanaged Visual C++ applications in VS2005, you have to pay attention to how you distribute the dlls. Normally, you just put dlls beside the exe. That no longer works, for some good reasons. I’m going to give you an outline here of where the bodies lie, trying to show you pertinent directories, tools, and project settings.
Google keywords: Isolated applications, side-by-side assemblies, winsxs
Web pages:
About isolated apps and assemblies
http://msdn2.microsoft.com/en-us/library/ms235342(en-US,VS.80).aspx
blog with details on how to do installation with isolated apps
http://blogs.msdn.com/nikolad/
Windows Installer XML – use xml file to create an msi.
http://sourceforge.net/projects/wix
There is a section in VS under Linker->Manifests that says you can
- embed manifest – yes/no
- allow isolation – yes/no
Visual Studio now distributes the CRT, the debug CRT, and versions of the ATL and MFC libraries in these directories:
C:\Program Files\Microsoft Visual Studio 8\VC\redist\x86
C:\Program Files\Microsoft Visual Studio 8\VC\redist\Debug_NonRedist\x86
Here you find the directory Microsoft.VC80.DebugCRT\ with the files
- Microsoft.VC80.DebugCRT.manifest
- msvcm80d.dll
- msvcp80d.dll
- msvcr80d.dll
Executables, under various conditions, are supposed to be able to find dlls in various directories if you put the manifest with them
- C:\Windows\WinSXS
- <appdirectory>
- <appdirectory>\<appname>.exe.local
- <appdirectory>\<manifestname>
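To make the list above concrete, here is a small Python sketch that builds those candidate directories for one executable. It only mirrors the list as written; it is not a description of the actual Windows loader algorithm or its search order, and candidate_dirs and the example paths are invented for illustration.

```python
from pathlib import PureWindowsPath


def candidate_dirs(exe_path, assembly_name, windows_dir=r"C:\Windows"):
    """Directories from the list above where an assembly might be found."""
    exe = PureWindowsPath(exe_path)
    app_dir = exe.parent
    return [
        PureWindowsPath(windows_dir) / "WinSXS",   # shared assemblies
        app_dir,                                   # beside the executable
        app_dir / (exe.name + ".local"),           # <appname>.exe.local
        app_dir / assembly_name,                   # directory named after the assembly
    ]


# Hypothetical application path, used only for the example.
dirs = [str(d) for d in candidate_dirs(r"C:\work\myprog.exe",
                                       "Microsoft.VC80.DebugCRT")]
```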
Part II: The Common Deal-breaker: Oldnames.lib
When you compile code which links to libraries using older, deprecated methods, the linker finds those methods in an implicit library called Oldnames.lib. Because of a flaw in the linker, any time it pulls in Oldnames.lib, it will also pull in a copy of the wrong CRT. That is, if you link against the debug CRT, it will pull in the release CRT, and vice versa. This wouldn’t normally cause the executable to fail to run, except that the manifest the linker embeds names only the correct assembly. For instance, when linking against the debug CRT, the following manifest is embedded in the executable, even though the linker mistakenly linked to the release CRT as well.
<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type='win32' name='Microsoft.VC80.DebugCRT' version='8.0.50608.0' processorArchitecture='x86' publicKeyToken='1fc8b3b9a1e18e3b' />
    </dependentAssembly>
  </dependency>
</assembly>
The solution is to tell the linker to ignore the other CRT. In other words, if you link with msvcrtd.lib, you have to explicitly tell the linker to ignore msvcrt.lib (the /NODEFAULTLIB:msvcrt.lib linker option), and vice versa. That makes the linker think longer about where it can find the references it needs, and it comes back around to the correct library.
Mind that Microsoft would not consider this problem a linker flaw because I have only seen it happen when linking to libraries older than my current version of Visual Studio, and linking Visual Studio 2005 projects to those of Visual Studio 2003 or Visual Studio 6 is not supported. This is likely confounded with the fact that I am linking release and debug always to a release library, which is not supported. Does any company really distribute their development libraries as MT, MD, MTd and MDd? I've only seen VRCO do this.
At this point, I’ve also created an installer for the C runtimes, as described in the blog link above, so that’s what is loading. I just run vccrt.msi, then my application runs. I used this tool called WIX to install it. It’s a bit of open source from Microsoft. The project doesn't appear to be under development any longer, but it's still working for now.
Drew
running openmp on cluster requires per-application configuration file
Typical deployment of isolated assemblies on a cluster node
We have already covered the typical story for isolated assemblies. If an application uses a dll which was compiled as an isolated assembly, then the application has to contain a manifest which specifies those assemblies. Visual Studio automatically embeds that manifest in the application. When you deploy that application to a new machine, Microsoft recommends that you use an installer to install that assembly as a shared assembly, meaning that the installer places the assembly in the WinSxS folder in the Windows folder, which functions like a GAC. Visual Studio even includes a vcredist.exe program to install all of the release versions of Visual Studio isolated assemblies into the WinSxS folder.
Because we are running on cluster nodes, we want to know how to deploy those isolated assemblies as private assemblies instead. This is usually a simple matter. Find the directory for the isolated assembly, such as Microsoft.VC80.DebugCRT, in C:\Program Files\Microsoft Visual Studio 8\VC\redist\Debug_NonRedist\x86. Copy the whole directory to the same directory as your executable on the cluster node, and everything should work.
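The copy step can be scripted. The Python sketch below is one way to do it, under the assumption that the assembly directory holds the manifest and dlls together; deploy_private_assembly is a made-up helper, and the demonstration uses temporary directories with empty placeholder files rather than a real Visual Studio installation.

```python
import shutil
import tempfile
from pathlib import Path


def deploy_private_assembly(assembly_dir, exe_path):
    """Copy an assembly directory (dlls plus .manifest) beside the executable."""
    assembly_dir = Path(assembly_dir)
    target = Path(exe_path).parent / assembly_dir.name
    shutil.copytree(assembly_dir, target)
    return target


# Demonstration with stand-in files in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    redist = Path(tmp) / "redist" / "Microsoft.VC80.DebugCRT"
    redist.mkdir(parents=True)
    (redist / "Microsoft.VC80.DebugCRT.manifest").write_text("<assembly/>")
    (redist / "msvcr80d.dll").write_bytes(b"")

    app = Path(tmp) / "app"
    app.mkdir()

    target = deploy_private_assembly(redist, app / "myprog.exe")
    target_name = target.name
    deployed = sorted(p.name for p in target.iterdir())
```

Keeping the directory name unchanged matters, since the name is how the executable locates the private assembly.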
By the way, Dave installed the redistributable, release version, isolated assemblies on one of the clusters. We have to figure out whether we will install all Visual Studio assemblies on the clusters for the users. The problem will still exist for other isolated assemblies, although we don’t, at this point, know of any coming from companies other than Microsoft or products other than Visual Studio.
How OpenMP requires another step to deploy
When you try the above technique for OpenMP, the executable still fails to find the assembly on the cluster node. When Visual Studio compiles your OpenMP program, it embeds a manifest file that refers to version 8.0.50608.0 of the OpenMP assembly, but the actual assembly is version 8.0.50727.42. Why would this work on your machine? It works because your computer has a machine-level policy, sitting in %windir%\WinSxS\Policies, which states that, if any application asks for version 8.0.50608.0, it’s okay to give them 8.0.50727.42.
The version mismatch between the distributed assembly and the assembly required by the compiler is a known problem. It happens because different parts of Visual Studio are frozen at different times before delivery. What we have to do to make this run on the cluster is to deploy that same policy as an application configuration file.
Application configuration files for native applications are very similar to those for managed applications. If your program is myprog.exe, you make a text file, with UTF-8 encoding (with or without a byte-order mark), called myprog.exe.config. In it, you redirect the program’s binding to the newer version of the debug OpenMP assembly. Note that, when you identify the DebugOpenMP assembly, you leave out the version attribute because you will specify it in the bindingRedirect node.
<configuration>
  <windows>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <assemblyIdentity name="myprog.exe" processorArchitecture="x86" version="1.0.0.1" type="win32"/>
      <dependentAssembly>
        <assemblyIdentity type="win32" name="Microsoft.VC80.DebugOpenMP"
                          processorArchitecture="x86"
                          publicKeyToken="1fc8b3b9a1e18e3b"/>
        <bindingRedirect oldVersion="8.0.50608.0" newVersion="8.0.50727.42"/>
      </dependentAssembly>
    </assemblyBinding>
  </windows>
</configuration>
If we had deployed the OpenMP assembly to WinSxS, so that it was a shared assembly, I think that there would have been a machine-level policy to take care of the version mismatch. As is, we had to make this file for ourselves and deploy it in the same directory as myprog.exe.
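If you deploy to many nodes, it may be easier to generate the configuration file than to edit it by hand. The Python sketch below writes a file of the kind described here as UTF-8 and parses it to confirm that it is well formed. The helper name write_app_config is invented, the application version 1.0.0.1 is simply carried over from the example, and the demonstration writes into a temporary directory.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

CONFIG_TEMPLATE = """<configuration>
  <windows>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <assemblyIdentity name="{exe}" processorArchitecture="x86" version="1.0.0.1" type="win32"/>
      <dependentAssembly>
        <assemblyIdentity type="win32" name="{assembly}" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"/>
        <bindingRedirect oldVersion="{old}" newVersion="{new}"/>
      </dependentAssembly>
    </assemblyBinding>
  </windows>
</configuration>
"""


def write_app_config(exe_path, assembly, old, new):
    """Write <exe name>.config beside the executable and return its path."""
    exe_path = Path(exe_path)
    text = CONFIG_TEMPLATE.format(exe=exe_path.name, assembly=assembly,
                                  old=old, new=new)
    ET.fromstring(text)  # raises ParseError if the XML is not well formed
    config_path = exe_path.with_name(exe_path.name + ".config")
    config_path.write_text(text, encoding="utf-8")
    return config_path


NS = {"asm": "urn:schemas-microsoft-com:asm.v1"}
with tempfile.TemporaryDirectory() as tmp:
    path = write_app_config(Path(tmp) / "myprog.exe",
                            "Microsoft.VC80.DebugOpenMP",
                            "8.0.50608.0", "8.0.50727.42")
    config_name = path.name
    redirect = ET.parse(str(path)).getroot().find(".//asm:bindingRedirect", NS)
    redirect_versions = (redirect.get("oldVersion"), redirect.get("newVersion"))
```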
How a little bug made this hard to figure out
Yesterday morning, we had figured out that there was a version mismatch and that an application configuration file was necessary. We had copied the debug OpenMP assembly directory into the same directory as our application, but nothing worked. After five hours, we noticed that the debug OpenMP assembly directory was called Micrsoroft.VC80.DebugOpenMP. If you look in your Visual Studio 2005 VC\redist\Debug_NonRedist\x86 directory, you’ll find that the directory name is wrong.
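A check like the following Python sketch would have saved us those five hours. It compares each assembly directory's name with the name declared in the manifest inside it. It assumes the manifest carries an assemblyIdentity element directly under the root, as the Visual Studio ones we looked at do; the manifest text in the demonstration is a cut-down stand-in, and misnamed_assembly_dirs is a made-up helper.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

ASM = "{urn:schemas-microsoft-com:asm.v1}"

# Cut-down stand-in for a real assembly manifest.
MANIFEST = """<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <assemblyIdentity type="win32" name="Microsoft.VC80.DebugOpenMP"
                    version="8.0.50727.42" processorArchitecture="x86"
                    publicKeyToken="1fc8b3b9a1e18e3b"/>
</assembly>
"""


def misnamed_assembly_dirs(app_dir):
    """Return (directory name, declared name) pairs that disagree."""
    problems = []
    for manifest in sorted(Path(app_dir).glob("*/*.manifest")):
        identity = ET.parse(str(manifest)).getroot().find(ASM + "assemblyIdentity")
        if identity is None:
            continue  # not an assembly manifest of the expected shape
        declared = identity.get("name")
        if manifest.parent.name != declared:
            problems.append((manifest.parent.name, declared))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    bad = Path(tmp) / "Micrsoroft.VC80.DebugOpenMP"   # the misspelled directory
    bad.mkdir()
    (bad / "Microsoft.VC80.DebugOpenMP.manifest").write_text(MANIFEST)
    problems = misnamed_assembly_dirs(tmp)
```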
Drew Dolgert
Attempting to contribute to VTK open source
It seemed like a great chance to contribute to the project, so I've created wrappers that could fold into the source tree.
- Managed C++ wrappers for VTK.
- Allow use in C#, managed C++, VB, and others.
- .NET 2.0 only, meaning Visual Studio 2005.
- Builds within the typical CMake build system for VTK.
- Includes a .NET Windows Forms Control for drag-and-drop in Visual Studio designer.
We'll have to see how successful these are. Because the CMake build system does not handle managed C++ projects, the build isn't entirely hands-free. Just before hitting Build in Visual Studio, the user has to execute a macro to convert the projects to managed code. That problem means this wrapper couldn't be built in nightly dashboard tests. I'd be willing to extend CMake to handle managed projects if that's what it takes.
Drew Dolgert