Friday, April 28, 2006

SciPy distribution for P3 fails - what illegal instruction means

I have been trying to help someone install SciPy on our clusters. The site provides a package for P4, which works fine, and one for PIII, which works on my laptop but fails on our test login node. I downloaded CPU-Z to check things out, and it turns out that my Pentium M (Dothan) laptop supports MMX, SSE, and SSE2, while the ancient dual Pentium III-S server we use for testing supports only MMX and SSE.

As a result, the provided installer for Python 2.4 and Pentium III fails on older P3 machines that lack SSE2.

The test code from that provided distribution, scipy.test(), fails on older P3 chips (such as the PIII-S, Tualatin) in several of the package tests with a Dr. Watson error of
Unhandled exception at 0x6988d3f7 in python.exe:
0xC000001D: Illegal instruction
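A crash like this takes the whole interpreter down, so it cannot be caught with a Python exception handler. One way to keep a test harness alive is to run the suspect code in a child process and decode its exit status. The sketch below is written for a current Python 3, not the Python 2.4 used here, and assumes that Windows reports the unhandled exception code as the process exit code. The helper names describe_exit and run_isolated are made up for illustration.

```python
import subprocess
import sys

# A small hand-picked table of NTSTATUS codes; not exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
}


def describe_exit(returncode):
    """Turn a Windows process exit code into a readable label."""
    # Some tools report the code as a signed 32-bit value, so normalise it.
    code = returncode & 0xFFFFFFFF
    if code == 0:
        return "ok"
    return NTSTATUS_NAMES.get(code, "exit code 0x%08X" % code)


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", snippet])
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Harmless stand-in; on the failing machine the snippet would be
    # something like "import scipy; scipy.test()".
    print(run_isolated("pass"))
```

The table lookup only makes sense on Windows. On other systems a crashing child reports a signal number instead.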

The supplied ATLAS binaries for a Windows P3 box do not have this problem. I built a working package by following online instructions for a MinGW build that links against the precompiled ATLAS binaries. It is a quick process because you don't have to rebuild ATLAS.

I'm pretty sure we didn't just accidentally install the P4 version, because I checked the show_config output at the time, and we did the install twice. I can't prove it because I didn't save the output. Also, the P4 version fails quickly on the P3, whereas the P3 version, which evidently contains SSE2 instructions, fails fewer tests on the older P3.
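To avoid losing that evidence next time, the configuration report can be captured to a file. This is a minimal sketch for a current Python 3, assuming the report function prints to standard output. The helper names capture_report and save_report are hypothetical.

```python
import contextlib
import io


def capture_report(report_func):
    """Call report_func and return whatever it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        report_func()
    return buffer.getvalue()


def save_report(report_func, path):
    """Capture the report and write it to path; return the text as well."""
    text = capture_report(report_func)
    with open(path, "w") as handle:
        handle.write(text)
    return text
```

With SciPy installed, a call such as save_report(scipy.show_config, "scipy_config.txt") would keep a copy next to the test logs. The file name is only an example.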

On the PIII-S where the install failed, the CPU information is
has_mmx has_sse is_32bit is_Intel is_PentiumIII is686
The ATLAS archdef for both the supplied package and my working build was PIII/gcc/misc and mmdef was PIII/gcc/gemm.
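The flag line above is enough to tell whether a given build can run on a machine. As a rough sketch, the function below compares such a line against the features a binary needs. The flag name has_sse2 is an assumption by analogy with has_sse, and missing_features is an illustrative name, not part of any library.

```python
# Features an SSE2-using build would need. "has_sse2" is an assumed flag
# name, modelled on the "has_sse" flag in the CPU information above.
REQUIRED_FOR_SSE2_BUILD = {"has_mmx", "has_sse", "has_sse2"}


def missing_features(flag_line, required=REQUIRED_FOR_SSE2_BUILD):
    """Return the required flags that do not appear in flag_line."""
    present = set(flag_line.split())
    return sorted(set(required) - present)


# CPU information reported on the PIII-S where the install failed.
PIII_S_FLAGS = "has_mmx has_sse is_32bit is_Intel is_PentiumIII is686"

if __name__ == "__main__":
    print(missing_features(PIII_S_FLAGS))  # ['has_sse2']
```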

We run Python 2.4.2 on Windows XP Service Pack 2.

The instructions that failed included cvtsi2sd, movsd, and ucomisd. I found them listed in the MMX section of the Intel assembly specification, but they are listed as requiring SSE2 support. For a sense of what the code looks like, the following shows the assembly snippet.
684C5825 mov dword ptr [esp+68h],0
684C582D mov eax,dword ptr [edx]
684C582F mov edx,dword ptr [esi]
684C5831 cmp eax,edx
684C5833 cmovg eax,edx
684C5836 cvtsi2sd xmm0,eax
684C583A movsd mmword ptr [esp+14h],xmm0
684C5840 fld qword ptr [esp+14h]
684C5844 mov dword ptr [esp+5Ch],0
684C584C fmul qword ptr ds:[684C5538h]

EAX = 0000000F EBX = 684C550E ECX = 00902DD0 EDX = 0000000F
ESI = 0021EB6C EDI = 0021EB60 EIP = 684C5836 ESP = 0021E8A4
EBP = 00000001 EFL = 00010216

The instruction pointer (EIP = 684C5836) is at the cvtsi2sd instruction. For some exceptions the reported instruction pointer has already moved past the offending instruction, but here the exception was raised before the instruction pointer was advanced, so it still points at the instruction that failed.
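To find such instructions without waiting for a crash, a disassembly listing can be scanned for SSE2 mnemonics. The sketch below checks only the three instructions named in this post, not the full SSE2 set, and assumes each line has the form "address mnemonic operands". Because movsd is also the name of the older string-move instruction, it counts only when an xmm register appears in the operands. The name find_sse2 is made up for illustration.

```python
# Only the SSE2 instructions named in this post; far from a complete list.
SSE2_MNEMONICS = {"cvtsi2sd", "movsd", "ucomisd"}

# The assembly snippet from the failing module.
LISTING = """\
684C5825 mov dword ptr [esp+68h],0
684C582D mov eax,dword ptr [edx]
684C582F mov edx,dword ptr [esi]
684C5831 cmp eax,edx
684C5833 cmovg eax,edx
684C5836 cvtsi2sd xmm0,eax
684C583A movsd mmword ptr [esp+14h],xmm0
684C5840 fld qword ptr [esp+14h]
684C5844 mov dword ptr [esp+5Ch],0
684C584C fmul qword ptr ds:[684C5538h]
"""


def find_sse2(listing):
    """Return (address, mnemonic) pairs for SSE2 instructions in a listing."""
    hits = []
    for line in listing.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        address, mnemonic = parts[0], parts[1].lower()
        operands = parts[2].lower() if len(parts) > 2 else ""
        if mnemonic not in SSE2_MNEMONICS:
            continue
        # Without an xmm operand, movsd is the plain string-move instruction.
        if mnemonic == "movsd" and "xmm" not in operands:
            continue
        hits.append((address, mnemonic))
    return hits


if __name__ == "__main__":
    for address, mnemonic in find_sse2(LISTING):
        print(address, mnemonic)
```

On the snippet above this reports the cvtsi2sd at 684C5836, which matches EIP in the register dump, and the movsd that follows it.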

Drew Dolgert

Monday, April 10, 2006

Debugging Mixed Interop Native and Managed Assemblies with Visual Studio 2005

I've been trying to track down a bug in a managed C++ wrapper around native C++ classes, and Visual Studio 2005 won't allow me to step directly from the managed code into the native code. I know it is possible, but this is the only way I see to do it:

  1. Put a System.Diagnostics.Debug.Assert(false) near the start of the application.

  2. Run the application and accept the option to debug it.

  3. Opt to "manually choose debugger," and choose both native and managed.

  4. Visual Studio 2005 then allows me to step into the native code.