I then decided to investigate this a bit further to understand what was causing the issue and retried the operation using the assembly program and launched the Visual Studio 2008 Just-in-Time Debugger when I was given the opportunity to and found out it was crashing on a mov instruction.


Pointed by the yellow arrow, the instruction causing the crash

I exited from the debugger and retried again to check whether it was always crashing on this line and this time, it simply didn’t crash ! Obviously, it was very strange because nothing changed from the other tries: the files’ location were the same and it didn’t try reassembling the program yet. After pondering some seconds, it occurred to me that I didn’t open the file the same way as the previous tries ! My program’s user-interface supports two ways for loading the file: the first being to use a browse button (launching the standard “Open” windows common dialog box, aka GetOpenFileName) and the second being to drag and drop the file to some area of program’s main window itself. The reason why it was happening was not clear but at this point, I decided I would rather fix the bug now that I could reproduce it for sure 100% of the time and went back to my original assembly source code.

As you can see from the code shown in the debugging window and if you know even only a little about assembly, this part of the code is actually a loop iterating backwards to an array to copy a byte value into it (in other words, it serves the same purpose as the memset C function). Even though the source is a byte value, to reduce the number of iterations required, the routine copies this byte to every part of the 32-bit EDX register to be able to copy 4 bytes at once. Once the loop counter becomes less than 4 bytes (the length is not always a multiple of 4), it copies the 4 bytes again at the start of the buffer to ensure it is completely filled.

I seem to recall that the loop was iterating forward in the first versions of the program (in a way similar to a for(i=0;i<length; i++)), but doing this obviously forces to check the current counter value with the value that should stop the loop, which requires an additional cmp instruction, which obviously occupies some bytes in the final executable and takes some CPU cycles every time executed, which I didn’t really want since this loop could potentially iterate a significant number of times. Changing the direction the buffer was populated allowed me to make use of the the signed flag that would be appropriately set automatically by the processor once my counter (ecx register) would reach a negative value.

While the benefits may be considered as pointless and I don’t even bother implementing this kind of tricks when working in higher languages (especially since some compilers may actually be smart enough to make this optimization by themselves) and there are much faster assembly implementations of memory filling (some code was using MMX registers to copy 8 bytes at once with unrolled loops for example), I considered it was the best balance between speed, compactness, and CPU compatibly.

However, the source of the bug lies in having changed of the loop direction: the buffer getting filled was allocated with VirtualAlloc earlier with the maximum length the program would ever have to deal with (0xFFFF / 65535 bytes). As I had allocated exactly the same size, when the loop had to deal with a 0xFFFF sized block, the mov instruction on the first iteration was trying to copy 4 bytes the value at the position 0xFFFF of my buffer, and obviously there was no memory allocated on my buffer after 0xFFFF, resulting in the crash: in other words, a good old typical buffer overflow. The simplest way to solve this without complexifying the code and reducing the loop performance was to increase the memory allocation by 4 bytes to now allocate 0xFFFF+4 bytes (granted, it does not look very clean, but it is a given that when writing in assembly you will use some little tricks... that’s one of the charms of it, after all...).

Now that the bug was fixed, I decided to check why the bug didn’t happen under some circumstances and I was obviously wondering how such a bug could have been unnoticed all this time by all its users without me hearing about it. The only difference regarding bug reproduction was whether the file was loaded through the “Open file” (no crash – probably the way 99% of users were opening it) button or through simply drag and drop (crash).

As I now knew my bug was of the “memory violation” type, for the crash not to occur meant that memory somehow got allocated at the end the buffer I was explicitly allocating through VirtualAlloc. Now, it was the only part of the program I used dynamic memory allocation on the heap, as I only used stack-based local variables in the other parts. Also, as I was programming in assembly, I knew for sure the compiler didn’t insert any code that may have allocated memory without me being aware.

My most probable theory at this point was that the standard Windows’ Open File dialog was actually allocating memory in my application’s address space for its own use and decided to check that. I switched from the Visual Studio Debugger to the very powerful Immunity Debugger (a user-friendly debugger based on OllyDBG), loaded my executable into it and ran it and then loaded the Memory window from the View menu and made a screenshot of it.

 
The program’s memory map after it started running

Now, I did the same after having clicked the “Open file” button to make the Windows file open dialog appear and closed it without loading a file and finally compared both pictures.

MemoryMap2
The program’s memory map after launching the Windows Open File Dialog

As we can clearly see on the second screenshot, there are way more memory regions allocated after having launched the Windows Open File dialog than before, including a memory block starting from 0x008C0000, which happens to be the what follows 0x008B0000 (the address for my buffer that VirtualAlloc was returning, that we can see of the first screenshot of this post) + 0xFFFF (the size of my buffer). As memory was allocated in this condition, my program wasn’t crashing anymore as it could (over)write to the memory previously allocated by the Windows File Open dialog.

This of course can make one anxious as even if software is tested using several input sources, bugs may remain unnoticed and the only way to avoid this probably is to use unit testing or creating test cases using tools like TestComplete to test all the ways an application has to load a file (or perform any other action), which can of course take a great deal of time for bigger applications (Avery Lee, the VirtualDub developer, reports here that developing the test cases is almost as time-consuming as developing the program itself).

PS: Yes, this post’s title indeed is a (very) humble tribute to Mark Russinovichsama.