SuperSight: a graphical enhancement mod for Brøderbund's <em>Stunts</em> (part III)

Final chapter, because content farming has a limit and I'd like to switch back to the debugger window.

Part III

As we saw at the end of the previous part, the first release of SuperSight was already very functional, with only the issue of excessive flickering. I got rid of such problem by making my algorithm less greedy: instead of trying to always draw everything at maximum detail, it would only do so on the first attempt. If the rendering failed and discarding tiles was needed, only the objects nearer to the camera would be drawn in the high-polygon version.

This simple change removed most of the flickering. I could have packaged the mod and successfully closed the project. Alas, I was too ambitious and decided to try to improve further. What followed were many, many hours of frustrating attempts.

Dissecting the behemoth

My previous ventures in the low-level rendering routines had given me a good insight on how Stunts stores its graphical objects. I was confident it would be relatively easy to extend the corresponding containers, breaking free from the original limit of 400 primitives and 10400 bytes.

The function transformed_shape_op (link) uses a number of variables to organize its objects. Kevin Pickell, lead programmer of Stunts, said that some of the low-level rendering algorithms were written directly in assembly and I suspect that this was one of them. This would explain why its C port is so hard to understand, as well as why the algorithms uses variable-size structures for the primitives.

Naming things is hard and, I dare say, it shows here. Switching to clearer identifiers, or at least making them less similar to each other, could have helped my comprehension. However, I was afraid to give wrong names or to confuse the original contributors of Restunts, so I mostly kept the existing situation.

At the end, I gave up on the idea of understanding the full transformed_shape_op. I did not need to do that, however. After going over at the code many times I was reasonably sure that only three variables needed to be enlarged to make space for more graphics. They were the polyinfoptr buffer as well as the arrays poly_linked_list_40ED6 and polyinfoptrs. All the other variables were either unrelated or just pointers in these containers.

In hindsight, the analysis was almost right. The only small thing I had missed was the sneaky word_411F6, that was intended as terminator for the linked list and had therefore to be relocated to be immediately after it. The issue was quickly found and fixed.

Going above the limits

It was time to finally proceed and give new breathing space to the rendering engine. I decided to extend the original limits by about 50% and see how it would go. Instead of 400 primitives I would allow 592 (0x250, looks good in hex). The buffer polyinfoptr would be extended to 15392 (0x3C20) bytes, to keep the original ratio of 26 bytes per primitive which I assume is the maximum size for each item.

The variables to extend are defined in two different places. The polyinfoptr buffer is allocated through a call to a memory manager, a sort of malloc. I just needed to set the new size:

The other variables, polyinfoptrs and poly_linked_list_40ED6, were defined in the global data segment. I was confident to know how to do the resize them since I had made a very similar operation in part II, so the end seemed near. Repeating the procedure previously used for currenttransshape, I renamed the existing instances of poly_linked_list_40ED6 and polyinfoptr, creating new larger copies at the end of the data segment. The diff in the data segment looked like this:

There remained to adjust the references to the new sizes in the codebase. Fortunately, the numeric values of the original sizes (0x190 and 0x28A0) were peculiar enough that a simple search and replace could adjust the codebase without causing collateral damage. I just needed to rememeber to also adjust the clause if (polyinfoptrnext <= 0x2872) in transformed_shape_op, to keep the threshold just below the new buffer size.

The abyss

My hopes high, I compiled the result and launched the game. It started fine, a good sign. I selected my car and track and started a race. I drove for maybe five seconds before the screen started to go crazy, and the game crashed some moment later.

Not a unusual result for a first attempt, I thought, so I made a couple of small changes and restarted. Same result. I tried again; this time the screen corruption happened already in the track selection screen. I entered in a frantic cycle of small adjustments and new experiments, but the only result would be a merciless sequence of new and creative examples of software gore.

At this point a wise person would have accepted that the previous version was good enough and that it was not worth the effort to try to further push the limits. After all, the game was constrained by the hard bound of the 16-bit segmented memory architecture, so any improvement would have been not revolutionary anyway. But I did not want to yield. “Wir müssen wissen. Wir werden wissen.” said Hilbert, and giving up without even having an idea of why my plan was impossible felt like an intolerable defeat. Moreover, even in my Stunt Island project I was about to throw the towel when a last determined attempt had given me the victory, and I was confident it could happen again.

I like to keep a positive tone in my writing and I don’t think that recounting dozens of hours of failed attempts would be constructive. I’ll just recount my two most important tries, in the hope the techniques can be of help in other contexts.

Analysis of the linked list

One of the relocated variables was poly_linked_list_40ED6, which seemed to me the most fragile structure. The original copy occupies 400 bytes starting from the memory address 2CC4:5766. DOSBox Debug allows to set breakpoints whenever a memory address gets written via the command BPM $address, but has no support for monitoring an entire region. Undeterred, I programmatically printed a long list of BPM commands to cover most of the area:

for (( i = 0x5766; i < 0x5a88; i+= 32 )) ; do printf 'bpm 2CC4:%X \n' i ; done

and copy-pasted them in the debug console. From there I examined the resulting structure. During a running game, it could look like this:

2CC4:5760   00 00 00 00 59 4C 01 00 02 00 03 00 04 00 05 00
2CC4:5770   06 00 07 00 08 00 09 00 0A 00 0B 00 0C 00 0D 00
2CC4:5780   0E 00 0F 00 10 00 11 00 12 00 13 00 14 00 15 00
2CC4:5790   16 00 17 00 18 00 19 00 1A 00 1B 00 1C 00 1D 00
2CC4:57A0   1F 00 21 00 1E 00 22 00 20 00 23 00 24 00 48 00
2CC4:57B0   2B 00 2A 00 29 00 50 00 28 00 2F 00 2D 00 3F 00
2CC4:57C0   31 00 30 00 45 00 26 00 43 00 33 00 37 00 36 00
2CC4:57D0   25 00 35 00 41 00 3D 00 3B 00 32 00 3A 00 3E 00
2CC4:57E0   39 00 38 00 40 00 4E 00 4D 00 4F 00 4B 00 4C 00
2CC4:57F0   49 00 4A 00 27 00 46 00 47 00 44 00 2E 00 42 00
2CC4:5800   34 00 3C 00 2C 00 51 00 52 00 53 00 54 00 55 00
2CC4:5810   56 00 57 00 58 00 59 00 5A 00 5B 00 5C 00 5D 00
2CC4:5820   5E 00 5F 00 60 00 61 00 62 00 63 00 64 00 67 00
2CC4:5830   66 00 6D 00 68 00 69 00 6A 00 6B 00 6C 00 65 00
2CC4:5840   6E 00 6F 00 70 00 71 00 72 00 73 00 76 00 75 00
2CC4:5850   7C 00 77 00 78 00 79 00 7A 00 7B 00 74 00 7D 00
2CC4:5860   7E 00 7F 00 80 00 81 00 82 00 8B 00 84 00 85 00
2CC4:5870   86 00 87 00 88 00 89 00 8A 00 8C 00 83 00 8D 00
2CC4:5880   8E 00 8F 00 90 00 91 00 92 00 93 00 94 00 96 00
2CC4:5890   97 00 98 00 9B 00 99 00 9A 00 95 00 9C 00 9D 00
2CC4:58A0   9E 00 9F 00 A0 00 FF FF 00 00 00 00 00 00 00 00

The first cell of the list occupies the int16 at 2CC4:5766, i.e. the 7th and 8th byte of the first row, forming the number 0x0001. It means that polygon 0x0 in the list is followed by 0x1. On its turn 0x1 is followed by 0x2, etc. But notice how cell 0x24, occupying the last two bytes in the 5th row, contains the number 0x0048. This means that polygon 0x24 is followed by 0x48. Jumping to cell 0x48 we find that the following polygon is 0x3C, etc. Jumping from cell to cell we should cover all the list, assuming that we start from the right one¹. The last cell, in this case 0xA0, contains bytes FF FF, i.e. -1, the usual terminator number.

To check if the structure is still sound one should go through the chain every time the container is changed, ensuring that all the links work properly, without loops or orphan elements. An unreasonable task. The real insight I got from staring at this zone is of a different nature: when running the modified game, the one where this linked list has been relocated elsewhere, this very memory area eventually got hit with writes. Which is a clear sign of malfunction, since the original copy of the list should have been inactivated and the game should not try to touch it!

Very well, I had at last a track to follow: why is the game writing to a dead memory area? Did I forget to update some pointers? But after thorough scan of the code I could not find any forgotten references, nor hard-coded addresses. So I decided to use one last tool in the arsenal: HEAVYLOG.

The HEAVYLOG command has two benefits: first, it grinds DOSBox’s speed almost to a halt, which someone could regard as an annoying collateral effect but can come useful in some circumstances. The second purpose, the intended one, is to store in a ring buffer the state of the registers in the latest 20000 cycles. The operator can then in any moment decide to terminate the emulation and dump the buffer into a text file, obtaining a CAT scan of the CPU status during the most relevant moments.

Here is the output produced by the debugger during one of the crashes, where the inactivated copy of the linked list got clobbered. An exercise for the fool: this blob of information actually contains important insights about what went wrong – can you find it out before I spell the solution below?

167B:000012EC  cmp  [bp-04],ax                 ss:[C89E]=2E70         EAX:0000FFAD EBX:0000407C ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012EF  jc   000012CC ($-25)            (up)                   EAX:0000FFAD EBX:0000407C ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012CC  les  bx,[bp-3E]                 ss:[C864]=77C4         EAX:0000FFAD EBX:0000407C ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012CF  mov  ax,es:[bx]                 es:[77C4]=FF22         EAX:0000FFAD EBX:000077C4 ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012D2  mov  dx,es:[bx+02]              es:[77C6]=478B         EAX:0000FF22 EBX:000077C4 ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012D6  mov  bx,[bp-0A]                 ss:[C898]=4080         EAX:0000FF22 EBX:000077C4 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012D9  mov  [bx],ax                    ds:[4080]=0000         EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012DB  mov  [bx+02],dx                 ds:[4082]=0000         EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012DE  add  word [bp-0A],0004          ss:[C898]=4080         EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012E2  add  word [bp-3E],0004          ss:[C864]=77C4         EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:1 IF:1
167B:000012E6  inc  word [bp-04]               ss:[C89E]=2E70         EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:0 IF:1
167B:000012E9  mov  ax,[bp-02]                 ss:[C8A0]=FFAD         EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:1 IF:1
167B:000012EC  cmp  [bp-04],ax                 ss:[C89E]=2E71         EAX:0000FFAD EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:1 IF:1
167B:000012EF  jc   000012CC ($-25)            (up)                   EAX:0000FFAD EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012CC  les  bx,[bp-3E]                 ss:[C864]=77C8         EAX:0000FFAD EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012CF  mov  ax,es:[bx]                 es:[77C8]=8B08         EAX:0000FFAD EBX:000077C8 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012D2  mov  dx,es:[bx+02]              es:[77CA]=0A57         EAX:00008B08 EBX:000077C8 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1

Here is the point: we are in the medium assembly model, where DS=SS, i.e. function stack and global data share the same memory segment. But we see the stack pointer EBP:0000C85E, right in the middle of the linked list, which means the function stack has grown so much that has overwritten the global data!

Now, there are weaknesses in this line of reasoning – especially the question of why this does not happen in the original game. But whether right or wrong, this observation got me out of several days of fruitless bug hunting and gave me some way to go forward.

Relocate the extended variables to another memory segment

If the data segment is overflowing, we must make space. This led me to the ambitious idea: attempting to move the enlarged arrays into an extra segment. Since the DS and ES selectors were already used, I created a new segment to be refererred by the FS register, and moved there some of the variables I had relocated:

fseg segment byte public use16
    assume cs:dseg
    assume es:nothing, ss:nothing, ds:dseg, fs:fseg

    public polyinfoptr
    public poly_linked_list_40ED6
    public word_411F6
    public polyinfoptrs
    public transformedshape_zarray
    public transformedshape_indices
    public transformedshape_arg2array

transformedshape_zarray db 250 dup(0)
transformedshape_indices db 250 dup(0)
transformedshape_arg2array db 126 dup(0)
polyinfoptr     dw 700
poly_linked_list_40ED6 dw 190h dup(0)
word_411F6     dw 0
polyinfoptrs dd 300h dup(0)
db 6 dup(0)
fseg ends

Then I added the segment definition at the end of segments.asm, which I understood was used by the linker to compose the binary image.

fseg segment byte public use16
ends

Last touch was to add the .386 directive, since 286s and earlier processors do not have a FS register.

I emphasize that I only had a partial understanding of what I was doing, so I felt like a bit like the sorcerer’s apprentice. I spent a couple of days trying to make it work, but the results were the same as before.

Now I think I know why my extra segment experiment was doomed to fail: first, the segment definition file in Restunts is fragile and if written without care can lead to variables and code ending up in the same place. This could have anyway be solved, and I made various experiments to add “spacing regions” to try to get all the bytes to their right places. Second, and more important, is the fact that storing the variables in a different segment changes the instructions size. As noticed in the appendix 2, altering the size of an assembly function poses a high risk of corrupting the executable. Creating a new segment would modify the size of all functions using the variables located there, so producting a functioning binary was practically impossible.

The same reason doomed to fail my very last attempt, changing the data model to “huge”, i.e. forcing all data pointers to be 32-bit sized. The only noteworthy result was a further surreal experience: this debugging session, where the “step over” function of DOSBox stopped working properly and started to jump several instructions at each invocation.

Every cloud has a silver lining

At this point I felt it was time to wrap up. I had failed to lift the 400-polygon limitation and it was bitter to close with a defeat, but I had given all what I could and even the improvement in depth of sight of the original mod was anyway impressive. In parallel with my experiments I had also made an extensive restructuring of makefile (see appendix 3) to remove some bugs introduced by the previous rewrites in C, so that SuperSight worked as drop-in replace for the original game with perfect reliability. Or did it?

The mod was unstable even with the original polygon limit? This could have been the fatal blow as I did not have any clue of how to address this. But I started the code editor again, determined to make it work one way or another. I was lucky enough to find a reproducible way to crash the game, so that I could immediately test if any remedy was effective.

I ventured again in the rendering code and finally I noticed what I had failed to realize in the previous two weeks: namely, that the resizing of currenttransshape and of the transformedshape_* arrays was unnecessary; I could use the original versions and spare myself their relocation. And, lo and behold, removing the larger copies freed a couple of kilobytes in the data segment, and this was all what was needed to make the game work flawlessly!

And that’s not all – the resized polyinfo arrays required much less space than what the enlargement of currenttransshape had costed. Once removed currenttransshape I could indeed put polyinfo in the freed space and increase the number of primitives to 592 without the game protesting. Then I increased again the heap-allocated polyinfoptr buffer, settling to a final value of 13 KB. And so it was: after almost two weeks of fruitless attempts, a sudden realization solved all my problems. I published the enhanced version of SuperSight the very same night that I got the bug report.

The SuperSight project was finally at the end! It took some suffering but it was enjoyable and instructive, and the final result was well appreciated by the community.

Conclusion

This has been a very long writeup and I don’t want to bore you further with long philosopical reflections. I’ll just mention one takeaway I got, namely that I should use more powerful debugging tools in future. DOSBox debug is not bad for a quick analysis, but a product like Spice86 would have made my life easier. I hope I will find a good project to try it on soon!

Also, if you have some more minutes, consider visiting the Stunts portal. The community is still thriving, many decades after this little racing game was released, and there is always a competition awaiting to see your lap on this month’s track, or maybe to hear your childhood memories of the game in the forum!

This is usually the first, but it’s not a given: a separate variable might contain the head of the list. But I digress. ↩

SuperSight: a graphical enhancement mod for Brøderbund's Stunts