Alberto Marnetto's Notebook
Final chapter, because content farming has a limit and I'd like to switch back to the debugger window.
As we saw at the end of the previous part, the first release of SuperSight was already very functional, with only the issue of excessive flickering. I got rid of such problem by making my algorithm less greedy: instead of trying to always draw everything at maximum detail, it would only do so on the first attempt. If the rendering failed and discarding tiles was needed, only the objects nearer to the camera would be drawn in the high-polygon version.
This simple change removed most of the flickering. I could have packaged the mod and successfully closed the project. Alas, I was too ambitious and decided to try to improve further. What followed were many, many hours of frustrating attempts.
My previous ventures in the low-level rendering routines had given me a good insight on how Stunts stores its graphical objects. I was confident it would be relatively easy to extend the corresponding containers, breaking free from the original limit of 400 primitives and 10400 bytes.
The function transformed_shape_op
(link) uses a number of variables to organize its objects. Kevin Pickell, lead programmer of Stunts, said that some of the low-level rendering algorithms were written directly in assembly and I suspect that this was one of them. This would explain why its C port is so hard to understand, as well as why the algorithms uses variable-size structures for the primitives.
The roles of some variables are (now) clear to me:
polyinfoptr
: the 10400-byte buffer where the primitives to render are stored. The primitives themselves have variable size, e.g. the size of a polygon primitive scales with the number of vertexes.polyinfoptrnext
: offset of the first free available byte in polyinfoptr
.transshapepolyinfo
: pointer to the primitive in polyinfoptr
currently being processed.polyinfoptrs
: array of pointers into the polyinfoptr
, containing the starting byte of each primitive.polyinfonumpolys
: index of the last valid element in polyinfoptrs
, i.e. represents the total number of primitives.Many other are the variables of which I only have a partial understanding:
poly_linked_list_40ED6
: one of the few variables of whose name I am guilty. A compact linked list where each element is just an integer representing the offset of the following element, while the offset of the cell itself is also the implicit payload and represents the index of a primitive in polyinfoptrs
. I have, however, no certainty about what the ordering represents nor of how the struct is used. The list is only read once, in the assembly routine get_a_poly_info
, and I think that it might encode the z-ordering of the primitives. There remain open points, as this Platonic dialogue illustrates:
var_polyvertunktabptr
: seems to be a pointer in a read-only table describing the game objects, i.e. a “source” pointer used to read the data of the primitives to be copied in the buffer.var_transshapepolyinfoptptr
: equals to transshapepolyinfo
plus 6 bytes, points into the variable-size part of the primitive in polyinfoptr
currently being processed.transshapepolyinfopts
: as above, but for different kind of primitives.Naming things is hard and, I dare say, it shows here. Switching to clearer identifiers, or at least making them less similar to each other, could have helped my comprehension. However, I was afraid to give wrong names or to confuse the original contributors of Restunts, so I mostly kept the existing situation.
At the end, I gave up on the idea of understanding the full transformed_shape_op
. I did not need to do that, however. After going over at the code many times I was reasonably sure that only three variables needed to be enlarged to make space for more graphics. They were the polyinfoptr
buffer as well as the arrays poly_linked_list_40ED6
and polyinfoptrs
. All the other variables were either unrelated or just pointers in these containers.
In hindsight, the analysis was almost right. The only small thing I had missed was the sneaky word_411F6
, that was intended as terminator for the linked list and had therefore to be relocated to be immediately after it. The issue was quickly found and fixed.
It was time to finally proceed and give new breathing space to the rendering engine. I decided to extend the original limits by about 50% and see how it would go. Instead of 400 primitives I would allow 592 (0x250, looks good in hex). The buffer polyinfoptr
would be extended to 15392 (0x3C20) bytes, to keep the original ratio of 26 bytes per primitive which I assume is the maximum size for each item.
The variables to extend are defined in two different places. The polyinfoptr
buffer is allocated through a call to a memory manager, a sort of malloc
. I just needed to set the new size:
diff --git a/src/restunts/c/shape3d.c b/src/restunts/c/shape3d.c index 283ff1e..1a00eda 100644 --- a/src/restunts/c/shape3d.c +++ b/src/restunts/c/shape3d.c @@ -2659,7 +2659,7 @@ void calc_sincos80(void) { } void init_polyinfo(void) { - polyinfoptr = mmgr_alloc_resbytes("polyinfo", 0x28A0); + polyinfoptr = mmgr_alloc_resbytes("polyinfo", 0x3C20); mat_rot_y(&mat_y0, 0); mat_rot_y(&mat_y100, 0x100);
The other variables, polyinfoptrs
and poly_linked_list_40ED6
, were defined in the global data segment. I was confident to know how to do the resize them since I had made a very similar operation in part II, so the end seemed near. Repeating the procedure previously used for currenttransshape
, I renamed the existing instances of poly_linked_list_40ED6
and polyinfoptr
, creating new larger copies at the end of the data segment. The diff in the data segment looked like this:
diff --git a/src/restunts/asm/dseg.asm b/src/restunts/asm/dseg.asm index 09ef909..b586124 100644 --- a/src/restunts/asm/dseg.asm +++ b/src/restunts/asm/dseg.asm @@ -22280,8 +22282,8 @@ mat_y300 db 0 transshapepolyinfo dd 0 word_40ECE dw 0 select_rect_param dw 0 -poly_linked_list_40ED6 dw 0 +qpoly_linked_list_40ED6 dw 0 dw 0 dw 0 dw 0 @@ -22681,8 +22683,8 @@ poly_linked_list_40ED6 dw 0 dw 0 dw 0 dw 0 -word_411F6 dw 0 -polyinfoptrs dd 0 +qword_411F6 dw 0 +qpolyinfoptrs dd 0 dd 0 dd 0 dd 0 @@ -41823,5 +41825,21 @@ word_46486 dw 0 db 0 db 0 db 0 ;; The variables added in part II ; currenttransshape: 20 x TILES_TO_DRAW_COUNT times db currenttransshape db 2210 dup(0) ; transformedshape_zarray: 2 x TILES_TO_DRAW_COUNT times db transformedshape_zarray db 250 dup(0) ; transformedshape_indices: 2 x TILES_TO_DRAW_COUNT times db transformedshape_indices db 250 dup(0) ; transformedshape_arg2array: 1 x TILES_TO_DRAW_COUNT times db transformedshape_arg2array db 126 dup(0) ;; The additions +;; poly_linked_list_40ED6: MAX_POLY_INFO_COUNT times dw +poly_linked_list_40ED6 dw 250h dup(0) +;; terminator for the list? In doubt, we copy that here +word_411F6 dw 0 +;; polyinfonumpolys: MAX_POLY_INFO_COUNT times dd, plus 6 db +polyinfoptrs dd 250h dup(0) + +db 6 dup(0) ;; some slack space just in case dseg ends end
There remained to adjust the references to the new sizes in the codebase. Fortunately, the numeric values of the original sizes (0x190 and 0x28A0) were peculiar enough that a simple search and replace could adjust the codebase without causing collateral damage. I just needed to rememeber to also adjust the clause if (polyinfoptrnext <= 0x2872)
in transformed_shape_op
, to keep the threshold just below the new buffer size.
My hopes high, I compiled the result and launched the game. It started fine, a good sign. I selected my car and track and started a race. I drove for maybe five seconds before the screen started to go crazy, and the game crashed some moment later.
Not a unusual result for a first attempt, I thought, so I made a couple of small changes and restarted. Same result. I tried again; this time the screen corruption happened already in the track selection screen. I entered in a frantic cycle of small adjustments and new experiments, but the only result would be a merciless sequence of new and creative examples of software gore.
At this point a wise person would have accepted that the previous version was good enough and that it was not worth the effort to try to further push the limits. After all, the game was constrained by the hard bound of the 16-bit segmented memory architecture, so any improvement would have been not revolutionary anyway. But I did not want to yield. “Wir müssen wissen. Wir werden wissen.” said Hilbert, and giving up without even having an idea of why my plan was impossible felt like an intolerable defeat. Moreover, even in my Stunt Island project I was about to throw the towel when a last determined attempt had given me the victory, and I was confident it could happen again.
I like to keep a positive tone in my writing and I don’t think that recounting dozens of hours of failed attempts would be constructive. I’ll just recount my two most important tries, in the hope the techniques can be of help in other contexts.
One of the relocated variables was poly_linked_list_40ED6
, which seemed to me the most fragile structure. The original copy occupies 400 bytes starting from the memory address 2CC4:5766
. DOSBox Debug allows to set breakpoints whenever a memory address gets written via the command BPM $address
, but has no support for monitoring an entire region. Undeterred, I programmatically printed a long list of BPM
commands to cover most of the area:
for (( i = 0x5766; i < 0x5a88; i+= 32 )) ; do printf 'bpm 2CC4:%X \n' i ; done
and copy-pasted them in the debug console. From there I examined the resulting structure. During a running game, it could look like this:
2CC4:5760 00 00 00 00 59 4C 01 00 02 00 03 00 04 00 05 00
2CC4:5770 06 00 07 00 08 00 09 00 0A 00 0B 00 0C 00 0D 00
2CC4:5780 0E 00 0F 00 10 00 11 00 12 00 13 00 14 00 15 00
2CC4:5790 16 00 17 00 18 00 19 00 1A 00 1B 00 1C 00 1D 00
2CC4:57A0 1F 00 21 00 1E 00 22 00 20 00 23 00 24 00 48 00
2CC4:57B0 2B 00 2A 00 29 00 50 00 28 00 2F 00 2D 00 3F 00
2CC4:57C0 31 00 30 00 45 00 26 00 43 00 33 00 37 00 36 00
2CC4:57D0 25 00 35 00 41 00 3D 00 3B 00 32 00 3A 00 3E 00
2CC4:57E0 39 00 38 00 40 00 4E 00 4D 00 4F 00 4B 00 4C 00
2CC4:57F0 49 00 4A 00 27 00 46 00 47 00 44 00 2E 00 42 00
2CC4:5800 34 00 3C 00 2C 00 51 00 52 00 53 00 54 00 55 00
2CC4:5810 56 00 57 00 58 00 59 00 5A 00 5B 00 5C 00 5D 00
2CC4:5820 5E 00 5F 00 60 00 61 00 62 00 63 00 64 00 67 00
2CC4:5830 66 00 6D 00 68 00 69 00 6A 00 6B 00 6C 00 65 00
2CC4:5840 6E 00 6F 00 70 00 71 00 72 00 73 00 76 00 75 00
2CC4:5850 7C 00 77 00 78 00 79 00 7A 00 7B 00 74 00 7D 00
2CC4:5860 7E 00 7F 00 80 00 81 00 82 00 8B 00 84 00 85 00
2CC4:5870 86 00 87 00 88 00 89 00 8A 00 8C 00 83 00 8D 00
2CC4:5880 8E 00 8F 00 90 00 91 00 92 00 93 00 94 00 96 00
2CC4:5890 97 00 98 00 9B 00 99 00 9A 00 95 00 9C 00 9D 00
2CC4:58A0 9E 00 9F 00 A0 00 FF FF 00 00 00 00 00 00 00 00
The first cell of the list occupies the int16 at 2CC4:5766
, i.e. the 7th and 8th byte of the first row, forming the number 0x0001. It means that polygon 0x0 in the list is followed by 0x1. On its turn 0x1 is followed by 0x2, etc. But notice how cell 0x24, occupying the last two bytes in the 5th row, contains the number 0x0048. This means that polygon 0x24 is followed by 0x48. Jumping to cell 0x48 we find that the following polygon is 0x3C, etc. Jumping from cell to cell we should cover all the list, assuming that we start from the right one1. The last cell, in this case 0xA0, contains bytes FF FF
, i.e. -1, the usual terminator number.
To check if the structure is still sound one should go through the chain every time the container is changed, ensuring that all the links work properly, without loops or orphan elements. An unreasonable task. The real insight I got from staring at this zone is of a different nature: when running the modified game, the one where this linked list has been relocated elsewhere, this very memory area eventually got hit with writes. Which is a clear sign of malfunction, since the original copy of the list should have been inactivated and the game should not try to touch it!
Very well, I had at last a track to follow: why is the game writing to a dead memory area? Did I forget to update some pointers? But after thorough scan of the code I could not find any forgotten references, nor hard-coded addresses. So I decided to use one last tool in the arsenal: HEAVYLOG
.
The HEAVYLOG
command has two benefits: first, it grinds DOSBox’s speed almost to a halt, which someone could regard as an annoying collateral effect but can come useful in some circumstances. The second purpose, the intended one, is to store in a ring buffer the state of the registers in the latest 20000 cycles. The operator can then in any moment decide to terminate the emulation and dump the buffer into a text file, obtaining a CAT scan of the CPU status during the most relevant moments.
Here is the output produced by the debugger during one of the crashes, where the inactivated copy of the linked list got clobbered. An exercise for the fool: this blob of information actually contains important insights about what went wrong – can you find it out before I spell the solution below?
167B:000012EC cmp [bp-04],ax ss:[C89E]=2E70 EAX:0000FFAD EBX:0000407C ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012EF jc 000012CC ($-25) (up) EAX:0000FFAD EBX:0000407C ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012CC les bx,[bp-3E] ss:[C864]=77C4 EAX:0000FFAD EBX:0000407C ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012CF mov ax,es:[bx] es:[77C4]=FF22 EAX:0000FFAD EBX:000077C4 ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012D2 mov dx,es:[bx+02] es:[77C6]=478B EAX:0000FF22 EBX:000077C4 ECX:00000098 EDX:00009E8B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012D6 mov bx,[bp-0A] ss:[C898]=4080 EAX:0000FF22 EBX:000077C4 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012D9 mov [bx],ax ds:[4080]=0000 EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012DB mov [bx+02],dx ds:[4082]=0000 EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012DE add word [bp-0A],0004 ss:[C898]=4080 EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:1 IF:1
167B:000012E2 add word [bp-3E],0004 ss:[C864]=77C4 EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:1 IF:1
167B:000012E6 inc word [bp-04] ss:[C89E]=2E70 EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:0 IF:1
167B:000012E9 mov ax,[bp-02] ss:[C8A0]=FFAD EAX:0000FF22 EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:1 IF:1
167B:000012EC cmp [bp-04],ax ss:[C89E]=2E71 EAX:0000FFAD EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:0 ZF:0 SF:0 OF:0 AF:0 PF:1 IF:1
167B:000012EF jc 000012CC ($-25) (up) EAX:0000FFAD EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012CC les bx,[bp-3E] ss:[C864]=77C8 EAX:0000FFAD EBX:00004080 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012CF mov ax,es:[bx] es:[77C8]=8B08 EAX:0000FFAD EBX:000077C8 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
167B:000012D2 mov dx,es:[bx+02] es:[77CA]=0A57 EAX:00008B08 EBX:000077C8 ECX:00000098 EDX:0000478B ESI:0000005C EDI:0000005C EBP:0000C8A2 ESP:0000C85E DS:2CC4 ES:0000 FS:0000 GS:0000 SS:2CC4 CF:1 ZF:0 SF:0 OF:0 AF:1 PF:0 IF:1
Here is the point: we are in the medium assembly model, where DS=SS
, i.e. function stack and global data share the same memory segment. But we see the stack pointer EBP:0000C85E
, right in the middle of the linked list, which means the function stack has grown so much that has overwritten the global data!
Now, there are weaknesses in this line of reasoning – especially the question of why this does not happen in the original game. But whether right or wrong, this observation got me out of several days of fruitless bug hunting and gave me some way to go forward.
If the data segment is overflowing, we must make space. This led me to the ambitious idea: attempting to move the enlarged arrays into an extra segment. Since the DS and ES selectors were already used, I created a new segment to be refererred by the FS
register, and moved there some of the variables I had relocated:
fseg segment byte public use16
assume cs:dseg
assume es:nothing, ss:nothing, ds:dseg, fs:fseg
public polyinfoptr
public poly_linked_list_40ED6
public word_411F6
public polyinfoptrs
public transformedshape_zarray
public transformedshape_indices
public transformedshape_arg2array
transformedshape_zarray db 250 dup(0)
transformedshape_indices db 250 dup(0)
transformedshape_arg2array db 126 dup(0)
polyinfoptr dw 700
poly_linked_list_40ED6 dw 190h dup(0)
word_411F6 dw 0
polyinfoptrs dd 300h dup(0)
db 6 dup(0)
fseg ends
Then I added the segment definition at the end of segments.asm
, which I understood was used by the linker to compose the binary image.
fseg segment byte public use16
ends
Last touch was to add the .386
directive, since 286s and earlier processors do not have a FS
register.
I emphasize that I only had a partial understanding of what I was doing, so I felt like a bit like the sorcerer’s apprentice. I spent a couple of days trying to make it work, but the results were the same as before.
Now I think I know why my extra segment experiment was doomed to fail: first, the segment definition file in Restunts is fragile and if written without care can lead to variables and code ending up in the same place. This could have anyway be solved, and I made various experiments to add “spacing regions” to try to get all the bytes to their right places. Second, and more important, is the fact that storing the variables in a different segment changes the instructions size. As noticed in the appendix 2, altering the size of an assembly function poses a high risk of corrupting the executable. Creating a new segment would modify the size of all functions using the variables located there, so producting a functioning binary was practically impossible.
The same reason doomed to fail my very last attempt, changing the data model to “huge”, i.e. forcing all data pointers to be 32-bit sized. The only noteworthy result was a further surreal experience: this debugging session, where the “step over” function of DOSBox stopped working properly and started to jump several instructions at each invocation.
core=normal
makes “step over” work in any circumstance. However I have never seen anything like that when debugging properly functioning executables.
At this point I felt it was time to wrap up. I had failed to lift the 400-polygon limitation and it was bitter to close with a defeat, but I had given all what I could and even the improvement in depth of sight of the original mod was anyway impressive. In parallel with my experiments I had also made an extensive restructuring of makefile (see appendix 3) to remove some bugs introduced by the previous rewrites in C, so that SuperSight worked as drop-in replace for the original game with perfect reliability. Or did it?
The mod was unstable even with the original polygon limit? This could have been the fatal blow as I did not have any clue of how to address this. But I started the code editor again, determined to make it work one way or another. I was lucky enough to find a reproducible way to crash the game, so that I could immediately test if any remedy was effective.
I ventured again in the rendering code and finally I noticed what I had failed to realize in the previous two weeks: namely, that the resizing of currenttransshape
and of the transformedshape_*
arrays was unnecessary; I could use the original versions and spare myself their relocation. And, lo and behold, removing the larger copies freed a couple of kilobytes in the data segment, and this was all what was needed to make the game work flawlessly!
And that’s not all – the resized polyinfo
arrays required much less space than what the enlargement of currenttransshape
had costed. Once removed currenttransshape
I could indeed put polyinfo
in the freed space and increase the number of primitives to 592 without the game protesting. Then I increased again the heap-allocated polyinfoptr
buffer, settling to a final value of 13 KB. And so it was: after almost two weeks of fruitless attempts, a sudden realization solved all my problems. I published the enhanced version of SuperSight the very same night that I got the bug report.
The SuperSight project was finally at the end! It took some suffering but it was enjoyable and instructive, and the final result was well appreciated by the community.
This has been a very long writeup and I don’t want to bore you further with long philosopical reflections. I’ll just mention one takeaway I got, namely that I should use more powerful debugging tools in future. DOSBox debug is not bad for a quick analysis, but a product like Spice86 would have made my life easier. I hope I will find a good project to try it on soon!
Also, if you have some more minutes, consider visiting the Stunts portal. The community is still thriving, many decades after this little racing game was released, and there is always a competition awaiting to see your lap on this month’s track, or maybe to hear your childhood memories of the game in the forum!
This is usually the first, but it’s not a given: a separate variable might contain the head of the list. But I digress. ↩