SuperSight: a graphical enhancement mod for Brøderbund's Stunts

The story so far: in the past episode I have delved into Stunts' rendering engine and modified its executable to alter the shape of the cone of vision.

Part II

Hex-editing the binary file of a game is fun, but it has its limits. If I wanted to do something more powerful with Stunts’ graphical engine I needed to work at the source code level. Which meant I had to operate the Restunts build machinery.

Restunts uses tricky and fragile tricks to piece together .asm and .c files, but miracolously the system works. The documentation is very good and the build system fulfills its purpose well. After spending time trying to rewrite the makefiles to work with Linux, I found that it’s much easier to just follow the intended workflow and run make from a Wine console.

Wine is a wonderful tool, but its console is a horror. It has no tab completion, the command history is non persisted and, worst of all, pressing CTRL+C closes the console instead of stopping the program running in it.

I have seen several convoluted build processes, so Restunts’ one did not scare me. It should have though: I was using my Linux PC to run an emulation of a Windows console, where I called a 32-bit make program which creates multiple invocations of Borland’s 16-bit TASM assembler, each one in its own child DOSBox window, and other calls to Borland’s 32-bit C compiler in the main Wine console, and the pandemonium is followed by a call to a 16-bit linker.

Amazingly, the final product compiles and runs successfully, and the resulting executable is not distingushable from the original game!

I made a small tweak in the makefile to get rid of the cascade of subwindows and redirect all the build output to the Wine console. After that, the build experience become much smoother. Running the makefile took a couple of minutes, allowing a decent pace of iteration¹. A problem that I only partially solved was that not all the dependencies are stated explicitly, and more than one time I was blocked by a bug due a stale file that make had not recompiled due to the missing dependency. After a while I got a sense of when a manual clean of the build products was necessary.

I returned to frame.c and started analyzing again. Most of the variables had been given meaningful names by Restunts’ contributors, but a couple of them were misleading, and there were still many entities that still bore the generic names autogenerated by the disassembler. I begun renaming variables and function as I gained more understanding of their role in the code.

A wonder of Restunts, that I was unfortunately forced to abandon, is its dynamic system of variable name mapping. The original contributors used the IDA Pro disassembler to create a table of entity names: when they discovered the meaning of a variable or a function, the exe file could be disassembled anew, and the emitted .asm and .c files would both contain the new names. But I do not own IDA Pro, so I had to use more primitive methods. For the .c files I could employ my QtCreator IDE, but I also had to keep the corresponding assembly files synchronized, so a blind recursive search & replace using sed proved to be the most practical solution. An added advantage is that I could keep track of the invocation to sed, so that I could port them later to the vanilla project. The result of my efforts can be seen in today’s version of frame.c, where in addition to the new variable names I added extensive comments documenting some of the crucial phases of the algorithm engine.

A wrong plan: the bad part

No good reverse-engineering story would be realistic without including some false tracks and faulty analyses. However, in this project I feel I spent the majority on my time working on the wrong things. Still, it’s good to write down the process, at least as example that one can complete a successful project even going through some false paths.

Let’s briefly recap the goal: we want to extend the number of rendered track tiles from the default 23 to many more, as much as the memory allows. For that we need to alter the sizes of the corresponding arrays. Reading the code to identify them, I noted down some data allocated in the stack of the rendering function, update_frame:

unsigned char tiles_to_draw_terr_type_vec[24];
char should_skip_tile[24];
char tile_detail_level[24];
char tiles_to_draw_south[24];
char tiles_to_draw_east[24];
unsigned char tiles_to_draw_elem_type_vec[24];

and some global variables

extern char* lookahead_tiles_tables[];
extern struct TRANSFORMEDSHAPE3D currenttransshape[29];
extern int transformedshape_zarray[];
extern int transformedshape_indices[];
extern char transformedshape_arg2array[];

whose definitions are in the assembly file dseg.asm. Let’s have a look:

.model medium
nosmart
; ...
DGROUP group dseg
dseg segment byte public 'STUNTSD' use16
    assume cs:dseg
    ; ...
    ; ...

lookahead_tiles_tables     dw offset unk_3BE9A
    dw offset unk_3BEE0
    dw offset unk_3BF26
    dw offset unk_3BF6C
    dw offset unk_3BFB2
    dw offset unk_3BFF8
    dw offset unk_3C03E
    dw offset unk_3BE54
    ; ...

unk_3BE9A     db 2
    db 252
    db 2
    db 1
    ; ...

currenttransshape     db 0
    db 0
    db 0
    db 0
    ; ...

transformedshape_zarray     dw 0
    db 0
    db 0
    db 0
    ; ...

transformedshape_indices     dw 0
    db 0
    db 0
    db 0
    ; ...

transformedshape_arg2array     db 0
    db 0
    db 0
    db 0
    ; ...

dseg ends
end

The variable lookahead_tiles_tables was called off_3C084 in my past article. It contains the coordinates of the tiles to render, expressed as offsets with respect to the tile where the camera is: this is obviously pertinent to the goal. What is not pertinent are the other four arrays, which are also used by update_frame but are unrelated to the number of tiles. Somehow I thought the contrary: I noticed that currenttransshape and the transformedshape_* arrays all have 29 elements, and assumed that 29 was somehow the sum of the 23 tiles to render with 6 fixed elements, so that if I wanted to increase the rendered tiles I had to enlarge those arrays proportionally. This error would burden me for the rest of the project.

Following on my bad intuition, I changed the size of all the arrays. Enlarging them seemed as simple as defining new bytes in the data segment, keeping into account the size of the type of the array elements. The dup directive made the definitions clearer:

; Assuming we want to store 110 items

; currenttransshape: 20 bytes/item, +10 slack bytes
currenttransshape db 2210 dup(0)
; transformedshape_zarray: 2 bytes/item, +30 slack bytes
transformedshape_zarray db 250 dup(0)
; transformedshape_indices: 2 bytes/item, +30 slack bytes
transformedshape_indices db 250 dup(0)
; transformedshape_arg2array: 1 byte/item, +16 slack bytes
transformedshape_arg2array db 126 dup(0)

A wrong plan: the good part

After this misled change, I proceeded to the update_frame function to adjust the C code. What I did there was much more appropriate and is still present in the current version of my mod, with various incremental improvements.

The original game defines the tile offsets in eight different arrays and selects one of them according to the heading of the camera, but I did not want to follow such approach. On modern emulators, virtual CPU cycles are abundant while memory is still bound to the original limits of DOS, so it was more convenient to just define one list of tile offsets and rotate/mirror it as appropriate at every time step, depending on the camera orientation.

Moreover, hand editing the original 23-element array of offsets during the development of “Stunts with binoculars” had been unpleasant and I had no desire to repeat the experience on a much bigger matrix. I decided to write a simple Python program to automatically create the list, which allowed me to iterate much faster.

To determine shape and size of the cone of vision, I considered typical camera views:

Some typical views used by players: first-person, chase cam and bird's-eye.

Tiles we should render are:

Those in a ~120° cone around the camera axis. The real angle of view is not so wide, but we must keep into account that we are rounding the axis to the nearest cardinal direction, so that when the heading approaches 45° discarding too many lateral tiles risks to cut out necessary pieces of the scenario. This happens in the vanilla game: note, in the second image, the missing piece of hill on the right.
Some tiles behind the camera, since they may fall in the image when seeing the scene from a bird’s-eye view. The vanilla game not doing that results in glitches: note, in the third picture, the missing road in front of the car (lower part of the picture)

So, imagining a car at (0, 0) and pointing towards the y axis, I decided to select as candidate tiles those in the infinite triangle defined by y >= abs(x) * 0.5 - 2. I also created a formula to determine the drawing priority, so that when changing the number of tiles to draw I could automatically choose the N most important ones. The nearest tiles were obviously privileged, but I also added a coefficient to prefer tiles close to the viewing axis (y in the image) against nearer but more lateral ones. The result is a formula calculating a sort of weighed distance.

The tiles and their weighed distance (lower = more important) for a camera located at (0, 0) and heading North. Greyed out tiles are assumed to be always behind the plan of view and hence never rendered.

The script I wrote performs the calculations and sorting, emitting a ready-to-use C array of tiles. When taking 110 tiles, the value I chose after some experiments, the distance threshold is at 6.6. The script prints all the tiles having a weighed distance lower than the threshold, sorted by decreasing distance. We need to use such ordering since, as we experienced in the first part, more distant tiles must be rendered earlier in order for the painter’s algorithm to display the scene correctly.

The tile coordinates are in “x = East, y = North” system for a car pointing North, but they are valid for any orientation as long as they are rotated properly. We can consider them to be expressed in a camera-centered system, with the two axes pointing right and forward with respect to the camera view, or rather to its projection on the ground. With this interpretation, I named the coordinates “width” and “depth”:

struct TILE_REL_COORDS {
	char width, depth;
};

struct TILE_REL_COORDS lookahead_tiles_supersight[TILES_TO_DRAW_COUNT] = {
{  -6,   4 },
{   0,  10 },
{   5,   6 },
{  -5,   6 },
{   6,   3 },
{  -6,   3 },
{   2,   9 },
{  -2,   9 },
{   6,   2 },
{  -6,   2 },
{   4,   7 },
// ...

I needed to adjust some lines of the tile-rendering algorithm to use the camera-based coordinates, but nothing major. After that, I was ready to start the experiment.

An early success

After fixing a couple of compile errors, the build process succeeded. The executable started and showed the menu correctly, but froze as soon as I started a race. It was not bad for a first attempt and I started to investigate. I had been warned in many forum posts that altering any part of the assembly code could potentially break the executable: the reason is that the disassembly is not perfect and sometimes the global variables are referred not by their symbol but by their hard-coded address, so that pushing around any piece of memory can result in a disaster.

Armed with this piece of knowledge it was easy to guess what happened. By extending the arrays defined in the global data segment I had displaced the addresses of everything that followed, possibly breaking a lot of instructions relying on those cells being at fixed locations.

The Borland linker produces a practical .MAP file listing the addresses of all entities in the built executable. With some work (due to the relocations made by DOS) one can use this list to get the real memory locations of these vars when the game is running. Moreover, the list is also very useful to determine what was moved after changing the code: just build the project before and after the change and diff the two generated .MAP files.

I devised a simple method to avoid moving entities unnecessarily: instead of enlarging the arrays in place, I renamed them to dummy names and created new, larger copies with the right name at the end of the data segment.

diff --git a/src/restunts/asm/dseg.asm b/src/restunts/asm/dseg.asm
index 6a30254..69633e2 100644
--- a/src/restunts/asm/dseg.asm
+++ b/src/restunts/asm/dseg.asm
@@ -27513,7 +27513,7 @@ td10_track_check_rel     dd 0
 basdres     dd 0
 followOpponentFlag_copy     db 0
     db 0
-currenttransshape     db 0
+qcurrenttransshape     db 0
     db 0
     db 0
     db 0
@@ -38066,7 +38066,7 @@ word_454CE     dw 0
 trackdata6     dd 0
 video_flag3_isFFFF     dw 0
 trackdata18     dd 0
-transformedshape_zarray     dw 0
+qtransformedshape_zarray     dw 0
     db 0
     db 0
     db 0
@@ -39307,7 +39307,7 @@ fontledresptr     dd 0
 someZeroVideoConst     dw 0
 nextPosAndNormalIP     dw 0
 word_45A00     dw 0
-transformedshape_arg2array     db 0
+qtransformedshape_arg2array     db 0
     db 0
     db 0
     db 0
@@ -40239,7 +40239,7 @@ byte_45E16     db 0
     db 0
 passed_security     db 0
     db 0
-transformedshape_indices     dw 0
+qtransformedshape_indices     dw 0
     db 0
     db 0
     db 0
@@ -41823,5 +41823,14 @@ word_46486     dw 0
     db 0
     db 0
     db 0
+; currenttransshape: 20 x TILES_TO_DRAW_COUNT times db
+currenttransshape db 2210 dup(0)
+; transformedshape_zarray: 2 x TILES_TO_DRAW_COUNT times db
+transformedshape_zarray db 250 dup(0)
+; transformedshape_indices: 2 x TILES_TO_DRAW_COUNT times db
+transformedshape_indices db 250 dup(0)
+; transformedshape_arg2array: 1 x TILES_TO_DRAW_COUNT times db
+transformedshape_arg2array db 126 dup(0)
+
 dseg ends
 end

The changes to the data segment (git diff 1fa8687^! -- src/restunts/asm/dseg.asm)

Again, I want to stress that all of this was only needed because I unnecessarily altered arrays unrelated to the number of tiles to display. As a partial consolation, this knowledge would become useful later.

In any case, I was happy to see that the technique had worked. Now I could start the game regularly and, for the first time ever, Stunts was able to display much more than 23 tiles! How did this first prototype fare?

By the way, this track is “No Rails”, by Argammon

Not great, not terrible! Players who know Stunts should be able to recognize how much longer the field of view is. Players unfamiliar with the game will instead point out the flickering and the occasional disappearance of the scenery, and sometimes even of the car. Both groups have good points.

Diving deeper

I spent a couple of hours trying to find out why parts of the scene were vanishing. I suspected my patch had introduced memory corruption, but the game never froze, and the car controls responded regularly even when the graphics went crazy. This behaviour hinted at a glitch in the modified rendering code, something one can find via some printf debugging. “Wait”, I hear you saying, “printf debugging on a graphical game without a separate console?” But sure, why not?

Esthetics be damned: few things are more enjoyable than an experiment that proves your hypothesis after hours of research. If you are too horrified, the appendix 2 might give you some relief.

This video was the successful conclusion of my debug session. Reading more carefully the update_frame function, I had found out that despite its length it mostly performes only tile-level processing. It draws a couple of shapes here and there (notably, the car explosions), but the bulk of the 3D processing is not in the body. On the contrary, it is delegated to a function hidden in a couple of lines at the end of the per-tile loop:

    var_transformresult = transformed_shape_op(&currenttransshape[0]);
    if (var_transformresult > 0) {
        break;
    }

The inconspiciously named transformed_shape_op is a behemoth spanning over 1574 assembly lines. In Restunts it was ported to C, but the glorious effort was only partially successful, leaving a 2400-line blob which mixes C code and commented lines with the original assembly. Despite the scary-looking outcome, the work of the contributors was not in vain: the C translation allowed me to see what was happening. As the snippet above shows, the whole tile processing routine is interrupted if transformed_shape_op returns non-zero. This return code is in its turn determined by the result of an auxiliary function called at the very end of transformed_shape_op:

	word_40ECE = transformed_shape_op_helper(temp0, temp);
	if (word_40ECE == 0) goto loc_25E04;
	return 1;

Thankfully, the helper function is short, and the last lines are easy to understand:

	if (polyinfonumpolys == 0x190) return 1;
	if (polyinfoptrnext <= 0x2872) return 0;
	return 1;

In summary, the 3D renderer allocates a buffer for the graphical primitives to be processed. If the total number of primitives exceeds 400 (0x190) or if they fill up the buffer (which has a size of 10400 = 0x28A0 bytes), the inner algorithm returns a failure code. When this happens, the upper level tile cycle is interrupted and only the elements processed up to this point get rendered. But remember that the tiles are sorted starting from the most distant: therefore, if the scene is too complex the game will only render the farthest shapes and skip those in the immediate surroundings of the player, including the car itself. A couple of strategically placed printfs proved that this is what was happening, a conclusion supported by the fact that the most distant pieces of scenery were rendered even when the other parts started to disappear.

How to prevent the disaster? The simple answer would have been to increase the storage space for the primitives, but I feared that to do this it would not be sufficient to just change the magic numbers in those two lines. I made an attempt anyway, which was unsurprisingly met by game crashes.

Seeing the light

Since altering the variable sizes had already proved to be a minefield, I decided to go for an alternative plan. Noticing that:

CPU cycles are not a factor.
The rendering overflows the buffer bounds only in some moments, when the scene is particularly crowded.
However, we cannot know in advance when we are trying to render too many tiles. When we do, the buffer is already filled with the least important data.

I devised a wasteful but safe solution to the problem:

First try to render the scene with the maximum number of tiles.
If the processing fails due the buffer being full, discard the whole computation, reset the variables and retry with a reduced number of tiles.
Iterate till the cycle succeds.

The iterative algorithm. A `goto` fits well in this brutal but effective implementation.

The method worked better than I hoped. At the typical speed of Dosbox the multiple rendering passes did not slow down the emulated machine significantly, and the tiles ceased to disappear. The field of view was clearly superior to both the original game and the “Stunts with binoculars” prototype I had developed. To top up the eye candy, just as before, I changed the code to always display the high-polygon version of the scenery elements.

So, how did the whole combination turned out?

Original game (left) and mod (right). The lap is my own performance on “One of These Days” by Eric Barros. Just like my replay, the mod is not award-winning but displays great potential.

Pretty well! There were still issues to address, especially the flickering, but the whole apparatus was definitely playable and an improvement over the visuals of the standard game.

It was time to officially baptise and publish my creation. With an announcement on the Stunts Forum, SuperSight v1.0 was released!

The story of my mod continues in Part III.

Later in the year, developer “llm” has managed to replace the assembler with a native Windows equivalent, cutting the build time to a couple of seconds. Good news for future tinkerers! ↩

Contents