Disassembling some ABC

Following a recent discussion about some of the (many!) quirks of ABC, I thought I'd have a look at what is going on within the compiled program.

We'll start with something simple.

10 END

This generates an executable of 3,140 bytes.

The first word is a branch. This skips ahead to the program startup code.

Then follow two words that are &FFFFFFCE (-50 as a signed value; the magnitude, 50, is &32 in hex). Given that &32 reads like "32", I wonder if this is some sort of 32 bit compatible marker?
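As a quick check of that arithmetic, here is a small Python sketch (the helper name `to_signed32` is made up for illustration) that reinterprets the word as a two's complement value and takes its magnitude:

```python
def to_signed32(word):
    """Interpret a 32 bit word as a two's complement signed value."""
    word &= 0xFFFFFFFF
    return word - 0x100000000 if word & 0x80000000 else word

marker = 0xFFFFFFCE                 # the word seen twice near the start of the image
signed_value = to_signed32(marker)  # signed reading of the word
magnitude = -signed_value           # what a negate (RSB with 0) would produce

print(signed_value, hex(magnitude))
```

It says nothing about what the word is for, only that -50 and &32 are the same quantity seen two ways.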

Afterwards, 102 words that are null, with the exception of the 18th which is &FF. Then follows an embedded copyright identity for ABC, and then 452 words that are mostly null, though some have a value, purpose unknown.

After this, what appears to be the string "ARM" following a byte of value 12 (&C), twice, with some other data. Purpose unknown (were there ever plans to target other architectures?) but it seems to be required to initialise the library.

Finally, our actual program (at &88EC) that looks like this:

BL     &8900
ADR    R0, &8B50
ADD    R0, R0, #0
CMP    PC, #0
B      &8B40

The first line, the branch, goes to this (at &8900):

STMDB  R13!, {R0, R14}
LDMIA  R13!, {R0, PC}

I'm not sure what this is for, perhaps memory allocation (unused in this case), however it is interesting that the compiler baked in the dummy code instead of realising that the code did nothing and simply not including it at all.

The word at &8B50 is our return code, so the next two lines set up a long ADR (even though it is unnecessary).

The CMP of PC with 0? No idea. Because the final branch is to the termination and it looks like this:

LDR    R2, [R0], #4
ADR    R1, &8C30
SWI    OS_Exit

The LDR will retrieve the return code, stashing it into R2. Notice the writeback to add 4 to R0 after the read. The subsequent word will be the error block.
The ADR sets R1 to point to the magic string "ABEX".
And, finally, we tell the OS to pass control to the most recent exit handler. This is a normal tidy way of exiting a program.

So, as you can see, the flags aren't used so I'm not sure what the point of the CMP PC, #0 is.

Back to our disassembling. We've seen the program, we've seen the empty function. Now there's even more code that appears to be unused. At &8908 we have this:

BL     &891C
ADR    R0, &8B50
ADD    R0, R0, #0
CMP    PC, #0
B      &8B40
STMDB  R13!, {R0, R14}
LDMIA  R13!, {R0, PC}

Yeah. It's basically the entire program all over again. Only this one isn't called. It, uh...does nothing.

Following this is the library startup that we come to from the branch at the very start of the program.

This stashes the value of R14 (on entry) to the word at &8B4C (I can forgive a pointless ADRL here as it's probably boilerplate code copied verbatim). R10 is then set to point to &8C34.
Next, we get the amount of memory from OS_GetEnv, setting up R13 to point to the limit (it's a full descending stack), and R11 to point 4096 bytes lower. This is probably being used as a stack limit. Should R13 <= R10 then we're out of stack!

Next we point R1 to the "ARM" strings mentioned earlier (specifically: &0,&C,"ARM",&0,&0,&0,&0,&C,"ARM",&0,&0,&0) ready to call the SWI XABCLib_Init2632.

It is interesting that, unlike compiled C programs, ABC programs do not attempt to load or version check the ABC module. It is purely up to the !Run file to do that. If the module is not present, the call will fail, and the program will abort with the error "ABCLib not loaded".
Also, it's worth noting that for all of the start-up rigamarole, ABC suffers the exact same problem as SharedCLibrary, in that it is possible to kill off the module with active programs running.

 

The error generation code is a bit weird. It is:

ADR    R0, &8B94
CMP    PC, #&80000000
CMNVC  PC, #&80000000
SWI    OS_GenerateError

The first line points to the error block, error &81A901 followed by the textual message "ABCLib not loaded". The next two lines ensure that the V flag is set. And finally we call the non-X form of OS_GenerateError to raise the error and pass it to the current error handler (which usually has the side effect of aborting the program unless a custom handler is in use).

Thing is, why are we setting the V flag? OS_GenerateError is a veneer for the failed SWI return case. Literally the first thing it does is set the V flag. I NOPed out the two instructions to set V, killed off ABCLib, and ran the program. It behaved, as expected, exactly the same way.
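To see why the CMP/CMNVC pair always leaves V set whatever the value of PC, here is a rough Python model of the two instructions. It only tracks the overflow flag; the function names and the sample PC values are illustrative and not taken from the executable.

```python
MASK = 0xFFFFFFFF

def cmp_sets_v(rn, op2):
    """V flag after CMP rn, op2 (a 32 bit subtraction rn - op2)."""
    result = (rn - op2) & MASK
    # Overflow when the operands differ in sign and the result's sign differs from rn
    return bool(((rn ^ op2) & (rn ^ result)) >> 31)

def cmn_sets_v(rn, op2):
    """V flag after CMN rn, op2 (a 32 bit addition rn + op2)."""
    result = (rn + op2) & MASK
    # Overflow when the operands share a sign and the result's sign differs from rn
    return bool((~(rn ^ op2) & (rn ^ result) & MASK) >> 31)

def v_after_sequence(pc):
    """Model CMP PC, #&80000000 followed by CMNVC PC, #&80000000."""
    v = cmp_sets_v(pc, 0x80000000)
    if not v:  # CMNVC only executes when V is still clear
        v = cmn_sets_v(pc, 0x80000000)
    return v

# Hypothetical PC values: low addresses, and addresses with the top bit set
samples = [0x00008B98, 0x7FFFFFFC, 0x80000000, 0xFFFFFFFC]
results = [v_after_sequence(pc) for pc in samples]
print(results)
```

The CMP overflows when the top bit of PC is clear, and the CMN catches the case where it is set, so between them V ends up set either way.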

Whatever R10 was on entry to the library initialisation, it is also something on exit, which is stored at &86E8-&400 (finally, an ADRL that does something!). The value is &8C34, which is four words from the end of the executable (and where the heap will be placed).
R9 is then set to R11 (the base of our stack) minus R10.

R8 is set to point to &8804-&800, which means it's that &FFFFFFCE word we saw at the start. R7 is then set to point to &86C0-&400 which appears to be some empty memory.
R0 and R1 are loaded from R8+0 and R8+4 respectively, meaning both will hold the &FFFFFFCE word.

The next bit of code is weird. A TST of the value in R1 with &80000000, and if EQ we branch.
But it isn't, so we don't.
Next a TST of R1 with &FF, and if NE we branch.
We do.
A simple RSB of R1 with itself and 0 basically flips the value from &FFFFFFCE to &32 (like abs()). This, for some unknown reason, is then multiplied with the value in R9, to arrive at the value of &C7A7DD8. R0 is then set to &64 and we're then making our first jump into ABCLib.

This is with a 4MiB slot. It looks 'safe' up to about 40MiB. Going any further causes the high bit to be set, for example 42MiB results in R1 being &833A7DD8. This is obviously being treated as a signed value somewhere because a little bit further on, we are tossed out with a "Not enough memory" error.
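The numbers above can be reproduced with a little arithmetic. The following Python sketch assumes that application space starts at &8000 and that the memory limit is simply &8000 plus the slot size; both are assumptions made to match the figures quoted, and `scaled_r1` is a made-up name.

```python
APP_BASE = 0x8000      # assumption: application space starts at &8000
STACK_GAP = 4096       # R11 sits 4096 bytes below the memory limit
R10 = 0x8C34           # pointer just past the "ABEX" word, as reported above
MIB = 1024 * 1024

def scaled_r1(slot_bytes):
    """Model R1 = (0 - &FFFFFFCE) * (R11 - R10), truncated to 32 bits."""
    r11 = APP_BASE + slot_bytes - STACK_GAP
    r9 = r11 - R10
    factor = (-0xFFFFFFCE) & 0xFFFFFFFF   # RSB with 0 turns &FFFFFFCE into &32
    return (factor * r9) & 0xFFFFFFFF

for mib in (4, 40, 42):
    r1 = scaled_r1(mib * MIB)
    top = "set" if r1 & 0x80000000 else "clear"
    print(f"{mib:2d}MiB slot: R1 = &{r1:08X}, top bit {top}")
```

Under those assumptions a 4MiB slot gives &0C7A7DD8 and a 42MiB slot gives &833A7DD8, matching the values seen in the debugger, and the top bit first becomes set somewhere between 40MiB and 41MiB.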

I'm going to skip what happens next. It looks like ABCLib returns from that call with R0 set to &FFFFFFCE again, so the test is largely performed again based upon that, and some other stuff happens. I'm guessing there are some sorts of magic values going on here, and without access to the ABC source code, it's mostly going to be wild guesses.

All the manipulations done, we claim a heap based at &8C34 (following the "ABEX" string), four words from the end of the executable (it's okay, they're null words). With a 4MiB slot, the heap is calculated as being &1FF1E0 bytes long (just a shade over 2MiB). Two blocks are claimed by the runtime. The first is 88 bytes, and the second is 2048 bytes.
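One observation on that heap length, offered as a guess and not as a description of what ABCLib actually does: dividing the R1 value passed into the library (&C7A7DD8) by the R0 value (&64) and rounding down to a multiple of 16 happens to give exactly &1FF1E0. A minimal Python check, with hypothetical variable names:

```python
R1 = 0x0C7A7DD8   # value computed before the first library call (4MiB slot)
R0 = 0x64         # the other value passed in at the same time

# Guesswork: integer divide, then round down to a 16 byte boundary
heap_guess = (R1 // R0) & ~0xF

print(f"&{heap_guess:X}")
```

If that is more than a coincidence, the &FFFFFFCE word may be asking for 50 percent of the free memory as heap, but without the source that remains speculation.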
If we're using the {NEWHEAP} option, there's an extra call into the library just after the second heap claim.

It looks like the basic memory mapping of the program is like this:

We jump into the library two more times, and finally a branch to &88EC which is the start of our code.

Following this branch, four blocks of code calling OS_GenerateError for the following errors: Failed to initialise workspace, Not enough memory, ABCLib not loaded, and Array subscript out of range.
Then there's the exit handler previously discussed, then the word where the entry R14 is stored, then the word that is our return code. This is followed by a word that is "END" with a null terminator. It appears that this is what is passed to OS_Exit as the error block pointer!

Then follow four error blocks that correspond to the four OS_GenerateError code blocks.

Finally, we have three "useful functions". The first looks like it strips flags from R14 (so 26 bit behaviour) to get an address, the contents of which are loaded into R0. R1 is set to 0. R14 is incremented to point at the next instruction before calling into the library. Looks like something that might be passed to Debugger_Disassemble?

The second routine sets R7-R1 (yes, backwards) to zero, then returns to caller. Perhaps used prior to setting up a SWI? If so, one might want to let them know that SYS accepts values in R8 and R9 these days.

The final routine uses three EOR instructions to swap the contents of R0 and R1, then calls into the library.
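For anyone who hasn't met the three-EOR trick, here is a minimal Python sketch of it. The exact operand order in the ABC routine isn't shown above, so the order in the comments is an assumption; any of the usual arrangements has the same effect.

```python
def eor_swap(r0, r1):
    """Swap two register values using three exclusive-ORs and no temporary."""
    r0 ^= r1   # EOR R0, R0, R1  -> R0 holds a EOR b
    r1 ^= r0   # EOR R1, R1, R0  -> R1 holds the original R0
    r0 ^= r1   # EOR R0, R0, R1  -> R0 holds the original R1
    return r0, r1

swapped = eor_swap(0x12345678, 0x9ABCDEF0)
print(tuple(f"&{value:08X}" for value in swapped))
```

It saves a scratch register at the cost of three dependent instructions.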

Finally, the "ABEX" word used by OS_Exit, and four null words that are replaced by the start of the heap.


Looking at a dump of the program post-execution, we can spot some other things.

Many of the words from &800C to &82D0 appear to be unused. I wonder if this is perhaps used as temporary workspace for, say, constructs like PRINT "Result is "+FNtest ?

The words from &830C to &8348 are a base 2 table from 1 to &8000.

The words from &834C to &8388 are a series of values that is like the base 2 table minus one: 1, 3, 7, 15, 31... these are typically used as mask values, like masking a byte is (value AND 255).
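These two tables are easy to regenerate. The Python sketch below assumes each has sixteen one-word entries (which is what the address ranges imply) and that the mask table runs from 1 up to &FFFF; the final entries weren't listed above, so that upper end is an assumption.

```python
POW2_BASE = 0x830C   # first word of the powers-of-two table
MASK_BASE = 0x834C   # first word of the mask table
ENTRIES = 16         # assumption: sixteen words in each table

powers = [1 << n for n in range(ENTRIES)]            # 1, 2, 4 ... &8000
masks = [(1 << (n + 1)) - 1 for n in range(ENTRIES)] # 1, 3, 7 ... &FFFF

for n in range(ENTRIES):
    print(f"&{POW2_BASE + 4 * n:04X}: &{powers[n]:04X}    "
          f"&{MASK_BASE + 4 * n:04X}: &{masks[n]:04X}")

# Typical use of a mask entry: keep only the low byte of a value
low_byte = 0x1234 & masks[7]
```

With sixteen entries the last words land on &8348 and &8388, which agrees with the ranges seen in the dump.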

The words from &838C to &83C8 are an inverse series of mask values starting from the second nibble.

The words from &83CC to &8650 are all LDR PC,&xxxx as they are the jump points into the ABC library module. Counting both ends, that is 648 bytes or 162 words, so there are 162 different routines that could be called.
This is followed, &8654 to &88D8, by a list of addresses used by the jump points. It works in the same manner as SharedCLibrary in using LDR to pick up an address to push directly into PC.
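The size of the two tables can be checked from the addresses alone. This Python sketch counts the words with both end addresses included (which gives 162 apiece), and then works out what each stub would look like if stub number n loads address slot number n. That pairing is an assumption, though it is the obvious layout.

```python
STUBS_FIRST, STUBS_LAST = 0x83CC, 0x8650   # LDR PC,... jump points
ADDRS_FIRST, ADDRS_LAST = 0x8654, 0x88D8   # addresses they pick up

def word_count(first, last):
    """Number of 32 bit words from first to last, both inclusive."""
    return (last - first) // 4 + 1

stub_count = word_count(STUBS_FIRST, STUBS_LAST)
addr_count = word_count(ADDRS_FIRST, ADDRS_LAST)

# On ARM, PC reads as the instruction's own address plus 8
offset = ADDRS_FIRST - (STUBS_FIRST + 8)

# LDR PC, [PC, #offset] with a positive 12 bit immediate encodes as &E59FFxxx
encoding = 0xE59FF000 | offset

print(stub_count, addr_count, f"&{offset:X}", f"&{encoding:08X}")
```

If the pairing assumption holds, every stub is the same instruction word, since each one sits at the same distance from its address slot.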

Following that are those two "ARM" strings, and now we have a complete overview of how the program is put together.


Yes, that second routine (R7-R1 = 0) is used by SYS calls. I made a test program that made a SYS call, and it was assembled as follows:

BL      &8C1C
MOV     R3, #3
MOV     R2, #2
MOV     R1, #1
MOV     R0, #0
SWI     XOS_CRC
BL      &8934

It calls the function to blank all of the registers, sets up the ones it needs, then directly calls the SWI. The final BL is the one to an empty function. Then we would carry on to set R0 to point to the return code, compare PC with 0, and call the exit routine.

It is worth noting that ABC cannot cope with using registers R8 and R9 in SYS calls. So I hope you never wish to use the following with ABC:

  • DeviceFS_CallDevice
  • FileCore_* (most of them)
  • OS_CallAVector
  • OS_DynamicArea 0 (create area) and 2 (area info)
  • OS_Hardware
  • OS_FSControl 26 (copy) and 28 (count)
  • OS_Upcall 3 (file being modified)
  • SCSI_* (most of them)
Most are unlikely to turn up in ABC programs, but it does imply that you cannot call HAL functions, nor devices (such as USB devices), and you can't use dynamic areas either. You can't say "well, I can make DAs without names" because you don't actually know what value will be in R8. It might not even be a valid address.

I just tried a simple program to create a DA, and the name was "ÎÿÿÿÎÿÿÿ" because at that point, R8 was pointing at the two &FFFFFFCE words.
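That odd name is just the two marker words read as a Latin-1 string, which a couple of lines of Python can confirm. The trailing null byte stands in for the null words that follow the markers in the image.

```python
import struct

# Two little-endian &FFFFFFCE words, followed by a null from the empty words after them
raw = struct.pack("<II", 0xFFFFFFCE, 0xFFFFFFCE) + b"\x00"

# Read it the way a null-terminated string would be read
name = raw.split(b"\x00", 1)[0].decode("latin-1")

print(name)
```

Byte &CE is Î and byte &FF is ÿ, hence the eight character name.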

Looking at the executable of that, it seems that ABC can be quite literal in its behaviour. On return from the OS_DynamicArea call, I keep a copy of R1 (area number) and R3 (area base address) to report to the user.
Which leads to this:

STR     R1, [R12, #128]  ; store DA number
STR     R3, [R12, #132]  ; store DA base
LDR     R0, [R12, #128]  ; pick up DA number for STR$

It looks like ABC uses the bare minimum of registers, usually only R0 and R1 and sometimes R2, and performs all of its work in little chunks that load and save values from/to memory, without context as to what is happening elsewhere. Otherwise an optimiser might have noticed that the LDR could easily be replaced by MOV R0, R1.

To put that comment into context, I wrote something similar in C. The first version (using a single printf command) noted that the printf was being called with R0 being a pointer to the string, R1 being the first parameter, and R2 being the second. So the entire output was to ADR the string to R0, and then move R3 into R2 (as R1 happened to be already correctly set up).

I split the printf into pieces to better reflect how BASIC behaves, and in this case the DDE compiler pushed R1 into R4 and R3 into R5. It then ADRed R0 to what was to be printed, retrieved into R1 one of the values in R4 or R5 if necessary, and then called the printf routine. Repeat until it is all printed out. Nothing is loaded nor stored as in this instance the necessary data can be entirely held in registers.

 

Well, what else to do on a drizzly day? ☺

 

 

Your comments:

John, 19th September 2020, 21:46
You got drizzle? I should be so lucky, so lucky…
David Pilling, 20th September 2020, 18:29
Mono-spaced in Courier comments. Have you done the usual futile thing of finding out who wrote ABC and asking for the source code. Guess there are a few BBC Basic compilers around.
Rick, 20th September 2020, 20:25
No reason the comments should look different. ;-) 
One day, I might get around to writing some PHP that can take snippets of BASIC and output formatted colourised HTML that I can drop into my pages. One day... 
 
There are three BASIC compilers that I know about. 
 
The first, that is easily discounted, is Whizz. It is incomplete. I downloaded a copy ages ago and messed around with it to work on a 32 bit system (it is crunched BASIC). It can compile a number of simple programs, but the output code is *HORRIBLE*. In one case, I noticed it stacking all of the registers FIVE times in a row simply to set a register to 1 and then unstack them all again. Indeed, a program created by Whizz WILL spend the majority of its time stacking and unstacking everything. 
This is acceptable in an early incomplete build of a compiler. But, sadly, that's all Whizz is. There's no source, it doesn't ever seem to have been continued, and... 
 
The next compiler is RiscBasic by Silicon Vision. Everybody says it is "as good as BASIC" and supports everything correctly. It last got updated for the StrongARM and then it pretty much vanished into obscurity. I guess the SV guys gave up on RISC OS at that point? 
 
Finally, we have ABC. The current BASIC compiler supplied with the DDE. It has numerous quirks and restrictions (for some people, anything regarding the use of LOCAL is a berserk button) and more importantly it only supports the RISC OS 3.10 dialect of BASIC. 
Sources are NDA or something, ROOL has a copy but like much of the DDE they can't be made public. 
I have asked. And have been asking and dropping hints for... oh my god... over half a DECADE ago. To be honest, I've lost hope and lost interest. 
 
Which is a shame, because when there's somebody (who already has the DDE) who wants to try to extend the features to cover some more "recent" (as in 1996!) versions of BASIC, such as COLOUR r,g,b and who is willing to do it with respect of the closed nature of the sources AND for free... it seems somewhat silly not to take them up on it. 
But, alas, I guess it is somebody's pet and they don't want to let it go. It only recently (as in last year) gained support for understanding hex in lower case or mixed case ... something I noted way back when; but then I'd have been inclined to toss the string at OS_ReadUnsigned or whatever and get the OS to handle it, rather than trying to roll my own hex parser. 
Steve Drain, 20th September 2020, 22:03
There is also sBASIC. It is very simple, written in sensible BASIC and outputs BASIC assembler. It was never developed.
Rick, 20th September 2020, 22:38
I've also just found Dan's Tiny BASIC Compiler. 
It is extremely rudimentary, not terribly helpful (parser errors usually result in "Missing"), but I still have to commend it. 
Why? 
Because it's a command line program that looks like it was written for Arthur. Yet not only it but ALSO ITS OUTPUT work exactly as they should on a 32 bit Pi! 
(I haven't tested everything, but what I've tried so far has worked) 
Steve Drain, 21st September 2020, 11:56
Just some details for SBASIC: 
 
By Barry Wickett in Acorn User May 1992
Rick, 21st September 2020, 14:10
Just been in touch with Darren regarding Whizz. He no longer has a copy. Something else lost to time. :-( 
 
AU in May 92? I think that's post Yellow Pages isn't it? So I'd need both the magazine and an image of the cover disc. The magazine I can sort - 8bs has scans of them...
Rick, 21st September 2020, 14:17
Nope, the listings are all there. Got a copy from http://8bs.com/aumags.htm, I'll toss the relevant pages to the laser when I get home... ;-)
Steve Drain, 21st September 2020, 16:40
I have an image of the disc, or just the app, if you want it.
Rick, 21st September 2020, 19:46
Sure, a copy of the app would save typing it all in. ;-) 
 
Whoa, that takes me back. Printed out the stuff, plus the reviews of Hearsay 2 and the PC card (I have one of those somewhere) and kind of got stuck reading through all the stuff in the WE adverts and their infamous stripey green/black edged pages. Just like I did countless times way back when...

© 2020 Rick Murray