Rick's b.log - 2020/09/19
Disassembling some ABC

Following a recent discussion about some of the (many!) quirks of ABC, I thought I'd have a look at what is going on within the compiled program.

We'll start with something simple.

    10 END

This generates an executable of 3,140 bytes.

The first word is a branch. This skips ahead to the program startup code.

Then follow two words that are &FFFFFFCE (-50 if signed, or -&32 in hex). Given that they are &32, I wonder if this is some sort of 32 bit compatible marker?

Afterwards, 102 words that are null, with the exception of the 18th which is &FF. Then follows an embedded copyright identity for ABC, and then 452 words that are mostly null, though some have a value, purpose unknown.

After this, what appears to be "ARM" following a byte of value 12, twice, with some other data. Purpose unknown (were there ever plans to target other architectures?) but it seems to be required to initialise the library.

Finally, our actual program (at &88EC) that looks like this:

    BL &8900
    ADR R0, &8B50
    ADD R0, R0, #0
    CMP PC, #0
    B &8B40

The first line, the branch, goes to this (at &8900):

    STMDB R13!, {R0, R14}
    LDMIA R13!, {R0, PC}

I'm not sure what this is for, perhaps memory allocation (unused in this case), however it is interesting that the compiler baked in the dummy code instead of realising that the code did nothing and simply not including it at all.

The word at &8B50 is our return code, so the next two lines set up a long ADR (even though it is unnecessary).

The CMP of PC with 0? No idea. Because the final branch is to the termination and it looks like this:

    LDR R2, [R0], #4
    ADR R1, &8C30
    SWI OS_Exit

The LDR will retrieve the return code, stashing it into R2. Notice the writeback to add 4 to R0 after the read. The subsequent word will be the error block.
The ADR sets R1 to point to the magic string "ABEX".
And, finally, we tell the OS to pass control to the most recent exit handler. This is a normal tidy way of exiting a program.

So, as you can see, the flags aren't used so I'm not sure what the point of the CMP PC, #0 is.

Back to our disassembling. We've seen the program, we've seen the empty function. Now there's even more code that appears to be unused. At &8908 we have this:

    BL &891C
    ADR R0, &8B50
    ADD R0, R0, #0
    CMP PC, #0
    B &8B40
    STMDB R13!, {R0, R14}
    LDMIA R13!, {R0, PC}

Yeah. It's basically the entire program all over again. Only this one isn't called. It, uh...does nothing.

Following this is the library startup that we come to from the branch at the very start of the program.

This stashes the value of R14 (on entry) to the word at &8B4C (I can forgive a pointless ADRL here as it's probably boilerplate code copied verbatim). R10 is then set to point to &8C34.
Next, we get the amount of memory from OS_GetEnv, setting up R13 to point to the limit (it's a full descending stack), and R11 to point 4096 bytes lower. This is probably being used as a stack limit. Should R13 <= R10 then we're out of stack!

Next we point R1 to the "ARM" strings mentioned earlier (specifically: &0,&C,"ARM",&0,&0,&0,&0,&C,"ARM",&0,&0,&0) ready to call the SWI XABCLib_Init2632.

It is interesting that, unlike compiled C programs, ABC programs do not attempt to load or version check the ABC module. It is purely up to the !Run file to do that. If the module is not present, the call will fail, and the program will abort with the error "ABCLib not loaded".
Also, it's worth noting that for all of the start-up rigamarole, ABC suffers the exact same problem as SharedCLibrary, in that it is possible to kill off the module with active programs running.

The error generation code is a bit weird. It is:

    ADR R0, &8B94
    CMP PC, #&80000000
    CMNVC PC, #&80000000
    SWI OS_GenerateError

The first line points to the error block, error &81A901 followed by the textual message "ABCLib not loaded". The next two lines ensure that the V flag is set. And finally we call the non-X form of OS_GenerateError to raise the error and pass it to the current error handler (which usually has the side effect of aborting the program unless a custom handler is in use).

Thing is, why are we setting the V flag?
OS_GenerateError is a veneer for the failed SWI return case. Literally the first thing it does is set the V flag. I NOPed out the two instructions to set V, killed off ABCLib, and ran the program. It behaved, as expected, exactly the same way.

Whatever R10 was on entry to the library initialisation, it is also something on exit, which is stored at &86E8-&400 (finally, an ADRL that does something!). The value is &8C34, which is four words from the end of the executable (and where the heap will be placed).
R9 is then set to R11 (the base of our stack) minus R10.
R8 is set to point to &8804-&800, which means it's that &FFFFFFCE word we saw at the start. R7 is then set to point to &86C0-&400 which appears to be some empty memory.
R0 and R1 are loaded from R8+0 and R8+4 respectively, meaning both will be that &FFFFFFCE word.

The next bit of code is weird. A TST of the value in R1 with &80000000, and if EQ we branch.
But it isn't, so we don't.
Next a TST of R1 with &FF, and if NE we branch.
We do.
A simple RSB of R1 with itself and 0 basically flips the value from &FFFFFFCE to &32 (like abs()). This, for some unknown reason, is then multiplied with the value in R9, to arrive at the value of &C7A7DD8. R0 is then set to &64 and we're then making our first jump into ABCLib.

This is with a 4MiB slot. It looks 'safe' up to about 40MiB. Going any further causes the high bit to be set, for example 42MiB results in R1 being &833A7DD8. This is obviously being treated as a signed value somewhere because a little bit further on, we are tossed out with a "Not enough memory" error.

I'm going to skip what happens next. It looks like ABCLib returns from that call with R0 set to &FFFFFFCE again, so the test is largely performed again based upon that, and some other stuff happens. I'm guessing there are some sorts of magic values going on here, and without access to the ABC source code, it's mostly going to be wild guesses.

All the manipulations done, we claim a heap based at &8C34 (following the "ABEX" string), four words from the end of the executable (it's okay, they're null words). With a 4MiB slot, the heap is calculated as being &1FF1E0 bytes long (just a shade under 2MiB). Two blocks are claimed by the runtime. The first is 88 bytes, and the second is 2048 bytes.
If we're using the {NEWHEAP} option, there's an extra call into the library just after the second heap claim.

It looks like the basic memory mapping of the program is like this:

[Memory map diagram]

We jump into the library two more times, and finally a branch to &88EC which is the start of our code.

Following this branch, four blocks of code calling OS_GenerateError for the following errors: Failed to initialise workspace, Not enough memory, ABCLib not loaded, and
Then there's the exit handler previously discussed, then the word where the entry R14 is stored, then the word that is our return code. This is followed by a word that is "END" with a null terminator. It appears that this is what is passed to OS_Exit as the error block pointer!

Then follow four error blocks that correspond to the four OS_GenerateError code blocks.

Finally, we have three "useful functions". The first looks like it strips flags from R14 (so 26 bit behaviour) to get an address, the contents of which are loaded into R0. R1 is set to 0. R14 is incremented to point at the next instruction before calling into the library. Looks like something that might be passed to Debugger_Disassemble?

The second routine sets R7-R1 (yes, backwards) to zero, then returns to caller. Perhaps used prior to setting up a SWI? If so, one might want to let them know that SYS accepts values in R8 and R9 these days.

The final routine uses three EOR instructions to swap the contents of R0 and R1, then calls into the library.

Finally, the "ABEX" word used by OS_Exit, and four null words that are replaced by the start of the heap.

Looking at a dump of the program post-execution, we can spot some other things.

Many of the words from &800C to &82D0 appear to be unused. I wonder if this is perhaps used as temporary workspace for, say, constructs like PRINT "Result is "+FNtest?

The words from &830C to &8348 are a base 2 table from 1 to &8000.

The words from &834C to &8388 are a series of values that is like the base 2 table minus one: 1, 3, 7, 15, 31... these are typically used as mask values, like masking a byte is (value AND 255).

The words from &838C to &83C8 are an inverse series of mask values starting from the second nibble.

The words from &83CC to &8650 are all LDR PC,&xxxx as they are the jump points into the ABC library module. Spanning 644 bytes at four bytes apiece, that makes about 161 different routines that could be called.
This is followed, &8654 to &88D8, by a list of addresses used by the jump points. It works in the same manner as SharedCLibrary in using LDR to pick up an address to push directly into PC.

Following that are those two "ARM" strings, and now we have a complete overview of how the program is put together.

Yes, that second routine (R7-R1 = 0) is used by SYS calls. I made a test program that made a SYS call, and it was assembled as follows:

    BL &8C1C
    MOV R3, #3
    MOV R2, #2
    MOV R1, #1
    MOV R0, #0
    SWI XOS_CRC
    BL &8934

It calls the function to blank all of the registers, sets up the ones it needs, then directly calls the SWI. The final BL is the one to an empty function. Then we would carry on to set R0 to point to the return code, compare PC with 0, and call the exit routine.

It is worth noting that ABC cannot cope with using registers R8 and R9 in SYS calls. So I hope you never wish to use the following with ABC:
Most are unlikely to turn up in ABC programs, but it does imply that you cannot call HAL functions, nor devices (such as USB devices), and you can't use dynamic areas either. You can't say "well, I can make DAs without names" because you don't actually know what value will be in R8. It might not even be a valid address.

I just tried a simple program to create a DA, and the name was "ÎÿÿÿÎÿÿÿ" because at that point, R8 was pointing at the two &FFFFFFCE words.

Looking at the executable of that, it seems that ABC can be quite literal in its behaviour. On return from the OS_DynamicArea call, I keep a copy of R1 (area number) and R3 (area base address) to report to the user.
Which leads to this:

    STR R1, [R12, #128] ; store DA number
    STR R3, [R12, #132] ; store DA base
    LDR R0, [R12, #128] ; pick up DA number for STR$

It looks like ABC uses the bare minimum of registers which is usually only R0 and R1 and sometimes R2, and it performs all of its behaviour in little chunks that load and save values from/to memory, without context as to what is happening elsewhere, otherwise an optimiser might have noticed that the LDR could be easily replaced by MOV R0, R1.

To put that comment into context, I wrote something similar in C. The first version (using a single printf command) noted that the printf was being called with R0 being a pointer to the string, R1 being the first parameter, and R2 being the second. So the entire output was to ADR the string to R0, and then move R3 into R2 (as R1 happened to be already correctly set up).

I split the printf into pieces to better reflect how BASIC behaves, and in this case the DDE compiler pushed R1 into R4 and R3 into R5. It then ADRed R0 to what was to be printed, retrieved into R1 one of the values in R4 or R5 if necessary, and then called the printf routine. Repeat until it is all printed out. Nothing is loaded nor stored as in this instance the necessary data can be entirely held in registers.

Well, what else to do on a drizzly day? ☺
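
As a cross-check on some of the figures quoted in the walkthrough, here is a small Python sketch that redoes the arithmetic with 32 bit wrap-around. It is not taken from ABC or ABCLib. It assumes the application is loaded at &8000 and that the memory limit from OS_GetEnv is exactly &8000 plus the slot size; neither is stated in the post. The function names are made up for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF
MIB = 1024 * 1024

APP_BASE = 0x8000      # ASSUMPTION: application load address
HEAP_BASE = 0x8C34     # R10, as reported in the post
STACK_SIZE = 4096      # R11 sits this far below R13, as reported in the post
MARKER = 0xFFFFFFCE    # the word loaded into R0 and R1


def rsb_zero(value):
    """RSB Rd, Rn, #0: compute 0 - value, truncated to 32 bits."""
    return (0 - value) & MASK32


def scaled_free_memory(slot_bytes):
    """Mimic the R1 value passed into ABCLib for a given slot size."""
    r13 = APP_BASE + slot_bytes   # ASSUMPTION: memory limit = base + slot
    r11 = r13 - STACK_SIZE        # base of the stack
    r9 = r11 - HEAP_BASE          # space between heap base and stack base
    return (rsb_zero(MARKER) * r9) & MASK32


def largest_safe_slot():
    """Largest slot (in bytes) whose product still has the top bit clear."""
    max_r9 = (2**31 - 1) // rsb_zero(MARKER)
    return max_r9 + HEAP_BASE + STACK_SIZE - APP_BASE


def marker_words_as_text():
    """Two little-endian &FFFFFFCE words read as a Latin-1 string."""
    return struct.pack("<II", MARKER, MARKER).decode("latin-1")


print(hex(rsb_zero(MARKER)))               # 0x32
print(hex(scaled_free_memory(4 * MIB)))    # 0xc7a7dd8
print(hex(scaled_free_memory(42 * MIB)))   # 0x833a7dd8
print(largest_safe_slot() / MIB)           # a little under 41
print(marker_words_as_text())              # ÎÿÿÿÎÿÿÿ
```

Under those assumptions the sketch reproduces the &C7A7DD8 and &833A7DD8 values, the "safe up to about 40MiB" observation, and the odd dynamic area name. One further observation, which is only a guess: dividing the 4MiB product by the &64 placed in R0 gives &1FF1E6, very close to the &1FF1E0 heap length, which would fit the &32 being a percentage (50 out of 100) of the free memory.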

John, 19th September 2020, 21:46
You got drizzle? I should be so lucky, so lucky…

David Pilling, 20th September 2020, 18:29
Mono-spaced in Courier comments. Have you done the usual futile thing of finding out who wrote ABC and asking for the source code? Guess there are a few BBC Basic compilers around.

Rick, 20th September 2020, 20:25
No reason the comments should look different. ;-)
One day, I might get around to writing some PHP that can take snippets of BASIC and output formatted colourised HTML that I can drop into my pages. One day...
There are three BASIC compilers that I know about.
The first, that is easily discounted, is Whizz. It is incomplete. I downloaded a copy ages ago and messed around with it to work on a 32 bit system (it is crunched BASIC). It can compile a number of simple programs, but the output code is *HORRIBLE*. In one case, I noticed it stacking all of the registers FIVE times in a row simply to set a register to 1 and then unstack them all again. Indeed, a program created by Whizz WILL spend the majority of its time stacking and unstacking everything.
This is acceptable in an early incomplete build of a compiler. But, sadly, that's all Whizz is. There's no source, it doesn't ever seem to have been continued, and...
The next compiler is RiscBasic by Silicon Vision. Everybody says it is "as good as BASIC" and supports everything correctly. It last got updated for the StrongARM and then it pretty much vanished into obscurity. I guess the SV guys gave up on RISC OS at that point?
Finally, we have ABC. The current BASIC compiler supplied with the DDE. It has numerous quirks and restrictions (for some people, anything regarding the use of LOCAL is a berserk button) and more importantly it only supports the RISC OS 3.10 dialect of BASIC.
Sources are NDA or something; ROOL has a copy but, like much of the DDE, they can't be made public.
I have asked. And I have been asking and dropping hints since... oh my god... over half a DECADE ago. To be honest, I've lost hope and lost interest.
Which is a shame, because when there's somebody (who already has the DDE) who wants to try to extend the features to cover some more "recent" (as in 1996!) versions of BASIC, such as COLOUR r,g,b and who is willing to do it with respect of the closed nature of the sources AND for free... it seems somewhat silly not to take them up on it.
But, alas, I guess it is somebody's pet and they don't want to let it go. It only recently (as in last year) gained support for understanding hex in lower case or mixed case ... something I noted way back when; but then I'd have been inclined to toss the string at OS_ReadUnsigned or whatever and get the OS to handle it, rather than trying to roll my own hex parser.

Steve Drain, 20th September 2020, 22:03
There is also sBASIC. It is very simple, written in sensible BASIC and outputs BASIC assembler. It was never developed.

Rick, 20th September 2020, 22:38
I've also just found Dan's Tiny BASIC Compiler.
It is extremely rudimentary, not terribly helpful (parser errors usually result in "Missing"), but I still have to commend it.
Why?
Because it's a command line program that looks like it was written for Arthur. Yet not only it but ALSO ITS OUTPUT work exactly as they should on a 32 bit Pi!
(I haven't tested everything, but what I've tried so far has worked)

Steve Drain, 21st September 2020, 11:56
Just some details for SBASIC:
By Barry Wickett in Acorn User May 1992

Rick, 21st September 2020, 14:10
Just been in touch with Darren regarding Whizz. He no longer has a copy. Something else lost to time. :-(
AU in May 92? I think that's post Yellow Pages isn't it? So I'd need both the magazine and an image of the cover disc. The magazine I can sort - 8bs has scans of them...

Rick, 21st September 2020, 14:17
Nope, the listings are all there. Got a copy from http://8bs.com/aumags.htm, I'll toss the relevant pages to the laser when I get home... ;-)

Steve Drain, 21st September 2020, 16:40
I have an image of the disc, or just the app, if you want it.

Rick, 21st September 2020, 19:46
Sure, a copy of the app would save typing it all in. ;-)
Whoa, that takes me back. Printed out the stuff, plus the reviews of Hearsay 2 and the PC card (I have one of those somewhere) and kind of got stuck reading through all the stuff in the WE adverts and their infamous stripey green/black edged pages. Just like I did countless times way back when...

© 2020 Rick Murray
This web page is licenced for your personal, private, non-commercial use only. No automated processing by advertising systems is permitted. RIPA notice: No consent is given for interception of page transmission.