mailto: blog -at- heyrick -dot- eu



BASIC is weird!

Last updated 2019/06/16.

Every so often, the topic of compiling BASIC arises in the ROOL forums. So does the question of whether BASIC has a formal specification that could be used to write tools to detect programming errors and the like.

The basic answer (see what I did there?) is No. To both. BASIC is inherently uncompilable, and it has no formal specification. The only specification that exists is that which is printed in the various BASIC manuals, but they only describe what the language is, what it does, and its keywords. This is far from a proper grammar of the language.

But, wait, if BASIC cannot be compiled, how come RiscBasic and ABC exist? Well, because the majority of BASIC can be compiled, so long as one ignores the EVAL command and the various edge cases.

Edge cases? Yup. Because BASIC is interpreted, it really only concerns itself with the line of code that it is currently dealing with. For this reason, there are a number of... shall we say... freaky things that are possible.

This page is intended to document some of these freaky things, so the next time somebody talks about writing a BASIC compiler, one can just point to this page.

It isn't exhaustive, but if you happen to know of other good BASIC weirdness, please send me examples by email. Don't write them in the comments below (it'll mess up formatting). Just mail them to me and I'll periodically update this page. Thanks.

 

Note that none of these examples is likely to be found in actual programs... we hope! But they are accepted as valid by the BASIC interpreter so, as much as they may disturb the karma, we have little choice but to accept them as being legitimate BASIC code.

 

What sort of variable does a function return?

BASIC supports two types of named routine: procedures (PROC), which don't return a value, and functions (FN), which do.

REM >FNreturntype

PRINT FNgivemeanumber("int")
PRINT FNgivemeanumber("float")
PRINT FNgivemeanumber("string")

PRINT FNgivemeanumber("int") + FNgivemeanumber("int")
PRINT FNgivemeanumber("float") + FNgivemeanumber("float")
PRINT FNgivemeanumber("string") + FNgivemeanumber("string")

END

:

DEFFNgivemeanumber(type$)
  CASE type$ OF
    WHEN "int"    : = 123
    WHEN "float"  : = 1.23
    WHEN "string" : = "123"
  ENDCASE
=0

Which, when run, results in this:

>RUN
       123
      1.23
123
       246
      2.46
123123
>

WHY? One doesn't specify a function exit variable type until the end of the function is reached. Since it's an interpreted language, there's nothing to say that different code paths might not result in different variable types being offered on exit.
By contrast, a language such as C or Pascal contains a fixed exit type in its declaration.

ABC? Cannot compile this, it reports Incompatible types for function when it gets to the string return.

 

Function/procedure entry points

Some programmers believe that functions should have exactly one exit point. I think that can lead to a mass of inefficient conditional blocks, so I believe that a function should exit as soon as possible.
Nobody makes up rules about functions only having one entry point, because that's nonsense. Functions only ever have one entry point...
...don't they?

REM >PROCentrypoint

A% = 0 : PROCaddone   : PRINT A%
A% = 0 : PROCaddtwo   : PRINT A%
A% = 0 : PROCaddthree : PRINT A%
END

DEFPROCaddthree
A% += 1
DEFPROCaddtwo
A% += 1
DEFPROCaddone
A% += 1
ENDPROC

Which, when run, results in this:

>RUN
         1
         2
         3
>

WHY? When the program is being executed, BASIC actually ignores the DEF commands. They are scanned as the program is loaded so BASIC knows where to look for the functions and procedures, but after that...

ABC? Compiles this incorrectly. It reports Unclosed PROC/FN body (warning) and implicitly ends the procedure when encountering the next DEF, meaning that each procedure only does a single += 1. The behaviour therefore differs from the interpreter (though it is understandable why).

 

Function calls as parameters, and variable scope

This is based upon something I remember nemo pointing out: that RETURN parameters, and using functions as parameters to other routines, can have rather weird effects.

REM >VarScope1

PRINT "This does what is expected (lowercase, then uppercase):"
A$ = "UPPER"
PROCdisplay(FNlower(A$))
PROCdisplay(A$)

PRINT '"This...does not:"
A$ = "UPPER"
B$ = FNlower(A$)
PROCdisplay(B$)
PROCdisplay(A$)

END

:

DEFFNlower(RETURN A$)
  LOCAL loop%
  FOR loop% = 1 TO LEN(A$)
    MID$(A$, loop%, 1) = CHR$( ASC(MID$(A$, loop%, 1)) OR 32 )
  NEXT
=A$

DEFPROCdisplay(A$)
  PRINT A$
ENDPROC

On the face of it, these are two ways to display the string "UPPER", first in lower case, and then in upper case. But since it's using RETURN, the first one is going to be faulty and the second one, using a second variable (B$) will be fine, right?
Nope!

>RUN
This does what is expected (lowercase, then uppercase):
upper
UPPER

This...does not:
upper
upper
>

WHY? The reason the first one doesn't go wrong is down to some peculiar rules about the scope of each value at the time of the function call. It appears as if the scope of A$ in the first example is local to the calling function (as it's a parameter) even though it seems to contain the global value, so updating it by RETURN doesn't corrupt the global value.
The second version doesn't contain the FNlower call as a function parameter, so the RETURN updates the global value of A$.

ABC? Has 'issues' with local variable scope, so it compiles the program, which then outputs everything in lower case, so behaviour differs from the interpreter.

 

More on variable scope

Cast your eyes over this:
REM >VarScope2

PRINT "Loop one"
REPEAT
  PROCzero
UNTIL z% = 0

PRINT "Loop two"
REPEAT
  PROCzero
UNTIL Z% = 0

END

DEFPROCzero
  LOCAL z%, Z%
  z% = 0 : Z% = 0
ENDPROC

When run, this happens:

>RUN
Loop one
Loop two
(hangs, press ESC to abort)

WHY? This one is interesting. If the zero procedure didn't exist, BASIC would abort with an "Unknown or missing variable" error at the UNTIL z% = 0 test.
However, since that value was declared with local scope, it shouldn't exist outside of the scope of the function. Only it does, it's a known variable name to BASIC and it has been created with the initial value of zero. So the first loop terminates.
The second loop uses the resident integer variable Z%. If you look using LVAR, you'll see that on entry to BASIC the resident integers (A% to Z%) are randomly defined. This is an issue particularly for smart BASIC compressors that rename common variables to the same names as the resident ones, which can have side effects if one is expecting a variable to begin with a value of zero.
It gets better. If you LVAR after ESCaping out of the program, you may find that the global variable Z% has been mysteriously set to zero, so running the program a second time would work (observed on my Pi2 using v1.64 from March 2017). Why is the assignment of a local scope variable affecting the global one?
[if this doesn't work for you, try again - if you ESCape within the PROCedure, it happens]

Note, incidentally, that CLEAR doesn't reset the values of the resident integers to zero...

ABC? Cannot compile this, it reports z% is not defined, which is - frankly - a logical response.

 

DIM and EVAL put to the test

This begins with my code to build an array and put data into it. It is followed by some code by Sophie Wilson (posted on the ROOL forum by Steve Drain) that will dump the contents of the array.

REM >ArrayDump

REM Create an array
DIM A(3,3,3)

REM Populate it
FOR l1% = 0 TO 3
  FOR l2% = 0 TO 3
    FOR l3% = 0 TO 3
      A(l1%, l2%, l3%) = l1% * l2% * l3%
    NEXT
  NEXT
NEXT

REM Now for some scary Sophie Wilson code
REM to dump the contents of the array.

string$="A("
FOR I = 1 TO DIM( A() )
  string$ += "a" + STR$I + ","
NEXT
RIGHT$(string$, 1) = ")"

FOR a1 = 0 TO DIM(A(), 1)
  IF DIM( A() ) > 1 FOR a2 = 0 TO DIM(A(), 2)
  IF DIM( A() ) > 2 FOR a3 = 0 TO DIM(A(), 3)

  PRINT EVAL string$;

  IF DIM( A() ) > 2 NEXT
  IF DIM( A() ) > 1 NEXT
NEXT

This isn't so much weird, as an example of some BASIC that would give a compiler a nervous breakdown. The general solution is "don't support EVAL", but I wonder how many would support the use of DIM and the loop behaviour in this example?

I'm not going to show the results, it's just a list of the array contents.

But it gets better. The clever code will happily work with a one, two, or three dimensional array. Edit (or delete) the population code at the top, and...

DIM A(14)
DIM A(7,2)
DIM A(77,12,9)
and anything else you want with one to three dimensions will work.

WHY? Because DIM used as a function reports the dimensions of an array. If we were to DIM A%(14,3), then PRINT DIM(A%()) would report 2 (the array has two dimensions), PRINT DIM(A%(), 1) would report 14 (the highest subscript of the first dimension), and PRINT DIM(A%(), 2) would report 3 (the highest subscript of the second dimension). So we can programmatically determine that A%() was defined as a two dimensional array with 14 and 3 as the limits of each dimension. Couple that with EVAL to fetch each value, and some complicated loop code to step through every array element in the smallest amount of code, and here you have something that I'd be impressed if a BASIC compiler could handle.
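
For anybody who wants to try the DIM queries on their own, here is a minimal sketch. It only uses the array from the paragraph above, nothing else is assumed.

```bbcbasic
REM >DimQuery
REM Sketch: ask BASIC about the shape of an array at run time.

DIM A%(14,3)

PRINT DIM(A%())     : REM number of dimensions, here 2
PRINT DIM(A%(), 1)  : REM highest subscript of dimension 1, here 14
PRINT DIM(A%(), 2)  : REM highest subscript of dimension 2, here 3
END
```

Remember that subscripts start at zero, so this array actually holds 15 by 4 values. That is why the dump code loops from 0 TO DIM(A(), n).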

ABC? Cannot compile this. It reports Bad factor type when it encounters TO DIM ( A() )...

 

Which NEXT matches which FOR?

No example, just look at the clever coding just above again, and try to work out how to determine which NEXT is associated with which FOR given that most of them are conditional!

 

Unusual three-way test

Imagine, it's a pH test. 1 is hard acid, 14 is hard base (alkaline). Acceptable is from 6 to 8, with 7 being pH neutral.
We can report on this using a slightly unusual CASE construct:

REM >pHtest

INPUT "Enter pH (1-14) : ";A%

IF ( (A% < 1) OR (A% > 14) ) THEN ERROR 1, "1 to 14 please!"

CASE TRUE OF
  WHEN A% < 6 : PRINT "Too acidic"
  WHEN A% > 8 : PRINT "Too alkaline"
  OTHERWISE   : PRINT "Acceptable"
ENDCASE
END

Here's a run:

>RUN
Enter pH (1-14) : ?1
Too acidic
>RUN
Enter pH (1-14) : ?7
Acceptable
>RUN
Enter pH (1-14) : ?12
Too alkaline
>

Why? The magic here is in understanding that both parts of CASE (the beginning of the construct and the WHEN lines) accept expressions. It's quite normal to do a CASE like this:

  CASE pollcode% OF
    WHEN  0 : REM Null poll
    ...
    WHEN 17 : REM User Message
    WHEN 18 : REM User Message Recorded
in which case the expression for each WHEN is checked against the entry value (pollcode%) to see if it matches (expression evaluates to TRUE).
So what's going on here is we're setting the construct entry value to TRUE, and then providing expressions in the WHEN part. The first is to see if A% is less than six. If it is, the expression is TRUE and this matches the entry value, so it's the selected response. If not, the next WHEN is tried in the same way, and OTHERWISE catches whatever is left.

ABC? No problems.

 

It's said that CASE is not particularly fast, so an alternative might be an IF construct. However, one might wonder if CASE holds an advantage as the number of checks increases. As for speed, here's the code:

REM >pHtest

PROCtest(2)
PROCtest(7)
PROCtest(12)
END

DEFPROCtest(A%)
  PRINT "Testing evaluation speed with pH value of ";A%
  T% = TIME : REPEAT : UNTIL (T% <> TIME)
  T% = TIME
  FOR l% = 1 TO 1000000
    CASE TRUE OF
      WHEN A% < 6 : REM Acid
      WHEN A% > 8 : REM Base
      OTHERWISE   : REM Okay
    ENDCASE
  NEXT
  PRINT "Case took ";TIME - T%
  :
  T% = TIME : REPEAT : UNTIL (T% <> TIME)
  T% = TIME
  FOR l% = 1 TO 1000000
    IF A% < 6 THEN
      REM "Acid"
    ELSE
      IF A% > 8 THEN
        REM Base
      ELSE
        REM Good
      ENDIF
    ENDIF
  NEXT
  PRINT "Nested IF took ";TIME - T%
ENDPROC

Running that (in the shell, not a TaskWindow, ARMv7 Pi2) shows that the actual speed depends upon the work done:

>RUN
Testing evaluation speed with pH value of 2
Case took 110
Nested IF took 97
Testing evaluation speed with pH value of 7
Case took 137
Nested IF took 154
Testing evaluation speed with pH value of 12
Case took 139
Nested IF took 160
>

ABC took 10cs for every test on a Pi2 in the command line and TaskWindow, with the exception of the first IF test (pH value 2) which took 9cs.

 

Not all tokenised code is alike

If we ignore the language differences and just assume the common subset of BASIC commands, then one thing that you may encounter is that tokenised programs from BBC BASIC for Windows are actually somewhat different to the official (defined as "written by Sophie Wilson") versions of BASIC.

The official programs have lines in the following format:

  <CR>  <LineHi>  <LineLo>  <Length>  <Tokenised code...>
and the end of the program is:
  <CR>  <&FF>

However BBfW (and Z80/x86 versions of BASIC by the same author) use:

  <Length>  <LineLo>  <LineHi>  <Tokenised code...>  <CR>
and the end of the program is:
  <&00>  <&FF>  <&FF>

There are also differences in the tokenisation of extended commands, that is to say anything later than the BBC MOS versions of BASIC. It's worth keeping this in mind just in case...
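
As a rough sketch, the two end markers described above give a cheap way to guess which layout a file uses. The filename "MyProg" and the variable names are placeholders of mine, and the test is only a heuristic: a file with anything tacked on after its end marker will fool it.

```bbcbasic
REM >WhichFormat
REM Sketch only: guess the layout of a tokenised BASIC file
REM from its last two bytes. "MyProg" is a placeholder name.

file$ = "MyProg"
f% = OPENIN(file$)
IF f% = 0 THEN ERROR 1, "Cannot open " + file$
IF EXT#f% < 2 THEN CLOSE#f% : ERROR 1, "File too short"

REM Read the final two bytes of the file
PTR#f% = EXT#f% - 2
a% = BGET#f%
b% = BGET#f%
CLOSE#f%

CASE TRUE OF
  WHEN (a% = 13) AND (b% = &FF)  : PRINT "Ends CR, &FF: Wilson layout"
  WHEN (a% = &FF) AND (b% = &FF) : PRINT "Ends &FF, &FF: Russell layout"
  OTHERWISE                      : PRINT "Unrecognised ending"
ENDCASE
END
```

Looking at the end rather than the start is deliberate. A Wilson file always begins with CR (13), but a Russell file begins with a length byte, which could perfectly well be 13 too.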

Why? It would make sense to say it's to do with the "endianness" of the processor (that is to say, whether a 16 bit value has the high byte first or the low byte first), however the 6502, Z80, and 8086 are all little endian machines. Maybe it's simply "what worked best for the system at the time of writing?".

In the comments below, J.G.Harston suggests that the Russell versions of BASIC use NEW to zero the length, and OLD to unzero it; while the Wilson BASIC sets LineHi to &FF on NEW, and zeroes it on OLD; the side effect being that if the first line number is greater than 255, it is mangled.
This is easy to test, and... yes...

>NEW
>2000 A% = 10
>2001 PRINT A%
>LIST
 2000 A% = 10
 2001 PRINT A%
>NEW
>OLD
>LIST
  208 A% = 10
 2001 PRINT A%
>
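
The 208 is simply what is left of 2000 once its high byte has been zeroed. A quick check, as a sketch (line% is just an illustrative name):

```bbcbasic
line% = 2000
PRINT ~line%         : REM hex 7D0, so LineHi is &07 and LineLo is &D0
PRINT line% AND &FF  : REM 208, the value once LineHi has been zeroed
```

Only the first line suffers because only the first line's LineHi is touched by NEW and OLD. Line 2001 still has its &07 intact.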

 

No short-circuit evaluation

When you're evaluating a clause, many languages support short-circuit evaluation. That means that if the first expression in an AND is FALSE, then it won't bother to evaluate the rest, as the whole expression is now known not to be TRUE.

In BASIC, all parts of an expression are evaluated. Which means that code such as this:

IF (var% <> 0) AND (!var% <> 123) THEN PROCdosomething
would have appeared to have worked on older versions of RISC OS; however since both parts of that expression are always executed, what do you think happens if var% is in fact zero? Well, it's a zero page access. Which on recent versions of RISC OS will either throw an entry into the ZeroPain log, or will cause the program to crash with an exception.

While this is not a BASIC bug, exactly, it is worth mentioning because there may be programs out there that expect this behaviour of BASIC, and if a compiler tries to apply a C-like expression evaluation, the results may be subtly wrong.

Why? BASIC isn't C, that's why. :-)
Incidentally, it can be written to short-circuit using this fugly construct:

IF (var% <> 0) THEN IF (!var% <> 123) THEN PROCdosomething
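
To watch both halves of an AND being evaluated without poking at page zero, here is a small sketch. FNleft and FNright are made-up names that exist purely to announce when they are called.

```bbcbasic
REM >NoShortCircuit
REM Sketch: both sides of the AND run, even though the left is FALSE.

IF FNleft AND FNright THEN PRINT "Both TRUE"
END

DEFFNleft
  PRINT "Left evaluated"
=FALSE

DEFFNright
  PRINT "Right evaluated"
=TRUE
```

Both messages appear, and "Both TRUE" does not. A compiler that quietly skipped FNright would change the behaviour of any program relying on its side effects.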

 

Matching If Then Else

(suggested by Rob - thanks for this)

Which ELSE matches which IF THEN?

Consider this:

REM >MatchIfThenElse
REM (suggestion by Rob)

A% = 0
B% = 0
C% = 0

IF A% = 1 THEN IF B% = 1 THEN C% = 5 ELSE C% = 10
PRINT C%

You'd look at that and think that, with A% being zero, C% stays zero, right?

>RUN
        10
>

Why? Because when the IF clause evaluates to FALSE, BASIC will scan along the line looking for an ELSE token. So what BASIC actually sees here is:

IF A% = 1 THEN blah blah blah ELSE C% = 10
It is worth noting that the ELSE is used by both IF clauses. If A% is not 1, then ELSE C% = 10. Also, if A% is 1, but B% is not 1, then ELSE C% = 10.
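
Spelled out as multi-line IFs, the interpreter's behaviour looks like the sketch below. This is my own expansion rather than anything from the manual, but it removes the reliance on how the ELSE is matched.

```bbcbasic
REM >MatchIfThenElse2
REM Sketch: what the single-line version does, written out in full.

A% = 0
B% = 0
C% = 0

IF A% = 1 THEN
  IF B% = 1 THEN
    C% = 5
  ELSE
    C% = 10
  ENDIF
ELSE
  C% = 10
ENDIF
PRINT C%
```

This prints 10, just as the one-liner does under the interpreter. Since each ELSE is now explicitly paired with its IF, there is nothing left for a compiler to interpret differently.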

ABC? ABC gets this wrong and reports C% as being zero, and there's no warning output either.
The instruction manual says that one shouldn't use a dangling ELSE, but that looks pretty much like an excuse to implement logical left-to-right evaluation, which isn't what BASIC does. We can see what ABC is doing by looking at the disassembly of the compiled program:

  LDR   R0, [R12, #20]    ; load value of A%
  EORS  R0, R0, #1        ; check to see if it is '1'
  BNE   report            ; if it is NOT, go to reporting the value
  LDR   R0, [R12, #24]    ; load value of B%
  EORS  R0, R0, #1        ; check to see if it is '1'
  BNE   setCotherwise     ; if it is NOT, go to the OTHER value of C%
  MOV   R0,#5             ; it was, so set the value to 5
  STR   R0, [R12, #28]    ; and write that to C%
  B     report            ; then jump to report it

setCotherwise             ; the other value that C% could be
  MOV   R0, #&0A          ; set value to 10
  STR   R0, [R12, #28]    ; and write it to C%

report
  MOV   R1, #&0A          ; format specifier?
  MOV   R2, #0            ; ?
  LDR   R0, [R12, #28]    ; pick up value of C%
  BL    print             ; call an ABCLib routine to print the variable
  SWI   XOS_NewLine       ; and end with a newline

 

And...?

More? Certainly. Email me! :-)

 

 

Your comments:

Stuart Painting, 16th June 2019, 00:54
That "fugly construct" - as you called it - appears in approximately 80% of my BASIC programs. It was a speed-up I discovered several years before Acorn existed, never mind the BBC Micro...

Rick, 16th June 2019, 01:25
It's acceptable from that era, there was no such thing as a multi line IF.

Rick, 16th June 2019, 01:28
If you want really horrific code, try those eight bit era books, like the Osborne "write your own game" sort of thing. They had to come up with code that would work on as many machines as possible with the fewest number of changes, and given most BASICs were inferior, it's a maze of GOTO and GOSUB...

J.G.Harston, 16th June 2019, 02:50
Richard Russell has said he used <len> <lo> <hi> so that NEW could zero the <len> and OLD would un-zero it, preserving the line number. Wilson-format BASIC implements NEW by setting <hi> to &FF and OLD sets <hi> to &00, so if the first line of your program is higher than 255 the number is mangled down MOD 256.

Gavin Wraith, 17th June 2019, 21:05
Another weirdness is pointless inconsistency of syntax. Thus A$=GET$ but not GET$ A$ as opposed to READ A$ but not A$=READ. Just another wrinkle to make life harder for the programmer. Maybe Dijkstra was right!

Rick, 17th June 2019, 22:35
That's perhaps because GET$ assigns a string value from that which is entered at the keyboard (or from a file if GET$#), while READ loads data from a DATA statement. They may sound like the same sort of thing, until you realise that you can READ A$, B%, C$.
The standard C library, incidentally, is a bit messy as to whether the file pointer is the first or last parameter. There's probably some logic, but fifty odd years later, it's just an additional annoyance.

David Pilling, 18th June 2019, 22:59
Does tokenisation count as compilation? I wonder why Sophie used eval to print out those arrays. Rather than a recursive function. I wrote simple BBC Basic compilers (BBC Micro) and it was usually not worth the effort because all the time went in things like float routines which went the same speed regardless. To put it another way interpretation time was small compared to calculation time. Would have been interesting to see what could be done with modern compilation techniques. Modern languages also have things that are not easy to compile. I regret that people didn't just leave Basic as a simple language for beginners.

David Pilling, 19th June 2019, 13:24
ABC seems to offer a big speed up for the case statement - wonder if it is just optimising - since the code does nothing, not executing it. The days when one could have an empty FOR loop as a test of speed are gone. As to using EVAL, presumably the code has to deal with A(1), A(1,2), A(1,2,3) and that is why one uses EVAL. Although there are still special cases in the code like IF DIM( A() ) > 2. Maybe code can be improved, but yep, stuck with EVAL for the general case, or a lot of typing out each case.
C was always advertised as efficient because one can only have constants in a switch (case) statement.

Gavin Wraith, 19th June 2019, 13:55
In assembler there are many circumstances where a jump table is more efficient than a long cascade of binary switches. Unfortunately computed GOSUB is all that BASIC offers in this direction. Faced with literal, rather than computed, comparison values in a case statement, a good C compiler can calculate an optimal perfect hash. Does ABC do this?

Steve Drain, 19th June 2019, 20:10
Do not forget ON PROC is a sort of jump table.
