---
title: "Turbo-Basic XL and Atari BASIC parser tool"
author: https://github.com/dmsc/tbxl-parser
book: true
classoption: [oneside]
titlepage: true
titlepage-text-color: "FFFFFF"
titlepage-rule-color: "FFFFFF"
titlepage-rule-height: 0
titlepage-background: "expr.pdf"
...
This program parses and tokenizes a Turbo-Basic XL or Atari BASIC listing in a flexible format and produces any of three outputs:
- A tokenized binary file, directly loadable in the original Turbo-Basic XL interpreter (or in Atari BASIC if the -A option is given). By default this mode replaces variable names with single letters; the -f option writes the full variable names instead, and the -x option writes empty variable names, so the program can't be listed or edited.
  This is the default operating mode, and it can also be forced with the -b command line switch.
- A minimized listing, replacing variable names with single letters, using abbreviations, removing spaces and using Atari end of lines.
  This mode is selected with the -s command line switch. Adding the -f option keeps the names of variables with two or fewer characters.
- A pretty printed expanded listing, with one statement per line and indentation, and standard ASCII line endings.
  Note that this format can be read back again, but some statements are transformed in the process, which can lead to problems with non-standard IF/THEN constructs. Currently, IF/THEN with statements after the THEN are converted to multi-line IF/ENDIF statements.
  This mode is selected with the -l command line switch.
The following is an example of a simple program in free form:
' Example program
' One statement per line:
print "Hello All"
print "---------"
print "This is a heart: \00"
' Also, multiple statements per line:
for counter = 0 to 10 : ? "Iter: "; counter : next counter
' Line numbers
30
' And abbreviations:
g. 30
To generate a tokenized BAS file, loadable by Turbo-Basic XL, simply type:
basicParser samples/sample-1.txt
This will generate a sample-1.bas file in the same folder.
If, on the other hand, you want a minimized listing file in ATASCII format (suitable for ENTER into Atari BASIC), type:
basicParser -s -A samples/sample-1.txt
This will generate a sample-1.lst file in the same folder.
There are more sample programs in the samples folder that illustrate the free-form input format.
The parser accepts standard listings for Atari BASIC or Turbo-Basic XL programs, with Atari or ASCII end of lines.
All the standard abbreviations available in the original interpreters are also accepted.
As with Turbo-Basic XL, the input is case insensitive (uppercase, lowercase and mixed case are all supported).
You can omit line numbers; only lines that are the target of a GOTO, GOSUB or THEN need them. If you use only labels, no line numbers are needed.
Also, line numbers can appear alone on a line, for better readability.
Comments can be started with ' in addition to the Turbo-Basic XL ., -- or rem. In the short listing and tokenized output formats, all comments are removed unless the -k option is given.
All comment types are supported in Atari BASIC mode.
Inside strings, special characters can be specified by using a backslash followed by a hexadecimal number in upper-case (e.g., "\00\A0" produces a string with a "heart" and an inverse space, "♥█"). This allows editing special characters in any standard editor.
Note that to force a literal backslash before a valid hex number, you can use two backslashes (e.g., "123\\456" produces 123\456).
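The escape rules above can be illustrated with a short Python sketch. It is not the parser's own code: the function name `decode_escapes` is hypothetical, and it assumes that an escape is exactly two uppercase hexadecimal digits and that every other character maps to a single byte.

```python
import re

# Assumption: an escape is a backslash followed by exactly two uppercase
# hexadecimal digits; two backslashes stand for one literal backslash.
_ESCAPE = re.compile(r"\\\\|\\([0-9A-F]{2})")


def decode_escapes(text: str) -> bytes:
    """Convert the body of a source string into raw bytes (illustrative only)."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Copy the plain characters that precede this escape unchanged.
        out += text[pos:match.start()].encode("latin-1")
        if match.group(1) is None:
            out.append(ord("\\"))                 # "\\" gives one backslash
        else:
            out.append(int(match.group(1), 16))   # "\A0" gives byte 0xA0
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)
```

A backslash that is not followed by two uppercase hex digits (for example `\0g`) is left in place by this sketch.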
There is support for extended strings, with embedded character names.
Extended strings start with [" and end with "], and can contain:
- Special characters with {name} or {count*name}, with count a decimal number and name from the list: heart, rbranch, rline, tlcorner, lbranch, blcorner, udiag, ddiag, rtriangle, brblock, ltriangle, trblock, tlblock, tline, bline, blblock, clubs, brcorner, hline, cross, ball, bbar, lline, bbranch, tbranch, lbar, trcorner, esc, up, down, left, right, diamond, spade, vline, clr, del, ins, tbar, rbar, eol, bell.
- Inverse video characters surrounded by ~.
- Multiple lines: you can terminate the string on a different line than the one where it starts. Note that this embeds end-of-line characters in the string, so it only works in tokenized output, not in short-listing output.
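As a rough illustration of how the body of an extended string could be expanded, here is a Python sketch. It is an assumption-laden toy, not the parser's implementation: `expand_extended` and `NAMED_CHARS` are hypothetical names, the table is deliberately partial (only `heart`, whose code 0 follows from the "\00" example), inverse video is modelled as setting bit 7 (consistent with "\A0" being an inverse space), and named characters are emitted unchanged even between ~ markers.

```python
import re

# Deliberately partial table: only "heart" is filled in (code 0, as in the
# "\00" example). The remaining names are left out rather than guessed.
NAMED_CHARS = {"heart": 0x00}

# Matches {name}, {count*name} or a single ~ marker.
_TOKEN = re.compile(r"\{(?:(\d+)\*)?([a-z]+)\}|~")


def expand_extended(body: str) -> bytes:
    """Expand the text between [" and "] into bytes (illustrative only)."""
    out = bytearray()
    inverse = False
    pos = 0
    for match in _TOKEN.finditer(body):
        # Plain text before the token; bit 7 is set while inside ~...~.
        for code in body[pos:match.start()].encode("latin-1"):
            out.append(code | 0x80 if inverse else code)
        if match.group(0) == "~":
            inverse = not inverse                  # toggle inverse video
        else:
            count = int(match.group(1) or 1)
            out.extend([NAMED_CHARS[match.group(2)]] * count)
        pos = match.end()
    for code in body[pos:].encode("latin-1"):
        out.append(code | 0x80 if inverse else code)
    return bytes(out)
```

With this sketch, `expand_extended("{3*heart}~ ~A")` yields three heart bytes, an inverse space and a plain "A". A name missing from the partial table raises `KeyError`.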
Arguments follow the PROC label after a comma, and local variables follow after a semicolon:
D = 3
EXEC Testing, D+5, "Hello"
PRINT D
PROC Testing, A, B$(10); D
D = A + 1
PRINT D; " and "; B$
ENDPROC
As the example shows, string variables must include the dimensioned length, as the parser adds a DIM at the start of the program to initialize them. The dimensioned length must be an integer, a $define or a % number.
Also, setting the value of variable "D" inside the procedure does not alter the value of the variable "D" outside the procedure.
The parser transforms this construct by creating new variables that hold the parameters and local variables, so the resulting procedures don't support recursion.
Some of the extra statements from Turbo-Basic XL are supported even in Atari BASIC output mode; they are converted to equivalent forms:
- Multi-line IF/ENDIF statements are converted to IF/THEN.
- The %0 to %3 tokens are converted to the numbers 0 to 3.
- PUT without I/O channel is converted to PUT #16. This relies on a bug in Atari BASIC that makes I/O channel 16 equal to 0.
- String constants are converted to decimal constants.
There are also parsing directives, which consist of lines starting with a dollar sign $. The available directives are documented below.
basicParser [options] [-o output] filenames
Options:
- -n num: Sets the maximum line length before splitting lines to num. Note that if a single statement is longer than this, the line is output anyway. The default is 120 characters (the standard Atari Editor limit).
- -l: Output a long (readable) listing, suitable for editing, with standard end of lines and lowercase statements.
- -s: Output a short, minimized listing, with ATASCII end of lines. The default output file name is the same as the input with the .lst extension added.
- -b: Output a binary tokenized file instead of a listing. The default output file name is the same as the input with the .bas extension added. Note that this is the default behaviour.
- -A: Accept (and produce) standard Atari BASIC language, without the extended statements and syntax. Note that some of the optimizations are specific to Turbo-Basic XL and won't run in this mode.
- -x: In binary output mode, writes null variable names, making the program unlistable. This option does nothing in listing output.
- -f: In binary output mode, writes the full variable names, which eases debugging the program. In short listing mode, keeps the names of variables with fewer than two characters, renaming all longer or invalid names.
- -k: In binary output mode, keeps comments in the output. Note that only standard comments are included, not new style (') comments.
- -a: In long output, replace Atari characters in comments with approximating characters.
- -v: Shows more parsing information, like the names of renamed variables (verbose mode).
- -q: Don't show any parsing output, only errors (quiet mode).
- -o: Sets the output file name. By default, the output is the name of the input with the .lst (listing) or .bas (tokenized) extension. If the given name starts with a dot, it is used as the output file name extension.
- -c: Output to standard output instead of a file.
- -O: Enables parser optimizations to produce smaller or faster code. Without an argument, it enables all optimizations; an argument can be given, similar to the optimize directive in the code (see below for the possible options). The option can be specified multiple times; an example for producing short listings is -O -O -convert_percent -O -const_replace
- -h: Shows help and exits.
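The default output name rule described for -s, -b and -o can be sketched in Python as follows. This is only an illustration under stated assumptions: `default_output_name` is a hypothetical helper, and it assumes that the new extension replaces the input's extension, as the sample-1.txt to sample-1.bas example suggests.

```python
from pathlib import PurePosixPath
from typing import Optional


def default_output_name(input_name: str, listing: bool,
                        output: Optional[str] = None) -> str:
    """Pick an output file name (illustrative sketch, not the tool's code)."""
    src = PurePosixPath(input_name)
    if output is None:
        # No -o given: use .lst for listings and .bas for tokenized output.
        return str(src.with_suffix(".lst" if listing else ".bas"))
    if output.startswith("."):
        # An -o value starting with a dot is treated as an extension only.
        return str(src.with_suffix(output))
    return output
```

For example, `default_output_name("samples/sample-1.txt", listing=False)` gives `samples/sample-1.bas`, and passing `output=".atb"` gives `samples/sample-1.atb`.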
Directives add extra features to the parser, much like preprocessor directives in C and C++. Directives start with a dollar sign as the first non-blank character on a line, and continue up to the end of the line.
Below is a description of the available directives.
The options directive alters the way the parsing is done, accepting a list of comma separated options, valid for the current file. Valid options:
- mode=compatible: Disables features to be more compatible with the Turbo-Basic XL parser.
- mode=extended: Makes the parser accept more extended features.
- mode=default: Returns the parser to the default mode.
- optimize or +optimize: Allows the parser to optimize the output to produce smaller or faster code.
- -optimize: Disables the optimizations.
- optimize=+suboption: Enables the particular optimization option.
- optimize=-suboption: Disables the particular optimization option.
The optimization sub-options are:
- const_folding: Replace operations on constants with the result.
- convert_percent: Replace small integers with the %* equivalent. This is only available in Turbo-Basic XL mode.
- commute: Swap arguments to binary operations to minimize runtime.
- line_numbers: Remove all BASIC line numbers that are unused.
- const_replace: Replace repeated constant values (numeric or string) with a variable initialized to the value. The initialization code is added before any statement in the program, and tries to use the minimum number of bytes possible.
- fixed_vars: This is the complement of the const_replace option; it tries to identify variables with a fixed value in the whole program and removes the variable. Use this optimization when converting original BASIC listings, as reversing the constant replacement gives a simpler listing and allows further optimizations to be applied. Note that currently this option can produce bad results, as it does not follow the program flow and can't detect if a variable is used before the first assignment, so it is not enabled by default. You need to check each removed variable, as printed in the output and in the comments in the resulting program.
- then_goto: Searches for IF statements with THEN GOTO and removes the GOTO statement, replacing it with the line number alone. Note: if the line number is not a constant, the resulting program will be executed and listed correctly by both Atari BASIC and Turbo-Basic XL, but can't be entered because of an original parser limitation. Therefore, this conversion is only done for constant values when the output is a short listing.
  Example:
      IF X THEN GOTO 100
  becomes
      IF X THEN 100
- if_goto: Performs the same optimization as then_goto, but also replaces instances of multi-line IF statements containing a GOTO with THEN and the target line number.
  This optimization is not enabled by default because it can produce larger code by forcing a newline in the file.
  Example:
      IF X
        GOTO 100
      ENDIF
  becomes
      IF X THEN 100
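The then_goto rewrite can be illustrated on plain text with a small Python sketch. The real tool presumably works on parsed statements rather than on text, so this is only a toy: `then_goto` here is a hypothetical function, and it rewrites only constant line numbers, mirroring the short-listing restriction described above.

```python
import re

# THEN GOTO followed by a constant line number that ends the statement
# (end of text or a ":" statement separator).
_THEN_GOTO = re.compile(r"\bTHEN\s+GOTO\s+(\d+)(?=\s*(?::|$))", re.IGNORECASE)


def then_goto(statement: str) -> str:
    """Drop the GOTO after THEN when the target is a constant (sketch)."""
    return _THEN_GOTO.sub(r"THEN \1", statement)
```

A non-constant target such as `IF X THEN GOTO 100+A` is left untouched. Because the sketch is text based, it would also rewrite a match inside a string literal, which a real parser would not do.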
Note that options can be changed at any place in the file. This is an example of changing the parser mode in the middle of the file:
' Example program using directives
$ options optimize, mode=default
error1 = 2
? error1 : ' This is parsed like Turbo-Basic XL, as ? ERR OR 1
$options mode = extended
? error1 : ' This is parsed as ? error1
Printa : ' This is a parsing error.
A good optimization mode for producing short listings is:
$options +optimize, optimize=-convert_percent-const_replace
The above line instructs the parser to avoid converting numbers to % values and replacing constants, producing a smaller listing. Note that replacement of constants can be beneficial, so try enabling the optimization and running with the -v option to see which variables are good candidates for replacement.
This directive defines new symbols that are replaced at parsing time with their values, like C macros.
Replacement names are prefixed by @ to differentiate them from variables and, as with variables, string defines end in $. The syntax of the directive is:
$define defineName = value
Keep in mind that, as the value is substituted each time the define is used, it is probably best to assign it to a variable instead if the value will be used multiple times. You should also enable optimizations so that the usage is simplified at parsing time.
This is an example usage of the $define directive:
' Example usage of defines
$options +optimize
$define Message$ = "Hello world!"
$define PCOLR0 = $2C0
print @Message$ : ' Replaced by: ? "Hello world!"
print len(@Message$) : ' Replaced by: ? 12
poke @PCOLR0+2, $1F : ' Replaced by: POKE 706,31
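The textual substitution step behind the example can be sketched in Python. This is a simplified illustration, not the parser's code: `expand_defines` is a hypothetical helper, names are matched case-insensitively because the input is case insensitive, and the later constant folding (which turns `$2C0+2` into 706) is not modelled.

```python
import re

# An @ followed by an identifier, optionally ending in $ for string defines.
_DEFINE_REF = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*\$?)")


def expand_defines(line: str, defines: dict) -> str:
    """Replace each @name with its defined value (illustrative sketch)."""
    def substitute(match):
        # Unknown names are left unchanged in this sketch.
        return defines.get(match.group(1).upper(), match.group(0))
    return _DEFINE_REF.sub(substitute, line)


defines = {"MESSAGE$": '"Hello world!"', "PCOLR0": "$2C0"}
print(expand_defines("print @Message$", defines))
print(expand_defines("poke @PCOLR0+2, $1F", defines))
```

Since the substitution is purely textual here, an `@name` inside a string literal would also be replaced; a real parser would distinguish the two cases.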
This directive allows including data from a binary file in a new string definition. The content of the file is read at parsing time and the full content is stored in the define. The syntax of the directive is:
$incbin defineName$, "fileName" [, offset [, length]]
The optional offset parameter specifies a starting offset in bytes for the included data, and the optional length parameter specifies the number of bytes to read. If length is not given, the file is read to its end.
This is an example usage of the $incbin directive:
$options +optimize
$incbin asmBin$, "myasm.bin"
asmRut = adr( @asmBin$ ) : ' Store address in variable to use multiple times.
? usr(asmRut, 1, 2) : ' Call routine. Should be relocatable and less than 242 bytes.
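The meaning of the optional offset and length parameters can be shown with a few lines of Python. The helper name `read_slice` is hypothetical, and the sketch assumes that omitting length means reading from the offset to the end of the file.

```python
from typing import Optional


def read_slice(path: str, offset: int = 0, length: Optional[int] = None) -> bytes:
    """Read `length` bytes starting at `offset`, or everything after the
    offset when no length is given (illustrative sketch)."""
    with open(path, "rb") as handle:
        handle.seek(offset)                  # skip the first `offset` bytes
        if length is None:
            return handle.read()             # rest of the file
        return handle.read(length)           # at most `length` bytes
```

Under these assumptions, `read_slice("myasm.bin", 6)` would skip a six-byte header and return the remaining bytes, which is the kind of use the offset parameter is suited for.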
This directive allows including data from a binary file in a BASIC DATA statement. The content of the file is read at parsing time and the full content is stored as is. The syntax of the directive is:
$incdata "fileName" [, offset [, length]]
The optional offset parameter specifies a starting offset in bytes for the included data, and the optional length parameter specifies the number of bytes to read. If length is not given, the file is read to its end.
Note that you can use this directive to store arbitrary bytes inside the DATA statement, but BASIC parses the actual data at READ time.
There are some incompatibilities in the way the source is interpreted with the standard Turbo-Basic XL and Atari BASIC parsers:
- The ASCII LF character (hexadecimal $0A) is interpreted as end of line in addition to the ATASCII EOL (hexadecimal $9B). This means that the LF character is not accepted in DATA statements and comments.
- The parsing of special characters inside strings means that a valid hexadecimal sequence (\**, with * a hexadecimal digit in uppercase) or two backslashes are interpreted differently.
- Extra statements after an IF/THEN/LineNumber are converted to a comment, with the exception of DATA statements. In the original, those statements are never executed, so this is not a problem with proper code.
- Any string is accepted as a variable name, even if it is already a statement, function name or operator.
  The following code is valid:
      PRINTED = 0 : ' Invalid in Atari BASIC, as starts with "PRINT"
      DONE = 3 : ' Invalid in Turbo-Basic XL, as starts with "DO"
  This relaxed handling of variable naming creates an incompatibility, as the first example above is parsed differently from standard Atari BASIC, where it means "PRINT (ED = 0)" instead of "LET PRINTED = 0".
  Note that currently even full statements are accepted as variable names, but avoid using them, as they could produce hard to understand errors.
- In long format listing output, IF/THEN statements are converted to IF/ENDIF statements. This introduces an incompatibility with the following code:
      FOR A = 0 TO 2
        ? "A="; A; " - ";
        IF A <> 0
          ? "1";
          IF A = 1 THEN ELSE
          ? "2";
        ENDIF
        ? " -"
      NEXT A
  This code should produce the following output:
      A=0 - 2 -
      A=1 - 1 -
      A=2 - 12 -
  After conversion, the ELSE is associated with the second IF instead of the first, giving the wrong result.
- Parsing of the TIME$= statement allows a space between TIME$ and the equals sign, but in Turbo-Basic XL this gives an error.
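The first incompatibility, treating both end-of-line characters alike, can be pictured with this Python sketch. It is not the parser's code: `split_source_lines` is a hypothetical helper that works on the raw bytes of a listing, and it does not model any special handling of carriage returns.

```python
ASCII_LF = 0x0A      # line feed, decimal 10
ATASCII_EOL = 0x9B   # Atari end of line, decimal 155


def split_source_lines(data: bytes) -> list:
    """Split a listing on either end-of-line byte (illustrative sketch)."""
    # Normalize the Atari EOL to LF, then split on LF only.
    normalized = data.replace(bytes([ATASCII_EOL]), bytes([ASCII_LF]))
    lines = normalized.split(bytes([ASCII_LF]))
    if lines and lines[-1] == b"":
        lines.pop()                      # drop the empty tail after a final EOL
    return lines
```

Because both bytes end a line, an LF byte placed inside a DATA statement or a comment would cut that line short, which is why it is not accepted there.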
To compile from source, you need gawk and peg. Both are available in any recent Debian or Ubuntu Linux distro; install them with:
apt-get install gawk peg
To compile, simply type make in the sources folder; a build folder will be created with the executable program inside.