This chapter is under construction!
This chapter describes some of the internals of vasm and tries to explain what has to be done to write a cpu module, a syntax module or an output module for vasm.
However, if someone wants to write one, I suggest contacting me first, so that it can be integrated into the source tree.
Note that this documentation may mention explicit values when introducing symbolic constants. This is due to copying and pasting from the source code. These values may not be up to date and in some cases can be overridden. Therefore, never use the absolute values; always use the symbolic representations.
This section deals with the steps necessary to build the typical vasm executable from the sources.
The vasm-directory contains the following important files and directories:
The main directory containing the assembler sources.
The Makefile used to build vasm.
Directories for the syntax modules.
Directories for the cpu modules.
Directory the object files will be stored in.
All compiling is done from the main directory and
the executables will be placed there as well.
The main assembler for a combination of <cpu> and <syntax> will be called vasm<cpu>_<syntax>.
All output modules are usually integrated into every executable and can be selected at runtime. If you want only a subset of them, adapt the OUTFMTS definition in ‘make.rules’ and select those you want.
Before building anything you have to insert correct values for your compiler and operating system in the ‘Makefile’.
TARGET
Here you may define an extension which is appended to the executable’s name. This is useful if you build various targets in the same directory.
TARGETEXTENSION
Defines the file name extension for executable files. Not needed for most operating systems. For Windows it would be ‘.exe’.
CC
Here you have to insert a command that invokes the ANSI C compiler you want to use to build vasm. It must support the ‘-I’ option in the same way as, e.g., vc or gcc.
COPTS
Here you will usually define an option like ‘-c’ to instruct the compiler to generate an object file. Additional options, like the optimization level, should be inserted here as well. Specifying the host OS helps to determine work-directories for DWARF and defines the appropriate internal symbol for the host’s file system path style. The following are supported:
-DAMIGA
AmigaOS (M68k or PPC), MorphOS, AROS. Defines the internal symbol __AMIGAFS.
-DATARI
Atari TOS. Defines the internal symbol __MSDOSFS.
-DMSDOS
CP/M, MS-DOS, Windows. Defines the internal symbol __MSDOSFS.
-DUNIX
All kinds of Unix (Linux, BSD) including MacOSX and Atari-MiNT. Defines the internal symbol __UNIXFS.
-D_WIN32
Windows. Defines the internal symbol __MSDOSFS.
Building without specifying a host OS is allowed. In that case vasm defaults to Unix-style path handling and does not define a file system symbol for conditional assembly. Other options:
-DLOWMEM
Builds for a host-OS with a low amount of memory. This will basically reduce all hash tables to minimal size.
CCOUT
Here you define the option which is used to specify the name of an output file, which is usually ‘-o’.
LD
Here you insert a command which starts the linker. This may be the same as under CC.
LDFLAGS
Here you have to add options which are necessary for linking. E.g. some compilers need special libraries for floating-point.
LDOUT
Here you define the option which is used by the linker to specify the output file name.
RM
Specify a command to delete a file, e.g. rm -f.
An example for the Amiga using vbcc would be:
TARGET = _os3
TARGETEXTENSION =
CC = vc +aos68k
CCOUT = -o
COPTS = -c -c99 -cpu=68020 -DAMIGA -O1
LD = $(CC)
LDOUT = $(CCOUT)
LDFLAGS = -lmieee
RM = delete force quiet
An example for a typical Unix-installation would be:
TARGET =
TARGETEXTENSION =
CC = gcc
CCOUT = -o
COPTS = -c -O2
LD = $(CC)
LDOUT = $(CCOUT)
LDFLAGS = -lm
RM = rm -f
Open/Net/Free/Any BSD systems will probably require an additional ‘-D_ANSI_SOURCE’ in COPTS.
Note to users of BSD systems: You will probably have to use GNU make instead of BSD make, i.e. in the following examples replace "make" with "gmake".
Type:
make CPU=<cpu> SYNTAX=<syntax>
For example:
make CPU=ppc SYNTAX=std
The following CPU modules can be selected:
CPU=6502
CPU=6800
CPU=6809
CPU=arm
CPU=c16x
CPU=hans
CPU=jagrisc
CPU=m68k
CPU=pdp11
CPU=ppc
CPU=qnice
CPU=test
CPU=tr3200
CPU=unsp
CPU=vidcore
CPU=x86
CPU=z80
The following syntax modules can be selected:
SYNTAX=std
SYNTAX=mot
SYNTAX=madmac
SYNTAX=oldstyle
SYNTAX=test
For Windows and various Amiga targets there are already Makefiles included, which you may either copy on top of the default ‘Makefile’, or call explicitly with make’s ‘-f’ option:
make -f Makefile.OS4 CPU=ppc SYNTAX=std
The following important global variables may be read or modified by syntax, cpu or output modules.
source *cur_src;
Pointer to the current source text instance (see structures below).
char *defsectname;
Name of a default section which vasm creates when a label or code occurs in the source without any preceding section or org directive. Assigning NULL means that the default is an absolute section and its base address is taken from defsectorg. These defsect... variables can be overridden by syntax or output modules.
taddr defsectorg;
Used when defsectname==NULL. Defines the base address of a default absolute org section.
char *defsecttype;
Attributes of the default section (see above). May be NULL to indicate that no default has been defined and vasm will show an error.
char emptystr[];
An empty string (zero length).
int exec_out;
Non-zero, when the output file is an executable and not an object file.
char *filename;
Defaults to the file part of the input source file name. Syntax modules may modify it with setfilename(char *) and output modules may use it for their own purpose.
char *inname;
Input source file name.
int octetsperbyte;
Number of 8-bit bytes used to represent a backend’s target-byte. The macro OCTETS(n) may be used to calculate the number of 8-bit bytes for n target-bytes.
char *outname;
Output object file name.
int output_bitsperbyte;
May be set to 1 by an output module during its init function, to indicate that it supports target-bytes with BITSPERBYTE bits. Otherwise an output module is expected to support 8-bit bytes only.
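To illustrate how octetsperbyte and OCTETS(n) relate, here is a minimal sketch. It assumes that each target-byte occupies the smallest whole number of 8-bit host-bytes that can hold it, which is how target-bytes are described for DATA atoms. MY_BITSPERBYTE, my_octetsperbyte() and MY_OCTETS() are hypothetical stand-ins and are not vasm's real definitions.

```c
#include <stddef.h>

/* Hypothetical target with 12-bit bytes; in vasm, BITSPERBYTE comes from the cpu backend. */
#define MY_BITSPERBYTE 12

/* Smallest number of 8-bit host-bytes that can hold one target-byte. */
static int my_octetsperbyte(int bitsperbyte)
{
  return (bitsperbyte + 7) / 8;
}

/* Sketch of OCTETS(n): 8-bit host-bytes needed for n target-bytes. */
#define MY_OCTETS(n) ((size_t)(n) * (size_t)my_octetsperbyte(MY_BITSPERBYTE))
```

With 12-bit target-bytes, each target-byte needs two host-bytes, so three target-bytes occupy six host-bytes.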
This section describes the fundamental data structures used in vasm which are usually necessary to understand for writing any kind of module (cpu, syntax or output). More detailed information is given in the respective sections on writing specific modules where necessary.
A source structure represents a source text module, which can be either the main source text, an included file, a macro or a repetition. There is always a link to the parent source from where the current source context was included or called.
struct source *parent;
Pointer to the parent source context. Assembly continues there when the current source context ends.
int parent_line;
Line number in the parent source context, from where we were called. This information is needed, because line numbers are only reliable during parsing and later from the atoms. But an include directive doesn’t create an atom.
struct source_file *srcfile;
The source_file structure has the unique file name, index and text-pointer for this source text instance. Used for debugging output, like DWARF.
char *name;
File name of the main source or include file, or macro name.
char *text;
Pointer to the source text start.
size_t size;
Size of the source text to assemble in bytes.
struct source *defsrc;
This is a NULL-pointer for real source text files. Otherwise it is a reference to the source which defines the current macro or repetition.
int defline;
Valid when defsrc is not NULL. Contains the starting line number of a macro or repetition in a source text file.
macro *macro;
Pointer to the macro structure, when currently inside a macro (see also num_params).
unsigned long repeat;
Number of repetitions of this source text. Usually this is 1, but for text blocks between a rept and endr (or similar) directive it allows any number of repetitions, which is decremented every time the end of the source text block is reached.
char *irpname;
Name of the iterator symbol in special repeat loops, which use a sequence of arbitrary values, being assigned to this symbol within the loop. Example: the irp directive in std-syntax.
struct macarg *irpvals;
A list of arbitrary values to iterate over in a loop. With each iteration the frontmost value is removed from the list until it is empty.
int cond_level;
Current level of conditional nesting while entering this source text. It is automatically restored to the previous level when leaving the source prematurely through end_source().
struct macarg *argnames;
The current list of named macro arguments.
int num_params;
Number of macro parameters passed at the invocation point from the parent source. For normal source files this entry is -1. For macros it is 0 (no parameters) or higher.
char *param[MAXMACPARAMS];
Pointers to the macro arguments.
int param_len[MAXMACPARAMS];
Number of characters per macro argument.
int num_quals;
(If MAX_QUALIFIERS!=0.) Number of qualifiers for a macro. When not passed on invocation, these are the default qualifiers.
char *qual[MAX_QUALIFIERS];
(If MAX_QUALIFIERS!=0.) Pointer to macro qualifiers.
int qual_len[MAX_QUALIFIERS];
(If MAX_QUALIFIERS!=0.) Number of characters per macro qualifier.
unsigned long id;
Every source has a unique id. This is useful for macros supporting the special \@ argument for creating unique labels.
char *srcptr;
The current source text pointer, pointing to the beginning of the next line to assemble.
int line;
Line number in the current source context. After parsing, the line number of the current atom is stored here.
size_t bufsize;
Current size of the line buffer (linebuf). The size of the line buffer is extended automatically when an overflow happens.
char *linebuf;
A buffer for the current line being assembled in this source text. A child-source, like a macro, can refer to arguments from this buffer, so every source has got its own. When returning to the parent source, the linebuf is deallocated to save memory.
expr *cargexp;
(If CARGSYM was defined.) Pointer to the current expression assigned to the CARG-symbol (used to select a macro argument) in this source instance, so that it can be restored when reentering this instance.
long reptn;
(If REPTNSYM was defined.) Current value of the repetition counter symbol in this source instance, so that it can be restored when reentering this instance.
One of the top level structures is a linked list of sections describing contiguous blocks of memory. A section is specified by an object of type section with the following members that can be accessed by the modules:
struct section *next;
A pointer to the next section in the list.
char *name;
The name of the section.
char *attr;
A string describing the section flags in ELF notation (see, for example, the documentation of the .section directive in the standard syntax module).
atom *first;
atom *last;
Pointers to the first and last atom of the section. See following sections for information on atoms.
taddr align;
Alignment of the section in target-bytes.
uint32_t flags;
Flags of the section. Currently available flags are:
HAS_SYMBOLS
At least one symbol is defined in this section.
RESOLVE_WARN
The current atom changed its size multiple times, so atom_size() is now called with this flag set in its section to make the backend (e.g. instruction_size()) aware of it and do less aggressive optimizations.
UNALLOCATED
Section is unallocated, which means it doesn’t use any memory space in the output file. Such a section will be removed before creating the output file, and all its labels are converted into absolute expression symbols. Used for "offset" sections. Refer to switch_offset_section().
LABELS_ARE_LOCAL
As long as this flag is set new labels in a section are defined as local labels, with the section name as global parent label.
ABSOLUTE
Section is loaded at an absolute address in memory.
PREVABS
Remembers the state of the ABSOLUTE flag before entering relocated-org mode (IN_RORG), so that it can be restored later.
IN_RORG
Section has entered relocated-org mode, which also sets the ABSOLUTE flag. In this mode code is written into the current section, but relocated to an absolute address. No relocation information is generated.
NEAR_ADDRESSING
Section is marked as suitable for cpu-specific "near" addressing modes. For example, base-register relative or zero/direct-page. The cpu backend can use this information as an optimization hint when referencing symbols from this section.
FAR_ADDRESSING
Section requires cpu-specific "far" addressing modes. For example an addressing mode including the bank or "segment". The cpu backend may use this information to select appropriate addressing modes when referencing symbols from this section.
taddr org;
Start address of a section. Usually zero. Set to an absolute start address in ABSOLUTE mode.
taddr pc;
Current address in this section. Can be used while traversing through the section. Has to be updated by a module using it. Is set to org at the beginning.
unsigned long idx;
A member usable by the output module for private purposes.
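Modules usually visit all sections by following the next pointers until they reach NULL. The following minimal sketch shows such a walk. It uses a cut-down, hypothetical struct mysection that models only next and name, so it is not the real section structure, and the helpers lookup_section_by_name() and count_sections() are illustrative, not vasm functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for vasm's section structure. */
struct mysection {
  struct mysection *next;   /* next section in the list, NULL at the end */
  const char *name;         /* section name */
};

/* Walk the list starting at `first` and return the first section named
   `name`, or NULL if no such section exists. */
static struct mysection *lookup_section_by_name(struct mysection *first,
                                                const char *name)
{
  for (struct mysection *s = first; s != NULL; s = s->next)
    if (strcmp(s->name, name) == 0)
      return s;
  return NULL;
}

/* Count the sections in the list. */
static size_t count_sections(const struct mysection *first)
{
  size_t n = 0;
  for (const struct mysection *s = first; s != NULL; s = s->next)
    n++;
  return n;
}
```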
Symbols are represented by a linked list of type symbol with the following members that can be accessed by the modules:
int type;
Type of the symbol. Available are:
#define LABSYM 1
The symbol is a label defined at a specific location.
#define IMPORT 2
The symbol is externally defined and its value is unknown.
#define EXPRESSION 3
The symbol is defined using an expression (equate).
uint32_t flags;
Flags of this symbol. Available are:
#define TYPE_UNKNOWN 0
The symbol has no type information.
#define TYPE_OBJECT 1
The symbol defines an object.
#define TYPE_FUNCTION 2
The symbol defines a function.
#define TYPE_SECTION 3
The symbol defines a section.
#define TYPE_FILE 4
The symbol defines a file.
#define EXPORT (1<<3)
The symbol is exported to other object files.
#define INEVAL (1<<4)
Used internally.
#define COMMON (1<<5)
The symbol is a common symbol and also has a size. It will be allocated by the linker.
#define WEAK (1<<6)
The symbol is weak, which means the linker may overwrite it with any global definition of the same name. Weak symbols may also stay undefined, in which case the linker would assign them a value of zero.
#define LOCAL (1<<7)
Only informational. A symbol can be explicitly declared as local by a syntax-module directive. Otherwise all symbols without the EXPORT flag are not considered for object linking.
#define VASMINTERN (1<<8)
Vasm-internal symbol, which must not be exported into an object file.
#define PROTECTED (1<<9)
Used internally to protect the current-PC symbol from deletion.
#define REFERENCED (1<<10)
Symbol was referenced in the source and a relocation entry has been created.
#define ABSLABEL (1<<11)
Label was defined inside an absolute section, or during a code block in relocated-org mode. Therefore it has an absolute address and will not generate a relocation entry when being referenced.
#define EQUATE (1<<12)
Symbols flagged as EQUATE are constant expressions and their value must not be changed.
#define REGLIST (1<<13)
Symbol is a register list definition.
#define USED (1<<14)
Symbol appeared in an expression. Symbols which were only defined (as label or equate) and otherwise never appear throughout the whole source don’t get this flag set.
#define NEAR (1<<15)
Symbol may be referenced by "near" addressing mode. For example, base register relative. Used as an optimization hint to the cpu backend.
#define XDEF (1<<16)
This symbol must become defined in the source, which means its type must not remain IMPORT. Otherwise a warning is displayed.
#define XREF (1<<17)
Symbol is externally defined and its type must never become anything other than IMPORT. Otherwise an error is displayed.
#define RSRVD_S (1L<<24)
The range from bit 24 to 27 (counted from the LSB) is reserved for use by the syntax module.
#define RSRVD_O (1L<<28)
The range from bit 28 to 31 (counted from the LSB) is reserved for use by the output module.
The type-flags can be extracted using the TYPE() macro, which expects a pointer to a symbol as argument.
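The TYPE_* values listed above are all below 8, and EXPORT starts at bit 3. This suggests that the symbol type fits into the three lowest bits of flags. The minimal sketch below extracts the type under exactly that assumption. The constant values are copied from the list above and may be out of date, so real code should use vasm's symbolic names and its own TYPE() macro. MY_TYPE_MASK, MY_TYPE(), struct mysym and is_exported_function() are hypothetical.

```c
#include <stdint.h>

/* Values copied from the list above (may be out of date). */
#define TYPE_UNKNOWN  0
#define TYPE_OBJECT   1
#define TYPE_FUNCTION 2
#define TYPE_SECTION  3
#define TYPE_FILE     4
#define EXPORT        (1<<3)
#define WEAK          (1<<6)

/* Hypothetical: the type occupies bits 0-2, below the first flag bit (EXPORT). */
#define MY_TYPE_MASK 7
#define MY_TYPE(sym) ((sym)->flags & MY_TYPE_MASK)

/* Hypothetical cut-down symbol: only the flags member is modelled. */
struct mysym { uint32_t flags; };

/* True for symbols typed as functions that are also exported. */
static int is_exported_function(const struct mysym *s)
{
  return MY_TYPE(s) == TYPE_FUNCTION && (s->flags & EXPORT) != 0;
}
```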
char *name;
The name of the symbol.
expr *expr;
The expression in case of EXPRESSION symbols.
expr *size;
The size of the symbol in target-bytes, if specified. Common symbols always have a size.
section *sec;
The section a LABSYM symbol is defined in.
taddr pc;
The address of a LABSYM symbol.
taddr align;
The alignment of the symbol in target-bytes.
unsigned long idx;
A member usable by the output module for private purposes.
Optional register symbols are available when the backend defines HAVE_REGSYMS in ‘cpu.h’ together with the hash table size.
Example:
#define HAVE_REGSYMS
#define REGSYMHTSIZE 256
A register symbol is defined by an object of type regsym with the following members that can be accessed by the modules:
char *reg_name;
Symbol name.
int reg_type;
Optional type of register.
unsigned int reg_flags;
Optional register symbol flags.
unsigned int reg_num;
Register number or value.
Refer to ‘symbol.h’ for functions to create and find register symbols.
The contents of each section are a linked list built out of non-separable atoms. The general structure of an atom is:
struct atom {
  struct atom *next;
  int type;
  taddr align;
  taddr lastsize;
  unsigned changes;
  source *src;
  int line;
  listing *list;
  union {
    instruction *inst;
    dblock *db;
    symbol *label;
    sblock *sb;
    defblock *defb;
    void *opts;
    int srcline;
    char *ptext;
    printexpr *pexpr;
    expr *roffs;
    taddr *rorg;
    assertion *assert;
    aoutnlist *nlist;
  } content;
};
The members have the following meaning:
struct atom *next;
Pointer to the following atom (NULL if last).
int type;
The type of the atom. Can be one of
#define VASMDEBUG 0
Used for internal debugging.
#define LABEL 1
A label is defined here.
#define DATA 2
A fixed number of target-bytes with constant data are put here.
#define INSTRUCTION 3
Generally refers to a machine instruction or pseudo/opcode. These atoms can change their size during optimization passes and will be translated to DATA-atoms later.
#define SPACE 4
Defines a block of data filled with one value of a given size (up to MAXPADSIZE 8-bit bytes). BSS sections usually contain only such atoms, but they are also sometimes useful as shorter versions of DATA-atoms in other sections.
#define DATADEF 5
Defines data of fixed size which can contain cpu-specific operands and expressions. Usually generated by data definitions in the source text that are not machine instructions. Will be translated to DATA-atoms later.
#define LINE 6
A source text line number (usually from a higher level language) is bound to the atom’s address. Useful for source level debugging in certain ABIs.
#define OPTS 7
A means to change assembler options at a specific source text line. For example optimization settings, or the cpu type to generate code for. The cpu backend has to define HAVE_CPU_OPTS and export the required functions if it wants to use this type of atom.
#define PRINTTEXT 8
A string is printed to stdout during the final assembler pass. A newline is automatically appended.
#define PRINTEXPR 9
Prints the value of an expression during the final assembler pass to stdout.
#define ROFFS 10
Set the program counter to an address relative to the section’s start address. These atoms will be translated into SPACE atoms in the final pass.
#define RORG 11
Assemble this block under the given base address, while the code is still written into the original memory region.
#define RORGEND 12
Ends a RORG block and returns to the original addressing.
#define ASSERT 13
The assertion expression is checked in the final pass and an error message is generated (using the expression string and an optional message out of this atom) when it evaluates to 0.
#define NLIST 14
Defines a stab-entry for the a.out object file format. nlist-style stabs can also occur embedded in other object file formats, like ELF.
taddr align;
The alignment of this atom. The address must be divisible by align.
taddr lastsize;
The size of this atom in the last resolver pass. When the size has changed in the current pass, the assembler will request another resolver run through the section.
unsigned changes;
Number of changes in the size of this atom since pass number FASTOPTPHASE. An increasing number usually indicates a problem in the cpu backend’s optimizer. It is flagged by setting RESOLVE_WARN in the section flags as soon as changes exceeds MAXSIZECHANGES, so the backend can choose to optimize this atom less aggressively than before.
source *src;
Pointer to the source text object to which this atom belongs.
int line;
The source line number that created this atom.
listing *list;
Pointer to the listing file object to which this atom belongs.
instruction *inst;
(In union content.) Pointer to an instruction structure in the case of an INSTRUCTION-atom. Contains the following elements:
int code;
The cpu specific code of this instruction.
char *qualifiers[MAX_QUALIFIERS];
(If MAX_QUALIFIERS!=0.) Pointer to the qualifiers of this instruction.
operand *op[MAX_OPERANDS];
(If MAX_OPERANDS!=0.) The cpu-specific operands of this instruction.
instruction_ext ext;
(If the cpu backend defines HAVE_INSTRUCTION_EXTENSION.) A cpu-specific structure. Typically used to store appropriate opcodes, allowed addressing modes, supported cpu derivatives etc.
dblock *db;
(In union content.) Pointer to a dblock structure in the case of a DATA-atom. Contains the following elements:
taddr size;
The number of target-bytes stored in this atom.
uint8_t *data;
A pointer to the constant data. Consider using writebyte() or setval() to write target-bytes which are not 8 bits wide. The internal ordering of target-bytes on an 8-bit host is big-endian, and a target-byte always occupies the smallest number of 8-bit host-bytes that can hold it. The macro OCTETS(n) may be used to calculate the number of host-bytes required to represent n target-bytes.
rlist *relocs;
A pointer to relocation information for the data.
symbol *label;
(In union content.) Pointer to a symbol structure in the case of a LABEL-atom.
sblock *sb;
(In union content.) Pointer to an sblock structure in the case of a SPACE-atom. Contains the following elements:
size_t space;
The number of space-elements (see below) to generate here.
expr *space_exp;
The above size as an expression, which will be evaluated during assembly and copied to space in the final pass.
size_t size;
The size of each space-element and of the fill-pattern in target-bytes.
uint8_t fill[MAXPADSIZE];
The fill pattern, up to MAXPADSIZE 8-bit bytes.
expr *fill_exp;
Optional. Evaluated and copied to fill in the final pass, when not NULL.
rlist *relocs;
A pointer to relocation information for the space.
taddr maxalignbytes;
An optional number of maximum padding bytes to fulfil the atom’s alignment requirement. Zero means there is no restriction.
uint32_t flags;
SPC_UNINITIALIZED
This space is completely uninitialized. May be used as a hint by output modules.
SPC_DATABSS
The output module should not allocate any file space for this atom, when possible (example: DataBss section, as supported by the "hunkexe" output file format). This flag does not need to be set when the output section is BSS.
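The padding needed to satisfy an atom’s align value is the distance from the current address to the next multiple of align. A non-zero maxalignbytes limits that padding. The sketch below shows one plausible interpretation of the limit. It assumes that alignment is skipped entirely when more padding would be required than allowed, which is how GNU as treats the maximum argument of .balign, but vasm’s exact handling may differ. The function names are hypothetical, and pc is assumed to be non-negative.

```c
#include <stdint.h>

/* Padding (in target-bytes) needed to align pc to align.
   Returns 0 when align <= 1 or pc is already aligned. Assumes pc >= 0. */
static int64_t align_padding(int64_t pc, int64_t align)
{
  if (align <= 1)
    return 0;
  return (align - pc % align) % align;
}

/* Hypothetical maxalignbytes-style limit: 0 means "no restriction";
   if the required padding exceeds the limit, no padding is inserted. */
static int64_t limited_padding(int64_t pc, int64_t align, int64_t maxalignbytes)
{
  int64_t pad = align_padding(pc, align);
  if (maxalignbytes != 0 && pad > maxalignbytes)
    return 0;
  return pad;
}
```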
defblock *defb;
(In union content.) Pointer to a defblock structure in the case of a DATADEF-atom. Contains the following elements:
taddr bitsize;
The size of the definition in bits.
operand *op;
Pointer to a cpu-specific operand structure.
void *opts;
(In union content.) Points to a cpu-backend specific options object in the case of an OPTS-atom.
int srcline;
(In union content.) Line number for source level debugging in the case of a LINE-atom.
char *ptext;
(In union content.) A string to print to stdout in case of a PRINTTEXT-atom.
printexpr *pexpr;
(In union content.) Pointer to a printexpr structure in the case of a PRINTEXPR-atom. Contains the following elements:
expr *print_exp;
Pointer to an expression to evaluate and print.
short type;
Format type of the printed value. We can print as hexadecimal (PEXP_HEX), signed decimal (PEXP_SDEC), unsigned decimal (PEXP_UDEC), binary (PEXP_BIN) or ASCII (PEXP_ASC).
short size;
Size (precision) of the printed value in bits. Excess bits are masked out, and the value is sign-extended when requested.
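Reducing a value to size bits and optionally sign-extending it is ordinary two's-complement arithmetic. A minimal sketch follows, assuming 1 <= size <= 64. The helpers mask_bits() and sign_extend() are hypothetical and are not vasm functions.

```c
#include <stdint.h>

/* Keep only the lowest `size` bits of v (1 <= size <= 64). */
static uint64_t mask_bits(uint64_t v, unsigned size)
{
  return size >= 64 ? v : v & ((UINT64_C(1) << size) - 1);
}

/* Interpret the lowest `size` bits of v as a two's-complement number. */
static int64_t sign_extend(uint64_t v, unsigned size)
{
  uint64_t m = mask_bits(v, size);
  if (size < 64 && ((m >> (size - 1)) & 1))
    m |= ~UINT64_C(0) << size;   /* copy the sign bit into all higher bits */
  return (int64_t)m;
}
```

For example, with size 8 the value 0x1ff80 is masked to 0x80, which prints as 128 unsigned or -128 signed.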
expr *roffs;
(In union content.) The expression holds the relative section offset to align to in case of a ROFFS-atom.
taddr *rorg;
(In union content.) Assemble the code under the base address in rorg in case of a RORG-atom.
assertion *assert;
(In union content.) Pointer to an assertion structure in the case of an ASSERT-atom. Contains the following elements:
expr *assert_exp;
Pointer to an expression which should evaluate to non-zero.
char *exprstr;
Pointer to the expression as text (to be used in the output).
char *msgstr;
Pointer to the message, which is printed when assert_exp evaluates to zero.
aoutnlist *nlist;
(In union content.) Pointer to an nlist structure, describing an a.out stab entry, in case of an NLIST-atom. Contains the following elements:
char *name;
Name of the stab symbol.
int type;
Symbol type. Refer to stabs.h for definitions.
int other;
Defines the nature of the symbol (function, object, etc.).
int desc;
Debugger information.
expr *value;
Symbol’s value.
DATA and SPACE atoms can have a relocation list attached that describes how this data must be modified when linking/relocating. They always refer to the data in this atom only.
There are a number of predefined standard relocations and it is possible to add other cpu-specific relocations. Note however, that it is always preferable to use standard relocations, if possible. Chances that an output module supports a certain relocation are much higher if it is a standard relocation.
A relocation list uses this structure:
typedef struct rlist {
  struct rlist *next;
  void *reloc;
  int type;
} rlist;
Type identifies the relocation type. All the standard relocations have type numbers between FIRST_STANDARD_RELOC and LAST_STANDARD_RELOC. Refer to ‘reloc.h’ to see which standard relocations are available.
Standard types may be combined with the modifier flags REL_MOD_S for signed and REL_MOD_U for unsigned relocations. The default, when both flags are missing, is to check the value being inserted into the given relocation field against both ranges, signed and unsigned. Setting both flags together is illegal!
Typical signed relocations are PC-relative. An example for an unsigned relocation would be a zero- or direct-page addressing mode on the 6502/6809 families. If you are unsure, just don’t set any of these flags.
To get access to the real standard relocation type, without any flags, you have to use the STD_REL_TYPE(type) macro.
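The three range checks can be modelled with a few comparisons. With the default (no modifier flag), a value is accepted if it fits the field either as a signed or as an unsigned number. The following sketch is for a field that is bits wide, assuming 1 <= bits <= 62. The enum range_mode and the functions are hypothetical illustrations. They do not reproduce the real REL_MOD_S/REL_MOD_U values or vasm's checking code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "no flag", REL_MOD_S and REL_MOD_U. */
enum range_mode { RANGE_DEFAULT, RANGE_SIGNED, RANGE_UNSIGNED };

static int fits_signed(int64_t v, unsigned bits)
{
  int64_t min = -((int64_t)1 << (bits - 1));
  int64_t max = ((int64_t)1 << (bits - 1)) - 1;
  return v >= min && v <= max;
}

static int fits_unsigned(int64_t v, unsigned bits)
{
  return v >= 0 && v <= ((int64_t)1 << bits) - 1;
}

/* Check v against the range(s) selected by mode (1 <= bits <= 62). */
static int fits_field(int64_t v, unsigned bits, enum range_mode mode)
{
  switch (mode) {
    case RANGE_SIGNED:   return fits_signed(v, bits);
    case RANGE_UNSIGNED: return fits_unsigned(v, bits);
    default:             return fits_signed(v, bits) || fits_unsigned(v, bits);
  }
}
```

For an 8-bit field, the default check therefore accepts everything from -128 to 255.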
The detailed information can be accessed via the pointer reloc. It will point to a structure that depends on the relocation type, so a module must only use it if it knows the relocation type.
All standard relocations point to a structure of type nreloc with the following members:
size_t byteoffset;
Offset in target-bytes, from the start of the current DATA atom, to the beginning of the relocation field. This may also be the address which is used as a basis for PC-relative relocations, or a common basis for multiple separated relocation fields, which will be translated into a single relocation type by the output module.
size_t bitoffset;
Offset in bits to the beginning of the relocation field. Adds to byteoffset*BITSPERBYTE. Bits are counted in a bit-stream from lower to higher address bytes. But note that within a little-endian byte bits are counted from the LSB to the MSB, while they are counted from the MSB to the LSB for big-endian targets.
int size;
The size of the relocation field in bits.
taddr mask;
The mask defines which portion of the relocated value is set by this relocation field.
taddr addend;
Value to be added to the symbol value.
symbol *sym;
The symbol referred to by this relocation.
To describe the meaning of these entries, we will define the steps that shall be executed when performing a relocation:
1. Extract the size number of bits from the DATA atom, starting with bit number byteoffset*BITSPERBYTE+bitoffset. We start counting bits from the lowest to the highest numbered byte in memory. Within a big-endian byte we count from the MSB to the LSB. Within a little-endian byte we count from the LSB to the MSB.
2. Determine the relocation value of the symbol. For a simple absolute relocation, this is the value of the symbol sym plus the addend. For other relocation types, more complex calculations will be needed. For example, in a program-counter relative relocation, the value will be obtained by subtracting the address of the data atom plus byteoffset from the value of sym plus addend.
3. Calculate the bitwise AND of the value obtained in the step above and the mask value.
4. Normalize, i.e. shift the value above right by as many bit positions as there are low-order zero bits in mask.
5. Add this value to the value extracted in step 1.
6. Insert the low-order size bits of this value into the DATA atom, starting with bit byteoffset*BITSPERBYTE+bitoffset.
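The relocation procedure can be traced in a few lines of C. The sketch below is a simplified model, not vasm's code. It handles only a simple absolute relocation (symbol value plus addend, with the symbol value passed as a plain number). It assumes 8-bit target-bytes and big-endian bit numbering, so bits are counted from the MSB of the lowest-addressed byte. It also assumes that the existing field contents are added to the masked and normalized value before the low size bits are written back. get_bits_be(), put_bits_be() and apply_abs_reloc() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `size` bits starting at bit `pos`, counting from the MSB of data[0]
   (big-endian bit numbering, 8-bit bytes). */
static uint64_t get_bits_be(const uint8_t *data, size_t pos, unsigned size)
{
  uint64_t v = 0;
  for (unsigned i = 0; i < size; i++, pos++)
    v = (v << 1) | ((data[pos / 8] >> (7 - pos % 8)) & 1);
  return v;
}

/* Write the low `size` bits of v starting at bit `pos` (same numbering). */
static void put_bits_be(uint8_t *data, size_t pos, unsigned size, uint64_t v)
{
  for (unsigned i = 0; i < size; i++, pos++) {
    unsigned bit = (unsigned)((v >> (size - 1 - i)) & 1);
    uint8_t m = (uint8_t)(1u << (7 - pos % 8));
    data[pos / 8] = bit ? (uint8_t)(data[pos / 8] | m)
                        : (uint8_t)(data[pos / 8] & ~m);
  }
}

/* Apply a simple absolute relocation (size <= 64). */
static void apply_abs_reloc(uint8_t *data, size_t byteoffset, size_t bitoffset,
                            unsigned size, uint64_t mask,
                            uint64_t symval, int64_t addend)
{
  size_t pos = byteoffset * 8 + bitoffset;
  uint64_t field = get_bits_be(data, pos, size);  /* extract existing field */
  uint64_t val = symval + (uint64_t)addend;       /* sym + addend */
  val &= mask;                                    /* AND with mask */
  for (uint64_t m = mask; m != 0 && (m & 1) == 0; m >>= 1)
    val >>= 1;                                    /* normalize by mask's low zero bits */
  val += field;                                   /* add extracted field */
  put_bits_be(data, pos, size, val);              /* insert low `size` bits */
}
```

As a usage example, a mask of 0xffff0000 with a 16-bit field selects the upper half of a 32-bit address. The symbol value 0x12345678 then produces the field contents 0x1234.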
Whenever a CPU module requires a relocation type which cannot be expressed by any standard type, you have the option to define your own, beginning with FIRST_CPU_RELOC (defined in reloc.h). The last CPU-specific relocation has to be defined in your backend’s cpu.h by LAST_CPU_RELOC, which is also a hint for the assembler’s core routines that CPU-specific relocations do exist.
Example:
/* PPC specific relocations */
#define REL_PPCEABI_SDA2 (FIRST_CPU_RELOC)
#define REL_PPCEABI_SDA21 (FIRST_CPU_RELOC+1)
#define REL_PPCEABI_SDAI16 (FIRST_CPU_RELOC+2)
#define REL_PPCEABI_SDA2I16 (FIRST_CPU_RELOC+3)
#define REL_MORPHOS_DREL (FIRST_CPU_RELOC+4)
#define REL_AMIGAOS_BREL (FIRST_CPU_RELOC+5)
#define LAST_CPU_RELOC REL_AMIGAOS_BREL
Also the CPU module must implement and export the following functions:
size_t cpu_reloc_size(rlist *)
Return the size of your cpu-specific relocation in 8-bit host-bytes for the given type. If this type uses the standard nreloc structure, just return zero.
void cpu_reloc_print(FILE *,rlist *)
Print the relocation name and its parameters to the given file. Usually it has the form: rname(startbyte,startbit-endbit,mask,addend,symbol). When using nreloc you may call print_nreloc() (refer to reloc.h) to simplify output.
void cpu_reloc_write(FILE *,rlist *)
If this relocation type uses a cpu-specific structure, write it to the object file’s relocation table in VOBJ format.
Note that support for cpu-specific relocations has to be added to other tools reading and writing the VOBJ format, like vlink and vobjdump, as well. Otherwise they will be ignored (but do not cause any failure).
Also, if you want cpu-specific relocations to be recognized in other output modules, you have to handle these types there appropriately.
Each module can provide a list of possible error messages contained e.g. in ‘syntax_errors.h’ or ‘cpu_errors.h’. They are a comma-separated list of a printf-format string and error flags. Allowed flags are WARNING, ERROR, FATAL, MESSAGE and NOLINE. They can be combined using or (|). NOLINE has to be set for error messages during initialization or while writing the output, when no source context is available. Errors cause the assembler to return false. FATAL causes the assembler to terminate immediately.
The errors can be emitted using the functions syntax_error(int n,...), cpu_error(int n,...) or output_error(int n,...). The first argument is the number of the error message (starting from zero). Additional arguments must be passed according to the format string of the corresponding error message.
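The error-list convention amounts to a table of format strings and flags, indexed by the error number. The sketch below is a hypothetical model of that convention. The flag values, the my_errors table and my_cpu_error() are illustrative stand-ins, and the real emitters print the message themselves instead of formatting it into a caller-supplied buffer. It shows how the error number selects the format string and how the variadic arguments are passed through.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag values; real code must use vasm's symbolic names. */
#define MY_WARNING 1
#define MY_ERROR   2
#define MY_FATAL   4
#define MY_NOLINE  8

struct my_err { const char *fmt; int flags; };

/* Hypothetical cpu_errors.h-style table: format string plus flags. */
static const struct my_err my_errors[] = {
  { "illegal operand type", MY_ERROR },                  /* 0 */
  { "branch destination out of range: %d", MY_ERROR },   /* 1 */
  { "instruction not supported on %s", MY_WARNING },     /* 2 */
  { "cannot open %s", MY_FATAL | MY_NOLINE },            /* 3 */
};

/* Format error number n into buf and return its flags (-1 if n is unknown). */
static int my_cpu_error(char *buf, size_t len, int n, ...)
{
  va_list ap;
  if (n < 0 || (size_t)n >= sizeof my_errors / sizeof my_errors[0])
    return -1;
  va_start(ap, n);
  vsnprintf(buf, len, my_errors[n].fmt, ap);
  va_end(ap);
  return my_errors[n].flags;
}
```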
Useful support functions for writing CPU-, Syntax- and Output-modules.
void *mymalloc(size_t sz)
Allocate memory. See also mycalloc() and myrealloc().
void myfree(void *p)
Free an allocated memory block with one of the functions above.
uint64_t readval(int be,void *src,size_t size)
Reads an unsigned value with size target-bytes and byte-ordering be from the src pointer (0 is little-, 1 is big-endian).
void *setval(int be,void *dest,size_t size,uint64_t val)
Stores the value val to dest, which has a size of size target-bytes,
using a byte-ordering of be (0 is little-, 1 is big-endian).
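The byte-ordering rules of readval() and setval() can be pictured with a small standalone model. This is a sketch only, assuming 8-bit target-bytes (BITSPERBYTE == 8); my_readval() and my_setval() are hypothetical names, and returning the pointer just past the stored value is an assumption, not documented behaviour of setval().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of readval() for 8-bit target-bytes only. */
static uint64_t my_readval(int be, const void *src, size_t size)
{
  const uint8_t *p = src;
  uint64_t val = 0;
  size_t i;

  for (i = 0; i < size; i++) {
    /* big-endian: most significant byte first; little-endian: last */
    uint8_t b = be ? p[i] : p[size - 1 - i];
    val = (val << 8) | b;
  }
  return val;
}

/* Hypothetical model of setval() for 8-bit target-bytes only. */
static void *my_setval(int be, void *dest, size_t size, uint64_t val)
{
  uint8_t *p = dest;
  size_t i;

  for (i = 0; i < size; i++) {
    /* least significant byte goes to the end (be) or the start (le) */
    p[be ? size - 1 - i : i] = (uint8_t)(val & 0xff);
    val >>= 8;
  }
  return p + size;   /* assumption: pointer just past the stored value */
}
```

Storing 0x12345678 big-endian yields the bytes 12 34 56 78, and reading them back with the same be flag returns the original value.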
uint64_t readbits(int be,void *p,unsigned bfsize,unsigned offset,unsigned size)
Reads an unsigned value from the bitfield of bfsize bits at p,
extracting size bits at bit-offset offset.
The endianness be is used while reading the bitfield.
void setbits(int be,void *p,unsigned bfsize,unsigned offset,unsigned size,uint64_t d)
Writes the value d with size bits into a bitfield at p with bfsize
bits, starting at bit-offset offset, following the rules of the given
endianness be.
utaddr readbyte(void *src)
Reads an unsigned target-byte from src.
void writebyte(void *dest,utaddr val)
Stores the value val into a target-byte at dest.
OCTETS(n)
This macro expands to the number of 8-bit host-bytes required for n
target-bytes.
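As an illustration of host-bytes versus target-bytes, the following sketch assumes a hypothetical target with 12-bit bytes and assumes that every target-byte occupies the smallest whole number of 8-bit host-bytes. The OCTETS definition shown is that assumption written out, not a copy of vasm's macro; put_tbyte() is a hypothetical helper that stores one target-byte with its host-bytes in big-endian order, as described for BITSPERBYTE > 8.

```c
#include <stdint.h>

#define BITSPERBYTE 12   /* hypothetical target with 12-bit bytes */

/* Assumed definition: each target-byte takes ceil(BITSPERBYTE/8) octets. */
#define OCTETS(n) ((n) * ((BITSPERBYTE + 7) / 8))

/* Store the low BITSPERBYTE bits of v as one target-byte, most
   significant host-byte first. */
static void put_tbyte(uint8_t *dst, unsigned v)
{
  int i;

  v &= (1u << BITSPERBYTE) - 1;
  for (i = OCTETS(1) - 1; i >= 0; i--) {
    dst[i] = (uint8_t)(v & 0xff);
    v >>= 8;
  }
}
```

With 12-bit target-bytes, OCTETS(3) is 6, and the target-byte 0xABC is stored as the host-bytes 0x0A 0xBC.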
expr *parse_expr(char **pp)
Parses any expression starting at *pp. Updates *pp to point
at the next character after the expression.
expr *parse_expr_tmplab(char **pp)
Same as parse_expr(), but defines a temporary label for the PC
symbol, so it does not change its value when this expression is
evaluated at other section locations.
expr *parse_expr_huge(char **pp)
Parses a constant expression (no labels allowed) with 128 bit integers.
expr *parse_expr_float(char **pp)
Parses a floating point expression (no labels allowed), if the
backend supports floating point constants (defines FLOAT_PARSER).
taddr parse_constexpr(char **pp)
Parses an expression which is immediately evaluated to a constant value. Generates an error message and returns zero when it depends on non-constant, 128-bit (huge) or floating point values.
void free_expr(expr *tree)
Free an expression.
int type_of_expr(expr *tree)
Returns the type of an expression, which may be either NUM,
HUG (128 bit constant) or FLT (floating point). This
tells you which evaluation function to use (see below).
void simplify_expr(expr *tree)
Try to evaluate the expression as far as possible. Subexpressions only containing constants or absolute symbols are simplified.
int eval_expr(expr *tree,taddr *result,section *sec,taddr pc)
Evaluate an expression using the pc in section sec and
store the result in *result. The return value is non-zero
when the result is constant (i.e. only depends on constants or absolute
symbols).
int eval_expr_huge(expr *tree,thuge *result)
Evaluate a constant 128-bit integer expression, which is written to
*result. The return value becomes zero when there were problems
(like unsupported operations).
int eval_expr_float(expr *tree,tfloat *result)
Evaluate a constant floating point expression and write the result to
*result. The return value becomes zero when there were problems
(like unsupported operations).
int find_base(expr *p,symbol **base,section *sec,taddr pc)
Tests whether an expression is based only on one non-absolute symbol
plus constants, or minus a label. Writes either that symbol pointer or
NULL to *base.
The return value defines the type of base-relative operation, which is
BASE_OK (normal base plus constant), BASE_PCREL (base minus label is
pc-relative) or BASE_ILLEGAL (illegal arithmetic operation with base symbol).
expr *number_expr(taddr val)
Create a constant expression from val.
expr *huge_expr(thuge val)
Create a constant 128-bit expression from val.
expr *float_expr(tfloat val)
Create a constant floating point expression from val.
symbol *new_abs(const char *name,expr *tree)
Create a new absolute symbol (type EXPRESSION).
symbol *new_equate(const char *name,expr *tree)
Same as new_abs(), but sets the EQUATE flag to indicate that
the symbol must not change its value (unlike a .set symbol).
symbol *new_import(const char *name)
Add an externally defined symbol name.
symbol *new_labsym(section *sec,const char *name)
Create a label symbol (type LABSYM) for the current PC in the given
section. Uses the current/default section when sec is NULL.
symbol *new_tmplabel(section *sec)
Create a temporary label symbol with a unique name for the given section.
symbol *internal_abs(const char *name)
Create an internal absolute symbol (type EXPRESSION), which gets
the flag VASMINTERN, so it is never written into object files.
regsym *new_regsym(int redef,int no_case,const char *name,int type,unsigned int flags,unsigned int num)
Create a new CPU register symbol with name name and number num.
The type and flags can be used by the backend as needed (for example,
to distinguish integer and floating point registers).
A non-zero redef allows redefinition of a register symbol without an
error message. A non-zero no_case defines that the case of the
symbol name is ignored.
regsym *find_regsym(const char *name,int len)
Find a register symbol whose name starts at name and is len characters long.
regsym *find_regsym_nc(const char *name,int len)
Like above, but ignores the case of the symbol.
instruction *new_inst(const char *inst,int len,int op_cnt,char **op,int *op_len)
Traverses the backend’s mnemonic table for instructions called inst
and tries to parse op_cnt operands using the cpu module’s
parse_operand() function. Returns an instruction pointer when the
requirements from the mnemonic table were met, or NULL otherwise.
dblock *new_dblock(void)
Allocate a new dblock structure for storing constant data.
sblock *new_sblock(expr *space,size_t size,expr *fill)
Allocate a new sblock structure for storing space elements
of size target-bytes, filled with fill, or zeros when fill is NULL.
atom *new_atom(int type,taddr align)
Allocate a new atom with given type and alignment.
atom *new_inst_atom(instruction *p)
Allocate a new INSTRUCTION atom from p.
atom *new_data_atom(dblock *p,taddr align)
Allocate a new DATA atom with alignment align from p.
atom *new_label_atom(symbol *p)
Allocate a new LABEL atom from symbol p.
atom *new_space_atom(expr *space,size_t size,expr *fill)
Allocate a new SPACE atom with space elements of size target-bytes,
filled with fill, or zeros when fill is NULL.
atom *new_datadef_atom(size_t bitsize,operand *op)
Allocate a new DATADEF atom from operand op with bitsize bits.
void add_atom(section *sec,atom *a)
Adds a new atom to the end of the specified section. If sec is NULL,
the current section is used. If there is no current section yet,
a default section is created.
A new syntax module must have its own subdirectory under ‘vasm/syntax’. At least the files ‘syntax.h’, ‘syntax.c’ and ‘syntax_errors.h’ must be written.
#define ISIDSTART(x)/ISIDCHAR(x)
These macros should return non-zero if and only if the argument is a
valid character to start an identifier or a valid character inside an
identifier, respectively. ISIDCHAR must be a superset of ISIDSTART.
#define ISBADID(p,l)
Even with ISIDSTART and ISIDCHAR checked, there may be
combinations of characters which do not form a valid identifier (for
example, a single character). This macro returns non-zero when this is
the case. The first argument is a pointer to the new identifier and the
second is its length.
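A minimal sketch of these three macros for a hypothetical syntax in which identifiers start with a letter, '_' or '.', may continue with digits and '$', and a lone "." is rejected. The character rules are an assumption for illustration only. Defining ISIDCHAR in terms of ISIDSTART is one simple way to guarantee the required superset relation.

```c
#include <ctype.h>

/* Example rules: start with a letter, '_' or '.'. */
#define ISIDSTART(x) ((x) == '.' || (x) == '_' || isalpha((unsigned char)(x)))

/* Continue with any start character, a digit or '$' (superset of ISIDSTART). */
#define ISIDCHAR(x)  (ISIDSTART(x) || (x) == '$' || isdigit((unsigned char)(x)))

/* A lone "." passes the character tests but is not a valid identifier here. */
#define ISBADID(p,l) ((l) == 1 && *(p) == '.')
```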
#define ISEOL(x)
This macro returns true when x points to either a comment character
or the end of the line.
#define CHKIDEND(s,e) chkidend((s),(e))
Defines an optional function to be called at the end of the identifier
recognition process. It allows you to adjust the length of the identifier
by returning a modified e. Default is to return e. The
function is defined as char *chkidend(char *startpos,char *endpos).
#define BOOLEAN(x) -(x)
Defines the result of boolean operations. Usually this is (x), as
in C, or -(x) to return -1 for True.
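The effect of the -(x) variant is easiest to see on a comparison. In this small sketch, eval_eq() is a hypothetical stand-in for the part of an expression evaluator that handles "==".

```c
/* BOOLEAN(x) as -(x): a true C comparison (1) becomes -1, i.e. all bits set. */
#define BOOLEAN(x) -(x)

/* Hypothetical evaluation of an "==" operator. */
static long eval_eq(long a, long b)
{
  return BOOLEAN(a == b);
}
```

So eval_eq(3,3) yields -1 and eval_eq(3,4) yields 0, which lets the result be used directly as a bit mask.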
#define NARGSYM "NARG"
Defines the name of an optional symbol which contains the number of arguments in a macro.
#define CARGSYM "CARG"
Defines the name of an optional symbol which can be used to select a
specific macro argument with \., \+ and \-.
#define REPTNSYM "REPTN"
Defines the name of an optional symbol containing the counter of the current repeat iteration.
#define EXPSKIP() s=exp_skip(s)
Defines an optional replacement for skip() to be used in expr.c to skip
blanks in an expression. Useful to forbid blanks in an expression and to
ignore the rest of the line (e.g. to treat the rest as comment). The
function is defined as char *exp_skip(char *stream).
#define IGNORE_FIRST_EXTRA_OP 1
Should be defined as non-zero (true) if the syntax module wants to ignore the operand field on instructions without an operand. Useful, when everything following the operand should be regarded as comment, without requiring a comment character.
#define MAXMACPARAMS 35
Optionally defines the maximum number of macro arguments, if you need more than the default number of 9.
#define SKIP_MACRO_ARGNAME(p) skip_identifier(p)
An optional function to skip a named macro argument in the macro definition. Argument is the current source stream pointer. The default is to skip an identifier.
#define MACRO_ARG_OPTS(m,n,a,p) NULL
An optional function to parse and skip options, default values and
qualifiers for each macro argument. Returns NULL when no argument
options have been found.
Arguments are:
struct macro *m;
Pointer to the macro structure being currently defined.
int n;
Argument index, starting with zero.
char *a;
Name of this argument.
char *p;
Current source stream pointer. An updated pointer will be returned.
Defaults to unused.
#define MACRO_ARG_SEP(p) (*p==',' ? skip(p+1) : NULL)
An optional function to skip a separator between the macro argument names in the macro definition. Returns NULL when no valid separator is found. Argument is the current source stream pointer. Defaults to using comma as the only valid separator.
#define MACRO_PARAM_SEP(p) (*p==',' ? skip(p+1) : NULL)
An optional function to skip a separator between the macro parameters in a macro call. Returns NULL when no valid separator is found. Argument is the current source stream pointer. Defaults to using comma as the only valid separator.
#define EXEC_MACRO(s)
An optional function to be called just before a macro starts execution.
Parameters and qualifiers are already parsed.
Argument is the source pointer of the new macro.
Defaults to unused.
A syntax module has to provide the following elements (all other functions
should be static to prevent name clashes):
const char *syntax_copyright;
A string that will be emitted as part of the copyright message.
hashtable *dirhash;
A pointer to the hash table with all directives.
char commentchar;
A character used to introduce a comment until the end of the line.
int dotdirs;
Define dotdirs as non-zero when the syntax module works with
directives starting with a dot (.).
int init_syntax(void);
Will be called during startup, after argument parsing. Must return zero if initializations failed, non-zero otherwise.
int syntax_args(char *);
This function will be called with the command line arguments (unless they were already recognized by other modules). If an argument was recognized, return non-zero.
int syntax_defsect(void);
Lets the syntax module define a default section, which is used when no
section was created by any section or org directive before
the first code or data is defined.
May set defsectname, defsecttype and defsectorg
accordingly and return with non-zero. Or return with zero and accept
the defaults, which are: defsectname=".text" and defsecttype="acrx".
char *skip(char *);
A function to skip whitespace etc.
void eol(char *);
This function should check that the argument points to the end of a line (only comments or whitespace following). If not, an error or warning message should be emitted.
char *const_prefix(char *,int *);
Check if the first argument points to the start of a constant. If yes, return a pointer to the real start of the number (i.e. skip a prefix that may indicate the base) and write the base of the number through the pointer passed as second argument. Return zero (NULL) if it does not point to a number.
char *const_suffix(char *,char *);
First argument points to the start of the constant (including prefix) and the second argument to first character after the constant (excluding suffix). Checks for a constant-suffix and skips it. Return pointer to the first character after that constant. Example: constants with a ’h’ suffix to indicate a hexadecimal base.
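A minimal sketch of both functions for a hypothetical syntax that accepts $hex, %binary, 0xhex, decimal and hexadecimal with an 'h' suffix. The bodies are illustrative assumptions, not the code of any existing vasm syntax module. Since const_suffix() cannot report a base, this sketch lets const_prefix() look ahead for the 'h' suffix to choose base 16, while const_suffix() only skips it.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical const_prefix(): returns the start of the digits and
   stores the base, or NULL when s does not start a constant. */
char *const_prefix(char *s, int *base)
{
  if (isdigit((unsigned char)*s)) {
    char *p = s;

    while (isxdigit((unsigned char)*p))
      p++;
    if (*p == 'h' || *p == 'H') {          /* e.g. 0FFh */
      *base = 16;
      return s;
    }
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X') &&
        isxdigit((unsigned char)s[2])) {   /* e.g. 0x1A */
      *base = 16;
      return s + 2;
    }
    *base = 10;
    return s;
  }
  if (*s == '$' && isxdigit((unsigned char)s[1])) {
    *base = 16;
    return s + 1;
  }
  if (*s == '%' && (s[1] == '0' || s[1] == '1')) {
    *base = 2;
    return s + 1;
  }
  return NULL;
}

/* Hypothetical const_suffix(): skips an 'h'/'H' after the digits. */
char *const_suffix(char *start, char *end)
{
  (void)start;   /* not needed for this simple suffix check */
  if (*end == 'h' || *end == 'H')
    return end + 1;
  return end;
}
```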
void parse(void);
This is the main parsing function. It has to read source text lines via
the read_next_line() function, parse them and create sections,
atoms and symbols. Pseudo directives are usually handled by the syntax
module. Instructions can be parsed by the cpu module using
parse_instruction().
char *parse_macro_arg(struct macro *,char *,struct namelen *,struct namelen *);
Called to parse a macro parameter by using the source stream pointer in
the second argument. The start pointer and length of a single passed
parameter are written to the first struct namelen, while the optionally
selected named macro argument is passed in the second struct namelen.
When the len field of the second namelen is zero, the
argument is selected by position instead of by name. Returns the updated
source stream pointer after successful parsing.
int expand_macro(source *,char **,char *,int);
Expand parameters and special commands inside a macro source. The second
argument is a pointer to the current source stream pointer, which is
updated on any successful expansion. The function will return the
number of characters written to the destination buffer (third argument)
in this case. Returning -1 means: no expansion took place.
The last argument defines the space in characters which is left in the
destination buffer.
char *get_local_label(char **);
Gets a pointer to the current source pointer. Has to check if a valid local label is found at this point. If yes return a pointer to the vasm-internal symbol name representing the local label and update the current source pointer to point behind the label.
Have a look at the support functions provided by the frontend to help.
Syntax modules may support additional features, which can be enabled or
disabled by a preprocessor define.
For example, the std-syntax module can allow the # character to introduce
comments when the CPU’s operand parser doesn’t need it (e.g. PPC or x86).
Defines for these optional features follow the general syntax
[module type]_[module name]_[feature name], like:
#define SYNTAX_STD_COMMENTCHAR_HASH
A new cpu module must have its own subdirectory under ‘vasm/cpus’. At least the files ‘cpu.h’, ‘cpu.c’ and ‘cpu_errors.h’ must be written.
A cpu module has to provide the following elements (all other functions
should be static to prevent name clashes) in cpu.h:
#define LITTLEENDIAN 1
#define BIGENDIAN 0
Define these according to the target endianness. For CPUs which support big-
and little-endian, you may assign a global variable here. So be aware of
it, and never use #if BIGENDIAN, but always if(BIGENDIAN) in
your code.
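The following sketch shows why the runtime test matters. For a hypothetical bi-endian cpu, BIGENDIAN expands to a variable (cpu_be, an invented name that cpu_args() might set). A preprocessor test like #if BIGENDIAN would silently treat the unknown identifier cpu_be as 0, whereas an ordinary if() sees the current value. put_opcode16() is likewise hypothetical.

```c
#include <stdint.h>

/* Hypothetical: selected at runtime, e.g. from a command line option. */
static int cpu_be = 1;
#define BIGENDIAN    (cpu_be)
#define LITTLEENDIAN (!cpu_be)

/* Encode a 16-bit opcode in the currently selected byte order. */
static void put_opcode16(uint8_t *d, uint16_t op)
{
  if (BIGENDIAN) {
    d[0] = (uint8_t)(op >> 8);
    d[1] = (uint8_t)(op & 0xff);
  } else {
    d[0] = (uint8_t)(op & 0xff);
    d[1] = (uint8_t)(op >> 8);
  }
}
```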
#define VASM_CPU_<cpu> 1
Defines a cpu-specific macro. May be used to perform special handling in syntax- or output-modules.
#define BITSPERBYTE 8
The number of bits per byte of the target cpu. Usually 8. We require
that vasm is running on a host architecture and file system which uses
8-bit bytes. When writing output for a backend with BITSPERBYTE > 8
the vasm-internal ordering of 8-bit host-bytes within a target-byte
is big-endian.
#define MAX_OPERANDS 3
Maximum number of operands of one instruction.
#define MAX_QUALIFIERS 0
Maximum number of mnemonic-qualifiers per mnemonic.
#define NO_MACRO_QUALIFIERS
Define this, when qualifiers shouldn’t be allowed for macros. For some architectures, like ARM, macro qualifiers make no sense.
typedef int32_t taddr;
Data type to represent a target-address. Preferably use the types from
‘stdint.h’. Does not necessarily have to match the cpu’s address
bus size (refer to bytespertaddr), but choose it according to the
largest data you will be able to do calculations with. For example, you may
want to allow 32-bit data definitions for an 8-bit cpu.
typedef uint32_t utaddr;
Unsigned data type to represent a target-address.
#define INST_ALIGN 2
Minimum instruction alignment.
#define DATA_ALIGN(n) ...
Default alignment for n-bit data. Can also be a function.
#define DATA_OPERAND(n) ...
Operand class for n-bit data definitions. Can also be a function. Negative values denote a floating point data definition of -n bits.
typedef ... operand;
Structure to store an operand for a machine instruction or a data constant. Stores, for example, addressing modes and expressions.
typedef ... mnemonic_extension;
Mnemonic extension for the cpu’s instruction table. Often used for the actual opcode or cpu-model flags.
Optional features, which can be enabled by defining the following macros:
#define FLOAT_PARSER 1
Enables the floating point parser and floating point evaluation in the
expression module. With this option the backend has to be prepared that
expressions may contain floating point constants, which can be checked
by testing the result of type_of_expr(expression) for FLT.
Then use eval_expr_float(expression,&float_val) to retrieve the
floating point value with type tfloat.
It is up to the backend to convert the host’s floating point format,
which should be IEEE, into the backend’s native format. The vasm frontend
only supports IEEE to IEEE conversion via conv2ieee32() and conv2ieee64().
#define HAVE_INSTRUCTION_EXTENSION 1
If cpu-specific data should be added to all instruction atoms.
typedef ... instruction_ext;
Type for the above extension.
#define CLEAR_OPERANDS_ON_START 1
Backend requires zeroed operand structures when calling parse_operand()
for the first time. Might be useful to parse operands only once.
Defaults to undefined.
#define CLEAR_OPERANDS_ON_MNEMO 1
Backend requires zeroed operand structures when calling parse_operand()
for any new mnemonic. Useful to parse the same operand multiple times on
the current mnemonic, but reset everything for the next mnemonic.
Defaults to undefined.
START_PARENTH(x)
Valid opening parenthesis for instruction operands. Defaults to '('.
END_PARENTH(x)
Valid closing parenthesis for instruction operands. Defaults to ')'.
#define MNEMONIC_VALID(idx)
An optional function with the arguments (int idx). Returns true
when the mnemonic with index idx is valid for the current state of
the backend (e.g. it is available for the selected cpu model).
#define MNEMOHTABSIZE 0x4000
You can optionally overwrite the default hash table size defined in ‘vasm.h’. May be necessary for larger mnemonic tables. Run vasm with option ‘-debug’ to print the number of collisions in the hash tables.
#define OPERAND_OPTIONAL(p,t)
When defined, this is a function with the arguments
(operand *op,int type), which returns true when the given operand
type (type) is optional. The function is only called for missing
operands and should also initialize op with default values (e.g. 0).
Implementing additional target-specific unary operations is done by defining the following optional macros:
#define EXT_UNARY_NAME(s)
Should return True when the string in s points to an operation name
we want to handle.
#define EXT_UNARY_TYPE(s)
Returns the operation type code for the string in s. Note that the
last valid standard operation is defined as LAST_EXP_TYPE, so the
target-specific types will start with LAST_EXP_TYPE+1.
#define EXT_UNARY_EVAL(t,v,r,c)
Defines a function with the arguments (int t, taddr v, taddr *r, int c)
to handle the operation type t, returning an int to indicate
whether this type has been handled or not. Your operation will be applied on
the value v and the result is stored in *r. The flag c
is passed as 1 when the value is constant (no relocatable addresses involved).
#define EXT_FIND_BASE(b,e,s,p)
Defines a function with the arguments
(symbol **b, expr *e, section *s, taddr p)
to save a pointer to the base symbol of expression e into the
symbol pointer pointed to by b. The type of this base is given
by an int return code. Furthermore, e->type has to be checked
to be one of the operations to handle.
The section pointer s and the current pc p are needed to call
the standard find_base() function.
A cpu module has to provide the following elements (all other functions
and data should be static to prevent name clashes) in cpu.c:
int bytespertaddr;
The number of bytes per target address. Note that this really
defines the size of a backend’s address pointer in target-bytes and
might differ from the actual size of taddr (see above).
mnemonic mnemonics[];
The mnemonic table keeps a list of mnemonic names and operand types the
assembler will match against using parse_operand(). It may also
include a target-specific mnemonic_extension.
const char *cpu_copyright;
A string that will be emitted as part of the copyright message.
const char *cpuname;
A string describing the target cpu.
int init_cpu(void);
Will be called during startup, after argument parsing. Must return zero if initializations failed, non-zero otherwise.
int cpu_args(char *);
This function will be called with the command line arguments (unless they were already recognized by other modules). If an argument was recognized, return non-zero.
char *parse_cpu_special(char *);
This function will be called with a source line as argument and allows
the cpu module to handle cpu-specific directives etc. Functions like
eol() and skip() from the syntax module should be used to
keep the syntax consistent.
operand *new_operand();
Allocate and initialize a new operand structure.
int parse_operand(char *text,int len,operand *op,int requires);
Parses the source at text with length len to fill the target-specific
operand structure pointed to by op. Return with one
of the following codes:
PO_NOMATCH
The source did not match the operand type given in requires.
PO_CORRUPT
The source was definitely identified as garbage, making it useless to try matching it against any other operand types.
PO_MATCH
The parsed source matches the operand type in requires. As soon
as all the instruction’s operands have been matched, the instruction
is successfully recognized.
PO_SKIP
Works like PO_MATCH, but skips the next operand from the mnemonic
table. For example, because it was already handled together with the
current operand.
PO_COMB_OPT
Works like PO_MATCH, but requests parsing of the next argument from
the source text, if any, with a pointer to the same operand structure
as before. This makes it possible to merge multiple operands into a
single operand structure.
PO_COMB_REQ
Like PO_COMB_OPT, requests parsing of the next argument
with a pointer to the same operand structure. But this time the
additional argument is mandatory.
PO_NEXT
Source did not match the given operand type in requires. Request
parsing the same chunk of source text again, but using the following
operand type. Can be used to break a PO_COMB_OPT or PO_COMB_REQ
attempt and continue normally.
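A minimal parse_operand() for a hypothetical cpu with registers r0..r15 and immediates written as #number, showing when the three basic codes might be returned. The PO_* values, the operand types and the operand structure are placeholders so that the sketch stands alone; a real module uses vasm's PO_* definitions and its own cpu.h types. It also assumes text lies inside a NUL-terminated source line, so strtol() may look past len but the result is only accepted if it ends exactly at text+len.

```c
#include <ctype.h>
#include <stdlib.h>

/* Placeholders; a real module uses vasm's PO_* codes. */
enum { PO_NOMATCH, PO_CORRUPT, PO_MATCH };
enum { OP_REG = 1, OP_IMM };   /* hypothetical operand types */

typedef struct {
  int type;
  long value;
} operand;

int parse_operand(char *text, int len, operand *op, int requires)
{
  char *end;
  long v;

  if (len < 1)
    return PO_NOMATCH;
  if (text[0] != 'r' && text[0] != 'R' && text[0] != '#')
    return PO_CORRUPT;   /* no operand type of this cpu starts like this */

  if (requires == OP_REG && (text[0] == 'r' || text[0] == 'R') &&
      len > 1 && isdigit((unsigned char)text[1])) {
    v = strtol(text + 1, &end, 10);
    if (end != text + len || v > 15)
      return PO_NOMATCH;
    op->type = OP_REG;
    op->value = v;
    return PO_MATCH;
  }
  if (requires == OP_IMM && text[0] == '#') {
    v = strtol(text + 1, &end, 0);   /* decimal, 0x hex or 0 octal */
    if (end == text + 1 || end != text + len)
      return PO_NOMATCH;
    op->type = OP_IMM;
    op->value = v;
    return PO_MATCH;
  }
  return PO_NOMATCH;   /* let vasm try the next operand type */
}
```

Returning PO_CORRUPT early saves vasm from trying the remaining operand types of the mnemonic table, while PO_NOMATCH leaves room for another table entry to match.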
size_t instruction_size(instruction *ip, section *sec, taddr pc);
Returns the size of the instruction ip in target-bytes, which, in the
final pass, must be identical to the number of bytes written by
eval_instruction() (see below).
dblock *eval_instruction(instruction *ip, section *sec, taddr pc);
Converts the instruction ip into a DATA atom, including relocations,
if necessary.
dblock *eval_data(operand *op, taddr bitsize, section *sec, taddr pc);
Converts a data operand into a DATA atom, including relocations.
void init_instruction_ext(instruction_ext *);
(If HAVE_INSTRUCTION_EXTENSION is set.)
Initialize an instruction extension.
char *parse_instruction(char *,int *,char **,int *,int *);
(If MAX_QUALIFIERS is greater than 0.)
Parses an instruction and saves the extension locations.
int set_default_qualifiers(char **,int *);
(If MAX_QUALIFIERS is greater than 0.)
Saves pointers and lengths of default qualifiers for the selected CPU and
returns the number of default qualifiers. Example: for a M680x0 CPU this
would be a single qualifier, called "w". Used by execute_macro().
cpu_opts_init(section *);
(If HAVE_CPU_OPTS is set.)
Gives the cpu module the chance to write out OPTS atoms with
initial settings before the first atom for a section is generated.
cpu_opts(void *);
(If HAVE_CPU_OPTS is set.)
Apply option modifications from an OPTS atom. For example:
change cpu type or optimization flags.
print_cpu_opts(FILE *,void *);
(If HAVE_CPU_OPTS is set.)
Called from print_atom() to print an OPTS atom’s contents.
Output modules can be chosen at runtime rather than compile time. Therefore, several output modules are linked into one vasm executable and their structure differs somewhat from syntax and cpu modules.
Usually, an output module for some object format fmt should be contained
in a file ‘output_<fmt>.c’ (it may use/include other files if necessary).
To automatically include this format in the build process, the
OUTFMTS definition in ‘make.rules’ has to be extended.
The module should be added to the OBJS variable
at the start of ‘make.rules’. Also, a dependency line should be added
(see the existing output modules).
An output module must only export a single function which will return pointers to necessary data/functions. This function should have the following prototype:
int init_output_<fmt>( char **copyright, void (**write_object)(FILE *,section *,symbol *), int (**output_args)(char *) );
In case of an error, zero must be returned. Otherwise, it should perform all necessary initializations, return non-zero and return the following output parameters via the pointers passed as arguments:
copyright
A pointer to the copyright string.
write_object
A pointer to a function emitting the output. It will be called after the assembler has completed and will receive pointers to the output file, to the first section of the section list and to the first symbol in the symbol list. See the section on general data structures for further details.
output_args
A pointer to a function checking arguments. It will be called with all command line arguments (unless already handled by other modules). If the output module recognizes an appropriate option, it has to handle it and return non-zero. If it is not an option relevant to this output module, zero must be returned.
Finally, a call to init_output_<fmt>() has to be added to the
init_output() function in ‘vasm.c’ (should be self-explanatory).
Besides assigning the above mentioned function pointers, this function
can be used to redefine the assembler’s behaviour.
For example you may optionally set the following global variables:
asciiout = 1;
Set when the output module likes the output file to be opened in text-mode instead of binary-mode.
unnamed_sections = 1;
Set when the output module cannot handle section names. Usually such an output module differentiates sections by their type only: text, data or bss.
secname_attr = 1;
Set when the section attributes are used to differentiate between two sections with the same name.
output_bitsperbyte = 1;
Set when the output module supports target-bytes with BITSPERBYTE bits,
i.e. not only 8-bit target-bytes.
Otherwise it is expected that all output modules work at least
with 8-bit target-bytes.
Writing a section’s contents is typically done by traversing all
of the section’s atoms, establishing alignment and writing the contents of
each DATA or SPACE atom into the output file using fwdblock() or
fwsblock().
FILE *f;      /* output file */
section *s;   /* section to write */
atom *p;
taddr pc,npc;

for (p=s->first,pc=s->org; p; p=p->next) {
  npc = fwpcalign(f,p,s,pc);        /* pad up to the atom's alignment */
  if (p->type == DATA)
    fwdblock(f,p->content.db);
  else if (p->type == SPACE)
    fwsblock(f,p->content.sb);
  pc = npc + atom_size(p,s,npc);    /* advance pc past this atom */
}
Useful support functions for output modules, when writing data into the output file:
void fw8(FILE *f,uint8_t x)
Write 8 bits of data.
void fw16(FILE *f,uint16_t x,int be)
Write 16 bits of data with endianness be (0 is little, 1 is big).
void fw24(FILE *f,uint32_t x,int be)
Write 24 bits of data with endianness be (0 is little, 1 is big).
void fw32(FILE *f,uint32_t x,int be)
Write 32 bits of data with endianness be (0 is little, 1 is big).
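How the be flag affects the byte order in the file can be modelled with standard stdio calls. my_fw16() and my_fw32() are hypothetical stand-ins used only to illustrate the ordering; an output module would call vasm's fw16() and fw32() directly.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of fw16(): be=1 big-endian, be=0 little-endian. */
static void my_fw16(FILE *f, uint16_t x, int be)
{
  if (be) {
    fputc(x >> 8, f);
    fputc(x & 0xff, f);
  } else {
    fputc(x & 0xff, f);
    fputc(x >> 8, f);
  }
}

/* Hypothetical model of fw32(), built from two 16-bit halves. */
static void my_fw32(FILE *f, uint32_t x, int be)
{
  if (be) {
    my_fw16(f, (uint16_t)(x >> 16), 1);
    my_fw16(f, (uint16_t)x, 1);
  } else {
    my_fw16(f, (uint16_t)x, 0);
    my_fw16(f, (uint16_t)(x >> 16), 0);
  }
}
```

Writing 0x11223344 produces the bytes 11 22 33 44 with be=1 and 44 33 22 11 with be=0.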
void fwdata(FILE *f,const void *buf,size_t n)
Write n 8-bit bytes of data.
void fwspace(FILE *f,size_t n)
Write n zeroed 8-bit bytes.
void fwbytes(FILE *f,void *buf,size_t n)
Write n target-bytes (BITSPERBYTE).
void fwdblock(FILE *f,dblock *db)
Write the target-bytes within a dblock.
void fwsblock(FILE *f,sblock *sb)
Write the target-bytes within a sblock.
void fwalign(FILE *f,taddr n,taddr align)
Write as many zero target-bytes as required to align address n
to align bytes.
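The amount of padding that fwalign() has to produce is simple modular arithmetic. A minimal sketch, using a hypothetical helper align_padding() and a stand-in taddr type, assuming align is positive and n is non-negative:

```c
#include <stdint.h>

typedef int64_t taddr;   /* stand-in for vasm's taddr */

/* Number of zero target-bytes needed to advance address n to the next
   multiple of align (align > 0, n >= 0). */
static taddr align_padding(taddr n, taddr align)
{
  taddr r = n % align;

  return r ? align - r : 0;
}
```

For example, aligning address 5 to 4 bytes needs 3 padding bytes, while an address that is already a multiple of the alignment needs none.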
int fwpattern(FILE *f,taddr n,uint8_t *pat,int patlen)
Write n target-bytes, which are initialized with pattern pat.
The patlen is given in target-bytes as well. Note that the
pattern output may be preceded by a number of zero-bytes when n
is not a multiple of patlen. The function returns non-zero if
that happened.
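The behaviour described above can be modelled for 8-bit target-bytes as follows. This is a sketch under an assumption: it takes the number of leading zero bytes to be n modulo patlen, which matches the description but is not taken from vasm's source; my_fwpattern() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of fwpattern() for 8-bit target-bytes: write the
   remainder n % patlen as leading zero bytes, then whole pattern copies.
   Returns non-zero if leading zeros were written. */
static int my_fwpattern(FILE *f, long n, const uint8_t *pat, int patlen)
{
  long zeros = n % patlen;
  long i;

  for (i = 0; i < zeros; i++)
    fputc(0, f);
  for (n -= zeros; n > 0; n -= patlen)
    fwrite(pat, 1, (size_t)patlen, f);
  return zeros != 0;
}
```

Filling 5 bytes with a 2-byte pattern such as 4E 71 (an M68k NOP) therefore gives 00 4E 71 4E 71, and the function reports that a zero byte had to be inserted.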
taddr fwpcalign(FILE *f,atom *a,section *sec,taddr pc)
Write as many target-bytes as required to achieve proper alignment
for atom a. This space will either be filled by a SPACE
atom’s fill-pattern, or otherwise by the section’s default pattern
(section.pad). The newly aligned pc is returned.
Some remarks:
fw* functions from above. Otherwise you can check the global variable
output_bytes_le, which will be zero for big-endian and non-zero for
little-endian target-byte output.
The default is vasm’s internal target-byte endianness, which is big-endian.
#ifdef VASM_CPU_MYCPU ... #endif or similar.
Also, if the selected CPU is not supported, the init function should fail.
output_error function.
As all output modules are linked together, they have a common list of error
messages in the file ‘output_errors.h’. If a new message is needed, this
file has to be extended (see the section on general data structures for
details).
When the cause for an error relates to an atom, you may also use
the output_atom_error function instead, which additionally prints
the atom’s source text line.
In ‘output_errors.h’ use the NOLINE flag when no atom is available.
vasm has a mechanism to specify rather complex relocations in a
standard way (see the section on general data structures). They can be
extended with CPU-specific relocations, but usually CPU modules will
try to create standard relocations (sometimes several standard relocations
can be used to implement a CPU-specific relocation). An output
module should try to find appropriate relocations supported by the
object format. The goal is to avoid special CPU-specific
relocations as much as possible.
Volker Barthelmann vb@compilers.de