The Blackfin BF532 has the following internal memory architecture (see Blackfin Manual):
- L1 instruction memory - SRAM (4-way set-associative cache)
  - 16kB instruction SRAM/cache: 0xFFA10000 - 0xFFA14000
  - 32kB SRAM instruction memory: 0xFFA08000 - 0xFFA10000
- L1 data memory - two banks of 2-way set-associative cache/SRAM, 16kB each
  - databank A: 0xFF804000 - 0xFF808000
  - databank B: 0xFF904000 - 0xFF908000
- L1 scratchpad RAM - 4kB, accessible as data SRAM, cannot be configured as cache memory: 0xFFB00000 - 0xFFB01000
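The address ranges can be sanity-checked against the stated sizes with a small host-side sketch. The type and function names (`l1_region_t`, `l1_region_size`, `l1_region_of`) are made up for illustration and are not part of the firmware; the end addresses are treated as exclusive, and the scratchpad is assumed to end at 0xFFB01000 so that it spans 4kB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One L1 region; `end` is exclusive. Names are hypothetical. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t end;
} l1_region_t;

const l1_region_t l1_regions[] = {
    { "instruction SRAM",       0xFFA08000u, 0xFFA10000u },
    { "instruction SRAM/cache", 0xFFA10000u, 0xFFA14000u },
    { "data bank A",            0xFF804000u, 0xFF808000u },
    { "data bank B",            0xFF904000u, 0xFF908000u },
    { "scratchpad",             0xFFB00000u, 0xFFB01000u },
};

#define L1_REGION_COUNT (sizeof l1_regions / sizeof l1_regions[0])

/* Size of a region in bytes. */
uint32_t l1_region_size(const l1_region_t *r)
{
    assert(r != NULL && r->end > r->start);
    return r->end - r->start;
}

/* Return the region containing addr, or NULL if addr is outside L1. */
const l1_region_t *l1_region_of(uint32_t addr)
{
    for (size_t i = 0; i < L1_REGION_COUNT; i++) {
        if (addr >= l1_regions[i].start && addr < l1_regions[i].end)
            return &l1_regions[i];
    }
    return NULL;
}
```

For example, `l1_region_size(&l1_regions[0])` evaluates to 32768 (32kB), and `l1_region_of(0xFF808000u)` returns NULL because that address is one past the end of databank A.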
For our firmware, we want to load:
- The instructions into the 32kB SRAM instruction memory.
- Signal chain calculations and intermediate values (W1), past samples for template comparison (T1), template match bits (MATCH), look-up tables (ENC_LUT and STATE_LUT), and radio packet frames into L1 databank A.
- Signal chain coefficients/parameters (A1), template and aperture coefficients (A1), and pointer values (FP_) into L1 databank B.
The memory layout and addresses are visualized in each firmware directory’s memory_firmwareVersion.ods spreadsheet and memory_firmwareVersion.h header file.
To compile our code to fit this memory structure, we use the Blackfin Linux bare-metal toolchain, a GNU GCC-based toolchain that compiles assembly and C/C++ code for the different Blackfin architectures. The installation comes with header files such as defBF532.h that list the memory addresses of the various MMRs and system registers.
After running make, the file Linker.map shows the order of the files linked into the final stage.dxe file, which includes:
- crt0.o - from crt0.asm. Includes the global start() routine, where the BF532 starts execution. Also includes most of the processor peripheral setup, such as PLL and clock frequency, and interrupt handler setup. See VisualDSP++ 5.0 C/C++ Compiler and Library Manual for Blackfin for details on the Blackfin execution process.
- radio_AGC_IIR_SAA.o - from radio_AGC_IIR_SAA.asm. This is where all the DSP and interesting code goes. The different firmware versions only really differ in this file.
- enc.o - from enc.asm, which is itself generated by enc_create.cpp. It’s a subroutine called in the data memory setup code in radio_AGC_IIR_SAA.asm to install the encoding look-up tables.
- divsi3.o - from common_bfin/divsi3.asm, a subroutine for signed division. Not actually used, but could be useful.
- main.o - from main.c. The Blackfin starts execution from crt0.asm, which eventually jumps to the main routine within main.c. Within the main routine, more Blackfin and Nordic radio chip configuration is done. The radio setup (such as the radio channel) is done through the SPI interface. At the end of main(), it jumps to the setup code radio_bidi_asm within radio_AGC_IIR_SAA.asm. See the section on Mixing C/C++ and Assembly Usage in VisualDSP++ 5.0 C/C++ Compiler and Library Manual for Blackfin.
- print.o - from common_bfin/print.h and common_bfin/print.c. Not needed for the headstage; it is a leftover from the compilation setup for the BF537.
- util.o - from common_bfin/util.h and common_bfin/util.c. The functions defined here are not needed for the headstage.
- spi.o - from common_bfin/spi.h and common_bfin/spi.c. Includes C functions to talk to the SPI port, which is needed for reading the flash memory and talking to the radio. These functions are called from the main() routine.
Finally, the file bftiny.x is the linker script used to produce the final binary .dxe file. Its author is unknown, but it works!
The assembly version of the final compiled code, with its memory addresses, can be found in decompile.asm. The code order follows the link order. See this document for more details on the compile process.
The flash process uploads the final stage.dxe file into the onboard flash memory. The Blackfin is wired to boot from flash: upon power-up, it reads the flash memory through SPI and loads the data and instructions into the corresponding memory addresses.
The architecture of the firmware (in radio_AGC_IIR_SAA.asm for the RHD-headstage or radio5.asm for the RHA-headstage) is essentially two threads: the DSP service routine reads in samples from the amplifiers and does the DSP, while the radio control thread handles radio transmission and reception.
The radio control thread (radio_loop) fills the SPI read/write registers and changes the radio's flags to transmit packets. It also reads the incoming packets and writes the desired changes to the requested memory locations. Interleaved between these instructions, the DSP service routine is called.
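The interleaving can be pictured with a host-side C sketch. This is only an illustration of the control flow, not the firmware itself, which is written in Blackfin assembly. The step functions, the `sched_stats_t` counters, and the function-pointer table are all assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for real hardware side effects. */
typedef struct {
    unsigned radio_steps;   /* radio-loop steps executed */
    unsigned dsp_calls;     /* DSP service routine invocations */
} sched_stats_t;

typedef void (*radio_step_fn)(sched_stats_t *);

/* Stand-in for the DSP service routine. On hardware it would wait for a
 * new set of samples, process them, and return; here it only counts. */
void dsp_service(sched_stats_t *s) { s->dsp_calls++; }

/* Stand-ins for short pieces of radio work (names are hypothetical). */
void radio_fill_spi(sched_stats_t *s)  { s->radio_steps++; }
void radio_set_flags(sched_stats_t *s) { s->radio_steps++; }
void radio_read_rx(sched_stats_t *s)   { s->radio_steps++; }

/* One pass of the radio loop: every radio step is followed by a DSP call,
 * so sample processing is never delayed by more than one radio step. */
void radio_loop_pass(const radio_step_fn *steps, size_t n, sched_stats_t *s)
{
    assert(steps != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        steps[i](s);
        dsp_service(s);
    }
}
```

Running one pass over the three stand-in steps leaves both counters at 3, which is the one-to-one alternation between radio work and DSP calls that the text describes.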
The DSP service routine blocks until a new set of samples (4 samples, one from each amplifier) is read and fully processed, then returns control to the radio control thread. To preserve state for each routine, circular buffers are used to store:
- DSP signal-chain calculated intermediate values (W1).
- DSP signal-chain constants and coefficient values (A1).
- Past stored samples for spike detection (T1).
- Data to be transmitted by the radio (packet buffer).
After all amplifier channels have been processed once, template matches and raw samples are written to the packet buffers. The radio control thread monitors the packet buffer and transmits packets when they are available.
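A minimal C sketch of such a circular packet buffer follows, with the DSP side as producer and the radio side as consumer. The slot count, packet size, and all names (`pkt_ring_t`, `pkt_ring_push`, `pkt_ring_pop`) are assumptions for illustration. The actual firmware manages its buffers in assembly, and their layout is given by the memory map files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_SLOTS 8u    /* hypothetical slot count; must be a power of two */
#define PKT_BYTES 32u   /* hypothetical packet size in bytes */

typedef struct {
    uint8_t  data[PKT_SLOTS][PKT_BYTES];
    uint32_t head;      /* total packets written by the DSP side */
    uint32_t tail;      /* total packets read by the radio side */
} pkt_ring_t;

/* Number of packets currently enqueued (unsigned wrap-around is safe). */
uint32_t pkt_ring_count(const pkt_ring_t *r) { return r->head - r->tail; }

/* DSP side: copy one packet in. Returns false if the ring is full. */
bool pkt_ring_push(pkt_ring_t *r, const uint8_t *pkt)
{
    assert(r != NULL && pkt != NULL);
    if (pkt_ring_count(r) == PKT_SLOTS)
        return false;
    memcpy(r->data[r->head & (PKT_SLOTS - 1u)], pkt, PKT_BYTES);
    r->head++;
    return true;
}

/* Radio side: copy the oldest packet out. Returns false if empty. */
bool pkt_ring_pop(pkt_ring_t *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    if (pkt_ring_count(r) == 0u)
        return false;
    memcpy(out, r->data[r->tail & (PKT_SLOTS - 1u)], PKT_BYTES);
    r->tail++;
    return true;
}
```

Masking the index with `PKT_SLOTS - 1` makes the wrap-around a single AND, which is why the slot count is restricted to a power of two in this sketch.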
One key limit of this threading architecture is the ADC sampling rate. The processor operates at 400MHz and the ADC onboard the Intan operates at 1MHz, which leaves 400MHz / 1MHz = 400 clock cycles per sample. That means the code for calling the DSP service routine, returning, executing an instruction in the radio loop, and calling the DSP routine again cannot exceed 400 clock cycles.
Whenever the code-length violates this limit, data corruption and glitches can happen.
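The budget arithmetic can be written out as a short C sketch. The two clock rates come from the text above; the function names and the cycle counts passed to `fits_budget` are hypothetical and would have to be measured or counted from the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CORE_CLOCK_HZ 400000000u  /* 400 MHz core clock, from the text */
#define ADC_RATE_HZ     1000000u  /* 1 MHz ADC rate, from the text */

/* Core clock cycles available between two consecutive ADC samples. */
uint32_t cycles_per_sample(uint32_t core_hz, uint32_t adc_hz)
{
    assert(adc_hz != 0u);
    return core_hz / adc_hz;
}

/* True if one DSP service call plus the longest radio-loop segment that
 * runs between two DSP calls fits inside the per-sample budget. */
bool fits_budget(uint32_t dsp_cycles, uint32_t radio_segment_cycles)
{
    return dsp_cycles + radio_segment_cycles
           <= cycles_per_sample(CORE_CLOCK_HZ, ADC_RATE_HZ);
}
```

With these rates `cycles_per_sample(CORE_CLOCK_HZ, ADC_RATE_HZ)` is 400, so a made-up split of 350 DSP cycles and 40 radio cycles fits, while 350 and 60 does not.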
Outside of the values stored in the circular buffers, all persistent variables that do not have their own registers, such as the number of packets enqueued, are stored at fixed offsets from the frame pointer to permit one-cycle access latency.
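The fixed-offset scheme can be mimicked in C with a struct whose field offsets play the role of the offset constants. The struct, its fields other than the packet count, and the `FP_*` names below are hypothetical; the real offsets are defined in the firmware's memory header files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the persistent variables behind the frame pointer. */
typedef struct {
    uint32_t packets_enqueued;  /* number of packets waiting for the radio */
    uint32_t tx_channel;        /* hypothetical field */
    uint32_t match_bits;        /* hypothetical field */
} fp_vars_t;

/* Byte offsets from the frame pointer (names are illustrative only). */
enum {
    FP_PACKETS_ENQUEUED = offsetof(fp_vars_t, packets_enqueued),
    FP_TX_CHANNEL       = offsetof(fp_vars_t, tx_channel),
    FP_MATCH_BITS       = offsetof(fp_vars_t, match_bits)
};

/* Load a 32-bit variable stored at a fixed offset from the frame pointer. */
uint32_t fp_load(const void *fp, size_t offset)
{
    uint32_t v;
    assert(fp != NULL);
    memcpy(&v, (const uint8_t *)fp + offset, sizeof v);
    return v;
}

/* Store a 32-bit variable at a fixed offset from the frame pointer. */
void fp_store(void *fp, size_t offset, uint32_t v)
{
    assert(fp != NULL);
    memcpy((uint8_t *)fp + offset, &v, sizeof v);
}
```

Because every offset is a compile-time constant, each access is a single load or store relative to one base register, which is the property the firmware relies on for its one-cycle access.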
The memory map spreadsheet is very helpful in debugging and understanding code execution.