Dec 24, 2015 - Wireless Headstage: AGC with Blackfin BF532 and Intan RHD2132

Wireless Project github

I have been working on improving our lab's old wireless headstage, built around Analog Devices' Blackfin BF532, Intan's RHA2136 32-channel analog amplifiers, and Nordic's nRF24L01 2.4GHz transceivers.
The improvement is replacing the RHA2136 with Intan's digital version, the RHD2132. The original headstage carries four RHA2136, each of which requires a voltage regulator and an ADC; both can be eliminated by using the RHD2132. The resulting (ideally working) board would be half the size.

The PCBs have been made. I originally imagined the firmware changes would be minimal -- essentially changing how the BF532 talks to the amplifiers through SPI, while keeping most of the signal-chain code intact. The code architecture should not have to change much.

However, it has proved far less plug-and-play than expected. The first issue was getting the Blackfin to talk to the Intans correctly via SPORT. Then I ran into a problem where the original signal-chain code between sample conversions took too long and resulted in data corruption. I therefore need to pick parts of the signal chain to cut or modify.

This post documents my work to understand and adapt the integrator-highpass-AGC, the first stage of the signal chain.

Background

The general idea of this wireless system is that we acquire samples from four amplifiers at the same time, then process two of these samples at a time. Signals at different stages of the signal chain can be streamed to gtkclient -- the PC GUI for display and spike-sorting. The samples are displayed in a Plexon-style interface, and templates are made using PCA. These templates are then sent back to the headstage. In the final firmware, at the end of the signal chain, template matching is performed, and a match (a spike) is declared if the most recent 16 samples on a given channel fall within a certain "aperture" of a template. (The template is a 16-point time series; the aperture is a threshold on the L1-norm of the difference between the template and the most recent 16 samples of the current channel.)
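As a sketch, the match test at the end of the signal chain amounts to an L1-norm comparison. The function name, waveform values, and aperture below are mine, purely for illustration:

```python
def l1_match(template, window, aperture):
    """Declare a spike if the most recent 16 samples are within
    `aperture` of the 16-point template, measured by the L1 norm."""
    dist = sum(abs(w - t) for w, t in zip(window, template))
    return dist < aperture

template = [0, 2, 7, 12, 6, -4, -9, -6, -2, 0, 1, 1, 0, 0, 0, 0]  # made-up spike shape
window   = [0, 1, 8, 11, 6, -5, -8, -6, -1, 0, 1, 0, 0, 0, 0, 0]  # recent samples
print(l1_match(template, window, aperture=10))   # True: total deviation is small
```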

Because of this, it would be nice to keep the input signal to template matching at a stable level -- a signal with the correct spike shape but a different amplitude may not be categorized as a match. Thus enters Automatic Gain Control (AGC).

Algorithm

The basic idea is that the user sets a reference level for the signal power. If the signal power is above this reference level, the front-end gain is decreased; if it is below, the gain is increased. There are all kinds of fancier implementations, including ones with different attack/release rates (different gain-decreasing/increasing rates). For embedded systems, I found this thread, and "TI DSP Applications With the TMS320 Family, Theory, Algorithms and Implementations, Volume 2" (pages 389-403), helpful in understanding how it works.
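In floating point, the core feedback loop looks roughly like this. This is a minimal sketch of the idea, not the fixed-point firmware; the names, step size, and clamping are illustrative:

```python
def agc_step(sample, gain, target, step=2**-8, gain_max=128.0):
    """One AGC update: scale the sample, then nudge the gain toward
    whatever makes the output level match the target."""
    out = sample * gain
    if abs(out) > target:
        gain -= step            # attack: output too hot, back off
    else:
        gain += step            # release: output too quiet, ramp up
    return out, min(max(gain, 0.0), gain_max)

# with a constant-amplitude input, the gain settles so |out| ~= target
gain = 1.0
for _ in range(1000):
    out, gain = agc_step(0.5, gain, target=1.0)
print(round(gain, 2))   # ~2.0
```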

Tim describes the firmware in his thesis, page 162:

Four samples from each of the 4 ADCs are read at once; as the blackfin works efficiently with two 16-bit samples at once, two samples (from two of the 4 headstages) are then processed at a time. These two come in at 12 bits unsigned, and must be converted to 1.15 signed fixed-point, which is done by pre-gain (usually 4) followed by an integrator highpass stage. The transfer function of the high pass stage is H(z)=4(1-z^-1)/(1-z^-1(1-mu)); the normal value of mu is 800/16384 [...] This is followed by an AGC stage which applies a variable gain 0-128 in 7.8 fixed-point format; the absolute value of the scaled sample is compared to the AGC target (stored as the square root to permit 32-bit integers) and the difference is saturated at 1 bit plus sign. This permits the gain to ramp from 0 to 127.999 in 2^15 clocks.

The accompanying diagram is shown below:

image1

Below is the firmware related to this stage. It is obvious now that I understand how it works, and Tim's thesis explanation now makes perfect sense (it did before too, but it read more like clues to an elaborate puzzle).

 //read in the samples -- SPORT0
 r0 = w[p0] (z); // SPORT0-primary: Ch0-31
 r1 = w[p0] (z); // SPORT0-sec:     Ch32-63
 r2.l = 0xfff;
 r0 = r0 & r2;
 r1 = r1 & r2;
 r1 <<= 16;  //secondary channel in the upper word.
 r2 = r0 + r1;
    
 //apply integrator highpass + gain (2, add twice).
 r5 = [i0++]; //r5 = 32000,-16384. (lo, high)
.align 8
    a0 = r2.l * r5.l, a1 = r2.h * r5.l || r1 = [i1++]; // r1 = integrated mean
    a0 += r2.l * r5.l, a1 += r2.h * r5.l || r6 = [i0++]; //r6 = 16384 (0.5), 800 (mu)
    r0.l = (a0 += r1.l * r5.h), r0.h = (a1 += r1.h * r5.h) (s2rnd);
    a0 = r1.l * r6.l , a1 = r1.h * r6.l; //integrator
    r2.l = (a0 += r0.l * r6.h), r2.h = (a1 += r0.h * r6.h) (s2rnd)

    /*AGC*/	|| r5 = [i1++] || r7 = [i0++]; //r5 = gain, r7 AGC targets (sqrt)
    a0 = r0.l * r5.l, a1 = r0.h * r5.h || [i2++] = r2; //save mean, above
    a0 = a0 << 8 ; // 14 bits in SRC (note *4 above), amp to 16 bits, which leaves 2 more bits for amplification (*4)
    a1 = a1 << 8 ; //gain goes from 0-128 hence (don't forget about sign)
    r0.l = a0, r0.h = a1 ;  //default mode should work (treat both as signed fractions)
    a0 = abs a0, a1 = abs a1;
    r4.l = (a0 -= r7.l * r7.l), r4.h = (a1 -= r7.h * r7.h) (is) //subtract target, saturate, store difference
        || r6 = [i0++]; //r6.l = 16384 (1), r6.h = 1 (mu)
    a0 = r5.l * r6.l, a1 = r5.h * r6.l || nop; //load the gain again & scale.
    r3.l = (a0 -= r4.l * r6.h), r3.h = (a1 -= r4.h * r6.h) (s2rnd) || nop; //r6.h = 1 (mu); within a certain range gain will not change.
.align 8
    r3 = abs r3 (v) || r7 = [FP - FP_WEIGHTDECAY]; //set weightdecay to zero to disable LMS.
    r4.l = (a0 = r0.l * r7.l), r4.h = (a1 = r0.h * r7.l) (is) || i1 += m1 || [i2++] = r3;
                //saturate the sample (r4), save the gain (r3).

Lines 2-8 store the two 12-bit samples in the low and high 16-bit halves of r2.

Lines 13 and 14 implement the pre-gain. r5.l=32000=0x7d00 is 0.9765625 in Q15. Multiplying a 12-bit number by 0x7d00 in the Blackfin's default mode is equivalent to converting the 12-bit UNSIGNED sample into signed Q15 format. This IS VERY IMPORTANT, as I shall see later. Line 14 does it again and adds to the original result. This means that by line 15, a0 and a1 contain the original samples multiplied by 2 (although r5.l should really be 0x7fff, which is closer to 1 in Q15 than 0x7d00).
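A quick way to see why this conversion works: a 12-bit unsigned sample has its top four bits clear, so reinterpreting the same 16-bit word as signed Q15 always yields a small positive fraction. A Python check (hypothetical helper, not from AGC_sim.py):

```python
def q15(word):
    """Interpret a 16-bit word as a signed Q15 fraction."""
    return (word - 0x10000 if word & 0x8000 else word) / 32768.0

sample = 0x0fff                       # full-scale 12-bit unsigned sample
print(q15(sample))                    # ~0.125: still a positive fraction
print(q15(sample) * q15(0x7d00))      # after the 0x7d00 pre-gain multiply
print(q15(0x8000))                    # but a full 16-bit sample can hit -1.0
```

The last line hints at the trouble ahead: once samples occupy all 16 bits, the sign bit is no longer guaranteed to be clear.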

Line 15: r1.l=integrated mean, r5.h=-16384=(Q15) -0.5. The multiply-accumulate (MAC) mode is s2rnd. So, in algebraic form, at the end of the instruction, a0=2x-0.5m[n-1] and r0.l=2*a0=4x-m[n-1]. Here x is the incoming sample, and m[n-1] is the previous integrated mean. Note that since x is 12 bits, a0 and r0 are at most 14 bits, and therefore still positive in Q15 format. Finally, r0 here is actually the output of the integrator-highpass stage (call it y), as also indicated by the diagram.

Line 16: as the comment hints, this loads 0.5*m[n-1] into the accumulator. Both numbers are in Q15; the result in the accumulator is Q31.

Line 17: from line 15, we have r0=4x-m[n-1]. r6.h=mu=800=(Q15)0.0244140625, a constant used to get the desired high-pass response. At the end of this instruction, a0=0.5m[n-1]+mu(4x-m[n-1]).

Since the 40-bit accumulator value is extracted into r2 in (s2rnd) mode, r2.l and r2.h become

2(0.5m[n-1]+mu(4x-m[n-1]))=m[n-1]+2mu(4x-m[n-1]).

As the comment in line 20 hints, this is the new mean.

To characterize the behavior of this integrator-filter stage, we have the following two equations:

\[ m[n] = m[n-1](1-2\mu) + 8\mu x[n] \]
\[ y[n] = 4x[n] - m[n-1] \]

where \(m\) is the integrated mean, \(z^{-1}\) denotes a one-sample delay, \(x\) is the original 12-bit sample, and \(y\) is the output. Note again that since \(x\) is 12-bit unsigned, conversion into Q15 does not change anything, so the system transfer function is easy to derive:

\[ \frac{Y}{X} = \frac{4(1-z^{-1})}{1+z^{-1}(2\mu-1)} \]

The gain is 4, and the 3dB point is determined by \(z=1-2\mu\). In the code \(\mu\approx 0.0244\), so \(z=0.9512\). Note also, in Tim's thesis, \(\mu\) is actually \(0.0488\), this is because (s2rnd) was used in the actual implementation.
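The recurrence above can be sanity-checked in floating point: a DC step should pass through with gain 4 at first and then be rejected. A small simulation sketch (of the equations, not of the fixed-point code):

```python
def integrator_highpass(xs, mu=800 / 32768.0):
    """Float simulation of m[n] = (1-2*mu)*m[n-1] + 8*mu*x[n],
    y[n] = 4*x[n] - m[n-1]."""
    m, ys = 0.0, []
    for x in xs:
        ys.append(4.0 * x - m)                  # highpass output
        m = (1.0 - 2.0 * mu) * m + 8.0 * mu * x  # update integrated mean
    return ys

ys = integrator_highpass([0.1] * 2000)   # DC step input
print(ys[0])    # 0.4: the step initially passes with gain 4
print(ys[-1])   # ~0:  DC is eventually rejected
```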

The -3dB frequency can be calculated by plugging the transfer function into MATLAB, since there is no easy way to read the Bode plot of a z-transform off the poles and zeros the way one can for analog systems -- short of substituting \(z=e^{j\omega}\) into the transfer function and computing the magnitude response directly.

>> num=[1,-1];
den=[1, -0.9512];
[h,w] = freqz(num,den,'whole',2001);
plot(w/pi,20*log10(abs(h)))
xlabel('Normalized Frequency (\times\pi rad/sample)')
ylabel('Magnitude (dB)')

This plots the two-sided magnitude response; we only care about the part where the normalized frequency runs from 0 to 1. The normalized frequency \(\omega_c\) at the -3dB point can be estimated from the plot. Convert to Hz via \(f_c = \omega_c F_s/2\) (with \(\omega_c\) in units of \(\pi\) rad/sample). For this AGC setting, \(f_c \approx 225\,\)Hz.
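The half-power frequency can also be found numerically without MATLAB, by substituting \(z=e^{j\omega}\) and bisecting on the magnitude response. A Python sketch; the sampling rate below is my assumption, so substitute the actual system's rate:

```python
import math

a = 0.9512        # pole of the highpass, z = 1 - 2*mu
Fs = 31250.0      # assumed sampling rate in Hz -- adjust to the real system

# squared magnitude of (1 - z^-1)/(1 - a*z^-1) at z = e^{jw}, as a function of cos(w)
h2 = lambda c: (2 - 2 * c) / (1 + a * a - 2 * a * c)
half_max = h2(-1.0) / 2.0          # -3 dB relative to the response at w = pi

# h2 decreases monotonically in cos(w); bisect for the half-power point
lo, hi = -1.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if h2(mid) > half_max:
        lo = mid
    else:
        hi = mid
wc = math.acos((lo + hi) / 2.0)    # rad/sample
fc = wc / (2.0 * math.pi) * Fs     # Hz
print(round(fc, 1))
```

With these assumptions the corner lands in the low hundreds of Hz, the same ballpark as the estimate above.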

Lines 20-23 are the first step of the AGC, and are pretty tricky. r5 contains the AGC gain, which as described in the thesis is in Q7.8 format (really Q8.8, but we never have negative gain here).

Line 20 multiplies r0 by the AGC gain. But the default multiplication mode treats both numbers as Q15 fractions, so the value in the accumulator, read as Q31, will not be correct. In Q31, the decimal point sits after bit 31; but when a Q15 number multiplies a Q7.8 number, there are 15+8=23 fractional bits, so the decimal point is in fact after bit 23.

Therefore, before we can extract the resulting 16-bit number in line 23 (MAC to half-register), we have to shift the accumulator left by 8. Since the default extraction option takes bits 31-16 after rounding, we then get the correct 16-bit result. One question might be: the product of a Q15 and a Q7.8 number also has 9 integer bits, and this shift-and-extract operation throws away 8 of them. It turns out the gain is never large enough for this to matter (in other words, Q7.8 offers a big range of available gain, but that range is not actually needed).
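The alignment argument can be checked with plain integers: a Q15 × Q7.8 raw product carries 15+8=23 fractional bits, and discarding 8 of them re-aligns the result to Q15. A sketch, ignoring the Blackfin's extra fractional left-shift and rounding:

```python
def q15_times_q8_8(sample_q15, gain_q8_8):
    # raw product has 23 fractional bits; dropping 8 realigns it to Q15
    # (the firmware does this via <<8 followed by extracting bits 31-16)
    return (sample_q15 * gain_q8_8) >> 8

print(hex(q15_times_q8_8(0x2000, 0x0200)))   # 0.25 * gain 2.0 -> 0x4000 = 0.5
print(hex(q15_times_q8_8(0x2000, 0x0180)))   # 0.25 * gain 1.5 -> 0x3000 = 0.375
```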

Finally, r0 in line 23 is the output of the AGC stage.

It is important to note that r0 in line 20 is the difference between the new sample and the previous integrated mean, so when we take the absolute values of the accumulators in lines 21 and 23, we are essentially measuring the amplitude of the signal envelope. The goal of the AGC is to adjust the gain so the signal envelope stays within a target range. This is a very important point. Since the Intan implements a highpass DSP filter onboard, I initially thought I could skip the integrator-highpass stage and feed the incoming samples directly to the AGC code. But I simply got rubbish, because the instantaneous signal value is not a good measurement of the signal power. The integrator is required to obtain the mean, and for this specific AGC implementation to work.

Line 25 does a clever bit of MAC trickery. a0 contains the difference between the signal level and the target level. Note that the target level is stored in memory as the square root of its value. The (is) mode treats all numbers involved as signed integers, and the extraction saturates to either 0x8000 or 0x7fff, for a negative or positive difference respectively. This is significant since 0x8000 and 0x7fff are the smallest negative and largest positive numbers representable in Q15, and it lets us update the AGC gain without any if-statement-style conditionals.
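The effect of the saturating extraction can be mimicked in Python. Because the squared levels overflow 16 bits for almost any mismatch, the stored difference is effectively a ±full-scale flag (the levels below are illustrative, not firmware constants):

```python
def sat16(x):
    """Saturate to the signed 16-bit range, as the (is) extraction does."""
    return max(-32768, min(32767, x))

level, target = 20000, 10000                 # made-up signal and target levels
print(sat16(level * level - target * target))  # 32767:  signal too hot -> gain down
print(sat16(5000 * 5000 - target * target))    # -32768: signal too quiet -> gain up
```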

In line 26, r6.h=1 is not the value of mu, but simply indicates whether we have enabled AGC.

Lines 27 and 28 involve yet another fixed-point trick: they update the AGC gain. The AGC gain in r5 is loaded into the accumulator by multiplying it by 0x4000 (16384). The MAC with the s2rnd option then subtracts either 0x8000 or 0x7fff from it. The whole procedure is equivalent to either adding \(2^{-8}\) to, or subtracting \(2^{-7}\) from, the Q7.8 value of the original AGC gain.

The key is noticing that multiplying a Q7.8 number by 0x4000 while treating both as Q15 numbers, then extracting the resulting Q31 number in s2rnd mode, results in a 16-bit number that has the same Q7.8 value as before. This can be demonstrated by the following lines using functions defined in AGC_sim.py:

gain = 52                               # original gain
Q8_gain = int(decimal_to_Q8_8(gain),16)
acc_val = int(decimal_to_Q1_31(hex_to_Q15(Q8_gain) * hex_to_Q15(16384)),16)
final_gain = mult_acc_to_halfReg(acc_val, 0, 0, 'sub', 's2rnd')
final_gain = hex_to_Q8_8(final_gain)    # should be 52 again.

In line 30, we simply take the absolute value of r3, which contains the new AGC gain, just to be safe.

Implementation for Intan

Understanding that the integrator stage is required for the AGC to work was crucial. When I initially fed the unsigned 16-bit Intan samples straight into the AGC stage (starting around line 20), I simply got rail-to-rail oscillations that made no sense.

Adding the integrator stage still gave me similar results, though with fewer oscillations. The key to solving this was realizing that the original code was written for 12-bit unsigned samples. I wish this had been emphasized more in the thesis, or perhaps I was just too n00b to realize its significance.

The incoming samples from the Intan are 16-bit unsigned in offset-binary format -- the reference voltage maps to 0x8000, 0 volts to 0x0000, and VDD to 0xffff -- so the conversion step in line 13 (multiplying by 0x7fff) will often produce a negative result. For example, treating both 0x8000 and 0x7fff as Q15 numbers, their product is close to -1; the resulting mean would then be -2 rather than 0 (since 0x8000 means the signal sits at the reference voltage). This sign flip then yields values whose amplitudes differ from the original signal's.
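The sign flip is easy to demonstrate: interpreting the offset-binary mid-rail code as a signed Q15 fraction gives -1 instead of 0. A quick check (hypothetical helper):

```python
def as_q15(word):
    """Interpret a 16-bit word as a signed Q15 fraction."""
    return (word - 0x10000 if word & 0x8000 else word) / 32768.0

print(as_q15(0x8000))   # -1.0, yet in offset binary 0x8000 means "at reference"
print(as_q15(0x8001))   # a signal just above reference also reads hugely negative
print(as_q15(0x7fff))   # ~+1.0: just below reference reads hugely positive
```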

There are two solutions to this problem:

  1. Right-shift the incoming 16-bit unsigned offset-binary samples to 12 bits. Do this right after line 8, and the rest of the code stays unchanged. This approach requires the fewest changes to the old firmware (of course, lines 1-8 need to be altered to work with the Intan, but that is not algorithmic).

  2. The RHD2132 can output its ADC samples in 16-bit signed, two's-complement format. This solves the above problem of inadvertently flipped signs, since the samples will already be in Q15 format. But it also requires changing the integrator-highpass stage to obtain the same behavior. Specifically, the system function has a pre-gain of 4 in front, which comes from line 14 and the two s2rnd operations in lines 15 and 17.

    Now that our samples are full 16-bits, we do not need the extra gain. This requires changing the coefficient values in:

    • r5 in line 11: from r5.l=0x7fff, r5.h=0xc000 to r5.l=0x7fff, r5.h=0x8000.
    • r6 in line 14: from r6.l=0x4000, r6.h=800 to r6.l=0x7fff, r6.h=1600.

    In addition, the accumulate instruction in line 14 is deleted, and the MAC option in lines 15 and 17 is changed from (s2rnd) to the default.

I ended up using the second approach, because I don't like throwing away bits. But both work.


**Edit: 4/6/2016**

After validating recording and sorting quality across the RHA-headstage, the RHD-headstage, and the Plexon system, it appears that AGC amplifies noise too much under low-SNR conditions.

As a result, the final deployment firmware uses a fixed pre-gain stage before the IIR stage. This gain is a Q7.8 value and can be set in the appropriately compiled version of gtkclient, in 0.5 increments from -128 to 127.5.

Jul 28, 2015 - Gowda, Carmena 2014: Designing Dynamical Properties of Brain–Machine Interfaces to Optimize Task-Specific Performance

Designing Dynamical Properties of Brain–Machine Interfaces to Optimize Task-Specific Performance

Pretty heavy math; I will have to go over this again. The premise is that small changes in closed-loop BMI dynamical properties can have significant and predictable effects on performance. PVKF was used as the specific algorithm to study.

Two undesired effects:

  • presence of unintended attractor points which pulled the cursor to arbitrary workspace positions.
  • a calibration-dependent control-memory property which parameterized the tradeoff between speed and hold performance, likely driven by fluctuations in the closed-loop control memory of the decoder.

Both properties were not only present in experimental data, but their mathematical causes could also be traced back to the mechanics of the closed-loop decoder. For the KF, and generally for recursive Bayesian decoders, the closed-loop properties induced by interaction between the state-space model and the neural firing model can produce unintentional dynamical artifacts.

There are numerous benefits of keeping both the decoder and the neural ensemble fixed across sessions. However, CLDA is needed to adapt to nonstationarity of the ensemble, and initial calibration. But CLDA can also change system properties that one might prefer to keep fixed if one wishes to keep the system as consistent as possible before and after calibration.

References to Read

  • F. R. Willett, A. J. Suminski, A. H. Fagg and N. G. Hatsopoulos, "Improving brain-machine interface performance by decoding intended future movements", J. Neural Eng., vol. 10, no. 2, p. 026011, 2013
  • B. Jarosiewicz, S. M. Chase, G. W. Fraser, M. Velliste, R. E. Kass and A. B. Schwartz, "Functional network reorganization during learning in a brain-computer interface paradigm", Proc. Nat. Acad. Sci., vol. 105, no. 49, pp. 19486-19491, 2008
  • E. J. Hwang, P. M. Bailey and R. A. Andersen, "Volitional control of neural activity relies on the natural motor repertoire", Curr. Biol., vol. 23, no. 5, pp. 353-361, 2013
  • W. Wu, Y. Gao, E. Bienenstock, J. P. Donoghue and M. J. Black, "Bayesian population decoding of motor cortical activity using a Kalman filter", Neural Comput., vol. 18, no. 1, pp. 80-118, 2006
  • Z. Li, J. E. O'Doherty, M. A. Lebedev and M. A. L. Nicolelis, "Adaptive decoding for brain-machine interfaces through Bayesian parameter updates", Neural Comput., vol. 23, no. 12, pp. 3162-3204, 2011
  • M. Golub, S. M. Chase and B. M. Yu, "Learning an internal dynamics model from control demonstration", Proc. Int. Conf. Mach. Learn., vol. 28, pp. 606-614, 2013

Jul 28, 2015 - Orsborn, Carmena 2014: Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control

Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control

Focuses on the acquisition of "neuroprosthetic skill": performance and neural representations that are robust over time and resistant to interference. As shown before, learning in brain-control facilitates the formation of BMI-specific control networks, and changes in cortical and corticostriatal plasticity show specificity for BMI control neurons. The evidence seems to suggest that neuroplasticity creates a specialized BMI control network that allows skillful control.

Cortical map formation has been shown to be sensitive to the details of the BMI system, such as which neurons are input to the decoder, and the decoder parameters. Training new decoders regularly, even with the same neural ensemble, eliminated cortical map formation and the associated performance improvement. This suggests that a fixed decoder would allow neuroplasticity to yield continued task performance from day to day. However, the decoder's performance is sensitive to changes in the neural signals, so closed-loop adaptation can facilitate and maintain learning in the presence of changing neural inputs to the BMI. The control architecture of closed-loop decoder adaptation (CLDA), as a two-learner system, can therefore provide more robust BMI performance.

In the first experiment, a delayed center-out hold task was used. CLDA was applied initially to train the decoder (via smooth-batch or ReFIT), then stopped once the monkey could navigate the cursor across the workspace -- the goal being to limit decoder adaptation so as to maximize improvement driven by neural adaptation; perhaps this allows for more robust neuroprosthetic skill retention?

Performance (measured by % correct, success rate in trials/min, and mean error -- deviation from the ideal trajectory) continued to improve after the decoders were held fixed. Intermittent CLDA was used to compensate for performance drops and changes in the neural ensemble.

One key finding was that

[...] gradual refinement of cursor control, with continued improvements in movement errors and success rates that reached a plateau. These improvements were absent when CLDA was used each day to maximize performance starting from varying initial decoders that used different neural ensembles. Whereas CLDA could achieve high task performance, movement kinematics showed no improvements. Daily performance also showed variability commonly observed with daily re-training. This finding confirms that observed learning was not purely reflective of increased practice in BMI and highlights the importance of some degree of neural and decoder stability for learning.

I would need the specifics of exactly how CLDA was used in their training scheme before fully accepting their conclusion (reseeding CLDA every day would introduce abrupt changes to the decoder, as the authors mention), but the idea of intermittent decoder adaptation alongside neural plasticity can be useful. It also supports using reinforcement learning to train the decoder...

Neural data analysis:

  1. Tuning maps correlate strongly in later sessions; fewer tuning changes in later sessions indicate saturation.
  2. The degree of neural adaptation depends on the amount of performance improvement - correlation of tuning maps with day 1 decreases with sessions and performance rate. Really just a corollary of analysis 1.
  3. Neural adaptation is shaped by decoder properties - tuning properties changed primarily when necessary to improve performance, and were otherwise stable. Specifically, units were more likely to change their preferred directions if the initial decoder assigned them an "incorrect" PD. If they ran the same analysis on the direct and indirect neuron populations of the Ganguly, Carmena 2011 paper, would they reach the same results for the direct population? What about the indirect neurons?
  4. They compared the onset time of directionally tuned activity and time of peak firing for each unit during different sessions. Averaging across all units and series, they found that after learning:
    • units were both directionally tuned earlier and reached peak firing earlier in the trial (Wilcoxon sign-rank test).
    • Majority of units developed tuning prior to the go-cue - indicate planning or preparation to move.
    • Cursor speed profiles also shifted earlier with learning, and clear increases in speed occurred prior to the go-cue. (WTF, the cursor isn't supposed to move before the go-cue!)

Second Experiment
To test the robustness of neuroprosthetic skill and the emerging neural map's resistance to interference from other contexts and perturbing neural inputs (specifically, from native motor networks), the BMI-SC task had the monkey press down on a force sensor with the arm contralateral to the implants while performing the regular task. The decoder used in the BMI-SC and regular BMI tasks was the same.

Results here include:

  1. Isometric force task significantly disrupted the task performance.
  2. BMI-only performance and learning not disrupted by performing the BMI-SC task in the same day.
  3. Sessions in which the BMI-only and BMI-SC tasks were performed in A-B-A blocks showed minimal within-session interference between contexts, confirming the reversible-modification hypothesis from Ganguly, Carmena 2011.
  4. BMI-SC performance also improved across the series, approaching BMI-only performance on the last day in one example. These improvements may be due in part to reduced interference of arm-movement-related activity with BMI control; the disruption from simultaneous arm movements may not be fully blocked by skill formation. Learning transfers between contexts.

Performance improved even when CLDA was used to fully adapt the decoder, suggesting that neural plasticity may provide benefits beyond decoder adaptation alone. Gradual adaptation might be key.

References to Read

  • Collinger, J.L., Wodlinger, B., Downey, J.E., Wang, W., Tyler-Kabara, E.C., Weber, D.J., McMorland, A.J., Velliste, M., Boninger, M.L., and Schwartz, A.B. (2013). High-performance neuroprosthetic control by an individual with tetraplegia. Lancet 381, 557–564
  • Dangi, S., Orsborn, A.L., Moorman, H.G., and Carmena, J.M. (2013). Design and analysis of closed-loop decoder adaptation algorithms for brain-machine interfaces. Neural Comput. 25, 1693–1731
  • Gilja, V., Nuyujukian, P., Chestek, C.A., Cunningham, J.P., Yu, B.M., Fan, J.M., Churchland, M.M., Kaufman, M.T., Kao, J.C., Ryu, S.I., and Shenoy, K.V. (2012). A high-performance neural prosthesis enabled by control algorithm design. Nat. Neurosci. 15, 1752–1757
  • Orsborn, A.L., Dangi, S., Moorman, H.G., and Carmena, J.M. (2012). Closedloop decoder adaptation on intermediate time-scales facilitates rapid BMI performance improvements independent of decoder initialization conditions. IEEE Trans. Neural Syst. Rehabil. Eng. 20, 468–477.
  • Suminski, A.J., Tkach, D.C., and Hatsopoulos, N.G. (2009). Exploiting multiple sensory modalities in brain-machine interfaces. Neural Netw. 22, 1224–1234
  • Suminski, A.J., Tkach, D.C., Fagg, A.H., and Hatsopoulos, N.G. (2010). Incorporating feedback from multiple sensory modalities enhances brain-machine interface control. J. Neurosci. 30, 16777–16787.
  • Wander, J.D., Blakely, T., Miller, K.J., Weaver, K.E., Johnson, L.A., Olson, J.D., Fetz, E.E., Rao, R.P.N., and Ojemann, J.G. (2013). Distributed cortical adaptation during learning of a brain-computer interface task. Proc. Natl. Acad. Sci. USA 110, 10818–10823.

Jul 28, 2015 - Raspopovic, Micera 2014: Restoring Natural Sensory Feedback in Real-Time Bidirectional Hand Prostheses

Restoring Natural Sensory Feedback in Real-Time Bidirectional Hand Prostheses

Sensory feedback was achieved by stimulating peripheral nerve fascicles, which in turn allowed real-time closed-loop control of a prosthetic hand by a human subject with a transradial amputation. To restore the lost sensory feedback, four TIMEs were implanted in the median and ulnar nerve fascicles. After systematic testing of all the contacts, two stimulation sites that elicited distinct and repeatable sensations in the innervation territories of the two nerves (3-5) were selected and connected to the artificial hand's sensors. Sensations were elicited in a range from slight contact to just below the reported pain threshold, with the intensity of the delivered stimulation dynamically controlled according to the prosthetic hand's sensor readouts.

The participant controlled the prosthesis through voluntary contractions of the remaining muscles on the stump, being able to perform different (ulnar, pinch, and palmar) grasps, and hand opening by online processing of sEMG signals. The grasps were performed in terms of position control such that he was able to finely modulate the fastening and opening of the prosthetic hand.

The best part: with the nerve stimulation, the subject could distinguish between at least three different shapes, sizes, and stiffnesses. The stimulation was delivered at 50Hz, biphasic, with only the amplitude varied.

Jul 28, 2015 - Shenoy, Carmena 2014: Combining Decoder Design and Neural Adaptation in Brain-Machine Interfaces

Perspective article: Combining Decoder Design and Neural Adaptation in Brain-Machine Interfaces

Thesis: optimal codesign of decoders and concomitant engagement of neural adaptation is important.

Decoding vs. Learning.

Decoder design begins by (1) training a decoder to predict natural or intended movement from previously recorded open-loop neural data, and (2) assuming that the statistics of the neural activity remain the same when controlling the BMI during the closed-loop phase.

The neural adaptation approach begins by (1) starting with a decoder that is not trained to predict natural or intended movement, and instead has minimal structure in its decoder parameters, and (2) taking the view that there is a de novo, closed-loop system that has to be learned by the brain in order to achieve proficient BMI control.

These two approaches do not have to be exclusive.

Three experimental models:

  1. hand-controlled training, hand-free BMI control. No context switch between training and control, so movements carry over from training into BMI. Leads to high performance with biomimetic decoder design.
  2. hand-controlled training, hand-independent BMI control - involves a context change between training and control. The hand is free to move during brain-control, but the same movements used to train the open-loop decoder cannot be made during BMI. This setting predicts more meaningful neural adaptation.
  3. hand-constrained training, hand-constrained BMI control without context change, via passive observation. Less mismatch between closed-loop BMI operation and open-loop decoder training; can lead to high BMI performance but predicts less neural adaptation.

CLDA (e.g., Carmena's PVKF with smooth-batch or ReFIT) may combine decoder adaptation with neural adaptation. It requires "intention estimation", which I think is hard to generalize, but it is also the easiest closed-loop adaptation to implement.

Neural adaptation allows animals to effectively learn a slightly different decoder each day before being able to perform the task proficiently, but this becomes difficult for more complex tasks with higher DOF. The key is to pair stable neural populations with a fixed, static decoder. Differential modulation of neuronal populations based on their causal link to neuroprosthetic control, the resulting networks co-existing with native cortical networks, switching without interference based on task requirements, blablabla, nothing new here -- just re-iterating the Ganguly-Carmena 2011 paper.

The real challenge is to control two different BMIs using two decoders driven by two separate neural ensembles!

**How do these adaptations happen? Basal ganglia?**

Mouse studies (Yin et al., 2009) showed increasing coherence between M1 and dorsolateral striatum neurons as the mice learned to brain-control using M1 neurons.

Changes in coherence were temporally precise and significantly more pronounced in direct M1 neurons than in those not controlling the BMI - consistent with the formation of a BMI-specific network. Further, knockout mice lacking NMDA receptors in the striatum were not able to learn the same task, supporting the notion that corticostriatal plasticity is necessary for neuroprosthetic learning. Together these results suggest that corticobasal ganglia circuits are involved in BMI learning, even when it does not require physical movement. [...] neural adaptation not only elicits changes in motor cortical networks, but also recruits elements of the natural motor system outside of the cortex, such as the basal ganglia.

This is pretty cool... the brain has so much generalization capability; motor learning is like the greatest API ever for BMI! On the other hand, if BMI learning is so closely coupled with the motor learning process (signaled by the activation of corticobasal ganglia circuits), would that imply that subjects who are more proficient at motor learning would also be more proficient at learning different modes of BMI control (at least when using M1 ensembles)?

One could try a bimanual task where one hand is controlled by BMI and the other by the native arm.

How does the decoder know whether the user intends to use the BMI-controlled device? The motor cortex is always active, is not strictly lateralized, and is part of a coordinated bimanual circuit. These correlations may not be readily separated by the decoder, especially if neural activity from both contra- and ipsilateral motor cortex occupy the same neural dimensions.

Thus the onus would then fall on neural adaptation to create a new motor control circuit, which effectively decorrelates control of the intact arm from the prosthesis, so that the subject can have independent control of both her real and prosthetic limb and hand.

References to Read

  • Chase, S.M., Kass, R.E., and Schwartz, A.B. (2012). Behavioral and neural correlates of visuomotor adaptation observed through a brain-computer interface in primary motor cortex. J. Neurophysiol. 108, 624–644.
  • Chestek, C.A., Gilja, V., Nuyujukian, P., Foster, J.D., Fan, J.M., Kaufman, M.T., Churchland, M.M., Rivera-Alvidrez, Z., Cunningham, J.P., Ryu, S.I., and Shenoy, K.V. (2011). Long-term stability of neural prosthetic control signals from silicon cortical arrays in rhesus macaque motor cortex. J. Neural Eng. 8, 045005.
  • Dangi, S., Gowda, S., Moorman, H.G., Orsborn, A.L., So, K., Shanechi, M., and Carmena, J.M. (2014). Continuous closed-loop decoder adaptation with a recursive maximum likelihood algorithm allows for rapid performance acquisition in brain-machine interfaces. Neural Comput. 26, 1811–1839.
  • Durand, D.M., Ghovanloo, M., and Krames, E. (2014). Time to address the problems at the neural interface. J. Neural Eng. 11, 020201.
  • Fan, J.M., Nuyujukian, P., Kao, J.C., Chestek, C.A., Ryu, S.I., and Shenoy, K.V. (2014). Intention estimation in brain-machine interfaces. J. Neural Eng. 11, 016004.
  • Fetz, E.E. (2007). Volitional control of neural activity: implications for brain-computer interfaces. J. Physiol. 579, 571–579.
  • Gage, G., Ludwig, K., Otto, K., Ionides, E., and Kipke, D. (2005). Naive coadaptive cortical control. J. Neural Eng. 2, 52–63.
  • Jarosiewicz, B., Masse, N.Y., Bacher, D., Cash, S.S., Eskandar, E., Friehs, G., Donoghue, J.P., and Hochberg, L.R. (2013). Advantages of closed-loop calibration in intracortical brain-computer interfaces for people with tetraplegia. J. Neural Eng. 10, 046012.
  • Kao, J.C., Stavisky, S., Sussillo, D., Nuyujukian, P., and Shenoy, K.V. (2014). Information systems opportunities in brain-machine interface decoders. Proceedings of the IEEE 102, 666–682.
  • Koralek, A.C., Jin, X., Long, J.D., II, Costa, R.M., and Carmena, J.M. (2012). Corticostriatal plasticity is necessary for learning intentional neuroprosthetic skills. Nature 483, 331–335.
  • Koralek, A.C., Costa, R.M., and Carmena, J.M. (2013). Temporally precise cell-specific coherence develops in corticostriatal networks during learning. Neuron 79, 865–872.
  • Lebedev, M.A., Carmena, J.M., O’Doherty, J.E., Zacksenhouse, M., Henriquez, C.S., Principe, J.C., and Nicolelis, M.A. (2005). Cortical ensemble adaptation to represent velocity of an artificial actuator controlled by a brain-machine interface. J. Neurosci. 25, 4681–4693.
  • Merel, J., Fox, R., Jebara, T., and Paninski, L. (2013). A multi-agent control framework for co-adaptation in brain-computer interfaces. Proceedings of the 26th Annual Conference on Neural Information Processing Systems, 2841–2849.
  • Millán, J. del R., and Carmena, J.M. (2010). Invasive or noninvasive: understanding brain-machine interface technology. IEEE Eng. Med. Biol. Mag. 29, 16–22.
  • Yin, H.H., Mulcare, S.P., Hilário, M.R., Clouse, E., Holloway, T., Davis, M.I., Hansson, A.C., Lovinger, D.M., and Costa, R.M. (2009). Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341.
  • Sussillo D, Nuyujukian P, Fan JM, Kao JC, Stavisky SD, Ryu SI, Shenoy KV (2012) A recurrent neural network for closed-loop intracortical brain-machine interface decoders. Journal of Neural Engineering. 9(2):026027

Jul 28, 2015 - Shenoy, Kaufman, Sahani, Churchland 2011: A dynamical systems view of motor preparation: Implications for neural prosthetic system design

A dynamical systems view of motor preparation: Implications for neural prosthetic system design

This book chapter/review paper seeks to tie together a bunch of Shenoy lab work (mostly Churchland's) on premotor cortex with a dynamical systems hypothesis - the optimal subspace hypothesis. The central question is: what are the neural processes that precede movement and lead to the initiation of movement, and what links movement preparation to movement generation?

The experiment used to derive all the results here is an instructed-delay task, where a monkey first holds its hand at the center, a target then appears, then a go-cue appears, at which point the monkey moves to the target.

They first noticed that delay time (between target appearance and go-cue) is inversely related to the reaction time (time between go-cue and movement onset). This is hypothesized to result from having more time for movement preparation. So they recorded from premotor cortex (PMd) and M1, found activity changes there during the delay period, and sought to determine how activities there (individual neuron changes vary) contribute to the reduction in reaction time.

(Seems like they are begging the question in terms of which cortical areas to focus on. They mention a number of areas that show changes in activity during the delay period, yet focus only on PMd and M1 - probably because they only have implants there. The contributions of specific cortical areas to movement preparation are then unclear.)

Optimal Subspace Hypothesis

If approaching this question from a dynamical systems perspective, then they will have to answer:

  • How does the activity of a neural population evolve to achieve the needed preparatory state?
  • How does this preparatory state impact the subsequent arm movement?
  • What are the underlying dynamics of the neural circuit?

Their solution to the delay-period/RT observation is to hypothesize that time is needed for the neural population to achieve some kind of preparatory state which is necessary for the subsequent movement. Seems like a reasonable hypothesis to me. Specifically, motor preparation may be the act of optimizing preparatory activity so that the generated movement has the desired properties (what these properties are, they have yet to answer at all).

In a dynamical systems framework, they consider that, for a given reach, there is presumably some small sub-region of space containing those values of P that are adequate to produce a successful reach. A smooth relationship exists between firing rate and movement (reasonable); therefore, the small subregion of space is conceived of as being contiguous.

This framework would then predict several things that one can test in the lab (although the order of experiments and hypothesis formulation is probably the opposite in practice here...).

A direct corollary of this framework would be that:

If prep activity is equivalent to initializing a traverse to some optimal subspace for the subsequent movement, these activities for different movements must be different. Movements are defined by their kinematic parameters, thus prep activity should be characterized, or tuned, to these different movement parameters as well.

This is presumably true; have yet to read the Messier and Kalaska, 2000 paper.

Other predictions:

  • Prediction 1: Reach-speed modulation: Preparatory activity should covary with other meaningful aspects of movement, including peak reach speed. An experiment was conducted where monkeys were instructed to reach targets at either faster or slower speeds, and the neural data during the preparatory period analyzed.
  • Prediction 2: Reach-speed (trial-by-trial) correlation: Preparatory activity should correlate, on a trial-by-trial basis, with the peak reach speed. This is mainly to demonstrate the smooth mapping between the neural space and movement parameters: a trial on the slower side within the instructed-fast condition should be closer to those found in the instructed-slow condition than the others. Closeness is measured via distance on the (speed, mean-firing-rate) axes.
  • Prediction 3: Across-trial firing-rate variance (Fano factor) reduces through time: Preparatory activity should become, through time, quite accurate and therefore quite similar across trials. In other words, at the end of the preparatory period the neural activity should be constrained inside the optimal subspace for the subsequent movement, and this should manifest as a reduction in the across-trial variance of the neural activity. The metric they used is the Fano factor (seems rather convoluted); need to read the actual method.
  • Prediction 4-I and 4-II: A lower Fano factor should correlate with lower RTs.
  • Prediction 5: Perturbing neural activity increases RT: Delivering subthreshold electrical microstimulation (how do they define subthreshold here?) at different times prior to the go-cue in PMd resulted in longer RT when the stimulation was closer to the go-cue. Stimulating M1 impacted RT less. The effect of stimulation was specific to arm movements and produced little increase in saccadic eye movement RT. The results align with the theory pretty well, but are the shifts in RT significant? Need to reference the actual paper.
  • Prediction 6: It should be possible to construct single-trial state-space neural trajectories and use them to directly confirm that across-trial firing-rate variability decreases through time. They recorded a bunch of neurons simultaneously and plotted the neural trajectories using Gaussian Process Factor Analysis (GPFA) for dimensionality reduction and visualization. The plots look fine, as in the trajectories run around and converge into different subspaces at different stages of the task. But this can also be due to dimension selection. Need to check the GPFA technique.
  • Prediction 7: Farther and faster along the loop reduces RT. This prediction tries to answer how the preparatory state at the time of the go-cue influences the subsequent movement. They hypothesize that this prep state serves as the initial state of a subsequent dynamical system - some region of the brain appears to monitor and wait for this to be true before "pulling the trigger" to initiate movement. When the trigger is pulled, the further along the loop, the lower the RT. They tested this with offline analysis, relating how far along the loop the prep state was on a trial-by-trial basis to that trial's RT. Of course, this necessitates the use of GPFA plotting of the neural trajectories to find the actual loop.
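
The mean-matched Fano factor machinery in the actual papers (e.g. Churchland et al., 2010c) is more involved, but the core quantity behind Prediction 3 is just across-trial spike-count variance divided by the mean. A toy sketch of that core computation (all names and numbers below are mine, not the paper's):

```python
import numpy as np

def fano_factor(spike_counts):
    """Across-trial Fano factor: spike-count variance / mean, for one
    neuron at one time point relative to target onset."""
    m = spike_counts.mean()
    return spike_counts.var(ddof=1) / m if m > 0 else np.nan

# Toy illustration: across-trial variability shrinks as the preparatory
# state settles into the optimal subspace.
rng = np.random.default_rng(0)
early = rng.poisson(10, size=200).astype(float)            # Poisson-like: FF near 1
late = (10 + rng.integers(-1, 2, size=200)).astype(float)  # tightly constrained counts
ff_early = fano_factor(early)
ff_late = fano_factor(late)
```

The prediction is then simply that the population-averaged Fano factor computed this way declines between target onset and the go-cue.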

To really test these predictions however, we really need to monitor the PMd during brain-control tasks - does the Fano factor analysis still hold? Does the RT still vary with the delay period in brain-control? What else do these prep activities tune for? Does the location of the optimal subspace change following learning (Byron Yu's null hypothesis papers)?

Implications for BMI if this hypothesis turns out to be a good framework: How long does it take for the neural trajectory to change from baseline to the optimal subspace? Can we make decoders detect/classify these spaces? Shenoy suggests 200ms, as measured with single-trial neural trajectories, as the transition period needed between baseline and the optimal subspace; decoding during this period should thus be avoided (will have to read the paper to evaluate this).

References to Read

  • Churchland, M. M., Afshar, A., & Shenoy, K. V. (2006a). A central source of movement variability. Neuron, 52, 1085–1096
  • Churchland, M. M., Cunningham, J. P., Kaufman, M. T.,Ryu, S. I., & Shenoy, K. V. (2010a). Cortical preparatory activity: Representation of movement or first cog in a dynamical machine? Neuron, 68, 387–400
  • Churchland, M. M., Kaufman, M. T., Cunningham, J. P., Ryu, S. I., & Shenoy, K. V. (2010b). Some basic features of the neural response in motor and premotor cortex. In Program No. 382.2. Neuroscience meeting planner. San Diego, CA: Society for Neuroscience Online
  • Churchland, M. M., Santhanam, G., & Shenoy, K. V. (2006b). Preparatory activity in premotor and motor cortex reflects the speed of the upcoming reach. Journal of Neurophysiology, 96, 3130–3146
  • Churchland, M. M., & Shenoy, K. V. (2007a). Delay of movement caused by disruption of cortical preparatory activity. Journal of Neurophysiology, 97, 348–359.
  • Churchland, M. M., Yu, B. M., Cunningham, J. P., Sugrue, L. P., Cohen, M. R., Corrado, G. S., et al. (2010c) Stimulus onset quenches neural variability: A widespread cortical phenomenon. Nature Neuroscience, 13, 369–378
  • Churchland, M. M., Yu, B. M., Ryu, S. I., Santhanam, G., & Shenoy, K. V. (2006c). Neural variability in premotor cortex provides a signature of motor preparation. The Journal of Neuroscience, 26, 3697–3712
  • Messier, J., & Kalaska, J. F. (2000). Covariation of primate dorsal premotor cell activity with direction and amplitude during a memorized-delay reaching task. Journal of Neurophysiology, 84, 152–165.
  • Santhanam, G., Ryu, S. I., Yu, B. M., Afshar, A., & Shenoy, K. V. (2006). A high-performance brain-computer interface. Nature, 442, 195–198.
  • Santhanam, G., Sahani, M., Ryu, S. I., & Shenoy, K. V. (2004). In: An extensible infrastructure for fully automated spike sorting during online experiments (pp. 4380–4384). Proceedings of the 26th annual international conference of IEEE EMBS, San Francisco, CA.
  • Yu, B. M., Afshar, A., Santhanam, G., Ryu, S. I., Shenoy, K. V., & Sahani, M. (2006). Extracting dynamical structure embedded in neural activity. In Y. Weiss, B. Schölkopf & J. Platt (Eds.), Advances in neural information processing systems, (Vol. 18, pp. 1545–1552). Cambridge, MA: MIT Press
  • Yu, B. M., Cunningham, J. P., Santhanam, G., Ryu, S. I., Shenoy, K. V., & Sahani, M. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology, 102, 614–635.

Jul 27, 2015 - Francis 2015 - Toward an Autonomous Brain Machine Interface: Integrating Sensorimotor Reward Modulation and Reinforcement Learning

Toward an Autonomous Brain Machine Interface: Integrating Sensorimotor Reward Modulation and Reinforcement Learning

This paper seeks to demonstrate that single units/multiunits and local field potentials in M1 are modulated by reward expectation during reaching movements, and that this modulation is present even while subjects passively view cursor motions predictive of either reward or nonreward. They also tried to classify rewarding vs. nonrewarding trials from the neural data, on a moment-to-moment basis.

Experiments:
The monkey's arm wears an exoskeletal robot throughout all tasks. A single target is used in all trials.

  1. Manual Task. The monkey was required to hold at the center target for 325ms before the peripheral target appeared, then hold an additional 300ms before the center target disappeared (the go cue for moving to the peripheral target), then hold there for 325ms before receiving a liquid reward or no reward.

    The paper does not manually control for monkey arm trajectory, speed, etc. But in offline analysis they selected kinematically indistinguishable trajectories between the two reward contingencies to isolate the effect of reward.

  2. OT1. The monkey's arm is fixed by the robot and cannot move. The monkey would fixate at the center target and observe it change color: red represents a rewarding trial, blue a nonrewarding trial. The cursor would then move toward the peripheral target at a constant speed, with movement toward a red target resulting in a reward once the cursor arrived inside the target. For blue targets, reward was withheld. The monkey had to view the target plane to start a trial and maintain visual gaze until the color cue was given.

  3. OT2. The monkey observed the cursor moving away from or toward a target, with movement toward the target resulting in a reward once the cursor arrived inside the target. No reward otherwise.

Results:

Some neurons fired more throughout reward trials, while others behaved the opposite way. Both contralateral and ipsilateral M1 contain units that simultaneously correlate with reward and kinematics during reaching and observation... but how to identify which are reward-modulated in a less controlled BMI experiment?

Classifiers were trained using PCA components to classify reward vs. nonreward trials. The average classifier performance (over trial length) is around 70%, pretty good.
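
I don't know the paper's exact classifier, but the PCA-components pipeline can be sketched in plain numpy: project trials onto the top principal components, then apply a simple classifier (here a nearest-class-mean rule on toy data; all names and numbers are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for binned M1 firing rates: trials x neurons, with a fixed
# "reward axis" offset added to the rewarded half of the trials.
n_trials, n_neurons = 200, 30
X = rng.normal(size=(n_trials, n_neurons))
y = np.repeat([0, 1], n_trials // 2)          # 0 = nonreward, 1 = reward
X[y == 1] += 0.8 * rng.normal(size=n_neurons)

# PCA via SVD on mean-centered data; keep the top-5 component scores.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:5].T

# Nearest-class-mean classifier in PC space (in-sample, just to show the
# pipeline; the paper presumably cross-validated).
mu0 = scores[y == 0].mean(axis=0)
mu1 = scores[y == 1].mean(axis=0)
pred = (np.linalg.norm(scores - mu1, axis=1)
        < np.linalg.norm(scores - mu0, axis=1)).astype(int)
accuracy = (pred == y).mean()
```

Doing this per time bin would give the moment-to-moment performance curve the paper reports.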

Simulated RL-BMI using TD-learning, with a NN for the action-reward mapping. Works well, not too surprising, but not of much value either with a two-state classification task.

References to Read

  • Dura-Bernal S, Chadderdon GL, Neymotin XZ, Przekwas A, Francis JT, Lytton WW (2014) IEEE Signal Processing in Medicine and Biology Symposium (SPMB'13) Virtual musculoskeletal arm and robotic arm driven by a biomimetic model of sensorimotor cortex with reinforcement learning
  • Hosp JA, Pekanovic A, Rioult-Pedotti MS, Luft AR (2011) Dopaminergic projections from midbrain to primary motor cortex mediate motor skill learning. J Neurosci 31:2481–2487.
  • Legenstein R, Chase SM, Schwartz AB, Maass W (2010) A reward-modulated hebbian learning rule can explain experimentally observed network reorganization in a brain control task. J Neurosci 30:8400–8410.
  • Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5:1226–1235.

Jul 27, 2015 - Ganguly, Carmena 2011: Reversible large-scale modification of cortical networks during neuroprosthetic control

Reversible large-scale modification of cortical networks during neuroprosthetic control

Monitored ensembles of neurons that were either causally linked to BMI control or indirectly involved, and found that proficient neuroprosthetic control is associated with large-scale modifications to the cortical network. [...] Specifically, there were changes in the preferred direction of both direct and indirect neurons. Notably, with learning, there was a relative decrease in the net modulation of indirect neural activity in comparison with direct activity. [...] Thus, the process of learning BMI control is associated with differential modification of neural populations based on their specific relation to movement control.

Results pretty intuitive, but analysis methods are useful to know.

Experiment
Recorded ensembles of M1 neurons while only a subset were assigned a causal role during control as direct neurons. The monkeys performed center-out reaching movements using an exoskeleton that constrained movements to the horizontal plane.

Used a linear decoder for motor commands. Decoder was held constant after initial training. Stability of recordings across days assessed by measuring the stationarity of spike waveforms and the interspike interval distribution, as well as directional modulation during manual control sessions.
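
A "linear decoder held constant" here just means a fixed matrix mapping binned firing rates to motor commands, fit once and then frozen. A minimal least-squares sketch (toy data and names mine; the paper's actual fitting procedure may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: T time bins of firing rates (n neurons) and 2D velocity,
# generated from a hidden linear map plus noise.
T, n = 1000, 20
true_W = rng.normal(size=(n, 2))
rates = rng.normal(size=(T, n))
vel = rates @ true_W + 0.1 * rng.normal(size=(T, 2))

# Fit the rate-to-velocity map by least squares on "training" data,
# then hold it fixed, as the decoder was across sessions in the paper.
W, _, _, _ = np.linalg.lstsq(rates, vel, rcond=None)
decoded = rates @ W
r2 = 1 - ((vel - decoded) ** 2).sum() / ((vel - vel.mean(axis=0)) ** 2).sum()
```

Holding W fixed is what makes the observed tuning changes attributable to neural adaptation rather than decoder adaptation.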

Results

  1. Modification of preferred directions
    There is a relative remapping of the preferred directions for all neurons, without any substantial systematic rotational shifts for each neural population.

  2. Differential modification of modulation depths
    Direct and indirect neurons had different tuning depths. This differential modulation was specifically present during proficient neuroprosthetic control and not during the initial learning period. At the population level, there were no significant systematic differences in mean firing rate between manual control and brain control for either population.

  3. Reversibility of modifications
    Modulation depths change across the manual-BMI-manual experimental scheme, while remaining similar during the manual sessions. This suggests reversibility of the large-scale modifications, dependent on cortical state (manual vs. BMI).

  4. Stability
    Indirect neurons maintained a relatively fixed neuron-behavior relationship during brain-control through sessions across days. May suggest an active role in brain-control? Probably related to the lack of systematic rotational shifts in tuning properties.

Used bootstrap resampling methods to test the significance of changes in directional tuning and mean modulation depth.
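
The bootstrap test can be sketched as follows: resample each condition with replacement, recompute the mean difference, and check whether the 95% interval excludes zero (toy numbers and names are mine, not the paper's):

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=2000, seed=0):
    """95% bootstrap CI for the difference in means of two samples
    (e.g. modulation depths in manual vs. brain control)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (rng.choice(a, size=len(a)).mean()
                    - rng.choice(b, size=len(b)).mean())
    return np.percentile(diffs, [2.5, 97.5])

# Toy modulation depths: brain control shallower than manual control.
rng = np.random.default_rng(3)
manual = rng.normal(12.0, 2.0, size=60)
brain = rng.normal(10.0, 2.0, size=60)
lo, hi = bootstrap_diff_ci(manual, brain)
significant = (lo > 0) or (hi < 0)   # CI excludes zero
```

The appeal of the bootstrap here is that it makes no normality assumption about the distribution of tuning-depth changes.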

Jul 27, 2015 - Sanchez 2013 - Towards autonomous neuroprosthetic control using Hebbian reinforcement learning

Towards autonomous neuroprosthetic control using Hebbian reinforcement learning

Theory paper toward which the Sanchez guys have been working. The RLBMI decoder architecture is now presented as an actor-critic method of RL, where the neuroprosthetic controller (actor) learns the neural state-to-action mapping based on the user's evaluative feedback. The role of the critic is to translate the user's feedback into an explicit training signal that can be used by the actor for adaptation. The aim is to learn and automatically produce a stable neural-to-motor mapping and respond to perturbations by readjusting its parameters in order to maintain performance, using only a binary evaluative feedback. The metrics used for the controller's performance were:

  • speed of convergence
  • generalization
  • accuracy and recovery from perturbation

Reinforcement Learning, control architecture, learning algorithm - Instead of using Q-learning with a NN to act as the action-reward mapping, a Hebbian reinforcement learning framework is used. I'm confused as to why this group decided to switch the RL-framework. I am not very familiar with all the ML techniques yet, but according to scholarpedia, we need to distinguish between the machine learning/TD-learning [...] and the neuronal perspective. The machine learning perspective deals with states, values, and actions, etc., whereas the neuronal perspectives tries to obtain neuronal signals related to reward-expectation or prediction-error. Regardless, the basic strategy is the same: initialize a random decoder, based on some feedback on whether the decoded action is correct or not, update the NN weights. With more trials, both the decoder and the user adapt toward each other.
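
As a toy reading of that strategy (not the paper's exact HRL rule), a reward-modulated Hebbian update scales the product of the input activity and the chosen action's weights by the binary feedback; all names, sizes, and parameters below are my own invention:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy setup: 20 input "neurons", 4 actions. One action is correct for
# each of two neural states; feedback is binary (+1 correct, -1 not).
n_in, n_act = 20, 4
W = 0.01 * rng.normal(size=(n_act, n_in))
states = rng.normal(size=(2, n_in))
correct = np.array([0, 3])           # correct action per state
eta = 0.05

for step in range(500):
    s = rng.integers(2)
    x = states[s] + 0.1 * rng.normal(size=n_in)   # noisy neural state
    # epsilon-greedy action selection so wrong actions get sampled
    a = rng.integers(n_act) if rng.random() < 0.1 else int(np.argmax(W @ x))
    r = 1.0 if a == correct[s] else -1.0
    # reward-modulated Hebbian update on the chosen action's weights
    W[a] += eta * r * x

# Greedy evaluation on the noiseless states
acc = np.mean([int(np.argmax(W @ states[s])) == correct[s] for s in range(2)])
```

The key property is the one the paper emphasizes: only a binary evaluative signal is needed, so the user and decoder can co-adapt without an explicit kinematic error.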

Tasks

  • The sequential mode is where the controller has to perform a sequence of actions over multiple steps in order to accomplish the goal of a trial. This means there can be multiple sequences of actions that eventually reach the end goal, and thus multiple solutions that could be learned by the HRL. Reward/feedback is received by the controller after every action. The metric for reward in movement given in this paper is whether the action at a specific time-step moves the manipulandum closer to the target.

    But I can imagine this approach suffering from the problem common to some greedy algorithms - sometimes the greedy approach might not give the optimal solution. Also possible that this intention estimation is too constraining - the animal might not actually want to move in the most direct trajectory possible, but this may not be important in terms of BMI for patients, efficiency is probably the most important factor there.

  • The episodic mode is where the controller can select only one action in each trial, and thus that action must achieve the goal for a successful trial. The episodic learning mode is suitable for classification tasks. This is what Sanchez 2011 used for their center-out task. Since achieving the trial goal requires only a single action here, the HRL can learn more quickly in the episodic task than in the sequential task. Experience replay can be used to seed learning in this paradigm.

    Since the sequential mode is a sequence of actions, using the episodic mode to initialize the decoders for the sequential mode can probably speed up learning.
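
The per-step reward described for the sequential mode (did this action move the manipulandum closer to the target?) can be written down directly; the function name and signature are mine:

```python
import numpy as np

def step_reward(pos_before, pos_after, target):
    """Binary per-step feedback for sequential mode: +1 if the action
    moved the manipulandum closer to the target, -1 otherwise."""
    d_before = np.linalg.norm(np.subtract(target, pos_before))
    d_after = np.linalg.norm(np.subtract(target, pos_after))
    return 1.0 if d_after < d_before else -1.0
```

Written this way, the greedy-algorithm worry above is explicit: the reward only ever looks one step ahead, so a detour that would pay off later is always punished.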

The network was first tested in simulation experiments, developed for a reaching task in 2D grid space, useful for validating the actual computational model. Three components:

  1. Synthetic neural data generator, with each neuron tuned to one action (moving left, right, up, and down) or none. Goal of the user is to reach targets in 2D space. Input feature is firing rates in 100ms bins.
  2. Neuroprosthetic controller (actor).
  3. Behavioral paradigm - gives binary feedback to the controller.
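
A sketch of what component 1 might look like; the paper only specifies one-action-or-none tuning and 100ms bins, so the rates and assignment scheme below are my guesses:

```python
import numpy as np

rng = np.random.default_rng(5)
ACTIONS = ["left", "right", "up", "down"]

def synth_population(n_neurons, intended_action, bin_s=0.1):
    """Spike counts in one 100 ms bin from a population in which each
    neuron is tuned to a single action, or to none (pref = -1)."""
    prefs = rng.integers(-1, len(ACTIONS), size=n_neurons)
    base_hz, tuned_hz = 5.0, 25.0   # assumed background vs. tuned rates
    rates = np.where(prefs == ACTIONS.index(intended_action),
                     tuned_hz, base_hz)
    return rng.poisson(rates * bin_s), prefs

counts, prefs = synth_population(40, "left")
```

Feeding these binned counts to the actor and the intended action to the behavioral paradigm closes the simulated loop.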

Experiments were done in common marmosets, using a Go/No-Go motor task to move a robot arm to spatial targets on either the left or right side of the monkey. Catch trials - a training technique to ensure the monkeys understood the necessity of the robot movements, in which the robot moved in the direction opposite to the one commanded by the monkey, so the monkey received no reward.

Results

Simulation results showed the weights converging to achieve high success rates for both tasks, as expected. Robust against neuron loss and tuning changes.

The monkey neural data were used offline to map to four different directions. Complete BS; the analysis makes no sense, since the monkey was simply trained to perform Go/No-Go arm movements. This is bad.

References to Read

  • Francis J T and Song W 2011 Neuroplasticity of the sensorimotor cortex during learning Neural Plast. 2011 310737
  • Heliot R et al 2010 Learning in closed-loop brain-machine interfaces: modeling and experimental validation IEEE Trans. Syst. Man Cybern. B 40 1387–97
  • Prasad A et al 2012 Comprehensive characterization and failure modes of tungsten microwire arrays in chronic neural implants J. Neural Eng. 9 056015
  • Polikov V S, Tresco P A and Reichert W M 2005 Response of brain tissue to chronically implanted neural electrodes J. Neurosci. Methods 148 1–18
  • Mahmoudi B and Sanchez J C 2011 A symbiotic brain-machine interface through value-based decision making PLoS One 6 e14760
  • Schultz W, Tremblay L and Hollerman J R 1998 Reward prediction in primate basal ganglia and frontal cortex Neuropharmacology 37 421–9
  • Izawa E I, Aoki N and Matsushima T 2005 Neural correlates of the proximity and quantity of anticipated food rewards in the ventral striatum of domestic chicks Eur. J. Neurosci. 22 1502–12
  • Wawrzyński P 2009 Real-time reinforcement learning by sequential actor–critics and experience replay Neural Netw. 22 1484–97
  • Schultz W 2000 Multiple reward signals in the brain Nature Rev. Neurosci. 1 199–207
  • Prins N et al 2013 Feature extraction and unsupervised classification of neural population reward signals for reinforcement based BMI Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC) (Osaka, Japan, 2013) at press
  • Ribas-Fernandes J J F et al 2011 A neural signature of hierarchical reinforcement learning Neuron 71 370–9

Jul 27, 2015 - Sanchez 2014 - Using Reinforcement Learning to Provide Stable Brain-Machine Interface Control Despite Neural Input Reorganization

Using Reinforcement Learning to Provide Stable Brain-Machine Interface Control Despite Neural Input Reorganization

Simple binary BMI-control demonstration with the RLBMI framework. Finally showed some neural data, and showed that the RLBMI approach is robust against neuron loss. They are calling their RL neural decoder in Sanchez 2013 "Associative reinforcement learning that combined elements of supervised learning with reinforcement based optimization".

Since the experimental control is so simplistic, one cannot derive much about this decoder's performance compared to other closed-loop methods. But:

The RLBMI maintained high performance when applied in a contiguous fashion across experiment sessions spanning up to 17 days. The decoder weights started from random initial conditions during the first session; during subsequent sessions the system was initialized from weights learned in the previous session, and was then allowed to adapt as usual without any new initializations or interventions. [...] Half of the neural input signals were lost between day 9 and 16. However, the system was able to quickly adapt and this loss resulted in only a slight dip in performance.

References to Read

  • Ludwig KA, Miriani RM, Langhals NB, Joseph MD, Anderson DJ, et al. (2009) Using a common average reference to improve cortical neuron recordings from microelectrode arrays. J Neurophysiol 101: 1679–1689. doi: 10.1152/jn.90989.2008
  • Schultz W (2000) Multiple reward signals in the brain. Nat Rev Neurosci 1: 199–207. doi: 10.1038/35044563