The Wiener filter produces an estimate of a target random process by LTI filtering of an observed noisy process, assuming known stationary signal and noise spectra. The filter is derived by minimizing the mean square error (MSE) between the estimated random process and the desired process.
The Wiener filter was one of the first decoding algorithms used with success in BMI, notably by Wessberg et al. in 2000.
Derivations of the Wiener filter are all over the place in quality and approach.
Many papers and classes refer to Haykin's Adaptive Filter Theory, a terrible book in terms of setting the motivation, presenting intuition, and typesetting.
MIT's 6.011 notes on the Wiener filter give a good treatment of the DT noncausal derivation and include some practical examples. Their treatment of the causal Wiener filter, which is probably the most important case for real-time processing, is mathematical and really obscures the intuition -- the final equation is reached by completing the square and using Parseval's theorem, without mentioning the consequences in the time domain. My notes
Max Kamenetsky's EE264 notes give the best derivation and numerical examples of the subject for practical application, even though only DT is considered. They might get taken down, so an extra copy is [here](/reading_list/assets/wienerFilter.pdf). Introducing the causal form of the Wiener filter via the input-whitening method developed by Shannon and Bode fits nicely into the general framework, as sketched below.
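To summarize that whitening result here (the exact symbols are my own choice, matching the derivation summary below): factor the input spectrum as \(S_{xx}(z) = S_{xx}^+(z)\,S_{xx}^-(z)\), where \(S_{xx}^+\) is minimum-phase (causal) and \(S_{xx}^-(z) = S_{xx}^+(1/z)\) is its anticausal mirror. The causal Wiener filter is then

\[H(z) = \frac{1}{S_{xx}^+(z)}\left[\frac{S_{yx}(z)}{S_{xx}^-(z)}\right]_+\]

where \([\cdot]_+\) keeps only the causal part of the inverse transform. The first factor whitens the input; the bracketed term is the noncausal solution applied to the whitened input, truncated to be realizable.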
Wiener Filter Derivation Summary
Both the input (observation of the target process) and the output (estimate of the target process, after filtering) are random processes.
Derive an expression for the cross-correlation between the input and the output.
Derive an expression for the MSE between the output and the target process.
Minimize that expression and obtain the transfer function in terms of \(S_{yx}\) and \(S_{xx}\), where \(S_{yx}\) and \(S_{xx}\) are the Z or Laplace transforms of the cross-correlation and autocorrelation: \(H(z) = S_{yx}(z)/S_{xx}(z)\) in the noncausal case.
This looks strangely similar to linear regression. Not surprisingly, it is known to statisticians as multivariate linear regression.
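In the time domain, the same minimization gives the Wiener-Hopf normal equations \(\sum_j h_j\, r_{xx}(k-j) = r_{yx}(k)\), which is exactly a least-squares problem. Here is a minimal numerical sketch of an FIR Wiener filter; the setup (a smoothed target observed in white noise, 32 taps) is my own hypothetical example, not from the sources above, and it uses scipy's `solve_toeplitz` to exploit the Toeplitz structure of the autocorrelation matrix.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

rng = np.random.default_rng(0)

# Hypothetical setup: target process y, observed as x = y + white noise.
n, taps = 10_000, 32
y = np.convolve(rng.standard_normal(n), np.ones(5) / 5, mode="same")  # smooth target
x = y + 0.5 * rng.standard_normal(n)                                  # noisy observation

def corr(a, b, lags):
    # Biased sample correlation r[k] = E[a(t) b(t+k)], for k = 0..lags-1.
    return np.array([np.dot(a[: n - k], b[k:]) / n for k in range(lags)])

r_xx = corr(x, x, taps)  # autocorrelation of the observation
r_yx = corr(x, y, taps)  # cross-correlation between observation and target

# Wiener-Hopf normal equations: R_xx h = r_yx (a Toeplitz linear system).
h = solve_toeplitz(r_xx, r_yx)

y_hat = np.convolve(x, h)[:n]  # causal FIR estimate of the target
print("MSE before:", np.mean((x - y) ** 2), "after:", np.mean((y_hat - y) ** 2))
```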
Kalman Filter
In the Wiener filter, it is assumed that both the input and target processes are WSS. If not, one may assume local stationarity and conduct the filtering. However, nearly two decades after Wiener's work, Rudolf Kalman developed the Kalman filter, which is the optimal mean-square linear filter for nonstationary processes (evolving under a certain state space model) AND stationary ones (converging in steady state to the Wiener filter).
Two good sources on Kalman Filter:
Dan Simon's article describes the intuition and practical implementation of the KF, specifically in an embedded control context. The KF should be coupled with a controller when the sensor reading is unreliable -- it filters out the sensor noise. My notes
Update (12/9/2021): New best Kalman Filter derivation, by Thacker and Lacey. Full memo, Different link.
The general framework of the Kalman filter is (using notation from Welch and Bishop):
\[x_k = A x_{k-1} + B u_{k-1} + w_{k-1}\]
\[z_k = H x_k + v_k\]
The first equation is the state model. \(x\) denotes the state of a system; this could be the velocity or position of a vehicle, or the desired velocity or position of a cursor in a BMI experiment. \(A\) is the transition matrix -- it describes how the state will transition depending on past states. \(u\) is the input to the system. \(B\) describes how the inputs are coupled into the state. \(w\) is the process noise.
The second equation is the observation model -- it describes what kind of measurements we can observe from a system with some state. \(z\) is the measurement; it could be the speedometer reading of the car (velocity would be the state in this case), or it could be neural firing rates in BMI (kinematics would be the state). \(H\) is the observation matrix (the tuning model in BMI applications). \(v\) is the measurement or sensor noise.
One caveat is that both \(w\) and \(v\) are assumed to be Gaussian with zero mean, with covariance matrices \(Q\) and \(R\) respectively.
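As a concrete instance of these two equations, here is a minimal simulation sketch: a hypothetical 1-D cursor whose state is [position, velocity], driven by a constant acceleration input, with noisy position measurements. All matrix values are made-up illustrative numbers, not from any BMI paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D cursor: state x = [position, velocity], input u = acceleration.
dt = 0.01
A = np.array([[1.0, dt],
              [0.0, 1.0]])     # transition matrix
B = np.array([[0.0], [dt]])    # input coupling
H = np.array([[1.0, 0.0]])     # we only measure position
Q = 1e-4 * np.eye(2)           # process noise covariance
R = np.array([[0.05]])         # measurement noise covariance

steps = 500
x = np.zeros((steps, 2))       # true states
z = np.zeros(steps)            # measurements
u = np.ones((steps, 1))        # constant acceleration command
for k in range(1, steps):
    w = rng.multivariate_normal(np.zeros(2), Q)        # process noise
    x[k] = A @ x[k - 1] + (B @ u[k - 1]).ravel() + w   # state model
    v = rng.normal(0.0, np.sqrt(R[0, 0]))              # sensor noise
    z[k] = (H @ x[k])[0] + v                           # observation model
```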
The central insight of the Kalman filter is that, depending on whether the process noise or the sensor noise dominates, we weigh the state model and the observation model differently in their contributions to the new underlying state estimate, resulting in the a posteriori state estimate \[\hat{x}_k=\hat{x}_k^- + K_k(z_k-H\hat{x}_k^-)\] Note the subscript \(k\) denotes the discrete time step.
The first term is the contribution from the state model -- \(\hat{x}_k^-\) is the a priori state estimate based on the state model only. The second term is the contribution from the observation model -- the measurement residual \(z_k - H\hat{x}_k^-\), weighted by the Kalman gain \(K_k\).
Note that a priori denotes a state estimate at step \(k\) that uses only knowledge of the process prior to step \(k\), while a posteriori denotes a state estimate that also uses the measurement at step \(k\), \(z_k\).
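Putting the pieces together, here is a minimal predict/update loop over the simulated `x` and `z` from the sketch above, again in Welch and Bishop's notation. The gain is the standard \(K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}\); the initial covariance is an arbitrary choice for illustration.

```python
# Continues the simulation sketch above (A, B, H, Q, R, u, x, z, steps).
x_hat = np.zeros(2)   # a posteriori state estimate
P = np.eye(2)         # a posteriori error covariance (arbitrary initial value)
est = np.zeros((steps, 2))
for k in range(1, steps):
    # Time update (predict): a priori estimate from the state model alone.
    x_hat_minus = A @ x_hat + (B @ u[k - 1]).ravel()
    P_minus = A @ P @ A.T + Q
    # Measurement update (correct): weigh in the observation via the Kalman gain.
    K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)
    x_hat = x_hat_minus + K @ (z[k] - H @ x_hat_minus)
    P = (np.eye(2) - K @ H) @ P_minus
    est[k] = x_hat
print("mean |position error|:", np.abs(est[:, 0] - x[:, 0]).mean())
```

The weighting intuition shows up directly in \(K_k\): shrinking \(Q\) relative to \(R\) makes the filter trust the state model (small gain), while shrinking \(R\) makes the estimate track the measurements (gain near \(H^{-1}\)-like behavior).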