The field of DSP is pretty broad, but I’ve found that knowing (and remembering) the following basic concepts is useful both in practice and in interviews (giving or receiving). Understanding the concepts below also makes debugging unexpected results much easier, and I’ve had to come back to them many times.

FIR vs. IIR filters

  1. FIR filters are easy to implement, always stable, and straightforward to make causal.
  2. IIR filters are more concise to implement, but they require feedback. Higher-order IIR filters can be implemented with biquad structures (SOS implementations) for numerical stability (not needed for FIR).
  3. FIR filters can be made exactly linear phase (with symmetric or antisymmetric coefficients).
  4. IIR filters can only approximate linear phase within the passband.
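
As a minimal sketch of both (my own illustration; the 50 Hz cutoff, 1 kHz sample rate, and filter orders are arbitrary choices), in scipy:

```python
import numpy as np
from scipy import signal

fs = 1000.0  # sample rate (Hz); an assumption for the example

# FIR: 101 symmetric taps -> exactly linear phase, stable by construction.
fir_taps = signal.firwin(101, cutoff=50, fs=fs)

# IIR: order-8 Butterworth returned as second-order sections (SOS),
# which is more numerically robust than a single transfer function.
sos = signal.butter(8, 50, btype="low", fs=fs, output="sos")

x = np.random.randn(2000)  # dummy input
y_fir = signal.lfilter(fir_taps, 1.0, x)
y_iir = signal.sosfilt(sos, x)
```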

Phase Delay

  1. Linear phase means the phase shift introduced at different frequencies increases linearly with frequency.
  2. This means higher-frequency components are shifted more in phase.
  3. The result is that all frequencies are delayed by the same amount in time, even though they are shifted by different amounts in phase.
  4. Intuitively, linear phase means the filtered signal will have a similar shape to the input, just delayed.
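
A quick numerical check of this (reusing the illustrative 101-tap FIR design from above): a symmetric FIR filter delays every frequency by the same (101 - 1)/2 = 50 samples.

```python
import numpy as np
from scipy import signal

taps = signal.firwin(101, cutoff=50, fs=1000.0)    # symmetric taps
w, gd = signal.group_delay((taps, 1.0), fs=1000.0)
print(gd[:8])  # ~50 samples of group delay, independent of frequency
```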

Spectrum Leakage

  1. This phenomenon stems from the implicit assumption of the DFT that the input signal is ONE PERIOD of a PERIODIC SIGNAL.
  2. If the windowed portion of the signal is not an integer number of periods, then the resulting frequency spectrum may not have a frequency bin corresponding exactly to the frequency of interest – the resulting DFT is not sharp.
  3. The frequency of interest is instead “spread out” into the surrounding frequency bins, producing “leakage”.
  4. This can be mitigated with “zero-padding” – appending zeros to the end of the time-domain signal. This approximates the effect of taking the CTFT of a windowed periodic signal.
  5. The result is that the DFT of the zero-padded signal looks like an interpolation of the previously “spread-out” DFT of the non-zero-padded signal. For a signal with a single tone, this recovers the correct peak frequency.
  6. Zero-padding does not improve frequency resolution (interpolation doesn’t increase resolution). To improve resolution we need a longer recording of the signal.
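
A small numpy demo of both effects (the 10.5 Hz tone, 100 Hz sample rate, and 0.5 s record are arbitrary choices). With a 0.5 s record the bin spacing is 2 Hz, so a 10.5 Hz tone falls between bins and leaks; zero-padding interpolates the spectrum so the peak lands near the true frequency:

```python
import numpy as np

fs = 100.0                             # sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)          # 0.5 s record -> 2 Hz bin spacing
x = np.sin(2 * np.pi * 10.5 * t)       # 10.5 Hz tone: not on a DFT bin

f = np.fft.rfftfreq(len(x), 1 / fs)
print(f[np.argmax(np.abs(np.fft.rfft(x)))])            # 10.0 Hz, leakage

n_pad = 8 * len(x)                                     # zero-pad 8x
fp = np.fft.rfftfreq(n_pad, 1 / fs)
print(fp[np.argmax(np.abs(np.fft.rfft(x, n=n_pad)))])  # ~10.5 Hz
```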

|FFT|^2 vs. PSD of a signal

  1. PSD usually applies to a stochastic process (usually stationary). For non-stationary processes such as speech, the STFT should be used instead.
  2. Wiener-Khinchin Theorem: for a stationary stochastic process, the PSD is defined as the Fourier Transform of the autocorrelation sequence of the signal. From this we get the amount of power per frequency bin.
  3. The PSD can be estimated by taking the magnitude squared of the FT of the signal (normalized by the record length) – this is called the Periodogram.
    • This is not a consistent estimator: it does not converge to a limit with increasing sample size, because the individual bin values are exponentially distributed.
  4. An alternative is to truncate the autocovariance sequence, then take its Fourier Transform.
    • This leads to a spectral window of some width, has lower sampling variability, and with minor assumptions IS a consistent estimator.
  5. The key weakness of the periodogram is that it uses only one “realization” of a stochastic process and therefore has high variability.
    • Each periodogram estimate can be thought of as a draw of a random variable – you need to average over many outcomes to get a decent estimate.
    • In fact, another definition of PSD is “an average of the magnitude squared of the Fourier Transform”.
    • Very CHEAP computationally, but has high variance.
  6. Consequently, averaging multiple periodograms can approach the PSD (see the sketch after this list).
  7. The unit of PSD is power per Hz (e.g. W/Hz). Integrating the PSD over a band of frequencies gives the power (watts) in that band.
  8. For a zero-mean signal, the integral of the entire PSD is equal to the variance.
  9. Key Duality: a quadratic quantity in the frequency domain (energy spectral density in the deterministic case, power spectral density in the stochastic case) corresponds to a correlation (which is essentially a convolution) in the time domain.
  10. The multi-taper approach is a fancy way of averaging periodograms. It averages a pre-determined number of periodograms obtained by applying different windows (tapers) to the same signal. The windows are selected to have two key properties:
    • The windows are orthogonal, which means the periodograms are uncorrelated, so averaging them gives an estimate with lower variance than using just a single taper.
    • The windows have the best possible concentration in the frequency domain for a fixed signal length, to minimize spectral leakage.
    • This is probably the best estimator for a stationary time series that’s not super long.
  11. An informative resource with a good discussion of why there are so many ways to calculate the PSD: 1
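
As a sketch of points 3-6 above (my own illustration: white-noise input so the true PSD is flat; the record length and segment size are arbitrary), compare scipy’s raw periodogram with Welch’s averaged periodogram:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)        # white noise: true PSD is flat

f1, p_raw = signal.periodogram(x, fs=1.0)         # one noisy realization
f2, p_avg = signal.welch(x, fs=1.0, nperseg=512)  # averages many segments

# The raw periodogram has roughly 100% relative standard deviation per
# bin no matter how long x is; Welch trades frequency resolution
# (wider effective bins) for much lower variance.
print(p_raw.std(), p_avg.std())
```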

Savitzky-Golay filter

I actually haven’t encountered this in grad school, probably because I dealt mostly with spikes and didn’t need the smoothing abilities that SGF is especially suited for. Everyone at my job seems to love it for somewhat handwavy reasons, so I wanted to demystify it. A good overall discussion of the pros/cons beyond the basic formulation is on stackoverflow. Some highlights:

  1. Let’s get it straight: SGF is not magical, adaptive, or anything of the sort. It’s an FIR filter designed by local least-squares polynomial approximation.
  2. If the noise spectrum significantly overlaps with the signal spectrum, a more careful approach is required, and brute-force attenuation will not work well because either you leave too much noise (by choosing the cut-off frequency too high) or you distort the desired signal too much. In this case Savitzky-Golay (S-G) filters may be a good choice.
  3. SGF tutorial:
    • It can be thought of as an FIR low-pass filter, with a flat passband and moderate attenuation in the stop band.
    • Good for preserving peak shapes and heights, bad for rejecting noise.
    • Use it when you want to smooth, but the noise and signal overlap in frequency.
  4. An SGF with a smaller window length attenuates high frequencies less; this means it distorts the peaks less, but noise that’s slightly out of band is not filtered as sharply.
  5. A longer window attenuates more, while keeping the amplitude of an in-band sinusoid constant.
  6. Examples/Tests (see the sketch after this list):
    • Clean signal of 1 Hz, corrupted with 10 Hz and 40 Hz (yes, the noise is out of band). A len-31 SGF performs better than len-101 because it distorts the peaks less, and the resulting waveform is almost identical to the base 1 Hz waveform.
    • Clean signal of 1 Hz, corrupted with 1.2 Hz and 40 Hz. A len-31 SGF gives a smoother version of the original signal; a len-101 SGF gives a 1 Hz signal with smaller amplitude. In this case, len-101 correctly filters out the high-frequency components better.
  7. In practice, I’ve actually found SGF hard to use/tune. If a noise measurement is available, it’s much better to do actual adaptive filtering (e.g. H-infinity, LMS).
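
A sketch of the first test case above (the 100 Hz sample rate, noise amplitudes, and polynomial order 3 are my assumptions; the trade-off shifts with all of them):

```python
import numpy as np
from scipy.signal import savgol_filter

fs = 100.0
t = np.arange(0, 3, 1 / fs)
clean = np.sin(2 * np.pi * 1 * t)      # 1 Hz signal of interest
noise = 0.2 * np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 40 * t)

y31 = savgol_filter(clean + noise, window_length=31, polyorder=3)
y101 = savgol_filter(clean + noise, window_length=101, polyorder=3)

# Shorter window: less distortion of the 1 Hz waveform; longer window:
# stronger attenuation of everything, including some of the signal.
print(np.abs(y31 - clean).max(), np.abs(y101 - clean).max())
```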

Utilizing ergodicity to increase SNR

  • In diffuse correlation spectroscopy, for example, SNR is proportional to the photon rate (more signal per unit time) and to the integration time (for a more accurate autocorrelation estimate).
  • However, more than one measurement can be taken at a time: increasing the number of measurements spatially can achieve a similar SNR increase as a higher photon rate and/or a longer integration time.
  • This is based on the assumption of ergodicity – i.e. averaging the random process over time gives the same statistics (mean and variance) as averaging over realizations (here, spatial channels).
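
A toy numpy illustration of the argument (the numbers are arbitrary): averaging N independent spatial channels shrinks the variance of the estimate by 1/N, just like integrating N times longer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_time = 16, 1000
x = 1.0 + rng.standard_normal((n_channels, n_time))  # true mean = 1.0

est_one = x[0].mean()   # one detector, T samples: std ~ sigma/sqrt(T)
est_all = x.mean()      # N detectors, T samples: std ~ sigma/sqrt(N*T)
print(est_one, est_all)
```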

Time Series ARIMA models and Signal Processing relations

  1. PSU stats510 is a great reference for the basics.
  2. Moving Average (MA) models are like FIR filters.
  3. Autoregressive (AR) models are like IIR filters.
  4. Every autoregressive-integrated-moving-average (ARIMA) model can be converted to an infinite-order MA model – similar to how IIR filters can be approximated by FIR filters!
  5. When decomposing additive models (trend + seasonal + random), the trend is extracted by a centered sliding-window moving average (similar to low-pass filtering), with a window length equal to the seasonal span.
  6. Smoothing: LOWESS/LOESS are essentially equivalent to Savitzky-Golay – i.e. fitting regressions or polynomials locally around each point, possibly with a weighting function applied to the different points.
  7. We shouldn’t blindly apply exponential smoothing, because the underlying process might not be well modeled by an ARIMA(0,1,1) – exponential smoothing (EMA) is equivalent to forecasting with an ARIMA(0,1,1) model.
  8. Diagnostics (see the simulation sketch after this list):
    • AR models should show a decaying autocorrelation function (ACF) and a cutoff in the partial autocorrelation function (PACF).
    • MA models should show a cutoff in the ACF and a decaying PACF.
    • ACF and PACF both decaying (neither cutting off cleanly) suggests a mixed ARMA model.
    • Seasonal effects show up as periodic spikes in the ACF or PACF.
  9. How to fit ARIMA model to linear model residuals:
    • Cochrane-Orcutt (example) does pre-whitening, then applies OLS to get the beta coefficients and SEs. The R-squared after this procedure is usually less than that of plain OLS.
    • ARIMAX: fitting a regression with an ARIMA error structure. This can be done with Cochrane-Orcutt, but is better done with maximum-likelihood estimation to jointly estimate both the regression model and the ARIMA error model.
    • This can be treated as a state-space model – the ARIMA error describes the state transition, and the exogenous regressors describe the observation model.
    • After some manipulation, this connects to Kalman filtering.
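
A quick simulation of the diagnostic patterns in item 8, using statsmodels (one tool among many; the AR(2) coefficients are arbitrary):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

# AR(2): x[t] = 0.6*x[t-1] + 0.3*x[t-2] + noise. statsmodels expects
# lag-polynomial coefficients, hence the sign flip on the AR terms.
proc = ArmaProcess(ar=[1.0, -0.6, -0.3], ma=[1.0])
x = proc.generate_sample(nsample=5000)

print(np.round(acf(x, nlags=5), 2))   # decays gradually (AR signature)
print(np.round(pacf(x, nlags=5), 2))  # cuts off after lag 2
```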

Yule-Walker…because I can never remember this name..

  1. The Yule-Walker equations are used to fit an AR model of a given order.
  2. The model order is determined by maximum likelihood or by the AIC of the fit residuals.
  3. The Wiener-Hopf equations are a generalization of the Yule-Walker equations.
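
A minimal numpy sketch of what the Yule-Walker solve actually is (the helper yule_walker_ar is a made-up name for illustration; statsmodels ships its own yule_walker):

```python
import numpy as np
from scipy.linalg import solve, toeplitz

def yule_walker_ar(x, p):
    """Fit AR(p) by solving the Yule-Walker equations R @ phi = r."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased sample autocovariances r[0], ..., r[p].
    r = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    phi = solve(toeplitz(r[:p]), r[1:])   # AR coefficients
    sigma2 = r[0] - phi @ r[1:]           # innovation (noise) variance
    return phi, sigma2

# Usage: recover phi ~ 0.7 from a simulated AR(1).
rng = np.random.default_rng(0)
x = np.zeros(20000)
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(yule_walker_ar(x, p=1))
```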

DSP-implementation gotchas:

  1. SOS/biquad implementations are better than transfer-function implementations in practice due to robustness against overflow and quantization error. For example, see this scipy issue.
  2. When implementing a cascade of filters in DSP, it’s important to think about and set the initial conditions for the second and later stages. Each stage’s initial condition should be set to its own steady-state step response to the initial output of the previous stage, and so on down the cascade.
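
In scipy this is the sosfilt_zi idiom (a sketch; the filters and signal are arbitrary): sosfilt_zi returns the steady state for a unit step, and each stage scales it by the first sample that stage actually sees.

```python
import numpy as np
from scipy import signal

fs = 1000.0
sos_lp = signal.butter(4, 40, btype="low", fs=fs, output="sos")
sos_hp = signal.butter(4, 5, btype="high", fs=fs, output="sos")

x = 1.0 + 0.1 * np.random.randn(2000)  # signal with a DC offset

# Stage 1 starts already settled at x[0] instead of ringing from zero.
zi1 = signal.sosfilt_zi(sos_lp) * x[0]
y1, _ = signal.sosfilt(sos_lp, x, zi=zi1)

# Stage 2's initial condition comes from stage 1's initial output.
zi2 = signal.sosfilt_zi(sos_hp) * y1[0]
y2, _ = signal.sosfilt(sos_hp, y1, zi=zi2)
```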