
Model Validation


1. Convolution Theorem

The convolution of a time series x(t) with a filter h(t) to yield y(t) is given by:

$$y(t) = (x \ast h)(t) = \int_0^T x(t-\tau)\, h(\tau)\, d\tau$$

Here h is causal and finite (up to time T). In discrete form this operation is written as:

$$y(i) = \sum_{j=1}^{N} x(i-j)\, h(j)$$

In Matlab, convolution is done using the command conv or filter. Both conv and filter use the same time-domain algorithm, but filter is a more general function since it can also be used for IIR (infinite impulse response) filtering. IIR filters have both a feedforward term (a convolution) and a feedback term. The convolution theorem states that the Fourier Transform (FT) of a convolution is the product of the Fourier Transforms of the components of the operation (x and h in this case):

$$FT[y] = FT[x \ast h] = FT[x] \cdot FT[h]$$

$$Y(f) = X(f) \cdot H(f)$$

where we use H(f) to denote the Fourier Transform of h(t). Note that the Fourier Transforms of x, h, and of their convolution y are all complex-valued.
The convolution theorem does not play a role in our signal-to-noise analysis, but it is useful to present it here to compare and contrast it with the correlation theorem.
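The theorem is easy to verify numerically. Below is a minimal numpy sketch (all names illustrative): a time-domain convolution is compared with the inverse FFT of the product of the FFTs, zero-padding both signals so that the circular convolution implied by the DFT matches the linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)           # input time series
h = np.exp(-np.arange(32) / 8.0)     # causal, finite filter

# Zero-pad to the full linear-convolution length n + len(h) - 1 so the
# DFT's circular convolution matches the linear one.
m = n + len(h) - 1
y_time = np.convolve(x, h)                            # time-domain convolution
y_freq = np.fft.ifft(np.fft.fft(x, m) * np.fft.fft(h, m)).real

print(np.allclose(y_time, y_freq))                    # True
```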

2. The Correlation Theorem

The correlation between two time series x(t) and y(t) is defined as the average of the cross product of x and y at different delays τ:

$$C_{xy}(\tau) = \frac{1}{T} \int_0^T x^*(t)\, y(t+\tau)\, dt$$

Here we are averaging over the entire duration T of the signals x and y. If the signal lengths are infinite, the correlation is written as:

$$C_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x^*(t)\, y(t+\tau)\, dt$$

In discrete form the correlation is written as:

$$C_{xy}(k) = \frac{1}{N-k} \sum_{i=1}^{N-k} x^*(i)\, y(i+k)$$

Here k is the delay and N is the length of the time series x and y in sample points. Note that the cross-correlation is sometimes written with R instead of C (with C reserved for the cross-covariance). Note also that the order of the indices xy matters: as written, a positive delay τ, or k, corresponds to y coming after x. The Matlab function for correlation is xcorr (and xcov for cross-covariance). Note that in xcorr the convention for the order of x and y is reversed. Also, the default for xcorr is to perform a cross-correlation without normalization (without dividing by N-k). This default is mostly useless and you will want to specify 'unbiased' as an option.
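For reference, here is a minimal numpy sketch of the unbiased discrete cross-correlation defined above (the equivalent of Matlab's xcorr(x, y, 'unbiased'), but following this document's C_xy index convention rather than Matlab's reversed one; the function name is ours):

```python
import numpy as np

def xcorr_unbiased(x, y):
    """Unbiased C_xy(k) for lags k = -(N-1)..(N-1); positive k means y follows x."""
    n = len(x)
    lags = np.arange(-(n - 1), n)
    # numpy's convention, correlate(y, x)[k] = sum_n y[n+k] * conj(x[n]),
    # matches the definition of C_xy above.
    c = np.correlate(y, x, mode='full')
    return lags, c / (n - np.abs(lags))   # divide by N - |k| at each lag
```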

The correlation theorem says that:

$$S_{xy}(f) = FT[C_{xy}] = \langle X^*(f)\, Y(f) \rangle$$

S_xy is called the cross-spectral density. The angle brackets ⟨⟩ stand for the expectation value (or average). This average is obtained by dividing the data into segments. The optimal segment length depends both on the correlation time of the system (the memory of the system) and on the amount of data you have, since you will want to average over a minimum number of segments to obtain good estimates.

When x = y, the cross-correlation function becomes the auto-correlation function and the cross-spectral density becomes the power spectral density:

$$S_{xx}(f) = FT[C_{xx}] = \langle X^*(f)\, X(f) \rangle = \langle |X(f)|^2 \rangle$$
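In Python, these segment-averaged estimates can be obtained with scipy.signal.csd and scipy.signal.welch (Welch's method); a minimal sketch with illustrative parameter choices:

```python
import numpy as np
from scipy.signal import csd, welch

fs = 1000.0                          # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)
x = rng.standard_normal(10 * int(fs))
y = np.roll(x, 5) + 0.5 * rng.standard_normal(len(x))   # delayed, noisy copy of x

# nperseg sets the segment length: long enough to cover the correlation
# time of the system, short enough to average over many segments.
f, Sxy = csd(x, y, fs=fs, nperseg=256)    # complex cross-spectral density
f, Sxx = welch(x, fs=fs, nperseg=256)     # power spectral density (the x = y case)
```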

TODO: Add notes on where to find this code in our Python toolboxes.

3. Signal and Noise

If the noise is additive, the noise-corrupted response R can be written as the sum of signal S and noise N:

$$R(t) = S(t) + N(t)$$

By averaging the responses obtained to the same stimulus, one can obtain an estimate of the underlying signal:

$$\bar{R}(t) \cong S(t)$$

In single-unit recordings $\bar{R}$ is the PSTH; in BOLD or ECoG, $\bar{R}$ is the average response (repeats.mean(0) in the Gallant lab code). The noise can then be estimated by subtraction:

$$R(t) - \bar{R}(t) \cong N(t)$$

The signal-to-noise ratio as a function of frequency can be obtained by calculating the power spectral densities of S and N:

$$SNR(f) = \frac{\langle |S(f)|^2 \rangle}{\langle |N(f)|^2 \rangle}$$

To obtain a single quantifier for the overall signal-to-noise ratio, one can calculate the channel capacity, also called the information upper bound:

$$I = \int_0^\infty \log_2\left(1 + \frac{\langle S^2 \rangle}{\langle N^2 \rangle}\right) df \;\; \text{bits/s}$$

where we have dropped the frequency argument for simplicity. The channel capacity gives the maximum information that could be transferred through this channel given that the signal and noise have Gaussian distributions. If the signal is not Gaussian, the actual maximum information that can be transmitted will be smaller.
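A minimal numpy/scipy sketch of this pipeline on a synthetic set of repeats (M trials of the same stimulus; all names and parameter values are illustrative):

```python
import numpy as np
from scipy.signal import welch

fs = 1000.0                                     # sampling rate in Hz (assumed)
rng = np.random.default_rng(2)
t = np.arange(2048) / fs
S_true = np.sin(2 * np.pi * 10 * t)             # toy 10 Hz signal
repeats = S_true + rng.standard_normal((10, t.size))   # M = 10 noisy trials

S_est = repeats.mean(0)                         # R_bar(t) ~ S(t)
N_est = repeats - S_est                         # R(t) - R_bar(t) ~ N(t)

f, S_pow = welch(S_est, fs=fs, nperseg=256)
f, N_pow = welch(N_est, fs=fs, nperseg=256, axis=-1)
snr = S_pow / N_pow.mean(0)                     # SNR(f)

# Channel capacity: integrate log2(1 + SNR) on the frequency grid (bits/s).
capacity = np.sum(np.log2(1.0 + snr)) * (f[1] - f[0])
```

Note that these naive signal and noise estimates are biased; the corrections are derived in section 6.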

Alternatively, one can estimate the overall signal or noise power by summing or averaging across time points. For example:

$$N_{tot} = \sum_t N(t)^2$$

Parseval's theorem tells us that summing in time is equivalent to summing in frequency. Therefore:

$$N_{tot} = \sum_f |N(f)|^2$$
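A quick numerical check of Parseval's theorem (note that with numpy's unnormalized FFT convention the frequency-domain sum carries a 1/N factor):

```python
import numpy as np

n_t = np.random.default_rng(3).standard_normal(512)    # a toy noise trace
lhs = np.sum(n_t ** 2)                                 # sum over time
rhs = np.sum(np.abs(np.fft.fft(n_t)) ** 2) / n_t.size  # sum over frequency
print(np.allclose(lhs, rhs))                           # True
```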

In the following sections, I will use the notation |N|^2 and |S|^2 to designate noise power or signal power, either at a particular frequency or overall, where "overall" means averaging (or summing, as long as the same number of points is used) in time or in frequency. The equations are identical in either case. Similarly, the equations given for the coherence at a particular frequency are also valid for the square of the correlation coefficient.

4. The coherency and the coherence.

The coherency and the coherence are normalized versions of the cross-spectrum of two time series. The coherency is the cross-spectrum normalized by the square root of the product of the two power spectral densities; it is the equivalent for time series of the correlation coefficient r. The coherency is a function of frequency and is a complex number. Its inverse Fourier Transform can be taken to obtain a normalized cross-correlation function in the time domain. The coherency is:

$$\gamma_{xy}(f) = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

The coherence is the absolute value squared of the coherency. The coherence is therefore a real function (of frequency) bounded between 0 and 1. The coherence can be thought of as the square of the Pearson correlation coefficient, r^2, for time series:

$$\gamma^2(f) = \frac{|S_{xy}|^2}{S_{xx} S_{yy}}$$
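In Python, the coherence can be estimated directly with scipy.signal.coherence, which computes |S_xy|^2 / (S_xx S_yy) with Welch-style segment averaging; a minimal sketch with illustrative parameters:

```python
import numpy as np
from scipy.signal import coherence

fs = 1000.0
rng = np.random.default_rng(4)
x = rng.standard_normal(int(10 * fs))
# y is a filtered version of x plus noise, so the coherence is < 1.
y = np.convolve(x, np.ones(5) / 5, mode='same') + rng.standard_normal(len(x))

f, gamma2 = coherence(x, y, fs=fs, nperseg=256)   # gamma^2(f), in [0, 1]
```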

5. The coherence and SNR.

The coherence between a single-trial response (R = S + N) and the signal (S) is a function of the signal-to-noise ratio. Since the noise is uncorrelated with the signal, $S_{RS} = S_{SS}$ and $S_{RR} = S_{SS} + S_{NN}$, and it can easily be shown that:

$$\gamma^2_{R,S} = \frac{\langle S^2 \rangle}{\langle S^2 \rangle + \langle N^2 \rangle}$$

Equivalently:

$$\frac{\langle S^2 \rangle}{\langle N^2 \rangle} = \frac{\gamma^2}{1 - \gamma^2}$$

The information capacity can therefore be rewritten as:

$$I_{Capacity} = \int_0^\infty -\log_2\left(1 - \gamma^2_{R,S}\right) df$$

More generally, the coherence measures the degree of the linear relationship between two time series. If a time series y can be obtained by convolving a time series x with a filter and adding Gaussian noise, the actual mutual information between x and y is also given by:

$$I_{Linear} = \int_0^\infty -\log_2\left(1 - \gamma^2_{x,y}\right) df$$
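A minimal numpy sketch of these conversions on a toy coherence curve (all values illustrative):

```python
import numpy as np

# Toy coherence curve gamma^2(f) on a 0-500 Hz grid (illustrative values).
f = np.linspace(0.0, 500.0, 129)
gamma2 = 0.8 * np.exp(-f / 100.0)

snr = gamma2 / (1.0 - gamma2)                   # <S^2>/<N^2> at each frequency
# I = -integral of log2(1 - gamma^2) df, approximated on the grid (bits/s).
info = -np.sum(np.log2(1.0 - gamma2)) * (f[1] - f[0])
```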

6. Bias-free estimation of signal and noise.

We noted that $\bar{R} \cong S$ and $N_{est} = R - \bar{R} \cong N$. If we are using this averaging technique to estimate the signal and noise, do we obtain a bias-free estimate of the SNR? To answer this question, we can calculate the power spectral densities of our estimates of signal and noise and assess whether we are under- or over-estimating these quantities. In the following derivation, I have stopped using ⟨⟩ to show averaging, but all quantities shown are expectation values. M is the number of trials (repetitions) used to estimate the mean response, and the noise is uncorrelated across trials. The power of the average response is then:

$$|\bar{R}|^2 = |S|^2 + \frac{|N|^2}{M}$$

We are therefore overestimating the signal power by $|N|^2 / M$. The power of our noise estimate is:

$$|N_{est}|^2 = |S - S|^2 + \left|N\left(1 - \frac{1}{M}\right)\right|^2 + \frac{M-1}{M^2}\,|N|^2$$

The $|N(1 - 1/M)|^2$ term comes from the component of the noise in the average that is the same as the noise in the current trial. The $\frac{M-1}{M^2}|N|^2$ term comes from the other M-1 noise terms in the average, which are uncorrelated with each other and with the noise in the current trial.

$$|N_{est}|^2 = |N|^2 \left[\left(1 - \frac{1}{M}\right)^2 + \frac{M-1}{M^2}\right]$$

$$|N_{est}|^2 = |N|^2 \left[\frac{M-1}{M}\right]$$

We are therefore underestimating the noise by a factor of (M-1)/M. To obtain bias-free estimates of the SNR, one can therefore correct the noise estimate by the factor M/(M-1) and then use the bias-free noise estimate to correct the signal estimate by removing $|N|^2 / M$. In Hsu, Borst and Theunissen (2004), we show how to obtain bias-free estimates of these signal-to-noise ratios using the coherence function. Our approach uses the same algebra as in this derivation, but it also allows one to obtain bias-free estimates of model predictions.
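A minimal numpy sketch of these corrections on synthetic repeats (M trials, unit-variance noise, a sine signal of power 0.5; all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
M, T = 10, 2048
S = np.sin(2 * np.pi * np.arange(T) / 128.0)      # true signal, power 0.5
repeats = S + rng.standard_normal((M, T))         # noise power 1.0

R_bar = repeats.mean(0)
N_est = repeats - R_bar

# Biased estimates: |R_bar|^2 = |S|^2 + |N|^2/M and |N_est|^2 = |N|^2 (M-1)/M.
sig_pow_biased = np.mean(R_bar ** 2)
noise_pow_biased = np.mean(N_est ** 2)

# Corrections: rescale the noise by M/(M-1), then remove |N|^2/M from the signal.
noise_pow = noise_pow_biased * M / (M - 1)
sig_pow = sig_pow_biased - noise_pow / M

print(sig_pow, noise_pow)    # ~0.5 and ~1.0, the true values
```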

7. The explainable variance as defined in the Gallant Lab.

The Gallant lab code uses the term residual for our noise: $Res(t) \cong N(t)$. Res(t) is obtained by subtracting the mean response from the single-trial responses (called repeats). The variance of Res(t) is therefore the biased noise power:

$$|Res(t)|^2 = |N|^2 \left(\frac{M-1}{M}\right)$$

Because the overall responses (S(t) + N(t)) are z-scored in the code (or at least should be):

$$|S|^2 + |N|^2 = 1$$

The uncorrected explainable variance (ev) is given by:

$$ev = 1 - |Res(t)|^2$$

$$ev = |S|^2 + |N|^2 - |N|^2\left(\frac{M-1}{M}\right)$$

$$ev = |S|^2 + \frac{|N|^2}{M}$$

Notice that:

$$1 - ev = |Res(t)|^2 = |N|^2\left(\frac{M-1}{M}\right)$$

Thus, to obtain the unbiased signal power (or signal variance), one can calculate a corrected explainable variance, ev_c:

$$ev_c = ev - \frac{1 - ev}{M - 1}$$

$$ev_c = |S|^2 + \frac{|N|^2}{M} - |N|^2\left(\frac{M-1}{M}\right)\left(\frac{1}{M-1}\right) = |S|^2 + \frac{|N|^2}{M} - \frac{|N|^2}{M}$$

$$ev_c = |S|^2$$

The corrected explainable variance is therefore the signal power.
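A minimal numpy sketch of the corrected explainable variance on synthetic repeats (names illustrative; here the z-scoring is done globally, as an approximation to the |S|^2 + |N|^2 = 1 normalization):

```python
import numpy as np

rng = np.random.default_rng(6)
M, T = 10, 2048
S = np.sin(2 * np.pi * np.arange(T) / 128.0)
repeats = S + rng.standard_normal((M, T))
repeats = (repeats - repeats.mean()) / repeats.std()   # z-score: |S|^2 + |N|^2 = 1

res = repeats - repeats.mean(0)      # residual = single trials - mean response
ev = 1.0 - np.mean(res ** 2)         # biased: ev = |S|^2 + |N|^2/M
ev_c = ev - (1.0 - ev) / (M - 1)     # corrected: ev_c = |S|^2 (~1/3 here)
```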

8. Estimating SNR directly from coherence or correlation coefficients.

The unbiased estimates of signal power and noise power described above can be used to calculate the coherence and, for model validation, to compare the predicted signal power to the actual signal power. These calculations can also be performed by splitting the data into two halves, as shown in Hsu, Borst and Theunissen (2004). More specifically, the coherence between signal+noise and signal can be obtained with:

$$\frac{1}{\gamma^2_{S,S+N}} - 1 = \frac{M}{2}\left(-1 + \sqrt{\frac{1}{\gamma^2_{M/2}}}\right)$$

where $\gamma^2_{M/2}$ is the coherence obtained between the average of half of the trials (or repeats) and the average of the other half. When one has 2 trials:

$$\gamma^2_{S,S+N} = \sqrt{\gamma^2_{M/2}}$$

Similarly, when comparing to the prediction of the model, P, we find:

$$\frac{\gamma^2_{P,S+N}}{\gamma^2_{P,\bar{R}}} = \frac{\gamma^2_{S,S+N}}{\gamma^2_{S,\bar{R}}} = \frac{1 + \sqrt{1/\gamma^2_{M/2}}}{M\left(-1 + \sqrt{1/\gamma^2_{M/2}}\right) + 2}$$

In model validation, we are interested in comparing $\gamma^2_{P,S+N}$ to the ceiling value $\gamma^2_{S,S+N}$, or:

$$\frac{\gamma^2_{P,S+N}}{\gamma^2_{S,S+N}}$$

These same equations apply to correlation coefficients.
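A minimal numpy sketch of the split-half ceiling estimate, applied to correlation coefficients on synthetic repeats (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
M, T = 10, 2048
S = np.sin(2 * np.pi * np.arange(T) / 128.0)      # signal power 0.5
repeats = S + rng.standard_normal((M, T))         # noise power 1.0

half1 = repeats[: M // 2].mean(0)                 # average of half the trials
half2 = repeats[M // 2 :].mean(0)                 # average of the other half
g2_half = np.corrcoef(half1, half2)[0, 1] ** 2    # gamma^2_{M/2}

# Invert 1/g2 - 1 = (M/2)(sqrt(1/g2_half) - 1) for the ceiling gamma^2_{S,S+N}.
g2_ceiling = 1.0 / (1.0 + (M / 2) * (np.sqrt(1.0 / g2_half) - 1.0))
# For these toy parameters, g2_ceiling ~ <S^2>/(<S^2>+<N^2>) = 0.5/1.5 = 1/3.
```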

9. The Jackknife resampling technique.

The Jackknife is a statistical technique that enables one to generate bias-free estimates of power, coherence, etc., by extrapolating the results to infinite data size. Moreover, as with other resampling techniques, the Jackknife gives you an estimate of the error in your estimate. If $\hat{\theta}$ is the estimate of our parameter (e.g. coherence) for all the data of size n = N, and $\hat{\theta}_i$ is the estimate obtained after deleting the ith trial, then one can obtain a pseudo-value $p_i$ given by a 1/n extrapolation to infinite data size.

The pseudo-value can be shown to be:

$$p_i = N\hat{\theta} - (N-1)\hat{\theta}_i$$

The average of the pseudo-values yields the bias-corrected estimator, and the standard error of this mean yields an estimate of the standard error of your measurement. When the jackknife is used for coherence correction and error estimation, it is often done after a variance-stabilizing transformation; Thomson and Chave recommend using the inverse hyperbolic tangent.
The coherence estimator (compute_coherence_mean) in the tutorial (and in strflab) calculates the coherence using multi-taper methods that are bias-corrected using the jackknife after such a transformation.
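A minimal numpy sketch of jackknife pseudo-values for a generic estimator (here the mean-squared value of the trials; for coherence, apply it after the arctanh transform as noted above):

```python
import numpy as np

rng = np.random.default_rng(8)
trials = rng.standard_normal(20)        # N = 20 observations (illustrative)
N = len(trials)

def estimator(data):
    return np.mean(data ** 2)           # stand-in for power, coherence, etc.

theta_all = estimator(trials)
theta_del = np.array([estimator(np.delete(trials, i)) for i in range(N)])

pseudo = N * theta_all - (N - 1) * theta_del    # p_i = N*theta - (N-1)*theta_i
theta_jk = pseudo.mean()                        # bias-corrected estimate
se_jk = pseudo.std(ddof=1) / np.sqrt(N)         # standard error of the estimate
```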
