-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Layering of WtdAv–WeightedAv–WeightedAverage #582
Comments
The starting point for Isoplot (and therefore SQUID 2.50) in terms of the generalised calculation of a weighted mean is via the (public) function WtdAv (which means you can invoke it in your Excel spreadsheet as an array function, if you know how... Ludwig wrote a little "toolbar-only function" to help users do this). Note that Squid3 does not start at this point when calculating a weighted mean, which is why I am documenting this. Plus, the function has 6 optional arguments, but Ludwig has only publicly documented the first five... and (unsurprisingly) it turns out that this is where some of our confusion arises! Function WtdAv is defined as
where:
This function actually doesn't do very much, except ensure that the range ValuesAndErrors has exactly 2 columns (and at least 2 rows), plus formalise the value of each of the optional arguments, before passing them to subroutine WeightedAv, which does a bit more of the preparatory work. |
Subroutine WeightedAv is invoked solely within WtdAv, so it is not really necessary to define the arguments beyond that call of the subroutine, and I will include the assigned values in the description. In Excel, WeightedAv has one option Boolean argument ( Subroutine WeightedAv is defined as
where
This subroutine does do a bit of work, especially on ValuesErrs. For the specified 2-column range, each row is assessed. If the Value is numeric and the Err is numeric, then the next check ensures that the Value is non-zero and the Err is non-zero and the row has not been independently identified by the user as a manual rejection (e.g. Strikethrough font in Excel; check-box in Squid3). Note also that if input errors were specified as percentage, they are converted to absolute values at this stage. If the row passes all these tests, it is added to the 2-column array InpDat, which holds only the "clean" ValuesErrs data (with absolute errors), which will be used for the real weighted-mean arithmetic. That is handled by subroutine WeightedAverage... which I will describe next. |
Subroutine WeightedAverage is invoked within WeightedAv, but also within WtdAvPlot-related functions, so I will define the arguments beyond that call of the subroutine, and I will include the assigned values in the description. But this is where the arithmetic actually happens. Subroutine WeightedAverage is defined as
where
In subroutine WeightedAv, the subroutine WeightedAverage is invoked as
which means OneSigOut = TRUE when SigmaLevelOut = 1, otherwise OneSigOut = FALSE. The next post will document the WeightedAverage code. |
Boolean value: NoEvar Integer values: Count, i, j, N0, Nn, nU Double-precision values: ExtSigma, ExtSigmaMean, ExtVar, ExtXbarSigma, IntMean, IntSigmaMean, IntVar, MSWD, PointError, Probability, q, Resid, Sums, SumWtdRatios, temp, t68, t95, Tolerance, Tot1sig, TotalError, TotVar, Trial, Weight, WtdAvg, WtdAvgErr, WtdR2, WtdResid Double-precision arrays: InverseVar, IvarY, tbx, yy String vector (4 elements): r
If MinProb = 0, Isoplot sets it using the value stored in Isoplot's Numeric Preferences (statistical). In practice, MinProb should be 0.05 unless otherwise indicated.
Some vector/array-sizing, and calculation of the inverse variance values, to weight the weighted mean:
The following is the "re-entry point" for recalculation of a weighted mean, when CanReject = TRUE and Isoplot has identified an eligible rejection:
Now some fundamental calculations (re-sequenced from Ludwig for clarity, and reader sanity), plus calculation of Student's t at the 68.26% confidence level and 95% confidence level (these values are a function of population size). The Excel function TINV returns the two-tailed inverse of the Student's t-distribution, and is documented at https://support.microsoft.com/en-us/office/tinv-function-a7c85b9d-90f5-41fe-9ca5-1cd2f3e1ed7c :
Sums can be used to calculate MSWD and Probability, in conjunction with Excel's FDIST function, which returns the (right-tailed) F probability distribution (degree of diversity) for two data sets. This is documented at https://support.microsoft.com/en-us/office/fdist-function-ecf76fba-b3f1-4e7d-a57e-6a5b7460b786 :
Provisionally populate some of the elements of output array ww:
At this point, WeightedAverage has completed population of the ww.Int... elements, as per this view of the VBA6 code editor/debugger in Excel 2003: The next step is to determine whether the data are dispersed (relative to a specific threshold), and if so, to calculate the constant "external" variance (calculated by Ludwig via maximum-likelihood estimation), and the associated "external" weighted mean and its error. Note that this process is independent of the value of the user-specified Boolean ConstExtErr (from subroutine WeightedAv). That code (which follows immediately on) commences:
Vectors yy and IvarY are the inputs to the subroutine that calculates the "external" variance, via the secant method, described by Press et al. (1987), p. 250–251 (more detail below). I am going to step out of this unfinished If statement to describe the generalised function WtdExtFunc, which is an important part of that process, and then return. |
Stepping out of the unfinished If statement above, I am going to describe the generalised function WtdExtFunc, which is used repeatedly in the iterative calculation of the "external" error. Function WtdExtFunc is defined as
where
Ludwig described this function as "WtdExtFunc will be zero when the external variance, TrialExtVar, is chosen correctly (Nn >= 3)". This is a bit cryptic, because TrialExtVar is definitely an input... I think what he means is that this function will generate an improved estimate for TrialExtVar, and the function can be used iteratively. It proceeds as follows: Integer value: i Double-precision values: ff, Resid, SumW, SumW2resid2, SumXW Double-precision array: W
|
Returning to the unfinished If statement above, the code continues:
Ludwig described subroutine WtdExtRTSEC as finding the root (thought to lie between 0 and MaxExtVar) of a function WtdExtFunc, using the secant method. The root, returned as WtdExtRTSEC, is refined until its accuracy is +/-xacc. He said the method is described in Numerical Recipes in C: The Art of Scientific Computing (Press et al., 1987, p. 250-251). Subroutine WtdExtRTSEC is invoked exclusively within WeightedAverage, so I have defined almost all of the arguments in terms of WeightedAverage usage. Subroutine WtdExtRTSEC is defined as
where
The subroutine proceeds as follows: Boolean value: NoEvar Integer values: j, rct Double-precision values: dx, f, facc, FL, Lastf1, Lastf2, MaxExtVar, rts, SwapTemp, tmp, xacc, xl
Ludwig starts by checking that FL and f are not too close in value. Curiously, he does this by converting each double-precision value to single-precision (via Excel VBA function CSng) for comparison, and aborting the subroutine if the single-precision values are equivalent:
Then some ring-a-rosy aimed at ensuring that the bound with the smaller WtdExtFunc value is chosen as the initial guess for subsequent iterative calculations in the secant loop:
Now the secant loop, with a maximum of 99 iterations:
The next post will return (again) to the unfinished If statement above, and assign "external error" values to the elements of array ww in subroutine WeightedAverage. |
Returning to the unfinished If statement above, the code in subroutine WeightedAverage continues by evaluating the success or otherwise of subroutine WtdExtRTSEC, and assigns "external error" values to the elements of array ww on that basis. Three scenarios are considered by the code:
remembering that this If statement opened in the final code-block from this post: #582 (comment). At this point, WeightedAverage has finished populating its ww.Ext... elements, like this: |
Subroutine WeightedAverage continues by considering its own automated point-rejection routine, which is invoked if user-specified ( CanReject = TRUE ) AND ( Probability < MinProb ):
This is followed (for user-specified CanTukeys = TRUE) by evaluation of the three ww.BiWt... and three ww.Median... elements. Those functions are quite trivial, but I am going to document them in a separate post, because the code looks very old (and untidy), and I think I can see a bug, unfortunately. If WeightedAverage's auto-rejection routine is bypassed (either by the user or by the statistical coherence of the population), that concludes the Weighted Average subroutine.
All that is left is the description of the auto-rejection routine. But the start of this code is difficult, and I include a screenshot of Ludwig's VBA verbatim: My first problem with this screenshot is the evaluation of "If ww.Ext2sigma", given that ww.Ext2sigma is a double-precision number. I have not yet found any documentation specific to VBA6, but for C++, "If Double" = FALSE when Double is zero or null, else TRUE. For the moment, I am assuming the convention is the same in VBA6. In this context, it is relevant that ww.Ext2sigma is explicitly set to 0 at the beginning of every iteration of the subroutine WeightedAverage, irrespective of whether it is the initial run-through, or a subsequent iteration demanded by Recalc:, and also irrespective of whether or not MSWD and Probability demand an evaluation of the "external error". Non-zero values of ww.Ext2sigma occur in only two circumstances:
With all of this in mind, I have re-expressed "If ww.Ext2sigma" as "If ww.Ext2sigma <> 0" because I think that is the explicit meaning.
My second problem with the screenshot is the evaluation of WtdAvgErr in the scenario where OneSigOut = TRUE. Ludwig's code is unambiguous: WtdAvgErr = ExtErr68, irrespective of the evaluation of "If ww.Ext2sigma". But is that correct? If ww.Ext2sigma = 0, then the various evaluation-paths indicate that ExtErr68 is either 0 or not evaluated. Is WtdAvgErr = ExtErr68 really appropriate in those cases? The (temporary) saving grace is that the assigned value of WtdAvgErr is not used anywhere in subroutine WeightedAverage. But I have a feeling it is called in the parent subroutine WeightedAv, so we will just have to wait and see whether this is correct, or an inconsequential error, or (perhaps) an important error. My rendition of this code continues:
Now the outlier rejection testing routine commences. Overall, no more than 15% of analyses can be rejected from the original population, and the rejection tolerance increases with each pass through the data, starting at 2sigma when attempting the first rejection, increasing to 2.5sigma when attempting a second rejection, 3sigma when attempting a third, and so on.
This is the end of subroutine WeightedAverage. The next stage is to step back up to the parent subroutine WeightedAv, to understand how the elements of WeightedAverage output array ww are translated into the WeightedAv output array W. I believe this is the piece of the puzzle currently missing, in terms of rigorously mapping the Squid3-based output of WeightedAverage to the Isoplot3 output of WtdAv. |
So taking a wtd avg for unknown samples with rejection enabled can give us different answers depending on how the spots are sorted? |
@cwmagee based on my reading of the Ludwig code, yes, I believe so. Needs to be tested, should be straightforward, but I don't have a dataset to hand. I suggest 9 x OG1 207/206 spot-dates (containing exactly 1 outlier at the 2-3sigma level), and a 10th spot being TEMORA. Using the y-bar spreadsheet function with CanReject = TRUE ought to give discernibly different results depending on whether TEMORA spot is at the top of the list or the bottom (because only 1 rejection is ever allowed when n = 10). |
I think the MtPainter data requires 1 rejection, but it can be either high or low. So I'll see if the mean changes based on order. |
Stepping out of WeightedAverage back up to WeightedAv, recall that the latter invokes the former as:
which means OneSigOut = TRUE when SigmaLevelOut = 1, otherwise OneSigOut = FALSE. The parent subroutine WeightedAv takes the form:
where The following sticks quite closely to how Ludwig maps the elements of WeightedAverage entity
|
Recall that the top-level invocation of the Isoplot3 WtdAv function is:
so there are up to six arguments to be specified, beyond ValuesAndErrors. 1. In the context of initial (automated) WtdMeanAcalc, the VBA invocation verbatim is:
so PercentOut = PercentIn = TRUE, SigmaLevelIn = SigmaLevelOut = 1, ConstantExternalErr = TRUE. I think NoReject (and its opposite, CanReject) are specified by the user. 2. In the context of REDO of a WtdMeanAcalc, the VBA invocation verbatim is:
The difference here (relative to automated initial execution of WtdMeanAcalc) is that CanReject = FALSE, because the user has had the opportunity to manually reject and un-reject points from the population, so it is important that the Redo of the calculation does not independently perform the automatic (albeit deficient) rejection routine enabled by CanReject = TRUE. 3. In the context of manual/redo weighted mean AGE calculations for Samples (not Reference Materials), as executed by SQUID 2.50 is subroutine ExtractGroup, the VBA invocation verbatim is:
so PercentOut = PercentIn = FALSE (because the age calculations), SigmaLevelIn = 1 (because that's the levelk at which the uncertainties of the constituent analyses are held), SigmaLevelOut = 2 (because mean ages should be quoted at 2sigma/95% confidence as far as possible), and ConstantExternalErr = FALSE (because statistical dispersion in an unknown population is usually assumed to be geologically significant, rather than a consequence of underestimated errors). 4. In the context of the SQUID-specific function sqWtdAv, which itself is invoked as:
Isoplot's WtdAv function is invoked via:
This is a bit unusual because the SigmaLevelIn and SigmaLevelOut are both hard-coded to 1, which is different to everything else except WtdMeanAcalc (the latter being a completely different scenario where that treatment is justified). Those are all the invocations I could easily find in SQUID 2.50 and Isoplot... if something obvious is missing, please let me know. |
Jim @bowring , the long code-block in the post from 18 March is probably the key one, in terms of "translating" values calculated by the WtdAv arithmetic into values portrayed by Ludwig as output, especially in the context of subroutine WtdMeanAcalc, and you might not need to look further. The next post after that has the settings for all the Booleans and SigmaLevel variables accompanying the invocation of WtdAv by WtdMeanAcalc. |
Hi Jim @bowring , I have finished looking through the WeightedAverage (and overlaid) code, but it is complicated because there are lots of ways into it, both from Squid (e.g. using WtdMeanAcalc for calibration constants, or sqWtdAv in user-defined expressions) and in Excel, from Isoplot (using the array function in a spreadsheet, or the toolbar-based function).
In addition to these various entry-points, we know that the array output of the calculation is conditional (with respect to both the numerical values reported and the labels assigned to them). Some of the conditions are dictatable by the user at the outset of the calculation, but the function-layering renders these relationships opaque. I am going to try to tease it out here (in instalments, and skipping over as much of the Excel-specific guff as I can). Note that in Excel, pretty much all of the below is part of the Isoplot codebase, not SQUID 2.50.
As a preface, function sqWtdAv is defined as:
sqWtdAv ( Values, Errs, PercentErrsIn, PercentErrsOut, CanReject )
where:
Values
is a mandatory 1-column range of numeric values.Errs
is a mandatory 1-column range of numeric uncertainties associated with the Values (and of the same length as Values).PercentErrsIn
is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.PercentErrsOut
is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.CanReject
is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. (Isoplot has its own rules for this process: no more than 15% of the total population can ever be rejected, and although rejection tolerance commences at 2sigma, the tolerance loosens with each successive pass. More on this later.) If not specified, it defaults to FALSE.sqWtdAv is unique in that it requires Values and Errs to be specified as separate vectors, and the sqWtdAv code then explicitly defines
ValuesErrs
as the unified 2-column array. This is to bring the input into line with all other invocations of the "starter" Isoplot function WtdAv, which I will describe next.The text was updated successfully, but these errors were encountered: