Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
CircleCI
committed
Apr 5, 2024
1 parent
d50e3e9
commit fd4d4c9
Showing
367 changed files
with
5,999 additions
and
3,582 deletions.
Binary file modified
BIN
+0 Bytes
(100%)
openturns/master/_downloads/0ef2e737c16c571c8945b202ec11aae3/auto_calibration_jupyter.zip
Binary file not shown.
Binary file modified
BIN
+0 Bytes
(100%)
...s/master/_downloads/32764d685caaf9dcc0d31be3ef384e64/auto_functional_modeling_jupyter.zip
Binary file not shown.
161 changes: 161 additions & 0 deletions
...ter/_downloads/422e4d843537e714798e207c87b705cd/plot_distribution_linear_regression.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n# Distribution of estimators in linear regression\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Introduction\n\nIn this example, we are interested in the distribution of the estimator of the variance\nof the observation error in linear regression.\nWe are also interested in the estimator of the standard deviation of the\nobservation error.\nWe show how to use the :class:`~openturns.PythonRandomVector` class in order to\nperform a study of the sample distribution of these estimators.\n\nIn the general linear regression model, the observation error $\\epsilon$ has the\nnormal distribution $\\cN(0, \\sigma^2)$ where $\\sigma > 0$\nis the standard deviation.\nWe are interested in the estimators of the variance $\\sigma^2$\nand the standard deviation $\\sigma$:\n\n- the variance of the residuals, $\\sigma^2$, is estimated from :meth:`~openturns.LinearModelResult.getResidualsVariance`;\n- the standard deviation $\\sigma$ is estimated from :meth:`~openturns.LinearModelAnalysis.getResidualsStandardError`.\n\nThe asymptotic distribution of these estimators is known (see [vaart2000]_)\nbut we want to perform an empirical study, based on simulation.\nIn order to see the distribution of the estimator, we simulate an observation of the estimator,\nand repeat that experiment $r \\in \\Nset$ times, where $r$\nis a large integer.\nThen we approximate the distribution using :class:`~openturns.KernelSmoothing`.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import openturns as ot\nimport openturns.viewer as otv\nimport pylab as pl" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## The simulation engine\nThe simulation is based on the :class:`~openturns.PythonRandomVector` class,\nwhich simulates independent observations of a random vector.\nThe `getRealization()` method implements the simulation of the observation\nof the estimator.\n\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"class SampleEstimatorLinearRegression(ot.PythonRandomVector):\n    def __init__(\n        self, sample_size, true_standard_deviation, coefficients, estimator=\"variance\"\n    ):\n        \"\"\"\n        Create a RandomVector for an estimator from a linear regression model.\n\n        Parameters\n        ----------\n        sample_size : int\n            The sample size n.\n        true_standard_deviation : float\n            The standard deviation of the Gaussian observation error.\n        coefficients : sequence of p floats\n            The coefficients of the linear model.\n        estimator : str\n            The estimator.\n            Available estimators are \"variance\" or \"standard-deviation\".\n        \"\"\"\n        super(SampleEstimatorLinearRegression, self).__init__(1)\n        self.sample_size = sample_size\n        self.numberOfParameters = coefficients.getDimension()\n        input_dimension = self.numberOfParameters - 1  # Because of the intercept\n        centerCoefficient = [0.0] * input_dimension\n        constantCoefficient = [coefficients[0]]\n        linearCoefficient = ot.Matrix([coefficients[1:]])\n        self.linearModel = ot.LinearFunction(\n            centerCoefficient, constantCoefficient, linearCoefficient\n        )\n        self.distribution = ot.Normal(input_dimension)\n        self.distribution.setDescription([\"x%d\" % (i) for i in range(input_dimension)])\n        self.errorDistribution = ot.Normal(0.0, true_standard_deviation)\n        self.estimator = estimator\n        # In classical linear regression, the input is deterministic.\n        # Here, we set it once and for all.\n        self.input_sample = self.distribution.getSample(self.sample_size)\n        self.output_sample = self.linearModel(self.input_sample)\n\n    def getRealization(self):\n        errorSample = self.errorDistribution.getSample(self.sample_size)\n        noisy_output_sample = self.output_sample + errorSample\n        algo = ot.LinearModelAlgorithm(self.input_sample, noisy_output_sample)\n        lmResult = algo.getResult()\n        if self.estimator == \"variance\":\n            output = lmResult.getResidualsVariance()\n        elif self.estimator == \"standard-deviation\":\n            lmAnalysis = ot.LinearModelAnalysis(lmResult)\n            output = lmAnalysis.getResidualsStandardError()\n        else:\n            raise ValueError(\"Unknown estimator %s\" % (self.estimator))\n        return [output]\n\n\ndef plot_sample_by_kernel_smoothing(\n    sample_size, true_standard_deviation, coefficients, estimator, repetitions_size\n):\n    \"\"\"\n    Plot the estimated distribution of an estimator from linear regression.\n\n    The method uses Kernel Smoothing with default kernel.\n\n    Parameters\n    ----------\n    sample_size : int\n        The sample size n.\n    true_standard_deviation : float\n        The standard deviation of the Gaussian observation error.\n    coefficients : sequence of p floats\n        The coefficients of the linear model.\n    estimator : str\n        The estimator, either \"variance\" or \"standard-deviation\".\n    repetitions_size : int\n        The number of repetitions of the experiment.\n        This is the size of the sample of observations of the estimator.\n\n    Returns\n    -------\n    graph : ot.Graph\n        The plot of the PDF of the estimated distribution.\n\n    \"\"\"\n    myRV = ot.RandomVector(\n        SampleEstimatorLinearRegression(\n            sample_size, true_standard_deviation, coefficients, estimator\n        )\n    )\n    sample_estimator = myRV.getSample(repetitions_size)\n    if estimator == \"variance\":\n        sample_estimator.setDescription([r\"$\\hat{\\sigma}^2$\"])\n        # The reference value is the true variance.\n        exact_value = true_standard_deviation**2\n    elif estimator == \"standard-deviation\":\n        sample_estimator.setDescription([r\"$\\hat{\\sigma}$\"])\n        # The reference value is the true standard deviation.\n        exact_value = true_standard_deviation\n    else:\n        raise ValueError(\"Unknown estimator %s\" % (estimator))\n    mean_sample_estimator = sample_estimator.computeMean()\n\n    graph = ot.KernelSmoothing().build(sample_estimator).drawPDF()\n    graph.setLegends([\"Sample\"])\n    bb = graph.getBoundingBox()\n    ylb = bb.getLowerBound()[1]\n    yub = bb.getUpperBound()[1]\n    curve = ot.Curve([exact_value] * 2, [ylb, yub])\n    curve.setLegend(\"Exact\")\n    curve.setLineWidth(2.0)\n    graph.add(curve)\n    graph.setTitle(\n        \"Size = %d, True S.D. = %.4f, Mean = %.4f, Rep. = %d\"\n        % (\n            sample_size,\n            true_standard_deviation,\n            mean_sample_estimator[0],\n            repetitions_size,\n        )\n    )\n    return graph" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Distribution of the variance estimator\nWe first consider the estimation of the variance $\\sigma^2$.\nIn the next cell, we consider a sample size equal to $n = 6$ with\n$p = 3$ parameters.\nWe use $r = 1000$ repetitions.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"repetitions_size = 1000\ntrue_standard_deviation = 0.1\nsample_size = 6\ncoefficients = ot.Point([3.0, 2.0, -1.0])\nestimator = \"variance\"\nview = otv.View(\n    plot_sample_by_kernel_smoothing(\n        sample_size, true_standard_deviation, coefficients, estimator, repetitions_size\n    ),\n    figure_kw={\"figsize\": (6.0, 3.5)},\n)\npl.subplots_adjust(bottom=0.25)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If we use a sample size equal to $n = 6$ with\n$p = 3$ parameters, the distribution is not symmetric.\nThe mean of the observations of the sample variances remains close to\nthe true value $0.1^2 = 0.01$.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Then we increase the sample size $n$.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"repetitions_size = 1000\ntrue_standard_deviation = 0.1\nsample_size = 100\ncoefficients = ot.Point([3.0, 2.0, -1.0])\nestimator = \"variance\"\nview = otv.View(\n    plot_sample_by_kernel_smoothing(\n        sample_size, true_standard_deviation, coefficients, estimator, repetitions_size\n    ),\n    figure_kw={\"figsize\": (6.0, 3.5)},\n)\npl.subplots_adjust(bottom=0.25)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If we use a sample size equal to $n = 100$ with\n$p = 3$ parameters, the distribution is almost symmetric and\nalmost normal.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Distribution of the standard deviation estimator\nWe now consider the estimation of the standard deviation $\\sigma$.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"repetitions_size = 1000\ntrue_standard_deviation = 0.1\nsample_size = 6\ncoefficients = ot.Point([3.0, 2.0, -1.0])\nestimator = \"standard-deviation\"\nview = otv.View(\n    plot_sample_by_kernel_smoothing(\n        sample_size, true_standard_deviation, coefficients, estimator, repetitions_size\n    ),\n    figure_kw={\"figsize\": (6.0, 3.5)},\n)\npl.subplots_adjust(bottom=0.25)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If we use a sample size equal to $n = 6$ with\n$p = 3$ parameters, we see that the distribution is almost symmetric.\nWe notice a slight bias, since the mean of the observations of the\nstandard deviation is not as close to the true value 0.1\nas we could expect.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"repetitions_size = 1000\ntrue_standard_deviation = 0.1\nsample_size = 100\ncoefficients = ot.Point([3.0, 2.0, -1.0])\nestimator = \"standard-deviation\"\nview = otv.View(\n    plot_sample_by_kernel_smoothing(\n        sample_size, true_standard_deviation, coefficients, estimator, repetitions_size\n    ),\n    figure_kw={\"figsize\": (6.0, 3.5)},\n)\npl.subplots_adjust(bottom=0.25)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If we use a sample size equal to $n = 100$ with\n$p = 3$ parameters, we see that the distribution is almost normal.\nWe notice that the bias has become negligible.\n\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.8" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
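Note on the notebook above: the slight bias it reports for the standard deviation estimator at small sample size can be checked without OpenTURNS. The sketch below is not the notebook's code. It assumes a simplified intercept-only model (p = 1) with n = 4 observations, chosen so that the residual variance has n - p = 3 degrees of freedom, as in the notebook's n = 6, p = 3 case. The function names `simulate_estimators` and `expected_standard_deviation` are hypothetical, and the closed-form mean relies on the standard result that the residual variance is distributed as sigma^2 times a chi-square variable divided by its degrees of freedom.

```python
import math
import random
import statistics


def simulate_estimators(sample_size, sigma, repetitions, seed=0):
    """Return the Monte Carlo means of the variance and s.d. estimators."""
    rng = random.Random(seed)
    variances = []
    deviations = []
    for _ in range(repetitions):
        # Assumed intercept-only model: y_i = 3 + eps_i, eps_i ~ N(0, sigma^2).
        y = [3.0 + rng.gauss(0.0, sigma) for _ in range(sample_size)]
        # Residual variance with n - p degrees of freedom (here p = 1).
        s2 = statistics.variance(y)
        variances.append(s2)
        deviations.append(math.sqrt(s2))
    return statistics.fmean(variances), statistics.fmean(deviations)


def expected_standard_deviation(sigma, dof):
    """Exact mean of sqrt(sigma^2 * chi2(dof) / dof)."""
    log_ratio = math.lgamma((dof + 1) / 2.0) - math.lgamma(dof / 2.0)
    return sigma * math.sqrt(2.0 / dof) * math.exp(log_ratio)


sigma = 0.1
mean_variance, mean_deviation = simulate_estimators(4, sigma, 20000)
print("mean of variance estimates: %.5f (true %.5f)" % (mean_variance, sigma**2))
print(
    "mean of s.d. estimates: %.5f (theory %.5f)"
    % (mean_deviation, expected_standard_deviation(sigma, 3))
)
```

Under these assumptions the variance estimator is unbiased, while its square root underestimates sigma on average because the square root is concave. Calling `expected_standard_deviation(sigma, 97)` gives a value much closer to sigma, which matches the notebook's observation for n = 100.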
Binary file modified
BIN
+0 Bytes
(100%)
...ns/master/_downloads/4df3e0c0f81319718eb3fb7cc1e6e39e/auto_functional_modeling_python.zip
Binary file not shown.
Binary file modified
BIN
+0 Bytes
(100%)
openturns/master/_downloads/58b7cedd7592157d47e84db4a498c83f/auto_data_analysis_jupyter.zip
Binary file not shown.