diff --git a/modules/.ipynb_checkpoints/module7-checkpoint.md b/modules/.ipynb_checkpoints/module7-checkpoint.md new file mode 100644 index 00000000..059be225 --- /dev/null +++ b/modules/.ipynb_checkpoints/module7-checkpoint.md @@ -0,0 +1,74 @@ +# 7) Markov Chains + +```note +## Lab 7: Markov Models & Markov Chain Monte Carlo (MCMC) + +Download the lab and data files to your computer. Then, upload them to your JupyterHub [following the instructions here](/resources/b-learning-jupyter.html#working-with-files-on-our-jupyterhub). + +* [Lab 7-1: Markov Chains - Basic Examples](lab7/lab7-1.ipynb) + * data: [markov_random4.txt](data/markov_random4.txt) +* [Lab 7-2: Markov Chain - ENSO Phases](lab7/lab7-2.ipynb) + * data: [ENSO_to2022.csv](data/ENSO_to2022.csv) +* [Lab 7-3: MCMC Rating Curves](lab7/lab7-3.ipynb) + * data: [Lyell_h_Q_sorted.mat](data/Lyell_h_Q_sorted.mat) + +``` + + +## Homework 7: + + +### Problem 1: ENSO Phases +Following Lab 7-1 and Lab 7-2, +A) Use the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2022 to create a lag-1 Markov model of the ENSO phase. +where the observed Phases of ENSO are as follows: + +1: warm (El Niño) +2: neutral (ENSO neutral) +3: cool, (La Niña) + +B) Using this Markov model and a random number generator, simulate 5,000 years of ENSO data. + +C) Using this randomly generated data, answer the following questions. + + - According to the model, what is the probability that three warm ENSO years would occur in a row? + - What is the large-sample probability that three cool ENSO years would happen in a row? (Try refreshing the numbers several times to increase the sample size if the condition never happens.) + - Check out this [blog](https://www.climate.gov/news-features/blogs/september-2022-la-ni%C3%B1a-update-it%E2%80%99s-q-time) about why we care about ENSO and the exciting current probability of getting a cool ENSO (La Nina) again in 2023, making it three in a row. 
+ +### Problem 2: Rating Curves and Application of Bayes' Theorem with MCMC + +Following the class discussion and Lab 7-3, explore how the rating curve and the 95% confidence intervals for the Lyell Fork streamflow site change depending on the method you use to determine the rating curve: + +- Least squares linear regression fitting (with transformed variables) using b = 0.28 m + - Make 95% confidence intervals around this regression fit + - Then, assume that we don't know exactly what b is. Try additional linear regressions using different values of b = 0.10, 0.20, 0.30, 0.40, and 0.50 m (you do not need to calculate 95% confidence intervals for these additional fits) + - Qualitatively, is the range between these 5 additional lines with different b values larger or smaller than the range between the 95% confidence lines from the original fitted line (the one with b = 0.28 m)? +- Direct Monte Carlo parameter estimation +- Bayesian MCMC fitting + +Using the code in Lab 7-3, create plots and discuss the differences in the results from these three methods. + +### Problem 2 grads: Work on your term projects (CEWA 565) + + +### Problem 2 undergrads: Statistics Synthesis (CEE 465) + +(Your final exam questions will look similar to this.) +You are given the dataset below of annual peak flows on the Sauk River: + +![Sauk River Plot](lab7/sauk-river-plot.png) + +(Note: you do not need to do any actual analysis here.) + +For each of the following questions about this dataset, I want you to answer: + - How do you ask this question statistically? + - What tools should you use to answer this question? (think of techniques we’ve learned in class) + - What should you be careful about? (think of caveats and requirements of the tools you’re recommending). + + **A.** Presume some logging occurred in the watershed in 1970. Are peak flows higher after 1970 than before 1970? + + **B.** Presume some logging occurred in the watershed in 1970. 
Have peak flows become more variable after 1970 than before 1970? + + **C.** If the mean annual peak flow has increased to above 50,000 cfs, the town will rebuild the levees. What are the chances that our statistical test would fail to identify this change? + + **D.** Has there been a trend in peak flows between 1930 and 2010? How fast are peak flows changing, and is this trend significant? \ No newline at end of file diff --git a/modules/data/ENSO_to2022.csv b/modules/data/ENSO_to2022.csv new file mode 100644 index 00000000..cb102b2e --- /dev/null +++ b/modules/data/ENSO_to2022.csv @@ -0,0 +1,129 @@ +# Observed Phase of the ENSO,, +#state number,state,description +#1,El Nino,warm +#2,ENSO neutral,neutral +#3,La Nina,cool +Water Year,ENSO Phase, +1900,1, +1901,2, +1902,2, +1903,1, +1904,3, +1905,1, +1906,1, +1907,3, +1908,2, +1909,3, +1910,3, +1911,3, +1912,1, +1913,2, +1914,1, +1915,1, +1916,2, +1917,3, +1918,3, +1919,1, +1920,1, +1921,3, +1922,2, +1923,3, +1924,1, +1925,3, +1926,1, +1927,2, +1928,2, +1929,2, +1930,1, +1931,1, +1932,3, +1933,2, +1934,3, +1935,2, +1936,2, +1937,2, +1938,3, +1939,3, +1940,1, +1941,1, +1942,1, +1943,3, +1944,3, +1945,3, +1946,2, +1947,2, +1948,2, +1949,2, +1950,3, +1951,3, +1952,1, +1953,2, +1954,2, +1955,3, +1956,3, +1957,2, +1958,1, +1959,1, +1960,2, +1961,2, +1962,2, +1963,3, +1964,1, +1965,3, +1966,1, +1967,2, +1968,3, +1969,1, +1970,1, +1971,3, +1972,3, +1973,1, +1974,3, +1975,3, +1976,3, +1977,1, +1978,1, +1979,2, +1980,1, +1981,2, +1982,2, +1983,1, +1984,3, +1985,3, +1986,3, +1987,1, +1988,1, +1989,3, +1990,2, +1991,2, +1992,1, +1993,2, +1994,2, +1995,1, +1996,3, +1997,2, +1998,1, +1999,3, +2000,3, +2001,3, +2002,2, +2003,1, +2004,2, +2005,1, +2006,3, +2007,1, +2008,3, +2009,3, +2010,1, +2011,3, +2012,3, +2013,2, +2014,2, +2015,1, +2016,1, +2017,3, +2018,3, +2019,1, +2020,2, +2021,3, +2022,3, \ No newline at end of file diff --git a/modules/lab7/.ipynb_checkpoints/lab7-2-checkpoint.ipynb 
b/modules/lab7/.ipynb_checkpoints/lab7-2-checkpoint.ipynb new file mode 100644 index 00000000..791c6adc --- /dev/null +++ b/modules/lab7/.ipynb_checkpoints/lab7-2-checkpoint.ipynb @@ -0,0 +1,244 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Lab 7-2: Markov Chains - ENSO Phases\n", + "\n", + "Download the data file for this lab, [ENSO_to2022.csv](https://mountain-hydrology-research-group.github.io/data-analysis/modules/data/ENSO_to2022.csv), which contains a record of the El Niño Southern Oscillation (ENSO) phase from 1900-2022.\n", + "\n", + "You can read more about ENSO [here](https://www.weather.gov/mhx/ensowhat), and [here](https://www.climate.gov/enso).\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Importing python packages you'll need for this lab:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import scipy.stats as stats\n", + "from scipy import sparse\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Load the data file" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML preview of df.head(3), markup lost in extraction: columns Water Year, ENSO Phase, Unnamed: 2; rows 0-2 are (1900, 1, NaN), (1901, 2, NaN), (1902, 2, NaN), identical to the text/plain output that follows]\n", + "
" + ], + "text/plain": [ + " Water Year ENSO Phase Unnamed: 2\n", + "0 1900 1 NaN\n", + "1 1901 2 NaN\n", + "2 1902 2 NaN" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = pd.read_csv('../data/ENSO_to2022.csv', comment='#')\n", + "df.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "**A.** Using the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2022, create a lag-1 Markov model of the ENSO phase.\n", + "\n", + "Observed Phases of ENSO:\n", + " - 1: warm (El Niño) \n", + " - 2: neutral (ENSO neutral) \n", + " - 3: cool, (La Niña) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Count transitions between each of the three ENSO phases using [scipy.sparse.csr_matrix()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) and then [scipy.sparse.csr_matrix.todense()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.todense.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# count the transitions from each state to the next\n", + "\n", + "# convert transition counts to matrix form\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Normalize the transition matrix to get probabilities. This will create our lag-1 Markov Model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Compute cumulative sums along the rows, make sure these sum to 1. 
(We will use this cdf matrix below in a simulation of ENSO phases)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "**B.** Using this Markov model and a random number generator, simulate 5,000 years of ENSO data." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# pick the number of years we want to simulate (5000)\n", + "\n", + "# use a uniform random number for 5000 years\n", + "\n", + "# start off in state 2, neutral" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "**C.** Using this randomly generated data, answer the following questions. \n", + "\n", + "* According to the model, what is the probability that three warm ENSO years would occur in a row?\n", + "* What is the large-sample probability that three cool ENSO years would happen in a row?\n", + "\n", + "(Try refreshing the numbers several times to increase the sample size if the condition never happens.) 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/modules/lab7/lab7-2.ipynb b/modules/lab7/lab7-2.ipynb index 6c79187b..791c6adc 100644 --- a/modules/lab7/lab7-2.ipynb +++ b/modules/lab7/lab7-2.ipynb @@ -6,7 +6,7 @@ "source": [ "# Lab 7-2: Markov Chains - ENSO Phases\n", "\n", - "Download the data file for this lab, [ENSO_to2021.csv](https://mountain-hydrology-research-group.github.io/data-analysis/modules/data/ENSO_to2021.csv), which contains a record of the El Niño Southern Oscillation (ENSO) phase from 1900-2021.\n", + "Download the data file for this lab, [ENSO_to2022.csv](https://mountain-hydrology-research-group.github.io/data-analysis/modules/data/ENSO_to2022.csv), which contains a record of the El Niño Southern Oscillation (ENSO) phase from 1900-2022.\n", "\n", "You can read more about ENSO [here](https://www.weather.gov/mhx/ensowhat), and [here](https://www.climate.gov/enso).\n", "\n", @@ -69,6 +69,7 @@ " \n", " Water Year\n", " ENSO Phase\n", + " Unnamed: 2\n", " \n", " \n", " \n", @@ -76,26 +77,29 @@ " 0\n", " 1900\n", " 1\n", + " NaN\n", " \n", " \n", " 1\n", " 1901\n", " 2\n", + " NaN\n", " \n", " \n", " 2\n", " 1902\n", " 2\n", + " NaN\n", " \n", " \n", "\n", "" ], "text/plain": [ - " Water Year ENSO Phase\n", - "0 1900 1\n", - "1 1901 2\n", - "2 1902 2" + " Water Year ENSO Phase Unnamed: 2\n", + "0 1900 1 NaN\n", + "1 1901 2 NaN\n", + "2 1902 2 NaN" ] }, "execution_count": 2, @@ -104,7 +108,7 @@ } ], "source": [ - "df = 
pd.read_csv('../data/ENSO_to2021.csv', comment='#')\n", + "df = pd.read_csv('../data/ENSO_to2022.csv', comment='#')\n", "df.head(3)" ] }, @@ -113,7 +117,7 @@ "metadata": {}, "source": [ "---\n", - "**A.** Using the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2021, create a lag-1 Markov model of the ENSO phase.\n", + "**A.** Using the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2022, create a lag-1 Markov model of the ENSO phase.\n", "\n", "Observed Phases of ENSO:\n", " - 1: warm (El Niño) \n", @@ -218,7 +222,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -232,7 +236,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.9.7" } }, "nbformat": 4, diff --git a/modules/module7.md b/modules/module7.md index 304ab995..059be225 100644 --- a/modules/module7.md +++ b/modules/module7.md @@ -8,7 +8,7 @@ Download the lab and data files to your computer. Then, upload them to your Jupy * [Lab 7-1: Markov Chains - Basic Examples](lab7/lab7-1.ipynb) * data: [markov_random4.txt](data/markov_random4.txt) * [Lab 7-2: Markov Chain - ENSO Phases](lab7/lab7-2.ipynb) - * data: [ENSO_to2021.csv](data/ENSO_to2021.csv) + * data: [ENSO_to2022.csv](data/ENSO_to2022.csv) * [Lab 7-3: MCMC Rating Curves](lab7/lab7-3.ipynb) * data: [Lyell_h_Q_sorted.mat](data/Lyell_h_Q_sorted.mat) @@ -20,7 +20,7 @@ Download the lab and data files to your computer. Then, upload them to your Jupy ### Problem 1: ENSO Phases Following Lab 7-1 and Lab 7-2, -A) Use the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2021 to create a lag-1 Markov model of the ENSO phase. +A) Use the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2022 to create a lag-1 Markov model of the ENSO phase. 
where the observed Phases of ENSO are as follows: 1: warm (El Niño) @@ -33,6 +33,7 @@ C) Using this randomly generated data, answer the following questions. - According to the model, what is the probability that three warm ENSO years would occur in a row? - What is the large-sample probability that three cool ENSO years would happen in a row? (Try refreshing the numbers several times to increase the sample size if the condition never happens.) + - Check out this [blog](https://www.climate.gov/news-features/blogs/september-2022-la-ni%C3%B1a-update-it%E2%80%99s-q-time) about why we care about ENSO and the exciting current probability of getting a cool ENSO (La Niña) again in 2023, making it three in a row. ### Problem 2: Rating Curves and Application of Bayes' Theorem with MCMC
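
Review note: the lag-1 Markov workflow that Lab 7-2 and Problem 1 describe (count transitions, normalize the rows, build a row-wise CDF, simulate, then count runs of three) can be sketched as below. This is a minimal sketch under stated assumptions, not the lab's solution. The function names `transition_matrix`, `simulate`, and `run_fraction` are hypothetical. It uses plain NumPy rather than the `scipy.sparse.csr_matrix` approach the notebook suggests. The sample is only the first 20 water years (1900-1919) of `ENSO_to2022.csv` as shown in this diff, so the resulting probabilities are illustrative only.

```python
import numpy as np

def transition_matrix(states, n_states=3):
    """Estimate lag-1 transition probabilities from a sequence of 1-based states.

    Assumes every state occurs at least once as a "from" state,
    otherwise its row of counts sums to zero.
    """
    s = np.asarray(states) - 1                 # shift to 0-based indices
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (s[:-1], s[1:]), 1)      # counts[i, j] = number of i -> j steps
    return counts / counts.sum(axis=1, keepdims=True)

def simulate(P, n_years, start=2, seed=0):
    """Simulate a chain of 1-based states, starting in `start` (2 = neutral)."""
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(P, axis=1)                 # each row ends at (approximately) 1
    u = rng.uniform(size=n_years)              # one uniform draw per simulated year
    sim = np.empty(n_years, dtype=int)
    sim[0] = start
    for t in range(1, n_years):
        idx = np.searchsorted(cdf[sim[t - 1] - 1], u[t])
        sim[t] = min(idx, P.shape[0] - 1) + 1  # clamp guards against round-off in the CDF
    return sim

def run_fraction(sim, state, length=3):
    """Fraction of sliding windows of `length` years that are all in `state`."""
    n_windows = len(sim) - length + 1
    hits = np.ones(n_windows, dtype=bool)
    for k in range(length):
        hits &= sim[k:k + n_windows] == state
    return hits.mean()

# First 20 water years (1900-1919) from ENSO_to2022.csv; illustrative subset only.
sample = [1, 2, 2, 1, 3, 1, 1, 3, 2, 3, 3, 3, 1, 2, 1, 1, 2, 3, 3, 1]
P = transition_matrix(sample)
sim = simulate(P, n_years=5000)
p_warm3 = run_fraction(sim, state=1)           # three warm years in a row
p_cool3 = run_fraction(sim, state=3)           # three cool years in a row
print(P)
print(p_warm3, p_cool3)
```

Counting sliding windows is one possible reading of "three in a row"; the lab may intend a different convention (for example, non-overlapping runs), in which case `run_fraction` would need to change. Varying `seed` corresponds to "refreshing the numbers" in the problem statement.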