---
title: "About Me"
author: "Eric C. Anderson"
output:
  html_document:
    toc: yes
  word_document:
    toc: yes
  pdf_document:
    toc: yes
bibliography: references.bib
---
# Who I am and where I came from
I grew up in a small town in Southern California. When I was 11 years old,
I spent five weeks living with a family friend who ran the hyperbaric chamber
adjacent to the University of Southern California Marine Lab on Catalina Island.
Days spent snorkeling the kelp beds, keeping intertidal organisms in an aquarium
I was allowed to fill, and watching researchers tag blue sharks convinced me that
I wanted to be a marine biologist.
In high school, however, I got into rock climbing and became a little math/physics nerd.
I went to [Stanford University](https://www.stanford.edu/) planning to major in
Mathematics, but got super intimidated by all the smart, wonky people in my first
honors math class. I took a year-long detour to
[Prescott College](https://www.prescott.edu/), where I had my first class in evolution
and wondered why the hell my high-school biology class was not taught in the context
of evolution.
I ended up studying human biology at Stanford, and then went to
[University of Washington](https://www.washington.edu/) for graduate work. I started
in Fisheries, but then got involved in genetics research that needed some
new statistical developments. Thus, my math nerdiness was resurrected and
I did my PhD in Quantitative Ecology and Resource Management, focusing on
the use of Monte Carlo methods for inference from population genetic models.
After a two-year postdoc at Berkeley, I started working for the National Marine
Fisheries Service in Santa Cruz, CA. I actually did become a marine biologist,
sort of! I still work for NMFS, but a year and a half
ago I moved to Fort Collins with Kristen, and have an affiliate position
in FWCB.
When I am not working I love getting out and being active. My top four things
to do are:
1. Snorkeling in rivers and creeks, backpacking, and hiking (all with my family),
1. Playing hockey (I learned to skate after moving here. What a blast!),
1. Biking,
1. Playing with our pair of awesome, springy cats.
Here is a picture of me with daughter Zoe, looking for aquatic invertebrates
above the CSU Mountain Campus.
```{r me_pic, echo=FALSE, out.width="500px"}
knitr::include_graphics("images/eric.jpg", auto_pdf = TRUE)
```
# Research Interests
I'm interested in all manner of statistical inference from genetic data. Lately I have
been working on the genetic basis of run timing in Chinook salmon and other salmonids.
## Influential papers
When I was a graduate student, I heard Peter Green speak about
his work on reversible jump MCMC for the analysis of finite mixture
models. One of the problems I was working with at the time was
estimating proportions of salmon from different rivers that were being
caught in the ocean---the mixed stock fishery problem. I spent a lot
of time working through @richardson1997bayesian and learned a lot about
MCMC and RJMCMC in the process.
Later on, much of my work on Bayesian inference of pedigrees from genetic data
[@anderson2016bayesian] builds upon the idea of factor graphs described by
@kschischang2001factor.
## The mathematics behind my research
I have worked a lot with the coalescent process, so let's put
down the expected time during which there are $k$ extant lineages
in a population of size $N$.
$$
\mathbb{E}T_k = \frac{4N}{k(k-1)}.
$$
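Since $1/\{k(k-1)\} = 1/(k-1) - 1/k$, summing these expected times over
$k = 2, \ldots, n$ telescopes, giving the expected time to the most recent
common ancestor of a sample of $n$ lineages:
$$
\mathbb{E}T_{\mathrm{MRCA}} = \sum_{k=2}^{n}\frac{4N}{k(k-1)} = 4N\left(1 - \frac{1}{n}\right),
$$
which approaches $4N$ as $n$ grows.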
And, while we are at, let's throw down a description of one of the
update steps in the sum-product algorithm for acyclic factor graphs:
$$
\mu_{f_j\longrightarrow v_i}(x_i) =
\sum_{x_{C\backslash i} \in \mathcal{X}_{C\backslash i}}
h_j(x_{C\backslash i}, x_i) \prod_{k\in C\backslash i} \mu_{v_k\longrightarrow f_j}(x_k).
$$
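To make that update concrete, here is a tiny sketch in Python (rather than R)
of the message from a factor $h_j(x_1, x_2)$ over two binary variables to
$v_1$, so that $C\backslash i = \{2\}$; the factor table and incoming message
values are made up purely for illustration.

```python
import numpy as np

# Factor table h_j(x1, x2) over two binary variables (made-up values).
h = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Incoming variable-to-factor message mu_{v2 -> f_j}(x2) (also made up).
mu_v2_to_f = np.array([0.6, 0.4])

# mu_{f_j -> v1}(x1) = sum over x2 of h(x1, x2) * mu_{v2 -> f_j}(x2),
# which for a two-variable factor is just a matrix-vector product.
mu_f_to_v1 = h @ mu_v2_to_f
print(mu_f_to_v1)
```

With more variables attached to the factor, the product over $k \in C\backslash i$
becomes an outer product of the incoming messages before the sum, but the
two-variable case shows the shape of the computation.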
## My computing experience
I started programming in BASIC on our old Apple IIe in 1983. In high school
I implemented a basic program to plot some fractal images. After that, I didn't really
do any programming until grad school when I took a course in C.
Here is some C code that I wrote:
```c
if(RU != NULL) {
  RepUnitZSum = (int *)calloc(RU->NumRepUnits, sizeof(int));
  RepUnitPis = DvalVector(0, RU->NumRepUnits, 0.0, 1.0, .01);
  RepUnitPofZs = (dval ***)calloc(N, sizeof(dval **));
  for(i = 0; i < N; i++) {
    RepUnitPofZs[i] = DvalVector(0, RU->NumRepUnits - 1, 0.0, 1.0, -1);  /* no histograms for these */
  }
  if(BO->PiTraceInterval > 0) {
    repPi_tracef = OpenTraceFile("rep_unit_pi_trace.txt", "trace file of reporting unit Pi values from gsi_sim.", Baselines, BO, RU, BO->PiTraceInterval);
  }
  if(BO->ZSumTraceInterval > 0) {
    repZSumtracef = OpenTraceFile("rep_unit_zsum_trace.txt", "trace file of reporting unit ZSum values from gsi_sim.", Baselines, BO, RU, BO->ZSumTraceInterval);
  }
}
```
Wow! That is pretty ugly.
When I was a postdoc, John Novembre and the other members of Monty Slatkin's lab at
Berkeley got me hooked on using the Unix shell, programming in bash, and
writing short scripts in awk and sed. Here is a little awk script that takes the
output of SGE's `qacct` command and makes a nice, tidy table of it:
```sh
#! /usr/bin/awk -f

# an awk script. it expects the output of qacct like this:
#   qacct -o eriq -b 09271925 -j ml
# make it executable and run it like this:
#   qacct -o eriq -b 09271925 -j ml | tidy-qacct
# if you pass it a job number that was not one of your jobs it
# just skips the error message that comes up.
# note that the output of qacct is space delimited

/^==========/ {++n; next}  # increment run counter, then skip these lines
/^error:/    {next}        # skip it if you told it to get a wrong job number

# every data line gets processed here. The header keeps getting
# compiled all the way through, in case some reports have more columns...
NF > 1 {
  tag = $1;
  if(!(tag in header_vals)) {
    header[++h] = tag;
    header_vals[tag]++;
  }
  $1 = "";              # remove the tag from the full line of stuff
  values[n, tag] = $0;  # assign the values to the tag
}

# at the end of it all, print the header and then all the values:
END {
  # print the header
  printf("%s", header[1]);
  for(i = 2; i <= h; i++)
    printf("\t%s", header[i]);
  printf("\n");

  # cycle over individuals and fields and print em
  for(j = 1; j <= n; j++) {
    printf("%s", values[j, header[1]]);
    for(i = 2; i <= h; i++)
      printf("\t%s", values[j, header[i]]);
    printf("\n");
  }
}
```
I used to rather dislike the R programming language and felt it was
dreadfully slow. It has gotten a lot better in the last two decades.
The introduction of Hadley Wickham's tidy data analysis framework has
really improved things.
## What I hope to get out of this class
I hope that I will:
* Help students understand enough about Unix and programming to lessen the pain of learning to do bioinformatics.
* Be able to advance students' own research.
* Impart to the students an appreciation of the importance of making research reproducible.
# Evaluating some R code
I'm going to just simulate one million beta random variables from a $\mathrm{Beta}(2,5)$ distribution
and plot a histogram of it.
```{r, message=FALSE}
library(tidyverse)
beta_rvs <- tibble(
  x = rbeta(1e06, shape1 = 2, shape2 = 5)
)
ggplot(beta_rvs, aes(x = x)) +
  geom_histogram(binwidth = 0.01)
```
# Citations