diff --git a/dev/index.html b/dev/index.html index bb4485b6..b641da8d 100644 --- a/dev/index.html +++ b/dev/index.html @@ -7,4 +7,4 @@ LogitLink LogLink ProbitLink -SqrtLink

Manual Outline

+SqrtLink

Manual Outline

diff --git a/dev/man/FAQ/index.html b/dev/man/FAQ/index.html index 45757e6a..1429aced 100644 --- a/dev/man/FAQ/index.html +++ b/dev/man/FAQ/index.html @@ -16,4 +16,4 @@ @show add(0, 3) # 0 + b + c using b = 3 and default value for c @show add(0, 5, c=10); # 0 + b + c using b = 5 and c = 10
add(0) = 3
 add(0, 3) = 5
-add(0, 5, c = 10) = 15

Will IHT work on sequence/imputed data?

If someone can test this out and tell us, that would be extremely helpful.

+add(0, 5, c = 10) = 15

Will IHT work on sequence/imputed data?

If someone can test this out and tell us, that would be extremely helpful.

diff --git a/dev/man/api/index.html b/dev/man/api/index.html index bf6ba541..7529f337 100644 --- a/dev/man/api/index.html +++ b/dev/man/api/index.html @@ -5,4 +5,4 @@ w = #import weight vector ng = length(unique(g)) # specify number of non-zero groups result = fit_iht(y, x, z; J=ng, k=10, d=Normal(), l=IdentityLink(), group=g, weight=w)

Simulation Utilities

For complex simulations, please use TraitSimulation.jl.

MendelIHT provides very naive simulation utilities, which were written before TraitSimulation.jl was developed.

MendelIHT.simulate_random_snparrayFunction
simulate_random_snparray(s::String, n::Integer, p::Integer; 
-    [mafs::Vector{Float64}], [min_ma::Integer])

Creates a random SnpArray in the current directory without missing values, where each SNP has at least 5 (default) minor alleles.

Note: if a supplied minor allele frequency is extremely small, it could take a long time for the simulation to generate samples where at least min_ma (defaults to 5) minor alleles are present.

Arguments:

  • s: name of the SnpArray that will be created in the current directory. To avoid creating a file, use undef.
  • n: number of samples
  • p: number of SNPs

Optional Arguments:

  • mafs: vector of desired minor allele frequencies (uniform(0,0.5) by default)
  • min_ma: the minimum number of minor alleles that must be present for each SNP (defaults to 5)
source
MendelIHT.simulate_correlated_snparrayFunction
simulate_correlated_snparray(s, n, p; block_length, hap, prob)

Simulates a SnpArray with correlation. SNPs are divided into blocks where each adjacent SNP is the same with probability prob. There is no correlation between blocks.

Arguments:

  • n: number of samples
  • p: number of SNPs
  • s: name of SnpArray that will be created (memory mapped) in the current directory. To not memory map, use undef.

Optional arguments:

  • block_length: length of each LD block
  • hap: number of haplotypes to simulate for each block
  • prob: with probability prob an adjacent SNP would be the same.
source
Note

Simulating a SnpArray with $n$ subjects and $p$ SNPs requires up to $2np$ bits of RAM.

MendelIHT.simulate_random_responseFunction
simulate_random_response(x, k, d, l; kwargs...)

This function simulates a random response (trait) vector y. When the distribution d is from Poisson, Gamma, or Negative Binomial, we simulate β ∼ N(0, 0.3) to roughly ensure the mean of response y doesn't become too large. For other distributions, we choose β ∼ N(0, 1).

Arguments

  • x: Design matrix
  • k: the true number of predictors.
  • d: The distribution of the simulated trait (note typeof(d) = UnionAll but typeof(d()) is an actual distribution: e.g. Normal)
  • l: The link function. Input canonicallink(d()) if you want to use the canonical link of d.

Optional arguments

  • r: The number of successes until stopping in negative binomial regression, defaults to 10
  • α: Shape parameter of the gamma distribution, defaults to 1
  • Zu: Effect of non-genetic covariates. Zu should have dimension n × 1.
source
simulate_random_response(x, k, traits)

Simulates a response matrix Y where each row is an independent multivariate Gaussian of length traits. There are k non-zero β over all traits. The traits share overlap causal SNPs (see the optional argument overlap). The covariance matrix Σ is positive definite and symmetric.

Arguments

  • x: Design matrix of dimension n × p. Each row is a sample.
  • k: the total true number of causal SNPs (predictors)
  • traits: Number of traits

Optional arguments

  • Zu: Effect of non-genetic covariates. Zu should have dimension n × traits.
  • overlap: Number of causal SNPs shared by all traits. Shared SNPs do not have the same effect size.

Outputs

  • Y: Response matrix where each row is sampled from a multivariate normal with mean μ[i] = X[i, :] * true_b and covariance Σ
  • Σ: the symmetric, positive definite covariance matrix used
  • true_b: A sparse matrix containing true beta values.
  • correct_position: Non-zero indices of true_b
source
Note

For negative binomial and gamma, the link function must be LogLink.

MendelIHT.make_bim_fam_filesFunction
make_bim_fam_files(x::SnpArray, y, name::String)

Creates .bim and .fam files from a SnpArray.

Arguments:

  • x: A SnpArray (i.e. .bed file on the disk) for which you wish to create corresponding .bim and .fam files.
  • name: string that should match the .bed file (Do not include .bim or .fam extensions in name).
  • y: Trait vector that will go into the 6th column of the .fam file.
source

Other Useful Functions

MendelIHT additionally provides useful utilities that may be of interest to a few advanced users.

MendelIHT.iht_run_many_modelsFunction

Runs IHT across many different model sizes specified in path using the full design matrix. Same as cv_iht but DOES NOT validate in a holdout set, meaning that this will definitely induce overfitting as we increase model size. Use this if you want to quickly estimate a range of feasible model sizes before engaging in full cross validation.

source
MendelIHT.pveFunction
pve(y, X, β; l = IdentityLink())

Estimates phenotype's Proportion of Variance Explained (PVE) by typed genotypes (i.e. chip heritability or SNP heritability).

Model

We compute Var(ŷ) / Var(y), where y contains the raw phenotypes, X contains all the genotypes, and ŷ = Xβ contains the predicted (average) phenotype values under the fitted effect sizes β. The intercept is NOT included.

source
Missing docstring.

Missing docstring for parse_genotypes. Check Documenter's build log for details.

Missing docstring.

Missing docstring for convert_gt. Check Documenter's build log for details.

+ [mafs::Vector{Float64}], [min_ma::Integer])

Creates a random SnpArray in the current directory without missing values, where each SNP has at least 5 (default) minor alleles.

Note: if a supplied minor allele frequency is extremely small, it could take a long time for the simulation to generate samples where at least min_ma (defaults to 5) minor alleles are present.

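Arguments:

  • s: name of the SnpArray that will be created in the current directory. To avoid creating a file, use undef.
  • n: number of samples
  • p: number of SNPs

Optional Arguments:

  • mafs: vector of desired minor allele frequencies (uniform(0,0.5) by default)
  • min_ma: the minimum number of minor alleles that must be present for each SNP (defaults to 5)
source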
MendelIHT.simulate_correlated_snparrayFunction
simulate_correlated_snparray(s, n, p; block_length, hap, prob)

Simulates a SnpArray with correlation. SNPs are divided into blocks where each adjacent SNP is the same with probability prob. There is no correlation between blocks.

Arguments:

  • n: number of samples
  • p: number of SNPs
  • s: name of SnpArray that will be created (memory mapped) in the current directory. To not memory map, use undef.

Optional arguments:

  • block_length: length of each LD block
  • hap: number of haplotypes to simulate for each block
  • prob: with probability prob an adjacent SNP would be the same.
source
Note

Simulating a SnpArray with $n$ subjects and $p$ SNPs requires up to $2np$ bits of RAM.
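
As a rough worked example of the bound in the note above, the sketch below converts $2np$ bits into bytes. The helper name snparray_bytes is hypothetical and not part of MendelIHT; it only restates the arithmetic of the note and assumes 8 bits per byte with rounding up.

```julia
# Hypothetical helper (not part of MendelIHT): the note's upper bound of
# 2 bits per genotype, converted to whole bytes (rounded up).
snparray_bytes(n::Integer, p::Integer) = cld(2 * n * p, 8)

n, p = 1_000, 10_000
bytes = snparray_bytes(n, p)
println("n = $n, p = $p: at most $(bytes / 10^6) MB")
```

With 1,000 subjects and 10,000 SNPs the bound is 2,500,000 bytes, so simulations of this size fit comfortably in memory.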

MendelIHT.simulate_random_responseFunction
simulate_random_response(x, k, d, l; kwargs...)

This function simulates a random response (trait) vector y. When the distribution d is from Poisson, Gamma, or Negative Binomial, we simulate β ∼ N(0, 0.3) to roughly ensure the mean of response y doesn't become too large. For other distributions, we choose β ∼ N(0, 1).

Arguments

  • x: Design matrix
  • k: the true number of predictors.
  • d: The distribution of the simulated trait (note typeof(d) = UnionAll but typeof(d()) is an actual distribution: e.g. Normal)
  • l: The link function. Input canonicallink(d()) if you want to use the canonical link of d.

Optional arguments

  • r: The number of successes until stopping in negative binomial regression, defaults to 10
  • α: Shape parameter of the gamma distribution, defaults to 1
  • Zu: Effect of non-genetic covariates. Zu should have dimension n × 1.
source
simulate_random_response(x, k, traits)

Simulates a response matrix Y where each row is an independent multivariate Gaussian of length traits. There are k non-zero β over all traits. The traits share overlap causal SNPs (see the optional argument overlap). The covariance matrix Σ is positive definite and symmetric.

Arguments

  • x: Design matrix of dimension n × p. Each row is a sample.
  • k: the total true number of causal SNPs (predictors)
  • traits: Number of traits

Optional arguments

  • Zu: Effect of non-genetic covariates. Zu should have dimension n × traits.
  • overlap: Number of causal SNPs shared by all traits. Shared SNPs do not have the same effect size.

Outputs

  • Y: Response matrix where each row is sampled from a multivariate normal with mean μ[i] = X[i, :] * true_b and covariance Σ
  • Σ: the symmetric, positive definite covariance matrix used
  • true_b: A sparse matrix containing true beta values.
  • correct_position: Non-zero indices of true_b
source
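
The sampling model described in the Outputs list can be sketched with the Julia standard library alone. This is an illustration of the model, not MendelIHT's implementation: the dimensions, the effect sizes in B (standing in for true_b), and the covariance Σ are all made-up values, and a dense matrix is used instead of a sparse one.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n, p, traits = 100, 20, 2        # illustrative sizes (assumed)
X = randn(n, p)                  # stand-in design matrix; each row is a sample

# Stand-in for true_b: 4 non-zero entries in total.
B = zeros(p, traits)
B[3, 1] = 0.8; B[3, 2] = -0.4    # SNP 3 is shared, with different effect sizes
B[7, 1] = 0.5                    # SNP 7 affects trait 1 only
B[12, 2] = 1.1                   # SNP 12 affects trait 2 only

Σ = [1.0 0.3; 0.3 1.0]           # symmetric, positive definite covariance
L = cholesky(Symmetric(Σ)).L     # lower Cholesky factor, L * L' == Σ

# Row i of E equals (L * z_i)' with z_i standard normal, so it has covariance Σ.
E = randn(n, traits) * L'
Y = X * B + E                    # row i has mean X[i, :]' * B

correct_position = findall(!iszero, B)   # indices of the non-zero effects
```

Each row of Y is then one draw from a multivariate normal with mean X[i, :]' * B and covariance Σ, which matches the description of Y above.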
Note

For negative binomial and gamma, the link function must be LogLink.

MendelIHT.make_bim_fam_filesFunction
make_bim_fam_files(x::SnpArray, y, name::String)

Creates .bim and .fam files from a SnpArray.

Arguments:

  • x: A SnpArray (i.e. .bed file on the disk) for which you wish to create corresponding .bim and .fam files.
  • name: string that should match the .bed file (Do not include .bim or .fam extensions in name).
  • y: Trait vector that will go into the 6th column of the .fam file.
source

Other Useful Functions

MendelIHT additionally provides useful utilities that may be of interest to a few advanced users.

MendelIHT.iht_run_many_modelsFunction

Runs IHT across many different model sizes specified in path using the full design matrix. Same as cv_iht but DOES NOT validate in a holdout set, meaning that this will definitely induce overfitting as we increase model size. Use this if you want to quickly estimate a range of feasible model sizes before engaging in full cross validation.

source
MendelIHT.pveFunction
pve(y, X, β; l = IdentityLink())

Estimates phenotype's Proportion of Variance Explained (PVE) by typed genotypes (i.e. chip heritability or SNP heritability).

Model

We compute Var(ŷ) / Var(y), where y contains the raw phenotypes, X contains all the genotypes, and ŷ = Xβ contains the predicted (average) phenotype values under the fitted effect sizes β. The intercept is NOT included.

source
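
The ratio in the Model paragraph can be reproduced by hand for the identity link. The function pve_sketch below is a hypothetical stand-in written for illustration, not the package's pve, and the simulated X, β, and y are made-up values.

```julia
using Statistics, Random

# Hypothetical re-implementation of Var(ŷ) / Var(y) for the identity link.
function pve_sketch(y::AbstractVector, X::AbstractMatrix, β::AbstractVector)
    yhat = X * β                 # predicted phenotypes; no intercept term
    return var(yhat) / var(y)
end

Random.seed!(2020)
n, p = 200, 5
X = randn(n, p)                  # stand-in genotype matrix
β = [0.5, 0.0, -0.3, 0.0, 0.0]   # sparse effect sizes (assumed)
y = X * β + 0.5 * randn(n)       # phenotype = genetic signal + noise

println(pve_sketch(y, X, β))
```

If y contains no noise, that is y = Xβ exactly, the ratio equals 1; adding noise to y inflates Var(y) and lowers the estimate.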
Missing docstring.

Missing docstring for parse_genotypes. Check Documenter's build log for details.

Missing docstring.

Missing docstring for convert_gt. Check Documenter's build log for details.

diff --git a/dev/man/contributing/index.html b/dev/man/contributing/index.html index 1694baf6..1e6d5416 100644 --- a/dev/man/contributing/index.html +++ b/dev/man/contributing/index.html @@ -8,4 +8,4 @@ pages={giaa044}, year={2020}, publisher={Oxford University Press} -} +} diff --git a/dev/man/examples/index.html b/dev/man/examples/index.html index 17e5d826..2e9e7153 100644 --- a/dev/man/examples/index.html +++ b/dev/man/examples/index.html @@ -789,4 +789,4 @@ 4.7186 4.96944 0.0303161 0.162057 0.0303161 0.162057 - 3.72355 3.74153

Conclusion:

Other examples and functionalities

Additional features are available as optional parameters in the fit_iht function, but they should be treated as experimental. Interested users are encouraged to explore them and to file an issue on GitHub if they encounter a problem.

+ 3.72355 3.74153

Conclusion:

Other examples and functionalities

Additional features are available as optional parameters in the fit_iht function, but they should be treated as experimental. Interested users are encouraged to explore them and to file an issue on GitHub if they encounter a problem.

diff --git a/dev/man/getting_started/index.html b/dev/man/getting_started/index.html index 2d26b11a..53c475e9 100644 --- a/dev/man/getting_started/index.html +++ b/dev/man/getting_started/index.html @@ -11,4 +11,4 @@ # run MendelIHT: first cross validate for best k, then run IHT using best k mses = cross_validate(plinkfile, Normal, covariates=covariates, path=path) -iht_result = iht(plinkfile, Normal, k=path[argmin(mses)])

Then in the terminal you can do:

julia iht.jl plinkfile covariates.txt

You should see progress printed to your terminal, and the files cviht.summary.txt, iht.summary.txt, and iht.beta.txt will be saved to your local directory.

+iht_result = iht(plinkfile, Normal, k=path[argmin(mses)])

Then in the terminal you can do:

julia iht.jl plinkfile covariates.txt

You should see progress printed to your terminal, and the files cviht.summary.txt, iht.summary.txt, and iht.beta.txt will be saved to your local directory.

diff --git a/dev/man/math/index.html b/dev/man/math/index.html index d970c284..a98587f0 100644 --- a/dev/man/math/index.html +++ b/dev/man/math/index.html @@ -55,4 +55,4 @@ \frac{d^2}{dr^2} L(p_1, ..., p_m, r) =&\sum_{i=1}^m \left[ \operatorname{trigamma}(y_i+r) - \operatorname{trigamma}(r) + \frac{1}{r} - \frac{2}{\mu_i + r} + \frac{r+y_i}{(\mu_i + r)^2} \right] \end{aligned}\]

So the iteration to use is:

\[\begin{aligned} r_{n+1} = r_n - \frac{\frac{d}{dr}L(p_1,...,p_m,r)}{\frac{d^2}{dr^2}L(p_1,...,p_m,r)}. -\end{aligned}\]

For stability, we set the denominator equal to $1$ if it is less than 0. That is, we use gradient descent if the current iteration has non-positive definite Hessian matrices.

+\end{aligned}\]

For stability, we set the denominator equal to $1$ if it is less than 0. That is, we use gradient descent if the current iteration has non-positive definite Hessian matrices.

diff --git a/dev/search/index.html b/dev/search/index.html index 711fdac8..6333f1f7 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · MendelIHT

Loading search...

    +Search · MendelIHT

    Loading search...