
Where did the #15

Open
ChadMcintire opened this issue May 21, 2023 · 5 comments
@ChadMcintire

I am curious where the `calculate_gaussian_log_prob(log_std, noise)` function in your utils.py came from. It doesn't look like the log PDF of the normal distribution from Stable Baselines or PyTorch. So what is it, if you don't mind answering?
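For context, a function matching the formula quoted later in this thread might look roughly like the following. This is a numpy sketch under that assumption, not the repository's actual PyTorch code; the name and arguments come from the question above, the body is reconstructed:

```python
import math
import numpy as np

def calculate_gaussian_log_prob(log_stds, noises):
    # Sum over dimensions of log N(x_i | mu_i, sigma_i), where
    # noises = (x - mu) / sigma and log_stds = log(sigma).
    # The constant -0.5*log(2*pi) appears once per dimension, so it is
    # pulled out of the sum and multiplied by the number of dimensions.
    return (np.sum(-0.5 * noises**2 - log_stds, axis=-1)
            - 0.5 * math.log(2 * math.pi) * log_stds.shape[-1])
```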

@toshikwa
Owner

toshikwa commented May 22, 2023

Hi, @ChadMcintire
It can be derived by simple mathematics. I calculated $\log N(\text{stds} * I \mid \mu = 0, \sigma = \text{stds} * I)$.
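Concretely (hypothetical numbers): with $\mu = 0$ and a sample written as $x = \sigma \epsilon$, the scalar normal log density depends only on $\epsilon$ and $\sigma$, since $(x - \mu)/\sigma$ is exactly $\epsilon$:

```python
import math

# Hypothetical example values
sigma, eps = 1.7, -0.4
x = sigma * eps  # a sample written in terms of its noise

# log N(x | mu=0, sigma), evaluated directly from the density...
direct = -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * (x / sigma) ** 2
# ...equals the noise-only form, since (x - 0) / sigma is exactly eps
via_noise = -0.5 * eps**2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

assert math.isclose(direct, via_noise)
```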

@ChadMcintire
Author

ChadMcintire commented May 22, 2023

Thank you for responding.

I am sorry, I have been trying to derive it myself; I'm not asking you to derive anything. I start from the normal PDF:

$$\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\big(\frac{x-\mu}{\sigma}\big)^2}$$

Taking the log, this simplifies as follows:

$$\log\Big(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac12 \big(\frac{x-\mu}{\sigma}\big)^2}\Big) $$

$$= \log\Big(\frac{1}{\sigma\sqrt{2\pi}}\Big) + \log\Big(e^{-\frac12 \big(\frac{x-\mu}{\sigma}\big)^2}\Big) $$

$$= -\log\sigma -\log\sqrt{2\pi} - \frac12\big(\frac{x-\mu}{\sigma}\big)^2$$
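A quick numeric check of that scalar derivation (arbitrary example values):

```python
import math

x, mu, sigma = 0.7, 0.2, 1.3  # arbitrary example values

# The normal PDF evaluated directly
pdf = 1.0 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
# The derived log PDF: -log(sigma) - log(sqrt(2*pi)) - 0.5*((x-mu)/sigma)^2
log_pdf = -math.log(sigma) - math.log(math.sqrt(2 * math.pi)) - 0.5 * ((x - mu) / sigma) ** 2

assert math.isclose(math.log(pdf), log_pdf)
```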

This is the equation I saw in the Stable Baselines 3 code base. I like your code and want to understand it better.

The equation in the code looks more like this:

$\log \mu(u|s) = -0.5 \times \log(2 \times \pi) \times n + \sum (-0.5 \times \epsilon^2 - \log \sigma) $

I'm unsure where the "sum" comes from, as well as the "n".

In your reply, you are saying that you derived the formula from the log of a normal distribution. The stds * I: are you saying that's just the standard deviations times the identity matrix? I'm sure I'm missing something, I apologize.

@toshikwa
Owner

Hi, please note that the distribution is a multivariate Gaussian and the variables are independent. To be precise, $\Sigma$ is an N-dimensional diagonal matrix whose diagonal elements are equal to stds.

So you need to calculate the probability of each dimension and multiply them together. Alternatively, calculate the log probabilities and sum them.

Because there is no need to calculate the probability itself, I calculated it this way to reduce the computation.
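That independence argument can be checked numerically: summing per-dimension log densities matches the log of the product of per-dimension densities (random example values):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
sigma = rng.uniform(0.5, 2.0, size=3)   # per-dimension stds
x = sigma * rng.normal(size=3)          # one sample per independent dimension

# Per-dimension density and log density of N(0, sigma_i)
pdf = 1.0 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * (x / sigma) ** 2)
log_pdf = -np.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * (x / sigma) ** 2

# Independent dims: multiply the probabilities, or (cheaper and more
# numerically stable) sum the log probabilities
assert np.isclose(np.log(np.prod(pdf)), np.sum(log_pdf))
```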

@ChadMcintire
Author

ChadMcintire commented May 27, 2023

So I start with the multivariate Gaussian PDF:

$(2 \pi)^{-\frac{k}{2}} \det(\Sigma)^{-1/2} \exp\big(-\frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)\big)$

Taking the natural log:

$-\frac{k}{2} \ln(2 \pi) -\frac{1}{2} \ln \det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)$

To make it look more like the code:

$-\frac{1}{2} \ln(2 \pi) \times k -\frac{1}{2}\ln \det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)$

$-\frac{1}{2}\ln \det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu) -\frac{1}{2} \ln(2 \pi) \times k$

The code looks like this
$\sum (-0.5 \times \mu^2 - \ln std) -0.5 \ln(2 \pi) \times k $

I'm so sorry, I don't understand from your comment how you go from expression 1 to the expression in your code. Since the last part of the expression is the same in both, how do you derive the first part to match your code?

1st part of ln pdf:
$-\frac{1}{2}\ln det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)$

vs. first part of the code.
$\sum (-0.5 \times \epsilon^2 - \ln \sigma)$

@toshikwa
Owner

Calculate the probability (or log probability) of each dimension independently and multiply (or add) them together, because these variables are independent.
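For a diagonal $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_k^2)$, the two "first parts" coincide term by term: $-\frac12\ln\det(\Sigma) = -\sum_i \ln\sigma_i$, and $(x-\mu)^T \Sigma^{-1} (x-\mu) = \sum_i \epsilon_i^2$ with $\epsilon_i = (x_i - \mu_i)/\sigma_i$. A numeric check of this equivalence (random example values, numpy sketch):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
k = 4
sigma = rng.uniform(0.5, 2.0, size=k)   # per-dimension stds
mu = rng.normal(size=k)
eps = rng.normal(size=k)
x = mu + sigma * eps
Sigma = np.diag(sigma**2)               # diagonal covariance matrix

# Full multivariate Gaussian log pdf
mv = (-0.5 * k * math.log(2 * math.pi)
      - 0.5 * math.log(np.linalg.det(Sigma))
      - 0.5 * (x - mu) @ np.linalg.inv(Sigma) @ (x - mu))

# The per-dimension summed form that appears in the code:
# -0.5*ln det(Sigma) = -sum(ln sigma_i), quadratic form = sum(eps_i^2)
code_form = np.sum(-0.5 * eps**2 - np.log(sigma)) - 0.5 * math.log(2 * math.pi) * k

assert np.isclose(mv, code_form)
```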
