
Where did the #15

Open
ChadMcintire opened this issue May 21, 2023 · 5 comments
@ChadMcintire

I am curious where the `calculate_gaussian_log_prob(log_std, noise)` function in your utils.py came from. It doesn't look like the log PDF of the normal distribution from Stable Baselines or PyTorch. So what is it, if you don't mind answering?
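For context, a function matching the formula quoted later in this thread might look roughly like the following. This is a numpy sketch under that assumption, not the repository's actual PyTorch code; the name and arguments come from the question above, the body is reconstructed:

```python
import math
import numpy as np

def calculate_gaussian_log_prob(log_stds, noises):
    # Sum over dimensions of log N(x_i | mu_i, sigma_i), where
    # noises = (x - mu) / sigma and log_stds = log(sigma).
    # The constant -0.5*log(2*pi) appears once per dimension, so it is
    # pulled out of the sum and multiplied by the number of dimensions.
    return (np.sum(-0.5 * noises**2 - log_stds, axis=-1)
            - 0.5 * math.log(2 * math.pi) * log_stds.shape[-1])
```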

@toshikwa
Owner

toshikwa commented May 22, 2023

Hi, @ChadMcintire
It can be derived by simple mathematics. I calculated $\log N(\text{stds} * I \mid \mu = 0, \sigma = \text{stds} * I)$.
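Concretely (hypothetical numbers): with $\mu = 0$ and a sample written as $x = \sigma \epsilon$, the scalar normal log density depends only on $\epsilon$ and $\sigma$, since $(x - \mu)/\sigma$ is exactly $\epsilon$:

```python
import math

# Hypothetical example values
sigma, eps = 1.7, -0.4
x = sigma * eps  # a sample written in terms of its noise

# log N(x | mu=0, sigma), evaluated directly from the density...
direct = -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * (x / sigma) ** 2
# ...equals the noise-only form, since (x - 0) / sigma is exactly eps
via_noise = -0.5 * eps**2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

assert math.isclose(direct, via_noise)
```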

@ChadMcintire
Author

ChadMcintire commented May 22, 2023

Thank you for responding.

I am sorry, I have been trying to derive it myself; I'm not asking you to derive anything. I start from the normal PDF:

$$\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\big(\frac{x-\mu}{\sigma}\big)^2}$$

Taking the log, this simplifies as follows:

$$\log\Big(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac12 \big(\frac{x-\mu}{\sigma}\big)^2}\Big) $$

$$= \log\Big(\frac{1}{\sigma\sqrt{2\pi}}\Big) + \log\Big(e^{-\frac12 \big(\frac{x-\mu}{\sigma}\big)^2}\Big) $$

$$= -\log\sigma -\log\sqrt{2\pi} - \frac12\big(\frac{x-\mu}{\sigma}\big)^2$$
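A quick numeric check of that scalar derivation (arbitrary example values):

```python
import math

x, mu, sigma = 0.7, 0.2, 1.3  # arbitrary example values

# The normal PDF evaluated directly
pdf = 1.0 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
# The derived log PDF: -log(sigma) - log(sqrt(2*pi)) - 0.5*((x-mu)/sigma)^2
log_pdf = -math.log(sigma) - math.log(math.sqrt(2 * math.pi)) - 0.5 * ((x - mu) / sigma) ** 2

assert math.isclose(math.log(pdf), log_pdf)
```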

This is the equation I saw in the Stable Baselines 3 code base. I like your code and want to understand it better.

The equation in the code looks more like this:

$\log \mu(u|s) = -0.5 \times \log(2 \times \pi) \times n + \sum (-0.5 \times \epsilon^2 - \log \sigma) $

I'm unsure where the "sum" comes from, as well as the "n".

In your reply, you are saying that you derived the formula from the log of a normal distribution. The stds * I: are you saying that's just the standard deviations times the identity matrix? I'm sure I'm missing something, I apologize.

@toshikwa
Owner

Hi, please note that the distribution is a multivariate Gaussian and the variables are independent. To be precise, $\Sigma$ is an N-dimensional diagonal matrix whose diagonal elements are equal to stds.

So you need to calculate the probability of each dimension and multiply them together. Alternatively, calculate the log probabilities and sum them.

Because there is no need to calculate the probability itself, I calculated it this way to reduce the computation.
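That independence argument can be checked numerically: summing per-dimension log densities matches the log of the product of per-dimension densities (random example values):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
sigma = rng.uniform(0.5, 2.0, size=3)   # per-dimension stds
x = sigma * rng.normal(size=3)          # one sample per independent dimension

# Per-dimension density and log density of N(0, sigma_i)
pdf = 1.0 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * (x / sigma) ** 2)
log_pdf = -np.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * (x / sigma) ** 2

# Independent dims: multiply the probabilities, or (cheaper and more
# numerically stable) sum the log probabilities
assert np.isclose(np.log(np.prod(pdf)), np.sum(log_pdf))
```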

@ChadMcintire
Author

ChadMcintire commented May 27, 2023

So I start with the multivariate Gaussian PDF:

$(2 \pi)^{-\frac{k}{2}} \det(\Sigma)^{-1/2} \exp\big(-\frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)\big)$

Taking the natural log:

$-\frac{k}{2} \ln(2 \pi) -\frac{1}{2} \ln \det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)$

To make it look more like the code:

$-\frac{1}{2} \ln(2 \pi) \times k -\frac{1}{2}\ln \det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)$

$-\frac{1}{2}\ln \det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu) -\frac{1}{2} \ln(2 \pi) \times k$

The code looks like this
$\sum (-0.5 \times \mu^2 - \ln std) -0.5 \ln(2 \pi) \times k $

I'm so sorry, I don't understand from your comment how you go from expression 1 to the expression in your code. Since the last part of the expression is the same in both, how do you derive the first part to match your code?

1st part of ln pdf:
$-\frac{1}{2}\ln det(\Sigma) - \frac{1}{2}(x - \mu)^T \Sigma^{-1}(x -\mu)$

vs. first part of the code.
$\sum (-0.5 \times \epsilon^2 - \ln \sigma)$

@toshikwa
Owner

Calculate the probability (or log probability) of each dimension independently and multiply (or add) them together, because these variables are independent.
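For a diagonal $\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_k^2)$, the two "first parts" coincide term by term: $-\frac12\ln\det(\Sigma) = -\sum_i \ln\sigma_i$, and $(x-\mu)^T \Sigma^{-1} (x-\mu) = \sum_i \epsilon_i^2$ with $\epsilon_i = (x_i - \mu_i)/\sigma_i$. A numeric check of this equivalence (random example values, numpy sketch):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
k = 4
sigma = rng.uniform(0.5, 2.0, size=k)   # per-dimension stds
mu = rng.normal(size=k)
eps = rng.normal(size=k)
x = mu + sigma * eps
Sigma = np.diag(sigma**2)               # diagonal covariance matrix

# Full multivariate Gaussian log pdf
mv = (-0.5 * k * math.log(2 * math.pi)
      - 0.5 * math.log(np.linalg.det(Sigma))
      - 0.5 * (x - mu) @ np.linalg.inv(Sigma) @ (x - mu))

# The per-dimension summed form that appears in the code:
# -0.5*ln det(Sigma) = -sum(ln sigma_i), quadratic form = sum(eps_i^2)
code_form = np.sum(-0.5 * eps**2 - np.log(sigma)) - 0.5 * math.log(2 * math.pi) * k

assert np.isclose(mv, code_form)
```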
