Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

median and upper,lower percentile #43

Open
rumachan opened this issue May 25, 2015 · 12 comments
Open

median and upper,lower percentile #43

rumachan opened this issue May 25, 2015 · 12 comments
Assignees
Labels

Comments

@rumachan
Copy link

Would you be able to implement median, lower 20% percentile and upper 80% percentile to plots? This would then allow us to say something like 'the temperature of Ruapehu Crater Lake is 42 degC; it is above 37degC only 20% of the time, so we think it is really high' , or something like that. Reading the documentation I think that percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) does that?

The reason for this is that no assumptions on the distribution or independence of data points are made.

The general idea of these kind of basic statistics on plots is really great and allows us to make some very quick assessments of the data very easily.

@gclitheroe
Copy link

Hi,

yes but.... Unfortunately these ordered set aggregate functions are new in Postgres 9.4.x and we're running 9.3.x I will have to upgrade the DB which will take a little bit of planning. Possibly in the next 2-3 weeks?

@gclitheroe
Copy link

@junghao there is the request from Steve above. It needs me to upgrade to the latest Postgres version. If you have time please could you try upgrading your development DB and testing the queries out so that Steve can check the results? I think for Ruapehu Crater Lake temp the query should be something like:

select percentile_cont(0.8) within group (order by value desc) from fits.observation 
where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' )
and typepk = (select typepk from fits.type where typeid = 't');

@rumachan
Copy link
Author

rumachan commented Jun 2, 2015

Thanks for looking into this.

SS

Steven Sherburn
GNS Science
Wairakei
New Zealand

From: Geoff Clitheroe [email protected]
To: GeoNet/fits [email protected],
Cc: rumachan [email protected]
Date: 02/06/2015 15:55
Subject: Re: [fits] median and upper,lower percentile (#43)

@junghao there is the request from Steve above. It needs me to upgrade to
the latest Postgres version. If you have time please could you try
upgrading your development DB and testing the queries out so that Steve
can check the results? I think for Ruapehu Crater Lake temp the query
should be something like:
select percentile_cont(0.8) within group (order by value desc) from
fits.observation
where sitepk = (select distinct on (sitepk) sitepk from fits.site join
fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO'
)
and typepk = (select typepk from fits.type where typeid = 't');


Reply to this email directly or view it on GitHub.

Notice: This email and any attachments are confidential.
If received in error please destroy and immediately notify us.
Do not copy or disclose the contents.

@junghao
Copy link
Contributor

junghao commented Jun 2, 2015

PSQL 9.4.2

fits=# select percentile_cont(0.8) within group (order by value desc) from fits.observation
where 
      sitepk =
         (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk)
            where siteid = 'RU001' and networkid = 'VO' ) 
    and 
      typepk = (select typepk from fits.type where typeid = 't');
 percentile_cont
-----------------
           20.95
(1 row)

@rumachan
Copy link
Author

rumachan commented Jun 2, 2015

Hi,

I ordered the data and took the 20% and 80% values.
20th percentile = 21.09
80th percentile = 33.5
median = 26.53 (compared mean 27.37).
This agrees with using the 'percentile' function in a spreadsheet
(aaargh!)

The value Howard calculated doesn't seem right to me.

Thanks,

SS

Steven Sherburn
GNS Science
Wairakei
New Zealand

From: Howard Wu [email protected]
To: GeoNet/fits [email protected],
Cc: rumachan [email protected]
Date: 03/06/2015 09:07
Subject: Re: [fits] median and upper,lower percentile (#43)

PSQL 9.4.2
fits=# select percentile_cont(0.8) within group (order by value desc) from
fits.observation where sitepk = (select distinct on (sitepk) sitepk from
fits.site join fits.network using (networkpk) where siteid = 'RU001' and
networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid
= 't');
percentile_cont
20.95

(1 row)

Reply to this email directly or view it on GitHub.

Notice: This email and any attachments are confidential.
If received in error please destroy and immediately notify us.
Do not copy or disclose the contents.

@junghao
Copy link
Contributor

junghao commented Jun 2, 2015

I'm running these SQLs in the test environment database. Not sure if they are the same as your.

Row count:

fits=# select count(*) from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't');                  
 count
-------
 40895
(1 row)

Oldest row:

fits=# select * from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't') order by time limit 1;
 sitepk | typepk | methodpk | samplepk |          time          |   value   |  error
--------+--------+----------+----------+------------------------+-----------+----------
    316 |     11 |        7 |        1 | 2010-04-13 12:15:00+00 | 21.300000 | 0.000000
(1 row)

Latest row:

fits=# select * from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't') order by time desc limit 1;
 sitepk | typepk | methodpk | samplepk |          time          |   value   |  error
--------+--------+----------+----------+------------------------+-----------+----------
    316 |     11 |        7 |        1 | 2015-01-31 22:15:00+00 | 41.150000 | 0.000000
(1 row)

@rumachan
Copy link
Author

rumachan commented Jun 2, 2015

Okay Howard, that explains it. If I use the same data as you I get the
same answer to the 20th percentile!

Thanks,

SS

Steven Sherburn
GNS Science
Wairakei
New Zealand

From: Howard Wu [email protected]
To: GeoNet/fits [email protected],
Cc: rumachan [email protected]
Date: 03/06/2015 09:53
Subject: Re: [fits] median and upper,lower percentile (#43)

I'm running these SQLs using the test data. Not sure if they are the same
to you.
Row count:
fits=# select count(*) from fits.observation where sitepk = (select
distinct on (sitepk) sitepk from fits.site join fits.network using
(networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk =
(select typepk from fits.type where typeid = 't');

count

40895
(1 row)

Oldest row:
fits=# select * from fits.observation where sitepk = (select distinct on
(sitepk) sitepk from fits.site join fits.network using (networkpk) where
siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from
fits.type where typeid = 't') order by time limit 1;
sitepk | typepk | methodpk | samplepk | time | value
| error
--------+--------+----------+----------+------------------------+-----------+----------
316 | 11 | 7 | 1 | 2010-04-13 12:15:00+00 |
21.300000 | 0.000000
(1 row)

Latest row:
fits=# select * from fits.observation where sitepk = (select distinct on
(sitepk) sitepk from fits.site join fits.network using (networkpk) where
siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from
fits.type where typeid = 't') order by time desc limit 1;
sitepk | typepk | methodpk | samplepk | time | value
| error
--------+--------+----------+----------+------------------------+-----------+----------
316 | 11 | 7 | 1 | 2015-01-31 22:15:00+00 |
41.150000 | 0.000000
(1 row)


Reply to this email directly or view it on GitHub.

Notice: This email and any attachments are confidential.
If received in error please destroy and immediately notify us.
Do not copy or disclose the contents.

@rumachan
Copy link
Author

rumachan commented Jun 2, 2015

Given this seems easy to do, we would get more flexibility by being able to specify the two percentiles we wanted to calculate/plot in the url. For example, ....&percentile=20,80

@gclitheroe
Copy link

AWS have now made the Postgres upgrade to 9.4 process very easy. I don't really want to do this right before going on leave. I'm suggesting Feb 2016 as a good time. Is it ok to wait till then for this feature?

@rumachan
Copy link
Author

That is fine.

@gclitheroe
Copy link

I've done the 9.3 -> database upgrade so we should be able to look at doing this feature now.

@gclitheroe gclitheroe assigned gclitheroe and junghao and unassigned junghao and gclitheroe Feb 18, 2016
@rumachan
Copy link
Author

I'm currently doing this kind of thing in python-pandas. For me that is a better option, but maybe not for those who can't do that. Back to the same issues of deciding how much functionality we want in http://fits.geonet.org.nz/plot?

@mabznz mabznz assigned mabznz and unassigned junghao Jun 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants