median and upper,lower percentile #43

rumachan · 2015-05-25T03:19:13Z

Would you be able to add the median and the lower (20th) and upper (80th) percentiles to plots? This would allow us to say something like 'the temperature of Ruapehu Crater Lake is 42 degC; it is above 37 degC only 20% of the time, so we think it is really high', or something like that. Reading the documentation, I think percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) does that?

The reason for using percentiles is that they make no assumptions about the distribution or independence of the data points.

The general idea of this kind of basic statistics on plots is really great and lets us make some very quick assessments of the data very easily.
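To make the request concrete, here is a minimal Python sketch of the linear-interpolation percentile that percentile_cont is documented to compute. The function name mirrors the SQL one for readability only, and the temperature values are made up for illustration; they are not FITS data.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile of values in ascending order.

    Interpolates linearly between the two neighbouring ordered values,
    which is an assumption about how the database function behaves.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Hypothetical lake temperatures in degC (illustrative only).
temps = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 39.0, 41.0, 42.0, 15.0]

p20 = percentile_cont(temps, 0.2)
median = percentile_cont(temps, 0.5)
p80 = percentile_cont(temps, 0.8)
print(f"20th percentile={p20}, median={median}, 80th percentile={p80}")
```

Nothing here assumes a particular distribution: the result depends only on the rank order of the observations.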

gclitheroe · 2015-06-02T03:52:42Z

Hi,

Yes, but... Unfortunately these ordered-set aggregate functions are new in Postgres 9.4.x and we're running 9.3.x. I will have to upgrade the DB, which will take a little bit of planning. Possibly in the next 2-3 weeks?

gclitheroe · 2015-06-02T03:55:32Z

@junghao there is the request from Steve above. It needs me to upgrade to the latest Postgres version. If you have time please could you try upgrading your development DB and testing the queries out so that Steve can check the results? I think for Ruapehu Crater Lake temp the query should be something like:

select percentile_cont(0.8) within group (order by value desc) from fits.observation 
where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' )
and typepk = (select typepk from fits.type where typeid = 't');

rumachan · 2015-06-02T20:21:06Z

Thanks for looking into this.

SS

Steven Sherburn
GNS Science
Wairakei
New Zealand


junghao · 2015-06-02T21:07:34Z

PSQL 9.4.2

fits=# select percentile_cont(0.8) within group (order by value desc) from fits.observation
where 
      sitepk =
         (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk)
            where siteid = 'RU001' and networkid = 'VO' ) 
    and 
      typepk = (select typepk from fits.type where typeid = 't');
 percentile_cont
-----------------
           20.95
(1 row)

rumachan · 2015-06-02T21:43:15Z

Hi,

I ordered the data and took the 20% and 80% values.
20th percentile = 21.09
80th percentile = 33.5
median = 26.53 (compared with a mean of 27.37).
This agrees with using the 'percentile' function in a spreadsheet
(aaargh!)

The value Howard calculated doesn't seem right to me.

Thanks,

SS


junghao · 2015-06-02T21:53:08Z

I'm running these queries against the test environment database. I'm not sure whether its data are the same as yours.

Row count:

fits=# select count(*) from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't');                  
 count
-------
 40895
(1 row)

Oldest row:

fits=# select * from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't') order by time limit 1;
 sitepk | typepk | methodpk | samplepk |          time          |   value   |  error
--------+--------+----------+----------+------------------------+-----------+----------
    316 |     11 |        7 |        1 | 2010-04-13 12:15:00+00 | 21.300000 | 0.000000
(1 row)

Latest row:

fits=# select * from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't') order by time desc limit 1;
 sitepk | typepk | methodpk | samplepk |          time          |   value   |  error
--------+--------+----------+----------+------------------------+-----------+----------
    316 |     11 |        7 |        1 | 2015-01-31 22:15:00+00 | 41.150000 | 0.000000
(1 row)

rumachan · 2015-06-02T22:14:10Z

Okay Howard, that explains it. If I use the same data as you, I get the
same answer for the 20th percentile!
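The query asks for fraction 0.8 but sorts with ORDER BY value DESC, which would explain why its result lines up with the 20th percentile rather than the 80th: counting 80% of the way down from the largest value lands on the same point as counting 20% of the way up from the smallest. The sketch below illustrates this under the assumption that the percentile interpolates linearly between ordered values; the function and the data are hypothetical, not the FITS implementation.

```python
import math


def percentile_cont(values, fraction, descending=False):
    """Continuous percentile over values sorted ascending or descending."""
    ordered = sorted(values, reverse=descending)
    if not ordered:
        raise ValueError("values must not be empty")
    # Zero-based fractional position in the chosen sort order.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Made-up temperatures in degC (illustrative only).
temps = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 39.0, 41.0, 42.0, 15.0]

from_top = percentile_cont(temps, 0.8, descending=True)  # like ORDER BY value DESC
from_bottom = percentile_cont(temps, 0.2)                # like ORDER BY value
print(from_top, from_bottom)
```

To get the 80th percentile in the usual sense, either sort ascending with 0.8 or keep the descending sort and ask for 0.2.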

Thanks,

SS


rumachan · 2015-06-02T22:30:57Z

Given this seems easy to do, we would get more flexibility by being able to specify the two percentiles we want to calculate/plot in the URL. For example, ....&percentile=20,80
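As a rough sketch of how such a parameter could be read, the Python below parses a comma-separated percentile pair from a query string and converts it to fractions. The parameter name follows the suggestion above; the function name, the example.org URL and the validation rules are assumptions for illustration, not the FITS API.

```python
from urllib.parse import parse_qs, urlsplit


def parse_percentiles(url):
    """Return (lower, upper) as fractions in [0, 1], or None if absent.

    Expects a hypothetical query parameter such as percentile=20,80.
    """
    query = parse_qs(urlsplit(url).query)
    raw = query.get("percentile")
    if not raw:
        return None
    parts = raw[0].split(",")
    if len(parts) != 2:
        raise ValueError("expected two comma-separated percentiles")
    # float() raises ValueError for non-numeric input.
    lower, upper = (float(part) for part in parts)
    if not 0.0 <= lower < upper <= 100.0:
        raise ValueError("percentiles must satisfy 0 <= lower < upper <= 100")
    return lower / 100.0, upper / 100.0


# Placeholder URL; only the percentile parameter matters here.
fractions = parse_percentiles("http://example.org/plot?percentile=20,80")
print(fractions)
```

Returning fractions keeps the result directly usable as the first argument of percentile_cont.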

gclitheroe · 2015-12-15T19:49:47Z

AWS have now made the process of upgrading Postgres to 9.4 very easy. I don't really want to do this right before going on leave, so I'm suggesting Feb 2016 as a good time. Is it OK to wait till then for this feature?

rumachan · 2015-12-15T20:54:04Z

That is fine.

gclitheroe · 2016-02-18T03:47:58Z

I've done the database upgrade from 9.3, so we should be able to look at doing this feature now.

rumachan · 2017-06-22T00:45:53Z

I'm currently doing this kind of thing in python-pandas. For me that is a better option, but maybe not for those who can't do that. This brings us back to the same question of how much functionality we want in http://fits.geonet.org.nz/plot.

gclitheroe assigned junghao Jun 2, 2015

gclitheroe assigned gclitheroe and junghao and unassigned junghao and gclitheroe Feb 18, 2016

mabznz assigned mabznz and unassigned junghao Jun 26, 2017

mabznz added the Rich GUI label Jun 26, 2017
