-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
median and upper,lower percentile #43
Comments
Hi, yes but.... Unfortunately these ordered set aggregate functions are new in Postgres 9.4.x and we're running 9.3.x I will have to upgrade the DB which will take a little bit of planning. Possibly in the next 2-3 weeks? |
@junghao there is the request from Steve above. It needs me to upgrade to the latest Postgres version. If you have time please could you try upgrading your development DB and testing the queries out so that Steve can check the results? I think for Ruapehu Crater Lake temp the query should be something like:
|
Thanks for looking into this. SS Steven Sherburn From: Geoff Clitheroe [email protected] @junghao there is the request from Steve above. It needs me to upgrade to — Notice: This email and any attachments are confidential. |
PSQL 9.4.2 fits=# select percentile_cont(0.8) within group (order by value desc) from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't'); percentile_cont ----------------- 20.95 (1 row) |
Hi, I ordered the data and took the 20% and 80% values. The value Howard calculated doesn't seem right to me. Thanks, SS Steven Sherburn From: Howard Wu [email protected] PSQL 9.4.2 (1 row) Notice: This email and any attachments are confidential. |
I'm running these SQLs in the test environment database. Not sure if they are the same as your. Row count: fits=# select count(*) from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't'); count ------- 40895 (1 row) Oldest row: fits=# select * from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't') order by time limit 1; sitepk | typepk | methodpk | samplepk | time | value | error --------+--------+----------+----------+------------------------+-----------+---------- 316 | 11 | 7 | 1 | 2010-04-13 12:15:00+00 | 21.300000 | 0.000000 (1 row) Latest row: fits=# select * from fits.observation where sitepk = (select distinct on (sitepk) sitepk from fits.site join fits.network using (networkpk) where siteid = 'RU001' and networkid = 'VO' ) and typepk = (select typepk from fits.type where typeid = 't') order by time desc limit 1; sitepk | typepk | methodpk | samplepk | time | value | error --------+--------+----------+----------+------------------------+-----------+---------- 316 | 11 | 7 | 1 | 2015-01-31 22:15:00+00 | 41.150000 | 0.000000 (1 row) |
Okay Howard, that explains it. If I use the same data as you I get the Thanks, SS Steven Sherburn From: Howard Wu [email protected] I'm running these SQLs using the test data. Not sure if they are the same count40895 Oldest row: Latest row: — Notice: This email and any attachments are confidential. |
Given this seems easy to do, we would get more flexibility by being able to specify the two percentiles we wanted to calculate/plot in the url. For example, ....&percentile=20,80 |
AWS have now made the Postgres upgrade to 9.4 process very easy. I don't really want to do this right before going on leave. I'm suggesting Feb 2016 as a good time. Is it ok to wait till then for this feature? |
That is fine. |
I've done the 9.3 -> database upgrade so we should be able to look at doing this feature now. |
I'm currently doing this kind of thing in python-pandas. For me that is a better option, but maybe not for those who can't do that. Back to the same issues of deciding how much functionality we want in http://fits.geonet.org.nz/plot? |
Would you be able to implement median, lower 20% percentile and upper 80% percentile to plots? This would then allow us to say something like 'the temperature of Ruapehu Crater Lake is 42 degC; it is above 37degC only 20% of the time, so we think it is really high' , or something like that. Reading the documentation I think that percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) does that?
The reason for this is that no assumptions on the distribution or independence of data points are made.
The general idea of these kind of basic statistics on plots is really great and allows us to make some very quick assessments of the data very easily.
The text was updated successfully, but these errors were encountered: