How to speed up schedule[X:Y] lookup
#259
-
I have code that spends ~40% of its time on this line of code:
I ask here because you have previously improved performance elsewhere, so you may have insight.
Replies: 4 comments
-
Can you run your code through a profiler? I suspect you'll get a minor speedup using a pd.Timestamp/numpy datetime instead, but I'd need to see where pandas is spending its time internally.
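One way to see where pandas is spending its time is Python's built-in cProfile module. The sketch below is only an illustration: the `schedule` frame and the `hot_loop` function are hypothetical stand-ins for the real calendar schedule and the calling code, which are not shown in this thread.

```python
import cProfile
import io
import pstats

import pandas as pd

# Hypothetical stand-in for the calendar schedule: one row per business day.
sessions = pd.bdate_range("2015-01-01", "2024-12-31")
schedule = pd.DataFrame({"n": range(len(sessions))}, index=sessions)


def hot_loop(n=500):
    """Repeat the date-range lookup so it dominates the profile."""
    start, end = pd.Timestamp("2021-03-01"), pd.Timestamp("2021-03-05")
    for _ in range(n):
        schedule.loc[start:end]


profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

# Sort by cumulative time and keep the 15 most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(15)
report = buffer.getvalue()
print(report)
```

The rows below `hot_loop` in the report show which pandas internals the lookup passes through, which is the detail needed to judge whether a different key type would help.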
-
I reduced it to ~25% with some more optimisation: removing a copy() and adding another layer of caching (for frequently accessed years). I'm OK with this now, but here is the relevant trace in case anything bad sticks out.
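The per-year caching idea could look roughly like the sketch below. It is an assumption about the approach, not the code actually used: `schedule`, `schedule_for_year` and `lookup` are hypothetical names, and the frame is a stand-in for the real schedule.

```python
from functools import lru_cache

import pandas as pd

# Hypothetical stand-in for the calendar schedule: one row per business day.
sessions = pd.bdate_range("2015-01-01", "2024-12-31")
schedule = pd.DataFrame({"n": range(len(sessions))}, index=sessions)


@lru_cache(maxsize=16)
def schedule_for_year(year: int) -> pd.DataFrame:
    """Return the rows for one calendar year; repeated calls hit the cache."""
    # No copy() here: the cached frame is shared, so callers must not mutate it.
    return schedule.loc[str(year)]


def lookup(start, end) -> pd.DataFrame:
    """Slice [start, end], using the small per-year frame when possible."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if start.year == end.year:
        return schedule_for_year(start.year).loc[start:end]
    # Ranges that span several years fall back to the full schedule.
    return schedule.loc[start:end]


week = lookup("2021-03-01", "2021-03-05")
```

Dropping the copy() trades safety for speed: every caller sees the same cached object, so this only works if the results are treated as read-only.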
-
By using nanos under the surface you can get a 100% speed-up:

    start_ts, end_ts = pd.Timestamp(start_d), pd.Timestamp(end_d)
    slc = cal._get_sessions_slice(start_ts, end_ts, False)
    cal.schedule[slc]

A method could be added to
Also, access methods could be added for each column. For example,
-
nanos only gave me a 20% boost, but I appreciate that's probably the best possible without moving to SQL. I ended up caching the query results for a decent speedup.
Here is the nanos code in case anyone else is interested. Use numpy's searchsorted(), not pandas'.
So it's probably not worth adding those access methods.
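For anyone wanting to try the numpy searchsorted() route on nanos, here is a minimal sketch. It is not the poster's original code: `schedule`, `session_nanos` and `schedule_slice` are hypothetical names, and the frame is a stand-in for the real calendar schedule. It assumes the index is sorted and timezone-naive.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the calendar schedule: one row per business day.
sessions = pd.bdate_range("2015-01-01", "2024-12-31")
schedule = pd.DataFrame(
    {
        "open": sessions + pd.Timedelta(hours=9, minutes=30),
        "close": sessions + pd.Timedelta(hours=16),
    },
    index=sessions,
)

# Convert the sorted index to int64 nanoseconds once and reuse it for every query.
session_nanos = schedule.index.values.astype("datetime64[ns]").view("int64")


def schedule_slice(start, end) -> pd.DataFrame:
    """Rows with start <= session <= end, located with numpy.searchsorted."""
    start_ns = pd.Timestamp(start).value  # nanoseconds since the epoch
    end_ns = pd.Timestamp(end).value
    lo = np.searchsorted(session_nanos, start_ns, side="left")
    hi = np.searchsorted(session_nanos, end_ns, side="right")
    return schedule.iloc[lo:hi]


subset = schedule_slice("2021-03-01", "2021-03-05")
```

The two binary searches work on a plain integer array, so the only pandas work left per query is the positional `iloc` slice.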