How to speed up `schedule[X:Y]` lookup #259

ValueRaider · 2022-11-04T17:48:33Z

My code spends ~40% of its time on this line:

sched = cal.schedule[start_d:end_d]

where start_d and end_d are datetime.date objects. This is probably because I'm using full-size calendars (1950 to now), but I was hoping pandas' DatetimeIndex would use some heuristic to jump straight to the requested rows rather than dumbly iterating through them (unless it already does?).

I'm asking here because you've improved performance elsewhere before, so you may have some insight.
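A minimal sketch of the behaviour the question asks about, using a synthetic schedule (one row per weekday from 1950, an assumption standing in for cal.schedule, so exchange_calendars is not needed). Because the index is sorted, the first and last matching rows can be found with a binary search (DatetimeIndex.searchsorted) in O(log n) steps rather than by iterating through every row, and the resulting positional slice selects exactly the rows the label slice does.

```python
import pandas as pd

# Synthetic stand-in for cal.schedule: one row per weekday (holidays ignored).
sessions = pd.bdate_range("1950-01-02", "2022-11-04")
schedule = pd.DataFrame(
    {"open": sessions + pd.Timedelta(hours=14, minutes=30)}, index=sessions
)

start, end = pd.Timestamp("2000-01-01"), pd.Timestamp("2000-12-31")

# Label slice on the sorted DatetimeIndex, as in the question.
by_label = schedule.loc[start:end]

# Same bounds found explicitly with a binary search on the sorted index:
# side="left" gives the first row >= start, side="right" gives one past the
# last row <= end, so the slice is inclusive at both ends like .loc.
lo = schedule.index.searchsorted(start, side="left")
hi = schedule.index.searchsorted(end, side="right")
by_position = schedule.iloc[lo:hi]

print(len(by_label), by_label.equals(by_position))
```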

Answered by ValueRaider · Nov 19, 2022

gerrymanoim (Maintainer) · 2022-11-04T20:06:43Z

Can you run your code through pyinstrument?

I suspect you'll get some minor speedup by using a pd.Timestamp/numpy datetime instead, but I'd need to see where pandas is spending its time internally.
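A minimal sketch of that suggestion, again on a synthetic weekday schedule standing in for cal.schedule (an assumption): convert the datetime.date bounds to pd.Timestamp (or numpy datetime64) once, before slicing, rather than handing plain dates to every lookup. How much this saves depends on the pandas version, so it is worth profiling before and after.

```python
import datetime

import numpy as np
import pandas as pd

# Synthetic stand-in for cal.schedule: one row per weekday (holidays ignored).
sessions = pd.bdate_range("1950-01-02", "2022-11-04")
schedule = pd.DataFrame(
    {"open": sessions + pd.Timedelta(hours=14, minutes=30)}, index=sessions
)

start_d, end_d = datetime.date(2022, 1, 3), datetime.date(2022, 3, 31)

# Convert the plain dates once, up front.
start_ts, end_ts = pd.Timestamp(start_d), pd.Timestamp(end_d)

# numpy equivalent; both represent nanoseconds since the epoch.
start_np = np.datetime64(start_d, "ns")

sched = schedule.loc[start_ts:end_ts]
```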

ValueRaider (Author) · 2022-11-05T22:44:25Z

I reduced it to ~25% with some more optimisation: removing a copy() and adding another layer of caching (for frequently accessed years). I'm OK with this now, but here is the relevant trace in case anything bad sticks out:

%       Time   Function
100.0%  1.667  GetExchangeSchedule
 48.3%  0.805    __getitem__                     # Guess: 1-year calendar not in cache, so accessing and slicing the full calendar
 26.5%  0.442      _get_indexer_strict
 20.7%  0.345      _take_with_is_copy
 30.2%  0.503    GetCalendar                     # exchange_calendars
 29.3%  0.488      get_calendar
 29.2%  0.486        _get_cached_factory_output
 20.0%  0.334    __getitem__                     # Guess: accessing cached 1-year calendar
 19.6%  0.327      _getitem_axis
 19.1%  0.318        _get_slice_axis
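A minimal sketch of a "frequently accessed years" cache like the one described above. It is not the author's code: year_schedule and get_schedule are hypothetical names, and a synthetic weekday schedule stands in for cal.schedule. Queries that fall inside a single year are served from a small cached per-year frame, so the full 1950-onwards frame is only sliced once per year.

```python
import datetime
from functools import lru_cache

import pandas as pd

# Synthetic stand-in for cal.schedule: one row per weekday (holidays ignored).
sessions = pd.bdate_range("1950-01-02", "2022-11-04")
full_schedule = pd.DataFrame(
    {"open": sessions + pd.Timedelta(hours=14, minutes=30)}, index=sessions
)


@lru_cache(maxsize=32)
def year_schedule(year: int) -> pd.DataFrame:
    """Hypothetical helper: slice one calendar year out of the full schedule, once."""
    return full_schedule.loc[pd.Timestamp(year, 1, 1) : pd.Timestamp(year, 12, 31)]


def get_schedule(start_d: datetime.date, end_d: datetime.date) -> pd.DataFrame:
    """Hypothetical lookup: use the cached year frame when the range fits in one year."""
    start_ts, end_ts = pd.Timestamp(start_d), pd.Timestamp(end_d)
    if start_ts.year == end_ts.year:
        return year_schedule(start_ts.year).loc[start_ts:end_ts]
    # Ranges spanning several years fall back to the full schedule.
    return full_schedule.loc[start_ts:end_ts]
```

Because lru_cache hands back the same DataFrame object on every hit, callers should treat the per-year frames as read-only (or .copy() before mutating) so the cache is not corrupted.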

maread99 (Collaborator) · 2022-11-10T09:48:56Z

By using nanos under the surface you can get a 100% speed-up:

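import pandas as pd
start_ts, end_ts = pd.Timestamp(start_d), pd.Timestamp(end_d)
# _get_sessions_slice is a private ExchangeCalendar method, so it may change between versions
slc = cal._get_sessions_slice(start_ts, end_ts, False)
cal.schedule[slc]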

A method could be added to ExchangeCalendar to retrieve subsets of the schedule in this way.
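A sketch of the kind of method suggested here, written as a standalone wrapper so it does not depend on exchange_calendars internals. ScheduleView, sessions_slice and schedule_between are hypothetical names, and the schedule is a synthetic weekday stand-in. The wrapper converts the index to int64 nanoseconds once and then binary-searches that array with numpy on each lookup, which is the "nanos under the surface" idea.

```python
import numpy as np
import pandas as pd


class ScheduleView:
    """Hypothetical wrapper (not part of exchange_calendars) giving fast range
    lookups on a schedule whose index is sorted by session."""

    def __init__(self, schedule: pd.DataFrame):
        self.schedule = schedule
        # Index as int64 nanoseconds since the epoch, computed once.
        self._nanos = schedule.index.values.astype("datetime64[ns]").astype(np.int64)

    def sessions_slice(self, start, end) -> slice:
        """Row positions of sessions in [start, end], both ends inclusive."""
        start_ns = pd.Timestamp(start).value
        end_ns = pd.Timestamp(end).value
        lo = int(np.searchsorted(self._nanos, start_ns, side="left"))
        hi = int(np.searchsorted(self._nanos, end_ns, side="right"))
        return slice(lo, hi)

    def schedule_between(self, start, end) -> pd.DataFrame:
        """Subset of the schedule for sessions in [start, end]."""
        return self.schedule.iloc[self.sessions_slice(start, end)]


# Synthetic stand-in for cal.schedule: one row per weekday (holidays ignored).
sessions = pd.bdate_range("1950-01-02", "2022-11-04")
schedule = pd.DataFrame(
    {"open": sessions + pd.Timedelta(hours=14, minutes=30)}, index=sessions
)
view = ScheduleView(schedule)
```

Returning a plain slice keeps the search separate from the selection, so the same positions can be reused for several columns without searching again.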

We could also add access methods for each column. For example, sessions_opens(start, end) was deprecated in favor of .opens[start:end] because, under the bonnet, sessions_opens just called self.schedule.loc[start:end, "open"]. It could be reinstated and redefined to use _get_sessions_slice for quicker access. The same goes for sessions_closes, and similar methods could be introduced for sessions_break_starts and sessions_break_ends.
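A sketch of what reinstated per-column accessors might look like, checked against the deprecated approach (schedule.loc[start:end, "open"]). sessions_opens and sessions_closes here are hypothetical module-level functions over a synthetic weekday schedule, not the exchange_calendars methods. The positional slice comes from pandas' Index.slice_indexer instead of the private _get_sessions_slice, and each column is then taken with .iloc.

```python
import pandas as pd

# Synthetic stand-in for cal.schedule: one row per weekday (holidays ignored).
sessions = pd.bdate_range("1950-01-02", "2022-11-04")
schedule = pd.DataFrame(
    {
        "open": sessions + pd.Timedelta(hours=14, minutes=30),
        "close": sessions + pd.Timedelta(hours=21),
    },
    index=sessions,
)


def _positions(start, end) -> slice:
    """Positional slice covering sessions in [start, end], both ends inclusive."""
    return schedule.index.slice_indexer(pd.Timestamp(start), pd.Timestamp(end))


def sessions_opens(start, end) -> pd.Series:
    """Hypothetical accessor: the "open" column for sessions in [start, end]."""
    return schedule["open"].iloc[_positions(start, end)]


def sessions_closes(start, end) -> pd.Series:
    """Hypothetical accessor: the "close" column for sessions in [start, end]."""
    return schedule["close"].iloc[_positions(start, end)]
```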

ValueRaider (Author) · 2022-11-19T22:40:47Z

Using nanos only gave me a 20% boost, but I appreciate that's probably the best possible without moving to SQL. I ended up caching the query results, which gave a decent speedup.

Here is the nanos code in case anyone else is interested. Use numpy's searchsorted(), not the pandas one:

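import pandas as pd
# Once: store the session index as int64 nanoseconds in an extra column
cal.schedule["idx_nanos"] = cal.schedule.index.values.astype("int64")
...
start_ts, end_ts = pd.Timestamp(start_d), pd.Timestamp(end_d)
# Binary search (numpy) for the first row >= start_ts and one past the last row <= end_ts
slc_start = cal.schedule["idx_nanos"].to_numpy().searchsorted(start_ts.value, side="left")
slc_end = cal.schedule["idx_nanos"].to_numpy().searchsorted(end_ts.value, side="right")
sched = cal.schedule.iloc[slc_start:slc_end]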

So it's probably not worth adding those access methods.
