aerobulk-python is SLOWWW #32

Open
jbusecke opened this issue Jun 16, 2022 · 4 comments

@jbusecke
Contributor

I am currently struggling with what I perceive as very slow performance with aerobulk-python.

For some of the work we are doing over at https://github.com/ocean-transport/scale-aware-air-sea, it seems that a single timestep of CM2.6 data takes about 45-50s to execute.

This does not seem to be peculiar to the data we are using, since I am getting similar times for synthetic data:

"""Tests for the xarray wrapper"""

from typing import Dict, Optional, Tuple
import time

import numpy as np
import xarray as xr
from aerobulk import noskin


def create_data(
    shape: Tuple[int, ...],
    chunks: Optional[Dict[str, int]] = None,  # None (or {}) leaves the data unchunked
    skin_correction: bool = False,
    order: str = "F",
):
    def _arr(value, chunks, order):
        arr = xr.DataArray(np.full(shape, value, order=order))

        # adds random noise scaled by a percentage of the value
        randomize_factor = 0.001
        randomize_range = value * randomize_factor
        # scale (rather than offset) the noise so it stays within ~0.1% of `value`
        arr = arr + np.random.rand(*shape) * randomize_range
        if chunks:
            arr = arr.chunk(chunks)
        return arr

    sst = _arr(290.0, chunks, order)
    t_zt = _arr(280.0, chunks, order)
    hum_zt = _arr(0.001, chunks, order)
    u_zu = _arr(1.0, chunks, order)
    v_zu = _arr(-1.0, chunks, order)
    slp = _arr(101000.0, chunks, order)
    rad_sw = _arr(0.000001, chunks, order)
    rad_lw = _arr(350, chunks, order)
    if skin_correction:
        return sst, t_zt, hum_zt, u_zu, v_zu, rad_sw, rad_lw, slp
    else:
        return sst, t_zt, hum_zt, u_zu, v_zu, slp
    
data = create_data((3600, 2700, 2), chunks=None)

When I time this execution:

tic = time.time()
out_data = noskin(*data)
toc = time.time() - tic

I get a runtime of ~100 s, so about 50 s per timestep.

Am I expecting too much of the Fortran code? Or is this slow for computing on ~1e7 data points per timestep?
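
For scale, the numbers above can be turned into a rough throughput. This is only back-of-the-envelope arithmetic on the array shape and the approximate 100 s runtime quoted above; the variable names are illustrative and nothing here is measured separately.

```python
# Back-of-the-envelope throughput from the numbers quoted above.
nx, ny, nt = 3600, 2700, 2        # shape passed to create_data
runtime_s = 100.0                 # approximate wall time for noskin(*data)

points_per_step = nx * ny         # 9_720_000, i.e. ~1e7
total_points = points_per_step * nt
seconds_per_step = runtime_s / nt
throughput = total_points / runtime_s    # points per second
us_per_point = 1e6 / throughput          # microseconds per point

print(f"{points_per_step:.2e} points per timestep")
print(f"{seconds_per_step:.0f} s per timestep")
print(f"{throughput:.2e} points/s, {us_per_point:.1f} us per point")
```

That is roughly 2e5 points per second, or about 5 microseconds per grid point.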

I wonder if there is some obvious issue with our compiler flags?

@rabernat
Contributor

I would make a plot of how long the execution time takes as a function of domain size. Just 4-5 points should be enough to see whether it is scaling linearly. If so, we can focus on optimizing with just a small piece of data.
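
A scaling check along these lines could be sketched as follows. It times a callable for a handful of domain sizes and fits the slope of log(time) against log(number of points); a slope close to 1 suggests linear scaling. The names `time_once`, `loglog_slope` and `standin_kernel` are hypothetical, and the stand-in kernel is only a placeholder so the snippet runs without aerobulk; for the real test one would swap in a call to `noskin` on data of the corresponding size.

```python
import math
import time


def time_once(func, n_points):
    """Return the wall-clock seconds of a single call func(n_points)."""
    tic = time.perf_counter()
    func(n_points)
    return time.perf_counter() - tic


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) vs log(size); ~1 means linear scaling."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def standin_kernel(n_points):
    """Placeholder workload (NOT aerobulk): one sqrt per 'grid point'."""
    return sum(math.sqrt(i) for i in range(n_points))


if __name__ == "__main__":
    # Five square domains, from 50x50 to 800x800 points.
    sizes = [side * side for side in (50, 100, 200, 400, 800)]
    times = [time_once(standin_kernel, n) for n in sizes]
    for n, t in zip(sizes, times):
        print(f"{n:>8d} points: {t:.4f} s")
    print(f"log-log slope: {loglog_slope(sizes, times):.2f}")
```

Single timings are noisy, so repeating each size a few times and taking the minimum would give a cleaner curve.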

@jbusecke
Contributor Author

Good idea! I'll do that in a bit.

@jbusecke
Contributor Author

For results see here: ocean-transport/scale-aware-air-sea#28 (comment)

From this I conclude that we can optimize on a small domain, e.g. 100x100.

@jbusecke
Contributor Author

Some relief was brought by #40 (see benchmarks there).
