Allow aggregating columns in the simplification process which are string or invalid #576

Open
sahand-asgarpour opened this issue Sep 11, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@sahand-asgarpour
Contributor

Kind of request

Changing existing functionality

Enhancement Description

```python
def _aggrfunc(self, cols, attributes_to_exclude: list[str], with_demand: bool):
    def aggregate_column(col_data, col_name: str):
        if col_name in attributes_to_exclude:
            return col_data.iloc[0]
        elif col_name == "rfid_c":
            return list(col_data)
        elif col_name in ["maxspeed", "avgspeed"]:
            return col_data.mean()
        elif with_demand and col_name == "demand_edge":
            return max(col_data)
        elif col_data.dtype == "O":
            col_data_unique_values = list(set(col_data))
            if len(col_data_unique_values) == 1:
                return col_data_unique_values[0]
            else:
                return str(col_data_unique_values)
        else:
            return col_data.iloc[0]

    return {
        col: (lambda col_data, col_name=col: aggregate_column(col_data, col_name))
        for col in cols
    }
```

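The "maxspeed" and "avgspeed" columns can contain numbers stored as strings (e.g. "30", "50"). They can also contain strings that cannot be converted to a float or int (invalid values). The current implementation calls `col_data.mean()` on these columns directly, without handling either case.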

Desired behaviour

```python
import numpy as np
import pandas as pd


def _aggrfunc(self, cols, attributes_to_exclude: list[str], with_demand: bool):
    def convert_to_numeric(val):
        """Convert value to float if possible, otherwise return np.nan."""
        try:
            return float(val)
        except (ValueError, TypeError):
            return np.nan

    def aggregate_column(col_data, col_name: str):
        if col_name in attributes_to_exclude:
            return col_data.iloc[0]
        elif col_name == "rfid_c":
            return list(col_data)
        elif col_name in ["maxspeed", "avgspeed"]:
            # Convert values to numeric, replacing invalid entries with np.nan
            numeric_col_data = pd.to_numeric(
                col_data.apply(convert_to_numeric), errors="coerce"
            )
            # Series.mean() skips NaN, so invalid entries do not affect the average
            return numeric_col_data.mean()
        elif with_demand and col_name == "demand_edge":
            return max(col_data)
        elif col_data.dtype == "O":
            col_data_unique_values = list(set(col_data))
            if len(col_data_unique_values) == 1:
                return col_data_unique_values[0]
            else:
                return str(col_data_unique_values)
        else:
            return col_data.iloc[0]

    return {
        col: (lambda col_data, col_name=col: aggregate_column(col_data, col_name))
        for col in cols
    }
```

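To illustrate the effect of the proposed "maxspeed"/"avgspeed" branch in isolation, here is a minimal sketch outside of ra2ce. The `edges` frame, the `simple_id` grouping column and the `mean_of_speeds` helper are hypothetical names chosen for this example. The sketch also assumes the aggregation is applied per merged edge through a pandas `groupby`, which this issue does not show.

```python
import pandas as pd


def mean_of_speeds(col_data: pd.Series) -> float:
    """Average a speed column, treating unparseable entries as missing."""
    # errors="coerce" turns anything that is not a number into NaN;
    # Series.mean() then skips those NaN values.
    return pd.to_numeric(col_data, errors="coerce").mean()


# Hypothetical edges: rows sharing a "simple_id" are merged into one edge.
edges = pd.DataFrame(
    {
        "simple_id": [1, 1, 1, 2, 2],
        "maxspeed": ["30", "50", "walk", 80, "signals"],
    }
)

merged = edges.groupby("simple_id")["maxspeed"].agg(mean_of_speeds)
print(merged)
```

In this sketch the first merged edge averages only "30" and "50", and the second keeps 80 because "signals" is dropped. A group made up entirely of invalid values would come out as NaN.
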
Use case

Graz project: the Austrian road network (up to and including residential roads) was first downloaded from OSM and then used as input for an OSdamage analysis. The image below shows the invalid maxspeed values of some of the road segments.

image
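
A quick way to list such invalid values before simplification could look like the sketch below. The sample values are made up for illustration and are not taken from the Graz data.

```python
import pandas as pd

# Hypothetical sample of OSM-style maxspeed tags (not the Graz data).
maxspeed = pd.Series(["30", "50", "AT:urban", None, "walk", 70])

# Anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(maxspeed, errors="coerce")

# Keep the entries that are present in the source but not convertible.
invalid = maxspeed[numeric.isna() & maxspeed.notna()]
print(invalid.value_counts())
```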

Additional Context

This issue is already solved in ra2ce_graz, a fork of the ra2ce repository.

@sahand-asgarpour sahand-asgarpour added the enhancement New feature or request label Sep 11, 2024
@sahand-asgarpour sahand-asgarpour added this to the Sprint 2024 Q2.4 milestone Sep 11, 2024
@ArdtK ArdtK removed this from the Sprint 2024 Q2.4 milestone Nov 27, 2024