Allow aggregating columns in the simplification process which are string or invalid #576

sahand-asgarpour · 2024-09-11T14:50:57Z

Kind of request

Changing existing functionality

Enhancement Description

```python
def _aggrfunc(self, cols, attributes_to_exclude: list[str], with_demand: bool):
    def aggregate_column(col_data, col_name: str):
        if col_name in attributes_to_exclude:
            return col_data.iloc[0]
        elif col_name == "rfid_c":
            return list(col_data)
        elif col_name in ["maxspeed", "avgspeed"]:
            return col_data.mean()
        elif with_demand and col_name == "demand_edge":
            return max(col_data)
        elif col_data.dtype == "O":
            col_data_unique_values = list(set(col_data))
            if len(col_data_unique_values) == 1:
                return col_data_unique_values[0]
            else:
                return str(col_data_unique_values)
        else:
            return col_data.iloc[0]

    return {
        col: (lambda col_data, col_name=col: aggregate_column(col_data, col_name))
        for col in cols
    }
```

The "maxspeed" and "avgspeed" columns can contain numbers stored as strings, such as "30" or "50". They can also contain strings that do not evaluate to a float or int (invalid values).

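To make the three kinds of value concrete, here is a minimal sketch of how such a column could be inspected. The sample values (including `"signals"`) are made up for illustration and are not taken from the Graz data.

```python
import pandas as pd

# Hypothetical maxspeed values for a set of edges that will be merged:
# a real number, a numeric string, an unparseable string and a missing value.
maxspeed = pd.Series([30, "50", "signals", None], dtype="object")

# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
numeric = pd.to_numeric(maxspeed, errors="coerce")

# Entries that could not be parsed as a number.
invalid = maxspeed[numeric.isna()].tolist()

print(numeric.tolist())
print(invalid)
print(numeric.mean())  # NaN entries are skipped by default
```

Only the first two entries survive the conversion, so the mean is computed over 30 and 50.
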
Desired behaviour

```python
import numpy as np
import pandas as pd


def _aggrfunc(self, cols, attributes_to_exclude: list[str], with_demand: bool):
    def convert_to_numeric(val):
        """Convert value to float if possible, otherwise return np.nan."""
        try:
            return float(val)
        except (ValueError, TypeError):
            return np.nan

    def aggregate_column(col_data, col_name: str):
        if col_name in attributes_to_exclude:
            return col_data.iloc[0]
        elif col_name == "rfid_c":
            return list(col_data)
        elif col_name in ["maxspeed", "avgspeed"]:
            # Convert values to numeric, replacing invalid entries with np.nan
            numeric_col_data = pd.to_numeric(
                col_data.apply(convert_to_numeric), errors="coerce"
            )
            return numeric_col_data.mean()
        elif with_demand and col_name == "demand_edge":
            return max(col_data)
        elif col_data.dtype == "O":
            col_data_unique_values = list(set(col_data))
            if len(col_data_unique_values) == 1:
                return col_data_unique_values[0]
            else:
                return str(col_data_unique_values)
        else:
            return col_data.iloc[0]

    return {
        col: (lambda col_data, col_name=col: aggregate_column(col_data, col_name))
        for col in cols
    }
```
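
For reference, a small standalone sketch of how a per-column aggregator like this behaves when passed to a pandas `groupby(...).agg(...)` call. This is an assumption about usage, not the ra2ce code path: the `group` key, the `mean_speed` helper and the sample table are all hypothetical.

```python
import pandas as pd


def mean_speed(col_data: pd.Series) -> float:
    """Mean of the values that parse as numbers; NaN if none do."""
    return pd.to_numeric(col_data, errors="coerce").mean()


# Hypothetical edge table: "group" stands in for whatever key identifies
# the edges that are merged into one during simplification.
edges = pd.DataFrame(
    {
        "group": [1, 1, 2, 2],
        "maxspeed": ["30", 50, "signals", "70"],
        "highway": ["residential", "residential", "primary", "primary"],
    }
)

# One aggregation function per column, in the same spirit as the dict
# returned by _aggrfunc.
merged = edges.groupby("group").agg({"maxspeed": mean_speed, "highway": "first"})
print(merged)
```

In this sketch the invalid `"signals"` entry is ignored, so group 2 keeps the speed of its one valid edge instead of raising an error.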

Use case

Graz project: the Austrian road network (up to and including residential roads) was first downloaded from OSM and then used as input for an OSdamage analysis. Some of the road segments have an invalid maxspeed.

Additional Context

This issue is solved in ra2ce_graz, a fork of the ra2ce repository.


sahand-asgarpour added the enhancement New feature or request label Sep 11, 2024

sahand-asgarpour added this to the Sprint 2024 Q2.4 milestone Sep 11, 2024

ArdtK removed this from the Sprint 2024 Q2.4 milestone Nov 27, 2024
