Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety Data new API #2734

Open
atalyaalon opened this issue Dec 5, 2024 · 11 comments · May be fixed by #2733
Open

Safety Data new API #2734

atalyaalon opened this issue Dec 5, 2024 · 11 comments · May be fixed by #2733
Assignees

Comments

@atalyaalon
Copy link
Collaborator

atalyaalon commented Dec 5, 2024

  • API template of existing documentation using OPEN API 3.0.0 in this pr.
    Existing paths: /involved, /involved/groupby, /city
    Regarding hotspots and users, we will implement them in the API after we finish building the current API for the existing state.
  • You can play with documentation (mock data) here
  1. Please review the API documentation created (of existing situation) together with existing tables in Mongo DB ("accidents" represent involved and "cities" represet the CBS cities)
  2. Please suggest a modified version of API together with Design for safety data tables, based on our existing DB and tables.
    You can use the API template I created with swaggerhub, or use another template for your choice.

I want you to take into consideration:

  • Existing mongo db table containing involved named "accidents" and can be found here) - you should have access
  • I assume that with the new API will need to perform some FE adjustments. That's OK, we'll adjust but we will consult Dror to make sure changes are minor and won't require a huge additional work.
  • In current implementation of API querying, we have "AND" between filter fields and "OR" between chosen field values
  • Performance should be good, just like current Safety Data API.

My suggestions and thoughts:

cbs_cities table:

  • In our current CBS cbs_cities table, I think we're missing only population field (other fields in mongoDB cities table are not used at the moment). believe you can add the population field from data_gov?
  • I suggest that FE will query cities info based on yishuv_symbol

cbs_streets table:

  • I suggest adding an API for it, or better - using our existing API
  • I suggest that FE will query streets info based on yishuv_symbol and street symbol.
    There for either: this won't enable to query a specific street in multiple cities, but I think that's ok for now (if we want to enable this, we will find workaround that is still based on
  • streets aliasing - (dataset can be found here) - I think that if we do, it will be the last feature since it's not a requirement from the municipality at the moment.

cbs_involved_for_safety_data table:

  • Existing API does not enable filtering by all involved parameters (for example weather field ), even though it exists in mongoDB. I think we should enable as much flexibility as possible to filter by all relevant fields.
    However, we can also start "lean" and only keep existing functionality with relevant fields, and later on to add additional fields that are not present.
  • Will we create a new table? (or perhaps we'll understand we want to use involved_markers_hebrew with improved structure?)
    If we create a new table, I assume this table will be similar to involved_markers_hebrew, just w/o the hebrew fields. Perhaps it's better to save in this table only with integer codes and link it with relevant hebrew views (star schema) for a light table.
  • I wonder if pagination is needed when data is large, let's discuss our options (I believe that the curr '/accidents' API pulling from mongo's "accidents" table is not paginated).
  • Note the difference between current field vehicle_vehicle_type_hebrew that contains the involve's vehicle_type AND vehicles that contains a list of all involved vehicles using the following mapping that contains grouping of all vehicles from the specific accident:
vehicle_type=1 ->"רכב נוסעים פרטי"
vehicle_type=2 -> "טרנזיט"
vehicle_type=3 -> "טנדר"
vehicle_type  in (4, 5, 6, 7, 24 ,25) -> "משאית"
vehicle_type  in (8, 9, 10, 19) -> "אופנוע"
vehicle_type in (11 ,18) -> 'אוטובוס'
vehicle_type = 12 -> 'מונית' 
vehicle_type = 13 -> 'רכב עבודה' 
vehicle_type = 14 -> 'טרקטור' 
vehicle_type = 15 -> 'אופניים' 
vehicle_type = 16 -> 'רכבת'
vehicle_type = 17 -> 'אחר ולא ידוע'
vehicle_type = 21 -> 'קורקינט חשמלי'
vehicle_type = 22 -> 'קלנועית חשמלית' 
vehicle_type = 23 -> 'אופניים חשמליים'
  • There are mappings done in the FE, for example here in injured_type, 4,5 are mapped to motorcycle, 6,7 to cyclist. Should we keep these mapping in FE? or transfer them to BE?
@ziv17
Copy link
Collaborator

ziv17 commented Dec 9, 2024

Hi @atalyaalon ,
Regarding
"Existing mongo db table containing involved named "accidents" and can be found here) - you should have access"
I did not find anything there. It looks empty (I tried both my yahoo and gmail accounts).

@ziv17
Copy link
Collaborator

ziv17 commented Dec 10, 2024

Hi @atalyaalon ,
regarding viewing the MongoDB tables. To which of my emails did you grant access? I tried logging in with both of my emails and did not see anything. Maybe I do not know where to look

@ziv17
Copy link
Collaborator

ziv17 commented Dec 21, 2024

Hi @atalyaalon ,
Regarding the _id field that is returned in the /involved request: what does it identify?

  1. Should it be the same for the same person that is involved in two different accidents that are returned in the same query?
  2. Should it be the same for the same person that is involved in the same accident that are returned in two different queries?
  3. Will just specifying the order of the returned items in each query suffice?

@ziv17
Copy link
Collaborator

ziv17 commented Dec 26, 2024

Hi @atalyaalon , @citizen-dror
Regarding the involved query parameters:

  1. The parameters have short names, that are not always self explanatory. If we want these names to stay, I will use a translation table from these short names to the DB column names we use in Anyway.
  2. In the API document, several parameters can be assigned a list of values. I understand that such a list will be specified in the URL using multiple appearances of the parameter name, with a single value each time.

@ziv17
Copy link
Collaborator

ziv17 commented Dec 29, 2024

Hi @atalyaalon , @citizen-dror
Regarding error handling,
In case of an error in a parameter, e.g. unknown parameter, or illegal value, Should the API return 404, or write error to log and return the answer as if that parameter was not specified?

@atalyaalon atalyaalon linked a pull request Dec 29, 2024 that will close this issue
@atalyaalon
Copy link
Collaborator Author

atalyaalon commented Jan 1, 2025

Calculated fields:
@citizen-dror FYI - I listed the calculated fields that are needed in the API.

We need the following fields / fields modifications in API.

@ziv17 the implementation of the fields below can be done in either python or sql (with SQLAlchemy), they can be done either with offline calculations (and mid tables if needed) OR during API query itself (some of the logic are very light and can be done during query) depends on your choice of implementation.

The most important things are:

  1. data consistency. That tables are updated as a whole (same as cache tables, same as markers_heberw and involved_markers_hebrew, in all we have data consistency)
  2. API Performance is kept ~the same. Both of involved (today's accident) query and of group by queries.

The calculated fields:

  1. vehicle_type_short_hebrew
    This field is not used by itself in API but we will use it for calculation of the other fields listed below, hence I added its query.
    This field is created by this mapping - see query
    Note that current mapping does not depend on year / provider_code fields. I verified it.
SELECT id,
       YEAR,
       provider_code,
       vehicle_type_hebrew,
       CASE
           WHEN id = 2 THEN 'טרנזיט'
           WHEN id = 3 THEN 'טנדר'
           WHEN id IN (4,
                       5,
                       6,
                       7,
                       24,
                       25) THEN 'משאית'
           WHEN id IN (8,
                       9,
                       10,
                       19) THEN 'אופנוע'
           WHEN id IN (11,
                       18) THEN 'אוטובוס'
           ELSE vehicle_type_hebrew
       END AS vehicle_type_short_hebrew
FROM vehicle_type
ORDER BY YEAR DESC, provider_code,
                    id
  1. injured_type_hebrew we send in API should be the injured_type_enriched_hebrew field in this query.
    Note that current mapping does not depend on year / provider_code fields. I verified it.
    Also note that even though we change the Hebrew field values, the FE is still querying by injured_type integer values and we keep it this way for now.
SELECT injured_type,
       injured_type_hebrew,
       vehicle_type,
       vehicle_type_hebrew,
       vehicle_type_short_hebrew,
       CASE
           WHEN injured_type IN (NULL,
                                 0) THEN CONCAT(SPLIT_PART('מעורב שלא נפגע', ' - ', 1), ' - ', vehicle_type_short_hebrew)
           WHEN vehicle_type IS NOT NULL THEN CONCAT(SPLIT_PART(injured_type_hebrew, ' - ', 1), ' - ', vehicle_type_short_hebrew)
           ELSE injured_type_hebrew
       END AS injured_type_enriched_hebrew
FROM
  (SELECT DISTINCT injured_type,
                   vehicle_type,
                   vehicle_type_hebrew,
                   injured_type_hebrew,
                   CASE
                       WHEN vehicle_type = 2 THEN 'טרנזיט'
                       WHEN vehicle_type = 3 THEN 'טנדר'
                       WHEN vehicle_type IN (4,
                                             5,
                                             6,
                                             7,
                                             24,
                                             25) THEN 'משאית'
                       WHEN vehicle_type IN (8,
                                             9,
                                             10,
                                             19) THEN 'אופנוע'
                       WHEN vehicle_type IN (11,
                                             18) THEN 'אוטובוס'
                       ELSE vehicle_type_hebrew
                   END AS vehicle_type_short_hebrew
   FROM involved
   LEFT JOIN vehicle_type ON involved.vehicle_type = vehicle_type.id
   AND involved.accident_year = vehicle_type.year
   AND involved.provider_code = vehicle_type.provider_code
   LEFT JOIN injured_type ON involved.injured_type = injured_type.id
   AND involved.accident_year = injured_type.year
   AND involved.provider_code = injured_type.provider_code) IVT

Note that I added in the query the calculation of vehicle_type_short_hebrew

  1. vehicle_vehicle_type_hebrew we send in API should be the vehicle_type_enriched_hebrew field in this query
    Note that current mapping does not depend on year / provider_code fields. I verified it.
    Also note that even though we change the Hebrew field values, the FE is still querying by vehicle_type integer values and we keep it this way for now.
SELECT injured_type,
       injured_type_hebrew,
       vehicle_type,
       vehicle_type_hebrew,
       CASE
           WHEN injured_type = 1 THEN injured_type_hebrew
           ELSE vehicle_type_hebrew
       END AS vehicle_type_enriched_hebrew
FROM
  (SELECT DISTINCT injured_type,
                   vehicle_type,
                   vehicle_type_hebrew,
                   injured_type_hebrew
   FROM involved
   LEFT JOIN vehicle_type ON involved.vehicle_type = vehicle_type.id
   AND involved.accident_year = vehicle_type.year
   AND involved.provider_code = vehicle_type.provider_code
   LEFT JOIN injured_type ON involved.injured_type = injured_type.id
   AND involved.accident_year = injured_type.year
   AND involved.provider_code = injured_type.provider_code) IVT
  1. vehicles we send in API should be the vehicles_distinct_concat_hebrew field in this query. The grouping is per accident (which is unique by accident_id, provider_code, accident_year - the three of them are primary keys in markers table which contains the accidents)
SELECT accident_id,
       provider_code,
       accident_year,
       STRING_AGG(vehicle_type_short_hebrew, ', '
                  ORDER BY vehicle_type_short_hebrew) AS vehicles_distinct_concat_hebrew
FROM
  (SELECT DISTINCT 
          vehicles.accident_id,
          vehicles.provider_code,
          vehicles.accident_year,
          CASE
              WHEN vehicle_type = 2 THEN 'טרנזיט'
              WHEN vehicle_type = 3 THEN 'טנדר'
              WHEN vehicle_type IN (4,
                                    5,
                                    6,
                                    7,
                                    24,
                                    25) THEN 'משאית'
              WHEN vehicle_type IN (8,
                                    9,
                                    10,
                                    19) THEN 'אופנוע'
              WHEN vehicle_type IN (11,
                                    18) THEN 'אוטובוס'
              ELSE vehicle_type_hebrew
          END AS vehicle_type_short_hebrew
   FROM vehicles
   LEFT JOIN vehicle_type ON vehicles.vehicle_type = vehicle_type.id
   AND vehicles.accident_year = vehicle_type.year
   AND vehicles.provider_code = vehicle_type.provider_code) IVT
GROUP BY accident_id,
         provider_code,
         accident_year

@atalyaalon
Copy link
Collaborator Author

atalyaalon commented Jan 1, 2025

Hi @atalyaalon , Regarding the _id field that is returned in the /involved request: what does it identify?

  1. Should it be the same for the same person that is involved in two different accidents that are returned in the same query?
  2. Should it be the same for the same person that is involved in the same accident that are returned in two different queries?
  3. Will just specifying the order of the returned items in each query suffice?

As discussed, it's the id field in involved, which is unique for each involve in the table (different from involved_id which is unique only within a specific accident)

@atalyaalon
Copy link
Collaborator Author

Hi @atalyaalon , @citizen-dror Regarding the involved query parameters:

  1. The parameters have short names, that are not always self explanatory. If we want these names to stay, I will use a translation table from these short names to the DB column names we use in Anyway.
  2. In the API document, several parameters can be assigned a list of values. I understand that such a list will be specified in the URL using multiple appearances of the parameter name, with a single value each time.
  1. Yes. I think that mapping in our Backed from short names to long names is a good idea
  2. Today in API, both options work for BE, but FE sends it with commas and not with multiple appearances.
    We can either handle both cases, or ask @citizen-dror to change it to a preferred option

@atalyaalon
Copy link
Collaborator Author

Hi @atalyaalon , @citizen-dror Regarding error handling, In case of an error in a parameter, e.g. unknown parameter, or illegal value, Should the API return 404, or write error to log and return the answer as if that parameter was not specified?

I think the API should return 404, @citizen-dror what do you think?

@atalyaalon atalyaalon linked a pull request Jan 1, 2025 that will close this issue
@atalyaalon atalyaalon linked a pull request Jan 1, 2025 that will close this issue
@ziv17
Copy link
Collaborator

ziv17 commented Jan 2, 2025

Hi @atalyaalon , Regarding the calculated fields:

  1. What are the names of the calculated fields that are returned by the /involved query?
  2. Is there a calculated field that is used as a query parameter?
  3. Is it OK to separate the grouping of the values and the conversion to Hebrew? I prefer to place in the tables numeric values.
  4. Regarding 3, and 4. Do the values injured_type_enriched_hebrew and vehicle_type_enriched_hebrew depend only on injured_type and vehicle_type?
  5. I did not see the vehicles table being used in these queries. Do we need it?

@atalyaalon
Copy link
Collaborator Author

atalyaalon commented Jan 9, 2025

Hi @atalyaalon , Regarding the calculated fields:

  1. What are the names of the calculated fields that are returned by the /involved query?
    following @citizen-dror's comment, the previous injured_type_enriched_hebrew is splitted into two fields:
  • vehicle_type_short_hebrew (by this mapping)
  • injured_type_short_hebrew - with the logic of injured_type_short_hebrew in this query
  • vehicle_vehicle_type_hebrew - with the logic of vehicle_type_enriched_hebrew field in this query
  • vehicles with the logic of vehicles in this query
  • I suggest the we'll also keep the existing field injured_type_hebrew in the API for now, until @citizen-dror will approve its removal.
  1. Is there a calculated field that is used as a query parameter?

Yes.
Existing injured_type_hebrew (in API it's injt filter) (that we'll keep in API)
vehicle_vehicle_type_hebrew (in API it's vcl filter)
vehicles (in API it's vcli filter)

  1. Is it OK to separate the grouping of the values and the conversion to Hebrew? I prefer to place in the tables numeric values.

Yes, I prefer it too, and ideally it will be built with localization in mind, we want to add localization to this API to English in the coming future.

  1. Regarding 3, and 4. Do the values injured_type_enriched_hebrew and vehicle_type_enriched_hebrew depend only on injured_type and vehicle_type?

As mentioned in 1 - we separated injured_type_enriched_hebrew into two fields.

  1. I did not see the vehicles table being used in these queries. Do we need it?
  1. It is used, in field vehicles in this query

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants