Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process metadata differences between backends: new report and fix low hanging fruit #105

Closed
19 tasks
JohanKJSchreurs opened this issue Apr 11, 2023 · 5 comments
Closed
19 tasks
Assignees

Comments

@JohanKJSchreurs
Copy link
Contributor

JohanKJSchreurs commented Apr 11, 2023

Process metadata differences between backends at aggregator level

In the aggregator project, under #76 , we started creating a tool to report collection/process metadata differences across different backends, for #4

Below is a next iteration of reported process metadata issues. (iteration 3)
In other words, this to do: #4 (comment)

(Previous iteration of the report here: https://github.com/openEOPlatform/architecture-docs/issues/282 )

At the moment, in the current implementation, "merged" basically comes from the VITO backend and the other one is from he other backend, EODC as indicated in the diff

Though Sentinel Hub was included in the comparison by the tool, the report did not indicate any differences for Sentinel Hub.

The comparison is for the production environment (CLI option: -e prod )


Command:

python3 src/openeo_aggregator/metadata/validator.py -p -e prod

Report

Running metadata merging checks against backend urls:

Comparing /processes

  • https://openeo.eodc.eu/v1.0/ : array_labels (merging.py:446): Parameter 'data' field 'schema' value differs from merged.

    • merged {'type': 'array'}

    • value {'items': {'description': 'Any data type.'}, 'subtype': 'labeled-array', 'type': 'array'}

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1,3 +1,7 @@
       {
      +  "items": {
      +    "description": "Any data type."
      +  },
      +  "subtype": "labeled-array",
         "type": "array"
       }
      
  • https://openeo.eodc.eu/v1.0/ : array_find (merging.py:416): Missing parameters.

    • missing_parameters {'reverse'}
  • https://openeo.eodc.eu/v1.0/ : save_ml_model (merging.py:416): Missing parameters.

    • missing_parameters {'options', 'data'}
  • https://openeo.eodc.eu/v1.0/ : save_ml_model (merging.py:424): Extra parameters (not in merged listing).

    • extra_parameters {'model'}
  • https://openeo.eodc.eu/v1.0/ : load_ml_model (merging.py:416): Missing parameters.

    • missing_parameters {'id'}
  • https://openeo.eodc.eu/v1.0/ : load_ml_model (merging.py:424): Extra parameters (not in merged listing).

    • extra_parameters {'model'}
  • https://openeo.eodc.eu/v1.0/ : predict_random_forest (merging.py:424): Extra parameters (not in merged listing).

    • extra_parameters {'dimension'}
  • https://openeo.eodc.eu/v1.0/ : run_udf (merging.py:446): Parameter 'data' field 'schema' value differs from merged.

    • merged [{'items': {'description': 'Any data type.'}, 'minItems': 1, 'title': 'Array', 'type': 'array'}, {'description': 'A single value of any data type.', 'title': 'Single Value'}]

    • value [{'subtype': 'raster-cube', 'title': 'Raster data cube', 'type': 'object'}, {'items': {'description': 'Any data type.'}, 'minItems': 1.0, 'title': 'Array', 'type': 'array'}, {'description': 'A single value of any data type.', 'title': 'Single Value'}]

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1,9 +1,14 @@
       [
      +  {
      +    "subtype": "raster-cube",
      +    "title": "Raster data cube",
      +    "type": "object"
      +  },
         {
           "items": {
             "description": "Any data type."
           },
      -    "minItems": 1,
      +    "minItems": 1.0,
           "title": "Array",
           "type": "array"
         },
      
  • https://openeo.eodc.eu/v1.0/ : run_udf (merging.py:446): Parameter 'udf' field 'schema' value differs from merged.

    • merged [{'description': 'Absolute URL to a UDF', 'format': 'uri', 'pattern': '^https?://', 'subtype': 'uri', 'type': 'string'}, {'description': 'Path to a UDF uploaded to the server.', 'pattern': '^[^\r\n\\:\'"]+$', 'subtype': 'file-path', 'type': 'string'}, {'description': 'The multi-line source code of a UDF, must contain a newline/line-break.', 'pattern': '(\r\n|\r|\n)', 'subtype': 'udf-code', 'type': 'string'}]

    • value [{'description': 'Absolute URL to a UDF', 'format': 'uri', 'pattern': '^(http|https)://', 'subtype': 'uri', 'type': 'string'}, {'description': 'Path to a UDF uploaded to the server.', 'pattern': '^[^\r\n\\:\'"]+$', 'subtype': 'file-path', 'type': 'string'}, {'description': 'The multi-line source code of a UDF, must contain a newline/line-break.', 'pattern': '(\r\n|\r|\n)', 'subtype': 'udf-code', 'type': 'string'}]

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -2,7 +2,7 @@
         {
           "description": "Absolute URL to a UDF",
           "format": "uri",
      -    "pattern": "^https?://",
      +    "pattern": "^(http|https)://",
           "subtype": "uri",
           "type": "string"
         },
      
  • https://openeo.eodc.eu/v1.0/ : run_udf (merging.py:478): Returns schema is different from merged.

    • merged {'description': 'The data processed by the UDF. The returned value can be of any data type and is exactly what the UDF code returns.', 'schema': {'description': 'Any data type.', 'title': 'Any'}}

    • value {'description': 'The data processed by the UDF.\n\n* Returns a raster data cube, if a raster data cube is passed for data. Details on the dimensions and dimension properties (name, type, labels, reference system and resolution) depend on the UDF.\n* If an array is passed for data, the returned value can be of any data type, but is exactly what the UDF returns.', 'schema': [{'subtype': 'raster-cube', 'title': 'Raster data cube', 'type': 'object'}, {'description': 'Any data type.', 'title': 'Any'}]}

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1,7 +1,14 @@
       {
      -  "description": "The data processed by the UDF. The returned value can be of any data type and is exactly what the UDF code returns.",
      -  "schema": {
      -    "description": "Any data type.",
      -    "title": "Any"
      -  }
      +  "description": "The data processed by the UDF.\n\n* Returns a raster data cube, if a raster data cube is passed for `data`. Details on the dimensions and dimension properties (name, type, labels, reference system and resolution) depend on the UDF.\n* If an array is passed for `data`, the returned value can be of any data type, but is exactly what the UDF returns.",
      +  "schema": [
      +    {
      +      "subtype": "raster-cube",
      +      "title": "Raster data cube",
      +      "type": "object"
      +    },
      +    {
      +      "description": "Any data type.",
      +      "title": "Any"
      +    }
      +  ]
       }
      
  • https://openeo.eodc.eu/v1.0/ : atmospheric_correction (merging.py:416): Missing parameters.

    • missing_parameters {'missionId', 'sza', 'vza', 'gnd', 'appendDebugBands', 'aot', 'cwv', 'raa'}
  • https://openeo.eodc.eu/v1.0/ : atmospheric_correction (merging.py:424): Extra parameters (not in merged listing).

    • extra_parameters {'options'}
  • https://openeo.eodc.eu/v1.0/ : atmospheric_correction (merging.py:446): Parameter 'method' field 'schema' value differs from merged.

    • merged {'type': 'string'}

    • value [{'enum': ['FORCE'], 'type': 'string'}, {'type': 'null'}]

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1,3 +1,11 @@
      -{
      -  "type": "string"
      -}
      +[
      +  {
      +    "enum": [
      +      "FORCE"
      +    ],
      +    "type": "string"
      +  },
      +  {
      +    "type": "null"
      +  }
      +]
      
  • https://openeo.eodc.eu/v1.0/ : atmospheric_correction (merging.py:446): Parameter 'method' field 'optional' value differs from merged.

    • merged True

    • value False

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1 +1 @@
      -true
      +false
      
  • atmospheric_correction (merging.py:305): Failed to merge process metadata: KeyError('default')

  • https://openeo.eodc.eu/v1.0/ : sar_backscatter (merging.py:416): Missing parameters.

    • missing_parameters {'options'}
  • https://openeo.eodc.eu/v1.0/ : sar_backscatter (merging.py:446): Parameter 'elevation_model' field 'default' value differs from merged.

    • merged None

    • value 'cop-dem-30m'

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1 +1 @@
      -null
      +"cop-dem-30m"
      
  • https://openeo.eodc.eu/v1.0/ : ard_normalized_radar_backscatter (merging.py:416): Missing parameters.

    • missing_parameters {'options', 'contributing_area'}
  • https://openeo.eodc.eu/v1.0/ : ard_normalized_radar_backscatter (merging.py:446): Parameter 'elevation_model' field 'default' value differs from merged.

    • merged None

    • value 'cop-dem-30m'

    • JSON diff:

      --- merged
      +++ https://openeo.eodc.eu/v1.0/
      @@ -1 +1 @@
      -null
      +"cop-dem-30m"
      
@JohanKJSchreurs JohanKJSchreurs changed the title Iteration 3 or report: Process metadata differences between backends at aggregator level Process metadata differences between backends: new report and fix low hanging fruit Apr 11, 2023
@soxofaan
Copy link
Member

Can you also try this against the dev instances?
e.g. the dev instance of the EODC backend at https://openeo-dev.eodc.eu/openeo/1.1.0/ seems to have the reverse parameter in array_find

@JohanKJSchreurs JohanKJSchreurs self-assigned this Apr 12, 2023
@JohanKJSchreurs
Copy link
Contributor Author

Unfortunately the comparison on the dev instances returns a very large report:

  • 13931 lines
  • 770 to dos.

This will be difficult to process. (Couldn't even replace the contents of the issue's description because the text is too large)

Even a more selective comparison did not improve it, comparing only vito and eodc dev environments give a report of similar size.

See attachment:
metadata-diff-dev-processes.md

@JohanKJSchreurs
Copy link
Contributor Author

JohanKJSchreurs commented Apr 19, 2023

Inspecting differences more closely, by backend pairs instead of the whole set of backends

Dev env, vito & creo

No differences found

python3 src/openeo_aggregator/metadata/validator.py -e dev -p -b vito creo

Production env, vito & creo:

There is no production env for creodias

Dev env, vito & SentinelHub

No differences found

python3 src/openeo_aggregator/metadata/validator.py -e dev -p -b vito sentinelhub

Production env, vito & SentinelHub

No differences found

python3 src/openeo_aggregator/metadata/validator.py -e prod -p -b vito sentinelhub

Production env, vito & SentinelHub

19 differences found

python3 src/openeo_aggregator/metadata/validator.py -e prod -p -b vito eodc

Dev env, vito & SentinelHub

768 differences found

Possibly, most of them are due to a different way of reporting defaults for properties of parameters.
To be investigated further.

python3 src/openeo_aggregator/metadata/validator.py -e dev -p -b vito eodc

JohanKJSchreurs added a commit that referenced this issue Apr 21, 2023
Allowing to compare parameters that have a lot of fields set to `null` or `0.0`
which were probably supposed to be left out entirely.
Work in progress.

#105
JohanKJSchreurs added a commit that referenced this issue Apr 27, 2023
…rant for incorrect defaults.

This was not reliable yet, and we would need a cleaner solution.
First make the tests work again.

Issue #105
@JohanKJSchreurs
Copy link
Contributor Author

Ran a new report after the EODC process descriptions were update on the dev environment, (https://github.com/openEOPlatform/architecture-docs/issues/324)

Report in attachment.

Most of the previous issues no longer occur, the number of issue has been reduced significantly, to 12 issues over all dev backends.

diff-process-metadata_dev_vito-eodc-creo-sh_2023-04-28.md

JohanKJSchreurs added a commit that referenced this issue Apr 28, 2023
JohanKJSchreurs added a commit that referenced this issue Apr 28, 2023
Updated test's expected result: missing params and extra params are now a sorted list

Issue #105
@jdries
Copy link
Contributor

jdries commented May 9, 2023

closing this one, will be covered by test suite

@jdries jdries closed this as completed May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants