Connector/Source: Publish the request object in the record transformation like the config, stream_slice, stream_interval, etc. are #50395

rpopov · 2024-12-22T12:13:08Z

Discussed in #49971

Originally posted by rpopov December 20, 2024
Status in Airbyte 1.2.0
The record transformation allows removing existing fields from the record and adding new fields by calculating them using:

JINJA templates
pre-defined objects found in the transformation's context:
- record
- stream_state
- stream_slice
- stream_interval
- stream_partition
Access to the raw response data goes through the record object, which by definition is only a part of the response. Converting the response into useful records sometimes needs data beyond the scope of the record object, and that data is not accessible.

Example
The JIRA /issue response:

{
  "expand": "renderedFields,names,schema,operations,editmeta,changelog,versionedRepresentations",
  "id": "2759961",
  "self": "https://jira-test.paysafe.cloud/rest/api/2/issue/2759961",
  "key": "EF-3131",
  "fields": {
    "customfield_19440": {
      "self": "https://jira-test.paysafe.cloud/rest/api/2/customFieldOption/23882",
      "value": "Perú",
      "id": "23882",
      "disabled": false
    },
...
    "customfield_12073": "EF-3130"
  },
  "names": {
    "customfield_19442": "End Time",
...
    "customfield_17709": "BU Legal Lead"
  },
  "schema": {
    "customfield_19442": {
      "type": "date",
      "custom": "com.atlassian.jira.plugin.system.customfieldtypes:datepicker",
      "customId": 19442
    },
...
    "customfield_17709": {
      "type": "string",
      "custom": "com.atlassian.jira.plugin.system.customfieldtypes:textfield",
      "customId": 17709
    }
  }
}

In JIRA the list of custom fields is highly dynamic, and it makes no practical sense to have a single record per issue, combining in it all standard and custom fields.
Instead, the JIRA issue can be represented using 2 tables: ISSUE 1 --- * CUSTOM_FIELD in one-to-many / master-detail relation.
Turning the response into a list of records, one per custom field, could be done by iterating over the schema.* entries in the response, taking each value from the fields map and each human-readable name from the names map.
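As a rough illustration of that iteration, here is a minimal Python sketch run outside Airbyte on a trimmed copy of the response above. The function name `custom_field_records` and the output column names are made up for this example and are not part of the CDK. The values of the two schema fields are elided in the sample response, so they come out as `None` here.

```python
from typing import Any, Iterator


def custom_field_records(issue: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one CUSTOM_FIELD record per entry of the issue's schema map."""
    fields = issue.get("fields", {})
    names = issue.get("names", {})
    for field_id, field_schema in issue.get("schema", {}).items():
        yield {
            "issue_key": issue["key"],        # link back to the ISSUE table
            "field_id": field_id,
            "name": names.get(field_id),      # human-readable name from "names"
            "type": field_schema.get("type"),
            "value": fields.get(field_id),    # value from "fields", None if absent
        }


# Trimmed copy of the sample response; elided parts are simply left out.
issue = {
    "key": "EF-3131",
    "fields": {"customfield_12073": "EF-3130"},
    "names": {"customfield_19442": "End Time", "customfield_17709": "BU Legal Lead"},
    "schema": {
        "customfield_19442": {"type": "date"},
        "customfield_17709": {"type": "string"},
    },
}

records = list(custom_field_records(issue))
```

Note that the function needs the whole issue object, which is exactly what a transformation does not receive today.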

Problem
In this configuration, the fields and names maps are not accessible. They could be, if the response object were available in the Transformations' context. The response object exists in the context of the Pagination section.

Suggestion

Publish the response object in the context of the Transformations section/phase, as available in the Pagination section.


At the record extraction step, add to each record the service field $root, holding a reference to:
* the root response object, when parsing JSON format
* the original record, when parsing JSONL format
that the record to process is extracted from. More service fields could be added in the future. The service fields are available in the record's filter and transform steps.

Avoid:
* reusing the maps/dictionaries produced, and thus building cyclic structures
* transforming the service fields in the Flatten transformation.

Explicitly clean up the service field(s) after the transform step, thus making them:
* local to the filter and transform steps
* not visible to the next mapping and store steps (as they should be)
* not visible in the tests beyond test_record_selector (as they should be)

This allows the record transformation logic to define its "local variables" to reuse some interim calculations.

The contract of body parsing seems irregular in representing the cases of bad JSON, no JSON and empty JSON. It cannot be unified, as that irregularity is already relied upon.

Update the development environment setup documentation:
* to organize and present the setup steps explicitly
* to avoid misunderstandings and wasted effort.

Update CONTRIBUTING.md to:
* collect and organize the knowledge on running the tests locally
* state the actual testing steps
* clarify and make explicit the procedures and steps.

The unit, integration, and acceptance tests in this exact version succeed under Fedora 41, while one of them fails under Oracle Linux 8.7; the failure is not related to the contents of this PR. The integration tests of the CDK fail due to a missing `secrets/config.json` file for the Shopify source. See airbytehq#197
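The lifecycle of the $root service field described above could look roughly like the following. This is a standalone sketch, not the code of the PR: `extract_records`, `select_records`, `keep` and `transform` are hypothetical names, and the real CDK record selector has a different interface.

```python
from typing import Any, Callable, Iterator, Sequence

ROOT_FIELD = "$root"  # service field name taken from the description above


def extract_records(body: dict[str, Any], path: Sequence[str]) -> Iterator[dict[str, Any]]:
    """Walk `path` into the parsed body and yield each record with a $root reference."""
    node: Any = body
    for key in path:
        node = node[key]
    for item in node if isinstance(node, list) else [node]:
        # Copy the dict instead of reusing it: the body never points back
        # at the enriched record, so no cyclic structure is built.
        record = dict(item)
        record[ROOT_FIELD] = body
        yield record


def select_records(
    body: dict[str, Any],
    path: Sequence[str],
    keep: Callable[[dict[str, Any]], bool],
    transform: Callable[[dict[str, Any]], None],
) -> Iterator[dict[str, Any]]:
    """Filter and transform with $root visible, then remove it before emitting."""
    for record in extract_records(body, path):
        if keep(record):
            transform(record)
            record.pop(ROOT_FIELD, None)  # service field stays local to these steps
            yield record


def add_issue_key(record: dict[str, Any]) -> None:
    """Example transformation: inherit the issue key from the root response."""
    record["issue_key"] = record[ROOT_FIELD]["key"]


body = {
    "key": "PPP-49571",
    "changelog": {"histories": [{"id": "12613893", "created": "2021-03-11T15:27:33.000+0000"}]},
}
result = list(select_records(body, ["changelog", "histories"], lambda r: True, add_issue_key))
```

After the transform step the emitted records carry the inherited `issue_key` but no `$root`, so later mapping and store steps never see the service field.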

rpopov · 2025-01-13T10:59:31Z

Replaced with airbytehq/airbyte-python-cdk#214

rpopov · 2025-01-17T16:18:37Z

The JIRA API can surprise you with another case that is impossible to store in (second-level) tables:
API call: GET https:///rest/api/2/issue/PPP-49571?fields=key&expand=changelog
Real-life response:

{
  "expand": "renderedFields,names,schema,operations,editmeta,changelog,versionedRepresentations",
  "id": "1589758",
  "self": "https://<JIRA host>/rest/api/2/issue/1589758",
  "key": "PPP-49571",
  "changelog": {
    "startAt": 0,
    "maxResults": 1,
    "total": 1,
    "histories": [
      {
        "id": "12613893",
        "author": {
          "self": "https://jira.paysafe.cloud/rest/api/2/user?username=patrickpoell",
          "name": "patrickpoell",
          "key": "patpoe",
          "emailAddress": "[email protected]",
          "avatarUrls": {
            "48x48": "https://<JIRA host>/secure/useravatar?ownerId=patpoe&avatarId=18562",
            "24x24": "https://<JIRA host>/secure/useravatar?size=small&ownerId=patpoe&avatarId=18562",
            "16x16": "https://<JIRA host>/secure/useravatar?size=xsmall&ownerId=patpoe&avatarId=18562",
            "32x32": "https://<JIRA host>/secure/useravatar?size=medium&ownerId=patpoe&avatarId=18562"
          },
          "displayName": "Patrick Poell",
          "active": true,
          "timeZone": "Europe/London"
        },
        "created": "2021-03-11T15:27:33.000+0000",
        "items": [
          {
            "field": "summary",
            "fieldtype": "jira",
            "from": null,
            "fromString": "CLONE - investigate",
            "to": null,
            "toString": "remove from mypins-integration-test-components"
          },
          {
            "field": "labels",
            "fieldtype": "jira",
            "from": null,
            "fromString": "Appium",
            "to": null,
            "toString": null
          }]}]}}

Storing the elements of the change log (path: histories[*].items[*]) is not possible even with the change suggested above, as each such element requires the timestamp from the parent histories[*] object and the issue key from the root JSON object.

Suggestion

In each nested JSON object add a $parent reference to the owner JSON object.
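A minimal sketch of what such $parent links could look like on a parsed response, in plain Python outside Airbyte. The helper names `link_parents` and `to_row` are hypothetical, and the sketch assumes that elements of a list are owned by the object that holds the list.

```python
from typing import Any

PARENT_FIELD = "$parent"


def link_parents(node: Any, parent: Any = None) -> None:
    """Add a $parent reference to every nested dict, in place."""
    if isinstance(node, dict):
        children = list(node.values())  # snapshot taken before $parent is added
        if parent is not None:
            node[PARENT_FIELD] = parent
        for child in children:
            link_parents(child, node)
    elif isinstance(node, list):
        for child in node:
            link_parents(child, parent)  # list elements belong to the enclosing object


def to_row(item: dict[str, Any]) -> dict[str, Any]:
    """Build a change log row starting from the item alone."""
    history = item[PARENT_FIELD]
    issue = history[PARENT_FIELD][PARENT_FIELD]  # history -> changelog -> issue
    return {"issue_key": issue["key"], "created": history["created"], "field": item["field"]}


# Trimmed copy of the response above.
issue = {
    "key": "PPP-49571",
    "changelog": {
        "histories": [
            {
                "created": "2021-03-11T15:27:33.000+0000",
                "items": [{"field": "summary"}, {"field": "labels"}],
            }
        ]
    },
}
link_parents(issue)
rows = [to_row(item) for h in issue["changelog"]["histories"] for item in h["items"]]
```

Each row is derived from the item object only, which is the situation a record transformation is in.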

Pros and cons

(pro) This change would allow navigating from any object, no matter how deeply (but statically) nested in the response, to access and inherit data from its parent objects, thus storing a single response in several separate tables.
(con) Automated traversal of such enhanced JSON objects (deep copy, print, etc.) can go into infinite recursion.
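The con can be seen with Python's standard `json` module, which refuses to serialize a structure that contains a back reference. The sketch below also shows one possible mitigation, stripping every key that starts with `$` before the data leaves the transform step; `strip_service_fields` is a hypothetical helper, not an existing CDK function.

```python
import json
from typing import Any

# A child object that points back at its owner forms a cycle.
child = {"field": "summary"}
parent = {"created": "2021-03-11T15:27:33.000+0000", "items": [child]}
child["$parent"] = parent

try:
    json.dumps(parent)
    serializable = True
except ValueError:  # json detects the circular reference and raises
    serializable = False


def strip_service_fields(node: Any) -> Any:
    """Return a copy of the structure without keys that start with '$'."""
    if isinstance(node, dict):
        return {k: strip_service_fields(v) for k, v in node.items() if not k.startswith("$")}
    if isinstance(node, list):
        return [strip_service_fields(v) for v in node]
    return node


clean = strip_service_fields(parent)
clean_json = json.dumps(clean)  # works once the back references are gone
```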

octavia-squidington-iii added autoteam community team/use labels Dec 22, 2024

rpopov mentioned this issue Jan 13, 2025

feat: In each record to filter and transform, publish a local service field holding the original object the record is extracted from airbytehq/airbyte-python-cdk#214

Open

rpopov closed this as completed Jan 13, 2025

rpopov reopened this Jan 17, 2025
