Some fields are lost when a field.set processor is used #1941

Closed
hariso opened this issue Nov 5, 2024 · 2 comments · Fixed by ConduitIO/conduit-schema-registry#15
Labels
bug Something isn't working

Comments

@hariso
Contributor

hariso commented Nov 5, 2024

Bug description

When the field.set processor is used, some fields can be lost. The pipeline below reproduces this with a generator source, a field.set processor that adds a new field, and a file destination.

Without the field.set processor, the generated records contain the expected fields.

Steps to reproduce

  1. Run Conduit with this pipeline:
version: "2.2"
pipelines:
  - id: example-pipeline
    status: running
    name: "generator-to-file"
    connectors:
      - id: example-source
        type: source
        plugin: "generator"
        settings:
          format.options.airline: 'string'
          format.options.scheduledDeparture: 'time'
          format.type: 'structured'
          rate: 1
      - id: example-destination
        type: destination
        plugin: "file"
        settings:
          path: './destination.txt'
  2. Check the contents of destination.txt with tail -f destination.txt | jq. All the records will look similar to this:
{
  "airline": "demicanon",
  "scheduledDeparture": "2024-11-05T13:57:19.436001Z"
}
  3. Stop Conduit and change the pipeline to include a field.set processor, as shown below:
version: "2.2"
pipelines:
  - id: example-pipeline
    status: running
    name: "generator-to-file"
    processors:
      - id: add-city
        plugin: "field.set"
        settings:
          field: ".Payload.After.city"
          value: "Bedrock"
    connectors:
      - id: example-source
        type: source
        plugin: "generator"
        settings:
          format.options.airline: 'string'
          format.options.scheduledDeparture: 'time'
          format.type: 'structured'
          rate: 1
      - id: example-destination
        type: destination
        plugin: "file"
        settings:
          path: './destination.txt'
  4. Run Conduit. Check contents of destination.txt with tail -f destination.txt | jq. All the records will look similar to this:
{
  "airline": "rhombogenous",
  "scheduledDeparture": "1970-01-01T00:00:00.000007Z"
}

Expected result:

Field scheduledDeparture has a non-zero value, city is present and set to Bedrock.

Actual result:

Field scheduledDeparture has a near-zero value (a timestamp a few microseconds after the Unix epoch), and city is not present.
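The expected and actual results can be turned into a small automated check. The sketch below is not part of Conduit; the function names after_fields and find_problems are hypothetical, and it assumes a record is either the bare set of fields shown in the steps above or a full record whose fields sit under payload.after. It also assumes that any scheduledDeparture in the year 1970 counts as a "zero" value.

```python
from datetime import datetime


def after_fields(record: dict) -> dict:
    """Return the payload fields from a full record, or the record itself.

    Assumption: a full record nests its fields under payload.after;
    anything else is treated as the bare field map.
    """
    payload = record.get("payload")
    if isinstance(payload, dict) and isinstance(payload.get("after"), dict):
        return payload["after"]
    return record


def find_problems(record: dict) -> list:
    """List the ways a record deviates from the expected result."""
    fields = after_fields(record)
    problems = []

    if fields.get("city") != "Bedrock":
        problems.append("city is missing or not 'Bedrock'")

    raw = fields.get("scheduledDeparture")
    if raw is None:
        problems.append("scheduledDeparture is missing")
    else:
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so replace it with an explicit UTC offset first.
        ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        # Assumption: a timestamp in 1970 means the value was zeroed.
        if ts.year == 1970:
            problems.append("scheduledDeparture looks like a zero value")

    return problems


# The record from the last reproduction step, copied from this report.
broken = {
    "airline": "rhombogenous",
    "scheduledDeparture": "1970-01-01T00:00:00.000007Z",
}
print(find_problems(broken))
```

To check a whole destination file, each line could be parsed with json.loads and passed to find_problems, assuming the file holds one JSON record per line.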

Version

Commit 4c3c375

@lovromazgon
Member

I can't really reproduce it. This is the record I get in the output:

{
  "position": "MjE=",
  "operation": "create",
  "metadata": {
    "conduit.source.connector.id": "example-pipeline:example-source",
    "opencdc.createdAt": "1730813554815653000",
    "opencdc.payload.schema.subject": "example-pipeline:example-source:payload",
    "opencdc.payload.schema.version": "2"
  },
  "key": "eXVjYQ==",
  "payload": {
    "before": null,
    "after": {
      "airline": "cuadra",
      "city": "Bedrock",
      "scheduledDeparture": "2024-11-05T13:32:34.815657Z"
    }
  }
}

@hariso
Contributor Author

hariso commented Nov 5, 2024

Yeah, I changed the steps a bit, thinking they could be simplified. I'll shortly restore the original steps, which consistently reproduce this behavior. It looks like the issue is in our schema registry and the way we generate IDs.
