Feat Request: HTTP polling source #16025

Open
stdrc opened this issue Mar 31, 2024 · 6 comments

Comments

@stdrc
Member

stdrc commented Mar 31, 2024

Is your feature request related to a problem? Please describe.

A community user pointed out that in some simple use cases, users already have a service that provides a web API for polling events, so it would be nice to have HTTP polling source support in RW. In such use cases, setting up CDC or Kafka can be overkill.

Also, many web apps provide polling APIs, e.g. instant messaging apps. It can be easier to integrate with these APIs if RW directly supports HTTP polling source.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-1.8 milestone Mar 31, 2024
@stdrc stdrc removed this from the release-1.8 milestone Mar 31, 2024
@tabVersion
Contributor

I'm not sure if exposing RW directly as a service is desirable. The internet only provides best-effort guarantees, and we cannot recover previous HTTP requests from the network interface. This could result in data loss, particularly during cluster recovery processes where requests that have already reached the RW cluster may be lost, possibly even after they have been responded to.
Additionally, we would need to implement some traffic balancing strategies before each CN receives HTTP requests, which could further complicate our deployment.

@stdrc
Member Author

stdrc commented Mar 31, 2024

No, it's not exposing RW as an HTTP service. It's having a task inside RW that polls an outside HTTP service to get "updates" or "events". I think that's basically how we currently work with Kafka. The outside HTTP service may or may not have some mechanism for setting the consuming offset, but that can be discussed. Maybe we can just treat such a polling source as a non-recoverable, append-only source: what we get is what we get, and whatever we don't get (due to network issues or the like) simply doesn't exist as far as RW is concerned.
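
To make the idea concrete, here is a minimal sketch of such a polling task in Python. It is only an illustration, not a proposal for the actual RW implementation: `poll_source`, `fetch`, and the cursor convention are all hypothetical names, and the HTTP call itself is hidden behind the injected `fetch` callable.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical contract: fetch(cursor) returns (events, next_cursor).
# A service without offsets can ignore the cursor and return None for it.
Fetch = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def poll_source(
    fetch: Fetch,
    interval_secs: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield events from repeated polls as a best-effort, append-only stream."""
    cursor: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            events, next_cursor = fetch(cursor)
        except OSError:
            # Network failure: skip this round and keep the old cursor.
            # If the service has no cursor, anything that was only visible
            # during this window is never seen by the source.
            events, next_cursor = [], cursor
        if next_cursor is not None:
            cursor = next_cursor
        # Append-only: events are emitted once and never retracted.
        yield from events
        sleep(interval_secs)
```

The cursor lives only in memory here, which matches the "non-recoverable" framing: after a restart the task would start polling from scratch.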

@xxchan
Member

xxchan commented Mar 31, 2024

We've already added the MQTT source (#15388), which cannot be rewound and replayed either, so this shouldn't block an HTTP polling source.

Exposing RW as an HTTP service (webhook?) is push-based. Considering integrations with more systems, I feel webhooks are more widely used than polling sources (I'm not so sure though). But its implementation will be more different than other sources.

Another common issue is no standard schema (jsonb?)

> But its implementation will be more different than other sources.

Edit: Maybe it's not that different. It just polls from socket. 🤔

Edit again:

Just realized MQTT is also push-based, but the client library provides a poll API from EventLoop. It's not a big deal whether the protocol is push or pull. Just adding an internal channel can turn one into the other. 🤡
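
A rough sketch of the internal-channel trick, in Python for brevity. `PushToPollAdapter` and its methods are made-up names for illustration; this is not how the MQTT client library or RW implements it. The push side (e.g. a webhook handler) calls `on_message`, and the source reader calls `poll`.

```python
import queue
from typing import Any, Optional


class PushToPollAdapter:
    """Bounded in-memory channel between a push callback and a polling reader."""

    def __init__(self, capacity: int = 1024) -> None:
        self._chan: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)

    def on_message(self, msg: Any) -> bool:
        """Push side: enqueue without blocking. Returns False if the message was dropped."""
        try:
            self._chan.put_nowait(msg)
            return True
        except queue.Full:
            return False

    def poll(self, timeout: Optional[float] = None) -> Optional[Any]:
        """Pull side: return the next message, or None if nothing is available."""
        try:
            if timeout is None:
                return self._chan.get_nowait()
            return self._chan.get(timeout=timeout)
        except queue.Empty:
            return None
```

Dropping on a full channel is just one possible policy; blocking the pusher (backpressure) would be the other obvious choice.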

@stdrc
Member Author

stdrc commented Apr 1, 2024

> Another common issue is no standard schema (jsonb?)

I think the user can specify the schema in the source definition?

For event payload encoding, a benefit of an HTTP polling source is that we can determine the payload encoding (e.g. JSON) from the Content-Type header of the response.

Also, we may need to allow setting request headers in the WITH options.
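
A small Python sketch of both points, using only the standard library. The option keys (`url`, `headers.*`) are invented for illustration and are not existing RW WITH options. The set of supported media types is likewise just an assumption.

```python
import json
import urllib.request
from email.message import Message
from typing import Dict, List


def build_request(options: Dict[str, str]) -> urllib.request.Request:
    """Build a request from source options; 'headers.<Name>' keys become request headers."""
    prefix = "headers."
    headers = {k[len(prefix):]: v for k, v in options.items() if k.startswith(prefix)}
    return urllib.request.Request(options["url"], headers=headers)


def decode_payload(content_type: str, body: bytes) -> List[dict]:
    """Pick a decoder based on the response's Content-Type header."""
    # Reuse the stdlib header parser to split the media type from its parameters.
    header = Message()
    header["Content-Type"] = content_type
    media_type = header.get_content_type()
    text = body.decode(header.get_content_charset() or "utf-8")

    if media_type == "application/json":
        data = json.loads(text)
        # Accept either one event object or an array of events.
        return data if isinstance(data, list) else [data]
    if media_type == "application/x-ndjson":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported Content-Type: {media_type}")
```

For example, `build_request({"url": "http://example.invalid/events", "headers.Authorization": "Bearer t"})` would attach the `Authorization` header to every poll.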

Contributor

github-actions bot commented Jun 6, 2024

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@BugenZhao
Member

BugenZhao commented Jun 14, 2024

Just FYI: Below are the web connectors supported by a newly-emerged streaming system:

  • Source
    • Polling HTTP: no standard protocol
    • Server-sent events: can achieve exactly-once
    • WebSocket
  • Sink
    • Webhook
