Create an open source data layer #793

Open
tpatel opened this issue Mar 6, 2024 · 23 comments
Labels
data layer: Pertains to data layers.
good first issue: Good for newcomers
help wanted: Extra attention is needed

Comments

@tpatel
Contributor

tpatel commented Mar 6, 2024

Chainlit has a feature that enables you to store, analyze and persist your data with Literal.
You can also create your own custom data layer to store the data in your database.

The goal of this ticket is to create an implementation of a custom data layer using a database like postgres or redis.
You can see an example of a custom data layer in this test, although it doesn't connect to a database.

@hayescode
Contributor

The only existing implementation (built against Chainlit 0.7.0) can be found here.

Will Chainlit Devs assist in this effort or will this be 100% community-driven?

@tpatel
Contributor Author

tpatel commented Mar 7, 2024

@hayescode thanks for sharing this repo. Its authors decided to rebuild a GraphQL API.
With the custom data layer feature, you should be able to store data in any database without building that GraphQL API, because the data layer interface talks to your database directly.

A custom data layer needs:

  1. One class that inherits from chainlit.data.BaseDataLayer
  2. In your Chainlit app, an assignment that registers an instance of that class:
import chainlit.data as cl_data

cl_data._data_layer = MyDataLayer()

This removes the need to stay compatible with the GraphQL API and the need to maintain a server reachable from your Chainlit app. It also enables a direct connection to your DB.
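As a rough illustration of the two requirements above, here is a minimal sketch that keeps threads in a dict instead of a real database. The `BaseDataLayer` class below is a local stand-in so the example runs without Chainlit installed; in a real app you would subclass `chainlit.data.BaseDataLayer` instead. `InMemoryDataLayer`, the three methods shown, and their parameters are assumptions chosen for illustration, and the real base class defines more methods than these.

```python
import asyncio
from typing import Dict, Optional


class BaseDataLayer:
    """Local stand-in for chainlit.data.BaseDataLayer (assumption, not the real class)."""

    async def get_thread(self, thread_id: str) -> Optional[dict]:
        return None

    async def update_thread(self, thread_id: str, name: Optional[str] = None,
                            metadata: Optional[dict] = None) -> None:
        pass

    async def delete_thread(self, thread_id: str) -> None:
        pass


class InMemoryDataLayer(BaseDataLayer):
    """Hypothetical data layer that stores threads in a dict."""

    def __init__(self) -> None:
        self.threads: Dict[str, dict] = {}

    async def get_thread(self, thread_id: str) -> Optional[dict]:
        return self.threads.get(thread_id)

    async def update_thread(self, thread_id: str, name: Optional[str] = None,
                            metadata: Optional[dict] = None) -> None:
        # Create the thread on first sight, then apply only the fields provided.
        thread = self.threads.setdefault(
            thread_id, {"id": thread_id, "name": None, "metadata": {}}
        )
        if name is not None:
            thread["name"] = name
        if metadata is not None:
            thread["metadata"].update(metadata)

    async def delete_thread(self, thread_id: str) -> None:
        self.threads.pop(thread_id, None)


async def demo() -> Optional[dict]:
    layer = InMemoryDataLayer()
    await layer.update_thread("t1", name="First chat")
    await layer.update_thread("t1", metadata={"lang": "en"})
    return await layer.get_thread("t1")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In an actual Chainlit app, the instance would be registered with the `cl_data._data_layer = ...` assignment shown in the comment above, and each method would issue queries against your database rather than touching a dict.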

I'm looking into other chainlit tasks at the moment, but I'll be around to review PRs and help!

@AndreasMarcec

I'm currently working on my own Data Layer which is based on the BaseDataLayer. Could you please specify which of the features would be mandatory in order to create a valid PR?

@Rajatkhanna801

Hi,
I want to add MSAL authentication to a custom data layer. Can anyone help me with that?

@sandangel
Contributor

Hi, I'm implementing this PR: #796
Could you share where threads are created (create_thread)? I could not find it in the code.

@tpatel
Contributor Author

tpatel commented Mar 12, 2024

> I'm currently working on my own Data Layer which is based on the BaseDataLayer. Could you please specify which of the features would be mandatory in order to create a valid PR?

@AndreasMarcec The best would be a complete implementation that overrides all methods from the BaseDataLayer (https://github.com/Chainlit/chainlit/blob/main/backend/chainlit/data/__init__.py#L53). You could start with a partial implementation though, and get help if you're stuck on anything.

@tpatel
Contributor Author

tpatel commented Mar 12, 2024

> Hi I want to add MSAL authentication in custom layer. Can anyone help me with that.

@Rajatkhanna801 the best starting point is the Authentication callback, which is configured per app. You don't need a custom data layer if all you need is MSAL authentication.
I'm not familiar with MSAL, so feel free to open a separate issue if you run into problems.

@Rajatkhanna801

@tpatel
I have already finished the MSAL authentication and it is working pretty well. Now I am creating a custom data layer to store data in a SQLite database.

@Rajatkhanna801

@tpatel I need help with one thing: the SQLite database needs a user table. Is there a predefined model structure for the user table?

@hayescode
Contributor

hayescode commented Mar 15, 2024

> Hi, I'm implementing this PR: #796
> Is it possible to share where do we create_thread? I could not find it in the code.

@sandangel the update_thread function is an upsert: it creates the thread if it doesn't exist yet and updates it otherwise. I agree it's odd that everything else has separate create/update/delete methods while threads do not.
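To make the upsert behavior concrete, here is a small sketch using SQLite from the Python standard library. The `threads` table, its columns, and the function signatures are assumptions made for illustration; they are not Chainlit's actual schema or method signature.

```python
import sqlite3
from typing import Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create a hypothetical threads table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS threads ("
        "id TEXT PRIMARY KEY, name TEXT, user_id TEXT)"
    )
    return conn


def update_thread(conn: sqlite3.Connection, thread_id: str,
                  name: Optional[str] = None,
                  user_id: Optional[str] = None) -> None:
    """Insert the thread if it is new, otherwise update only the fields given."""
    conn.execute(
        """
        INSERT INTO threads (id, name, user_id) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            name = COALESCE(excluded.name, threads.name),
            user_id = COALESCE(excluded.user_id, threads.user_id)
        """,
        (thread_id, name, user_id),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_db()
    update_thread(conn, "t1", name="Hello")   # first call creates the row
    update_thread(conn, "t1", user_id="u1")   # second call updates it in place
    print(conn.execute("SELECT id, name, user_id FROM threads").fetchall())
```

The `COALESCE` calls keep the stored value whenever a field is passed as `None`, so a partial update does not wipe columns that were set earlier. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.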

@hayescode
Contributor

@tpatel if you could share the DDL for the backend tables I think it would speed up each of our developments. Thanks!

@sandangel
Contributor

@hayescode I updated the code to mimic the literalai client instead. It's working now and just needs a few updates to the filters.

@hayescode
Contributor

@sandangel why did you do that?

@sandangel
Contributor

@hayescode I explained in the PR.

@hayescode
Contributor

@willydouhard @tpatel @constantinidan implementing a custom data layer is proving difficult because literalai is intertwined with chainlit/data. As it stands, literalai is effectively a dependency even when it isn't used, which will make long-term support more difficult. For example, Chainlit expects the pagination, thread filter, and similar types from literalai in order to work. These would ideally be defined in Chainlit.

Will Chainlit be refactored to define these types natively?
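As one possible shape for such native types, the sketch below defines pagination and thread filter dataclasses with no external dependency, plus a helper showing how a data layer might apply them. All names and fields here (`Pagination`, `ThreadFilter`, `Page`, `paginate`) are hypothetical and only illustrate the idea; they are not the types Chainlit or literalai actually ship.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Pagination:
    first: int                      # page size
    cursor: Optional[str] = None    # id of the last item already seen


@dataclass
class ThreadFilter:
    user_id: Optional[str] = None
    search: Optional[str] = None


@dataclass
class Page(Generic[T]):
    items: List[T]
    end_cursor: Optional[str]
    has_next_page: bool


def paginate(threads: List[dict], pagination: Pagination,
             filters: ThreadFilter) -> Page:
    """Filter threads, then return the slice that follows the cursor."""
    matches = [
        t for t in threads
        if (filters.user_id is None or t["user_id"] == filters.user_id)
        and (filters.search is None or filters.search.lower() in t["name"].lower())
    ]
    ids = [t["id"] for t in matches]
    # Start right after the cursor item; an unknown cursor restarts from the top.
    start = ids.index(pagination.cursor) + 1 if pagination.cursor in ids else 0
    items = matches[start:start + pagination.first]
    return Page(
        items=items,
        end_cursor=items[-1]["id"] if items else None,
        has_next_page=start + pagination.first < len(matches),
    )
```

A database-backed layer would translate the same two objects into a `WHERE` clause and a `LIMIT` instead of filtering a list in memory.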

@hayescode
Contributor

I just opened a PR to add a Postgres custom data layer with ADLS support -> #825

@tpatel @willydouhard

@tjroamer

Just opened a PR to add a simple file-based SQLite data layer -> #832

There is no need to set up an extra database. By default, the data is persisted in chainlit.db in the working directory, and you can use any SQLite tool to inspect it.

@tpatel @willydouhard
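For readers who prefer a script over a GUI tool, a SQLite file such as the chainlit.db mentioned above can also be inspected from Python's standard library. This sketch only lists table names through SQLite's built-in `sqlite_master` catalog; it assumes nothing about the schema the PR creates, and `list_tables` is a hypothetical helper name.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in a SQLite database file."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(list_tables("chainlit.db"))
```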

@hayescode
Contributor

@tpatel @willydouhard Here's my PR adding a dialect-agnostic SQLAlchemy custom data layer. We've been getting more community contributions on this lately. Can we move this issue to 'In Progress' or 'In Review'?

#836

@wfjt

wfjt commented Jun 1, 2024

Something I'd like to raise is the coupling of session handling. If Chainlit used a session service/adapter interface and didn't spread session handling everywhere and force a stateful architecture, one could implement a Redis session adapter for example and run Chainlit on k8s like a normal stateless service. A backend should NOT be a monolithic stateful in-memory-state-keeping system. It's fine to have it as a default for simple use-cases, but should not prevent plugging a more conventional session management system in.

The whole notion of a data layer without control over sessions leaves it coupled to the in-memory stateful architecture. I should be able to run parallel backends without sticky sessions. I should be able to run Chainlit backend on spot compute and simply drain and fail-over as needed without impacting user experience.

This issue alone made me drop Chainlit from the short-list. I can't run it in production with this sort of software architecture, if you can even call it that.
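The session adapter idea described in this comment could look roughly like the sketch below. This is not a Chainlit API: `SessionStore`, `InMemorySessionStore`, and `handle_message` are hypothetical names used to illustrate the pattern. The point is that request handling depends only on the interface, so a Redis-backed adapter exposing the same three methods could replace the in-memory default without touching the handler.

```python
import json
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical adapter interface for session state."""

    def load(self, session_id: str) -> Optional[dict]: ...
    def save(self, session_id: str, state: dict) -> None: ...
    def delete(self, session_id: str) -> None: ...


class InMemorySessionStore:
    """Default adapter for simple single-process use."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def load(self, session_id: str) -> Optional[dict]:
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else None

    def save(self, session_id: str, state: dict) -> None:
        # Serialize to JSON so the state has the same shape an external store would hold.
        self._data[session_id] = json.dumps(state)

    def delete(self, session_id: str) -> None:
        self._data.pop(session_id, None)


def handle_message(store: SessionStore, session_id: str, message: str) -> int:
    """Stateless handler: load state, apply the message, write state back."""
    state = store.load(session_id) or {"messages": []}
    state["messages"].append(message)
    store.save(session_id, state)
    return len(state["messages"])
```

Because the handler keeps nothing between calls, any backend replica holding a store that points at shared storage could serve the next request for the same session, which is the property the comment asks for.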

@nileshtrivedi

I posted this in the discord server as well.

Can the CustomDataLayer support user attributes (such as role/team/organization_id) so that the tools or prompts available to the AI chat can be customized for each user? This is a crucial requirement if Chainlit is to be used in SaaS apps with multiple isolated customers.

@hayescode
Contributor

@nileshtrivedi yes, in the user.metadata field.
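A minimal sketch of that approach follows, assuming a user object that carries an identifier and a free-form metadata dict. The `User` dataclass here is a local stand-in so the example is self-contained, and the `role` and `organization_id` keys, the `TOOLS_BY_ROLE` table, and `tools_for` are hypothetical names picked for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class User:
    """Local stand-in for a user object with a metadata dict (assumption)."""
    identifier: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from a role stored in metadata to the tools it may use.
TOOLS_BY_ROLE: Dict[str, List[str]] = {
    "admin": ["search", "export", "manage_users"],
    "member": ["search"],
}


def tools_for(user: User) -> List[str]:
    """Pick the tool list from the user's role, defaulting to 'member'."""
    role = user.metadata.get("role", "member")
    return TOOLS_BY_ROLE.get(role, [])


if __name__ == "__main__":
    alice = User("alice", {"role": "admin", "organization_id": "acme"})
    print(tools_for(alice))
```

The same lookup could key on `organization_id` to keep customers isolated, provided the metadata is set server-side at authentication time rather than supplied by the client.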

@dokterbob added the data layer label on Aug 14, 2024

@iamrealvinnu

> Chainlit has a feature that enables you to store, analyze and persist your data with Literal. You can also create your own custom data layer to store the data in your database.
>
> The goal of this ticket is to create an implementation of a custom data layer using a database like postgres or redis. You can see an example of a custom data layer in this test, although it doesn't connect to a database.

I think implementing a custom data layer with options for databases like PostgreSQL, Redis, or MongoDB would be a fantastic addition, especially for those needing flexibility with storage solutions. I’ll look at the test example you mentioned as a reference. Do you have any specific guidelines or starting points for integrating the database connection, or any preference between using PostgreSQL or Redis first?

@sgkalyans

Hi @tpatel and all,

I'm working on a Cosmos DB integration for the conversation log. However, the chat history does not load when I click a thread in the history list.

Is there any additional code or configuration needed in order to load a chat from the history?

I'm stuck at this. Please help.

Thank you
