Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avatar image is broken when reading threads from history #1370

Closed
LucasMagnum opened this issue Sep 24, 2024 · 12 comments
Closed

Avatar image is broken when reading threads from history #1370

LucasMagnum opened this issue Sep 24, 2024 · 12 comments
Labels
bug Something isn't working frontend Pertains to the frontend.

Comments

@LucasMagnum
Copy link

Describe the bug

The PR #1326 introduces a bug when it expects the get_avatar to match string (https://github.com/Chainlit/chainlit/pull/1326/files#diff-9a73e67e163edc3fae6839bf2bcc8c6ab650624eff6668502388a7a86b7ece78R964) .

The Frontend app can make requests to the avatar that do not match the Regex (https://github.com/Chainlit/chainlit/blob/main/frontend/src/components/molecules/messages/components/Avatar.tsx#L31) this creates a bug for existing applications where the config.ui.name contains space (e.g. My Assistant).

To Reproduce
Set your config.ui.name to My Assistant

  1. Create a thread
  2. Publish messages
  3. Create a new thread
  4. Try to resume the previous thread
  5. See the 400 error and the avatar not showing

Expected behavior
Avatar should be loaded for threads in history.

Copy link

dosubot bot commented Sep 24, 2024

To fix the bug where the get_avatar function fails to load avatars correctly for threads in history when config.ui.name contains spaces, you need to ensure that the author parameter in the URL is properly encoded. This can be done using encodeURIComponent to handle spaces and other special characters.

Here is the modified code for the MessageAvatar component:

import { useContext, useMemo } from 'react';

import { Tooltip } from '@mui/material';
import Avatar from '@mui/material/Avatar';

import {
  ChainlitContext,
  useChatSession,
  useConfig
} from '@chainlit/react-client';

interface Props {
  author?: string;
  hide?: boolean;
}

const MessageAvatar = ({ author, hide }: Props) => {
  const apiClient = useContext(ChainlitContext);
  const { chatProfile } = useChatSession();
  const { config } = useConfig();

  const selectedChatProfile = useMemo(() => {
    return config?.chatProfiles.find((profile) => profile.name === chatProfile);
  }, [config, chatProfile]);

  const avatarUrl = useMemo(() => {
    const isAssistant = !author || author === config?.ui.name;
    if (isAssistant && selectedChatProfile?.icon) {
      return selectedChatProfile.icon;
    }
    return apiClient?.buildEndpoint(`/avatars/${encodeURIComponent(author || 'default')}`);
  }, [apiClient, selectedChatProfile, config, author]);

  return (
    <span className={`message-avatar`}>
      <Tooltip title={author}>
        <Avatar
          sx={{
            width: '1.6rem',
            height: '1.6rem',
            bgcolor: 'transparent'
          }}
          src={hide ? undefined : avatarUrl}
        />
      </Tooltip>
    </span>
  );
};

export { MessageAvatar };

By using encodeURIComponent on the author parameter, you ensure that any spaces or special characters are correctly encoded, preventing the 400 error [1].

Additionally, the get_avatar function in the backend handles the avatar_id parameter by validating it, normalizing it, and searching for a matching avatar file. Here is the implementation:

@router.get("/avatars/{avatar_id:str}")
async def get_avatar(avatar_id: str):
    """Get the avatar for the user based on the avatar_id."""
    if not re.match(r"^[a-zA-Z0-9_-]+$", avatar_id):
        raise HTTPException(status_code=400, detail="Invalid avatar_id")

    if avatar_id == "default":
        avatar_id = config.ui.name

    avatar_id = avatar_id.strip().lower().replace(" ", "_")

    base_path = Path(APP_ROOT) / "public" / "avatars"
    avatar_pattern = f"{avatar_id}.*"

    matching_files = base_path.glob(avatar_pattern)

    if avatar_path := next(matching_files, None):
        if not is_path_inside(avatar_path, base_path):
            raise HTTPException(status_code=400, detail="Invalid filename")

        media_type, _ = mimetypes.guess_type(str(avatar_path))

        return FileResponse(avatar_path, media_type=media_type)

    return await get_favicon()

This function ensures that the avatar_id is properly formatted and validated before attempting to retrieve the avatar file [2].

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added bug Something isn't working frontend Pertains to the frontend. labels Sep 24, 2024
@dokterbob dokterbob added backend Pertains to the Python backend. frontend Pertains to the frontend. and removed needs-triage frontend Pertains to the frontend. backend Pertains to the Python backend. labels Sep 24, 2024
@dokterbob
Copy link
Collaborator

@LucasMagnum Love to hear whether the fix suggested by our bot does the trick. If it does, we'd love a PR for that, preferably including a(n update to a) E2E test demonstrating the issue and it's resolution.

@dosu Any suggestion as to which is the most applicable place to write such a (regression) test, preferably without adding a new E2E (because they're slow)? What could such a test look like?

Copy link

dosubot bot commented Sep 25, 2024

The most applicable place to write a regression test for the get_avatar function without adding a new E2E test is in the backend/tests/test_server.py file. This file already contains tests for the get_avatar function, so adding a regression test here would be consistent with the existing structure.

Here is an example of what such a regression test could look like:

def test_get_avatar_regression(test_client: TestClient, monkeypatch: pytest.MonkeyPatch):
    """Regression test for get_avatar function."""
    # Setup: Create a custom avatar
    custom_avatar_path = os.path.join(
        APP_ROOT, "public", "avatars", "regression_avatar.png"
    )
    os.makedirs(os.path.dirname(custom_avatar_path), exist_ok=True)
    with open(custom_avatar_path, "wb") as f:
        f.write(b"regression test image data")

    # Test: Fetch the custom avatar
    response = test_client.get("/avatars/regression_avatar")
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("image/")
    assert response.content == b"regression test image data"

    # Clean up
    os.remove(custom_avatar_path)

This test ensures that the get_avatar function correctly handles a custom avatar and returns the expected image data [1][2].

To continue talking to Dosu, mention @dosu.

@dokterbob
Copy link
Collaborator

The most applicable place to write a regression test for the get_avatar function without adding a new E2E test is in the backend/tests/test_server.py file. This file already contains tests for the get_avatar function, so adding a regression test here would be consistent with the existing structure.

Here is an example of what such a regression test could look like:

def test_get_avatar_regression(test_client: TestClient, monkeypatch: pytest.MonkeyPatch):
    """Regression test for get_avatar function."""
    # Setup: Create a custom avatar
    custom_avatar_path = os.path.join(
        APP_ROOT, "public", "avatars", "regression_avatar.png"
    )
    os.makedirs(os.path.dirname(custom_avatar_path), exist_ok=True)
    with open(custom_avatar_path, "wb") as f:
        f.write(b"regression test image data")

    # Test: Fetch the custom avatar
    response = test_client.get("/avatars/regression_avatar")
    assert response.status_code == 200
    assert response.headers["content-type"].startswith("image/")
    assert response.content == b"regression test image data"

    # Clean up
    os.remove(custom_avatar_path)

This test ensures that the get_avatar function correctly handles a custom avatar and returns the expected image data [1][2].

To continue talking to Dosu, mention @dosu.

But our specific test case concerns a space in config.ui.name -- and the requirement of URL encoding in the frontend, right?

@AidanShipperley
Copy link
Contributor

AidanShipperley commented Sep 25, 2024

Hi @dokterbob, was there a reason for preventing avatar names from containing spaces with the regex as such? I had to monkey patch the get_avatar() function when updating to 1.200 from 1.1.404 to modify the regex to allow spaces.

Changing the regex to this:

if not re.match(r"^[a-zA-Z0-9_. -]+$", avatar_id):
    raise HTTPException(status_code=400, detail="Invalid avatar_id")

Completely fixed the issue for me.

@dokterbob
Copy link
Collaborator

I've just been able to replicate the issue in our deployment as well. Full stack trace etc: https://synergy-os.sentry.io/share/issue/b23e6a9655a1421e9c3273e2d4a13342/

Hi @dokterbob, was there a reason for preventing avatar names from containing spaces with the regex as such? I had to monkey patch the get_avatar() function when updating to 1.200 from 1.1.404 to modify the regex to allow spaces.

Changing the regex to this:

if not re.match(r"^[a-zA-Z0-9_. -]+$", avatar_id):
    raise HTTPException(status_code=400, detail="Invalid avatar_id")

Completely fixed the issue for me.

To ensure proper security it is essential to always sanitise any sort of user input. That is, any unreasonable/unexpected/possibly failing or erroneous input should be rejected outright.

Allowing spaces in URL filenames IMHO definitely counts amongst these, i.e. the encoding of various types of unicode spaces can expose attack vectors. It is of course unfortunate if it breaks user experience and we should take care to address it, but not at the expense of increasing security exposure.

Perhaps the solution to this could be to slugify file names? Ref: https://docs.djangoproject.com/en/5.1/ref/utils/#django.utils.text.slugify

In that case, having a config.ui.name of 'Funny Bananawould give an avatar offunny_banana.*` (limiting file extensions here would be another one...).

@dokterbob
Copy link
Collaborator

I've given this some thought, I think indeed it would be best to 'slugify' the URL on the client side (not on the server side). I'll need to analyse more carefully what avatar's specifically are supposed to do, happy for people thinking along towards a spec for a solution.

@hayescode
Copy link
Contributor

@dokterbob will the fix for this be able to make it into the next release?

@hayescode
Copy link
Contributor

@dokterbob Are you saying that avatars (Assistants and Steps) should not be able to have spaces? I sure hope not because nearly all of my Avatars have spaces in the name, as do most people who have provided screenshots.

This issue is blocking many from upgrading past 1.1.404.

@dokterbob
Copy link
Collaborator

https://github.com/Chainlit/chainlit/blob/main/frontend/src/components/molecules/messages/components/Avatar.tsx#L31

What I'm saying is a clean fix is to move filename sanitation (avatar_id.strip().lower().replace(" ", "_")) to the client side and enforce 'clean' URL's.

I realise we need to fix this sooner though, I will make a PR with @AidanShipperley's fix to allow spaces in avatars.

@dokterbob
Copy link
Collaborator

dokterbob commented Oct 9, 2024

I'm not gonna allow spacesdots here though. That's exactly the kinda stuff which makes the code more prone to bugs.

I'm looking forward (or will roll) a patch which will move the encoding of filenames to the client side.

@dokterbob
Copy link
Collaborator

Cleaner proposal, proper client-side sanitation. We want to server to be conservative. Going forward, let's aim for industry-standard security. 😻 🗝️

See #1420.

We'll either roll that one or #1418 into 1.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working frontend Pertains to the frontend.
Projects
None yet
Development

No branches or pull requests

4 participants