Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@clerk/nextjs performance regression/hitting rate limits in next@15 due to fetch default behavior change #4894

Open
4 tasks done
sigmachirality opened this issue Jan 15, 2025 · 8 comments
Labels
needs-triage A ticket that needs to be triaged by a team member

Comments

@sigmachirality
Copy link

sigmachirality commented Jan 15, 2025

Preliminary Checks

Reproduction

https://github.com/sigmachirality/nextjs-auth-starter-template

Publishable key

pk_test_c3VyZS1kYW5lLTM5LmNsZXJrLmFjY291bnRzLmRldiQ

Description

NextJS monkey-patches the global fetch to overload it with a cache parameter that allows users to configure caching of network calls. Additionally, in Next 13 and 14, this cache was aggressively enabled by default, such that all GET requests were automatically cached.

For some, this resulted in performance improvements. For others, it resulted in regressions in functionality where libraries and codebases assumed fetch would return fresh data each time. Eventually, in Next 15 these default caching semantics were rolled back such that fetch would be uncached by default.

However since this was the default for two major semvers, some libraries (including this one) now assume fetch caches GET requests aggressively by default. See:
image

Note: calls fetch(), so it is automatically deduped per request

Clerk is rate-limited to the tune of 100 requests per 10 seconds (~10 requests per second) for a specific IP address, but because fetch was cached by default, the implementation of currentUser does not cache in memory with explicit caching semantics/eviction, unlike, say, the nanostores-based implementation of similar library features in the Astro clerk package..

In cases where currentUser() is called in the RSCs for multiple components in the same DOM during render (for example, in a few nested layouts containing a navbar, a sidebar, and in a few one-off components like a CTA login that contains the user's pfp), this rate limit can be easily hit. In cases where the user is deploying NextJS by themselves on, say, Fly or AWS, or in their own bare-metal server, this can be fixed in user land by wrapping currentUser in React.cache, unstable_cache or use cache;.

However, a large percentage of people host NextJS apps on Vercel., or any other public cloud. When people host NextJS apps on Vercel (or most other public clouds), they share Vercel's public IP ranges. I'm sure the rate limit for Vercel to Clerk is specifically set to be higher under the table, but the result is still that if a few people inside a Vercel IP range update to Next 15 without exhaustively auditing their caching semantics to the point of thinking about the caching semantics of their external modules, their codebase can clobber everyone else hosted within that IP range, resulting in 422 rate limits the others have no control over.

Here is a sketch of a few possible solutions:

Solution 1: Update the Docs

The docs above should NOT tell users that currentUsers dedupes requests. Instead, they should encourage users to wrap currentUsers in React.cache or unstable_cache or use cache;. This is by far the easiest solution, but also the most irresponsible from a library maintainer ethics standpoint. I do think the docs should be updated eventually, but the vast majority of users will not see that these docs updated and simply bump to Next 15, which does nothing to stop them from clobbering others in public clouds.

Solution 2: Export a new cachedCurrentUser helper from @nextjs/clerk/server

Export a cachedCurrentUser which is equal to React.cache(currentUser). React.cache is not for use outside of server components, and currentUser is intended to be used outside of server components in any server context such as in server actions or route handlers. To my knowledge calling React.cache outside of a server component is undefined behavior, or may even cause an error. This also has similar issues to the above, since people are unlikely to migrate to using cachedCurrentUser from currentUser unless they read the docs.

Solution 3: Monkey patch runtime.fetch in the @clerk/backend module

@clerk/backend calls fetch from a runtime object exported internally due to differences in how fetch is defined in different JS server environments (for example, Cloudflare workers
). Treat this module as a closure, and export an internal function inside @clerk/backend to call inside @clerk/nextjs/server to monkey-patch/override the params of fetch to include caching semantics that were implicitely default in Next 13/14. I imagine it would look something like this:

const fetch = (params) => _fetch(params. { cache: `force-cache` })

Alternatively, if it is determined that the additional param is unlikely to cause functional regressions in other pages downstream of @clerk/backend, you could just always pass it in by default inside @clerk/backend. Making the caching semantics of Next 13/14 explicit in calls to fetch() would not cause a regression for those semvers, but would improve the performance and behavior of the @clerk/nextjs module in Next 15 significantly.

Solution 4: Tell users to re-enable aggressive fetch GET caching by default in the docs

I think this solution is unacceptable due to the aforementioned performance/functionally regressions in libraries which use fetch, but I suppose it is a solution.

Happy to make a PR following any of the above approaches. We have already lost some clients through our onboarding process due to unpredictable/sporadic rate limit errors.

Environment

pnpm dlx envinfo --system --browsers --binaries --npmPackages

  System:
    OS: macOS 15.1.1
    CPU: (12) arm64 Apple M3 Pro
    Memory: 1.81 GB / 36.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 22.12.0 - /usr/local/bin/node
    Yarn: 1.22.19 - /opt/homebrew/bin/yarn
    npm: 8.4.1 - /opt/homebrew/bin/npm
    pnpm: 9.14.2 - ~/Library/pnpm/pnpm
    bun: 1.1.42 - ~/.bun/bin/bun
  Browsers:
    Chrome: 131.0.6778.265
    Safari: 18.1.1
  npmPackages:
    @biomejs/biome: 1.8.2 => 1.8.2 
    @clerk/nextjs: ^6.9.0 => 6.9.9 
    @elysiajs/eden: ^1.2.0 => 1.2.0 
    @elysiajs/swagger: ^1.1.6 => 1.2.0 
    @fortawesome/fontawesome-svg-core: ^6.5.2 => 6.7.2 
    @fortawesome/free-brands-svg-icons: ^6.5.2 => 6.7.2 
    @fortawesome/free-regular-svg-icons: ^6.5.2 => 6.7.2 
    @fortawesome/free-solid-svg-icons: ^6.5.2 => 6.7.2 
    @fortawesome/react-fontawesome: ^0.2.2 => 0.2.2 
    @heroicons/react: ^2.0.18 => 2.2.0 
    @hookform/resolvers: ^3.9.1 => 3.10.0 
    @mdx-js/loader: ^2.3.0 => 2.3.0 
    @mdx-js/react: ^2.3.0 => 2.3.0 
    @next/mdx: ^15.1.0 => 15.1.4 
    @radix-ui/react-dialog: ^1.1.2 => 1.1.4 
    @radix-ui/react-dropdown-menu: ^2.0.6 => 2.1.4 
    @radix-ui/react-icons: ^1.3.0 => 1.3.2 
    @radix-ui/react-label: ^2.1.0 => 2.1.1 
    @radix-ui/react-popover: ^1.1.2 => 1.1.4 
    @radix-ui/react-radio-group: ^1.2.0 => 1.2.2 
    @radix-ui/react-select: ^2.1.2 => 2.1.4 
    @radix-ui/react-separator: ^1.1.0 => 1.1.1 
    @radix-ui/react-slider: ^1.1.2 => 1.2.2 
    @radix-ui/react-slot: ^1.1.0 => 1.1.1 
    @radix-ui/react-toast: ^1.2.1 => 1.2.4 
    @radix-ui/react-tooltip: ^1.0.7 => 1.1.6 
    @shikijs/rehype: ^1.11.0 => 1.26.1 
    @slack/web-api: ^6.12.0 => 6.13.0 
    @tanstack/react-table: ^8.19.3 => 8.20.6 
    @testing-library/jest-dom: ^6.1.4 => 6.6.3 
    @testing-library/react: ^14.3.1 => 14.3.1 
    @testing-library/user-event: ^14.5.1 => 14.5.2 
    @types/bun: ^1.1.14 => 1.1.16 
    @types/jest: ^29.5.7 => 29.5.14 
    @types/jsonwebtoken: ^9.0.3 => 9.0.7 
    @types/lodash: ^4.14.199 => 4.17.14 
    @types/mdx: ^2.0.7 => 2.0.13 
    @types/mixpanel: ^2.14.8 => 2.14.9 
    @types/mixpanel-browser: ^2.47.5 => 2.51.0 
    @types/node: 20.5.0 => 20.5.0 
    @types/path-browserify: ^1.0.2 => 1.0.3 
    @types/pg: ^8.10.9 => 8.11.10 
    @types/react: 19.0.1 => 19.0.4 
    @types/react-dom: 19.0.2 => 19.0.2 
    @uidotdev/usehooks: ^2.4.1 => 2.4.1 
    @vercel/analytics: ^1.0.2 => 1.4.1 
    @vercel/kv: ^2.0.0 => 2.0.0 
    @vercel/speed-insights: ^1.0.12 => 1.1.0 
    @visx/event: ^2.1.0 => 2.17.0 
    @visx/tooltip: ^2.1.0 => 2.17.0 
    @visx/visx: ^3.8.0 => 3.12.0 
    @vx/responsive: ^0.0.199 => 0.0.199 
    @vx/tooltip: ^0.0.199 => 0.0.199 
    autoprefixer: 10.4.15 => 10.4.15 
    axios: ^1.7.2 => 1.7.9 
    class-variance-authority: ^0.7.0 => 0.7.1 
    classnames: ^2.3.2 => 2.5.1 
    clsx: ^2.1.1 => 2.1.1 
    cmdk: 1.0.0 => 1.0.0 
    date-fns: ^4.1.0 => 4.1.0 
    dayjs: ^1.11.11 => 1.11.13 
    elysia: 1.2.6 => 1.2.6 
    gsap: ^3.12.5 => 3.12.5 
    hash-string: ^1.0.0 => 1.0.0 
    hoist-non-react-statics: ^3.3.2 => 3.3.2 
    jest: ^29.7.0 => 29.7.0 
    jest-environment-jsdom: ^29.7.0 => 29.7.0 
    jsonwebtoken: ^9.0.2 => 9.0.2 
    lodash: ^4.17.21 => 4.17.21 
    lucide-react: ^0.395.0 => 0.395.0 
    mixpanel: ^0.18.0 => 0.18.0 
    nanoid: ^5.0.7 => 5.0.9 
    next: ^15.1.0 => 15.1.4 
    next-mdx-remote: ^5.0.0 => 5.0.0 
    node-fetch: ^3.3.2 => 3.3.2 
    p-queue: ^8.0.1 => 8.0.1 
    path-browserify: ^1.0.1 => 1.0.1 
    pg: ^8.11.3 => 8.13.1 
    plaid: ^26.0.0 => 26.0.0 
    postcss: ^8.4.34 => 8.4.49 
    posthog-js: ^1.204.0 => 1.205.0 
    posthog-node: ^4.3.2 => 4.3.2 
    react: ^19.0.0 => 19.0.0 
    react-day-picker: ^8.10.1 => 8.10.1 
    react-dom: ^19.0.0 => 19.0.0 
    react-hook-form: ^7.53.2 => 7.54.2 
    react-plaid-link: ^3.5.2 => 3.6.1 
    react-spinners: ^0.14.1 => 0.14.1 
    rehype-pretty-code: ^0.13.2 => 0.13.2 
    rehype-react: ^8.0.0 => 8.0.0 
    remark-gfm: ^4.0.0 => 4.0.0 
    remark-parse: ^11.0.0 => 11.0.0 
    remark-prism: ^1.3.6 => 1.3.6 
    remark-rehype: ^11.1.0 => 11.1.1 
    resend: ^2.0.0-canary.0 => 2.1.0 
    rsuite: ^5.43.0 => 5.76.3 
    server-only: ^0.0.1 => 0.0.1 
    shiki: ^1.11.0 => 1.26.1 
    slugify: ^1.6.6 => 1.6.6 
    stripe: ^14.5.0 => 14.25.0 
    tailwind-merge: ^2.3.0 => 2.6.0 
    tailwindcss: 3.3.3 => 3.3.3 
    tailwindcss-animate: ^1.0.7 => 1.0.7 
    ts-invariant: ^0.10.3 => 0.10.3 
    type-fest: ^4.31.0 => 4.32.0 
    typescript: 5.1.6 => 5.1.6 
    zod: ^3.23.8 => 3.24.1
@sigmachirality sigmachirality added the needs-triage A ticket that needs to be triaged by a team member label Jan 15, 2025
@colinclerk
Copy link
Member

Hello and thank you for the report. We're going to discuss tomorrow whether it's appropriate to adjust our Next.js SDK to cache by default, or to expose an option.

I did want to quickly address this concern:

However, a large percentage of people host NextJS apps on Vercel., or any other public cloud. When people host NextJS apps on Vercel (or most other public clouds), they share Vercel's public IP ranges. I'm sure the rate limit for Vercel to Clerk is specifically set to be higher under the table, but the result is still that if a few people inside a Vercel IP range update to Next 15 without exhaustively auditing their caching semantics to the point of thinking about the caching semantics of their external modules, their codebase can clobber everyone else hosted within that IP range, resulting in 422 rate limits the others have no control over.

We just verified that our rate limits over this API are per-application rather than per IP, so you will not be impacted by others accessing from within Vercel. This is an error in our documentation that we will update.

@brkalow
Copy link
Member

brkalow commented Jan 15, 2025

Appreciate the detailed writeup! Additionally as far as we're aware, Next.js still has request-level fetch de-duping by way of "request memoization".

Passing the explicit cache option to fetch() would opt the request into Next.js's data cache, which I believe spans requests and deployments (at least on Vercel). This is definitely a lever we can look at exposing for our consumers.

@sigmachirality
Copy link
Author

sigmachirality commented Jan 15, 2025

Yes, I quite like the explicit cache semantics of the Astro library, so having the ability to cache and evict users more aggressively in clerk nextjs module userland code would be great.

@sigmachirality
Copy link
Author

Thank you for the quick response/corrections! next/vercel has a lot of caching nowadays and it's not clear what is cached and where sometimes, so happy to be corrected where wrong!

@sigmachirality
Copy link
Author

sigmachirality commented Jan 15, 2025

We just verified that our rate limits over this API are per-application rather than per IP, so you will not be impacted by others accessing from within Vercel. This is an error in our documentation that we will update.

ok, that makes much more sense, we have a cron job (that I have been meaning to rewrite using webhooks) on a separate backend microservice with the same API key is probably eating up all the requests.

i would still be interested in more explicit cache semantics, as we will probably have to start caching within our codebase otherwise as we continue to scale

@panteliselef
Copy link
Member

I'd like to know more about how you are utilizing currentUser() in your code base. Could some of its usage be replace with auth() which does not have any rate limiting ?

@JohnPhamous
Copy link

@panteliselef I'm working with @sigmachirality.

We were calling currentUser() and getOrganization() in our auth code paths e.g. checking if a user is logged in. We've switched to using auth(). We also needed to get some data from user.publicMetadata but have switched to sessionClaims.

@sigmachirality
Copy link
Author

In some cases we're running into race conditions with sessionClaims where it takes a few seconds to update but we try to read it immediately (for example, to determine whether to redirect a user).

However in those cases we're probably going to move those values into our backend/db to avoid rate limits.

I think it might be good for someone to revisit the custom onboarding docs. I followed the general approach outlined in the article, but ran into the aforementioned race condition (sessionClaims being stale for a few seconds) resulting in redirect loops and general bugs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-triage A ticket that needs to be triaged by a team member
Projects
None yet
Development

No branches or pull requests

5 participants