Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal errors in @axiomhq/pino cause server to crash #140

Open
charles-rc opened this issue Feb 6, 2024 · 7 comments
Open

Internal errors in @axiomhq/pino cause server to crash #140

charles-rc opened this issue Feb 6, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@charles-rc
Copy link

I think this is a similar issue to #131.

We currently use @axiomhq/pino in our Koa Node servers. Our logging (and our servers!) went down for an hour. We have Sentry errors during this time for errors internal to Axiom - Error FetchClient.<anonymous>(@axiomhq.js.dist.cjs:fetchClient) Service Unavailable.

It seems that either Axiom ingestion (via the api.axiom.co endpoint) went down for an hour or potentially our network connectivity in AWS ECS dropped.

It appears that thrown (network) errors during axiom.ingest aren't caught properly. You can reproduce this by logging with an instance of @axiomhq/pino with your internet connection disabled. The error is thrown at the root of the program so it halts execution.

I tried using the hooks functionality of pino to put a wrapper around the logs but it appears I cannot await ingest() as its promise is not returned by the log function.

I would love your advice on how to resolve this. I think either better error handling inside @axiomhq/pino or some callback we can subscribe to would be best.

I guess I could also try wrapping the instantiation of the logger in a function to attempt to catch errors there rather than at the script level?

This is obviously a bit concerning that a network error can bring down our server. Other libraries like ioredis have an onError callback that will be invoked if network connectivity drops, but it doesn't crash the server.

Thanks in advance for all your help!

@AkMo3
Copy link

AkMo3 commented Feb 6, 2024

It would be appreciable to have a rejection or onError handler. Otherwise, it is a critical problem which restricts anyone from using this in production, which can bring down production and there is no way you can handle it. Hope this gets implemented ASAP.

@Iznogohul
Copy link

Iznogohul commented Feb 7, 2024

@charles-rc I had a very similar problem-incident while using @axiomhq/js and also our node servers went down.
It would be nice to have the reason behind a failed ingest exposed as if you have internet but defined an api token that has expired or is invalid it does the same thing because you are unauthorized.
The only way I found that it can be solved for @axiomhq/js is to try catch the ingest and await the flush for each of them.

try{
  // Data ingestion as mentioned in their @axiomhq/js readme
  axiom.ingest(dataset, [{ foo: 'bar' }]); 
  await axiom.flush()
}catch(err){
  console.log(err)
}

@dasfmi
Copy link
Collaborator

dasfmi commented Feb 7, 2024

I am working on handling those exception now, finalizing the PR.

@dasfmi
Copy link
Collaborator

dasfmi commented Feb 9, 2024

we have a released a new version of axiom-js and axiom-pino that would solve the exceptions handling in axiom-js. on Monday we will continue investigating Pino package and add some tests to ensure that doesn't happen again. let us know if this works our for you, and if there is any thing we can improve on the new onError() method.

@charles-rc
Copy link
Author

charles-rc commented Feb 11, 2024

@schehata thanks for doing that!

Unfortunately, I am getting this error when attempting to pass in a custom onError callback to @axiomhq/pino:

node:internal/worker:236
    this[kPort].postMessage({
                ^
DOMException [DataCloneError]: error => {
      if ((0, _utils.shouldLogToSentry)()) {
        Sentry.captureException(error);
      } else {
...<omitted>... } could not be cloned.
    at new DOMException (node:internal/per_context/domexception:53:5)
    at new Worker (node:internal/worker:236:17)
    at createWorker (/app/node_modules/thread-stream/index.js:52:18)
    at new ThreadStream (/app/node_modules/thread-stream/index.js:224:19)
    at buildStream (/app/node_modules/pino/lib/transport.js:21:18)
    at transport (/app/node_modules/pino/lib/transport.js:114:10)
    at normalizeArgs (/app/node_modules/pino/lib/tools.js:316:16)
    at pino (/app/node_modules/pino/pino.js:90:28)
    at wrapChild (/app/node_modules/pino-http/logger.js:226:14)
    at pinoLogger (/app/node_modules/pino-http/logger.js:81:18)

I think it's to do with how pino runs inside workers. I think the current bump should fix the problem with our servers crashing as I believe you now default to catching and then console.erroring the caught exception?

However, it doesn't seem like I can pass in a custom handler to pass the error onto Sentry or the like.

Am I maybe doing something wrong?

@dasfmi
Copy link
Collaborator

dasfmi commented Feb 28, 2024

the onError handler is for js SDK not pino. I have to check pino and find a way to pass an onError handler to the logger. will get back to you on this.

@dasfmi dasfmi added the bug Something isn't working label Feb 28, 2024
@dasfmi
Copy link
Collaborator

dasfmi commented Feb 28, 2024

looks like, doing this won't be straight forward. I think the problem is that the worker can't pass a function in postMessage and that's why passing onError to options is failing.

There is another way but that will need some work, I am wondering if its worth it? or the current console.log would be enough?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants