Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hubot/Slack on Heroku disconnects/reconnects to Slack every minute on the minute #543

Closed
4 of 9 tasks
perplexes opened this issue Nov 1, 2018 · 5 comments
Closed
4 of 9 tasks
Labels
auto-triage-stale needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info

Comments

@perplexes
Copy link

Description

Hubot disconnects/reconnects to Slack every minute on the minute. This makes Hubot exceed the Slack API limits, which means Hubot cannot respond to messages.

Ref. #420 & #420 (comment)

What type of issue is this? (place an x in one of the [ ])

  • bug
  • enhancement (feature request)
  • question
  • documentation related
  • testing related
  • discussion

Requirements (place an x in each of the [ ])

  • I've read and understood the Contributing guidelines and have done my best effort to follow them.
  • I've read and agree to the Code of Conduct.
  • I've searched for any related issues and avoided creating a duplicate issue.

Bug Report

Filling out the following details about bugs will help us solve your issue sooner.

Reproducible in:

hubot-slack version: 4.5.5

node version: 6.11.1

OS version(s): heroku-18 stack

Steps to reproduce:

  1. Follow the instructions on http://slackapi.github.io/hubot-slack/
  2. Deploy to heroku (heroku create, heroku config:set HUBOT_SLACK_TOKEN, REDIS_URL, etc)

Expected result:

Hubot would just sit on slack and serve requests.

Actual result:

Hubot disconnects and reconnects every minute, which causes us to breach the API requests limits, which means Hubot is occassionally on Slack.

Attachments:

https://gist.github.com/perplexes/8aeb684001a5a7c70a66b6c5554ede47

@perplexes
Copy link
Author

Hey @aoberoi here's the new issue I've made.

@perplexes
Copy link
Author

Ping

@aoberoi
Copy link
Contributor

aoberoi commented Dec 11, 2018

Yikes, I'm so sorry it has taken so long for me to return to this issue. The whole team responsible for this repo was traveling for most of November, and then December has been us trying to catch up on all the activity since then.

So here's what I see in the log:

  • 9 seconds into having an open RTM connection, we see a warning about a reconnection taking place. This is interesting because it's so soon. Looking through the source of the RTM Client, there's only a few ways for this to occur: on start failure (which doesn't seem to be possible here since we see the success message), on team migration (which is extremely unlikely), on pong timeout (which doesn't seem to be possible since its not been long enough), and on the websocket closing (which also doesn't have an explanation, but at this point seems the most likely).

  • Immediately after this, there seems to be a reconnection attempt which fails because of rate limiting. Then another attempt starts 52 seconds later, which makes sense given the rate limiting error said that's how long Slack wants you to wait to retry. How did we reach the rate limit? That could happen if while trying to load the complete list of users from Slack on the first connection, which makes sense knowing that the DISABLE_USER_SYNC flag remedied your issue. How many users (approximately) are in this workspace if you can share? This package fetches them 100 at a time, and using cursor-based pagination, which should allow for an even more generous rate limit. Another interesting question: where are all the BaseAPIClient _makeAPICall logs for these requests if they are the reason for hitting the rate limit?

  • This cycle basically repeats over and over again. Given my estimations of likelihood above, I think that means the websocket is closing over and over again. Why a period of 1 minute? I think that just happens to be because connecting to RTM takes about 7-8 seconds, and the rate-limiting response is about 52-54 seconds. So I don't think something in your app is causing this 1 minute period, its a result of how many requests you are making and Slack's computation of the rate limit.

Are you using any specific scripts that are meant to update the code and restart the bot automatically? I've seen some of these behave badly in the past, but its a shot in the dark. If you feel comfortable sharing what's in your external-scripts.json file, I'd like to look into each of them to search for potential causes.

@seratch seratch added the needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info label Apr 14, 2020
@github-actions
Copy link

github-actions bot commented Dec 5, 2021

👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out.

@github-actions
Copy link

As this issue has been inactive for more than one month, we will be closing it. Thank you to all the participants! If you would like to raise a related issue, please create a new issue which includes your specific details and references this issue number.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-triage-stale needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info
Projects
None yet
Development

No branches or pull requests

3 participants