Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix #129] Fix register/de-register bug when process is restarted out-of-band. #142

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

apinstein
Copy link

Fixes a bug when a process is restarted out-of-band wherby the
original pid was never unwatched and thus would trigger
false-positives by god on noticing process exits.

I am not 100% sure that this doesn't cause problems elsewhere. Not exactly sure how to test and don't know enough about the overall architecture to know what I don't know about how all this works across different process mechanisms.

In any case I'm going to use this on my prod box for a bit and see how it goes.

Would love to have additional feedback.

@eric
Copy link
Collaborator

eric commented Aug 6, 2013

Nice! I agree the current logic isn't right.

I think we could change the definition of pid() to something like this:

      def pid
        @pid ||= self.pid_file ? File.read(self.pid_file).strip.to_i : self.watch.pid
      end

and leave the rest of the logic as-is and it would do what we intended.

@apinstein
Copy link
Author

Yeah I suppose that could work as well. I was just trying to keep the concern separate since I didn't know what else was going on.

Thanks for reviewing!

On Aug 6, 2013, at 5:15 PM, Eric Lindvall [email protected] wrote:

Nice! I agree the current logic isn't right.

I think we could change the definition of pid() to something like this:

  def pid
    @pid ||= self.pid_file ? File.read(self.pid_file).strip.to_i : self.watch.pid
  end

and leave the rest of the logic as-is and it would do what we intended.


Reply to this email directly or view it on GitHub.

@apinstein
Copy link
Author

@eric your idea only half worked; the pid needed to be cleared out on de-register so that it would re-bootstrap itself based on the watch/pid file. Otherwise it would continue monitoring the prior pid.

What do you think?

@chetan
Copy link

chetan commented Jan 24, 2015

I've been running into this issue for a while (and #127 as well) but came up with a slightly different fix which you can see here:

https://github.com/chetan/god/compare/dangling_process_exit

In addition, I changed my config to disable the clean_pid_file behavior and changed the process_exit condition to transition to :init instead of :start, so that it has a chance to pickup the new pid instead of just blindly trying to start and throw errors.

edit: I'm not 100% sure that the Monitor synchronization is necessary, but it does seem there are possible race conditions there otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants