
Issue Job Enqueued count incorrect/mismatch #377

Open · yadurajshakti opened this issue Oct 15, 2024 · 7 comments

Comments

@yadurajshakti

Hi Team,

We are facing an issue where the Enqueued count is incorrect/mismatched, as you can see in the screenshot below. We could not find any solution on the Hangfire forum or on GitHub yet. Could you please explain this behavior and suggest a possible fix?

[screenshot]

We are using .NET 7.0 and PostgreSQL. The Hangfire packages and versions used in the solution are:
"Hangfire.AspNetCore" Version="1.8.0"
"Hangfire.Core" Version="1.8.0"
"Hangfire.PostgreSql" Version="1.19.12"

One observation: the job is moved off the queue, but its state is not changed to either Succeeded or Failed.
[screenshot]

@azygis (Collaborator) commented Oct 15, 2024

It would be good to provide at least the configuration/setup done for Hangfire. The best case would be a minimal reproducible example.

@yadurajshakti (Author) commented Oct 21, 2024

Hi @azygis

We have done a setup similar to the one on the official website: https://docs.hangfire.io/en/latest/getting-started/aspnet-core-applications.html

The dashboard is a separate web application, and we have a console app to schedule the job.

[screenshot]

Dashboard Configuration:

public void ConfigureServices(IServiceCollection services)
{
    // Add Hangfire services.
    services.AddHangfire(configuration => configuration
        .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
        .UseSimpleAssemblyNameTypeSerializer()
        .UseRecommendedSerializerSettings()
        .UseSqlServerStorage(Configuration.GetConnectionString("HangfireConnection")));  
}

Then, in the app configuration:

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.UseHangfireDashboard(
        "/hangfire",
        new DashboardOptions
        {
            Authorization = new[] { new AuthorizationFilter() },
            AppPath = "/hangfire/home/index"
        },
        new PostgreSqlStorage(Configuration.GetConnectionString("HangfireConnection"))
    );
}

Engine configuration responsible for scheduling and executing the jobs:

private void ConfigureHangFire(HostBuilderContext hostContext, IServiceCollection services)
{
    GlobalConfiguration.Configuration.UsePostgreSqlStorage(connectionString, new PostgreSqlStorageOptions
    {
        DistributedLockTimeout = TimeSpan.FromMinutes(5),
        InvisibilityTimeout = TimeSpan.FromMinutes(20)
    });
    GlobalConfiguration.Configuration.UseSimpleAssemblyNameTypeSerializer();
    GlobalConfiguration.Configuration.UseRecommendedSerializerSettings();
    GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute { Attempts = 3 });

    var options = new BackgroundJobServerOptions
    {
        WorkerCount = 1
    };
}
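
The snippet above builds BackgroundJobServerOptions but does not show where the processing server itself is started. For context, a minimal sketch of how such a console engine is typically wired up on the .NET generic host; the hosting code below is an illustration under that assumption, not the reporter's actual code, and AddHangfireServer (from the Hangfire.AspNetCore/Hangfire.NetCore package) is just one common way to start the server:

using System;
using Hangfire;
using Hangfire.PostgreSql;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var host = Host.CreateDefaultBuilder(args)
    .ConfigureServices((context, services) =>
    {
        var connectionString = context.Configuration.GetConnectionString("HangfireConnection");

        services.AddHangfire(config => config
            .UseSimpleAssemblyNameTypeSerializer()
            .UseRecommendedSerializerSettings()
            .UsePostgreSqlStorage(connectionString, new PostgreSqlStorageOptions
            {
                DistributedLockTimeout = TimeSpan.FromMinutes(5),
                InvisibilityTimeout = TimeSpan.FromMinutes(20)
            }));

        // Registers the processing server as an IHostedService; without this
        // (or an explicit new BackgroundJobServer(...)), enqueued jobs are
        // never picked up.
        services.AddHangfireServer(options => options.WorkerCount = 1);
    })
    .Build();

await host.RunAsync();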

@azygis (Collaborator) commented Oct 22, 2024

Your dashboard configuration is using the SQL Server integration. Are you sure you copied your own configuration and not the one from the Hangfire docs?

@yadurajshakti (Author)

Hi @azygis
Here is the dashboard configuration from my application. We are using the UsePostgreSqlStorage(..) method only.

[screenshot]
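
For readers who cannot see the screenshot, the PostgreSQL counterpart of the earlier ConfigureServices snippet would look roughly like this; this is a sketch based on the description above, not a copy of the reporter's code:

public void ConfigureServices(IServiceCollection services)
{
    // Same registration as before, but with the Hangfire.PostgreSql
    // integration instead of UseSqlServerStorage.
    services.AddHangfire(configuration => configuration
        .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
        .UseSimpleAssemblyNameTypeSerializer()
        .UseRecommendedSerializerSettings()
        .UsePostgreSqlStorage(Configuration.GetConnectionString("HangfireConnection")));
}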

@azygis (Collaborator) commented Oct 23, 2024

Can you check whether the lock table has any entries? I suspect a lock was placed there and the application exited in a "kill" fashion, which prevented it from clearing the locks. Nothing picking up a job for 17 days is usually related to zombie locks.
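
For anyone checking the same thing, a read-only sketch of that lock-table query from C# with Npgsql, assuming the default "hangfire" schema created by Hangfire.PostgreSql; a plain SELECT in psql works just as well:

using System;
using Npgsql;

// Placeholder connection string; use the application's Hangfire connection.
var connectionString = "Host=localhost;Database=app;Username=app;Password=secret";

await using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();

// Default schema is "hangfire"; adjust if a custom schema name is configured.
await using var command = new NpgsqlCommand("SELECT * FROM \"hangfire\".\"lock\"", connection);
await using var reader = await command.ExecuteReaderAsync();

while (await reader.ReadAsync())
{
    // Print every column so this works across schema versions.
    for (var i = 0; i < reader.FieldCount; i++)
    {
        Console.Write($"{reader.GetName(i)} = {reader.GetValue(i)}  ");
    }
    Console.WriteLine();
}

// Long-lived rows left here after a hard kill are the "zombie locks"
// described above.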

@yadurajshakti (Author)

Thanks @azygis
This is happening mostly in our UAT/QA environments, where the deployment frequency is higher. Can you please suggest a mechanism to handle such cases?

  • Should we manually stop Hangfire jobs/engine/services during deployment?
  • Do you have any implementation to reset locks and other tables during deployments?

@azygis (Collaborator) commented Nov 4, 2024

First of all, do not kill the process. Instead of SIGKILL, send SIGTERM when you are stopping the application. I do not know what you use to stop it, hence that's the first suggestion. A proper shutdown sends a cancellation request to the jobs that are processing, as long as you use cancellation tokens. If the cancellation does not complete within some seconds (sorry, I can't remember the exact value; it's something like 5 or 10 seconds), the jobs get terminated. Locks are released right before the application stops. If the application is killed outright, there is no way for Hangfire to exit cleanly.
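
To illustrate the cancellation part: a job method can take a CancellationToken parameter, which Hangfire cancels on graceful shutdown (or when the job is deleted). The class and method names below are made up for the example:

using System;
using System.Threading;
using System.Threading.Tasks;
using Hangfire;

public class BatchJobs
{
    public async Task ProcessAsync(CancellationToken cancellationToken)
    {
        for (var i = 0; i < 100; i++)
        {
            // Stop cleanly as soon as shutdown (SIGTERM) is requested.
            cancellationToken.ThrowIfCancellationRequested();

            // Simulated unit of work.
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}

// Pass CancellationToken.None when enqueueing; Hangfire substitutes a real
// token at execution time:
// BackgroundJob.Enqueue<BatchJobs>(j => j.ProcessAsync(CancellationToken.None));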

As for stopping the engine, that is still not really possible without nasty workarounds. It is also not something the storage provider (which is what this repository/library is) can handle. We cannot clear the locks on startup or at any arbitrary time, because we might end up with a broken state.

What we did at work is add an endpoint to the applications that checks whether any jobs are enqueued or processing and, if so, wait until all of them are complete before continuing with the swarm deployment. Waiting is fine for us; YMMV.
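
A rough sketch of such a drain-check endpoint, using Hangfire's monitoring API; the route and response shape here are illustrative, not the actual implementation mentioned above:

using Hangfire;
using Hangfire.Storage;
using Hangfire.Storage.Monitoring;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
// Hangfire storage is assumed to be configured here as earlier in the thread.
var app = builder.Build();

app.MapGet("/deployment/ready", () =>
{
    IMonitoringApi monitoring = JobStorage.Current.GetMonitoringApi();
    StatisticsDto stats = monitoring.GetStatistics();

    // Deployment waits until nothing is enqueued or processing.
    var busy = stats.Enqueued + stats.Processing;
    return Results.Json(new { ready = busy == 0, enqueued = stats.Enqueued, processing = stats.Processing });
});

app.Run();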
