ASP.net Cluster Raft question to configure full in memory mode #200
-
Hi again, and thank you once more for this awesome library. I think this is my last question for SlimFaas: for security reasons, in some scenarios I would like to be able to set up the Raft cluster in full in-memory mode instead of using the hard disk. I would like to be able to set a mode without any disk writes (as we can do with Redis). Could you point me to the procedure to follow (classes to override, etc.)? Or maybe it is already possible by using the MemoryBased implementation and configuration? Regards,
Replies: 10 comments 24 replies
-
For security reasons, you can use However, I strongly discourage doing so. Persistence is a major characteristic of Raft. Let's imagine you have 7 nodes. Due to a failure, 4 of them go down. Due to the lack of persistence, they lose their state. On restart, they conflict with the other 3 nodes, whose WALs are not empty. The cluster will not be able to reach a majority without human intervention.
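The 7-node scenario above comes down to simple majority arithmetic. The following is a language-neutral sketch in Python, not DotNext code: the function names are hypothetical, and the model only assumes the standard Raft rule that an entry is committed once it is stored on a strict majority of nodes.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Number of nodes that may be down while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


def committed_entry_survives(cluster_size: int, nodes_losing_state: int) -> bool:
    """Worst case: a committed entry is stored on exactly `quorum` nodes,
    and the nodes that lose their state are taken from that set first."""
    return nodes_losing_state < quorum(cluster_size)


if __name__ == "__main__":
    n = 7
    print("quorum:", quorum(n))
    print("tolerated failures:", tolerated_failures(n))
    for lost in (3, 4):
        print(lost, "nodes lose state, entry survives:",
              committed_entry_survives(n, lost))
```

With 7 nodes, a majority is 4, so 3 failures are tolerated. Once 4 nodes restart with empty logs, a committed entry may no longer exist on any node that kept its state, which is why persistence matters for the guarantee.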
-
Thank you so much for such a rich response, @sakno! Thank you for your example of a crash of a majority of nodes; I understand why. In that case, how can I detect it? I would like to be able to reset the whole state. I accept a loss of state (that is the drawback of memory mode). I do not know how to do this for my case inside Kubernetes for SlimFaas: https://github.com/AxaFrance/slimfaas My needs:
-
Thank you again, @sakno, for such a rich answer and the references.
-
Hi @sakno, it is not perfect yet, but it is already awesome: https://youtu.be/hxRfvJhWW1w?si=48CKFCGxoReMdbIk Thank you @sakno
-
FWIW, we also plan to try full in-memory mode (via the OS, in RAM) for different reasons: the disk at hand is eMMC, and the OS + flash controller combination ends up with sporadic slow writes in the long term: https://forums.raspberrypi.com/viewtopic.php?p=2148709. While there are options to avoid this in some cases (we have not had success with them on all systems, though), we expect the write rate to still become an issue over longer timeframes due to eMMC wear. Thinking about it, we also realized that in scenarios where only the distributed state is needed but not persistence, removing the disk speed tax is very desirable. @sakno, I am confused by the mention of 4 failed nodes out of 7. I thought that is beyond Raft's requirements, since it requires half + 1 of the nodes to be working. I thought there are already other ways in which 4 failed nodes break Raft's properties? That said, how does that scenario play out?
You mentioned the cluster would not be able to reach a majority without human intervention. Why is this? Is there anything else that would prevent the 3 nodes from being elected?
-
@guillaume-chervet, a new release with native support for .NET 8 has been published.
-
Hi @sakno, thank you so much for your awesome work.
-
Thank you @sakno, the migration worked well with your information :) When I activate trimming, I get an error from CommandInterpreter. It also generates trimming warnings in the log.
2024-02-29 13:34:34 fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
2024-02-29 13:34:34 An unhandled exception has occurred while executing the request.
2024-02-29 13:34:34 DotNext.Net.Cluster.Consensus.Raft.Commands.CommandInterpreter+UnknownCommandException: Command with id 2 cannot be recognized by the interpreter
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.Commands.CommandInterpreter.InterpretingTransformation..ctor(Int32, IHandlerRegistry) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Commands/CommandInterpreter.Registry.cs:line 24
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.Commands.CommandInterpreter.InterpretAsync[TEntry](TEntry, CancellationToken ) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Commands/CommandInterpreter.cs:line 129
2024-02-29 13:34:34 at RaftNode.SlimPersistentState.UpdateValue(LogEntry)
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ApplyAsync(Int32, Int64, CancellationToken) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 685
2024-02-29 13:34:34 at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16)
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.CommitAndCompactSequentiallyAsync(Nullable`1, CancellationToken) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 405
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.PersistentState.AppendAndCommitSlowAsync[TEntry](ILogEntryProducer`1, Int64, Boolean, Int64, CancellationToken) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 472
2024-02-29 13:34:34 at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16)
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId, Int64, ILogEntryProducer`1, Int64, Int64, Int64, IClusterConfiguration, Boolean, CancellationToken) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 616
2024-02-29 13:34:34 at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16)
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.AppendEntriesAsync(HttpRequest, HttpResponse, CancellationToken) in /_/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftHttpCluster.Messaging.cs:line 313
2024-02-29 13:34:34 at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.AppendEntriesAsync(HttpRequest, HttpResponse, CancellationToken) in /_/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftHttpCluster.Messaging.cs:line 314
2024-02-29 13:34:34 at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<Invoke>g__Awaited|10_0(ExceptionHandlerMiddlewareImpl, HttpContext, Task)
-
Thank you again @sakno, it works: AxaFrance/SlimFaas@3558e85
-
I have tried to activate AOT. Do you have any idea, @sakno, what could help? https://github.com/AxaFrance/SlimFaas/pull/37/files
2024-03-04 13:50:19 Starting in namespace slimfaas-demo
2024-03-04 13:50:20 Starting node slimfaas-0
2024-03-04 13:50:20 Node started slimfaas-0 http://10.1.2.44:3262/
2024-03-04 13:50:20 >> Configuration:
2024-03-04 13:50:20 - partitioning:false
2024-03-04 13:50:20 - lowerElectionTimeout:400
2024-03-04 13:50:20 - upperElectionTimeout:800
2024-03-04 13:50:20 - requestTimeout:00:01:20.0000000
2024-03-04 13:50:20 - rpcTimeout:00:00:40.0000000
2024-03-04 13:50:20 - publicEndPoint:http://10.1.2.44:3262/
2024-03-04 13:50:20 - coldStart:false
2024-03-04 13:50:20 - requestJournal:memoryLimit:5
2024-03-04 13:50:20 - requestJournal:expiration:00:01:00
2024-03-04 13:50:20 - heartbeatThreshold:0.2
2024-03-04 13:50:20 CORS Allowing origins: *
2024-03-04 13:50:20 CORS Allowing all origins
2024-03-04 13:50:20 Raft cluster has no leader
2024-03-04 13:50:20 Raft cluster has no leader
2024-03-04 13:50:20 SlimDataSynchronizationWorker: Start
2024-03-04 13:50:20 Raft cluster has no leader
2024-03-04 13:50:20 fail: Microsoft.Extensions.Hosting.Internal.Host[11]
2024-03-04 13:50:20 Hosting failed to start
2024-03-04 13:50:20 System.PlatformNotSupportedException: Operation is not supported on this platform.
2024-03-04 13:50:20 at Internal.Runtime.CompilerHelpers.ThrowHelpers.ThrowPlatformNotSupportedException() + 0x20
2024-03-04 13:50:20 at System.Func`4..ctor(Object, IntPtr) + 0x6
2024-03-04 13:50:20 at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<StartAsync>d__45.MoveNext() + 0xa7
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.Hosting.Internal.Host.<<StartAsync>b__15_1>d.MoveNext() + 0xc9
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.Hosting.Internal.Host.<ForeachService>d__18`1.MoveNext() + 0x3d6
2024-03-04 13:50:20 Unhandled Exception: System.PlatformNotSupportedException: Operation is not supported on this platform.
2024-03-04 13:50:20 at Internal.Runtime.CompilerHelpers.ThrowHelpers.ThrowPlatformNotSupportedException() + 0x20
2024-03-04 13:50:20 at System.Func`4..ctor(Object, IntPtr) + 0x6
2024-03-04 13:50:20 at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<>c__DisplayClass47_0.<<StopAsync>g__StopAsync|0>d.MoveNext() + 0xf4
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.<>c__DisplayClass47_0.<<StopAsync>g__StopAsync|0>d.MoveNext() + 0x1d2
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<DisposeAsyncCore>d__120.MoveNext() + 0xd0
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at DotNext.Disposable.<DisposeAsyncImpl>d__19.MoveNext() + 0x122
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.<<DisposeAsync>g__Await|26_0>d.MoveNext() + 0x11b
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.Hosting.Internal.Host.<<DisposeAsync>g__DisposeAsync|21_0>d.MoveNext() + 0x163
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.Hosting.Internal.Host.<DisposeAsync>d__21.MoveNext() + 0x4d4
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.<RunAsync>d__4.MoveNext() + 0x2dd
2024-03-04 13:50:20 --- End of stack trace from previous location ---
2024-03-04 13:50:20 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
2024-03-04 13:50:20 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
2024-03-04 13:50:20 at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(IHost) + 0x26
2024-03-04 13:50:20 at Program.<Main>$(String[] args) + 0x11bf
2024-03-04 13:50:20 at SlimFaas!<BaseAddress>+0xd30b3c
You can't. It's not a case for Raft. It expects that all committed entries are present on the majority of nodes. In case of a node failure, the node should be removed and introduced again with an empty WAL.
This is exactly what containers do. A container is stateless by default; writes to its file system are not persistent.
K8s is built on top of Raft. It has its own Leader Election service exposed through its API. Take a look at this.
Read #122 on how to scale in K8s.