Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for scripting engines as Valkey modules #1277

Open
wants to merge 18 commits into
base: unstable
Choose a base branch
from

Conversation

rjd15372
Copy link
Contributor

@rjd15372 rjd15372 commented Nov 8, 2024

This PR extends the module API to support the addition of different scripting engines to execute user defined functions.

The scripting engine can be implemented as a Valkey module, and can be dynamically loaded with the loadmodule config directive, or with the MODULE LOAD command.

This PR also adds an example of a dummy scripting engine module, to show how to use the new module API. The dummy module is implemented in tests/modules/helloscripting.c.

The current module API support, only allows to load scripting engines to run functions using FCALL command.

The additions to the module API are the following:

/* Registers a new scripting engine in the server.
 *
 * - `engine_name`: the name of the scripting engine. This name will match
 *   against the engine name specified in the script header using a shebang.
 *
 * - `engine_ctx`: engine specific context pointer.
 *
 * - `create_functions_library_func`: Library create function callback. When a
 *   new script is loaded, this callback will be called with the script code,
 *   and returns a list of ValkeyModuleScriptingEngineCompiledFunc objects.
 *
 * - `call_function_func`: the callback function called when `FCALL` command is
 *   called on a function registered in this engine.
 *
 * - `get_used_memory_func`: function callback to get current used memory by the
 *   engine.
 *
 * - `get_function_memory_overhead_func`: function callback to return memory
 *   overhead for a given function.
 *
 * - `get_engine_memory_overhead_func`: function callback to return memory
 *   overhead of the engine.
 *
 * - `free_function_func`: function callback to free the memory of a registered
 *   engine function.
 */
int ValkeyModule_RegisterScriptingEngine(ValkeyModuleCtx *ctx,
                                         const char *engine_name,
                                         void *engine_ctx,
                                         ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc create_functions_library_func,
                                         ValkeyModuleScriptingEngineCallFunctionFunc call_function_func,
                                         ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory_func,
                                         ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead_func,
                                         ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead_func,
                                         ValkeyModuleScriptingEngineFreeFunctionFunc free_function_func);

/* Removes the scripting engine from the server.
 *
 * `engine_name` is the name of the scripting engine.
 *
 */
int ValkeyModule_UnregisterScriptingEngine(ValkeyModuleCtx *ctx, const char *engine_name);

@rjd15372
Copy link
Contributor Author

rjd15372 commented Nov 8, 2024

I'm opening this PR in draft mode because I still need to fix test failures, and add new tests as well.

Copy link

codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 77.88018% with 48 lines in your changes missing coverage. Please review.

Project coverage is 70.84%. Comparing base (653d5f7) to head (b66a169).
Report is 22 commits behind head on unstable.

Files with missing lines Patch % Lines
src/functions.c 72.11% 29 Missing ⚠️
src/module.c 10.00% 18 Missing ⚠️
src/function_lua.c 98.41% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1277      +/-   ##
============================================
+ Coverage     70.63%   70.84%   +0.21%     
============================================
  Files           116      118       +2     
  Lines         63303    63691     +388     
============================================
+ Hits          44711    45119     +408     
+ Misses        18592    18572      -20     
Files with missing lines Coverage Δ
src/rdb.c 76.63% <100.00%> (+0.05%) ⬆️
src/script_lua.c 89.92% <100.00%> (-0.51%) ⬇️
src/util.c 71.89% <100.00%> (+0.16%) ⬆️
src/function_lua.c 98.87% <98.41%> (-0.32%) ⬇️
src/module.c 9.64% <10.00%> (-0.01%) ⬇️
src/functions.c 91.81% <72.11%> (-3.73%) ⬇️

... and 28 files with indirect coverage changes

@rjd15372 rjd15372 marked this pull request as ready for review November 8, 2024 14:52
@rjd15372 rjd15372 force-pushed the engine-api-1261 branch 2 times, most recently from 4a2e223 to 1a808b2 Compare November 8, 2024 16:13
@rjd15372
Copy link
Contributor Author

rjd15372 commented Nov 8, 2024

I just realized I still need to implement the MODULE UNLOAD support. The existing code was not prepared to allow the deletion of a scripting engine.

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Nov 11, 2024
@madolson
Copy link
Member

Core team said we are directionally aligned with, we still want someone to take a deeper look to make sure the details makes sense.

@rjd15372 rjd15372 force-pushed the engine-api-1261 branch 5 times, most recently from fe56090 to 9d4b296 Compare November 11, 2024 17:35
@rjd15372
Copy link
Contributor Author

The PR is ready for review.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like great progress. Is it ready for a proper review or are you going to change much more?

Shall we do EVAL as a follow-up PR?

Each engine has only one context which is used for all calls, right? Would it be possible for an engine to use a context (or sub-context, etc.) for each client? This idea came up in the LuaJIT discussion by @secwall in #1229. A separate Lua context per client would isolate each client from each other, which compensates for the fact that Lua isn't completely sandboxed by design. Maybe the engine can organize this by itself, if the engine just has access to the current client at the time of a FCALL or EVAL?

tests/modules/helloscripting.c Outdated Show resolved Hide resolved
src/functions.c Outdated Show resolved Hide resolved
tests/unit/moduleapi/scriptingengine.tcl Show resolved Hide resolved
src/valkeymodule.h Outdated Show resolved Hide resolved
@rjd15372
Copy link
Contributor Author

This looks like great progress. Is it ready for a proper review or are you going to change much more?

It's ready. I don't think I'll add more changes apart from the changes asked by the reviewers.

Shall we do EVAL as a follow-up PR?

Yes, that's the plan.

Each engine has only one context which is used for all calls, right? Would it be possible for an engine to use a context (or sub-context, etc.) for each client? This idea came up in the LuaJIT discussion by @secwall in #1229. A separate Lua context per client would isolate each client from each other, which compensates for the fact that Lua isn't completely sandboxed by design. Maybe the engine can organize this by itself, if the engine just has access to the current client at the time of a FCALL or EVAL?

I'll take a look if it's possible, and will open a follow-up PR with the changes.

This commit extends the module API to support the addition of different
scripting engines to run user defined functions.

The scripting engine can be implemented as a Valkey module, and can be
dynamically loaded with the `loadmodule` config directive, or with
the `MODULE LOAD` command.

The current module API support, only allows to load scripting engines to
run functions using `FCALL` command.

In a follow up PR, we will move the Lua scripting engine implmentation
into its own module.

Signed-off-by: Ricardo Dias <[email protected]>
This commit adds a module with a very simple stack based scripting
language implementation to test the new module API that allows to
implement new scripting engines as modules.

Signed-off-by: Ricardo Dias <[email protected]>
@zuiderkwast
Copy link
Contributor

Can you avoid force-pushing? It's easier to review what's changed since last time if you just push small commits. You can even merge unstable to it, rather than rebase. We squash-merge PRs anyway in the end and we use the PR title and description as the commit message.

@rjd15372
Copy link
Contributor Author

Can you avoid force-pushing? It's easier to review what's changed since last time if you just push small commits. You can even merge unstable to it, rather than rebase. We squash-merge PRs anyway in the end and we use the PR title and description as the commit message.

Ah sorry. I wasn't aware that we were squashing the commits upon merge. I'll stop force-pushing.

@rjd15372
Copy link
Contributor Author

Today I added more test cases to test the error code paths.

@rjd15372
Copy link
Contributor Author

@zuiderkwast I've pushed a small update that refactors a bit the code. I've already have branch with the implementation of the EVAL using the scripting engine module API almost done.

It would be good to finish the review of this PR and merge it, so we can focus on the next PRs.

… functionLibInfo in most places

Signed-off-by: Ricardo Dias <[email protected]>
- Removed the `ValkeyModuleScriptingEngineFunctionLibrary` structure.
- Changed a few structure and parameters names.

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
@rjd15372 rjd15372 marked this pull request as ready for review November 27, 2024 10:51
@rjd15372
Copy link
Contributor Author

Updated the PR description with the new API functions added by this PR.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks close now. I think we can merge it soon and add EVAL in the other PR, to keep each PR small.

Regarding the sds strings that can't be used in the module API: Did you settle with char * for the strings or are you still considering ValkeyModuleString AKA robj * for the strings? Just curious.

src/module.c Outdated Show resolved Hide resolved
src/strl.c Outdated Show resolved Hide resolved
Comment on lines +802 to +808
typedef struct ValkeyModuleScriptingEngineCompiledFunction {
char *name; /* Function name */
void *function; /* Opaque object representing a function, usually it'
the function compiled code. */
char *desc; /* Function description */
uint64_t f_flags; /* Function flags */
} ValkeyModuleScriptingEngineCompiledFunction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these structs be opaque and defined in module.c? If not, why?

For the other non-opaque structs, we've often included a version field, so we can update it in the future by adding more fields at the end. The valkeymodule.h is copied in each module and initialized in a way that permits us to add more fields in the struct. Here is an example:

#define VALKEYMODULE_CLIENTINFO_VERSION 1
typedef struct ValkeyModuleClientInfo {
    uint64_t version; /* Version of this structure for ABI compat. */
    uint64_t flags;   /* VALKEYMODULE_CLIENTINFO_FLAG_* */
    uint64_t id;      /* Client ID. */
    char addr[46];    /* IPv4 or IPv6 address. */
    uint16_t port;    /* TCP port. */
    uint16_t db;      /* Selected DB. */
} ValkeyModuleClientInfoV1;
#define ValkeyModuleClientInfo ValkeyModuleClientInfoV1
#define VALKEYMODULE_CLIENTINFO_INITIALIZER_V1 {.version = 1}

Should we do this here too? If not, why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the ValkeyModuleScriptingEngineCompiledFunction struct, this struct is must be populated by the implementation of the scripting engine, and then its fields are read individually by the core implementation. We can make it opaque by providing a new VM_* function that accepts the fields as parameters and returns a pointer to the structure, but I'm not sure we gain much by doing that, specially if in the future we want to add/remove fields from this struct.

Regarding the structure of non-opaque structs to support versioning, I think it's a good idea. I was not aware of this pattern. I'll do the same thing with the non-opaque structs created in this PR and in the following ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I agree we don't need a VM function just to populate a struct.

Let it be non-opaque and use version field. 👍

I'll do the same thing with the non-opaque structs created in this PR and in the following ones.

Good, at least whenever it makes sense. Maybe there are exceptions, where we don't need it to be extensible. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I introduced a "versioned struct" to store the engine callbacks, the other structs used by the callback functions don't need to be versioned because they will all be under the umbrella of the engine callback structs version value.

src/valkeymodule.h Outdated Show resolved Hide resolved
ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory_func,
ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead_func,
ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead_func,
ValkeyModuleScriptingEngineFreeFunctionFunc free_function_func) VALKEYMODULE_ATTR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add an optional callback to kill a script or function? It can be invoked by SCRIPT KILL or FUNCTION KILL.

Thinking about extenisility: If we want to add more callbacks in the future, we will have to add another API function. Is it likely that we will want to do that? It's easier if don't need to do that; if we manage to get it right from the beginning. :)

Copy link
Contributor

@zuiderkwast zuiderkwast Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another style of the API could be to use a struct with all the callbacks and then just register the struct.

The struct could have a version field for future extensibility.

Pros and cons? What do you think?

int VM_RegisterScriptingEngine(ValkeyModuleCtx *ctx,
                               const char *engine_name,
                               void *engine_ctx,
                               ValkeyModuleScriptingEngineCallbacks *callbacks);

typedef struct {
    int version; /* version field for ABI compatibility */
    ... callbacks ...
    ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory_func;
    ... etc ...
} ValkeyModuleScriptingEngineCallbacksV1;
typedef ValkeyModuleScriptingEngineCallbacksV1 ValkeyModuleScriptingEngineCallbacks;

/* Modules use this to initialize version field. */
#define VALKEYMODULE_SCRIPTING_ENGINE_CALLBACKS_VERSION 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to use a struct to group the callbacks because of the extensibility argument. I will change it.

@rjd15372
Copy link
Contributor Author

rjd15372 commented Dec 4, 2024

Regarding the sds strings that can't be used in the module API: Did you settle with char * for the strings or are you still considering RedisModuleString AKA robj * for the strings? Just curious.

We will use char* for strings.

@rjd15372
Copy link
Contributor Author

rjd15372 commented Dec 4, 2024

@zuiderkwast I'm now looking at the structure of ValkeyModuleTypeMethods, and how is used, and the function pointers definitions are duplicated in valkeymodule.h and server.h. Probably I will refactor this PR to follow the same pattern, and avoid using ValkeyModule* types in the core code.

@zuiderkwast
Copy link
Contributor

@zuiderkwast I'm now looking at the structure of ValkeyModuleTypeMethods, and how is used, and the function pointers definitions are duplicated in valkeymodule.h and server.h. Probably I will refactor this PR to follow the same pattern, and avoid using ValkeyModule* types in the core code.

Good finding! I'm not that familiar with ValkeyModuleTypeMethods, but yes it looks like a similar thing. The module struct just adds versioning, right? Do this if you think it's reasonable.

Comment on lines +57 to +61
/* ValkeyModule type aliases for scripting engine structs and types. */
typedef ValkeyModuleScriptingEngineCtx EngineCtx;
typedef ValkeyModuleScriptingEngineFunctionCtx FunctionCtx;
typedef ValkeyModuleScriptingEngineCompiledFunction CompiledFunction;
typedef ValkeyModuleScriptingEngineMethods EngineMethods;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Just thinking, apart from in the module API, we use a lowercase first letter for types and structs, at least in most cases (or always? I'm not sure). For consistency, it's probably best to follow the existing style.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants