Fix LRU crash when getting too many random lua scripts (valkey-io#1310)
### Problem
Valkey stores scripts in a dictionary (lua_scripts) keyed by their SHA1
hashes, but it also needs to know which scripts are least recently
used. It uses an LRU list (lua_scripts_lru_list) to keep track of
scripts in usage order. When the list reaches its maximum length, Valkey
evicts the oldest scripts from both the list and the dictionary to free
memory. The problem is that the LRU list and the dictionary shared the
same sds for each sha. Active defrag can move the dictionary's sds to a
new allocation and free the old one, leaving the LRU list pointing at
freed memory. When that stale sds is later used to look up the script
for eviction, the lookup fails, which results in an assertion failure at
[this line](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L519).

### Solution
If we duplicate the sds when adding it to the LRU list, the list holds an
independent copy of the script identifier (sha). This keeps the sha
string in the LRU list stable and unaffected by any defragmentation that
could move or free the original sds. In addition, dictUnlink doesn't
require an exact pointer match
([ref](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L71-L78));
keys are compared by content, so the copy of the sds still unlinks the
right dictEntry.
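To make the idea concrete, here is a minimal, self-contained C sketch under simplifying assumptions: a toy fixed-size table stands in for Valkey's dict, a plain array stands in for the adlist LRU list, and only eviction of the oldest entry is modeled (no EVALSHA promotion). All names (`cache_add`, `fake_defrag`, `LRU_CAP`, `dict_find`, ...) are hypothetical. `fake_defrag` moves every table key to a new allocation, standing in for active defrag. Because the queue holds its own copies and lookup compares by content, eviction still finds the entry; if the queue stored the same pointers as the table, `lru[0]` would dangle after `fake_defrag`, which is the shape of the reported crash.

```c
#include <assert.h>
#include <stdio.h>   /* snprintf, used in the usage example */
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model: a fixed-size "dict" of script SHAs and an LRU
 * queue (oldest first) that owns independent copies of those SHAs. */
#define LRU_CAP 4      /* hypothetical cap; the commit's test shows 500 */
#define DICT_SLOTS 16  /* must exceed LRU_CAP so a free slot always exists */

static char *dict_keys[DICT_SLOTS]; /* NULL marks an empty slot */
static char *lru[LRU_CAP];          /* lru[0] is the oldest entry */
static int lru_len = 0;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, s, n);
    return p;
}

/* Look a key up by content (strcmp), not by pointer identity. */
static int dict_find(const char *key) {
    for (int i = 0; i < DICT_SLOTS; i++)
        if (dict_keys[i] != NULL && strcmp(dict_keys[i], key) == 0)
            return i;
    return -1;
}

static int dict_size(void) {
    int n = 0;
    for (int i = 0; i < DICT_SLOTS; i++)
        if (dict_keys[i] != NULL)
            n++;
    return n;
}

/* Stand-in for active defrag: move every dict key to a new allocation
 * and free the old one. Only the dict's own pointers are updated. */
static void fake_defrag(void) {
    for (int i = 0; i < DICT_SLOTS; i++) {
        if (dict_keys[i] == NULL)
            continue;
        char *moved = dup_str(dict_keys[i]);
        free(dict_keys[i]);
        dict_keys[i] = moved;
    }
}

/* Add a new SHA, evicting the oldest entry first if the queue is full. */
static void cache_add(const char *sha) {
    assert(dict_find(sha) < 0); /* the sketch only handles new SHAs */
    if (lru_len == LRU_CAP) {
        char *oldest = lru[0];
        int victim = dict_find(oldest); /* content match on the copy */
        assert(victim >= 0);            /* analogous to the failed 'de' check */
        free(dict_keys[victim]);
        dict_keys[victim] = NULL;
        free(oldest); /* the queue frees its own copy, after the lookup */
        memmove(lru, lru + 1, (LRU_CAP - 1) * sizeof lru[0]);
        lru_len--;
    }
    int slot = -1;
    for (int i = 0; i < DICT_SLOTS && slot < 0; i++)
        if (dict_keys[i] == NULL)
            slot = i;
    assert(slot >= 0);
    dict_keys[slot] = dup_str(sha);
    lru[lru_len++] = dup_str(sha); /* independent copy, like sdsdup(sha) */
}
```

For example, calling `cache_add` for 1,000 distinct SHAs with `fake_defrag` after each call leaves both `lru_len` and `dict_size()` at `LRU_CAP`, with only the four most recent SHAs still present.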

### Reproduce
To reproduce it with a Tcl test:
1. Disable je_get_defrag_hint in defrag.c so that defrag is triggered often.
2. Run the following test script:
```tcl
start_server {tags {"auth external:skip"}} {

    test {Regression for script LRU crash} {
        r config set activedefrag yes
        r config set active-defrag-ignore-bytes 1
        r config set active-defrag-threshold-lower 0
        r config set active-defrag-threshold-upper 1
        r config set active-defrag-cycle-min 99
        r config set active-defrag-cycle-max 99

        for {set i 0} {$i < 100000} {incr i} {
            r eval "return $i" 0
        }
        after 5000;
    }
}
```


### Crash info
Crash report:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
14044:M 12 Nov 2024 14:51:27.054 # === ASSERTION FAILED ===
14044:M 12 Nov 2024 14:51:27.054 # ==> eval.c:556 'de' is not true

------ STACK TRACE ------

Backtrace:
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaDeleteFunction+0x148)[0x723708]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaCreateFunction+0x26c)[0x724450]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](evalCommand+0x2bc)[0x7254dc]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](call+0x574)[0x5b8d14]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommand+0xc84)[0x5b9b10]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommandAndResetClient+0x11c)[0x6db63c]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processInputBuffer+0x1b0)[0x6dffd4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x6bd968]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x659634]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](amzTLSEventHandler+0x194)[0x6588d8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x750c88]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](aeProcessEvents+0x228)[0x757fa8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](redisMain+0x478)[0x7786b8]
/lib64/libc.so.6(__libc_start_main+0xe4)[0xffffa7763da4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x5ad3b0]
```
Defrag info:
```
mem_fragmentation_ratio:1.18
mem_fragmentation_bytes:47229992
active_defrag_hits:20561
active_defrag_misses:5878518
active_defrag_key_hits:77
active_defrag_key_misses:212
total_active_defrag_time:29009
```

### Test
Running the test script pushes 100,000 scripts; the LRU list stays at
its maximum length of 500 and the server does not crash:
```
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
[ok]: Regression for script LRU crash (6811 ms)
[1/1 done]: unit/test (7 seconds)
```

---------

Signed-off-by: Seungmin Lee <[email protected]>
Signed-off-by: Seungmin Lee <[email protected]>
Co-authored-by: Seungmin Lee <[email protected]>
Co-authored-by: Binbin <[email protected]>
3 people authored Nov 19, 2024
1 parent f9d0b87 commit 3d0c834
Showing 1 changed file with 6 additions and 6 deletions.
src/eval.c
@@ -199,10 +199,12 @@ void scriptingInit(int setup) {
     }
 
     /* Initialize a dictionary we use to map SHAs to scripts.
-     * Initialize a list we use for lua script evictions, it shares the
-     * sha with the dictionary, so free fn is not set. */
+     * Initialize a list we use for lua script evictions.
+     * Note that we duplicate the sha when adding to the lru list due to defrag,
+     * and we need to free them respectively. */
     lctx.lua_scripts = dictCreate(&shaScriptObjectDictType);
     lctx.lua_scripts_lru_list = listCreate();
+    listSetFreeMethod(lctx.lua_scripts_lru_list, (void (*)(void *))sdsfree);
     lctx.lua_scripts_mem = 0;
 
     luaRegisterServerAPI(lua);
@@ -518,9 +520,6 @@ void luaDeleteFunction(client *c, sds sha) {
     dictEntry *de = dictUnlink(lctx.lua_scripts, sha);
     serverAssertWithInfo(c ? c : lctx.lua_client, NULL, de);
     luaScript *l = dictGetVal(de);
-    /* We only delete `EVAL` scripts, which must exist in the LRU list. */
-    serverAssert(l->node);
-    listDelNode(lctx.lua_scripts_lru_list, l->node);
     lctx.lua_scripts_mem -= sdsAllocSize(sha) + getStringObjectSdsUsedMemory(l->body);
     dictFreeUnlinkedEntry(lctx.lua_scripts, de);
 }
@@ -549,11 +548,12 @@ listNode *luaScriptsLRUAdd(client *c, sds sha, int evalsha) {
         listNode *ln = listFirst(lctx.lua_scripts_lru_list);
         sds oldest = listNodeValue(ln);
         luaDeleteFunction(c, oldest);
+        listDelNode(lctx.lua_scripts_lru_list, ln);
         server.stat_evictedscripts++;
     }
 
     /* Add current. */
-    listAddNodeTail(lctx.lua_scripts_lru_list, sha);
+    listAddNodeTail(lctx.lua_scripts_lru_list, sdsdup(sha));
     return listLast(lctx.lua_scripts_lru_list);
 }
 
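The reordering in luaScriptsLRUAdd follows from the new ownership: the list's free method is now sdsfree, and luaDeleteFunction still reads `sha` (in dictUnlink and sdsAllocSize), so the list node can only be deleted after that call returns. The sketch below illustrates this use-then-free order with a hypothetical minimal doubly linked list that owns its values through a free callback. It is loosely modeled on the idea behind adlist's listSetFreeMethod; none of its names (`list_del_node`, `account_removal`, `evict_oldest`, ...) are Valkey APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal list whose nodes own their values. */
typedef struct node {
    struct node *prev, *next;
    void *value;
} node;

typedef struct {
    node *head, *tail;
    size_t len;
    void (*free_fn)(void *); /* called on a node's value when it is deleted */
} list;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, s, n);
    return p;
}

static void list_add_tail(list *l, void *value) {
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
}

/* Unlink a node and release its value through the owner's free callback. */
static void list_del_node(list *l, node *n) {
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    if (l->free_fn) l->free_fn(n->value);
    free(n);
    l->len--;
}

/* Hypothetical stand-in for luaDeleteFunction's memory accounting:
 * it reads the key, so the key must still be alive when it runs. */
static size_t accounted_bytes = 0;
static void account_removal(const char *sha) {
    accounted_bytes += strlen(sha);
}

/* Evict the oldest entry: use the value first, delete the node second. */
static void evict_oldest(list *l) {
    node *oldest = l->head;
    assert(oldest != NULL);
    account_removal(oldest->value); /* safe: the copy is still allocated */
    list_del_node(l, oldest);       /* free_fn now releases the copy */
}
```

Swapping the two calls in `evict_oldest` would make `account_removal` read freed memory, which is the same hazard the diff avoids by calling listDelNode after luaDeleteFunction.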