
Handle out-of-memory errors during DMA sequence recording #945

Open
cjbe opened this issue Mar 5, 2018 · 5 comments

Comments

@cjbe
Contributor

cjbe commented Mar 5, 2018

Trying to record a DMA sequence that is too large to fit into memory causes the core device to fall over; it would be useful if the kernel errored out with a descriptive exception.

[     8.585024s]  INFO(runtime::kern_hwreq): resetting RTIO
panic at /home/artiq/artiq-dev/artiq/artiq/firmware/liballoc_list/lib.rs:107: heap view: BUSY 0x4014c000 + 0xc + 0xc -> 0x4014c018
BUSY 0x4014c018 + 0xc + 0x18 -> 0x4014c03c
BUSY 0x4014c03c + 0xc + 0x24 -> 0x4014c06c
BUSY 0x4014c06c + 0xc + 0x1008 -> 0x4014d080
BUSY 0x4014d080 + 0xc + 0x3c -> 0x4014d0c8
BUSY 0x4014d0c8 + 0xc + 0x18 -> 0x4014d0ec
BUSY 0x4014d0ec + 0xc + 0x1008 -> 0x4014e100
BUSY 0x4014e100 + 0xc + 0x3c -> 0x4014e148
BUSY 0x4014e148 + 0xc + 0x1008 -> 0x4014f15c
BUSY 0x4014f15c + 0xc + 0x3c -> 0x4014f1a4
BUSY 0x4014f1a4 + 0xc + 0x4008 -> 0x401531b8
BUSY 0x401531b8 + 0xc + 0x3c -> 0x40153200
BUSY 0x40153200 + 0xc + 0x1008 -> 0x40154214
BUSY 0x40154214 + 0xc + 0x3c -> 0x4015425c
BUSY 0x4015425c + 0xc + 0x24 -> 0x4015428c
BUSY 0x4015428c + 0xc + 0x1008 -> 0x401552a0
BUSY 0x401552a0 + 0xc + 0x3c -> 0x401552e8
BUSY 0x401552e8 + 0xc + 0x2004 -> 0x401572f8
BUSY 0x401572f8 + 0xc + 0x2004 -> 0x40159308
BUSY 0x40159308 + 0xc + 0x3c -> 0x40159350
IDLE 0x40159350 + 0xc + 0x33c -> 0x40159698
BUSY 0x40159698 + 0xc + 0x10008 -> 0x401696ac
BUSY 0x401696ac + 0xc + 0x10008 -> 0x401796c0
BUSY 0x401796c0 + 0xc + 0x144 -> 0x40179810
BUSY 0x40179810 + 0xc + 0x198 -> 0x401799b4
BUSY 0x401799b4 + 0xc + 0x48 -> 0x40179a08
BUSY 0x40179a08 + 0xc + 0x4008 -> 0x4017da1c
IDLE 0x4017da1c + 0xc + 0x3c -> 0x4017da64
BUSY 0x4017da64 + 0xc + 0x30 -> 0x4017daa0
BUSY 0x4017daa0 + 0xc + 0x804 -> 0x4017e2b0
BUSY 0x4017e2b0 + 0xc + 0x804 -> 0x4017eac0
BUSY 0x4017eac0 + 0xc + 0x10008 -> 0x4018ead4
BUSY 0x4018ead4 + 0xc + 0x10008 -> 0x4019eae8
IDLE 0x4019eae8 + 0xc + 0x24 -> 0x4019eb18
BUSY 0x4019eb18 + 0xc + 0x10008 -> 0x401aeb2c
BUSY 0x401aeb2c + 0xc + 0x10008 -> 0x401beb40
BUSY 0x401beb40 + 0xc + 0x708 -> 0x401bf254
IDLE 0x401bf254 + 0xc + 0xeff34 -> 0x402af194
BUSY 0x402af194 + 0xc + 0xfff00 -> 0x403af0a0
IDLE 0x403af0a0 + 0xc + 0x50f54 -> 0x0
 === busy: 0x172bfc idle: 0x141224 meta: 0x1e0 total: 0x2b4000

cannot allocate: Exhausted { request: Layout { size: 2096640, align: 1 } }
backtrace for software version 4.0.dev+670.ge3938620:
0x4002851c
0x4004a0c8
0x4002940c
0x40028cfc
0x40028ed8
0x400025cc
0x4001f384
0x4001de3c
0x4000abb0
0x4000a8c0
0x4000a8c0
0x4000a8c0
0x4000a8c0
0x4000a8c0

Experiment, running on a Kasli DRTIO master (a274af7):

from artiq.experiment import *

class DMASaturate(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("core_dma")
        self.setattr_device("ttlo0")

    @kernel
    def run(self):
        self.core.reset()

        with self.core_dma.record("ttl_local"):
            for _ in range(100000):
                self.ttlo0.pulse(1*us)
                delay(1*us)
@whitequark
Contributor

Rust doesn't have fallible allocation. This is a longstanding issue that is being addressed; until then, this can't be fixed at a reasonable cost.

@dnadlinger
Collaborator

Surely we can at the very least set aside a chunk of memory ourselves on startup to service DMA buffer requests from (i.e. use a local allocator backed by a preallocated region), or use the global allocator directly and handle failure without oom()ing?

@dnadlinger
Collaborator

(Without having looked at the crash – famous last words – I presume this is from the buffer appending in rtio_dma. Can't we just replace the Vec<u8> by something that is backed by a local allocator and handles failure gracefully? Just hardcoding the amount of RAM to dedicate to DMA traces would be fine to start out with.)

@whitequark
Contributor

You'd need to "just" rewrite Vec to support specifying allocators explicitly. It seems more straightforward to wait a few days for the RFC to be implemented.

@whitequark
Contributor

I've nearly fixed the longstanding issue where error handling would require allocation, so this is one step closer to working.
