
Handle out-of-memory errors during DMA sequence recording #945

Open
cjbe opened this issue Mar 5, 2018 · 5 comments

Comments

@cjbe
Contributor

cjbe commented Mar 5, 2018

Trying to record a DMA sequence that is too large to fit into memory causes the core device to fall over; it would be useful if the kernel errored out with a descriptive exception.

[     8.585024s]  INFO(runtime::kern_hwreq): resetting RTIO
panic at /home/artiq/artiq-dev/artiq/artiq/firmware/liballoc_list/lib.rs:107: heap view: BUSY 0x4014c000 + 0xc + 0xc -> 0x4014c018
BUSY 0x4014c018 + 0xc + 0x18 -> 0x4014c03c
BUSY 0x4014c03c + 0xc + 0x24 -> 0x4014c06c
BUSY 0x4014c06c + 0xc + 0x1008 -> 0x4014d080
BUSY 0x4014d080 + 0xc + 0x3c -> 0x4014d0c8
BUSY 0x4014d0c8 + 0xc + 0x18 -> 0x4014d0ec
BUSY 0x4014d0ec + 0xc + 0x1008 -> 0x4014e100
BUSY 0x4014e100 + 0xc + 0x3c -> 0x4014e148
BUSY 0x4014e148 + 0xc + 0x1008 -> 0x4014f15c
BUSY 0x4014f15c + 0xc + 0x3c -> 0x4014f1a4
BUSY 0x4014f1a4 + 0xc + 0x4008 -> 0x401531b8
BUSY 0x401531b8 + 0xc + 0x3c -> 0x40153200
BUSY 0x40153200 + 0xc + 0x1008 -> 0x40154214
BUSY 0x40154214 + 0xc + 0x3c -> 0x4015425c
BUSY 0x4015425c + 0xc + 0x24 -> 0x4015428c
BUSY 0x4015428c + 0xc + 0x1008 -> 0x401552a0
BUSY 0x401552a0 + 0xc + 0x3c -> 0x401552e8
BUSY 0x401552e8 + 0xc + 0x2004 -> 0x401572f8
BUSY 0x401572f8 + 0xc + 0x2004 -> 0x40159308
BUSY 0x40159308 + 0xc + 0x3c -> 0x40159350
IDLE 0x40159350 + 0xc + 0x33c -> 0x40159698
BUSY 0x40159698 + 0xc + 0x10008 -> 0x401696ac
BUSY 0x401696ac + 0xc + 0x10008 -> 0x401796c0
BUSY 0x401796c0 + 0xc + 0x144 -> 0x40179810
BUSY 0x40179810 + 0xc + 0x198 -> 0x401799b4
BUSY 0x401799b4 + 0xc + 0x48 -> 0x40179a08
BUSY 0x40179a08 + 0xc + 0x4008 -> 0x4017da1c
IDLE 0x4017da1c + 0xc + 0x3c -> 0x4017da64
BUSY 0x4017da64 + 0xc + 0x30 -> 0x4017daa0
BUSY 0x4017daa0 + 0xc + 0x804 -> 0x4017e2b0
BUSY 0x4017e2b0 + 0xc + 0x804 -> 0x4017eac0
BUSY 0x4017eac0 + 0xc + 0x10008 -> 0x4018ead4
BUSY 0x4018ead4 + 0xc + 0x10008 -> 0x4019eae8
IDLE 0x4019eae8 + 0xc + 0x24 -> 0x4019eb18
BUSY 0x4019eb18 + 0xc + 0x10008 -> 0x401aeb2c
BUSY 0x401aeb2c + 0xc + 0x10008 -> 0x401beb40
BUSY 0x401beb40 + 0xc + 0x708 -> 0x401bf254
IDLE 0x401bf254 + 0xc + 0xeff34 -> 0x402af194
BUSY 0x402af194 + 0xc + 0xfff00 -> 0x403af0a0
IDLE 0x403af0a0 + 0xc + 0x50f54 -> 0x0
 === busy: 0x172bfc idle: 0x141224 meta: 0x1e0 total: 0x2b4000

cannot allocate: Exhausted { request: Layout { size: 2096640, align: 1 } }
backtrace for software version 4.0.dev+670.ge3938620:
0x4002851c
0x4004a0c8
0x4002940c
0x40028cfc
0x40028ed8
0x400025cc
0x4001f384
0x4001de3c
0x4000abb0
0x4000a8c0
0x4000a8c0
0x4000a8c0
0x4000a8c0
0x4000a8c0

Experiment, running on a Kasli DRTIO master (a274af7):

from artiq.experiment import *

class DMASaturate(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("core_dma")
        self.setattr_device("ttlo0")

    @kernel
    def run(self):
        self.core.reset()

        with self.core_dma.record("ttl_local"):
            for _ in range(100000):
                self.ttlo0.pulse(1*us)
                delay(1*us)
@whitequark
Contributor

Rust doesn't have fallible allocation. This is a longstanding issue that is being addressed; until then, this can't be fixed at a reasonable cost.

@dnadlinger
Collaborator

Surely we can at the very least set aside a chunk of memory ourselves on startup to service DMA buffer requests from (i.e. use a local allocator backed by a preallocated region), or use the global allocator directly and handle failure without oom()ing?

@dnadlinger
Collaborator

(Without having looked at the crash – famous last words – I presume this is from the buffer appending in rtio_dma. Can't we just replace the Vec<u8> by something that is backed by a local allocator and handles failure gracefully? Just hardcoding the amount of RAM to dedicate to DMA traces would be fine to start out with.)

@whitequark
Contributor

You'd need to "just" rewrite Vec to support specifying allocators explicitly. It seems more straightforward to wait a few days for the RFC to be implemented.

@whitequark
Contributor

I've nearly fixed the longstanding issue where error handling would require allocation, so this is one step closer to working.
