Question about GC improvements and memory management #70
-
I remember reading these two Nim articles, and they really blew my mind: https://nim-lang.org/blog/2020/12/08/introducing-orc.html and https://nim-lang.org/docs/destructors.html. Do you think such an implementation, for example a destructor mechanism, could further simplify memory management for Nelua's records? As a programming language designer, @edubart, I would like to hear your thoughts on this topic.
-
First, as a Nim user for a good amount of time and a C++ user for more than a decade, I am very used to RAII, constructors/destructors, reference counting, and smart pointers. I was even addicted to abusing such mechanisms for a good part of my programming life, thus it was even among my original goals for Nelua to offer the following 3 memory management mechanisms:

1. Garbage collection
2. Manual memory management
3. Automatic reference counting
While developing Nelua I first made 1 and 2, and then I even worked on 3, which even shipped in Nelua master for some time (although experimental and undocumented). But at some point I decided to cut it out, because as you code and design such mechanisms you notice the amount of complexity they create, not just in the compiler and the language design, but also in the cognitive load they put on users and on the syntax. Plus the standard library would become complex and not that efficient in some places (try to read the C++ standard library, for example: do you find it readable?). In summary, the language would not be so simple anymore with such systems, thus not that pleasant to code in and not always efficient, and all of that diverges from Nelua's simplicity and efficiency goals.

Moreover, automatic reference counting is not magic, nor as efficient as some assume: it thrashes the CPU caches with reference count updates, so depending on your application a GC or manual memory management can be faster than reference counting. Which memory model to choose depends entirely on the application requirements; none of the 3 options is the best. The best always depends on your requirements: for some things you could use the GC, for others manual memory management would be best, and for some special cases reference counting makes sense, and that can still be done manually in Nelua (see the sketch below). It's just not in the language goals to provide means to do this automatically, because it would hurt some principles, as I found in my research.
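For instance, here is a minimal sketch of manual reference counting, assuming the `allocators.general` module used elsewhere in this thread (the `RcObject` record and its `ref`/`unref` helpers are illustrative, not a stdlib API):

```lua
require 'allocators.general'

local RcObject = @record{
  refcount: integer,
  x: integer,
}

-- acquire a new shared reference
function RcObject:ref(): *RcObject
  self.refcount = self.refcount + 1
  return self
end

-- release a reference, freeing the object when the last one is gone
function RcObject:unref()
  self.refcount = self.refcount - 1
  if self.refcount == 0 then
    general_allocator:delete(self)
  end
end

local function rcobject_create(x: integer): *RcObject
  local o: *RcObject = general_allocator:new(@RcObject)
  o.refcount = 1
  o.x = x
  return o
end

local a = rcobject_create(1337)
local b = a:ref()  -- share ownership, refcount is now 2
a:unref()          -- drop one reference, refcount back to 1
print(b.x)         -- the object is still alive through `b`
b:unref()          -- refcount hits 0, the object is deleted
```

Note that every one of those count updates is a memory write, which is exactly the kind of cache traffic an automatic version would generate behind your back.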
There is also a 4th way, which is what I currently aim for in the future of Nelua in terms of better memory management: never allocate in the first place. It's the most efficient and logical approach for me today, it can be easier and faster than manual memory management, and it's the most logical when you think about how your hardware works. Nelua already has some allocators for this in the standard library, but their design is not finished yet, so people should stick with the GC or manual memory management at this moment.

The best memory management mechanism is to never allocate in the first place: if you design your application with well thought out data structures, custom allocators, and everything preallocated, you never need to allocate or free. I plan to demo how to do this with Nelua in the future; I already have some in-house games in Nelua not doing any allocation, just using custom allocators, handles, and fixed buffers. In this design there is no cost from GC or reference counting, and leaks are impossible. The custom allocator can have data locality, which is even better for the CPU cache and efficiency. The code complexity is way lower than an ownership or reference counting system would be, in my opinion, and it's simpler than doing manual memory management, because you can't have leaks if you never allocate, and you can't have dangling pointers (use after free) if you use generational handles (see the sketch below). This is a nice way software can be designed, in my opinion, while maintaining simplicity, correctness, and efficiency.

Finally, I will share some articles on the topic that I feel reflect similar thoughts, for further reading: https://www.gingerbill.org/article/2019/02/01/memory-allocation-strategies-001/
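To make this concrete, here is a minimal sketch of a fixed pool with generational handles (the `Pool` and `Handle` records are illustrative, not a stdlib API): everything is preallocated, so there is nothing to leak, and a per-slot generation counter turns use-after-free into a detectable error instead of a dangling pointer:

```lua
local MAX_OBJECTS <comptime> = 64

local Object = @record{
  x: integer,
}

local Handle = @record{
  index: uint32,      -- slot in the pool
  generation: uint32, -- must match the slot's generation to be valid
}

local Pool = @record{
  objects: [MAX_OBJECTS]Object,
  generations: [MAX_OBJECTS]uint32,
  used: [MAX_OBJECTS]boolean,
}

-- find a free slot and hand out a handle to it
function Pool:acquire(): Handle
  for i=0,<MAX_OBJECTS do
    if not self.used[i] then
      self.used[i] = true
      return Handle{index=(@uint32)(i), generation=self.generations[i]}
    end
  end
  error('pool exhausted')
  return Handle{}
end

-- resolve a handle to a pointer, or nilptr if the handle is stale
function Pool:get(h: Handle): *Object
  if self.used[h.index] and self.generations[h.index] == h.generation then
    return &self.objects[h.index]
  end
  return nilptr
end

-- release a slot, bumping its generation to invalidate old handles
function Pool:release(h: Handle)
  if self.used[h.index] and self.generations[h.index] == h.generation then
    self.used[h.index] = false
    self.generations[h.index] = self.generations[h.index] + 1
  end
end

local pool: Pool
local h = pool:acquire()
local obj = pool:get(h)
obj.x = 1337
pool:release(h)
assert(pool:get(h) == nilptr) -- stale handle is detected, not undefined behavior
```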
-
I thought you would follow this route and order of preference, given the nature of game development and your accumulated experience in that field.
About reading the STL and C++ libraries in general... yeah, I feel you! I could be wrong, but I have the impression that elite engineers and developers compete with each other for the sake of showing off, without paying attention to usability, let alone readability and comprehension.
Basically what I had in mind is how RAII works, especially with modern C++ (smart pointers etc.); things got a lot "safer" compared to past times, with legacy code and the tricky low-level techniques you were forced to use to handle memory. It's nice to know you don't have to worry about releasing memory, that the language does it for you via RAII: whatever goes out of scope gets released.
...but it could be implemented as an extension, that is, a language extension implemented via metaprogramming, right?
I learned something new today about custom allocators; cheers for sharing this valuable info. I would love to see a demo around this concept to get a taste of how it works.
I heard a podcast with Ginger Bill, Andrew Kelley, and another guy whose name I cannot remember; they shared incredible feedback, ideas, and bottlenecks they all faced while trying to solve specific problems in the domains they were working on. Andre's blog is a valuable resource on various topics. I already read https://floooh.github.io/2019/09/27/modern-c-for-cpp-peeps.html and enjoyed it. I appreciate your thorough feedback @edubart; you are helping this "old" geek finally embrace language design and implementation.
-
Oh yeah, I've heard that podcast, and I also read the Nim articles you mentioned and other blog posts from Andre a long time ago. They are all good.
As an example, let's say you want to implement Lua 5.4 style "destructors", the to-be-closed variables. This is way simpler than the full destructor semantics found in C++, because only variables marked with the `<close>` annotation are closed:

```lua
##[[
local typedefs = require 'nelua.typedefs'
local tabler = require 'nelua.utils.tabler'
local visitors = require 'nelua.analyzer'.visitors
local aster = require 'nelua.aster' -- AST node builder (also available as a preprocessor global)
typedefs.variable_annots.close = true -- define the `close` annotation
-- hook original VarDecl node visitor in the analyzer
local orig_VarDecl = visitors.VarDecl
function visitors.VarDecl(context, node)
local idnodes = node[2] -- list of identifier declarations nodes
for _,idnode in ipairs(idnodes) do -- iterate over identifier declarations nodes
local symbol = idnode.attr -- get identifier symbol
if symbol.close and not symbol.closed then -- identifier symbol has `close` annotation
-- create a defer call to __close method
local callnode = aster.Defer{aster.Block{
aster.CallMethod{'__close', {}, aster.Id{idnode[1]}}
}}
-- inject defer call after variable declaration
local blocknode = context:get_parent_node() -- get parent block node
assert(blocknode.tag == 'Block')
local statindex = tabler.ifind(blocknode, node) -- find this node index
assert(statindex)
table.insert(blocknode, statindex+1, callnode) -- insert the new statement
blocknode.scope:delay_resolution()
symbol.closed = true
end
end
-- call original VarDecl
return orig_VarDecl(context, node)
end
]]
require 'allocators.general'
local Object = @record{
x: integer
}
function Object:__close()
print 'object destroyed'
general_allocator:delete(self)
end
do
local o: *Object <close> = general_allocator:new(@Object)
-- "defer o:__close() end" is injected here
print 'object created'
-- o:__close() will be called automatically here
end
```

If you run the above program, you should get this output:

```
object created
object destroyed
```
Note that the `__close` call is injected as a `defer` statement right after the variable declaration, so it runs automatically when the enclosing block exits. The same thing could be done for fully fledged destructors, but it would be quite complex to do.
-
This is a repost of an answer made in 494ea5a:

There is room to improve the GC, there is just no good motivation for doing it at this moment. The current implementation uses a simple mark-and-sweep, stop-the-world algorithm (similar to Lua 5.0). Garbage collection is a broad research topic; there are multiple ways to do it, all with different advantages and disadvantages. For Nelua I think having a simple and reliable garbage collector by default is enough for the moment. The garbage collector could be improved in the future by making it incremental (like in Lua 5.1), and later generational (like in Lua 5.4), if any good motivation to do so appears.

The current GC design stops the application to run a full collection cycle every time the memory usage doubles. For small applications the collection cycle is quite fast; for large applications with lots of allocations this may become a problem for real-time requirements. Note that despite Lua 5.1 having an incremental garbage collector, the application still suffers stalls from time to time due to the garbage collection atomic phase; it's just less noticeable. Having a truly real-time collector without any stop-the-world step is very difficult (I don't know if one even exists at this moment).

I don't plan to implement an incremental collector at this time, because it incurs more runtime overhead while the application is not collecting, due to the write barriers introduced every time a variable assignment happens. Thus an incremental collector increases overall CPU usage, which I think is a huge downside for my current plans; it's the price you pay for a less noticeable stop-the-world phase. Also the compiler and application complexity would grow, making it harder to maintain, all of which goes against the simple and minimal goals. I think if users don't want any stop-the-world pause to happen, then they should design a good data structure, or disable the GC and manage memory themselves, or maybe even mix GC / non-GC code (this is possible, although it should be done carefully; see the sketch below).

Despite the current GC having a very simple algorithm, it has good overall runtime performance when you can live with the stop-the-world phases. For example, if you make a script that is a single-shot run (like a batch script), it should perform better in terms of total runtime than most incremental and generational collectors. Although Nelua provides a default GC, you could completely replace it with another well researched GC, just by replacing the default GC allocator module.
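For reference, disabling the GC and managing memory manually looks roughly like this minimal sketch (assuming the `nogc` pragma and the `allocators.general` module from the Nelua standard library):

```lua
## pragmas.nogc = true -- fully disable the garbage collector for this program

require 'allocators.general'

local Data = @record{
  x: integer,
}

-- with the GC disabled, allocations must be freed manually
local d: *Data = general_allocator:new(@Data)
d.x = 1
print(d.x)
general_allocator:delete(d) -- nothing will free this for us otherwise
```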
The GC is already implemented in Nelua, but there is still room to make it scan less memory by using some metaprogramming on type information: it's possible to use type information to build the memory layout of records and scan just the segments of a record that contain pointers, because at this moment any record known to contain a pointer is fully scanned. This may be improved sometime in the future; I just did not do it yet because the GC is already quite fast for the current use cases. Until someone has a good motivation to do this, it should remain simple and ignore record memory layouts.
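To illustrate the idea, here is a hypothetical sketch of visiting only the pointer fields of a record using compile-time type information (`visit_pointers` and `mark` are illustrative, not part of the GC):

```lua
local ListNode = @record{
  value: integer,
  prev: *ListNode,
  next: *ListNode,
}

local function mark(p: pointer)
  -- a real collector would trace the pointed-to allocation here
  print(p)
end

local function visit_pointers(node: ListNode)
  ## for _, field in ipairs(node.type.fields) do -- iterate fields at compile time
    ## if field.type.is_pointer then -- emit code only for pointer fields
  mark((@pointer)(node.#|field.name|#))
    ## end
  ## end
end

local n: ListNode
visit_pointers(n) -- only `prev` and `next` are visited, `value` is skipped
```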
The GC will become slow to run a collection cycle when you have a lot of live allocations made through the GC allocator, because every collection cycle has to scan all of them.
-
I'm writing my thoughts here about the current Nelua GC state; in other words, arguing with myself out loud. Currently I can see we have … For future reasons, in my humble opinion it would make sense to have something like … This way, if I'm interested in testing a different type of collector mechanism, I could implement it and place it in …