Improve performance of AMR with P4estMesh
#627
Some thoughts, questions:

It's only refining initially, isn't it?

Yes, it could be that it's really just the number of cells involved being smaller in the beginning.

@efaulhaber If I were you, I would

Most of the time spent in the solver part is due to the reinitialization of the containers.

Here's another timer output where I timed

A considerable amount of time comes from [...]. I also confirmed this with [...]
OK, thanks for doing these first analysis steps. If it is not too difficult, it would be good to post the commands you used for these benchmarks here so that we can pick this up at a later point. I think the good result is that most time is spent in the solver, where we have more room for improvement, than directly in p4est. The rebalance cost will be cut in half as well once cburstedde/p4est#112 is resolved. Thus, for now, I think this can be left as-is, and you can further concentrate on new features.
I just ran

```julia
mesh = P4estMesh((1, 1), polydeg=3,
                 coordinates_min=coordinates_min, coordinates_max=coordinates_max,
                 initial_refinement_level=4)
```

and added some more

```julia
@benchmark Trixi.reinitialize_containers!($mesh, $equations, $solver, $semi.cache)
```
Request my review on your related PR when you feel ready, and I can try to have a look at speeding up critical parts.

Alright, thanks!
I ran your benchmark setup above and have some prototype code which reduces the time from [...] in your PR to [...] using [...]

Perfect, thank you! A blog post sounds awesome!
Currently, AMR with `TreeMesh` is a lot faster than with `P4estMesh`.

A quick comparison using `elixir_advection_amr.jl` (with `t_end=9.9999` because `P4estMesh` does one more time step than `TreeMesh` with `t_end=10`):

`rhs!` is expected to be slower since `P4estMesh` is treated as a curved mesh. AMR is also expected to be slower because the curved data structures that need to be rebuilt are a lot more complex than the ones used by `TreeMesh`. However, AMR with `P4estMesh` can most likely be optimized to be at least twice as fast as it is now.

I found two things in particular that are slowing down AMR a lot and to which I don't know a solution yet.
Firstly, `calc_jacobian_matrix!` in the `CurvedMesh` data structures (which is used by `P4estMesh` as well). This function consists of these four lines:
Trixi.jl/src/solvers/dg_curved/containers_2d.jl
Lines 53 to 56 in c795248
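The embedded snippet is not reproduced here, but the pattern in question looks roughly like the following (an illustrative, self-contained sketch, not the exact Trixi.jl code; array names and sizes are made up):

```julia
using LinearAlgebra

nnodes, nelements = 4, 8
derivative_matrix = rand(nnodes, nnodes)
node_coordinates  = rand(2, nnodes, nnodes, nelements)
jacobian_matrix   = zeros(2, 2, nnodes, nnodes, nelements)

for element in 1:nelements
    # mul! into views of higher-dimensional arrays: these views do not have
    # unit stride in their first dimension, so mul! cannot use the fast BLAS
    # path and may fall back to a generic method that allocates temporaries.
    @views mul!(jacobian_matrix[1, 1, :, :, element], derivative_matrix,
                node_coordinates[1, :, :, element])
    @views mul!(jacobian_matrix[1, 2, :, :, element],
                node_coordinates[1, :, :, element], derivative_matrix')
end
```

Checking such a loop with `@allocated` or `BenchmarkTools.@benchmark` makes the per-call allocations visible.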
Using `mul!` generally doesn't allocate, but it does when used with views like this.

Secondly, when initializing `P4estMesh` data structures, user data needs to be passed to another function through p4est. This user data needs to provide the interfaces container, some interface ID, and the mesh. Currently, I'm packing these into a `Vector{Any}`, a pointer to which is passed to p4est:

Trixi.jl/src/solvers/dg_p4est/containers_2d.jl
Line 143 in d53db7d
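The packing step looks roughly like this (a hedged sketch with stand-in objects, not the referenced line itself):

```julia
# Stand-ins for the real objects: the interfaces container, an interface ID,
# and the mesh.
interfaces   = zeros(3)
interface_id = Ref(1)
mesh         = "mesh placeholder"

# Pack into a Vector{Any} so that a single pointer covers all three objects.
# The element types are erased, which is the root of the type instability.
user_data = Any[interfaces, interface_id, mesh]

# Raw pointer handed to the C library (p4est). The Julia object must be kept
# alive, e.g. with GC.@preserve, for as long as the library may use it.
user_data_ptr = pointer_from_objref(user_data)
```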
In other functions, I receive this as a `Ptr{Nothing}`, which I need to unpack. I am currently doing this as follows:
Trixi.jl/src/solvers/dg_p4est/containers_2d.jl
Lines 80 to 85 in d53db7d
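The unpacking pattern is roughly the following (an illustrative sketch; names are hypothetical). Because the elements come out of a `Vector{Any}`, their types are unknown to the compiler, which is exactly what a function barrier works around:

```julia
# Recreate a packed Vector{Any} and a raw pointer, as p4est would hand back.
user_data = Any[zeros(3), Ref(1), "mesh placeholder"]
ptr = pointer_from_objref(user_data)  # arrives in the callback as Ptr{Nothing}

# Unpack: the annotation only narrows the container type to Vector{Any};
# the element types are still abstract from the compiler's point of view.
data = unsafe_pointer_to_objref(ptr)::Vector{Any}
interfaces, interface_id, mesh = data[1], data[2], data[3]

# Function barrier: once the values are passed as arguments, dispatch gives
# them concrete types, so the hot inner code compiles type-stably.
work!(interfaces, interface_id) = interfaces .+= interface_id[]
work!(interfaces, interface_id)
```

An alternative to `Vector{Any}` would be a concretely typed struct (or tuple) whose pointer is passed instead, so the unpacked fields already carry their types; whether that interacts cleanly with the p4est callback API is an open question here.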
However, this is not type-stable. I added a function barrier in this PR, but this is still not optimal.
How could I further optimize these two functions?