
[BUG]: Interior facet integrals give differing values in serial and in parallel #3501

Closed · bpachev opened this issue Nov 6, 2024 · 6 comments
Labels: bug (Something isn't working)

bpachev (Contributor) commented Nov 6, 2024

Summarize the issue

When assembling a vector from an interior facet integral, the vector sum and norm change with the number of MPI processes. While mesh partitioning does affect the order of degrees of freedom, it shouldn't change these aggregate metrics.

How to reproduce the bug

Within the Docker image dolfinx/dolfinx:stable, I ran the MWE below in parallel with varying numbers of processes:

mpiexec -np 1 python3 mwe_interior_facet_bug.py
mpiexec -np 2 python3 mwe_interior_facet_bug.py
mpiexec -np 3 python3 mwe_interior_facet_bug.py

Minimal Example (Python)

from mpi4py import MPI
from dolfinx import mesh, fem as fe
from dolfinx.fem import petsc as fe_petsc
from petsc4py import PETSc
import ufl
import numpy as np

domain = mesh.create_unit_square(
    MPI.COMM_WORLD, 19, 27, mesh.CellType.triangle
)

V = fe.functionspace(domain, ("P", 1))
V_dg = fe.functionspace(domain, ("DG", 1))
p = ufl.TestFunction(V)
p_dg = ufl.TestFunction(V_dg)
n = ufl.FacetNormal(domain)
u_dg = fe.Function(V_dg)
u_dg.interpolate(lambda x: x[0]**2 + x[1])
kappa = fe.Function(V)
kappa.interpolate(lambda x: np.sin(x[0])*np.cos(x[1]))
L = ufl.avg(p_dg) * ufl.avg(kappa) * ufl.dot(ufl.avg(ufl.grad(u_dg)), n("+")) * ufl.dS
L = fe.form(L)

b = fe_petsc.create_vector(L)
b.zeroEntries()

fe_petsc.assemble_vector(b, L)
b.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)

# now compute sum across all entries
# show prints for all ranks to prove that the values are global, not local
print(f"With {MPI.COMM_WORLD.size} processes, sum={b.sum()}, norm={b.norm(PETSc.NormType.NORM_2)}")

Output (Python)

With 1 processes, sum=1.9735490402112, norm=0.2900086489812219
With 2 processes, sum=-1.1928489757529805, norm=0.2906011302255919
With 2 processes, sum=-1.1928489757529805, norm=0.2906011302255919
With 3 processes, sum=2.753330018608379, norm=0.3319789413877082
With 3 processes, sum=2.753330018608379, norm=0.3319789413877082
With 3 processes, sum=2.753330018608379, norm=0.3319789413877082

Version

0.9.0

DOLFINx git commit

4116cca

Installation

Used the docker image dolfinx/dolfinx:stable

Additional information

Manual inspection of some of the vector entries indicates that the absolute value is correct, but the sign is flipped. However, as the mismatch in norm values indicates, that's not the only issue. This issue doesn't appear for exterior facet or cell integral types. I suspect the issue may have something to do with how the ordering of cells corresponding to a given facet changes in parallel, but I am not sure yet.
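
As a quick cross-check, a minimal sketch (assuming the imports and definitions from the MWE above) of the analogous exterior facet assembly, which is expected to be partition-independent because the outward normal on ds is unambiguous and no "+"/"-" restriction is involved:

# Sketch only, reusing p_dg, kappa, u_dg, n, fe, fe_petsc and PETSc from the MWE above.
L_ext = p_dg * kappa * ufl.dot(ufl.grad(u_dg), n) * ufl.ds
L_ext = fe.form(L_ext)
b_ext = fe_petsc.create_vector(L_ext)
b_ext.zeroEntries()
fe_petsc.assemble_vector(b_ext, L_ext)
b_ext.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)
print(f"exterior facets: sum={b_ext.sum()}, norm={b_ext.norm(PETSc.NormType.NORM_2)}")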

bpachev added the bug label on Nov 6, 2024
francesco-ballarin (Member) commented Nov 7, 2024

You need to change the ghost_mode to shared_facet; see for instance https://jsdokken.com/dolfinx_docs/meshes.html#ghosting, or grep this repository for unit tests.

Something like

from mpi4py import MPI
import dolfinx

mesh = dolfinx.mesh.create_unit_square(
    MPI.COMM_WORLD, 10, 10, ghost_mode=dolfinx.mesh.GhostMode.shared_facet)

will fix the issue.

bpachev (Contributor, Author) commented Nov 7, 2024

Setting the ghost mode to shared_facet doesn't change the results of the above script. In fact, shared_facet is the default ghost mode for create_unit_square, as listed in the docs.

francesco-ballarin (Member) commented:

Well, then you need something similar to https://fenicsproject.discourse.group/t/consistent-side-of-interior-boundaries-with-dolfinx/15968/2 .

garth-wells (Member) commented Nov 7, 2024

The n("+") term looks fishy to me. There are no guarantees about which side is "+" and which is "-"; DG schemes are normally written so that it doesn't matter which side is which.
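
A minimal, hedged sketch of one sign-robust way to write such a term, reusing the names from the MWE above: replace avg(grad(u_dg)) dotted with n("+") by the jump of the normal flux, which is invariant under swapping the "+" and "-" restrictions. This integrates a different (but well-defined) quantity, so its values differ from the MWE; the point is only that they no longer depend on the partition-dependent "+"/"-" choice.

# Sketch only, assuming the imports and definitions from the MWE above
# (p_dg, kappa, u_dg, n, fe).
# ufl.jump(grad(u_dg), n) = dot(grad(u_dg)("+"), n("+")) + dot(grad(u_dg)("-"), n("-")),
# which is unchanged if the "+" and "-" sides are swapped.
L_robust = ufl.avg(p_dg) * ufl.avg(kappa) * ufl.jump(ufl.grad(u_dg), n) * ufl.dS
L_robust = fe.form(L_robust)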

garth-wells reopened this on Nov 7, 2024
bpachev (Contributor, Author) commented Nov 7, 2024

So in light of that, we're OK if the results of an integral involving an n("+") term change with the number of MPI processes? I guess that makes sense, given this wouldn't affect an actual DG scheme (which this is not). That would make this not a bug, because we give no guarantees about the choice between +/- or about that choice remaining stable across different numbers of processes.

It might be worth including some sort of warning about this in the docs.

garth-wells closed this as not planned on Nov 7, 2024
jorgensd (Member) commented Nov 7, 2024

> So in light of that, we're OK if the results of an integral involving an n("+") term change with the number of MPI processes? I guess that makes sense, given this wouldn't affect an actual DG scheme (which this is not). That would make this not a bug, because we give no guarantees about the choice between +/- or about that choice remaining stable across different numbers of processes.
>
> It might be worth including some sort of warning about this in the docs.

I have written up demos that illustrate how to do custom orientation:
- for one-sided integration: jorgensd/dolfinx-tutorial#158 and https://fenicsproject.discourse.group/t/wrong-facetnormal-vector-on-internal-boundaries/12887/2
- for jumps: https://gist.github.com/jorgensd/5a8b60121a705bb9896eb1465be8e111 (there in the case of a bifurcation mesh, but it works in general).
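
As a rough illustration of the one-sided idea (a sketch only, not taken from the linked demos, which use a more general approach): when the interface separates two tagged cell regions, a consistently oriented normal can be built by weighting the two restrictions with a DG0 indicator, so the orientation no longer depends on which cell UFL labels "+".

import ufl
from dolfinx import fem as fe

# "domain" is the mesh; chi is a DG0 indicator that is 1.0 in one tagged cell
# region and 0.0 in the other (filling it from cell tags is omitted here).
V0 = fe.functionspace(domain, ("DG", 0))
chi = fe.Function(V0)
# chi.x.array[cells_in_region_one] = 1.0  # hypothetical fill from cell tags

n = ufl.FacetNormal(domain)
# On interface facets this is the normal pointing out of the chi == 1 region,
# independent of the "+"/"-" assignment:
n_oriented = chi("+") * n("+") + chi("-") * n("-")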
