Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast Matrix Assembly #448

Draft
wants to merge 10 commits into
base: devel
Choose a base branch
from
Draft

Fast Matrix Assembly #448

wants to merge 10 commits into from

Conversation

jowezarek
Copy link
Contributor

@jowezarek jowezarek commented Nov 8, 2024

Global Sum Factorization for Psydac

This PR aims to speed up the assembly of matrices corresponding to 2D and 3D BilinearForms on the domain.

How is a matrix assembled?

Previously:

a   = BilinearForm((u, v), integral(domain, expr))
a_h = discretize(a, domain_h, (V1h, V2h), backend=backend) # <--- What happens in here?
A   = a_h.assemble()

# -----

# Step by Step summary of what happens after you discretize the BilinearForm

# In psydac.api.discretization.discretize(): Additional kwargs are generated. 
# Then:
return DiscreteBilinearForm(a, kernel_expr, *args, **kwargs)

# In psydac.api.fem.DiscreteBilinearForm.__init__():
# BasicDiscrete generates the assembly code and sets the following attributes that are used afterwards:
# self._func, self._free_args, self._max_nderiv and self._backend
BasicDiscrete.__init__(self, expr, kernel_expr, comm=comm, root=0, discrete_space=discrete_space,
              nquads=nquads, is_rational_mapping=is_rational_mapping, mapping=symbolic_mapping,
              mapping_space=mapping_space, num_threads=self._num_threads, backend=assembly_backend)

# Allocate the output matrix, if needed
self.allocate_matrices(linalg_backend)

# Determine whether OpenMP instructions were generated
with_openmp = (assembly_backend['name'] == 'pyccel' and assembly_backend['openmp']) if assembly_backend else False

# Construct the arguments to be passed to the assemble() function, which is stored in self._func
self._args, self._threads_args = self.construct_arguments(with_openmp=with_openmp)
  1. DiscreteBasic generates the assembly code (self._func) and sets self._free_args. The latter consists of information regarding free FemFields in the BilinearForm.
  2. self.allocate_matrices generates the StencilMatrices we write into.
  3. self.construct_arguments generates all the arguments required for the assembly that are not already covered by self._free_args. Additional arguments in the case of openmp parallelization are generated.

Now:

a   = BilinearForm((u, v), integral(domain, expr))
a_h = discretize(a, domain_h, (V1h, V2h), backend=backend) # <--- What happens in here?
A   = a_h.assemble()

# -----

# Step by Step summary of what happens after you discretize the BilinearForm

# In psydac.api.discretization.discretize(): Additional kwargs are generated. 
# !NEW!: We check whether we can use the new assembly method.
# !NEW!: Only then do we set the kwarg new_assembly to True
# Then:
return DiscreteBilinearForm(a, kernel_expr, *args, **kwargs)

# In psydac.api.fem.DiscreteBilinearForm.__init__():
# BasicDiscrete generates the assembly code and sets the following attributes that are used afterwards:
# self._func, self._free_args, self._max_nderiv and self._backend
BasicDiscrete.__init__(self, expr, kernel_expr, comm=comm, root=0, discrete_space=discrete_space,
              nquads=nquads, is_rational_mapping=is_rational_mapping, mapping=symbolic_mapping,
              mapping_space=mapping_space, num_threads=self._num_threads, backend=assembly_backend)
# !NEW! We don't mess with BasicDiscrete. Instead, we keep using self._free_args and overwrite self._func later.

# Allocate the output matrix, if needed
self.allocate_matrices(linalg_backend)

# Determine whether OpenMP instructions were generated
with_openmp = (assembly_backend['name'] == 'pyccel' and assembly_backend['openmp']) if assembly_backend else False

# Construct the arguments to be passed to the assemble() function, which is stored in self._func
if self._new_assembly == True:
    # no openmp support yet
    self._args, self._threads_args = self.construct_arguments_generate_assembly_file()
else:
    self._args, self._threads_args = self.construct_arguments(with_openmp)
# !NEW! If the flag new_assembly is set to True, do something else!
  1. DiscreteBasic generates the assembly code (self._func) and sets self._free_args. The latter consists of information regarding free FemFields in the BilinearForm. We will overwrite self._func later.
  2. self.allocate_matrices generates the StencilMatrices we write into.
  3. self._new_assembly == True or self._new_assembly == False ?
    1. False: self.construct_arguments generates all the arguments required for the assembly that are not already covered by self._free_args. Additional arguments in the case of openmp parallelization are generated.
    2. True: self.construct_arguments_generate_assembly_file generates all the arguments required for the new assembly method that are not already covered by self._free_args. No openmp support! Additionally, new assembly code is generated and pyccelized.

Summary of what the new self.construct_arguments_generate_assembly_file method does follows next week.

No openmp support - What now?

I don't intend on supporting openmp parallelization. At least not until everything else works as a cherry on top.

But: openmp related Github tests fail. I want to detect whether code is run with openmp and accordingly don't set new_assembly to True. But I don't know how to do that. Help please.

No multiplicity support - What now?

Changing the multiplicity (of ...?) results in a changed assembly algorithm. I didn't look into it yet. I suppose it's related to additional quadrature points.

The new assembly method can of course be altered such that it's compatible with a changed multiplicity, but, again, I don't intend to do that right now. Instead, I want to detect such cases and don't set new_assembly to True. This can eventually be fixed once everything else runs.

What is left to do?

  • See bullet point 3: I need to read import information from the BilinearForm.
  • See bullet point 9: I need to randomize the assembly function file names. If I don't, they will get overwritten.
  • The four template properties of DiscreteBilinearForm will be merged into two.
  • There's a bug in the old allocate_matrix function. I will better document this issue next week.
  • It seems I'm using AnalyticalMappings wrong? Will need to investigate next week.
  • Once the 3D case works, adding the 2D case should not be a huge issue.

Also: The fact that all tests currently run implies that matrix assembly is poorly tested. I will need to properly test matrix assembly with tests that don't run forever - This will be difficult, as BilinearForm discretization can be slow, but I'll need to test all possibilities.
#--------------------------------------------------------------------------------------------------------------------------------------
Is it finally happening? Fast Bilinear Form Matrix Assembly is coming to Psydac!

ToDo:

  1. Understand ._free_args <- DONE

    In DiscreteBilinearForm, the following happens:

    if self._free_args:
       basis   = []
       spans   = []
       degrees = []
       pads    = []
       coeffs  = []
       consts  = []
    
       for key in self._free_args:
          v = kwargs[key]
    
          # the arrays are being filled
    
       args = (*self.args, *basis, *spans, *degrees, *pads, *coeffs, *consts)
    else:
       args = self._args

    As I did not consider BilinearForms with additional free FemFields yet, the assembly of such matrices will for sure fail.
    I will need to consider such cases and test them properly.

  2. Include free FemFields in the test cases <- DONE

    Just to make sure: Once 1. is done, properly test the inclusion of free FemFields in the matrix assembly of BilinearForms.

  3. Get the imports right

    A BilinearForm as defined below will require the import of the sin function, e.g. from numpy, inside the generated assemble_matrix method in the assembly.py file. I currently have no way of automatically including this import, rather I hardcoded

    from numpy import array, zeros, zeros_like, floor
    from math import sqrt, sin, pi, cos

    in the _assembly_template_head property of the DiscreteBilinearForm.

    x, y, z         = domain.coordinates
    gamma           = x*y*z + sin(x*y+z)**2
    
    V2              = derham.V2
    w1, w2          = elements_of(V2, names='w1, w2')
    
    expr            = dot(w1, w2)*gamma
    int_0           = lambda expr: integral(domain, expr)
    
    a               = BilinearForm((w1, w2), int_0(expr))
  4. Handle VectorFunctionSpaces not belonging to a de Rham sequence. <- not a problem after all, test cases have been added

    VectorFunctionSpaces not belonging to a de Rham sequence have (always?) the same Basis in each direction, whereas e.g. a discrete Hcurl space belonging to a de Rham sequence has three different bases for each direction.

    Hence, certain variables are of different length. For example, given a discretization with degree = [3, 3, 3] Bsplines, one might find in one case trial_degrees = (3, 3, 3) and in the other trial_degrees = (2, 3, 3, 3, 2, 3, 3, 3, 2).

    I need to find a way to deal with this issue. In addition, I must include sufficient tests!

    Temporary workaround:

    if (nu == 3) and (len(trial_basis) == 3):
        # VectorFunction not belonging to a de Rham sequence - 3 instead of 9 variables in trial/test_degrees and trial/test_basis
        trial_u_p   = {u:trial_degrees for u in range(nu)}
        global_basis_u  = {u:trial_basis    for u in range(nu)}
    else:
        trial_u_p       = {u:trial_degrees[d*u:d*(u+1)] for u in range(nu)}
        global_basis_u  = {u:trial_basis[d*u:d*(u+1)]    for u in range(nu)}
    if (nv == 3) and (len(test_basis) == 3):
        test_v_p   = {v:test_degrees for v in range(nv)}
        global_basis_v  = {v:test_basis   for v in range(nv)}
        spans = [*spans, *spans, *spans]
    else:
        test_v_p       = {v:test_degrees[d*v:d*(v+1)] for v in range(nv)}
        global_basis_v  = {v:test_basis[d*v:d*(v+1)]   for v in range(nv)}
  5. Remove coupling_term=0 sub_expr <- DONE

    This is an important change to do! Currently, often times when not including a Mapping, we'll find something like this

     coupling_terms_u_v[k_2, q_2, k_3, q_3, 0] = 1
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 1] = 0
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 2] = 0
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 3] = 0
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 4] = 1
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 5] = 0
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 6] = 0
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 7] = 0
     coupling_terms_u_v[k_2, q_2, k_3, q_3, 8] = 1

    in the assembly method. We can save a lot of computations by excluding subexpressions from the assembly method that don't contribute to the global matrix.

  6. Remove not-required global matrices <- DONE

    Related to 5. - At times, mostly when not including Mappings, we are including not just subexpressions that don't contribute to the global matrix, but in the case of vector valued functions, even entire global matrices that remain zero, but potentially still require a lot of computational power. I will need to make sure this doesn't happen.

  7. Enable different multiplicity in DiscreteDerham

    Tests with a multiplicity different from 1 currently fail. Will need to investigate further.

  8. Enable openmp support

  9. Must use different assembly function names

At the moment, the assembly function file is simply called assemble.py. Discretizing multiple BilinearForms before assemblying them hence won't work, because we over-write the assembly file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant