
Add UL2 data sampling and pretraining #358

Open · wants to merge 122 commits into base `main` (commits by janEbert).

Commits on Dec 13, 2022

  1. b2fc665
  2. 13becf1 · Allow passing existing causal attention masks

     Since we create them in the T5 data loader, why not use them?
  3. 7f50532 · Refactor masked LM sampling style selection

     Handles backward compatibility, so the rest of the code base does not need to change.
  4. d8db189 · Add more masked LM sampling styles

     Namely, sampling from uniform and normal distributions.
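The new sampling styles can be sketched roughly as follows; the function name, the style names, and the clamping behaviour are assumptions for illustration, not necessarily the PR's actual implementation:

```python
import numpy as np

def sample_ngram_length(rng, style, max_ngrams, mean_ngrams=3.0):
    """Sample one n-gram span length under different styles (sketch).

    `style` is one of "t5", "uniform", "normal"; the names here are
    hypothetical.
    """
    if style == "uniform":
        # Every length from 1 to max_ngrams is equally likely.
        return int(rng.integers(1, max_ngrams + 1))
    elif style == "normal":
        # The normal distribution is unbounded, so clamp to at least 1;
        # a bounded `max_ngrams` does not apply to this style.
        return max(1, int(round(rng.normal(loc=mean_ngrams, scale=1.0))))
    else:
        # T5-style default: lengths weighted towards shorter n-grams.
        lengths = np.arange(1, max_ngrams + 1)
        probs = 1.0 / lengths
        probs /= probs.sum()
        return int(rng.choice(lengths, p=probs))

rng = np.random.default_rng(0)
lengths = [sample_ngram_length(rng, "normal", max_ngrams=10) for _ in range(1000)]
```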
  5. 006c4e9
  6. f802317
  7. deed87f · Refactor span merging
  8. 728e076
  9. 42ece6b

Commits on Dec 14, 2022

  1. d18f84e · Add custom exceptions

     ... which also improve error messages.
  2. fa5aa68
  3. c7d8a8b · Remove additional sequence truncation

     Instead, the user should choose a larger maximum sequence length, which an error warns them about.
  4. c722516 · Prefer array-from-list creation

     Instead of concatenating arrays and lists to get a certain dtype.
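The array-creation preference can be illustrated with a small NumPy sketch (illustrative only; the variable names are not from the PR):

```python
import numpy as np

tokens = [101, 7592, 2088, 102]        # e.g. a list of token IDs
extra = np.array([0, 0], dtype=np.int64)

# Concatenating an array with a list relies on implicit conversion
# and an extra copy to settle on a common dtype:
concatenated = np.concatenate([np.array(tokens), extra])

# Building the array from the full list directly states the dtype once:
direct = np.array(tokens + extra.tolist(), dtype=np.int64)

assert (concatenated == direct).all()
```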

Commits on Jan 2, 2023

  1. 69f6e70 · Remove redundant imports

Commits on Jan 3, 2023

  1. f08a104 · Fix not inserting prefixes

     For small sequence lengths or low probability/mean n-gram values, we could get `max_ngrams` < 1 and `max_predictions_per_seq` < 1, causing no masking to be done.
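The failure mode and its fix can be sketched like this; apart from `max_ngrams` and `max_predictions_per_seq`, the names and formulas are assumptions for illustration:

```python
def masking_budget(seq_length, masked_lm_prob, mean_ngrams):
    """Sketch: derive masking limits, clamped to at least 1.

    Without the `max(1, ...)` clamps, a short sequence or a low
    probability/mean value could round both limits down to 0, so no
    masking would ever be done.
    """
    max_predictions_per_seq = max(1, int(round(seq_length * masked_lm_prob)))
    max_ngrams = max(1, int(round(mean_ngrams)))
    return max_ngrams, max_predictions_per_seq

# A short sequence with a low masking probability used to yield 0 and 0:
assert masking_budget(seq_length=4, masked_lm_prob=0.1, mean_ngrams=0.4) == (1, 1)
```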
  2. d2fd03e · Do not insert extra_id tokens for PrefixLM task

     Now the same as in the UL2 paper's code snippet.
  3. daf52cc
  4. 04be590 · Skip redundant computations
  5. 7bc5a87 · Fix PrefixLM mean location
  6. 775e99d
  7. 538c30b

Commits on Jan 23, 2023

  1. ba4476c
  2. 678fbdc · Fix max_ngrams for normal sampling style

     Since the normal distribution is unbounded, we cannot have `max_ngrams` set to a bounded value.
  3. 00479e5
  4. 795caef · Calculate and use the number of filtered tokens

     Filtered tokens are those that are neither `cls_id` nor `sep_id`. This slightly improves the calculated statistics for long sequences, and greatly for very short ones.
  5. 689e15f
  6. e44d0e4

Commits on Jan 24, 2023

  1. 075f05f
  2. 6bc7471 · Calculate n-gram indices lazily

     Usually we do not iterate through all indices, so we can save quite some time if `max_ngrams` is large.
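The lazy-index idea can be sketched with a generator; the function below is illustrative, not the PR's code:

```python
def ngram_index_spans(start, max_ngrams, seq_length):
    """Yield candidate n-gram index spans lazily instead of
    materializing all of them up front.

    Consumers that stop early (e.g. once the masking budget is
    spent) never pay for the unproduced spans, which matters when
    `max_ngrams` is large.
    """
    for n in range(1, max_ngrams + 1):
        end = start + n
        if end > seq_length:
            return
        yield list(range(start, end))

gen = ngram_index_spans(start=2, max_ngrams=1000, seq_length=8)
first = next(gen)  # only the 1-gram span has been computed so far
```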
  3. a105f32 · Fix code style
  4. f0fe282 · Prefer list comprehensions

Commits on Feb 14, 2023

  1. 11bd6db · Allow recognizing when UL2 is used

     Via an extra "private" argument.
  2. 43eee93 · Support UL2 tokens for all tokenizers

     The GPT tokenizer does not handle the difference between UL2 tokens and other special tokens well. This should be fine, since we currently never assume that UL2 tokens are distinct from other special tokens (although the other tokenizers implement it that way). In general, `additional_special_token_ids` is new for the GPT tokenizer, so there is no backward-compatibility trouble.
  3. 6686f04 · Support <extra_id> tokens for GPT tokenizer

     With this, we also adjust `additional_special_token_ids` to only return extra ID tokens.
  4. f6128c6 · Fix tokenizer vocab access
  5. 8f48763 · Revert inheriting from T5Dataset

     Personally, I find this makes the class more holistic, and we never inherited correctly anyway, changing the public API. It also allows using tokenizers without `cls_id`, which was previously queried redundantly due to the incorrect inheritance. Finally, the inheritance never saved much repetition to begin with.
  6. 7f99a12
  7. 535a306 · Do inherit from torch.utils.data.Dataset

     Removing all inheritance from the class was a bit too eager.
  8. db623b3 · Add whitespace

     For readability.
  9. ef72280 · Allow selectively disabling denoiser tokens

     In the future, it could even make sense to allow different tokens for the same denoising objective (e.g. one R-denoiser has the token `[R]`, another has `[R+]`).
  10. 001b50c · Allow not replacing masks with sentinel tokens

     Backward-compatible, since passing `sentinel_tokens=None` would previously have resulted in an error.
  11. 23c052f · Support not adding mask tokens in span corruption

     Backward-incompatible change, as we put this before an existing positional argument.
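For context, T5-style span corruption replaces each masked span with a sentinel (`<extra_id_*>`) token and collects the spans as targets. A minimal sketch of that replacement, working on strings rather than token IDs and with a hypothetical function name:

```python
def corrupt_spans(tokens, spans, sentinel_tokens=None):
    """Replace each masked span with one sentinel token and collect
    the span contents as targets (sketch).

    With `sentinel_tokens=None`, the masked spans are simply dropped
    from the input, mirroring the "no sentinel tokens" option.
    """
    inputs, targets = [], []
    prev_end = 0
    for i, (start, end) in enumerate(spans):
        inputs.extend(tokens[prev_end:start])
        if sentinel_tokens is not None:
            sentinel = sentinel_tokens[i]
            inputs.append(sentinel)
            targets.append(sentinel)
        targets.extend(tokens[start:end])
        prev_end = end
    inputs.extend(tokens[prev_end:])
    return inputs, targets

tokens = ["Thank", "you", "for", "inviting", "me", "to", "your", "party"]
inputs, targets = corrupt_spans(tokens, [(2, 4)], ["<extra_id_0>"])
# inputs:  ["Thank", "you", "<extra_id_0>", "me", "to", "your", "party"]
# targets: ["<extra_id_0>", "for", "inviting"]
```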

Commits on Feb 15, 2023

  1. 0f4fd3f · Fix expected number of added tokens

     Was wrong for the decoder-only case.

Commits on Feb 16, 2023

  1. da1f4e9 · Fix non-masked data

     Previously, the model didn't know _where_ the data was actually inserted. Now it repeats the input sequence and inserts the masked data in the correct place. See the example in Fig. 1 of the AlexaTM 20B paper (arXiv:2208.01448).
  2. 55320ea · Fix unclear wording

     The old wording was confusing and effectively stated the wrong thing: the number of n-grams is not bounded by `max_ngrams`, even though the variable name suggests it. Instead, `max_ngrams` bounds n.

Commits on Feb 17, 2023

  1. 5d27b27 · Adjust code style

     It's just too ugly to leave it like the original.
  2. 23181ab · Fix covered index skipping
  3. 6032cc6
  4. c9c336f · Automatically truncate sequences for decoder-only

     Expecting the user to supply a sequence length greater than any data point is ridiculous. So now we greedily truncate the sequence based on the maximum number of `extra_id`s, which wastes a lot of data. An alternative would be a statistical route with significance attached to it: allow the expected number of tokens plus some leeway, while handling the unlikely length-exceeded error.

     This only handles the decoder-only case; the encoder-decoder case is left as-is, because errors there are much less likely unless massive corruption is configured or the decoder has a smaller sequence length than the encoder.
  5. b8003cb
  6. e3d91a6
  7. e61e78f · Refactor getting sample
  8. c3b0a55 · Add sample packing to T5 dataset

     Backward-incompatible change due to a positional argument without a default, inserted before another positional argument.
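Sample packing in general can be sketched as a greedy first-fit loop (illustrative; not the PR's actual packing code):

```python
def pack_samples(samples, max_seq_length):
    """Greedily pack variable-length samples into fixed-size bins.

    Each bin becomes one packed training sequence; a sample that no
    longer fits starts a new bin. Real packed datasets also track
    per-sample boundaries so the attention mask can keep samples
    from attending to each other.
    """
    bins, current, current_len = [], [], 0
    for sample in samples:
        if current and current_len + len(sample) > max_seq_length:
            bins.append(current)
            current, current_len = [], 0
        current.append(sample)
        current_len += len(sample)
    if current:
        bins.append(current)
    return bins

samples = [[0] * n for n in (5, 3, 4, 6, 2)]
packed = pack_samples(samples, max_seq_length=8)
# sample lengths per bin: [5, 3], [4], [6, 2]
```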
  9. c4d748b
  10. 689b57e
  11. af204e7
  12. 78eb035
  13. c03eed4 · Fix T5 dataset packing

     Forgot to apply the fixes here.

Commits on Feb 22, 2023

  1. 9e84f06 · Refactor get_sample to return a list

     Accordingly, rename it to `get_samples`.
  2. 5e2b4f5 · Fix T5 sample packing
  3. e2a0c36 · Fix UL2 sample packing
  4. c2884c8

Commits on Feb 23, 2023

  1. 7eb7923 · Fix desired seq length

     Now we won't exceed the desired sequence length.
  2. dd4c0d0 · Fix padding removal
  3. 58148f8
  4. c41fecd
  5. 057bb47 · Refactor sample packing functions

     Just pull them out of the other functions (and add separating whitespace/join lines).
  6. e2062b7
  7. d31b89f

Commits on Feb 24, 2023

  1. 17dca4f · Fix GPT tokenizer vocab size query

     It did not include the additional special tokens.
  2. bf9b1eb · Handle possibly empty list

Commits on Feb 27, 2023

  1. c4aa4cd · Fix no newline at EOF
  2. 8d7a0df · Allow full-prefix Prefix-LM attention sampling

     Useful for evaluation.
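A Prefix-LM attention mask is bidirectional over the prefix and causal over the rest; a minimal sketch (names and shapes are illustrative):

```python
import numpy as np

def prefix_lm_mask(seq_length, prefix_length):
    """Build a Prefix-LM attention mask (True = may attend).

    Positions inside the prefix attend to the whole prefix
    (bidirectional); positions after it attend causally. With
    `prefix_length == seq_length`, the whole sequence is the prefix,
    which is the "full prefix" case useful for evaluation.
    """
    # Start from a lower-triangular causal mask.
    mask = np.tril(np.ones((seq_length, seq_length), dtype=bool))
    # Open up the prefix block bidirectionally.
    mask[:prefix_length, :prefix_length] = True
    return mask

mask = prefix_lm_mask(seq_length=5, prefix_length=3)
```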
  3. 9bd6e1e · Support PrefixLM models
  4. ba4ab49
  5. 9f53171 · Update task/dataset name

     "lambada" was renamed to "lambada_openai" in the upstream lm-eval-harness repo.

Commits on Feb 28, 2023

  1. 5b63d0b · Do not remove last token

     This corrupts the targets; there is no good reason for it.
  2. 639b71d · Fix PrefixLM contexts

     Previously, we always gave the whole sequence as context, even though it also includes the answer. This is obviously not desired; we only want to give enough context to reach the answer.
  3. 127d1e4 · Fix module refactor

     These models have moved into DeepSpeed but were never properly replaced here after their removal.
  4. 1bb788d · Fix possible TypeError

     When indexing into `False` or `None`.
  5. cf5965a · Optionally add prefix tokens
  6. a538238 · Automatically add UL2 tokens

     At worst, these may be mapped to the wrong tokens. However, the chance that the number of unknown tokens is equal to or fewer than the few UL2 tokens is very low, and if there are more unknown tokens than UL2 tokens, we'll get errors.

Commits on Mar 1, 2023

  1. 3a8bc35
  2. 6f0e33a

Commits on Mar 2, 2023

  1. 9c4c718

Commits on Mar 7, 2023

  1. 754cf21 · Add xPos embeddings
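xPos (from "A Length-Extrapolatable Transformer", Sun et al.) extends rotary embeddings with a per-dimension exponential decay. A rough sketch of the scaling factor, following my reading of the paper's formulation rather than this PR's code:

```python
import numpy as np

def xpos_scale(positions, dim, gamma=0.4, scale_base=512):
    """Sketch of the xPos per-dimension scale.

    zeta_i = ((i / (dim/2)) + gamma) / (1 + gamma) is raised to the
    power n / scale_base for position n. Queries use the positive
    power and keys the inverse, so their product decays with the
    relative distance. The constants here are assumptions.
    """
    i = np.arange(dim // 2)
    zeta = ((i / (dim // 2)) + gamma) / (1 + gamma)            # (dim/2,)
    exponents = np.asarray(positions)[:, None] / scale_base    # (n, 1)
    return zeta[None, :] ** exponents                          # (n, dim/2)

q_scale = xpos_scale(np.arange(8), dim=64)
k_scale = 1.0 / q_scale  # keys use the inverse scale
```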
  2. 08b0eaf
  3. 15622d2

Commits on Mar 9, 2023

  1. e5a6169
  2. d583fe9 · Add T5-style GLU layers
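T5 v1.1 uses gated linear units in its MLP (GEGLU-style). A minimal NumPy sketch of the idea, not this PR's exact implementation:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def t5_glu_mlp(x, w_gate, w_up, w_down):
    """GEGLU-style MLP as in T5 v1.1 (sketch).

    Two parallel input projections; the activated "gate" multiplies
    the linear "up" projection elementwise before the down
    projection. Note the absence of biases, matching the T5 codebase.
    """
    return (gelu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))           # (batch, hidden)
w_gate = rng.normal(size=(8, 32))     # hidden -> ffn
w_up = rng.normal(size=(8, 32))
w_down = rng.normal(size=(32, 8))     # ffn -> hidden
y = t5_glu_mlp(x, w_gate, w_up, w_down)
```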
  3. ad7de7e · Rename xPos embedding class

     `XPos` → `XPosEmbedding`
  4. 81a68f7 · Integrate xPos embedding
  5. 46e145d · Handle xPos embedding
  6. 482f0ea · Do not use bias for 2nd MLP layer if using T5 GLU

     As in the T5 codebase. This could have highly detrimental effects on performance if TorchScript cannot easily type-dispatch the `bias_dropout_add` function.
  7. 4385f7b
  8. 2d24b13 · Refactor samples dict creation

     More code reuse; change some methods to functions and change their visibility.
  9. bd461f5 · Move callees under caller

     For readability.

Commits on Mar 10, 2023

  1. 35b2956 · Handle empty context
  2. f0171e0
  3. 92158d8
  4. 3b7692f · Make T5 GLU checks safer

Commits on Mar 20, 2023

  1. b37d3ee · Improve import code style
  2. 5959e89 · Refactor dummy barriers
  3. ce8c1a5 · Refactor file name creation
  4. 3e52966
  5. 23efa88
  6. 88eb98a

Commits on Mar 21, 2023

  1. 59e8451

Commits on Mar 24, 2023

  1. 24d46ff · Speed up packed dataset indexing

     By pre-allocating more data.
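Speedups of this kind usually come from amortizing buffer growth instead of appending row by row; a generic sketch (not the PR's code):

```python
import numpy as np

def build_index(entries):
    """Collect index entries into one pre-allocated, growing array.

    Appending to a NumPy array one element at a time reallocates and
    copies on every append. Doubling the buffer whenever it runs out
    gives amortized O(1) appends instead.
    """
    capacity, size = 1024, 0
    buf = np.empty(capacity, dtype=np.int64)
    for entry in entries:
        if size >= capacity:
            capacity *= 2
            new_buf = np.empty(capacity, dtype=np.int64)
            new_buf[:size] = buf[:size]
            buf = new_buf
        buf[size] = entry
        size += 1
    return buf[:size]

index = build_index(range(5000))
```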

Commits on Apr 3, 2023

  1. 600542d

Commits on Apr 4, 2023

  1. 58831d2 · Fix xPos embedding

Commits on Apr 13, 2023

  1. fe45cea · Fix padding loss mask
  2. 15e7b98 · Handle failure mode regarding non-DeepSpeed checkpoints

     No idea why this happens; I couldn't explain it even after briefly looking into the DeepSpeed source.

Commits on Jun 7, 2023

  1. ae45a9e
  2. 0c91b96 · Omit second objective token if without mask tokens

     That is, the reproduced objective token.
  3. 0c246c4 · Fix NumPy deprecations

Commits on Jun 26, 2023

  1. 7ce8635 · Fix supplied arguments

     Was missing `max_seq_length_dec`.
  2. 7290181 · Do not add separator if S-denoising

     This was already the case for encoder-decoder models, and is now also the case for decoder-only models.
  3. 628d847 · Fix caching error

Commits on Jun 29, 2023

  1. 9c727e7
  2. 4ffa951 · Do not automatically add <EOS> token when packing

     This also fixes problems with decoder-only attention masks.
  3. ff5787e · Allow silently ignoring causal attention mask

     When using the custom fused softmax kernel.