Fix embedding tests that were previously failing #1128

Merged: @jnie-TT merged 1 commit into main from jnie/embedding_fix on Nov 4, 2024

Conversation

@jnie-TT jnie-TT (Contributor) commented Oct 31, 2024

TTNN embedding requires bf16, ROW_MAJOR input tensors, so I added embeddingOp as a case in shouldForceRowMajor and updated all embedding tests to use bf16. Re-enabled the embedding silicon tests.

Also made embeddingOp DPS to match the ttnn implementation.
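
A minimal sketch of the compiler-side idea, assuming a helper shaped roughly like the one named above (the signature, namespace, and the elided cases here are illustrative, not the actual tt-mlir code):

// Illustrative sketch only: embedding joins the set of ops whose inputs are
// forced to ROW_MAJOR, since ttnn::embedding requires row-major bf16 inputs.
static bool shouldForceRowMajor(mlir::Operation *op) {
  // ... existing cases for other row-major-only ops ...
  if (mlir::isa<ttir::EmbeddingOp>(op)) {
    return true;
  }
  return false;
}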

FYI @ddilbazTT

@tapspatel tapspatel (Contributor) left a comment

Can you add an embedding test under test/Silicon/TTNN/perf_unit? Other than that, looks great!

@jnie-TT jnie-TT force-pushed the jnie/embedding_fix branch from 323f10c to d2c8cf5 on November 1, 2024 at 01:59
@jnie-TT jnie-TT force-pushed the jnie/embedding_fix branch from d2c8cf5 to e798a17 on November 1, 2024 at 20:42
Comment on lines 20 to +31
  std::optional<int> padToken = std::nullopt;
- ::tt::tt_metal::Layout layout = ::ttnn::ROW_MAJOR_LAYOUT;
+ ::tt::tt_metal::Layout layout = utils::isTilized(op->out())
+                                     ? ::ttnn::TILE_LAYOUT
+                                     : ::ttnn::ROW_MAJOR_LAYOUT;
  auto embeddingsType = ::ttnn::operations::embedding::EmbeddingsType::GENERIC;
- ::ttnn::DataType outputDataType = utils::getDataType(op->output());
+ ::ttnn::DataType outputDataType = utils::getDataType(op->out());
  ::ttnn::MemoryConfig outputMemoryConfig =
-     utils::createMemoryConfig(op->output());
+     utils::createMemoryConfig(op->out());
  ::ttnn::Tensor out =
      ::ttnn::embedding(input, weight, padToken, layout, embeddingsType,
                        outputDataType, outputMemoryConfig);
- tensorPool.insert_or_assign(op->output()->global_id(), out);
+ tensorPool.insert_or_assign(op->out()->global_id(), out);
A contributor commented:

Oh this is an interesting problem we didn't foresee... So in order to supply the destination arg, we need to supply all the other optional parameters?

In the TTNN Defaults Design Doc you mentioned that:

> In TTNN they include all explicit layout info in the parameters, but if the optional dps output tensor is provided, all the explicit parameters get ignored and they extract all information from the output tensor.

Now I understand what you were talking about... We should raise this with them, seems like bad API design. Dest arg and other optional args should live in 2 separate overloads.

@sdjordjevicTT @nsmithtt you guys have regular syncs with ttnn folks? Is this something that we could ask them to take care of?
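
To make the ergonomics concrete, here is a hypothetical DPS-style call, assuming an overload of ::ttnn::embedding that mirrors the invoke signature quoted further down (minus queue_id); this is an illustration, not the verbatim API:

// Hypothetical illustration: to reach the trailing optional_output_tensor,
// every preceding optional parameter has to be spelled out, even though
// `out` already carries layout, dtype, and memory-config information.
::ttnn::Tensor result = ::ttnn::embedding(
    input, weight,
    /*pad_token=*/std::nullopt,
    /*layout=*/::ttnn::ROW_MAJOR_LAYOUT,
    /*embeddings_type=*/::ttnn::operations::embedding::EmbeddingsType::GENERIC,
    /*dtype=*/std::nullopt,
    /*memory_config=*/std::nullopt,
    /*optional_output_tensor=*/out);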

@jnie-TT jnie-TT (Contributor, Author) commented Nov 4, 2024:

@svuckovicTT yeah exactly. Seems like this is inconsistent as well. Looking at binary.cpp, we ignore memory_config if we pass in the optional output tensor (this is the point I was making in the document):

inline Tensor binary_impl(
    uint8_t queue_id,
    BinaryOpType binary_op_type,
    const ttnn::Tensor &input_tensor,
    const float scalar,
    const std::optional<ttnn::MemoryConfig> &memory_config = std::nullopt,
    const std::optional<Tensor> &optional_output_tensor = std::nullopt) {
    // output_memory_config is extracted from optional_output_tensor when possible
    auto output_memory_config = optional_output_tensor.has_value()
                                    ? optional_output_tensor.value().memory_config()
                                    : memory_config.value_or(input_tensor.memory_config());

However in embedding, we ignore the optional_output_tensor completely:

// optional_output_tensor is unused in this function
struct EmbeddingOperation {
    static inline Tensor invoke(
        uint8_t queue_id,
        const Tensor& input_tensor_arg,
        const Tensor& weight_arg,
        const std::optional<int>& pad_token = std::nullopt,
        const Layout& layout = ttnn::ROW_MAJOR_LAYOUT,
        EmbeddingsType embeddings_type = EmbeddingsType::GENERIC,
        const std::optional<const DataType> dtype = std::nullopt,
        const std::optional<MemoryConfig>& memory_config = std::nullopt,
        std::optional<Tensor> optional_output_tensor = std::nullopt) {
        if (pad_token.has_value()) {
            embeddings_type = EmbeddingsType::PADDED;
        }

        auto hidden_embedding_dim = weight_arg.get_shape()[-1];
        auto padded_hidden_embedding_dim = weight_arg.get_shape().with_tile_padding()[-1];
        auto weight = ttnn::unsqueeze_to_4D(weight_arg);

        auto batch_size = input_tensor_arg.get_shape()[0];
        auto sentence_size = input_tensor_arg.get_shape()[-1];
        auto input_tensor =
            ttnn::reshape(input_tensor_arg, ttnn::SimpleShape{std::array<uint32_t, 4>{batch_size, 1, 1, sentence_size}});

        bool tilized = layout == ttnn::TILE_LAYOUT;
        auto embeddings = operation::run(
                              Embeddings{
                                  .output_mem_config = memory_config.value_or(input_tensor.memory_config()),
                                  .tilized = tilized,
                                  .embeddings_type = embeddings_type,
                                  .pad_token = pad_token,
                                  .output_dtype = dtype.value_or(weight.get_dtype())},
                              {input_tensor, weight})
                              .at(0);
        embeddings = ttnn::reshape(
            embeddings, ttnn::SimpleShape{std::array<uint32_t, 3>{batch_size, sentence_size, hidden_embedding_dim}});
        return embeddings;
    }

A contributor commented:

> Now I understand what you were talking about... We should raise this with them, seems like bad API design. Dest arg and other optional args should live in 2 separate overloads.

So my understanding is that you either supply 1 or 2:

  1. Explicit parameters, like output_dtype / output_memory_config / etc.
  2. Output tensor (DPS style)

If you supply 2, then 1 is ignored. That makes sense to me; what are they supposed to do if you supply both?

@jnie-TT jnie-TT (Contributor, Author) commented Nov 4, 2024:

@nsmithtt so in eltwise binary, it ignores 1 and uses everything from 2 if you provide both.
However, looking at embedding, it completely ignores 2 whether or not you provide it, and uses only 1.
So it's currently inconsistent across ops. My other question in the document was whether we would want to model our ops like this as well, since the current goal is to match the modelling of ttnn. Currently, a lot of our ops (for example eltwise binary) only provide 2 and don't have the option to provide 1 in the tablegen/flatbuffer schemas.

A contributor commented:

I think we probably want to model it how TTNN does, although that does feel a bit cumbersome. Would like to get thoughts from @sdjordjevicTT and @svuckovicTT; I'm not sure how else we could model it, though.

@jnie-TT jnie-TT (Contributor, Author) commented:

@nsmithtt I agree. It makes the parameters explicit, while the tensor also needs to carry this information because the consumer op will likely need it. One note: if we ever force anything, we need to change both the op params and the output tensor. We previously hit an issue in the compiler where, when forcing row_major/tile, we updated the op params but not the tensor info.

A contributor commented:

> So my understanding is that you either supply 1 or 2

AFAIK, in C++, you can't supply just 2 - if you want to supply 2, you need to supply 1 as well, given they're both (all) optional. All the non-dest optional parameters come before the optional dest param. This makes it cumbersome to use DPS, because now you need to create random parameters that are not going to be used. (Sometimes.)

Having 2 separate overloads here is the solution, though I don't expect we'll get that anytime soon, if ever. Can we talk to them and see if they're willing to canonicalize to respecting dest param's properties? That would make it easier for us to not have to specialize each op.

Otherwise, I don't see us being able to do anything besides modelling what TTNN does.
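
A sketch of the two-overload split being suggested, using a hypothetical add for brevity (signatures are illustrative, not an existing TTNN API):

// Overload 1: caller supplies explicit output properties.
Tensor add(const Tensor &a, const Tensor &b,
           std::optional<DataType> output_dtype = std::nullopt,
           std::optional<MemoryConfig> memory_config = std::nullopt);

// Overload 2: DPS style; dtype and memory config are read from `out` itself.
// With separate overloads, neither set of parameters can silently be ignored.
Tensor add(const Tensor &a, const Tensor &b, Tensor &out);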

A contributor commented:

> AFAIK, in C++, you can't supply just 2 - if you want to supply 2, you need to supply 1 as well, given they're both (all) optional. All the non-dest optional parameters come before the optional dest param. This makes it cumbersome to use DPS, because now you need to create random parameters that are not going to be used. (Sometimes.)

Not sure I'm following. If you had this API:

Tensor add(Tensor a, Tensor b, optional<DType> output_dtype = nullopt, optional<Tensor> out = nullopt);

Then for each case we'd do respectively:

  1. add(a, b, DType::Float16, nullopt);
  2. add(a, b, nullopt, out);

A contributor commented:

> Not sure I'm following

Of course, it's because I'm not making any sense... When I said optional parameters, I was thinking of default parameters, forgetting that optional params can be supplied as nullopt. Sincere apologies for wasting time!

> I think we probably want to model it how TTNN does, although that does feel a bit cumbersome.

I'm partial to us modelling TTNN accurately as well. It'll simplify runtimes: no need to work around what was supplied and what wasn't, just "copy-paste" parameters from the IR to the API call (with some default constructors for nullopts, etc.). @sdjordjevicTT, anything to add?

I was already thinking of adapting an op here and there for the emitc path, just to make things smoother. I'm wondering if it's okay to roll this out on a per-op, need-to-have basis.

@jnie-TT jnie-TT merged commit 6988418 into main on Nov 4, 2024
18 checks passed