
Tensor mismatch on embedding op using bfloat16 weights. #1404

Closed · dgolubovicTT opened this issue Nov 25, 2024 · 2 comments · Fixed by #1633

@dgolubovicTT (Contributor)
Running the single-op embedding test with bfloat16 weights causes a tensor mismatch. However, this can't be reproduced with the ttnn embedding test.
Here is the ttnn IR of the case causing the tensor mismatch: test_embedding_bfloat16_data_mismatch_ttnn.txt
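
For context, the failing forge-side single-op test is along these lines (a minimal sketch, not the actual test; the forge.compile/verify usage and the shapes are assumptions mirrored from the ttnn repro below):

import pytest
import torch

import forge
from forge.verify.verify import verify  # assumed tt-forge-fe verify helper


@pytest.mark.parametrize("vocab_size, embedding_dim", [(32000, 3200)])
def test_embedding_bfloat16(vocab_size, embedding_dim):
    class Embedding(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # bfloat16 weights are the trigger: float32 weights are not
            # supported by the ttnn embedding op.
            self.embedding = torch.nn.Embedding(vocab_size, embedding_dim).to(torch.bfloat16)

        def forward(self, indices):
            return self.embedding(indices)

    indices = torch.randint(0, vocab_size, (1, 12))
    framework_model = Embedding()
    compiled_model = forge.compile(framework_model, sample_inputs=[indices])

    # Compares device output against the PyTorch golden; this is where the
    # tensor mismatch is reported.
    verify([indices], framework_model, compiled_model)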

Here is the ttnn repro test that passes:

import pytest
import torch

import ttnn

# Helper imports as used in tt-metal's ttnn test suite (module paths may vary by revision):
from models.utility_functions import torch_random
from tests.ttnn.utils_for_testing import assert_with_pcc


@pytest.mark.parametrize("batch_size", [1])
@pytest.mark.parametrize("sentence_size", [12])
@pytest.mark.parametrize("hidden_embedding_dim", [3200])  # Bert_Num_Cols_768, Llama_Num_Cols
@pytest.mark.parametrize(
    "vocabulary_size", [32000]
)  # Bert_Position_Embeddings_512, Bert_Word_Embeddings_30528, Llama_Position_Embeddings,
@pytest.mark.parametrize("dtype", [ttnn.bfloat16])
@pytest.mark.parametrize("input_mem_config", [ttnn.DRAM_MEMORY_CONFIG])
@pytest.mark.parametrize("output_mem_config", [ttnn.DRAM_MEMORY_CONFIG])
@pytest.mark.parametrize("layout", [ttnn.ROW_MAJOR_LAYOUT])
def test_embedding(
    device,
    batch_size,
    sentence_size,
    hidden_embedding_dim,
    vocabulary_size,
    dtype,
    input_mem_config,
    output_mem_config,
    layout,
):
    torch.manual_seed(1234)

    torch_input_tensor = torch.randint(0, vocabulary_size - 1, (batch_size, sentence_size))
    torch_weights = torch_random((vocabulary_size, hidden_embedding_dim), -0.1, 0.1, dtype=torch.bfloat16)
    torch_output_tensor = torch.nn.functional.embedding(torch_input_tensor, torch_weights)

    input_tensor = ttnn.to_device(ttnn.from_torch(torch_input_tensor), device, memory_config=input_mem_config)
    weights = ttnn.to_device(ttnn.from_torch(torch_weights, dtype=dtype), device, memory_config=input_mem_config)

    output_tensor = ttnn.embedding(input_tensor, weights, memory_config=output_mem_config, layout=layout)
    output_tensor = ttnn.to_torch(output_tensor)

    assert_with_pcc(torch_output_tensor, output_tensor)

Comparing the ttnn test and the ttnn IR, I can't find any difference that could explain why the ttnn test passes while the ttnn IR fails.
I need help from someone on the mlir side @sdjordjevicTT.

Note: the embedding op doesn't support float32 weights, so I tried bfloat16 and ran into this.

@dgolubovicTT (Contributor, Author)

As agreed offline, I am providing the ttir and ttnn generated on latest main of forge:
LlamaEmbedding_data_mismatch_bfloat16_ttir.txt
Llama_Embedding_data_mismatch_bfloat16_ttnn.txt

dgolubovicTT added a commit to tenstorrent/tt-forge-fe that referenced this issue Dec 10, 2024
…dd mlir hacks to push the compile to the end. Now embedding hangs in ttnn runtime, which is expected from tenstorrent/tt-mlir#1404
@nsmithtt (Contributor)

We have a "new" golden flow that we are developing that I think could be used for unit testing ops like this. See: https://github.com/tenstorrent/tt-mlir/blob/main/test/python/golden/test_ttir_ops.py

Embedding op + golden func will have to be added to https://github.com/tenstorrent/tt-mlir/blob/main/python/test_infra/ttir_builder.py
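
Concretely, that might look something like this (a hypothetical sketch only; op_proxy, compile_to_flatbuffer, and the decorator signature are assumptions, so check ttir_builder.py and test_ttir_ops.py on main for the real APIs):

# In python/test_infra/ttir_builder.py, pair the TTIR embedding op with a
# torch golden function (helper names here are illustrative):
def embedding(self, in0: Operand, weight: Operand) -> OpView:
    # Golden reference is plain torch: torch.nn.functional.embedding(indices, weight)
    return self.op_proxy(torch.nn.functional.embedding, ttir.EmbeddingOp, [in0, weight])

# In test/python/golden/test_ttir_ops.py, a unit test along these lines
# (shapes taken from the repro above):
@compile_to_flatbuffer([(1, 12), (32000, 3200)], targets=["ttnn"])
def test_embedding(in0: Operand, in1: Operand, builder: TTIRBuilder):
    return builder.embedding(in0, in1)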

Sync with @ctodTT for questions.
