RuntimeError: output with shape [256] doesn't match the broadcast shape [256, 256] #234

ajay-vikram · 2024-07-30T09:58:36Z

I have trained a Recurrent network using an LSTMCell and MLP layers. But when I load the model and the weights for running the benchmark, I get "RuntimeError: output with shape [256] doesn't match the broadcast shape [256, 256]". Tracing it backwards, it originates from the utils.py file on line 291 (out += biases). On printing the shapes of out and biases, I got [256] and [256, 1] respectively. Squeezing out the 2nd dimension from biases resolves the issue, but I am unsure whether there is a mistake with the benchmark code or with how my model is defined. I faced a similar issue on using a GRUCell. Can I please get some help?
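The error can be reproduced outside the harness. The following is a minimal sketch, assuming only PyTorch and the two shapes printed above; the variable names mirror the ones in utils.py, but this is not the benchmark code itself. In-place addition has to write its result back into out, and [256] broadcast against [256, 1] would need a [256, 256] result, which does not fit.

```python
import torch

out = torch.zeros(256)        # shape printed for `out`
biases = torch.ones(256, 1)   # shape printed for `biases`

# Out-of-place addition is allowed to broadcast both operands to [256, 256].
broadcast_shape = tuple((out + biases).shape)

# In-place addition must store the result in `out`, which cannot be resized.
try:
    out += biases
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(broadcast_shape)
print(error_message)

# Either of these keeps the two operands aligned.
squeezed = torch.zeros(256)
squeezed += biases.squeeze(1)   # [256] += [256]

column = torch.zeros(256, 1)
column += biases                # [256, 1] += [256, 1]
```

Squeezing biases silences the error, but it does not explain why out arrives as [256] rather than [256, 1] in the first place.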

jasonlyik · 2024-07-30T13:13:12Z

Hi Ajay, you may be running with the data shaped differently. We expect that the out tensor is shaped [4*hidden_state, batch_size], so I would expect that out should be shaped [256, 1] and not [256].

At benchmark.py:125 (batch_results[m] = self.workload_metrics[m](self.model, preds, data)), can you please check the shape of preds and data? Otherwise, it may be an issue with the hook connected to the RNNCell which tracks inputs.

Also, there is the LSTM example for a different sequence task here which may be helpful.

ajay-vikram · 2024-07-31T04:41:37Z

Hi Jason,
The shapes of preds and data are [256, 2] and ([256, 1, 96], [256, 2]) respectively, where data is a tuple. These are the inputs to my model as well. What shape do you expect as input to the LSTMCell? In my case, a [1, 96] tensor goes to the LSTMCell. This [1, 96] tensor comes from acc_spikes in the buffering mechanism of the forward pass, similar to the one in primate_example.
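For reference, the shape path from a [256, 1, 96] batch to the [1, 96] cell input can be traced as below. This is only a sketch: the window constants and the buffering steps are copied from the model definition posted later in this thread, the tensors are zeros rather than real spike data, and the buffer is assumed to be already full.

```python
import torch

bin_window_size = int(0.2 / 0.004)                 # same formula as in the model
x = torch.zeros(256, 1, 96)                        # one batch from the DataLoader
data_buffer = torch.zeros(bin_window_size, 96)     # assumed already full

# The model iterates over the first dimension, so each step is one sample.
current_seq = x[0, :, :]                                     # [1, 96]
data_buffer = torch.cat((data_buffer, current_seq), dim=0)   # one row too many
data_buffer = data_buffer[1:, :]                             # drop the oldest row
acc_spikes = torch.sum(data_buffer, dim=0)                   # [96]
cell_input = acc_spikes.unsqueeze(0)                         # [1, 96]

print(current_seq.shape, acc_spikes.shape, cell_input.shape)
```

The first dimension of 256 is consumed as time steps here, so the LSTMCell itself always sees a batch of size 1.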

jasonlyik · 2024-07-31T05:21:40Z

The shape looks reasonable to me. Can you check whether your code matches the code block from the previous issue #225? That works with the latest neurobench package, 1.0.6, as well as with any arbitrary batch size. If there are still issues, please post your code block so we can inspect the error.

ajay-vikram · 2024-07-31T05:31:38Z

Ohh, I see. I didn't get the latest version. How do I get it? Do I run .bumpversion.toml?

jasonlyik · 2024-07-31T05:35:10Z

pip install --upgrade neurobench

Or, if you are using poetry and a locally cloned repo, simply run git pull on the main branch.
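To confirm which version the active Python environment actually picks up after upgrading, a quick check along these lines may help. It is a sketch using only the standard library; installed_version is a hypothetical helper name, not part of neurobench.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Compare the printed value with the release mentioned above (1.0.6).
print("neurobench:", installed_version("neurobench"))
```

If this prints None or an older version, the upgrade probably went into a different environment than the one running the benchmark.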

ajay-vikram · 2024-07-31T05:46:28Z

Still getting the same issue. Can you tell me which code has been modified? I'll check whether the changes have been updated.

jasonlyik · 2024-07-31T05:51:15Z

Changes are listed in #227

Please check if you can successfully run the minimal example from the code block in #225

If there is still an issue, please provide a minimal example of the model definition and harness call which causes the issue.

ajay-vikram · 2024-07-31T05:56:01Z

Yes, the minimal example code runs.

Here's my model definition:

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_dim):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.output_dim = 2

        self.lstm = nn.LSTMCell(self.input_dim, 64)
        self.fc1 =  nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, self.output_dim)
        self.layernorm0 = nn.LayerNorm(self.input_dim)
        self.layernorm1 = nn.LayerNorm(32)
        self.layernorm2 = nn.LayerNorm(16)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)

        self.bin_window_time = 0.2
        self.sampling_rate = 0.004
        self.bin_window_size = int(self.bin_window_time / self.sampling_rate)
        self.register_buffer("data_buffer", torch.zeros(1, self.input_dim).type(torch.float32), persistent=False)
    
    def single_forward(self,x):
        x = x.unsqueeze(0)
        x = self.layernorm0(x)
        (hn, cn) = self.lstm(x)
        out = self.relu(hn)
        out = self.layernorm1(self.relu(self.fc1(out)))
        out = self.dropout(out)
        out = self.layernorm2(self.relu(self.fc2(out)))
        out = self.fc3(out)
        return out

    def forward(self, x):
        predictions = []

        seq_length = x.shape[0]
        for seq in range(seq_length):
            current_seq = x[seq, :, :]
            self.data_buffer = torch.cat((self.data_buffer, current_seq), dim=0)
            if self.data_buffer.shape[0] <= self.bin_window_size:
                predictions.append(torch.zeros(1, self.output_dim).to(x.device))
            else:
                # Only pass input into model when the buffer size == bin_window_size
                if self.data_buffer.shape[0] > self.bin_window_size:
                    self.data_buffer = self.data_buffer[1:, :]

                # Accumulate
                spikes = self.data_buffer.clone()
                acc_spikes = torch.sum(spikes, dim=0)
                pred = self.single_forward(acc_spikes)
                predictions.append(pred)

        predictions = torch.stack(predictions).squeeze(dim=1)
 
        return predictions

ajay-vikram · 2024-07-31T06:02:42Z

This is the benchmark code:

import torch
from torch.utils.data import DataLoader, Subset

from neurobench.datasets import PrimateReaching
from neurobench.models.torch_model import TorchModel
from neurobench.benchmarks import Benchmark

from ANN import ANNModel2D
from GRU import GRU
from LSTM import LSTM

all_files = ["indy_20160622_01"]
# all_files = ["indy_20160622_01", "indy_20160630_01", "indy_20170131_02", 
#              "loco_20170210_03", "loco_20170215_02", "loco_20170301_05"]

footprint = []
connection_sparsity = []
activation_sparsity = []
dense = []
macs = []
acs = []
r2 = []

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

for filename in all_files:
    print("Processing {}".format(filename))

    # The dataloader and preprocessor have been combined into a single class
    data_dir = "/home/satyapreets/Ajay/neurobench/neurobench/data" # data in repo root dir
    dataset = PrimateReaching(file_path=data_dir, filename=filename,
                            num_steps=1, train_ratio=0.5, bin_width=0.004,
                            biological_delay=0, remove_segments_inactive=False)

    test_set_loader = DataLoader(Subset(dataset, dataset.ind_test), batch_size=256, shuffle=False)

    net = LSTM(input_dim=dataset.input_feature_size)
    # net = ANNModel2D(input_dim=dataset.input_feature_size, layer1=32, layer2=48, 
    #                  output_dim=2, bin_window=0.2, drop_rate=0.5)

    net.load_state_dict(torch.load("/home/satyapreets/Ajay/neurobench/mobilenet_training/experiments/vww/submission/lstm_64_indy_20160622_01.pt", map_location=device)['state_dict'])
    # net.load_state_dict(torch.load("./model_data/2D_ANN_Weight/"+filename+"_model_state_dict.pth", map_location=device))

    model = TorchModel(net)

    static_metrics = ["footprint", "connection_sparsity"]
    workload_metrics = ["r2", "activation_sparsity", "synaptic_operations"]

    # Benchmark expects the following:
    benchmark = Benchmark(model, test_set_loader, [], [], [static_metrics, workload_metrics])
    results = benchmark.run(device=device)
    print(results)

    footprint.append(results['footprint'])
    connection_sparsity.append(results['connection_sparsity'])
    activation_sparsity.append(results['activation_sparsity'])
    dense.append(results['synaptic_operations']['Dense'])
    macs.append(results['synaptic_operations']['Effective_MACs'])
    acs.append(results['synaptic_operations']['Effective_ACs'])
    r2.append(results['r2'])

print("Footprint: {}".format(footprint))
print("Connection sparsity: {}".format(connection_sparsity))
print("Activation sparsity: {}".format(activation_sparsity), sum(activation_sparsity)/len(activation_sparsity))
print("Dense: {}".format(dense), sum(dense)/len(dense))
print("MACs: {}".format(macs), sum(macs)/len(macs))
print("ACs: {}".format(acs), sum(acs)/len(acs))
print("R2: {}".format(r2), sum(r2)/len(r2))

# Footprint: [20824, 20824, 20824, 33496, 33496, 33496]
# Connection sparsity: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Activation sparsity: [0.7068512007122443, 0.7274494314849341, 0.6142621034584272, 0.6290474755671983, 0.6793054885963405, 0.6963649652600741] 0.6755467775132032
# Dense: [4702.261627687736, 4701.8430499148435, 4699.549582947173, 7773.2197567257945, 7771.01773105288, 7772.632844051291] 6236.754098729952
# MACs: [4306.322415210456, 3595.209672287623, 3607.261044176707, 5851.9819915795315, 5995.014802029395, 6462.786839756449] 4969.76279417336
# ACs: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 0.0
# R2: [0.6327020525932312, 0.5241347551345825, 0.6216747164726257, 0.5727078914642334, 0.4745999276638031, 0.6272222995758057] 0.5755069404840469

jasonlyik · 2024-07-31T06:38:17Z

Hi Ajay, I noticed that your LSTMCell forward call does not include the (h, c) in the inputs. Based on the documentation, if these are not included, I believe that the recurrent state of the LSTM is not tracked at all, and essentially the LSTM block is just an MLP-type transform. I may be wrong on this, though.

Regardless, note that all of our other LSTM examples use the forward convention for the LSTMCell hx, cx = rnn(input[i], (hx, cx)), and not just hx, cx = rnn(input[i]).
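The difference between the two conventions can be checked directly. The sketch below assumes only torch.nn.LSTMCell with the sizes used in this thread (96 inputs, 64 hidden units); the weights and inputs are random, not the trained model. When the state is omitted, LSTMCell starts from zeros on every call, so nothing is carried from one step to the next.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(96, 64)
x1, x2 = torch.randn(1, 96), torch.randn(1, 96)
zeros = torch.zeros(1, 64)

with torch.no_grad():
    # Omitting the state gives the same result as passing zeros explicitly.
    h_default, c_default = cell(x1)
    h_zeros, c_zeros = cell(x1, (zeros, zeros))
    same_as_zeros = (torch.allclose(h_default, h_zeros)
                     and torch.allclose(c_default, c_zeros))

    # Stateless call: the second step restarts from zeros.
    h_stateless, _ = cell(x2)
    # Stateful call: the second step continues from the first step's output.
    h_stateful, _ = cell(x2, (h_default, c_default))
    state_matters = not torch.allclose(h_stateless, h_stateful)

print("default state equals zeros:", same_as_zeros)
print("carrying state changes the output:", state_matters)
```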

By making additions to your model definition shown in the below code block, there is no longer a harness runtime error:

class LSTM(nn.Module):
    def __init__(self, input_dim):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.output_dim = 2

        self.lstm = nn.LSTMCell(self.input_dim, 64)
        self.fc1 =  nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, self.output_dim)
        self.layernorm0 = nn.LayerNorm(self.input_dim)
        self.layernorm1 = nn.LayerNorm(32)
        self.layernorm2 = nn.LayerNorm(16)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)

        self.bin_window_time = 0.2
        self.sampling_rate = 0.004
        self.bin_window_size = int(self.bin_window_time / self.sampling_rate)
        self.register_buffer("data_buffer", torch.zeros(1, self.input_dim).type(torch.float32), persistent=False)

        self.h = None
        self.c = None
    
    def single_forward(self,x):
        x = x.unsqueeze(0)
        x = self.layernorm0(x)
        self.h, self.c = self.lstm(x, (self.h, self.c))
        out = self.relu(self.h)
        out = self.layernorm1(self.relu(self.fc1(out)))
        out = self.dropout(out)
        out = self.layernorm2(self.relu(self.fc2(out)))
        out = self.fc3(out)
        return out

    def forward(self, x):
        predictions = []

        self.h = torch.zeros(1, 64).to(x.device)
        self.c = torch.zeros(1, 64).to(x.device)

        seq_length = x.shape[0]
        for seq in range(seq_length):
            current_seq = x[seq, :, :]
            self.data_buffer = torch.cat((self.data_buffer, current_seq), dim=0)
            if self.data_buffer.shape[0] <= self.bin_window_size:
                predictions.append(torch.zeros(1, self.output_dim).to(x.device))
            else:
                # Only pass input into model when the buffer size == bin_window_size
                if self.data_buffer.shape[0] > self.bin_window_size:
                    self.data_buffer = self.data_buffer[1:, :]

                # Accumulate
                spikes = self.data_buffer.clone()
                acc_spikes = torch.sum(spikes, dim=0)
                pred = self.single_forward(acc_spikes)
                predictions.append(pred)

        predictions = torch.stack(predictions).squeeze(dim=1)
 
        return predictions

The harness should be able to support the case where (h, c) is not passed into the LSTMCell, so this is still an issue. But I recommend that you include (h, c) in the inputs.

ajay-vikram · 2024-07-31T06:44:36Z

Aah, I see. I read somewhere in the documentation that LSTMs by default initialize their hidden and cell states to a tensor of 0s, which is why I didn't explicitly add it. Thanks a lot!!

ajay-vikram · 2024-07-31T06:50:28Z

Also, will I have to retrain my models with these changes incorporated? I just changed the model but loaded the same weights I had before the explicit h and c definition, and the neurobench benchmarks are running fine.

jasonlyik · 2024-07-31T06:54:14Z

My guess is that you will need to retrain the model, as it is now tracking recurrent state and it wasn't before. I suggest that you take out all of the metrics except the R2 workload metric and first verify you are getting the expected accuracy before considering the compute complexity.

ajay-vikram · 2024-07-31T06:55:26Z

Alright thanks a lot!

jasonlyik · 2024-07-31T14:01:29Z

TODO: support synops for RNNCells which do not use recurrent input
