[manos branch] DGL Node Dataloader is not Compatible with Stratified Sampling #1

manoskary · 2021-08-20T10:56:25Z

NodeDataLoader raises an error when a stratified sampler is passed through to the PyTorch DataLoader it wraps.
It seems to be an internal conflict in the DGL dataloader.

To reproduce the error on the [manos] branch, go to src/models/rcgn-homo and run:

python entity_classify_mp.py --dataset cora --num-epochs 30 --gpu -1

The sampler class that causes the issue:

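import torch
from sklearn.model_selection import StratifiedKFold

class StratifiedSampler:
    """Stratified batch sampling
    Provides equal representation of target classes in each batch.
    Each iteration yields a whole batch (a list of positions into y).
    """
    def __init__(self, y, batch_size, shuffle=True):
        if torch.is_tensor(y):
            y = y.cpu().numpy()
        assert len(y.shape) == 1, 'label array must be 1D'
        # One StratifiedKFold test fold per batch; each fold holds about
        # len(y) / n_batches samples, which can be more than batch_size.
        n_batches = len(y) // batch_size
        assert n_batches >= 2, 'StratifiedKFold needs at least 2 splits'
        self.skf = StratifiedKFold(n_splits=n_batches, shuffle=shuffle)
        # StratifiedKFold only uses X for its length; a dummy column suffices.
        self.X = torch.randn(len(y), 1).numpy()
        self.y = y
        self.shuffle = shuffle

    def __iter__(self):
        if self.shuffle:
            # Draw a new seed each epoch so the folds differ between epochs.
            self.skf.random_state = torch.randint(0, int(1e8), size=()).item()
        for train_idx, test_idx in self.skf.split(self.X, self.y):
            # The test fold of each split is one stratified batch.
            yield test_idx.tolist()

    def __len__(self):
        # Number of batches per epoch, not the number of samples.
        return self.skf.get_n_splits()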
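Since the class above yields a whole batch per iteration, it behaves like a PyTorch batch sampler rather than a per-index sampler. The dependency-free sketch below illustrates the same idea without scikit-learn. It is a hypothetical `RoundRobinStratifiedBatchSampler` (not part of DGL, PyTorch, or this repository) that swaps StratifiedKFold for a simple round-robin split: each class's shuffled positions are dealt across the batches in turn, so every batch gets a near-proportional share of every class. It assumes the labels are a flat sequence of hashable class IDs.

```python
import random
from collections import defaultdict


class RoundRobinStratifiedBatchSampler:
    """Hypothetical stratified batch sampler (illustration only).

    Yields lists of positions into `labels`; each list is one batch.
    Positions of each class are dealt round-robin across the batches,
    so every batch holds a near-proportional share of every class.
    """

    def __init__(self, labels, batch_size, shuffle=True, seed=None):
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self.labels = list(labels)
        # Same batch count as the class above (len(labels) // batch_size),
        # but never fewer than one batch.
        self.n_batches = max(1, len(self.labels) // batch_size)
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        # Group positions by class label.
        by_class = defaultdict(list)
        for pos, label in enumerate(self.labels):
            by_class[label].append(pos)

        batches = [[] for _ in range(self.n_batches)]
        offset = 0
        for positions in by_class.values():
            if self.shuffle:
                self.rng.shuffle(positions)
            # Continue dealing where the previous class stopped, so overall
            # batch sizes differ by at most one.
            for i, pos in enumerate(positions):
                batches[(offset + i) % self.n_batches].append(pos)
            offset += len(positions)

        if self.shuffle:
            self.rng.shuffle(batches)
            for batch in batches:
                self.rng.shuffle(batch)
        yield from batches

    def __len__(self):
        # Number of batches per epoch, which is what a DataLoader
        # expects from len(batch_sampler).
        return self.n_batches
```

With the Cora split reported below (140 training nodes, 7 classes) and a batch size of 40, this gives 140 // 40 = 3 batches of 47, 47 and 46 nodes, each with 6 or 7 nodes per class, assuming the usual 20 training nodes per class. In plain PyTorch such an object would be passed as `DataLoader(train_nid, batch_sampler=...)` without `batch_size`, `shuffle`, `sampler` or `drop_last`, because PyTorch treats those as mutually exclusive with `batch_sampler`. Whether `NodeDataLoader` forwards `batch_sampler` unchanged in the DGL version used here is an assumption to check.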

The most recent error:

python entity_classify_mp.py --dataset cora --num-epochs 100 --gpu -1 --inductive --batch-size 40

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
torch.Size([140])
(140,)
Traceback (most recent call last):
  File "entity_classify_mp.py", line 249, in <module>
    run(args, device, data)
  File "entity_classify_mp.py", line 123, in run
    for step, (input_nodes, seeds, blocks) in enumerate(dataloader):
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\dgl\dataloading\pytorch\dataloader.py", line 322, in __next__
    result_ = next(self.iter_)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\torch\utils\data\dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\torch\utils\data\dataloader.py", line 1225, in _process_data
    data.reraise()
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\torch\_utils.py", line 429, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\torch\utils\data\_utils\worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\dgl\dataloading\pytorch\dataloader.py", line 280, in collate
    result = super().collate(items)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\dgl\dataloading\dataloader.py", line 453, in collate
    items = _prepare_tensor(self.g, items, 'items', self._is_distributed)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\dgl\dataloading\dataloader.py", line 369, in _prepare_tensor
    return F.tensor(data) if is_distributed else utils.prepare_tensor(g, data, name)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\dgl\utils\checks.py", line 38, in prepare_tensor
    data = F.tensor(data)
  File "C:\Users\melki\Desktop\JKU\codes\musym-GDL\env\lib\site-packages\dgl\backend\pytorch\tensor.py", line 46, in tensor
    return th.as_tensor(data, dtype=dtype)
TypeError: only integer tensors of a single element can be converted to an index
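One possible reading of this traceback, assuming the sampler was passed to `NodeDataLoader` as `sampler=` together with `batch_size=40`, and that the loader's dataset is the tensor of training node IDs (as the `collate` frames suggest), is as follows. PyTorch's DataLoader treats every item a `sampler` yields as one index and groups up to `batch_size` of them. Each yielded item here is a whole fold, so `collate` receives a list of ragged index arrays instead of a flat list of node IDs, and `th.as_tensor` cannot build a 1-D tensor from that. The pure-Python sketch below is a simplified model of the two DataLoader code paths. The helper names are hypothetical, not PyTorch functions.

```python
def dataset_lookup(dataset, index):
    # Mimics tensor fancy indexing: a list of positions returns a list of items.
    if isinstance(index, list):
        return [dataset[i] for i in index]
    return dataset[index]


def fetch_with_sampler(dataset, sampler, batch_size):
    """Simplified model of DataLoader(sampler=..., batch_size=...).

    Every item the sampler yields counts as ONE index; items are grouped
    into batches of `batch_size` before being looked up in the dataset.
    """
    batch = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield [dataset_lookup(dataset, i) for i in batch]
            batch = []
    if batch:  # drop_last=False keeps the final partial batch
        yield [dataset_lookup(dataset, i) for i in batch]


def fetch_with_batch_sampler(dataset, batch_sampler):
    """Simplified model of DataLoader(batch_sampler=...): each yielded list
    already is one batch, and each element is a single index."""
    for batch in batch_sampler:
        yield [dataset[i] for i in batch]


if __name__ == "__main__":
    node_ids = list(range(100, 110))             # hypothetical node IDs
    folds = [[0, 3, 6], [1, 4], [2, 5]]          # what a fold sampler yields
    print(list(fetch_with_sampler(node_ids, folds, batch_size=40)))
    # [[[100, 103, 106], [101, 104], [102, 105]]]  one nested, ragged batch
    print(list(fetch_with_batch_sampler(node_ids, folds)))
    # [[100, 103, 106], [101, 104], [102, 105]]  three flat batches
```

In the `sampler=` path the collate input is a nested, ragged list, which matches the kind of input rejected in the last frame above. In the `batch_sampler=` path each batch is a flat list of IDs. This points to passing the class as `batch_sampler=` (and dropping `batch_size`), but that has not been verified against this branch.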

manoskary added the labels bug ("Something isn't working") and help wanted ("Extra attention is needed") on Aug 20, 2021.