Cherry pick #2241 and #2242 (#2244)
* Remove torchdata dependency from package and from CI (#2241)

* Fix torchdata import error (#2242)

* Remove stuff

* stuff

* lint

---------

Co-authored-by: Nicolas Hug <[email protected]>
huydhn and NicolasHug authored Mar 23, 2024
1 parent 57ed43c commit 2c4ce95
Showing 55 changed files with 72 additions and 191 deletions.
5 changes: 0 additions & 5 deletions .circleci/unittest/linux/scripts/install.sh
@@ -1,7 +1,6 @@
#!/usr/bin/env bash

unset PYTORCH_VERSION
unset TORCHDATA_VERSION
# For unittest, nightly PyTorch is used as the following section,
# so no need to set PYTORCH_VERSION.
# In fact, keeping PYTORCH_VERSION forces us to hardcode PyTorch version in config.
@@ -30,10 +29,6 @@ printf "* Installing PyTorch\n"
)


printf "Installing torchdata nightly with portalocker\n"
pip install "portalocker>=2.0.0"
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu

printf "* Installing torchtext\n"
python setup.py develop

5 changes: 0 additions & 5 deletions .circleci/unittest/windows/scripts/install.sh
@@ -1,7 +1,6 @@
#!/usr/bin/env bash

unset PYTORCH_VERSION
unset TORCHDATA_VERSION
# For unittest, nightly PyTorch is used as the following section,
# so no need to set PYTORCH_VERSION.
# In fact, keeping PYTORCH_VERSION forces us to hardcode PyTorch version in config.
@@ -19,10 +18,6 @@ conda activate ./env
printf "* Installing PyTorch\n"
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" ${CONDA_CHANNEL_FLAGS} pytorch cpuonly

printf "* Installing torchdata nightly with portalocker\n"
pip install "portalocker>=2.0.0"
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu

printf "* Installing pywin32_postinstall script\n"
curl --output pywin32_postinstall.py https://raw.githubusercontent.com/mhammond/pywin32/main/pywin32_postinstall.py
python pywin32_postinstall.py -install
2 changes: 1 addition & 1 deletion .github/workflows/build-conda-linux.yml
@@ -29,7 +29,7 @@ jobs:
matrix:
include:
- repository: pytorch/text
pre-script: packaging/install_torchdata.sh
pre-script: ""
post-script: ""
conda-package-directory: packaging/torchtext
smoke-test-script: test/smoke_tests/smoke_tests.py
2 changes: 1 addition & 1 deletion .github/workflows/build-conda-m1.yml
@@ -28,7 +28,7 @@ jobs:
matrix:
include:
- repository: pytorch/text
pre-script: packaging/install_torchdata.sh
pre-script: ""
post-script: ""
conda-package-directory: packaging/torchtext
smoke-test-script: test/smoke_tests/smoke_tests.py
2 changes: 1 addition & 1 deletion .github/workflows/build-conda-windows.yml
@@ -29,7 +29,7 @@ jobs:
matrix:
include:
- repository: pytorch/text
pre-script: packaging/install_torchdata.sh
pre-script: ""
post-script: ""
conda-package-directory: packaging/torchtext
smoke-test-script: test/smoke_tests/smoke_tests.py
2 changes: 1 addition & 1 deletion .github/workflows/build-wheels-linux.yml
@@ -34,7 +34,7 @@ jobs:
matrix:
include:
- repository: pytorch/text
pre-script: packaging/install_torchdata.sh
pre-script: ""
post-script: ""
smoke-test-script: test/smoke_tests/smoke_tests.py
package-name: torchtext
2 changes: 1 addition & 1 deletion .github/workflows/build-wheels-m1.yml
@@ -32,7 +32,7 @@ jobs:
matrix:
include:
- repository: pytorch/text
pre-script: packaging/install_torchdata.sh
pre-script: ""
post-script: ""
package-name: torchtext
smoke-test-script: test/smoke_tests/smoke_tests.py
2 changes: 1 addition & 1 deletion .github/workflows/build-wheels-windows.yml
@@ -33,7 +33,7 @@ jobs:
matrix:
include:
- repository: pytorch/text
pre-script: packaging/install_torchdata.sh
pre-script: ""
env-script: packaging/vc_env_helper.bat
post-script: ""
smoke-test-script: test/smoke_tests/smoke_tests.py
1 change: 0 additions & 1 deletion .github/workflows/codeql.yml
@@ -31,7 +31,6 @@ jobs:
- name: Install Torch
run: |
python -m pip install cmake
python -m pip install --quiet --pre torch torchdata -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
sudo ln -s /usr/bin/ninja /usr/bin/ninja-build
- name: Build TorchText
4 changes: 1 addition & 3 deletions .github/workflows/integration-test.yml
@@ -39,15 +39,13 @@ jobs:
python -m spacy download en_core_web_sm
printf "* Downloading SpaCy German models\n"
python -m spacy download de_core_news_sm
# Install PyTorch, Torchvision, and TorchData
# Install PyTorch, Torchvision
set -ex
conda install \
--yes \
-c "pytorch-${CHANNEL}" \
-c nvidia "pytorch-${CHANNEL}"::pytorch[build="*${VERSION}*"] \
"${CUDATOOLKIT}"
printf "Installing torchdata nightly\n"
python3 -m pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
python3 setup.py develop
# Install integration test dependencies
python3 -m pip --quiet install parameterized
5 changes: 1 addition & 4 deletions .github/workflows/test-linux-cpu.yml
@@ -50,16 +50,13 @@ jobs:
printf "* Downloading SpaCy German models\n"
python -m spacy download de_core_news_sm
# Install PyTorch, Torchvision, and TorchData
# Install PyTorch, Torchvision
set -ex
conda install \
--yes \
-c "pytorch-${CHANNEL}" \
-c nvidia "pytorch-${CHANNEL}"::pytorch[build="*${VERSION}*"] \
"${CUDATOOLKIT}"
printf "Installing torchdata nightly\n"
python3 -m pip install "portalocker>=2.0.0"
python3 -m pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
python3 setup.py develop
python3 -m pip install parameterized
5 changes: 1 addition & 4 deletions .github/workflows/test-linux-gpu.yml
@@ -54,17 +54,14 @@ jobs:
printf "* Downloading SpaCy German models\n"
python -m spacy download de_core_news_sm
# Install PyTorch and TorchData
# Install PyTorch
set -ex
conda install \
--yes \
--quiet \
-c "pytorch-${CHANNEL}" \
-c nvidia "pytorch-${CHANNEL}"::pytorch[build="*${VERSION}*"] \
"${CUDATOOLKIT}"
printf "Installing torchdata nightly\n"
python3 -m pip install "portalocker>=2.0.0"
python3 -m pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu --quiet
python3 setup.py develop
python3 -m pip install parameterized --quiet
5 changes: 1 addition & 4 deletions .github/workflows/test-macos-cpu.yml
@@ -55,7 +55,7 @@ jobs:
printf "* Downloading SpaCy German models\n"
python -m spacy download de_core_news_sm
# Install PyTorch, Torchvision, and TorchData
# Install PyTorch, Torchvision
set -ex
conda install \
--yes \
@@ -64,9 +64,6 @@ jobs:
"${MKL_CONSTRAINT}" \
pytorch \
"${CUDATOOLKIT}"
printf "Installing torchdata nightly\n"
python3 -m pip install "portalocker>=2.0.0"
python3 -m pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
python3 setup.py develop
python3 -m pip install parameterized
5 changes: 1 addition & 4 deletions .github/workflows/test-windows-cpu.yml
@@ -51,15 +51,12 @@ jobs:
printf "* Downloading SpaCy German models\n"
python -m spacy download de_core_news_sm
# Install PyTorch, Torchvision, and TorchData
# Install PyTorch, Torchvision
conda install \
--yes \
-c "pytorch-${CHANNEL}" \
pytorch \
cpuonly
printf "Installing torchdata nightly\n"
python -m pip install "portalocker>=2.0.0"
python -m pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
printf "* Installing pywin32_postinstall script\n"
curl --output pywin32_postinstall.py https://raw.githubusercontent.com/mhammond/pywin32/main/pywin32_postinstall.py
5 changes: 5 additions & 0 deletions .github/workflows/validate-binaries.yml
@@ -43,6 +43,11 @@ on:
default: ""
required: false
type: string
pytorch_version:
description: "PyTorch version to validate (ie. 2.0, 2.2.2, etc.) - optional"
default: ""
required: false
type: string
jobs:
validate-binaries:
uses: pytorch/test-infra/.github/workflows/validate-domain-library.yml@release/2.2
3 changes: 3 additions & 0 deletions README.rst
@@ -12,6 +12,9 @@
torchtext
+++++++++

CAUTION: As of September 2023 we have paused active development of TorchText because our focus has shifted away from building out this library offering.
We will continue to release new versions but do not anticipate any new feature development as we figure out future investments in this space.

This repository consists of:

* `torchtext.datasets <https://github.com/pytorch/text/tree/main/torchtext/datasets>`_: The raw text iterators for common NLP datasets
40 changes: 0 additions & 40 deletions packaging/install_torchdata.sh

This file was deleted.

12 changes: 0 additions & 12 deletions packaging/pkg_helpers.bash
@@ -190,14 +190,6 @@ setup_pip_pytorch_version() {
-f https://download.pytorch.org/whl/torch_stable.html \
-f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html"
fi
if [[ -z "$TORCHDATA_VERSION" ]]; then
pip_install --pre torchdata -f "https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html"
export TORCHDATA_VERSION="$(pip show torchdata | grep ^Version: | sed 's/Version: *//' | sed 's/+.\+//')"
else
pip_install "torchdata==$TORCHDATA_VERSION" \
-f https://download.pytorch.org/whl/torch_stable.html \
-f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html"
fi
}

# Fill PYTORCH_VERSION with the latest conda nightly version, and
@@ -232,10 +224,6 @@ setup_conda_pytorch_constraint() {
export CONDA_EXTRA_BUILD_CONSTRAINT="- mkl<=2021.2.0"
fi
fi
if [[ -z "$TORCHDATA_VERSION" ]]; then
export TORCHDATA_VERSION="$(conda search --json 'torchdata[channel=pytorch-nightly]' | ${PYTHON} -c "import sys, json, re; print(re.sub(r'\\+.*$', '', json.load(sys.stdin)['torchdata'][-1]['version']))")"
fi
export CONDA_TORCHDATA_CONSTRAINT="- torchdata==$TORCHDATA_VERSION"
}

# Translate CUDA_VERSION into CUDA_CUDATOOLKIT_CONSTRAINT
1 change: 0 additions & 1 deletion packaging/torchtext/meta.yaml
@@ -24,7 +24,6 @@ requirements:
- requests
- tqdm
{{ environ.get('CONDA_PYTORCH_CONSTRAINT') }}
{{ environ.get('CONDA_TORCHDATA_CONSTRAINT') }}

build:
string: py{{py}}
1 change: 1 addition & 0 deletions pytest.ini
@@ -1,4 +1,5 @@
[pytest]
addopts = --ignore-glob=test/torchtext_unittest/datasets/*
testpaths = test/
python_paths = ./
markers =
1 change: 0 additions & 1 deletion requirements.txt
@@ -19,7 +19,6 @@ Sphinx
pytest
expecttest
parameterized
torchdata>0.5

# Lets pytest find our code by automatically modifying PYTHONPATH
pytest-pythonpath
6 changes: 1 addition & 5 deletions setup.py
@@ -63,14 +63,10 @@ def _init_submodule():
print("-- Building version " + VERSION)

pytorch_package_version = os.getenv("PYTORCH_VERSION")
torchdata_package_version = os.getenv("TORCHDATA_VERSION")

pytorch_package_dep = "torch"
if pytorch_package_version is not None:
pytorch_package_dep += "==" + pytorch_package_version
torchdata_package_dep = "torchdata"
if torchdata_package_version is not None:
torchdata_package_dep += "==" + torchdata_package_version


class clean(distutils.command.clean.clean):
@@ -104,7 +100,7 @@ def run(self):
description="Text utilities, models, transforms, and datasets for PyTorch.",
long_description=read("README.rst"),
license="BSD",
install_requires=["tqdm", "requests", pytorch_package_dep, "numpy", torchdata_package_dep],
install_requires=["tqdm", "requests", pytorch_package_dep, "numpy"],
python_requires=">=3.8",
classifiers=[
"Programming Language :: Python :: 3.8",
22 changes: 0 additions & 22 deletions test/smoke_tests/smoke_tests.py
@@ -1,28 +1,6 @@
"""Run smoke tests"""

import os
import re

import torchdata
import torchtext
import torchtext.version # noqa: F401

NIGHTLY_ALLOWED_DELTA = 3
channel = os.getenv("MATRIX_CHANNEL")


def validateTorchdataVersion():
from datetime import datetime

date_t_str = re.findall(r"dev\d+", torchdata.__version__)[0]
date_t_delta = datetime.now() - datetime.strptime(date_t_str[3:], "%Y%m%d")

if date_t_delta.days >= NIGHTLY_ALLOWED_DELTA:
raise RuntimeError(f"torchdata binary {torchdata.__version__} is more than {NIGHTLY_ALLOWED_DELTA} days old!")


if channel == "nightly":
validateTorchdataVersion()

print("torchtext version is ", torchtext.__version__)
print("torchdata version is ", torchdata.__version__)
1 change: 0 additions & 1 deletion torchtext/_download_hooks.py
@@ -4,7 +4,6 @@

# This is to allow monkey-patching in fbcode
from torch.hub import load_state_dict_from_url # noqa
from torchdata.datapipes.iter import HttpReader, GDriveReader # noqa F401
from tqdm import tqdm


3 changes: 1 addition & 2 deletions torchtext/datasets/ag_news.py
@@ -2,8 +2,6 @@
from functools import partial
from typing import Union, Tuple

from torchdata.datapipes.iter import FileOpener, IterableWrapper
from torchtext._download_hooks import HttpReader
from torchtext._internal.module_utils import is_module_available
from torchtext.data.datasets_utils import (
_wrap_split_argument,
@@ -65,6 +63,7 @@ def AG_NEWS(root: str, split: Union[Tuple[str], str]):
raise ModuleNotFoundError(
"Package `torchdata` not found. Please install following instructions at https://github.com/pytorch/data"
)
from torchdata.datapipes.iter import FileOpener, GDriveReader, HttpReader, IterableWrapper # noqa

url_dp = IterableWrapper([URL[split]])
cache_dp = url_dp.on_disk_cache(
3 changes: 1 addition & 2 deletions torchtext/datasets/amazonreviewfull.py
@@ -2,8 +2,6 @@
from functools import partial
from typing import Union, Tuple

from torchdata.datapipes.iter import FileOpener, IterableWrapper
from torchtext._download_hooks import GDriveReader
from torchtext._internal.module_utils import is_module_available
from torchtext.data.datasets_utils import (
_wrap_split_argument,
@@ -79,6 +77,7 @@ def AmazonReviewFull(root: str, split: Union[Tuple[str], str]):
raise ModuleNotFoundError(
"Package `torchdata` not found. Please install following instructions at https://github.com/pytorch/data"
)
from torchdata.datapipes.iter import FileOpener, GDriveReader, HttpReader, IterableWrapper # noqa

url_dp = IterableWrapper([URL])
cache_compressed_dp = url_dp.on_disk_cache(
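
Note on the two dataset hunks above: the module-level `torchdata` imports are dropped and the import is performed inside the dataset function, after the availability check, so that importing `torchtext` itself should no longer require `torchdata`. A minimal, self-contained sketch of that deferred-import pattern follows. It is an illustration under assumptions, not the repository's code: `is_module_available` here is a stand-in written with `importlib` (the real helper lives in `torchtext._internal.module_utils`), and `load_optional` is a hypothetical name.

```python
import importlib
import importlib.util


def is_module_available(*modules: str) -> bool:
    """Stand-in helper: True only if every named top-level module can be located."""
    return all(importlib.util.find_spec(m) is not None for m in modules)


def load_optional(module_name: str):
    """Hypothetical helper: import an optional dependency at call time.

    Nothing is imported when the enclosing package is loaded, so a missing
    optional dependency only fails when the feature is actually used.
    """
    if not is_module_available(module_name):
        raise ModuleNotFoundError(
            f"Package `{module_name}` not found. Please install it to use this feature."
        )
    return importlib.import_module(module_name)


# Usage: the standard-library `json` module stands in for the optional dependency.
json_mod = load_optional("json")
print(json_mod.dumps({"ok": True}))
```

The trade-off of this design is that a missing dependency surfaces at the first call to the dataset function rather than at import time.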