Install Question-Generation via pip rather than cloning #34

Open · wants to merge 12 commits into base `master`
35 changes: 35 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,35 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Upload Python Package

on:
# release:
# types: [created]
push:
branches: [ master ]

jobs:
deploy:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.x'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
pip install -r requirements.txt
- name: Build
run: |
python setup.py sdist bdist_wheel
- name: Archive Distribution Files
uses: actions/upload-artifact@v1
with:
name: wheel
path: ./dist/question_generation-**.whl
18 changes: 8 additions & 10 deletions README.md
@@ -131,9 +131,7 @@ The [nlg-eval](https://github.com/Maluuba/nlg-eval) package is used for calculating

## Requirements
```
transformers==3.0.0
nltk
nlp==0.2.0 # only if you want to fine-tune.
python -m pip install git+https://github.com/patil-suraj/question_generation.git
```

after installing `nltk` do
@@ -154,7 +152,7 @@ The pipeline is divided into 3 tasks
#### Question Generation

```python3
from pipelines import pipeline
from question_generation import pipeline

nlp = pipeline("question-generation")
nlp("42 is the answer to life, the universe and everything.")
@@ -224,7 +222,7 @@ The datasets will be saved in `data/` directory. You should provide filenames using

**process data for single task question generation with highlight_qg_format**
```bash
python prepare_data.py \
python -m question_generation.prepare_data \
--task qg \
--model_type t5 \
--dataset_path data/squad_multitask/ \
@@ -240,7 +238,7 @@ python prepare_data.py \
The `valid_for_qg_only` argument decides whether the validation set should contain data for the qg task only. For my multi-task experiments I used validation data with only the qg task so that the eval loss curve can be easily compared with other single-task models

```bash
python prepare_data.py \
python -m question_generation.prepare_data \
--task multi \
--valid_for_qg_only \
--model_type t5 \
@@ -254,7 +252,7 @@ python prepare_data.py \

**process dataset for end-to-end question generation**
```bash
python prepare_data.py \
python -m question_generation.prepare_data \
--task e2e_qg \
--valid_for_qg_only \
--model_type t5 \
@@ -271,7 +269,7 @@ Use the `run_qg.py` script to start training. It uses transformers `Trainer` class


```bash
python run_qg.py \
python -m question_generation.run_qg \
--model_name_or_path t5-small \
--model_type t5 \
--tokenizer_name_or_path t5_qg_tokenizer \
@@ -293,7 +291,7 @@ python run_qg.py \
or, if you want to start training from a script or notebook:

```python3
from run_qg import run_qg
from question_generation import run_qg

args_dict = {
"model_name_or_path": "t5-small",
@@ -323,7 +321,7 @@ run_qg(args_dict)
Use the `eval.py` script for evaluating the model.

```bash
python eval.py \
python -m question_generation.eval \
--model_name_or_path t5-base-qg-hl \
--valid_file_path valid_data_qg_hl_t5.pt \
--model_type t5 \
4 changes: 4 additions & 0 deletions question_generation/__init__.py
@@ -0,0 +1,4 @@
from .pipelines import pipeline
from .run_qg import run_qg

__version__ = "0.1.0"
File renamed without changes.
2 changes: 1 addition & 1 deletion eval.py → question_generation/eval.py
@@ -6,7 +6,7 @@
from tqdm.auto import tqdm
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, HfArgumentParser

from data_collator import T2TDataCollator
from question_generation.data_collator import T2TDataCollator

device = 'cuda' if torch.cuda.is_available else 'cpu'

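One pre-existing detail in the context line above is worth flagging: `torch.cuda.is_available` is referenced without parentheses, and any function object is truthy, so the conditional always selects `'cuda'`. A minimal sketch of the pitfall and the fix, using a stand-in function so no torch install is needed:

```python
# The original line: device = 'cuda' if torch.cuda.is_available else 'cpu'
# The method is referenced, not called; a function object is always truthy.

def is_available():
    return False  # pretend no GPU is present

buggy = 'cuda' if is_available else 'cpu'    # function object -> truthy
fixed = 'cuda' if is_available() else 'cpu'  # call it to get the bool

print(buggy)  # 'cuda' even though is_available() is False
print(fixed)  # 'cpu'
```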
File renamed without changes.
File renamed without changes.
12 changes: 4 additions & 8 deletions run_qg.py → question_generation/run_qg.py
@@ -1,28 +1,24 @@
import dataclasses
import json
import logging
import os
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from typing import Optional

import numpy as np
import torch

from transformers import (
AutoModelForSeq2SeqLM,
AutoTokenizer,
T5Tokenizer,
BartTokenizer,
HfArgumentParser,
DataCollator,
TrainingArguments,
set_seed,
)

from trainer import Trainer
from data_collator import T2TDataCollator
from utils import freeze_embeds, assert_not_all_frozen
from question_generation.trainer import Trainer
from question_generation.data_collator import T2TDataCollator
from question_generation.utils import freeze_embeds, assert_not_all_frozen

MODEL_TYPE_TO_TOKENIZER = {
"t5": T5Tokenizer,
2 changes: 1 addition & 1 deletion trainer.py → question_generation/trainer.py
@@ -9,7 +9,7 @@
if is_apex_available():
from apex import amp

from utils import label_smoothed_nll_loss
from question_generation.utils import label_smoothed_nll_loss

class Trainer(HFTrainer):
def __init__(self, label_smoothing: float = 0, **kwargs):
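For readers unfamiliar with the `label_smoothing` argument above, here is a minimal sketch of the usual label-smoothed NLL formulation (an assumption about what `label_smoothed_nll_loss` computes, in the style of the fairseq implementation, not the repository's exact code):

```python
import math

# Log-probabilities for a 3-class example; class 0 is the gold label.
log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]
target = 0
eps = 0.1  # label_smoothing

nll = -log_probs[target]                   # standard NLL on the gold class
smooth = -sum(log_probs) / len(log_probs)  # uniform loss over all classes

# Blend the two: most of the weight on the gold class, eps spread uniformly.
loss = (1 - eps) * nll + eps * smooth
print(round(loss, 4))  # 0.4633
```

With `eps = 0` this reduces to plain NLL, matching the `label_smoothing: float = 0` default above.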
File renamed without changes.
4 changes: 4 additions & 0 deletions requirements.txt
@@ -0,0 +1,4 @@
transformers>=3.0.0
nltk
nlp>=0.2.0
torch
8 changes: 8 additions & 0 deletions setup.cfg
@@ -0,0 +1,8 @@
[metadata]
license = MIT
license-file = LICENSE
description-file = README.md
platform = any

[bdist_wheel]
universal = 1
33 changes: 33 additions & 0 deletions setup.py
@@ -0,0 +1,33 @@
from setuptools import setup, find_packages

from question_generation import __version__

with open("README.md", "r") as f:
long_description = f.read()

setup(
name="question_generation",
packages=find_packages(),
version=__version__,
url="https://github.com/patil-suraj/question_generation",
license="MIT",
author="Suraj Patil",
author_email="[email protected]",
description="Question generation is the task of automatically generating questions from a text paragraph.",
install_requires=["transformers>=3.0.0", "nltk", "nlp>=0.2.0", "torch"],
python_requires=">=3.6",
include_package_data=True,
platforms="any",
long_description=long_description,
long_description_content_type="text/markdown",
classifiers=[
"Operating System :: OS Independent",
"License :: OSI Approved :: MIT License",
"Topic :: Utilities",
"Intended Audience :: Developers",
"Topic :: Software Development :: Libraries :: Python Modules",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
],
)
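One caveat about `from question_generation import __version__` in the setup.py above: importing the package executes `__init__.py`, which imports `pipelines` and therefore `transformers`, so building from a clean environment can fail before the dependencies are installed. A common dependency-free alternative (a sketch, not the repository's code) parses the version string out of the file text instead:

```python
import re

# Parse __version__ out of the package's __init__.py without importing it.
# (The file contents are inlined as a string here to keep the example
# self-contained; setup.py would read question_generation/__init__.py.)
init_text = """\
from .pipelines import pipeline
from .run_qg import run_qg

__version__ = "0.1.0"
"""

match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', init_text)
version = match.group(1)
print(version)  # 0.1.0
```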