
Long context evals using hugging face hosted datasets #709

Closed

Conversation


@maxisawesome maxisawesome commented Nov 1, 2023

Not ready to merge!

Adds long context eval tasks: naive support for LEval QA tasks as well as generated long context tasks (padded to 2k, 4k, and 8k token context lengths). Both have been uploaded to HF datasets to avoid checking large files into our repo.

LEval:

Supported tasks are in scripts/eval/yamls/leval_tasks.yaml.
Adds very basic HF dataset parsing. I wrote a specific function for LEval tasks. Eventually, the per-dataset parsing that will be required for most arbitrary HF tasks should likely live in Composer. Otherwise, the yaml logic is as follows:

```yaml
label: kv_pairs_middle_4k
dataset_uri: hf://maxisawesome/long_context_eval
num_fewshot: [0]
icl_task_type: question_answering
hf_vars:
  name: kv_pairs
  context_length: 4096
  section: middle
hf_cols:
  inputs: ["context"]
  outputs: ["answer"]
```

  • llm-foundry will strip the hf:// prefix from dataset_uri and load that dataset (here, it will load maxisawesome/long_context_eval).
  • Everything under hf_vars will be passed into the load_dataset func as keyword args.
  • llm-foundry will concatenate the HF dataset columns listed under hf_cols.inputs into the context for the model.
  • llm-foundry will concatenate the HF dataset columns listed under hf_cols.outputs into the expected answer.
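The steps above can be sketched as a couple of small helpers. This is illustrative only, not llm-foundry's actual API: the function names (`strip_hf_prefix`, `parse_row`) and the sample row are made up for the example.

```python
def strip_hf_prefix(dataset_uri: str) -> str:
    """Remove the hf:// scheme to recover the HF hub path."""
    prefix = 'hf://'
    if not dataset_uri.startswith(prefix):
        raise ValueError(f'expected an hf:// URI, got {dataset_uri!r}')
    return dataset_uri[len(prefix):]


def parse_row(row: dict, hf_cols: dict) -> dict:
    """Concatenate the columns named in hf_cols.inputs / hf_cols.outputs."""
    context = ' '.join(str(row[c]) for c in hf_cols['inputs'])
    answer = ' '.join(str(row[c]) for c in hf_cols['outputs'])
    return {'context': context, 'answer': answer}


# hf_vars from the yaml would then be forwarded as keyword args, roughly:
#   datasets.load_dataset(strip_hf_prefix(uri), **hf_vars)
path = strip_hf_prefix('hf://maxisawesome/long_context_eval')
example = parse_row(
    {'context': 'key1: val1 key2: val2', 'answer': 'val2'},
    {'inputs': ['context'], 'outputs': ['answer']},
)
```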

If pivot_col is specified under hf_cols, llm-foundry will treat pivot_col as the main context of each row, inputs as the instruction, and outputs as the desired answer.
(For clarity, LEval has many tasks set up where one row consists of one col of 15 questions, one col of a single document, and one col of 15 answers. The current form of this setup is not the final version, just a temporary working solution.)
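A minimal sketch of how such a row could be expanded, one example per question (the function name and column names are hypothetical; this is not the final implementation described above):

```python
def expand_pivot_row(row: dict, pivot_col: str,
                     input_cols: list, output_cols: list) -> list:
    """Expand one LEval-style row (one document, parallel lists of
    questions and answers) into one example per question."""
    doc = row[pivot_col]
    questions = row[input_cols[0]]
    answers = row[output_cols[0]]
    return [
        {'context': doc, 'instruction': q, 'answer': a}
        for q, a in zip(questions, answers)
    ]


row = {
    'document': 'A long wikipedia article ...',
    'instructions': ['Q1?', 'Q2?'],
    'outputs': ['A1', 'A2'],
}
examples = expand_pivot_row(row, 'document', ['instructions'], ['outputs'])
```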

Previous notes for generated tasks:

  • HotPotQA - 10+ concatenated "documents" (3-4 sentences from a wikipedia article) with a single question at the end. Sometimes this requires two "documents" (multi-hop QA) to properly answer the question. While 1 or 2 docs are needed to answer the question, the rest are randomly chosen "distractor" docs that are not relevant to the question.
  • WikiQA (WikiQA-Altered_Numeric_QA) - Single documents from wikipedia at different context lengths. There are significantly more HTML tags in this dataset. All answers are some sort of number. This needs to be reuploaded to HF.
  • KV_Pairs - Constructed JSONs with key-value pairs. The question is a single key, and the expected answer is the corresponding value listed in the JSON.
  Caveats:
  • Fewshot versions of these tasks should be pregenerated (right now every example is approximately the full context length, so fewshot = 5 for the 4k context length task would be ~20k total tokens).
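The arithmetic behind that caveat is simple: when every example is roughly one full context length, k fewshot examples cost k context lengths of tokens before the evaluated example is even added. A rough sketch (function name illustrative):

```python
def fewshot_tokens(context_length: int, num_fewshot: int) -> int:
    """Approximate token cost of the fewshot examples alone, assuming
    each example fills roughly one full context length."""
    return context_length * num_fewshot


# fewshot = 5 on the 4k task: 5 * 4096 = 20480 tokens, ~20k,
# far beyond the 4k window the task is meant to evaluate.
total = fewshot_tokens(4096, 5)
```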

Generation scripts for these datasets are not included.

@maxisawesome maxisawesome changed the title Long context from hugging face Long context evals using hugging face hosted datasets Nov 1, 2023
Contributor

Hi @maxisawesome!

It might be worth passing in the hugging face variables into the get_icl_task_dataloader function. Maybe add

```python
hf_loading_vars=icl_cfg.get('hf_loading_vars', {}),
hf_parsing_map=icl_cfg.get('hf_parsing_map', {})
```

at line 304 originally, and at line 358 in your new commit. This allows you to pass parameters into Hugging Face's load_dataset function. In particular, this was helpful for specifying which split of the Hugging Face dataset to evaluate, e.g. hf_loading_vars = {'split': 'train'}.
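The pattern suggested above can be shown in isolation: pulling optional HF kwargs out of the ICL task config with safe defaults, then forwarding them. The icl_cfg contents here are a made-up example, and the forwarding comment is a sketch, not Composer's actual call site.

```python
# Hypothetical ICL task config entry with optional HF loading kwargs.
icl_cfg = {
    'label': 'kv_pairs_middle_4k',
    'hf_loading_vars': {'split': 'train'},
}

# .get with a {} default keeps tasks that don't set these keys working.
hf_loading_vars = icl_cfg.get('hf_loading_vars', {})
hf_parsing_map = icl_cfg.get('hf_parsing_map', {})

# These would then be forwarded to get_icl_task_dataloader, and from
# there into datasets.load_dataset(path, **hf_loading_vars).
```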

@maxisawesome
Contributor Author

Outdated. The main content of this branch was merged in #925.
