Commit

Output eval logging (batch level) (#2977)
* prelim commit

* fix max answer lengths for cot

* add output logger

* create eval output logger

* fix pyright; git push

* change dist reduce fx

* change dist reduce fx

* fix pyright

* Add nightly docker image (#2452)

Add pytorch nightly and CUDA 12.1 support for composer docker images

What issue(s) does this change relate to?
Related to https://mosaicml.atlassian.net/browse/GRT-2305

Tests
docker image: mosaicml/ci-staging:72744756-794c-4390-94db-72c212dd5e00 (cuda 12.1, pytorch 2.1.0)

mcli connect temp-test-ZAVxMh
Python 3.10.12 (main, Jun  7 2023, 12:45:35) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version)
<module 'torch.version' from '/usr/lib/python3/dist-packages/torch/version.py'>
>>> print(torch.__version__)
2.1.0.dev20230623+cu121
>>> print(torch.version.cuda)
12.1
Integration Test
@mvpatel2000 has validated that this trains on initial mpt-2 experiments and speeds up training by +7-8% from 0.25 MFU to 0.27 MFU

* Fix local eval (#2465)

* fix autoresume with slashed directory

* Revert "fix autoresume with slashed directory"

This reverts commit 3dfb5f5.

revert

* fix

* fix precommit

* Update in_context_learning_evaluation.py

* Update in_context_learning_evaluation.py

* Update in_context_learning_evaluation.py

* add tests

* Add torch 2.1.0 args for github release-docker workflow

* Log system metrics on each event (#2412)


Signed-off-by: Prithvi Kannan <[email protected]>
Co-authored-by: Evan Racah <[email protected]>
Co-authored-by: eracah <[email protected]>

* Fix torch 2.1.0 docker tag (#2472)

* Upstream Generate Callback  (#2449)

Upstreams and generalizes the callback that logs generations to wandb from foundry to composer.

* Upgrade torch nightly docker image for 0.18.3 NCCL version  (#2476)

Upgrade torch docker nightly version to 08-23-23 so that we get nccl version 0.18.3 which was merged on 08-18-23.

* Test pytorch 2.1.0 docker images on ci/cd (#2469)

Test pytorch 2.1.0 docker images on ci/cd #2469

* Fix huggingface tokenizer loading for slow tokenizers (#2483)

* Deprecate Fused LayerNorm (#2475)

Will be removed in v0.18.

* Transformers upgrade (#2489)

* Update RTD build config with build.os (#2490)

* Update RTD build config with build.os
* Remove python.version

---------

Co-authored-by: Bandish Shah <[email protected]>

* Upgrade torch docker version and github workflow tests (#2488)

* upgrade node version (#2492)

# What does this PR do?
Security vulnerability in `semver` seen due to node. This PR upgrades the node version to bump up semver from 7.5.1 to 7.5.2

# Tests
Action Run: https://github.com/mosaicml/composer/actions/runs/6017539089
Correct version of semver seen after upgrade: 
```
#14 [pytorch_stage  7/24] RUN npm list -g semver --depth=1
#14 2.223 /usr/lib
#14 2.223 `-- [email protected]
#14 2.223   `-- [email protected]
#14 2.223 
#14 DONE 2.4s
```

* Gating tying modules w/ FSDP for torch 2.0 (#2467)

* Gating tying modules w/ FSDP

* Changing weight tying filtering to be less aggressive

* precommit formatting

* Removing min_params (#2494)

* Removing min_params

* formatting?

* removing overlap with another commit

* Fix torchmetrics backwards compatibility issue (#2468)

* add fix

* fix tests

* qwf

* dsfg

* add key

* remove short

* add map test

* remove comment

* filter warning

* simplify wrapping

* checkdown

* fix torchmetrics

* 300

* fix tests

* remove metric

* cleanup

* bug fixes

* fix lint

* fix lint

* fix test

* lint

* remove cuda

* fix tests

* fix ignore

* fix loading

* fix test

* save ckpt

---------

Co-authored-by: Mihir Patel <[email protected]>
Co-authored-by: Daniel King <[email protected]>
Co-authored-by: Your Name <[email protected]>

* Adding some fixes to FSDP tests (#2495)

* Adding some fixes to FSDP tests

* Add filter warnings

* fail count (#2496)

* Remove PR curve metrics from backward compatibility test and skip torch 1.13 (#2497)

* filter warning (#2500)

* bump version (#2498)

* Skip metrics in state dict (#2501)

* skip metrics in state dict

* fix unit tests

* Add peak memory stats (#2504)

* add peak memory stats

* fix tests

* fix sharded ckpt (#2505)

* Bump gitpython from 3.1.31 to 3.1.34 (#2509)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.31 to 3.1.34.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](gitpython-developers/GitPython@3.1.31...3.1.34)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Annotate `torch_prof_remote_file_name` as Optional (#2512)

The `torch_prof_remote_file_name` argument of `Profiler` is passed
as the `remote_file_name` argument of `TorchProfiler`, which supports
passing `None` to disable uploading trace files. Prior to this
commit, passing `None` to `Profiler` to do this whilst using a
static type checker led to a type error.

* fix: when there is no train_metrics, do not checkpoint (#2502)

* Remove metric saving (#2514)

* no metric save

* fix docs

* checkdown

* fix tests

* filter warning

* move to device

* fix device gpu

* Update composer/core/state.py

Co-authored-by: Daniel King <[email protected]>

---------

Co-authored-by: Daniel King <[email protected]>

* Fix daily tests by removing gpu marker (#2515)

* Refactor mosaic_fsdp.py (#2506)

* Refactor mosaic_fsdp.py

* Format file

* Rename monkey patch function

* Fix import path

* Format files

* Fix version

* fix pr (#2517)

* Add custom sharding to ChunkShardingSpec (#2507)

* Refactor mosaic_fsdp.py

* Format file

* Rename monkey patch function

* Fix import path

* Format files

* Fix version

* Fix import path

* Monkey patch ChunkShardingSpec to dynamically detect sharding dim

* Format file

* Add non divisible functionality to ChunkShardingSpec

* Format file

* Format file

* Update nightly docker image to torch nightly 09-03-23 (#2518)

* Update pre-commit in setup.py (#2522)

* Add FSDP custom wrap with torch 2.1 (#2460)

* add torch2

* add code

* tag more changes

* Update composer/trainer/mosaic_fsdp.py

Co-authored-by: Vitaliy Chiley <[email protected]>

* monkeypatch init

* raise pins

* add print

* more logs

* change if statements

* remove imports

* remove imports

* fix init

* fix versioning

* add hybrid shard

* checkdown

* revert hsdp

* add peak memory stats

* lint

* imports

* Update composer/trainer/mosaic_fsdp.py

Co-authored-by: Daniel King <[email protected]>

* fix wrap

* fix gate

* lint

* test

* change thresh

* import typing

* fix checks

* nuke pyright

* typo

* Update composer/trainer/mosaic_fsdp.py

Co-authored-by: Brian <[email protected]>

* Update composer/trainer/mosaic_fsdp.py

Co-authored-by: Brian <[email protected]>

* Update composer/trainer/mosaic_fsdp_utils.py

Co-authored-by: Brian <[email protected]>

* resolve comments

* add comments

* add comments

---------

Co-authored-by: Vitaliy Chiley <[email protected]>
Co-authored-by: Daniel King <[email protected]>
Co-authored-by: Brian <[email protected]>

* Fix GCSObjectStore bug where hmac keys auth doesn't work (#2519)

* prelim commit

* add output logger

* create eval output logger

* change dist reduce fx

* Bump gitpython from 3.1.34 to 3.1.35 (#2525)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.34 to 3.1.35.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](gitpython-developers/GitPython@3.1.34...3.1.35)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump pytest from 7.4.0 to 7.4.2 (#2523)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.0 to 7.4.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@7.4.0...7.4.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Upgrade to mlflow version 2.5.0 (#2528)

* disable cifar daily (#2527)

* mosaicml logger robustness improvements (#2530)

* Fix metrics keys sort in DecoupledAdamW for OptimizerMonitor FSDP metric agreggation (#2531)

* Fix github actions for GCS integration testing (#2532)

* fix github actions

* make gpu test

* change dist reduce fx

* fix pyright

* Fix GCS tests (#2535)

* add PR tests

* fix test

* remove pr daily

* remove pr daily

* finish error logging cb

* fix

* add import to init

* add import to init

* add import to init

* add file writing

* add file writing

* add file writing

* add file writing

* add file writing

* move tensors to cpu

* remove tensors

* remove tensors

* remove tensors

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* add prompt to qa

* try debugging dist sync issue

* nit

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* debugging

* fix syncing of non tensor state

* added gpu test

* fix error

* finish testing callback

* fix all errors

* test commit

* roll back test commit

* remove ranks

* re-tesT

* add custome gen kwargs and stopping on eos token

* modify test

* modify test

* finish

* finish

* finish

* finish

* finish pr

* implement early stop

* add tesT

* merge

* fix

* finish

* finish

* fix bug

* finish

* bug fix

* add keys

* add correcT

* modify sync

* diff split

* fix typo

* edit condition

* broken wip

* design demonstration commit

* simplify pr

* further simplify

* wip

* add comments

* add other icl metrics

* wip

* change dict method, add more stuff to logging

* fix typos, change some comments

* decode tensors, fix wrong dict key

* fix mc

* 1 to 0 lol

* wip linting

* adjust to step logging

* adjust logging names

* add mflow, rm batch keys

* add comments, check for dict in huggingface model update_metric

* add user specified logging

* move metric_name duplication to update_metric

* wip fix testing

* fix input shape error

* rm init

* rm eval_after_all

* step=None

* step=state.timestamp.batch.value

* update name to include step

* linting, wip on test

* fix test

* pyright wip

* add non-batch warning

* pyright

* debug

* rm this commit that wasn't the right branch

* log at the end of training

* rm silly wandb table logging

* add run_name

* add docstring

* add debug logging

* more logging

* rm info logging

* improve comments

* Update composer/callbacks/eval_output_logging_callback.py

Co-authored-by: Evan Racah <[email protected]>

* rm logging bool

* fix logging for schema tasks

* fix schema / mc tasks

* yapf

* rm reshape

* fix tests

* cleanup test

* pyright

* pyright

* docstring

* pyright

* update tests

* rm attention mask requirement

* Update composer/metrics/nlp.py

Co-authored-by: Mihir Patel <[email protected]>

* Update composer/metrics/nlp.py

Co-authored-by: Mihir Patel <[email protected]>

* rm todo

* lint

* lint

* lint

* more lint

---------

Signed-off-by: Prithvi Kannan <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jeremy Dohmann <[email protected]>
Co-authored-by: Jeremy D <[email protected]>
Co-authored-by: Charles Tang <[email protected]>
Co-authored-by: Rishab Parthasarathy <[email protected]>
Co-authored-by: Prithvi Kannan <[email protected]>
Co-authored-by: Evan Racah <[email protected]>
Co-authored-by: eracah <[email protected]>
Co-authored-by: Irene Dea <[email protected]>
Co-authored-by: Daniel King <[email protected]>
Co-authored-by: nik-mosaic <[email protected]>
Co-authored-by: bandish-shah <[email protected]>
Co-authored-by: Bandish Shah <[email protected]>
Co-authored-by: bcui19 <[email protected]>
Co-authored-by: Mihir Patel <[email protected]>
Co-authored-by: Your Name <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Scott Stevenson <[email protected]>
Co-authored-by: furkanbiten <[email protected]>
Co-authored-by: Brian <[email protected]>
Co-authored-by: Vitaliy Chiley <[email protected]>
Co-authored-by: Nicholas Garcia <[email protected]>
Co-authored-by: Mikhail Kolesov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Tessa Barton <[email protected]>
1 parent c5869d2 commit 594eaef
Showing 11 changed files with 533 additions and 11 deletions.
2 changes: 2 additions & 0 deletions composer/callbacks/__init__.py
@@ -9,6 +9,7 @@
 from composer.callbacks.activation_monitor import ActivationMonitor
 from composer.callbacks.checkpoint_saver import CheckpointSaver
 from composer.callbacks.early_stopper import EarlyStopper
+from composer.callbacks.eval_output_logging_callback import EvalOutputLogging
 from composer.callbacks.export_for_inference import ExportForInferenceCallback
 from composer.callbacks.free_outputs import FreeOutputs
 from composer.callbacks.generate import Generate
@@ -35,6 +36,7 @@
     'CheckpointSaver',
     'MLPerfCallback',
     'EarlyStopper',
+    'EvalOutputLogging',
     'ExportForInferenceCallback',
     'ThresholdStopper',
     'ImageVisualizer',
115 changes: 115 additions & 0 deletions composer/callbacks/eval_output_logging_callback.py
@@ -0,0 +1,115 @@
# Copyright 2022 MosaicML Composer authors
# SPDX-License-Identifier: Apache-2.0

"""Log model outputs and expected outputs during ICL evaluation."""

import warnings
from copy import deepcopy
from typing import Any, Dict, List, Sequence, Union

import torch

from composer.core import Callback, State
from composer.loggers import ConsoleLogger, Logger
from composer.utils.dist import all_gather_object


class EvalOutputLogging(Callback):
    """Logs eval outputs for each sample of each ICL evaluation dataset.

    ICL metrics are required to support caching the model's responses, including whether the model was correct.
    Metrics are responsible for returning the results of individual datapoints in a dictionary of lists.
    The callback will log the metric name, the depadded and detokenized input, any data stored in state.metric_outputs, and
    any keys from the batch passed into `batch_keys_to_log`. Rows are collected after every eval batch and logged as a
    table at the end of eval.

    Args:
        log_tokens (bool): If True, also log the raw token ids of the inputs (and of the outputs when they are a tensor).
    """

    def __init__(self, log_tokens=False, *args, **kwargs):
        # `self` is bound implicitly by super(); it must not be passed again
        super().__init__(*args, **kwargs)
        self.log_tokens = log_tokens
        self.columns = None
        self.name = None
        self.rows = []

    def eval_batch_end(self, state: State, logger: Logger) -> None:
        if not isinstance(state.batch, Dict):
            warnings.warn(
                'EvalOutputLogging only supports batches that are dictionaries. '
                f'Found batch of type {type(state.batch)}. '
                'Not logging eval outputs.',
            )
            return

        assert state.outputs is not None
        assert state.metric_outputs is not None
        logging_dict: Dict[str, Union[List[Any], torch.Tensor, Sequence[torch.Tensor]]] = deepcopy(state.metric_outputs)

        # If batch mode is not generate, outputs will be logits
        if state.batch['mode'] == 'generate':
            # Outputs are already detokenized
            logging_dict['outputs'] = state.outputs

        input_ids = state.batch['input_ids']
        logged_input = []
        assert state.dataloader is not None

        # Depad and decode input_ids
        for input_list in input_ids.tolist():
            dataset = state.dataloader.dataset  # pyright: ignore[reportGeneralTypeIssues]
            depadded_input = [tok for tok in input_list if tok != dataset.pad_tok_id]
            logged_input.append(dataset.tokenizer.decode(depadded_input))
        logging_dict['input'] = logged_input

        # Log token indices if toggled
        if self.log_tokens:
            logging_dict['input_tokens'] = input_ids.tolist()
            if state.batch['mode'] != 'generate':
                if isinstance(state.outputs, torch.Tensor):  # pyright
                    logging_dict['label_tokens'] = state.outputs.tolist()

        # Add run_name as a column
        run_name_list = [state.run_name for _ in range(0, len(logging_dict['input']))]
        logging_dict['run_name'] = run_name_list

        # NOTE: This assumes _any_ tensor logged are tokens to be decoded.
        #       This might not be true if, for example, logits are logged.

        # Detokenize data in rows
        for key, value in logging_dict.items():
            # All types in list are the same
            if isinstance(value[0], torch.Tensor):
                logging_dict[key] = [
                    state.dataloader.dataset.tokenizer.decode(t)  # pyright: ignore[reportGeneralTypeIssues]
                    for t in value
                ]
            elif isinstance(value[0], list):
                if isinstance(value[0][0], torch.Tensor):
                    tokenizer = state.dataloader.dataset.tokenizer  # pyright: ignore[reportGeneralTypeIssues]
                    logging_dict[key] = [[tokenizer.decode(choice) for choice in t] for t in value]

        # Convert logging_dict from kv pairs of column name and column values to a list of rows
        # Example:
        # logging_dict = {"a": ["1a", "2a"], "b": ["1b", "2b"]}
        # will become
        # columns = ["a", "b"], rows = [["1a", "1b"], ["2a", "2b"]]
        columns = list(logging_dict.keys())
        rows = [list(item) for item in zip(*logging_dict.values())]

        assert state.dataloader_label is not None
        if not self.name:
            # If only running eval, step will be 0
            # If running training, step will be current training step
            step = state.timestamp.batch.value
            self.name = f'{state.dataloader_label}_step_{step}'
            self.columns = columns
        self.rows.extend(rows)

    def eval_end(self, state: State, logger: Logger) -> None:
        # Gather the rows collected on every rank and flatten them into one table
        list_of_rows = all_gather_object(self.rows)
        rows = [row for rows in list_of_rows for row in rows]
        for dest_logger in logger.destinations:
            if not isinstance(dest_logger, ConsoleLogger):
                dest_logger.log_table(self.columns, rows, name=self.name, step=state.timestamp.batch.value)

        # Reset per-dataset state so the next eval dataset starts a fresh table
        self.rows = []
        self.name = None
        self.columns = None
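
The reshaping steps in `eval_batch_end` and `eval_end` can be tried out without Composer or a tokenizer. The following is a minimal sketch, not the callback itself: `depad`, `columns_to_rows`, and `flatten_gathered` are hypothetical helper names, the pad id `0` is an assumed value, and the nested list passed to `flatten_gathered` only stands in for what `all_gather_object` would return across ranks.

```python
from typing import Any, Dict, List, Tuple


def depad(input_ids: List[List[int]], pad_tok_id: int) -> List[List[int]]:
    """Drop padding tokens from each sequence (hypothetical helper)."""
    return [[tok for tok in seq if tok != pad_tok_id] for seq in input_ids]


def columns_to_rows(logging_dict: Dict[str, List[Any]]) -> Tuple[List[str], List[List[Any]]]:
    """Turn {column name: per-sample values} into (columns, rows)."""
    columns = list(logging_dict.keys())
    rows = [list(item) for item in zip(*logging_dict.values())]
    return columns, rows


def flatten_gathered(list_of_rows: List[List[List[Any]]]) -> List[List[Any]]:
    """Merge per-rank lists of rows into a single list of rows."""
    return [row for rows in list_of_rows for row in rows]


# Pad id of 0 is assumed here purely for illustration.
depadded = depad([[5, 6, 0, 0], [7, 0, 0, 0]], pad_tok_id=0)

columns, rows = columns_to_rows({'a': ['1a', '2a'], 'b': ['1b', '2b']})

# Two simulated ranks, each holding one row.
merged = flatten_gathered([[rows[0]], [rows[1]]])
```

One property worth keeping in mind: `zip` stops at the shortest input, so a column with fewer entries than there are samples would silently shorten the resulting table.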
2 changes: 2 additions & 0 deletions composer/core/state.py
@@ -549,6 +549,8 @@ def __init__(
         self.eval_metric_values: Dict[str, float] = {}
         self.total_loss_dict: Dict[str, float] = {}

+        self.metric_outputs: Dict[str, Any] = {}
+
     def _dataset_of(self, dataloader: Optional[Union[Evaluator, DataSpec, DataLoader, Iterable]]) -> Optional[Dataset]:
         """Get the dataset contained by the given dataloader-like object.
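
The metric changes themselves are not shown in this part of the diff, so the following is only a sketch of the contract described in the `EvalOutputLogging` docstring: a metric returns per-sample results as a dictionary of equal-length lists, which can then be stored in `metric_outputs`. `ToyExactMatch`, its `label` and `correct` keys, and the plain dict standing in for `State.metric_outputs` are all hypothetical.

```python
from typing import Any, Dict, List


class ToyExactMatch:
    """Hypothetical metric that also reports per-sample results."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, outputs: List[str], labels: List[str]) -> Dict[str, List[Any]]:
        # Every list gets exactly one entry per sample, so the lists can
        # later be zipped into table rows.
        metric_result: Dict[str, List[Any]] = {'label': [], 'correct': []}
        for output, label in zip(outputs, labels):
            is_correct = output.strip() == label.strip()
            self.correct += int(is_correct)
            self.total += 1
            metric_result['label'].append(label)
            metric_result['correct'].append(is_correct)
        return metric_result

    def compute(self) -> float:
        return self.correct / self.total if self.total else 0.0


metric = ToyExactMatch()
# Stand-in for State.metric_outputs, which the new attribute initializes to {}.
metric_outputs: Dict[str, Any] = {}
metric_outputs.update(metric.update(['Paris', ' Rome'], ['Paris', 'Milan']))
```

Keeping the aggregate counters and the per-sample dictionary separate means the metric's `compute` value is unaffected by whether anything consumes the per-sample output.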
7 changes: 5 additions & 2 deletions composer/loggers/in_memory_logger.py
@@ -87,8 +87,11 @@ def log_table(
                 conda_package='pandas',
                 conda_channel='conda-forge',
             ) from e
-        table = pd.DataFrame.from_records(data=rows, columns=columns).to_json(orient='split', index=False)
-        assert isinstance(table, str)
+        table = pd.DataFrame.from_records(data=rows,
+                                          columns=columns).to_json(orient='split', index=False, force_ascii=False)
+        assert table is not None
+        # Merged assert is different
+        # assert isinstance(table, str)
         self.tables[name] = table

     def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
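
The `force_ascii=False` argument added above matters for detokenized text that contains non-ASCII characters. The sketch below illustrates the effect using the standard library's `json.dumps` and its `ensure_ascii` flag as a stand-in for pandas' `to_json(force_ascii=...)`; the two are analogous but not the same API, and the sample table contents are made up.

```python
import json

# Table in the "split" layout: a list of column names plus a list of rows.
table = {'columns': ['input', 'outputs'], 'data': [['¿Qué hora es?', 'Son las tres']]}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(table)

# With ASCII forcing disabled, the characters are kept as-is and stay readable.
readable = json.dumps(table, ensure_ascii=False)

# Both strings decode to the same data; only the serialized form differs.
same_data = json.loads(escaped) == json.loads(readable)
```

Either form round-trips correctly, so the choice mainly affects how legible the stored table string is when inspected directly.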
4 changes: 3 additions & 1 deletion composer/loggers/wandb_logger.py
@@ -112,6 +112,8 @@ def __init__(
         self.run_dir: Optional[str] = None
         self.run_url: Optional[str] = None

+        self.table_dict = {}
+
     def _set_is_in_atexit(self):
         self._is_in_atexit = True

@@ -130,7 +132,7 @@ def log_table(
         if self._enabled:
             import wandb
             table = wandb.Table(columns=columns, rows=rows)
-            wandb.log({name: table}, step)
+            wandb.log({name: table}, step=step)

     def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
         if self._enabled: