doc: Directly link to API in docs, remove repetitions #90

Merged (4 commits) on May 3, 2024
18 changes: 10 additions & 8 deletions docs/getting_started.md
@@ -11,18 +11,20 @@

```
pip install neural-pipeline-search
```

## The 3 Main Components
1. **Execute with [`neps.run()`](./reference/neps_run.md)**:
Optimize your `run_pipeline=` over the `pipeline_space=` using this function.
For a thorough overview of the arguments and their explanations, check out the detailed documentation.

2. **Define a [`run_pipeline=`](./reference/run_pipeline.md) Function**:
This function is essential for evaluating different configurations.
You'll implement the specific logic for your problem within this function.
For detailed instructions on initializing and effectively using `run_pipeline=`, refer to the guide.

3. **Establish a [`pipeline_space=`](./reference/pipeline_space.md)**:
Your search space for defining parameters.
You can structure this in various formats, including dictionaries, YAML, or ConfigSpace.
The guide offers insights into defining and configuring your search space.

By following these steps and utilizing the extensive resources provided in the guides, you can tailor NePS to meet
your specific requirements, ensuring a streamlined and effective optimization process.
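
As a rough illustration of how these three pieces fit together, here is a minimal sketch. The toy loss, the parameter names, and the `results/example_run` directory are purely illustrative, and it assumes the dictionary-style `pipeline_space=` with `neps.FloatParameter`/`neps.CategoricalParameter` used in the NePS examples.

```python
import neps

# 2. A toy run_pipeline=: evaluate one configuration and return its loss.
def run_pipeline(learning_rate: float, optimizer: str) -> float:
    # Replace this stand-in computation with your actual training/validation logic.
    return (learning_rate - 0.01) ** 2 + (0.0 if optimizer == "adam" else 0.1)

# 3. A pipeline_space= declaring what to search over.
pipeline_space = {
    "learning_rate": neps.FloatParameter(lower=1e-5, upper=1e-1, log=True),
    "optimizer": neps.CategoricalParameter(choices=["adam", "sgd"]),
}

# 1. Execute the search with neps.run().
if __name__ == "__main__":
    neps.run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory="results/example_run",
        max_evaluations_total=15,
    )
```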

235 changes: 102 additions & 133 deletions docs/reference/analyse.md
@@ -1,145 +1,144 @@
# Analysing Runs
NePS has some convenient utilities to help you understand the results of your run.
To get a quick overview of the results, you can use the `#!bash python -m neps.plot ROOT_DIRECTORY` command.

## Status

To show status information about a neural pipeline search run, use

```bash
python -m neps.status ROOT_DIRECTORY
```

If you need more status information than is printed per default (e.g., the best config over time), please have a look at

```bash
python -m neps.status --help
```

!!! tip "Using `watch`"

    To show the status repeatedly, on unix systems you can use

    ```bash
    watch --interval 30 python -m neps.status ROOT_DIRECTORY
    ```

## CLI commands

To generate plots to the root directory, run

```bash
python -m neps.plot ROOT_DIRECTORY
```

Currently, this creates one plot that shows the best error value across the number of evaluations.

## What's on disk?
In the root directory, NePS maintains several human-readable files at all times that can be useful for inspecting a run.
If you pass the `post_run_summary=` argument to [`neps.run()`][neps.api.run],
NePS will also generate a summary CSV file for you.

=== "`neps.run(..., post_run_summary=True)`"

    ```
    ROOT_DIRECTORY
    ├── results
    │   └── config_1
    │       ├── config.yaml
    │       ├── metadata.yaml
    │       └── result.yaml
    ├── summary_csv
    │   ├── config_data.csv
    │   └── run_status.csv
    ├── all_losses_and_configs.txt
    ├── best_loss_trajectory.txt
    └── best_loss_with_config_trajectory.txt
    ```


=== "`neps.run(..., post_run_summary=False)`"

    ```
    ROOT_DIRECTORY
    ├── results
    │   └── config_1
    │       ├── config.yaml
    │       ├── metadata.yaml
    │       └── result.yaml
    ├── all_losses_and_configs.txt
    ├── best_loss_trajectory.txt
    └── best_loss_with_config_trajectory.txt
    ```


The `config_data.csv` contains all configuration details in CSV format, ordered by ascending `loss`.
Details include configuration hyperparameters, any returned result from the `run_pipeline` function, and metadata information.

The `run_status.csv` provides general run details, such as the number of sampled configs, best configs, number of failed configs, best loss, etc.

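If you prefer to inspect the summary programmatically, a minimal sketch like the one below works. It assumes `pandas` is installed, `ROOT_DIRECTORY` is replaced by your run's root directory, and the run was executed with `post_run_summary=True`; the exact columns depend on your search space and the results returned by `run_pipeline`.

```python
import pandas as pd

# Load the per-configuration summary that NePS writes when post_run_summary is enabled.
df = pd.read_csv("ROOT_DIRECTORY/summary_csv/config_data.csv")

# Rows are ordered by ascending loss, so the first row is the best configuration found.
print(df.head(1))

# See which hyperparameter, result, and metadata columns were recorded.
print(df.columns.tolist())
```
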
## TensorBoard Integration
[TensorBoard](https://www.tensorflow.org/tensorboard) serves as a valuable tool for visualizing machine learning experiments,
offering the ability to observe losses and metrics throughout the model training process.
In NePS, we use it to show the metrics of configurations during training, as well as comparisons across the different hyperparameters used in the search, for better diagnosis of the model.

### Logging Things

The [`tblogger.log()`][neps.plot.tensorboard_eval.tblogger.log] function is invoked
within the model's training loop to facilitate logging of key metrics.

We also provide some utility functions to make it easier to log things like:

* Scalars through [`tblogger.scalar_logging()`][neps.plot.tensorboard_eval.tblogger.scalar_logging]
* Images through [`tblogger.image_logging()`][neps.plot.tensorboard_eval.tblogger.image_logging]

You can provide these through the `extra_data=` argument in the `tblogger.log()` function.

For an example usage of all these features, please refer to the [example](../examples/convenience/neps_tblogger_tutorial.md)!

```python
tblogger.log(
    loss=loss,
    current_epoch=i,
    write_summary_incumbent=False,  # Set to `True` for a live incumbent trajectory.
    writer_config_scalar=True,      # Set to `True` for a live loss trajectory for each config.
    writer_config_hparam=True,      # Set to `True` for live parallel coordinate, scatter plot matrix, and table view.

    # Name the dictionary keys as the names of the values
    # you want to log and pass one of the following functions
    # as the values for a successful logging process.
    extra_data={
        "lr_decay": tblogger.scalar_logging(value=scheduler.get_last_lr()[0]),
        "miss_img": tblogger.image_logging(image=miss_img, counter=2, seed=2),
        "layer_gradient1": tblogger.scalar_logging(value=mean_gradient[0]),
        "layer_gradient2": tblogger.scalar_logging(value=mean_gradient[1]),
    },
)
```

!!! tip

    The logger function is primarily designed for use within the `run_pipeline` function during the training of the neural network.

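For instance, a minimal sketch of that pattern could look like the following; the decaying toy "loss" stands in for a real training step, and the resulting `run_pipeline` would be passed to `neps.run()` as usual.

```python
from neps.plot.tensorboard_eval import tblogger

def run_pipeline(learning_rate: float) -> float:
    loss = 1.0
    for epoch in range(10):
        loss *= 1.0 - learning_rate  # stand-in for an actual training/validation step
        tblogger.log(loss=loss, current_epoch=epoch)  # logged metrics appear in TensorBoard
    return loss
```
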
??? example "Quick Reference"

    === "`tblogger.log()`"

        ::: neps.plot.tensorboard_eval.tblogger.log

    === "`tblogger.scalar_logging()`"

        ::: neps.plot.tensorboard_eval.tblogger.scalar_logging

    === "`tblogger.image_logging()`"

        ::: neps.plot.tensorboard_eval.tblogger.image_logging

### Visualizing Results

The following command will open a local host for TensorBoard visualizations, allowing you to view them either in real-time or after the run is complete.

@@ -174,33 +173,3 @@

The scatter plot matrix view provides an in-depth analysis of pairwise relations between hyperparameters.
By visualizing correlations and patterns, this view aids in identifying key interactions that may influence the model's performance.

![hparam_loggings3](../doc_images/tensorboard/tblogger_hparam3.jpg)
