
Commit

fix spelling mistakes
SkafteNicki committed Sep 27, 2023
1 parent 0209c38 commit 64ea22b
Showing 43 changed files with 109 additions and 103 deletions.
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -20,3 +20,9 @@ repos:
hooks:
- id: black
name: Format code

- repo: https://github.com/codespell-project/codespell
rev: v2.2.5
hooks:
- id: codespell
additional_dependencies: [tomli]
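
With this hook in place, the new spell check can be exercised through pre-commit itself; a minimal sketch, assuming `pre-commit` is already installed in the development environment:

```bash
# install the git hooks defined in .pre-commit-config.yaml
pre-commit install

# run only the newly added codespell hook against every file in the repository
pre-commit run codespell --all-files
```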
2 changes: 1 addition & 1 deletion projects.md
@@ -21,7 +21,7 @@ all the awesome packages that exist to extend the functionality of Pytorch. For
choose between one of three such frameworks which will serve as the basis of your project. The three frameworks are:

* [PyTorch Image Models](https://github.com/rwightman/pytorch-image-models). PyTorch Image Models (also known as TIMM)
is the absolutly most used computer vision package (maybe except for `torchvision`). It contains models, scripts and
is the absolutely most used computer vision package (maybe except for `torchvision`). It contains models, scripts and
pre trained for a lot of state-of-the-art image models within computer vision.

* [Transformers](https://github.com/huggingface/transformers). The Transformers repository from the Huggingface group
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -27,4 +27,4 @@ line-length = 120
exclude = "(.eggs|.git|.hg|.mypy_cache|.venv|_build|buck-out|build|dist)"

[tool.codespell]
skip = "*.pdf"
skip = "*.pdf,*.ipynb"
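
For reference, recent codespell releases can read the `[tool.codespell]` table from `pyproject.toml` when a TOML parser such as `tomli` is available (which is why the hook above lists it as an additional dependency), so the extended skip pattern should also apply when the tool is run directly; a rough sketch, assuming codespell 2.x:

```bash
# codespell plus the TOML parser it needs on Python < 3.11
pip install codespell tomli

# check the repository; *.pdf and *.ipynb files are skipped via [tool.codespell]
codespell .
```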
2 changes: 1 addition & 1 deletion reports/README.md
@@ -144,7 +144,7 @@ end of the project.
>
> Example:
> *We used ... for managing our dependencies. The list of dependencies was auto-generated using ... . To get a*
> *complete copy of our development enviroment, one would have to run the following commands*
> *complete copy of our development environment, one would have to run the following commands*
>
> Answer:
6 changes: 3 additions & 3 deletions reports/report.py
@@ -53,7 +53,7 @@ def check():

answers.append(per_question[-1])
answers = answers[1:] # remove first section
answers = [ans.strip("\n") for ans in answers]
answers = [answer.strip("\n") for answer in answers]

def no_constraints(answer, index):
pass
@@ -124,8 +124,8 @@ def multi_constrains(answer, index, constrains):
if len(answers) != 27:
raise ValueError("Number of answers are different from the expected 27. Have you filled out every field?")

for i, (ans, const) in enumerate(zip(answers, question_constrains), start=1):
const(ans, i)
for i, (answer, const) in enumerate(zip(answers, question_constrains), start=1):
const(answer, i)


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion s10_extra/exercise_files/fashion_trainer.py
@@ -102,7 +102,7 @@ def train_and_test():

for epoch in range(num_epochs):
for batch_idx, (images, labels) in enumerate(train_loader):
# Transfering images and labels to GPU if available
# Transferring images and labels to GPU if available
images, labels = images.to(device), labels.to(device)

# Forward pass
12 changes: 6 additions & 6 deletions s10_extra/high_performance_clusters.md
@@ -27,7 +27,7 @@ Tier, the larger applications it is possible to run.
## Cluster architectures

In very general terms, cluster can come as two different kind of systems: supercomputers and LSF
(Load Sharing Facility). A supercomputer (as shown below) is organized into different modules, that are seperated by
(Load Sharing Facility). A supercomputer (as shown below) is organized into different modules, that are separated by
network link. When you login to a supercomputer you will meet the front end which contains all the software needed to
run computations. When you submit a job it will get sent to the backend modules which in most cases includes: general
compute modules (CPU), acceleration modules (GPU), a memory module (RAM) and finally a storage module (HDD). Depending
@@ -36,7 +36,7 @@ important but in physics simulation the general compute module / storage model i

<figure markdown>
![Image](../figures/meluxina_overview.png){ width="800" }
<figcaption> Overview of the Meluxina supercomputer thats part of EuroHPC.
<figcaption> Overview of the Meluxina supercomputer that's part of EuroHPC.
<a href="https://hpc.uni.lu/old/blog/2019/luxembourg-meluxina-supercomputer-part-of-eurohpc/"> Image credit </a>
</figcaption>
</figure>
@@ -48,7 +48,7 @@ better to run on a LSF system if you are only requesting resources that can be h
is better to run on a supercomputer if you have a resource intensive application that requires many devices to
communicate with each others.

Regardless of cluster architechtures, on the software side of HPC, the most important part is whats called the
Regardless of cluster architectures, on the software side of HPC, the most important part is what's called the
*HPC scheduler*. Without a HPC scheduler an HPC cluster would just be a bunch of servers with different jobs
interfering with each other. The problem is when you have a large collection of resources and a large collection of
users, you cannot rely on the users just running their applications without interfering with each other. A HPC scheduler
@@ -113,7 +113,7 @@ of cluster. For the purpose of this exercise we are going to see how we can run

using this [requirements file](https://github.com/SkafteNicki/dtu_mlops/tree/main/s10_extra/exercise_files/image_classifier_requirements.txt).

3. Thats all the setup needed. You would need to go through the creating of environment and installation of requirements
3. That's all the setup needed. You would need to go through the creating of environment and installation of requirements
whenever you start a new project (no need for reinstalling conda). For the next step we need to look at how to submit
jobs on the cluster. We are now ready to submit the our first job to the cluster:
@@ -135,7 +135,7 @@ of cluster. For the purpose of this exercise we are going to see how we can run
bsub < jobscript.sh
```
You can check the status of your script by running the `bstat` command. Hopefully, the job should go trough
You can check the status of your script by running the `bstat` command. Hopefully, the job should go through
really quickly. Take a look at the output file, it should be called something like `gpu_*.out`. Also take a
look at the `gpu_*.err` file. Does both files look as they should?
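
The exercise above assumes an LSF-style `jobscript.sh`; a hedged sketch of what such a script might look like is given below (queue name, walltime, resource flags and script name are illustrative assumptions, not taken from the course files):

```bash
#!/bin/sh
#BSUB -q gpuv100                            # queue to submit to (illustrative)
#BSUB -J image_classifier                   # job name
#BSUB -n 4                                  # number of cores
#BSUB -gpu "num=1:mode=exclusive_process"   # request a single GPU
#BSUB -W 01:00                              # walltime hh:mm
#BSUB -R "rusage[mem=8GB]"                  # memory per core
#BSUB -o gpu_%J.out                         # stdout file (matches the gpu_*.out mentioned above)
#BSUB -e gpu_%J.err                         # stderr file

# activate the environment created earlier and run the training script
source activate my_env
python image_classifier.py
```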
@@ -173,7 +173,7 @@ of cluster. For the purpose of this exercise we are going to see how we can run
--trainer.accelerator 'gpu' --trainer.devices 1 --trainer.max_epochs 5
```
which will run the image classifier script (change it if you are runnning something else).
which will run the image classifier script (change it if you are running something else).
3. Finally submit the job:
4 changes: 2 additions & 2 deletions s10_extra/hyperparameters.md
@@ -115,7 +115,7 @@ rest to a "recommended value".

4. If implemented correctly the number of hyperparameter combinations should be at least 1000, meaning that
we not only need baysian optimization but probably also need pruning to succeed. Checkout the page for
[build-in pruners](https://optuna.readthedocs.io/en/stable/reference/pruners.html) in Optuna. Implement
[built-in pruners](https://optuna.readthedocs.io/en/stable/reference/pruners.html) in Optuna. Implement
pruning in the script. I recommend using either the `MedianPruner` or the `ProcentilePruner`.

5. Re-run the study using pruning with a large number of trials (`n_trials>50`)
@@ -182,6 +182,6 @@ rest to a "recommended value".

6. Finally, make sure that you can access the results

Thats all on how to do hyperparameter optimization in a scalable way. If you feel like it you can try to apply these
That's all on how to do hyperparameter optimization in a scalable way. If you feel like it you can try to apply these
techniques on the ongoing corrupted MNIST example, where you are free to choose what hyperparameters that you want
to use.
2 changes: 1 addition & 1 deletion s10_extra/kubernetes.md
@@ -7,7 +7,7 @@
!!! danger
Module is still under development

## Kubernetes architechture
## Kubernetes architecture

<figure markdown>
![Image](../figures/components_of_kubernetes.png){ width="800" }
2 changes: 1 addition & 1 deletion s10_extra/onnx.md
@@ -15,7 +15,7 @@ that datapoint. At a high-level, model predictions depends on three things:

* The codebase that implements the models prediction method
* The model weights which contains an actual instance of the model
* Code dependencies nessesary for running the codebase.
* Code dependencies necessary for running the codebase.

We have already in module [M9 on Docker](../s3_reproducibility/docker.md) touch on how to take care of all
these things. Containers makes it easy to link a codebase, model weights and code dependencies into a single object.
2 changes: 1 addition & 1 deletion s1_development_environment/command_line.md
@@ -35,7 +35,7 @@ As already stated, it is essentially just a big text interface to interact with
when trying to execute a command, there are several parts to it:

1. The **prompt** is the part where you type your commands. It usually contains the name of the current directory you
are in, followed by some kind of sign: `$`, `>`, `:` are the usual onces. It can also contain other information,
are in, followed by some kind of sign: `$`, `>`, `:` are the usual ones. It can also contain other information,
such as in the case of the above image it is also showing the current `conda` environment.
2. The **command** is the actual command you want to execute. For example, `ls` or `cd`
3. The **options** are additional arguments that you can pass to the command. For example, `ls -l` or `cd ..`.
2 changes: 1 addition & 1 deletion s1_development_environment/conda.md
@@ -1,6 +1,6 @@
![Logo](../figures/icons/conda.png){ align=right width="130"}

# Conda and virtual enviroments
# Conda and virtual environments

---

4 changes: 2 additions & 2 deletions s1_development_environment/deep_learning_software.md
@@ -153,7 +153,7 @@ corrupted version of regular mnist. Your overall task is the following:
> **Implement a mnist neural network that achieves at least 85 % accuracy on the test set.**

Before any training can start, you should identify what corruption that we have applied to the mnist dataset to
create the corrupted version. This should give you a clue about what network architechture to use.
create the corrupted version. This should give you a clue about what network architecture to use.

One key point of this course is trying to stay organized. Spending time now organizing your code, will save time
in the future as you start to add more and more features. As subgoals, please fulfill the following exercises
@@ -177,7 +177,7 @@ To start you off, a very barebone version of each script is provided in the `fin
implemented some logic, especially to make sure you can easily run different subcommands in for step 4. If you are
interested in how this is done you can checkout this optional module on defining
[command line interfaces (CLI)](../s10_extra/cli.md). We additionally also provide an `requirements.py` with
suggestion to what packages are nessesary to complete the exercise.
suggestion to what packages are necessary to complete the exercise.

\
As documentation that your model is actually working, when running in the `train` command the script needs to
8 changes: 4 additions & 4 deletions s1_development_environment/editor.md
@@ -40,7 +40,7 @@ The main components of VS code are:
* The side bar: The side bar has different functionality depending on what extension that you have open.
In most cases, the side bar will just contain the file explorer.

* The editor: This where you code is. VS code supports a number of layouts in the editor (one column, two column ect.).
* The editor: This where you code is. VS code supports a number of layouts in the editor (one column, two column etc.).
You can make a custom layout by dragging a file to where you want the layout to split.

* The panel: The panel contains a terminal for you to interact with. This can quickly be used to try out code by
@@ -77,10 +77,10 @@ following exercises are just to get you started but you can find many more tutor
which indicates that you are using the stock python installation, instead of the one you have created using `conda`.
Click it and change the python environment to the one you actually want to use.

3. One of the most useful tools in VSCode is the ability to navigate a hole project using the build-in
3. One of the most useful tools in VSCode is the ability to navigate a hole project using the built-in
`Explorer`. To really take advantage of the VS code you need to make sure what you are working on is a project.
Create a folder called `hello` (somewhere on your laptop) and open it in VScode (Click `File` in the menu and then
select `Open Folder`). You should end up with a completly clean workspace (as shown below). Click the `New file`
select `Open Folder`). You should end up with a completely clean workspace (as shown below). Click the `New file`
button and create a file called `hello.py`.

<figure markdown>
@@ -102,7 +102,7 @@ following exercises are just to get you started but you can find many more tutor
* Select some code and right click, choosing to run in a interactive window (where you can interact with the results
like in a jupyter notebook)

Thats, the basic of using VScode. We recommend highly that you revisit
That's, the basic of using VScode. We recommend highly that you revisit
[this tutorial](https://code.visualstudio.com/docs/python/python-tutorial) during the course when we get to topics such
as debugging and version control which VScode can help with.

@@ -154,7 +154,7 @@
"* `weights.reshape(a, b)` will return a new tensor with the same data as `weights` with size `(a, b)` sometimes, and sometimes a clone, as in it copies the data to another part of memory.\n",
"* `weights.resize_(a, b)` returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, new elements will be uninitialized in memory. Here I should note that the underscore at the end of the method denotes that this method is performed **in-place**. Here is a great forum thread to [read more about in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.\n",
"* `weights.view(a, b)` will return a new tensor with the same data as `weights` with size `(a, b)`.\n",
"* `torch.transpose(weights,0,1)` will return transposed weights tensor. This returns transposed version of inpjut tensor along dim 0 and dim 1. This is efficient since we do not specify to actual dimesions of weights.\n",
"* `torch.transpose(weights,0,1)` will return transposed weights tensor. This returns transposed version of inpjut tensor along dim 0 and dim 1. This is efficient since we do not specify to actual dimensions of weights.\n",
"\n",
"I usually use `.view()`, but any of the three methods will work for this. So, now we can reshape `weights` to have five rows and one column with something like `weights.view(5, 1)`.\n",
"\n",
@@ -215,7 +215,7 @@
"\\Large \\sigma(x_i) = \\cfrac{e^{x_i}}{\\sum_k^K{e^{x_k}}}\n",
"$$\n",
"\n",
"What this does is squish each input $x_i$ between 0 and 1 and normalizes the values to give you a proper probability distribution where the probabilites sum up to one.\n",
"What this does is squish each input $x_i$ between 0 and 1 and normalizes the values to give you a proper probability distribution where the probabilities sum up to one.\n",
"\n",
"> **Exercise:** Implement a function `softmax` that performs the softmax calculation and returns probability distributions for each example in the batch. Note that you'll need to pay attention to the shapes when doing this. If you have a tensor `a` with shape `(64, 10)` and a tensor `b` with shape `(64,)`, doing `a/b` will give you an error because PyTorch will try to do the division across the columns (called broadcasting) but you'll get a size mismatch. The way to think about this is for each of the 64 examples, you only want to divide by one value, the sum in the denominator. So you need `b` to have a shape of `(64, 1)`. This way PyTorch will divide the 10 values in each row of `a` by the one value in each row of `b`. Pay attention to how you take the sum as well. You'll need to define the `dim` keyword in `torch.sum`. Setting `dim=0` takes the sum across the rows while `dim=1` takes the sum across the columns."
]
@@ -240,7 +240,7 @@
"def softmax(x):\n",
" ## TODO: Implement the softmax function here\n",
"\n",
"# Here, out should be the output of the network in the previous excercise with shape (64,10)\n",
"# Here, out should be the output of the network in the previous exercise with shape (64,10)\n",
"probabilities = softmax(out)\n",
"\n",
"# Does it have the right shape? Should be (64, 10)\n",
@@ -388,7 +388,7 @@
"source": [
"## Next Up!\n",
"\n",
"In the next part, I'll show you how to save your trained models. In general, you won't want to train a model everytime you need it. Instead, you'll train once, save it, then load the model when you want to train more or use if for inference."
"In the next part, I'll show you how to save your trained models. In general, you won't want to train a model every time you need it. Instead, you'll train once, save it, then load the model when you want to train more or use if for inference."
]
}
],
@@ -206,7 +206,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This means we need to rebuild the model exactly as it was when trained. Information about the model architecture needs to be saved in the checkpoint, along with the state dict. To do this, you build a dictionary with all the information you need to compeletely rebuild the model."
"This means we need to rebuild the model exactly as it was when trained. Information about the model architecture needs to be saved in the checkpoint, along with the state dict. To do this, you build a dictionary with all the information you need to completely rebuild the model."
]
},
{
2 changes: 1 addition & 1 deletion s1_development_environment/exercise_files/fc_model.py
@@ -43,7 +43,7 @@ def validation(model, testloader, criterion):
Arguments:
model: torch network
testloader: torch.utils.data.DataLoader, dataloader of test set
criterion: loss funtion
criterion: loss function
"""
accuracy = 0
test_loss = 0
4 changes: 2 additions & 2 deletions s2_organisation_and_version_control/code_structure.md
@@ -58,7 +58,7 @@ run a file I recommend always doing this from the root directory e.g.
```bash
python src/data/make_dataset.py data/raw data/processed
python src/models/train_model.py <arguments>
ect...
etc...
```

in this way paths (for saving and loading files) are always relative to the root.
@@ -157,7 +157,7 @@ in this way paths (for saving and loading files) are always relative to the root

That ends the module on code structure and `cookiecutter`. We again want to stress the point that `cookiecutter` is
just one template for organizing your code. What often happens in a team is that multiple templates are needed in
different stages of the development phase or for different product types because they share commen structure, while
different stages of the development phase or for different product types because they share common structure, while
still having some specifics. Keeping templates up-to-date then becomes critical such that no team member is using an
outdated template. If you ever end up in this situation, we highly recommend to checkout
[cruft](https://github.com/cruft/cruft) that works alongside `cookiecutter` to not only make projects but update