
Commit

fix spelling mistakes
SkafteNicki committed Sep 27, 2023
1 parent 0209c38 commit 64ea22b
Showing 43 changed files with 109 additions and 103 deletions.
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -20,3 +20,9 @@ repos:
hooks:
- id: black
name: Format code

- repo: https://github.com/codespell-project/codespell
rev: v2.2.5
hooks:
- id: codespell
additional_dependencies: [tomli]
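
With this hook in place, the new spell check can be exercised through pre-commit itself; a minimal sketch, assuming `pre-commit` is already installed in the development environment:

```bash
# install the git hooks defined in .pre-commit-config.yaml
pre-commit install

# run only the newly added codespell hook against every file in the repository
pre-commit run codespell --all-files
```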
2 changes: 1 addition & 1 deletion projects.md
@@ -21,7 +21,7 @@ all the awesome packages that exist to extend the functionality of Pytorch. For
choose between one of three such frameworks which will serve as the basis of your project. The three frameworks are:

* [PyTorch Image Models](https://github.com/rwightman/pytorch-image-models). PyTorch Image Models (also known as TIMM)
is the absolutly most used computer vision package (maybe except for `torchvision`). It contains models, scripts and
is the absolutely most used computer vision package (maybe except for `torchvision`). It contains models, scripts and
pre trained for a lot of state-of-the-art image models within computer vision.

* [Transformers](https://github.com/huggingface/transformers). The Transformers repository from the Huggingface group
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -27,4 +27,4 @@ line-length = 120
exclude = "(.eggs|.git|.hg|.mypy_cache|.venv|_build|buck-out|build|dist)"

[tool.codespell]
skip = "*.pdf"
skip = "*.pdf,*.ipynb"
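
For reference, recent codespell releases can read the `[tool.codespell]` table from `pyproject.toml` when a TOML parser such as `tomli` is available (which is why the hook above lists it as an additional dependency), so the extended skip pattern should also apply when the tool is run directly; a rough sketch, assuming codespell 2.x:

```bash
# codespell plus the TOML parser it needs on Python < 3.11
pip install codespell tomli

# check the repository; *.pdf and *.ipynb files are skipped via [tool.codespell]
codespell .
```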
2 changes: 1 addition & 1 deletion reports/README.md
@@ -144,7 +144,7 @@ end of the project.
>
> Example:
> *We used ... for managing our dependencies. The list of dependencies was auto-generated using ... . To get a*
> *complete copy of our development enviroment, one would have to run the following commands*
> *complete copy of our development environment, one would have to run the following commands*
>
> Answer:
6 changes: 3 additions & 3 deletions reports/report.py
@@ -53,7 +53,7 @@ def check():

answers.append(per_question[-1])
answers = answers[1:] # remove first section
answers = [ans.strip("\n") for ans in answers]
answers = [answer.strip("\n") for answer in answers]

def no_constraints(answer, index):
pass
@@ -124,8 +124,8 @@ def multi_constrains(answer, index, constrains):
if len(answers) != 27:
raise ValueError("Number of answers are different from the expected 27. Have you filled out every field?")

for i, (ans, const) in enumerate(zip(answers, question_constrains), start=1):
const(ans, i)
for i, (answer, const) in enumerate(zip(answers, question_constrains), start=1):
const(answer, i)


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion s10_extra/exercise_files/fashion_trainer.py
@@ -102,7 +102,7 @@ def train_and_test():

for epoch in range(num_epochs):
for batch_idx, (images, labels) in enumerate(train_loader):
# Transfering images and labels to GPU if available
# Transferring images and labels to GPU if available
images, labels = images.to(device), labels.to(device)

# Forward pass
12 changes: 6 additions & 6 deletions s10_extra/high_performance_clusters.md
@@ -27,7 +27,7 @@ Tier, the larger applications it is possible to run.
## Cluster architectures

In very general terms, cluster can come as two different kind of systems: supercomputers and LSF
(Load Sharing Facility). A supercomputer (as shown below) is organized into different modules, that are seperated by
(Load Sharing Facility). A supercomputer (as shown below) is organized into different modules, that are separated by
network link. When you login to a supercomputer you will meet the front end which contains all the software needed to
run computations. When you submit a job it will get sent to the backend modules which in most cases includes: general
compute modules (CPU), acceleration modules (GPU), a memory module (RAM) and finally a storage module (HDD). Depending
@@ -36,7 +36,7 @@ important but in physics simulation the general compute module / storage model i

<figure markdown>
![Image](../figures/meluxina_overview.png){ width="800" }
<figcaption> Overview of the Meluxina supercomputer thats part of EuroHPC.
<figcaption> Overview of the Meluxina supercomputer that's part of EuroHPC.
<a href="https://hpc.uni.lu/old/blog/2019/luxembourg-meluxina-supercomputer-part-of-eurohpc/"> Image credit </a>
</figcaption>
</figure>
@@ -48,7 +48,7 @@ better to run on a LSF system if you are only requesting resources that can be h
is better to run on a supercomputer if you have a resource intensive application that requires many devices to
communicate with each others.

Regardless of cluster architechtures, on the software side of HPC, the most important part is whats called the
Regardless of cluster architectures, on the software side of HPC, the most important part is what's called the
*HPC scheduler*. Without a HPC scheduler an HPC cluster would just be a bunch of servers with different jobs
interfering with each other. The problem is when you have a large collection of resources and a large collection of
users, you cannot rely on the users just running their applications without interfering with each other. A HPC scheduler
@@ -113,7 +113,7 @@ of cluster. For the purpose of this exercise we are going to see how we can run

using this [requirements file](https://github.com/SkafteNicki/dtu_mlops/tree/main/s10_extra/exercise_files/image_classifier_requirements.txt).

3. Thats all the setup needed. You would need to go through the creating of environment and installation of requirements
3. That's all the setup needed. You would need to go through the creating of environment and installation of requirements
whenever you start a new project (no need for reinstalling conda). For the next step we need to look at how to submit
jobs on the cluster. We are now ready to submit the our first job to the cluster:
@@ -135,7 +135,7 @@ of cluster. For the purpose of this exercise we are going to see how we can run
bsub < jobscript.sh
```
You can check the status of your script by running the `bstat` command. Hopefully, the job should go trough
You can check the status of your script by running the `bstat` command. Hopefully, the job should go through
really quickly. Take a look at the output file, it should be called something like `gpu_*.out`. Also take a
look at the `gpu_*.err` file. Does both files look as they should?
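
The exercise above assumes an LSF-style `jobscript.sh`; a hedged sketch of what such a script might look like is given below (queue name, walltime, resource flags and script name are illustrative assumptions, not taken from the course files):

```bash
#!/bin/sh
#BSUB -q gpuv100                            # queue to submit to (illustrative)
#BSUB -J image_classifier                   # job name
#BSUB -n 4                                  # number of cores
#BSUB -gpu "num=1:mode=exclusive_process"   # request a single GPU
#BSUB -W 01:00                              # walltime hh:mm
#BSUB -R "rusage[mem=8GB]"                  # memory per core
#BSUB -o gpu_%J.out                         # stdout file (matches the gpu_*.out mentioned above)
#BSUB -e gpu_%J.err                         # stderr file

# activate the environment created earlier and run the training script
source activate my_env
python image_classifier.py
```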
@@ -173,7 +173,7 @@ of cluster. For the purpose of this exercise we are going to see how we can run
--trainer.accelerator 'gpu' --trainer.devices 1 --trainer.max_epochs 5
```
which will run the image classifier script (change it if you are runnning something else).
which will run the image classifier script (change it if you are running something else).
3. Finally submit the job:
4 changes: 2 additions & 2 deletions s10_extra/hyperparameters.md
@@ -115,7 +115,7 @@ rest to a "recommended value".

4. If implemented correctly the number of hyperparameter combinations should be at least 1000, meaning that
we not only need baysian optimization but probably also need pruning to succeed. Checkout the page for
[build-in pruners](https://optuna.readthedocs.io/en/stable/reference/pruners.html) in Optuna. Implement
[built-in pruners](https://optuna.readthedocs.io/en/stable/reference/pruners.html) in Optuna. Implement
pruning in the script. I recommend using either the `MedianPruner` or the `ProcentilePruner`.

5. Re-run the study using pruning with a large number of trials (`n_trials>50`)
@@ -182,6 +182,6 @@ rest to a "recommended value".

6. Finally, make sure that you can access the results

Thats all on how to do hyperparameter optimization in a scalable way. If you feel like it you can try to apply these
That's all on how to do hyperparameter optimization in a scalable way. If you feel like it you can try to apply these
techniques on the ongoing corrupted MNIST example, where you are free to choose what hyperparameters that you want
to use.
2 changes: 1 addition & 1 deletion s10_extra/kubernetes.md
@@ -7,7 +7,7 @@
!!! danger
Module is still under development

## Kubernetes architechture
## Kubernetes architecture

<figure markdown>
![Image](../figures/components_of_kubernetes.png){ width="800" }
2 changes: 1 addition & 1 deletion s10_extra/onnx.md
@@ -15,7 +15,7 @@ that datapoint. At a high-level, model predictions depends on three things:

* The codebase that implements the models prediction method
* The model weights which contains an actual instance of the model
* Code dependencies nessesary for running the codebase.
* Code dependencies necessary for running the codebase.

We have already in module [M9 on Docker](../s3_reproducibility/docker.md) touch on how to take care of all
these things. Containers makes it easy to link a codebase, model weights and code dependencies into a single object.
2 changes: 1 addition & 1 deletion s1_development_environment/command_line.md
@@ -35,7 +35,7 @@ As already stated, it is essentially just a big text interface to interact with
when trying to execute a command, there are several parts to it:

1. The **prompt** is the part where you type your commands. It usually contains the name of the current directory you
are in, followed by some kind of sign: `$`, `>`, `:` are the usual onces. It can also contain other information,
are in, followed by some kind of sign: `$`, `>`, `:` are the usual ones. It can also contain other information,
such as in the case of the above image it is also showing the current `conda` environment.
2. The **command** is the actual command you want to execute. For example, `ls` or `cd`
3. The **options** are additional arguments that you can pass to the command. For example, `ls -l` or `cd ..`.
2 changes: 1 addition & 1 deletion s1_development_environment/conda.md
@@ -1,6 +1,6 @@
![Logo](../figures/icons/conda.png){ align=right width="130"}

# Conda and virtual enviroments
# Conda and virtual environments

---

4 changes: 2 additions & 2 deletions s1_development_environment/deep_learning_software.md
@@ -153,7 +153,7 @@ corrupted version of regular mnist. Your overall task is the following:
> **Implement a mnist neural network that achieves at least 85 % accuracy on the test set.**

Before any training can start, you should identify what corruption that we have applied to the mnist dataset to
create the corrupted version. This should give you a clue about what network architechture to use.
create the corrupted version. This should give you a clue about what network architecture to use.

One key point of this course is trying to stay organized. Spending time now organizing your code, will save time
in the future as you start to add more and more features. As subgoals, please fulfill the following exercises
@@ -177,7 +177,7 @@ To start you off, a very barebone version of each script is provided in the `fin
implemented some logic, especially to make sure you can easily run different subcommands in for step 4. If you are
interested in how this is done you can checkout this optional module on defining
[command line interfaces (CLI)](../s10_extra/cli.md). We additionally also provide an `requirements.py` with
suggestion to what packages are nessesary to complete the exercise.
suggestion to what packages are necessary to complete the exercise.

\
As documentation that your model is actually working, when running in the `train` command the script needs to
8 changes: 4 additions & 4 deletions s1_development_environment/editor.md
@@ -40,7 +40,7 @@ The main components of VS code are:
* The side bar: The side bar has different functionality depending on what extension that you have open.
In most cases, the side bar will just contain the file explorer.

* The editor: This where you code is. VS code supports a number of layouts in the editor (one column, two column ect.).
* The editor: This where you code is. VS code supports a number of layouts in the editor (one column, two column etc.).
You can make a custom layout by dragging a file to where you want the layout to split.

* The panel: The panel contains a terminal for you to interact with. This can quickly be used to try out code by
@@ -77,10 +77,10 @@ following exercises are just to get you started but you can find many more tutor
which indicates that you are using the stock python installation, instead of the one you have created using `conda`.
Click it and change the python environment to the one you actually want to use.

3. One of the most useful tools in VSCode is the ability to navigate a hole project using the build-in
3. One of the most useful tools in VSCode is the ability to navigate a hole project using the built-in
`Explorer`. To really take advantage of the VS code you need to make sure what you are working on is a project.
Create a folder called `hello` (somewhere on your laptop) and open it in VScode (Click `File` in the menu and then
select `Open Folder`). You should end up with a completly clean workspace (as shown below). Click the `New file`
select `Open Folder`). You should end up with a completely clean workspace (as shown below). Click the `New file`
button and create a file called `hello.py`.

<figure markdown>
@@ -102,7 +102,7 @@ following exercises are just to get you started but you can find many more tutor
* Select some code and right click, choosing to run in a interactive window (where you can interact with the results
like in a jupyter notebook)

Thats, the basic of using VScode. We recommend highly that you revisit
That's, the basic of using VScode. We recommend highly that you revisit
[this tutorial](https://code.visualstudio.com/docs/python/python-tutorial) during the course when we get to topics such
as debugging and version control which VScode can help with.

@@ -154,7 +154,7 @@
"* `weights.reshape(a, b)` will return a new tensor with the same data as `weights` with size `(a, b)` sometimes, and sometimes a clone, as in it copies the data to another part of memory.\n",
"* `weights.resize_(a, b)` returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, new elements will be uninitialized in memory. Here I should note that the underscore at the end of the method denotes that this method is performed **in-place**. Here is a great forum thread to [read more about in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.\n",
"* `weights.view(a, b)` will return a new tensor with the same data as `weights` with size `(a, b)`.\n",
"* `torch.transpose(weights,0,1)` will return transposed weights tensor. This returns transposed version of inpjut tensor along dim 0 and dim 1. This is efficient since we do not specify to actual dimesions of weights.\n",
"* `torch.transpose(weights,0,1)` will return transposed weights tensor. This returns transposed version of inpjut tensor along dim 0 and dim 1. This is efficient since we do not specify to actual dimensions of weights.\n",
"\n",
"I usually use `.view()`, but any of the three methods will work for this. So, now we can reshape `weights` to have five rows and one column with something like `weights.view(5, 1)`.\n",
"\n",
@@ -215,7 +215,7 @@
"\\Large \\sigma(x_i) = \\cfrac{e^{x_i}}{\\sum_k^K{e^{x_k}}}\n",
"$$\n",
"\n",
"What this does is squish each input $x_i$ between 0 and 1 and normalizes the values to give you a proper probability distribution where the probabilites sum up to one.\n",
"What this does is squish each input $x_i$ between 0 and 1 and normalizes the values to give you a proper probability distribution where the probabilities sum up to one.\n",
"\n",
"> **Exercise:** Implement a function `softmax` that performs the softmax calculation and returns probability distributions for each example in the batch. Note that you'll need to pay attention to the shapes when doing this. If you have a tensor `a` with shape `(64, 10)` and a tensor `b` with shape `(64,)`, doing `a/b` will give you an error because PyTorch will try to do the division across the columns (called broadcasting) but you'll get a size mismatch. The way to think about this is for each of the 64 examples, you only want to divide by one value, the sum in the denominator. So you need `b` to have a shape of `(64, 1)`. This way PyTorch will divide the 10 values in each row of `a` by the one value in each row of `b`. Pay attention to how you take the sum as well. You'll need to define the `dim` keyword in `torch.sum`. Setting `dim=0` takes the sum across the rows while `dim=1` takes the sum across the columns."
]
@@ -240,7 +240,7 @@
"def softmax(x):\n",
" ## TODO: Implement the softmax function here\n",
"\n",
"# Here, out should be the output of the network in the previous excercise with shape (64,10)\n",
"# Here, out should be the output of the network in the previous exercise with shape (64,10)\n",
"probabilities = softmax(out)\n",
"\n",
"# Does it have the right shape? Should be (64, 10)\n",
@@ -388,7 +388,7 @@
"source": [
"## Next Up!\n",
"\n",
"In the next part, I'll show you how to save your trained models. In general, you won't want to train a model everytime you need it. Instead, you'll train once, save it, then load the model when you want to train more or use if for inference."
"In the next part, I'll show you how to save your trained models. In general, you won't want to train a model every time you need it. Instead, you'll train once, save it, then load the model when you want to train more or use if for inference."
]
}
],
@@ -206,7 +206,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This means we need to rebuild the model exactly as it was when trained. Information about the model architecture needs to be saved in the checkpoint, along with the state dict. To do this, you build a dictionary with all the information you need to compeletely rebuild the model."
"This means we need to rebuild the model exactly as it was when trained. Information about the model architecture needs to be saved in the checkpoint, along with the state dict. To do this, you build a dictionary with all the information you need to completely rebuild the model."
]
},
{
2 changes: 1 addition & 1 deletion s1_development_environment/exercise_files/fc_model.py
@@ -43,7 +43,7 @@ def validation(model, testloader, criterion):
Arguments:
model: torch network
testloader: torch.utils.data.DataLoader, dataloader of test set
criterion: loss funtion
criterion: loss function
"""
accuracy = 0
test_loss = 0
4 changes: 2 additions & 2 deletions s2_organisation_and_version_control/code_structure.md
@@ -58,7 +58,7 @@ run a file I recommend always doing this from the root directory e.g.
```bash
python src/data/make_dataset.py data/raw data/processed
python src/models/train_model.py <arguments>
ect...
etc...
```

in this way paths (for saving and loading files) are always relative to the root.
@@ -157,7 +157,7 @@ in this way paths (for saving and loading files) are always relative to the root

That ends the module on code structure and `cookiecutter`. We again want to stress the point that `cookiecutter` is
just one template for organizing your code. What often happens in a team is that multiple templates are needed in
different stages of the development phase or for different product types because they share commen structure, while
different stages of the development phase or for different product types because they share common structure, while
still having some specifics. Keeping templates up-to-date then becomes critical such that no team member is using an
outdated template. If you ever end up in this situation, we highly recommend to checkout
[cruft](https://github.com/cruft/cruft) that works alongside `cookiecutter` to not only make projects but update