deploy: b97f719

coderefinery · Sep 27, 2023 · dd8f66a
commit dd8f66a
Showing 258 changed files with 91,574 additions and 0 deletions.
diff --git a/.buildinfo b/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 1a64e17934d86b014144529848c02f44
+tags: d77d1c0d9ca2f4c8421862c7c5a0d620
diff --git a/.nojekyll b/.nojekyll
diff --git a/_sources/encapsulation.md.txt b/_sources/encapsulation.md.txt
@@ -0,0 +1,23 @@
+# Types of encapsulation
+
+Let's say you want to move code from one system to another. What can go wrong?
+
+- **In-language dependencies**, e.g. Python. Can they all be expressed *only* in
+  ``requirements.txt``?  Do you wrap everything in a container?
+- **Paths**: Can you always use relative paths?
+- **Operating-system (OS) dependencies, libraries, etc.**: Can you eliminate them?
+- **Data files**: Are they few and well organized enough that you can migrate them
+  selectively, or are you forced to copy the whole directory without regard to what
+  each file is (thus creating a lot of duplicates and a big mess)?
+- Are you using something that is **OS-specific** (GNU/Linux vs BSD)?
+- Do you use support programs only available on certain computers? The fewer
+  external utilities you use, the easier the code is to port.
+
+
+## What needs to be global vs what needs to be local?
+
+- Global data can be "seen"/accessed in the entire code.
+- Local data is only available in the local vicinity of its definition.
+- Try to have as little global data as possible.
+- Global data are often input parameters, configuration parameters, command-line arguments.
+- But try to localize these to the "main" code/function.
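A minimal sketch of the last two points, assuming a hypothetical `num_measurements` setting, made-up data, and hypothetical helper functions (none of them come from this lesson text). The configuration is defined only in `main` and passed down explicitly:

```python
def compute_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def summarize(temperatures, num_measurements):
    """Mean of the first num_measurements values; all inputs are arguments."""
    return compute_mean(temperatures[:num_measurements])


def main():
    # Hypothetical configuration: defined once, in the "main" function only.
    num_measurements = 3
    temperatures = [1.0, 2.0, 3.0, 10.0]  # made-up example data
    return summarize(temperatures, num_measurements)
```

Because `summarize` and `compute_mean` never look at anything outside their arguments, they can be moved to another file without dragging configuration along.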
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
@@ -0,0 +1,44 @@
+Modular code development - Making reusing parts of your code easier
+===================================================================
+
+Type-along/demo where we discuss and experience aspects of (un)modular
+code development. **We will focus on the "why", not on the "how"**.
+
+Image to get started: `The curse of bad code design <https://doi.org/10.1371/journal.pcbi.1008549.g005>`_
+(Ten simple rules for quick and dirty scientific programming. PLoS Comput Biol 17(3): e1008549. https://doi.org/10.1371/journal.pcbi.1008549)
+
+
+Slides
+------
+
+We also have some slides:
+https://github.com/coderefinery/modular-code-development.  Here, however, we will try
+to do this as a collaborative type-along where instructors try to solve a problem
+together and learners guide the instructors through a collaborative
+document.
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: The lesson
+
+   questions
+   learning-outcomes
+   lesson
+   encapsulation
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: A solution
+
+   instructor-guide
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: About
+
+   All lessons <https://coderefinery.org/lessons/core/>
+   CodeRefinery <https://coderefinery.org/>
+   Reusing <https://coderefinery.org/lessons/reusing/>
diff --git a/_sources/instructor-guide.md.txt b/_sources/instructor-guide.md.txt
@@ -0,0 +1,186 @@
+(guide)=
+
+# Instructor guide (spoiler alert!)
+
+
+## Before we start
+
+We **don't have to follow this line by line** but it's important to study
+this example well before demonstrating it.
+
+Emphasize that the example is Python but we will try to see "through"
+the code and **focus on the bigger picture** and hopefully manage to imagine
+other languages in its place.
+
+We **collect ideas and feedback in the collaborative document while coding** and the instructor
+tries to react to that without going down a rabbit hole.
+
+We recommend going through this together: the instructor(s) demonstrate(s)
+and learners can comment, suggest, and ask questions, and we are either all in
+the same video room or everybody is watching via stream. In other words, for
+this lesson, **learners are not in separate breakout-rooms**.
+
+
+## Checklist
+
+- Start with notebook
+- Generalize from 1 figure to 3 figures
+- Abstract code into functions
+- From functions with side-effects towards stateless functions
+- Move from notebook to script
+- Initialize git
+- Add `requirements.txt`
+- Add test
+- **Add command line interface**
+- Split into multiple files/modules
+
+
+## Our initial version
+
+We imagine that we assemble a working script from various StackOverflow
+recommendations and arrive at:
+
+```{literalinclude} code/initial-version.py
+:language: python
+```
+
+- We test it out **in a notebook**.
+
+
+## We add axis labels
+
+The placement is not ideal: it works for now, but it will bite us later (only the
+first plot will have labels), and we will improve it then:
+
+```{literalinclude} code/with-axis-labels.py
+:language: python
+:emphasize-lines: 4,5
+```
+
+Once we get this working for 25 measurements, our task changes to also
+plot the first 100 and the first 500 measurements in two additional
+plots.
+
+
+## Plotting also 100 and 500 measurements
+
+- The first idea is perhaps to duplicate the code.
+- The next is a for-loop to iterate over `[25, 100, 500]`:
+
+```{literalinclude} code/add-iteration.py
+:language: python
+:emphasize-lines: 7
+```
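The move from duplicated code to a loop can be sketched without any plotting. The data below is made up and `mean_of_first` is a hypothetical helper, not the content of `code/add-iteration.py`:

```python
def mean_of_first(values, n):
    """Arithmetic mean of the first n values (fewer if the list is shorter)."""
    selected = values[:n]
    return sum(selected) / len(selected)


# Made-up stand-in for the temperature column of the CSV file.
temperatures = [float(i % 10) for i in range(600)]

# One loop replaces three copies of the same code.
means = {}
for num_measurements in [25, 100, 500]:
    means[num_measurements] = mean_of_first(temperatures, num_measurements)
```

Adding a fourth plot size now means adding one number to the list rather than copying a block of code.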
+
+
+## Abstracting the plotting part into a function
+
+```{literalinclude} code/abstracting-plot.py
+:language: python
+:emphasize-lines: 8-13,26-30
+```
+
+- Discuss what we expect before running it (some will expect this not to work
+  because variables seem undefined).
+- Then try it out (it actually works).
+- Discuss problems with this solution (what if we copy-paste the function to a different file?).
+
+The point of this step is that abstracting code into functions can be really
+good for reusability, but merely creating a function does not make it
+reusable: in this case the function depends on a variable defined outside the
+function, so its behavior depends on hidden external state.
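The behavior discussed above can be reproduced in a few lines. This is a hypothetical reduction, not the code in `code/abstracting-plot.py`: `describe` reads `num_measurements` from the surrounding scope, so it works here but raises a `NameError` when the same function is defined where that name does not exist.

```python
num_measurements = 25  # defined outside the function


def describe():
    """Build a plot title; silently depends on the outer num_measurements."""
    return f"first {num_measurements} measurements"


title = describe()  # works, because the outer variable happens to exist


def call_without_global():
    """Simulate copy-pasting describe() into a file that lacks the variable."""
    namespace = {}  # a fresh, empty module-like namespace
    source = (
        "def describe():\n"
        "    return f'first {num_measurements} measurements'\n"
        "describe()\n"
    )
    try:
        exec(source, namespace)
    except NameError as error:
        return type(error).__name__
    return "no error"
```

The `exec` call is only a device to imitate a second file inside one example; in practice the error shows up after the copy-paste.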
+
+
+## Small improvements
+
+- Abstracting into more functions.
+- Notice how the comments got redundant:
+
+```{literalinclude} code/small-improvements.py
+:language: python
+:emphasize-lines: 27-35
+```
+
+Discuss what would happen if we copy-paste the functions to another project
+(these functions are stateful/time-dependent).
+
+Emphasize how stateful functions and order of execution in Jupyter notebooks
+can produce unexpected results and explain why we recommend rerunning all cells
+before sharing the notebook.
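A minimal sketch of why execution order matters, using a hypothetical `add_measurement` function that mutates shared state. Calling it twice is analogous to running the same notebook cell twice:

```python
measurements = []  # shared state, modified by the function below


def add_measurement(value):
    """Append to the shared list and return the running count (a side effect)."""
    measurements.append(value)
    return len(measurements)


first_run = add_measurement(1.5)
second_run = add_measurement(1.5)  # same input, different output
```

The result depends on how often and in which order the calls ran, which is exactly what a reader of a shared notebook cannot see.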
+
+
+## Towards functions without side-effects
+
+We improve the code towards more stateless functions:
+
+```{literalinclude} code/towards-pure.py
+:language: python
+:emphasize-lines: 6,15,20
+```
+
+These functions can now be copy-pasted to a different notebook or project and
+they will still work.
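As an illustration of what "more stateless" can mean, here is a sketch with hypothetical function names and made-up data (not the content of `code/towards-pure.py`). Every input arrives as an argument and every result leaves as a return value:

```python
def select_first(values, num_measurements):
    """Return a new list with the first num_measurements values."""
    return list(values[:num_measurements])


def compute_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def plot_title(num_measurements, mean):
    """Build a title string from its arguments only."""
    return f"first {num_measurements} measurements, mean {mean:.1f}"


temperatures = [2.0, 4.0, 6.0, 8.0]  # made-up data
selected = select_first(temperatures, 2)
title = plot_title(2, compute_mean(selected))
```

None of these functions reads or modifies anything outside its own arguments, so the input list is left untouched and repeated calls give the same result.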
+
+
+## Move from notebook to script
+
+Adding unit tests is often the moment when a notebook is no longer the right
+fit.
+
+But before we add tests:
+- "File" -> "Save and Export Notebook As ..." -> "Executable Script"
+- `git init` and commit the working version.
+- Add `requirements.txt` and explain why it will be useful to have later.
+
+As we continue from here, **create commits after meaningful changes** and later
+also share the repository with learners.  This nicely connects to other lessons
+of the workshop.
+
+
+## Unit tests
+
+Design code for testing.
+
+- Move the main scope code into a main function.
+- Discuss where to add a test and add a test to the statistics function:
+
+```{literalinclude} code/testing.py
+:language: python
+:emphasize-lines: 3,11,21-23
+```
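A sketch of what such a test could look like, assuming a `compute_mean` function like the one named later in this guide and the pytest convention of `test_`-prefixed functions. The actual test in `code/testing.py` may differ:

```python
import math


def compute_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def test_compute_mean():
    """Compare against a hand-computed result, with a float tolerance."""
    result = compute_mean([1.0, 2.0, 3.0, 4.0])
    assert math.isclose(result, 2.5)
```

Saved in a file named, for example, `test_mean.py` (a hypothetical name), running `pytest` in that directory should discover and run `test_compute_mean`. The test is easy to write only because `compute_mean` takes its data as an argument and returns its result.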
+
+
+## Command-line interface
+
+- Add a CLI for the input data file, the number of measurements, and the output
+  file name.
+- The example here uses [click](https://click.palletsprojects.com/) but it could
+  equally well use [optparse](https://docs.python.org/3/library/optparse.html),
+  [argparse](https://docs.python.org/3/library/argparse.html),
+  [docopt](http://docopt.org/), or [Typer](https://typer.tiangolo.com/).
+- Discuss the motivations for adding a CLI:
+   - We are able to modify the behavior without changing the code
+   - We can run many such scripts as part of a workflow
+
+```{literalinclude} code/cli.py
+:language: python
+:emphasize-lines: 4,31-37
+```
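A sketch of such an interface using `argparse` from the standard library instead of click, so that it runs without extra packages. The option names `--num-measurements`, `--in-file`, and `--out-file` are assumptions for illustration and need not match `code/cli.py`:

```python
import argparse


def parse_args(argv=None):
    """Parse the three options; argv=None means "read the real command line"."""
    parser = argparse.ArgumentParser(description="Plot temperature measurements.")
    parser.add_argument("--num-measurements", type=int, required=True,
                        help="number of measurements to plot")
    parser.add_argument("--in-file", required=True, help="input CSV file")
    parser.add_argument("--out-file", required=True, help="output figure file")
    return parser.parse_args(argv)


# Passing a list makes the parser easy to try out without a shell.
args = parse_args(["--num-measurements", "25",
                   "--in-file", "temperatures.csv",
                   "--out-file", "25.png"])
```

Here `args.num_measurements` is the integer `25`, so the rest of the program can receive it as an ordinary function argument. Keeping the parsing in its own function also keeps the command-line handling at the edge of the program.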
+
+
+## Split long script into modules
+
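+- Discuss how you would move some functions out and organize them into separate
+  modules which can be imported into other projects. For instance,
+  `compute_mean` can be moved to `statistics.py` (note that a local file with
+  this name shadows Python's standard-library `statistics` module, which is a
+  good starting point for the naming discussion).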
+- Discuss naming.
+- Discuss interface design.
+
+
+## Summarize in the collaborative document
+
+- Now return to the initial questions in the collaborative document and discuss questions and comments. If
+  there is time left, there are additional questions and exercises.
+- It is easier and more fun to teach this as a pair with somebody else where
+  one person can type and the other person helps by watching the questions and
+  comments and relaying them to the co-instructor.
diff --git a/_sources/learning-outcomes.md.txt b/_sources/learning-outcomes.md.txt
@@ -0,0 +1,18 @@
+# Learning outcomes
+
+- Know about **pure functions** (functions without side effects, functions which
+  given the same input always return the same output).
+- Learn why and how to **limit side effects** of functions.
+- Discuss why and how to limit side effects of data. Also discuss when
+  mutable data may be preferable.
+- [The Zen of Python](https://www.python.org/dev/peps/pep-0020/)
+- Discuss why **single-purpose functions** are often preferred over
+  multi-purpose functions.
+- **Split-apply-combine**, which lets you more easily parallelize. Make your code
+  modular in a way that lets you split the steps and parallelize.
+- Think about **global vs local** data structures. It is not easy to
+  separate them right.
+- Understand how a command line interface to a code can improve usability and also
+  make the code more versatile (to be combined with workflow management tools).
+- Connect modular code development to the remaining lessons (version control, testing,
+  documentation, reusability).
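The split-apply-combine outcome above can be sketched with three small functions. The names, the chunk size, and the data are made up for illustration:

```python
def split(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def apply_to_chunk(chunk):
    """Per-chunk work: return (sum, count); depends only on its argument."""
    return sum(chunk), len(chunk)


def combine(partials):
    """Merge the per-chunk results into the overall mean."""
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


values = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
chunks = split(values, 2)
partials = list(map(apply_to_chunk, chunks))  # each chunk is independent
mean = combine(partials)
```

Because `apply_to_chunk` has no side effects, the built-in `map` could in principle be replaced by the `map` method of a `concurrent.futures` executor to process chunks in parallel, without changing the other two functions.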
diff --git a/_sources/lesson.md.txt b/_sources/lesson.md.txt
@@ -0,0 +1,62 @@
+# Our task
+
+
+## Data
+
+The file [temperatures.csv](https://github.com/coderefinery/modular-type-along/blob/main/data/temperatures.csv)
+contains hourly air temperature measurements for the time range November 1,
+2019 12:00 AM - November 30, 2019 11:59 PM for the observation station "Vantaa
+Helsinki-Vantaan lentoasema".
+
+Data obtained from
+<https://en.ilmatieteenlaitos.fi/download-observations#!/> on 2019-12-09.
+
+
+## Our initial goal
+
+Our initial goal for this exercise is to plot a series of temperatures
+for **25 measurements** and to compute and plot the **arithmetic mean**. We
+imagine that we assemble a working script from various StackOverflow
+recommendations and arrive at:
+
+```{literalinclude} code/initial-version.py
+:language: python
+```
+
+This example is in Python but we will try to see "through" the code and
+focus on the bigger picture and hopefully manage to imagine other
+languages in its place. For the Python experts: we will not see the most
+elegant Python.
+
+
+## Further goals
+
+- Once we get this working for **25 measurements**, our task changes to also
+  plot the **first 100** and the **first 500 measurements** in two additional
+  plots.
+- Then we wish to generalize the code so that a user can compute and plot this
+  for **any number**, **without changing the code** (with a command line interface).
+
+
+## How we plan to solve it
+
+Before we attempt to do this, we discuss with workshop participants how
+they would tackle this problem.
+
+Together we improve the code based on suggestions from learners towards
+more modularity and re-usability.
+
+```{instructor-note}
+Participants give suggestions and ask questions via the collaborative document
+and the instructor(s) try to follow and answer. They can also roughly follow
+the ideas and steps in the {ref}`guide`.
+
+It is OK and good if mistakes happen and it is fun if the instructor(s) can
+convey a bit of "improv" feel to this lesson.
+```
+
+
+## Additional exercises
+
+Draw a call tree for one of your recent projects. Identify the
+functions in your call tree which are "pure" (which have no side-effects).
diff --git a/_sources/questions.md.txt b/_sources/questions.md.txt
@@ -0,0 +1,24 @@
+# Starting questions for the collaborative document
+
+We share these questions in a common collaborative document and we
+wait until we have sufficiently many answers to question A. We also
+encourage answering the other questions, which we revisit at the end of the
+demo.
+
+```
+A. What does "modular code development" mean for you?
+B. What best practices can you recommend to arrive at well structured,
+   modular code in your favourite programming language?
+C. What do you know now about programming that you wish somebody had told you earlier?
+```
+
+
+## Additional questions
+
+```
+D. Do you design a new code project on paper before coding? Discuss pros
+   and cons.
+E. Do you build your code top-down (starting from the big picture) or bottom-up
+   (starting from components)? Discuss pros and cons.
+F. Would you prefer your code to be 2x slower if it was easier to read and understand?
+```