deploy: b97f719
bast committed Sep 27, 2023
0 parents commit dd8f66a
Showing 258 changed files with 91,574 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 1a64e17934d86b014144529848c02f44
tags: d77d1c0d9ca2f4c8421862c7c5a0d620
Empty file added .nojekyll
Empty file.
23 changes: 23 additions & 0 deletions _sources/encapsulation.md.txt
@@ -0,0 +1,23 @@
# Types of encapsulation

Let's say you want to move code from one system to another. What can go wrong?

- **In-language dependencies**, e.g. Python. Can they all be expressed *only* in
``requirements.txt``? Do you wrap everything in a container?
- **Paths**: Can you always use relative paths?
- **Operating-system (OS) dependencies, libraries, etc.**: Can you eliminate them?
- **Data files**: Are they few and well controlled, so that you can migrate them
deliberately? Or are you forced to copy the whole directory without regard to
what each file is, creating many duplicates and a big mess?
- Are you using something that is **OS-specific** (GNU/Linux vs BSD)?
- Do you use support programs that are only available on certain computers? The
fewer external utilities you use, the easier the code is to port.
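
The point about paths can be made concrete with a small sketch. It builds a data path from an explicit base directory instead of a hard-coded absolute path. The function name `data_path` and the `data/` directory layout are assumptions for illustration, not part of the lesson code.

```python
from pathlib import Path


def data_path(file_name, base_dir):
    """Return the path of a data file inside base_dir/data (hypothetical layout)."""
    return Path(base_dir) / "data" / file_name


# In a script, base_dir could be Path(__file__).resolve().parent, which does
# not depend on the directory from which the script is started.
example = data_path("temperatures.csv", base_dir=Path("project"))
print(example.as_posix())
```

Because the base directory is passed in, the same function works unchanged after the project is moved to another machine.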


## What needs to be global vs what needs to be local?

- Global data can be "seen"/accessed in the entire code.
- Local data is only available in the local vicinity of its definition.
- Try to have as little global data as possible.
- Global data are often input parameters, configuration parameters, command-line arguments.
- But try to localize these to the "main" code/function.
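
A minimal sketch of localizing configuration to the "main" function could look as follows. All names and values here (`parse_config`, `describe`, the dictionary keys) are hypothetical and only illustrate the idea.

```python
def parse_config():
    """Collect all configuration in one place (hypothetical values)."""
    return {"num_measurements": 25, "output_file": "25.png"}


def describe(num_measurements, output_file):
    """Depends only on its arguments, not on any global variable."""
    return f"plot first {num_measurements} measurements to {output_file}"


def main():
    config = parse_config()  # configuration is visible only inside main
    return describe(config["num_measurements"], config["output_file"])


print(main())
```

Only `main` knows where the configuration comes from; every other function receives what it needs as arguments.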
44 changes: 44 additions & 0 deletions _sources/index.rst.txt
@@ -0,0 +1,44 @@
Modular code development - Making reusing parts of your code easier
===================================================================

Type-along/demo where we discuss and experience aspects of (un)modular
code development. **We will focus on the "why", not on the "how"**.

Image to get started: `The curse of bad code design <https://doi.org/10.1371/journal.pcbi.1008549.g005>`_
(Ten simple rules for quick and dirty scientific programming. PLoS Comput Biol 17(3): e1008549. https://doi.org/10.1371/journal.pcbi.1008549)


Slides
------

We also have some slides:
https://github.com/coderefinery/modular-code-development. But here we will try
to do this as a collaborative type-along where instructors try to solve a problem
together and where learners guide the instructors through a collaborative
document.


.. toctree::
   :maxdepth: 1
   :caption: The lesson

   questions
   learning-outcomes
   lesson
   encapsulation


.. toctree::
   :maxdepth: 1
   :caption: A solution

   instructor-guide


.. toctree::
   :maxdepth: 1
   :caption: About

   All lessons <https://coderefinery.org/lessons/core/>
   CodeRefinery <https://coderefinery.org/>
   Reusing <https://coderefinery.org/lessons/reusing/>
186 changes: 186 additions & 0 deletions _sources/instructor-guide.md.txt
@@ -0,0 +1,186 @@
(guide)=

# Instructor guide (spoiler alert!)


## Before we start

We **don't have to follow this line by line** but it's important to study
this example well before demonstrating this.

Emphasize that the example is Python but we will try to see "through"
the code and **focus on the bigger picture** and hopefully manage to imagine
other languages in its place.

We **collect ideas and feedback in the collaborative document while coding** and the instructor
tries to react to that without going down a rabbit hole.

We recommend going through this together: the instructor(s) demonstrate(s)
and learners can comment, suggest, and ask questions. Either we are all in
the same video room or everybody is watching via stream. In other words, for
this lesson, **learners are not in separate breakout-rooms**.


## Checklist

- Start with notebook
- Generalize from 1 figure to 3 figures
- Abstract code into functions
- From functions with side-effects towards stateless functions
- Move from notebook to script
- Initialize git
- Add `requirements.txt`
- Add test
- **Add command line interface**
- Split into multiple files/modules


## Our initial version

We imagine that we assemble a working script from various StackOverflow
recommendations and arrive at:

```{literalinclude} code/initial-version.py
:language: python
```

- We test it out **in a notebook**.


## We add axis labels

This is not the best placement, but it works. It will bite us later (only the
first plot will have labels), and we will improve it then:

```{literalinclude} code/with-axis-labels.py
:language: python
:emphasize-lines: 4,5
```

Once we get this working for 25 measurements, our task changes to also
plot the first 100 and the first 500 measurements in two additional
plots.


## Plotting also 100 and 500 measurements

- Next idea is perhaps code duplication.
- Then a for-loop to iterate over `[25, 100, 500]`:

```{literalinclude} code/add-iteration.py
:language: python
:emphasize-lines: 7
```
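
The shape of such a loop can be sketched without any plotting library. The data below is a hypothetical stand-in for the contents of `temperatures.csv`, and the dictionary `means` is introduced only for this illustration.

```python
# Hypothetical stand-in data; the lesson reads temperatures.csv instead.
temperatures = [0.1 * i for i in range(600)]

means = {}
for num_measurements in [25, 100, 500]:
    # select the first num_measurements values
    selection = temperatures[:num_measurements]
    means[num_measurements] = sum(selection) / len(selection)
    # the plotting code for this selection would go here
    print(num_measurements, round(means[num_measurements], 2))
```

Compared with three copies of the same code, a fourth case now only requires adding one number to the list.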


## Abstracting the plotting part into a function

```{literalinclude} code/abstracting-plot.py
:language: python
:emphasize-lines: 8-13,26-30
```

- Discuss what we expect before running it (some will expect this not to work
because variables seem undefined).
- Then try it out (it actually works).
- Discuss problems with this solution (what if we copy-paste the function to a different file?).

The point of this step is that abstracting code into functions can be really
good for reusability, but merely creating a function does not make it reusable:
in this case the function depends on a variable defined outside of it, so it is
not self-contained and has side-effects.
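
The difference can be shown in a few lines. This is a minimal sketch with hypothetical function names (`title_implicit`, `title_explicit`), not the code from the lesson.

```python
num_measurements = 25  # defined outside the function


def title_implicit():
    # Works only if a variable named num_measurements exists in the
    # enclosing scope at call time; this breaks when copied elsewhere.
    return f"first {num_measurements} measurements"


def title_explicit(num_measurements):
    # Everything the function needs arrives through its arguments.
    return f"first {num_measurements} measurements"


print(title_implicit())
print(title_explicit(100))
```

Copying `title_implicit` into a file that does not define `num_measurements` raises a `NameError` when it is called, while `title_explicit` keeps working.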


## Small improvements

- Abstracting into more functions.
- Notice how the comments got redundant:

```{literalinclude} code/small-improvements.py
:language: python
:emphasize-lines: 27-35
```

Discuss what would happen if we copy-paste the functions to another project
(these functions are stateful/time-dependent).

Emphasize how stateful functions and the order of execution in Jupyter notebooks
can produce unexpected results, and explain why we recommend rerunning all cells
before sharing a notebook.
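
A small sketch of why execution order matters for stateful functions follows. The shared dictionary `state` and both function names are hypothetical and chosen only for this illustration.

```python
state = {"num_measurements": 25}


def set_num_measurements(n):
    state["num_measurements"] = n  # side effect: modifies shared state


def output_file_name():
    # The result depends on what happened before this call.
    return f"{state['num_measurements']}.png"


first = output_file_name()
set_num_measurements(500)  # imagine this notebook cell was run in between
second = output_file_name()
print(first, second)
```

The same call returns different results depending on which cells ran earlier, which is exactly what makes out-of-order notebook execution hard to reason about.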


## Towards functions without side-effects

Improve to more stateless functions:

```{literalinclude} code/towards-pure.py
:language: python
:emphasize-lines: 6,15,20
```

These functions can now be copy-pasted to a different notebook or project and
they will still work.
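
As a minimal sketch of what "stateless" means here: the functions below read nothing but their arguments and do not modify their input. `compute_mean` is the name used later in this guide. `select_first` and the data are assumptions for illustration.

```python
def compute_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def select_first(values, num_measurements):
    """Return a new list with the first num_measurements values."""
    return list(values[:num_measurements])


temperatures = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
subset = select_first(temperatures, 2)
print(compute_mean(subset))
```

Since neither function touches anything outside its arguments, calling them in any order, any number of times, gives the same results.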


## Move from notebook to script

Adding unit tests is often the moment when a notebook is no longer the right
fit.

But before we add tests:
- "File" -> "Save and Export Notebook As ..." -> "Executable Script"
- `git init` and commit the working version.
- Add `requirements.txt` and explain why it will be useful to have later.

As we continue from here, **create commits after meaningful changes** and later
also share the repository with learners. This nicely connects to other lessons
of the workshop.


## Unit tests

Design code for testing.

- Move the main scope code into a main function.
- Discuss where to add a test and add a test to the statistics function:

```{literalinclude} code/testing.py
:language: python
:emphasize-lines: 3,11,21-23
```
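
A test for the statistics function might look roughly like the sketch below. The body of `compute_mean` is an assumption. With pytest, a function named `test_...` would be collected automatically. The explicit call at the end only keeps the sketch runnable without a test runner.

```python
def compute_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def test_compute_mean():
    result = compute_mean([1.0, 2.0, 3.0, 4.0])
    # compare floating-point numbers with a tolerance, not with ==
    assert abs(result - 2.5) < 1.0e-6


test_compute_mean()
```

The test is easy to write only because the function takes its data as an argument and returns a value instead of plotting or printing it.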


## Command-line interface

- Add a CLI for the input data file, the number of measurements, and the output
file name.
- Example here is using [click](https://click.palletsprojects.com/) but it can
equally well be [optparse](https://docs.python.org/3/library/optparse.html),
[argparse](https://docs.python.org/3/library/argparse.html),
[docopt](http://docopt.org/), or [Typer](https://typer.tiangolo.com/).
- Discuss the motivations for adding a CLI:
  - We can modify the behavior without changing the code.
  - We can run many such scripts as part of a workflow.

```{literalinclude} code/cli.py
:language: python
:emphasize-lines: 4,31-37
```
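
For comparison, the same idea can be sketched with `argparse` from the standard library. The option names and default values below are assumptions for illustration and are not taken from the lesson's `cli.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Plot temperature measurements.")
    parser.add_argument("--num-measurements", type=int, default=25,
                        help="number of measurements to plot")
    parser.add_argument("--in-file", default="temperatures.csv",
                        help="input CSV file")
    parser.add_argument("--out-file", default="plot.png",
                        help="output file name for the plot")
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to try out and to test.
args = parse_args(["--num-measurements", "100", "--out-file", "100.png"])
print(args.num_measurements, args.in_file, args.out_file)
```

Keeping the parsing in its own function that accepts an argument list means the rest of the code never needs to look at `sys.argv`.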


## Split long script into modules

- Discuss how you would move some functions out and organize them into separate
modules which can be imported to other projects: For instance
`compute_mean` can be moved to `statistics.py`.
- Discuss naming.
- Discuss interface design.


## Summarize in the collaborative document

- Now return to the initial questions in the collaborative document and discuss questions and comments. If
there is time left, there are additional questions and exercises.
- It is easier and more fun to teach this as a pair: one person types while the
other watches the questions and comments and relays them to the co-instructor.
18 changes: 18 additions & 0 deletions _sources/learning-outcomes.md.txt
@@ -0,0 +1,18 @@
# Learning outcomes

- Know about **pure functions** (functions without side effects, which
given the same input always return the same output).
- Learn why and how to **limit side effects** of functions.
- Discuss why and how to limit side effects of data. Also discuss when
mutable data may be preferable.
- [The Zen of Python](https://www.python.org/dev/peps/pep-0020/)
- Discuss why **single-purpose functions** are often preferred over
multi-purpose functions.
- **Split-apply-combine**, which lets you more easily parallelize. Make your code
modular in a way that lets you split the steps and parallelize.
- Think about **global vs local** data structures. Separating them well is
not easy.
- Understand how a command line interface to a code can improve usability and also
make the code more versatile (to be combined with workflow management tools).
- Connect modular code development to the remaining lessons (version control, testing,
documentation, reusability).
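
The split-apply-combine outcome can be illustrated with a small sketch that computes a mean in chunks. The function names are hypothetical. Threads are used here only to keep the example short, and for CPU-bound work a process pool would be the more typical choice.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum_and_count(chunk):
    """Apply step: a pure function of one chunk."""
    return sum(chunk), len(chunk)


def mean_split_apply_combine(values, chunk_size):
    # split: cut the data into independent chunks
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # apply: process the chunks, possibly in parallel
    with ThreadPoolExecutor() as executor:
        partial = list(executor.map(chunk_sum_and_count, chunks))
    # combine: merge the partial results
    total = sum(s for s, _ in partial)
    count = sum(n for _, n in partial)
    return total / count


print(mean_split_apply_combine(list(range(10)), chunk_size=3))
```

The apply step can be parallelized safely because `chunk_sum_and_count` has no side effects and the chunks do not depend on each other.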
62 changes: 62 additions & 0 deletions _sources/lesson.md.txt
@@ -0,0 +1,62 @@
# Our task


## Data

The file [temperatures.csv](https://github.com/coderefinery/modular-type-along/blob/main/data/temperatures.csv)
contains hourly air temperature measurements for the time range November 1,
2019 12:00 AM - November 30, 2019 11:59 PM for the observation station "Vantaa
Helsinki-Vantaan lentoasema".

Data obtained from
<https://en.ilmatieteenlaitos.fi/download-observations#!/> on 2019-12-09.


## Our initial goal

Our initial goal for this exercise is to plot a series of temperatures
for **25 measurements** and to compute and plot the **arithmetic mean**. We
imagine that we assemble a working script from various StackOverflow
recommendations and arrive at:

```{literalinclude} code/initial-version.py
:language: python
```

This example is in Python but we will try to see "through" the code and
focus on the bigger picture and hopefully manage to imagine other
languages in its place. For the Python experts: we will not see the most
elegant Python.


## Further goals

- Once we get this working for **25 measurements**, our task changes to also
plot the **first 100** and the **first 500 measurements** in two additional
plots.
- Then we wish to generalize the code so that a user can compute and plot this
for **any number**, **without changing the code** (with a command line interface).


## How we plan to solve it

Before we attempt to do this, we discuss with workshop participants how
they would tackle this problem.

Together we improve the code based on suggestions from learners towards
more modularity and re-usability.

```{instructor-note}
Participants give suggestions and ask questions via collaborative document
and instructor(s) try to follow and answer. They can also roughly follow
the ideas and steps in the {ref}`guide`.

It is OK and good if mistakes happen and it is fun if the instructor(s) can
convey a bit of "improv" feel to this lesson.
```


## Additional exercises

Draw a call tree for one of your recent projects. Identify the
functions in your call tree which are "pure" (which have no side-effects).
24 changes: 24 additions & 0 deletions _sources/questions.md.txt
@@ -0,0 +1,24 @@
# Starting questions for the collaborative document

We share these questions in a common collaborative document and we
wait until we have sufficiently many answers to question A. But we also
encourage answering other questions which we revisit at the end of the
demo.

```
A. What does "modular code development" mean for you?
B. What best practices can you recommend to arrive at well structured,
modular code in your favourite programming language?
C. What do you know now about programming that you wish somebody had told you earlier?
```


## Additional questions

```
D. Do you design a new code project on paper before coding? Discuss pros
and cons.
E. Do you build your code top-down (starting from the big picture) or bottom-up
(starting from components)? Discuss pros and cons.
F. Would you prefer your code to be 2x slower if it was easier to read and understand?
```