Commit

Merge branch 'gh-pages' into auto-vars
gcapes authored Apr 13, 2023
2 parents 24d714c + 21cb0bb commit 7720cd2
Showing 3 changed files with 50 additions and 39 deletions.
50 changes: 31 additions & 19 deletions _episodes/01-intro.md
@@ -44,6 +44,8 @@ Our directory has the Python scripts and data files we will be working with:
{: .output}

The first step is to count the frequency of each word in a book.
For this purpose we will use a python script `countwords.py` which takes two command line arguments.
The first argument is the input file (`books/isles.txt`) and the second is the output file that is generated (here `isles.dat`) by processing the input.

~~~
$ python countwords.py books/isles.txt isles.dat
@@ -165,8 +167,9 @@ seconds.
The most common solution to the tedium of data processing is to write
a shell script that runs the whole pipeline from start to finish.

Using your text editor of choice (e.g. nano), add the following to a new file
named `run_pipeline.sh`.
So to reproduce the tasks that we have just done we create a new file
named `run_pipeline.sh` in which we place the commands one by one.
Using a text editor of your choice (e.g. for nano use the command `nano run_pipeline.sh`) copy and paste the following text and save it.

~~~
# USAGE: bash run_pipeline.sh
@@ -261,9 +264,7 @@ What we really want is an executable _description_ of our pipeline that
allows software to do the tricky part for us:
figuring out what steps need to be rerun.

Make was developed by
Stuart Feldman in 1977 as a Bell Labs summer intern, and remains in
widespread use today. Make can execute the commands needed to run our
For our pipeline Make can execute the commands needed to run our
analysis and plot our results. Like shell scripts it allows us to
execute complex sequences of commands via a single shell
command. Unlike shell scripts it explicitly records the dependencies
@@ -281,29 +282,39 @@ new files, for example:
* Compile source code into executable programs or libraries.

There are now many build tools available, for example [Apache
ANT][apache-ant], [doit][doit], and [nmake][nmake] for Windows. There
are also build tools that build scripts for use with these build tools
and others e.g. [GNU Autoconf][autoconf] and [CMake][cmake]. Which is
best for you depends on your requirements, intended usage, and
operating system. However, they all share the same fundamental
concepts as Make.
ANT][apache-ant], [doit][doit], and [nmake][nmake] for Windows.
Which is best for you depends on your requirements,
intended usage, and operating system. However, they all share the same
fundamental concepts as Make.

Also, you might come across build generation scripts e.g. [GNU
Autoconf][autoconf] and [CMake][cmake]. Those tools do not run the
pipelines directly, but rather generate files for use with the build
tools.


> ## Why Use Make if it is Almost 40 Years Old?
>
> Today, researchers working with legacy codes in C or FORTRAN, which
> are very common in high-performance computing, will, very likely
> Make development was started by Stuart Feldman in 1977 as a Bell
> Labs summer intern. Since then it has been undergoing an active
> development and several implementations are available. Since it
> solves a common issue of workflow management, it remains in
> widespread use even today.
>
> Researchers working with legacy codes in C or FORTRAN, which are
> very common in high-performance computing, will, very likely
> encounter Make.
>
> Researchers are also finding Make of use in implementing
> reproducible research workflows, automating data analysis and
> visualisation (using Python or R) and combining tables and plots
> with text to produce reports and papers for publication.
> Researchers can use Make for implementing reproducible
> research workflows, automating data analysis and visualisation
> (using Python or R) and combining tables and plots with text to
> produce reports and papers for publication.
>
> Make's fundamental concepts are common across build tools.
{: .callout}

[GNU Make][gnu-make] is a free, fast, well-documented, and very popular
Make implementation. From now on, we will focus on it, and when we say
[GNU Make][gnu-make] is a free-libre, fast, [well-documented](gnu-make-documentation),
and very popular Make implementation. From now on, we will focus on it, and when we say
Make, we mean GNU Make.

[autoconf]: http://www.gnu.org/software/autoconf/autoconf.html
@@ -313,3 +324,4 @@ Make, we mean GNU Make.
[gnu-make]: http://www.gnu.org/software/make/
[nmake]: https://docs.microsoft.com/en-us/cpp/build/reference/nmake-reference
[zipfs-law]: http://en.wikipedia.org/wiki/Zipf%27s_law
[gnu-make-documentation]: https://www.gnu.org/software/make/manual/html_node/index.html
11 changes: 5 additions & 6 deletions _episodes/02-makefiles.md
@@ -33,7 +33,7 @@ executed by Make. Note how it resembles one of the lines from our shell script.
Let us go through each line in turn:

* `#` denotes a *comment*. Any text from `#` to the end of the line is
ignored by Make.
ignored by Make but could be very helpful for anyone reading your Makefile.
* `isles.dat` is a [target]({{ page.root }}/reference.html#target), a file to be
created, or built.
* `books/isles.txt` is a [dependency]({{ page.root }}/reference.html#dependency), a
@@ -44,8 +44,7 @@ Let us go through each line in turn:
[action]({{ page.root }}/reference.html#action), a command to run to build or
update the target using the dependencies. Targets can have zero or more
actions. These actions form a recipe to build the target
from its dependencies and can be considered to be
a shell script.
from its dependencies and are executed similarly to a shell script.
* Actions are indented using a single TAB character, *not* 8 spaces. This
is a legacy of Make's 1970's origins. If the difference between
spaces and a TAB character isn’t obvious in your editor, try moving
@@ -360,7 +359,7 @@ Make will raise an error.
> depends on that target.
{: .callout}
This rule is also an example of a rule that has no actions. It is used
This rule (`dats`) is also an example of a rule that has no actions. It is used
purely to trigger the build of its dependencies, if needed.
If we run,
@@ -378,8 +377,8 @@ python countwords.py books/abyss.txt abyss.dat
~~~
{: .output}
If we run `dats` again, then Make will see that the dependencies (isles.dat
and abyss.dat) are already up to date.
If we run `make dats` again, then Make will see that the dependencies (`isles.dat`
and `abyss.dat`) are already up to date.
Given the target `dats` has no actions, there is `nothing to be done`:
~~~
$ make dats
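The `dats` rule touched in the hunk above can be sketched as a small Makefile. This is a minimal sketch that assumes the `isles.dat` and `abyss.dat` rules used in the episode. The `.PHONY` declaration is an addition that this diff does not show. It is GNU Make's standard way of marking a target that does not name a file. As the episode notes, each recipe line must start with a single TAB character.

~~~
# Build each .dat file from its book using countwords.py.
isles.dat : books/isles.txt
	python countwords.py books/isles.txt isles.dat

abyss.dat : books/abyss.txt
	python countwords.py books/abyss.txt abyss.dat

# A rule with no actions: it only triggers the build of its
# dependencies. The .PHONY line (an assumption, not part of this diff)
# tells Make that no file called "dats" is expected to exist.
.PHONY : dats
dats : isles.dat abyss.dat
~~~
{: .language-make}

A second `make dats` finds both `.dat` files newer than their books, so it should report that there is nothing to be done.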
28 changes: 14 additions & 14 deletions _episodes/04-dependencies.md
@@ -39,10 +39,10 @@ clean :
~~~
{: .language-make}

Our data files are a product not only of our text files but the
script, `countwords.py`, that processes the text files and creates the
data files. A change to `countwords.py` (e.g. to add a new column of
summary data or remove an existing one) results in changes to the
Our data files are produced using not only the input text files but also the
script `countwords.py` that processes the text files and creates the
data files. A change to `countwords.py` (e.g. adding a new column of
summary data or removing an existing one) results in changes to the
`.dat` files it outputs. So, let's pretend to edit `countwords.py`,
using `touch`, and re-run Make:

@@ -103,24 +103,24 @@ python countwords.py books/last.txt last.dat
>
{: .callout}
The following figure shows the dependencies embodied within our
Makefile, involved in building the `results.txt` target, after adding
`countwords.py` and `testzipf.py` as dependencies to their respective target
files (i.e. how the Makefile should look after completing the rest of the
exercises in this episode).
The following figure shows a graph of the dependencies, that are
involved in building the target `results.txt`. Notice the recently
added dependencies `countwords.py` and `testzipf.py`. This is how the
Makefile should look after completing the rest of the exercises
in this episode.
![results.txt dependencies after adding countwords.py and testzipf.py as dependencies](../fig/04-dependencies.png "results.txt dependencies after adding countwords.py and testzipf.py as dependencies")
> ## Why Don't the `.txt` Files Depend on `countwords.py`?
>
> `.txt` files are input files and have no dependencies. To make these
> `.txt` files are input files and as such have no dependencies. To make these
> depend on `countwords.py` would introduce a [false
> dependency]({{ page.root }}/reference.html#false-dependency).
> dependency]({{ page.root }}/reference.html#false-dependency) which is not desirable.
{: .callout}
Intuitively, we should also add `countwords.py` as dependency for
`results.txt`, as the final table should be rebuilt as we remake the
`.dat` files. However, it turns out we don't have to! Let's see what
Intuitively, we should also add `countwords.py` as a dependency for
`results.txt`, because the final table should be rebuilt if we remake the
`.dat` files. However, it turns out we don't have to do that! Let's see what
happens to `results.txt` when we update `countwords.py`:
~~~
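The dependency change discussed in `04-dependencies.md` can be sketched as follows. This sketch assumes the three books `isles`, `abyss` and `last` used in the episode. The `testzipf.py` command in the `results.txt` recipe is an assumed invocation for illustration, because the exact command does not appear in this diff.

~~~
# countwords.py is a dependency of every .dat file, so editing it
# (or running `touch countwords.py`) causes the .dat files to be rebuilt.
isles.dat : books/isles.txt countwords.py
	python countwords.py books/isles.txt isles.dat

abyss.dat : books/abyss.txt countwords.py
	python countwords.py books/abyss.txt abyss.dat

last.dat : books/last.txt countwords.py
	python countwords.py books/last.txt last.dat

# results.txt depends on the .dat files and on testzipf.py, but not
# directly on countwords.py. The recipe below is an assumed invocation.
results.txt : isles.dat abyss.dat last.dat testzipf.py
	python testzipf.py abyss.dat isles.dat last.dat > results.txt
~~~
{: .language-make}

Touching `countwords.py` rebuilds the `.dat` files, which are then newer than `results.txt`, so Make rebuilds `results.txt` as well. The `.txt` books have no rules, which avoids the false dependency mentioned in the callout.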