diff --git a/_episodes/01-intro.md b/_episodes/01-intro.md
index d0799768..fba089e5 100644
--- a/_episodes/01-intro.md
+++ b/_episodes/01-intro.md
@@ -44,6 +44,8 @@ Our directory has the Python scripts and data files we will be working with:
 {: .output}

 The first step is to count the frequency of each word in a book.
+For this purpose we will use the Python script `countwords.py`, which takes two command-line arguments.
+The first argument is the input file (`books/isles.txt`) and the second is the output file (here `isles.dat`) generated by processing the input.

 ~~~
 $ python countwords.py books/isles.txt isles.dat
@@ -165,8 +167,9 @@ seconds.
 The most common solution to the tedium of data processing is to write
 a shell script that runs the whole pipeline from start to finish.

-Using your text editor of choice (e.g. nano), add the following to a new file
-named `run_pipeline.sh`.
+To reproduce the steps we have just run, we create a new file
+named `run_pipeline.sh` that contains the same commands, one after another.
+Using a text editor of your choice (e.g. for nano, run `nano run_pipeline.sh`), copy and paste the following text and save the file.

 ~~~
 # USAGE: bash run_pipeline.sh
@@ -261,9 +264,7 @@ What we really want is an executable _description_ of our pipeline that
 allows software to do the tricky part for us: figuring out what steps need
 to be rerun.

-Make was developed by
-Stuart Feldman in 1977 as a Bell Labs summer intern, and remains in
-widespread use today. Make can execute the commands needed to run our
+For our pipeline, Make can execute the commands needed to run our
 analysis and plot our results. Like shell scripts it allows us to
 execute complex sequences of commands via a single shell command.
 Unlike shell scripts it explicitly records the dependencies
@@ -281,29 +282,39 @@ new files, for example:
 * Compile source code into executable programs or libraries.

 There are now many build tools available, for example [Apache
-ANT][apache-ant], [doit][doit], and [nmake][nmake] for Windows. There
-are also build tools that build scripts for use with these build tools
-and others e.g. [GNU Autoconf][autoconf] and [CMake][cmake]. Which is
-best for you depends on your requirements, intended usage, and
-operating system. However, they all share the same fundamental
-concepts as Make.
+ANT][apache-ant], [doit][doit], and [nmake][nmake] for Windows.
+Which is best for you depends on your requirements,
+intended usage, and operating system. However, they all share the same
+fundamental concepts as Make.
+
+You might also come across build-system generators, e.g. [GNU
+Autoconf][autoconf] and [CMake][cmake]. These tools do not run
+pipelines themselves; instead, they generate the files that build
+tools such as Make then use.
+

 > ## Why Use Make if it is Almost 40 Years Old?
 >
-> Today, researchers working with legacy codes in C or FORTRAN, which
-> are very common in high-performance computing, will, very likely
+> Stuart Feldman started developing Make in 1977 as a Bell Labs
+> summer intern. Since then it has been actively developed, and
+> several implementations are now available. Because it solves the
+> common problem of workflow management, it remains in widespread
+> use today.
+>
+> Researchers working with legacy codes in C or FORTRAN, which are
+> very common in high-performance computing, will very likely
 > encounter Make.
 >
-> Researchers are also finding Make of use in implementing
-> reproducible research workflows, automating data analysis and
-> visualisation (using Python or R) and combining tables and plots
-> with text to produce reports and papers for publication.
+> Researchers can also use Make to implement reproducible
+> research workflows, automate data analysis and visualisation
+> (using Python or R), and combine tables and plots with text to
+> produce reports and papers for publication.
 >
 > Make's fundamental concepts are common across build tools.
 {: .callout}

-[GNU Make][gnu-make] is a free, fast, well-documented, and very popular
-Make implementation. From now on, we will focus on it, and when we say
+[GNU Make][gnu-make] is a free (libre), fast, [well-documented][gnu-make-documentation],
+and very popular Make implementation. From now on, we will focus on it, and when we say
 Make, we mean GNU Make.

 [autoconf]: http://www.gnu.org/software/autoconf/autoconf.html
@@ -313,3 +324,4 @@ Make, we mean GNU Make.
 [gnu-make]: http://www.gnu.org/software/make/
 [nmake]: https://docs.microsoft.com/en-us/cpp/build/reference/nmake-reference
 [zipfs-law]: http://en.wikipedia.org/wiki/Zipf%27s_law
+[gnu-make-documentation]: https://www.gnu.org/software/make/manual/html_node/index.html
diff --git a/_episodes/02-makefiles.md b/_episodes/02-makefiles.md
index 657942a0..8402cccf 100644
--- a/_episodes/02-makefiles.md
+++ b/_episodes/02-makefiles.md
@@ -33,7 +33,7 @@ executed by Make. Note how it resembles one of the lines from our shell script.
 Let us go through each line in turn:

 * `#` denotes a *comment*. Any text from `#` to the end of the line is
-  ignored by Make.
+  ignored by Make, but comments can be very helpful to anyone reading your Makefile.
 * `isles.dat` is a [target]({{ page.root }}/reference.html#target), a file to be
   created, or built.
 * `books/isles.txt` is a [dependency]({{ page.root }}/reference.html#dependency), a
@@ -44,8 +44,7 @@ Let us go through each line in turn:
   [action]({{ page.root }}/reference.html#action), a command to run to build or
   update the target using the dependencies. Targets can have zero or more
   actions. These actions form a recipe to build the target
-  from its dependencies and can be considered to be
-  a shell script.
+  from its dependencies and are executed by the shell, much like a shell script.
 * Actions are indented using a single TAB character, *not* 8 spaces. This
   is a legacy of Make's 1970's origins.
   If the difference between spaces and a TAB character isn’t obvious in your editor, try moving
@@ -360,7 +359,7 @@ Make will raise an error.
 > depends on that target.
 {: .callout}

-This rule is also an example of a rule that has no actions. It is used
+This rule (`dats`) is also an example of a rule that has no actions. It is used
 purely to trigger the build of its dependencies, if needed.

 If we run,
@@ -378,8 +377,8 @@ python countwords.py books/abyss.txt abyss.dat
 ~~~
 {: .output}

-If we run `dats` again, then Make will see that the dependencies (isles.dat
-and abyss.dat) are already up to date.
+If we run `make dats` again, then Make will see that the dependencies (`isles.dat`
+and `abyss.dat`) are already up to date.
 Given the target `dats` has no actions, there is `nothing to be done`:
 ~~~
 $ make dats
diff --git a/_episodes/04-dependencies.md b/_episodes/04-dependencies.md
index 8764b8e0..6de48c5c 100644
--- a/_episodes/04-dependencies.md
+++ b/_episodes/04-dependencies.md
@@ -39,10 +39,10 @@ clean :
 ~~~
 {: .language-make}

-Our data files are a product not only of our text files but the
-script, `countwords.py`, that processes the text files and creates the
-data files. A change to `countwords.py` (e.g. to add a new column of
-summary data or remove an existing one) results in changes to the
+Our data files are produced using not only the input text files but also the
+script `countwords.py` that processes the text files and creates the
+data files. A change to `countwords.py` (e.g. adding a new column of
+summary data or removing an existing one) results in changes to the
 `.dat` files it outputs.

 So, let's pretend to edit `countwords.py`, using `touch`, and re-run Make:
@@ -103,24 +103,24 @@ python countwords.py books/last.txt last.dat
 >
 {: .callout}

-The following figure shows the dependencies embodied within our
-Makefile, involved in building the `results.txt` target, after adding
-`countwords.py` and `testzipf.py` as dependencies to their respective target
-files (i.e. how the Makefile should look after completing the rest of the
-exercises in this episode).
+The following figure shows a graph of the dependencies involved in
+building the target `results.txt`. Notice the newly added
+dependencies `countwords.py` and `testzipf.py`. The graph reflects
+how the Makefile should look after you complete the rest of the
+exercises in this episode.

 ![results.txt dependencies after adding countwords.py and testzipf.py as dependencies](../fig/04-dependencies.png "results.txt dependencies after adding countwords.py and testzipf.py as dependencies")

 > ## Why Don't the `.txt` Files Depend on `countwords.py`?
 >
-> `.txt` files are input files and have no dependencies. To make these
-> depend on `countwords.py` would introduce a [false
-> dependency]({{ page.root }}/reference.html#false-dependency).
+> `.txt` files are input files and as such have no dependencies. Making these
+> depend on `countwords.py` would introduce a [false
+> dependency]({{ page.root }}/reference.html#false-dependency), which we want to avoid.
 {: .callout}

-Intuitively, we should also add `countwords.py` as dependency for
-`results.txt`, as the final table should be rebuilt as we remake the
-`.dat` files. However, it turns out we don't have to! Let's see what
+Intuitively, we should also add `countwords.py` as a dependency for
+`results.txt`, because the final table should be rebuilt if we remake the
+`.dat` files. However, it turns out we don't have to do that! Let's see what
 happens to `results.txt` when we update `countwords.py`:

 ~~~