diff --git a/docs/03-working-with-files.md b/docs/03-working-with-files.md index 5d62d29..942f280 100644 --- a/docs/03-working-with-files.md +++ b/docs/03-working-with-files.md @@ -15,7 +15,7 @@ - How can I control who has permission to modify a file? - How can I repeat recently used commands? -:::::::::::::::::::::::::::::::::::::::::::::::::: + ## Working with Files @@ -27,37 +27,39 @@ have two results files, which are stored in our `untrimmed_fastq` directory. ### Wildcards -Navigate to your `untrimmed_fastq` directory: +!!! terminal-2 "Navigate to your `untrimmed_fastq` directory:" -```bash -$ cd ~/obss_2023/commandline/shell_data/untrimmed_fastq -``` + ```bash + $ cd ~/shell_data/untrimmed_fastq + ``` -We are interested in looking at the FASTQ files in this directory. We can list -all files with the .fastq extension using the command: + We are interested in looking at the FASTQ files in this directory. We can list + all files with the .fastq extension using the command: -```bash -$ ls *.fastq -``` + ```bash + $ ls *.fastq + ``` -```output -SRR097977.fastq SRR098026.fastq -``` + ```output + SRR097977.fastq SRR098026.fastq + ``` The `*` character is a special type of character called a wildcard, which can be used to represent any number of any type of character. Thus, `*.fastq` matches every file that ends with `.fastq`. This command: -```bash -$ ls *977.fastq -``` +!!! terminal "code" -```output -SRR097977.fastq -``` + ```bash + $ ls *977.fastq + ``` -lists only the file that ends with `977.fastq`. + ```output + SRR097977.fastq + ``` + + lists only the file that ends with `977.fastq`. This command: @@ -75,82 +77,72 @@ Lists every file in `/usr/bin` that ends in the characters `.sh`. Note that the output displays **full** paths to files, since each result starts with `/`. -::::::::::::::::::::::::::::::::::::::: challenge - -## Exercise - -Do each of the following tasks from your current directory using a single -`ls` command for each: - -1. 
List all of the files in `/usr/bin` that start with the letter 'c'. -2. List all of the files in `/usr/bin` that contain the letter 'a'. -3. List all of the files in `/usr/bin` that end with the letter 'o'. - -Bonus: List all of the files in `/usr/bin` that contain the letter 'a' or the -letter 'c'. - -Hint: The bonus question requires a Unix wildcard that we haven't talked about -yet. Try searching the internet for information about Unix wildcards to find -what you need to solve the bonus problem. - -::::::::::::::: solution - -## Solution - -1. `ls /usr/bin/c*` -2. `ls /usr/bin/*a*` -3. `ls /usr/bin/*o` - Bonus: `ls /usr/bin/*[ac]*` - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::::: challenge - -## Exercise - -`echo` is a built-in shell command that writes its arguments, like a line of text to standard output. -The `echo` command can also be used with pattern matching characters, such as wildcard characters. -Here we will use the `echo` command to see how the wildcard character is interpreted by the shell. - -```bash -$ echo *.fastq -``` - -```output -SRR097977.fastq SRR098026.fastq -``` - -The `*` is expanded to include any file that ends with `.fastq`. We can see that the output of -`echo *.fastq` is the same as that of `ls *.fastq`. - -What would the output look like if the wildcard could _not_ be matched? Compare the outputs of -`echo *.missing` and `ls *.missing`. - -::::::::::::::: solution + + +!!! dumbbell "Exercise" + + Do each of the following tasks from your current directory using a single + `ls` command for each: + + 1. List all of the files in `/usr/bin` that start with the letter 'c'. + 2. List all of the files in `/usr/bin` that contain the letter 'a'. + 3. List all of the files in `/usr/bin` that end with the letter 'o'. + + Bonus: List all of the files in `/usr/bin` that contain the letter 'a' or the + letter 'c'. 
+ + Hint: The bonus question requires a Unix wildcard that we haven't talked about + yet. Try searching the internet for information about Unix wildcards to find + what you need to solve the bonus problem. + + + + ??? success "Solution" + + 1. `ls /usr/bin/c*` + 2. `ls /usr/bin/*a*` + 3. `ls /usr/bin/*o` + Bonus: `ls /usr/bin/*[ac]*` + + +!!! dumbbell "Exercise" + + `echo` is a built-in shell command that writes its arguments, like a line of text to standard output. + The `echo` command can also be used with pattern matching characters, such as wildcard characters. + Here we will use the `echo` command to see how the wildcard character is interpreted by the shell. + + ```bash + $ echo *.fastq + ``` + + ```output + SRR097977.fastq SRR098026.fastq + ``` + + The `*` is expanded to include any file that ends with `.fastq`. We can see that the output of + `echo *.fastq` is the same as that of `ls *.fastq`. + + What would the output look like if the wildcard could _not_ be matched? Compare the outputs of + `echo *.missing` and `ls *.missing`. + + ??? success "Solution" + + ```bash + $ echo *.missing + ``` + + ```output + *.missing + ``` + + ```bash + $ ls *.missing + ``` + + ```output + ls: cannot access '*.missing': No such file or directory + ``` -## Solution - -```bash -$ echo *.missing -``` - -```output -*.missing -``` - -```bash -$ ls *.missing -``` - -```output -ls: cannot access '*.missing': No such file or directory -``` - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::::::::::::: ## Command History @@ -166,50 +158,43 @@ A few more useful shortcuts: is very useful. - Ctrl\+L or the `clear` command will clear your screen. -You can also review your recent commands with the `history` command, by entering: +!!! terminal-2 "You can also review your recent commands with the `history` command, by entering:" -```bash -$ history -``` + ```bash + $ history + ``` to see a numbered list of recent commands.
You can reuse one of these commands directly by referring to the number of that command. -For example, if your history looked like this: +!!! terminal-2 "For example, if your history looked like this:" -```output -259 ls * -260 ls /usr/bin/*.sh -261 ls *R1*fastq -``` + ```output + 259 ls * + 260 ls /usr/bin/*.sh + 261 ls *R1*fastq + ``` -then you could repeat command #260 by entering: +!!! terminal-2 "then you could repeat command #260 by entering:" -```bash -$ !260 -``` + ```bash + $ !260 + ``` Type `!` (exclamation point) and then the number of the command from your history. You will be glad you learned this when you need to re-run very complicated commands. For more information on advanced usage of `history`, read section 9.3 of [Bash manual](https://www.gnu.org/software/bash/manual/html_node/index.html). -::::::::::::::::::::::::::::::::::::::: challenge -## Exercise +!!! dumbbell "Exercise" -Find the line number in your history for the command that listed all the .sh -files in `/usr/bin`. Rerun that command. + Find the line number in your history for the command that listed all the .sh + files in `/usr/bin`. Rerun that command. -::::::::::::::: solution + ??? success "Solution" -## Solution - -First type `history`. Then use `!` followed by the line number to rerun that command. - -::::::::::::::::::::::::: - -:::::::::::::::::::::::::::::::::::::::::::::::::: + First type `history`. Then use `!` followed by the line number to rerun that command. ## Examining Files @@ -221,42 +206,41 @@ contents using the program `cat`. Enter the following command from within the `untrimmed_fastq` directory: -```bash -$ cat SRR098026.fastq -``` - -This will print out all of the contents of the `SRR098026.fastq` to the screen. +!!! terminal "code" -::::::::::::::::::::::::::::::::::::::: challenge + ```bash + $ cat SRR098026.fastq + ``` -## Exercise +This will print out all of the contents of the `SRR098026.fastq` to the screen. -1.
Print out the contents of the `~/obss_2023/commandline/shell_data/untrimmed_fastq/SRR097977.fastq` file. What is the last line of the file? -2. From your home directory, and without changing directories, - use one short command to print the contents of all of the files in - the `~/obss_2023/commandline/shell_data/untrimmed_fastq` directory. -::::::::::::::: solution -## Solution +!!! dumbbell "Exercise" -1. The last line of the file is `C:CCC::CCCCCCCC<8?6A:C28C<608'&&&,'$`. -2. `cat ~/obss_2023/commandline/shell_data/untrimmed_fastq/*` + 1. Print out the contents of the `~/shell_data/untrimmed_fastq/SRR097977.fastq` file. What is the last line of the file? + 2. From your home directory, and without changing directories, + use one short command to print the contents of all of the files in + the `~/shell_data/untrimmed_fastq` directory. -::::::::::::::::::::::::: + + + ??? success "Solution" + + 1. The last line of the file is `C:CCC::CCCCCCCC<8?6A:C28C<608'&&&,'$`. + 2. `cat ~/shell_data/untrimmed_fastq/*` -:::::::::::::::::::::::::::::::::::::::::::::::::: `cat` is a terrific program, but when the file is really big, it can be annoying to use. The program, `less`, is useful for this case. `less` opens the file as read only, and lets you navigate through it. The navigation commands are identical to the `man` program. -Enter the following command: +!!! terminal-2 "Enter the following command:" -```bash -$ less SRR097977.fastq -``` + ```bash + $ less SRR097977.fastq + ``` Some navigation commands in `less`: @@ -287,21 +271,16 @@ and where it is in the file. If you continue to type `/` and hit return, you wil forward to the next instance of this sequence motif. If you instead type `?` and hit return, you will search backwards and move up the file to previous examples of this motif. -::::::::::::::::::::::::::::::::::::::: challenge - -## Exercise - -What are the next three nucleotides (characters) after the first instance of the sequence quoted above? 
-::::::::::::::: solution +!!! dumbbell "Exercise" -## Solution + What are the next three nucleotides (characters) after the first instance of the sequence quoted above? -`CAC` -::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::::: + ??? success "Solution" + + `CAC` Remember, the `man` program actually uses `less` internally and therefore uses the same commands, so you can search documentation @@ -314,58 +293,59 @@ to see the beginning or end of the file, or see how it's formatted. The commands are `head` and `tail` and they let you look at the beginning and end of a file, respectively. -```bash -$ head SRR098026.fastq -``` - -```output -@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 -NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN -+SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 -!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!! -@SRR098026.2 HWUSI-EAS1599_1:2:1:0:312 length=35 -NNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNN -+SRR098026.2 HWUSI-EAS1599_1:2:1:0:312 length=35 -!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!! -@SRR098026.3 HWUSI-EAS1599_1:2:1:0:570 length=35 -NNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNN -``` - -```bash -$ tail SRR098026.fastq -``` - -```output -+SRR098026.247 HWUSI-EAS1599_1:2:1:2:1311 length=35 -#!##!#################!!!!!!!###### -@SRR098026.248 HWUSI-EAS1599_1:2:1:2:118 length=35 -GNTGNGGTCATCATACGCGCCCNNNNNNNGGCATG -+SRR098026.248 HWUSI-EAS1599_1:2:1:2:118 length=35 -B!;?!A=5922:##########!!!!!!!###### -@SRR098026.249 HWUSI-EAS1599_1:2:1:2:1057 length=35 -CNCTNTATGCGTACGGCAGTGANNNNNNNGGAGAT -+SRR098026.249 HWUSI-EAS1599_1:2:1:2:1057 length=35 -A!@B!BBB@ABAB#########!!!!!!!###### -``` - -The `-n` option to either of these commands can be used to print the -first or last `n` lines of a file. - -```bash -$ head -n 1 SRR098026.fastq -``` - -```output -@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 -``` - -```bash -$ tail -n 1 SRR098026.fastq -``` - -```output -A!@B!BBB@ABAB#########!!!!!!!###### -``` +!!! 
terminal "code" + + ```bash + $ head SRR098026.fastq + ``` + + ```output + @SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 + NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN + +SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 + !!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!! + @SRR098026.2 HWUSI-EAS1599_1:2:1:0:312 length=35 + NNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNN + +SRR098026.2 HWUSI-EAS1599_1:2:1:0:312 length=35 + !!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!! + @SRR098026.3 HWUSI-EAS1599_1:2:1:0:570 length=35 + NNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNN + ``` + + ```bash + $ tail SRR098026.fastq + ``` + + ```output + +SRR098026.247 HWUSI-EAS1599_1:2:1:2:1311 length=35 + #!##!#################!!!!!!!###### + @SRR098026.248 HWUSI-EAS1599_1:2:1:2:118 length=35 + GNTGNGGTCATCATACGCGCCCNNNNNNNGGCATG + +SRR098026.248 HWUSI-EAS1599_1:2:1:2:118 length=35 + B!;?!A=5922:##########!!!!!!!###### + @SRR098026.249 HWUSI-EAS1599_1:2:1:2:1057 length=35 + CNCTNTATGCGTACGGCAGTGANNNNNNNGGAGAT + +SRR098026.249 HWUSI-EAS1599_1:2:1:2:1057 length=35 + A!@B!BBB@ABAB#########!!!!!!!###### + ``` + +!!! terminal-2 "The `-n` option to either of these commands can be used to print the first or last `n` lines of a file." + + ```bash + $ head -n 1 SRR098026.fastq + ``` + + ```output + @SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 + ``` + + ```bash + $ tail -n 1 SRR098026.fastq + ``` + + ```output + A!@B!BBB@ABAB#########!!!!!!!###### + ``` ## Details on the FASTQ format @@ -383,16 +363,18 @@ include... We can view the first complete read in one of the files in our dataset by using `head` to look at the first four lines. -```bash -$ head -n 4 SRR098026.fastq -``` - -```output -@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 -NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN -+SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 -!!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!! -``` +!!! 
terminal "code" + + ```bash + $ head -n 4 SRR098026.fastq + ``` + + ```output + @SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 + NNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNN + +SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35 + !!!!!!!!!!!!!!!!#!!!!!!!!!!!!!!!!!! + ``` All but one of the nucleotides in this read are unknown (`N`). This is a pretty bad read! Line 4 shows the quality for each nucleotide in the read. We'll cover the Fastq format more in depth tomorrow in when we look at [assessing read quality](https://otagobioinformaticsspringschool.github.io/wrangling-genomics-nesi/02-quality-control.html) in the DNA variant calling workshop. @@ -413,54 +395,53 @@ and change the file permissions so that we can read from, but not write to, the First, let's make a copy of one of our FASTQ files using the `cp` command. -Navigate to the `~/obss_2023/commandline/shell_data/untrimmed_fastq` directory and enter: +!!! terminal-2 "Navigate to the `~/shell_data/untrimmed_fastq` directory and enter:" -```bash -$ cp SRR098026.fastq SRR098026-copy.fastq -$ ls -F -``` + ```bash + $ cp SRR098026.fastq SRR098026-copy.fastq + $ ls -F + ``` -```output -SRR097977.fastq SRR098026-copy.fastq SRR098026.fastq -``` + ```output + SRR097977.fastq SRR098026-copy.fastq SRR098026.fastq + ``` We now have two copies of the `SRR098026.fastq` file, one of them named `SRR098026-copy.fastq`. We'll move this file to a new directory called `backup` where we'll store our backup data files. ### Creating Directories -The `mkdir` command is used to make a directory. Enter `mkdir` -followed by a space, then the directory name you want to create: +!!! terminal-2 "The `mkdir` command is used to make a directory. Enter `mkdir` followed by a space, then the directory name you want to create:" -```bash -$ mkdir backup -``` + ```bash + $ mkdir backup + ``` ### Moving / Renaming -We can now move our backup file to this directory. 
We can -move files around using the command `mv`: - -```bash -$ mv SRR098026-copy.fastq backup -$ ls backup -``` +!!! terminal-2 "We can now move our backup file to this directory. We can move files around using the command `mv`:" -```output -SRR098026-copy.fastq -``` -The `mv` command is also how you rename files. Let's rename this file to make it clear that this is a backup: + ```bash + $ mv SRR098026-copy.fastq backup + $ ls backup + ``` + + ```output + SRR098026-copy.fastq + ``` -```bash -$ cd backup -$ mv SRR098026-copy.fastq SRR098026-backup.fastq -$ ls -``` +!!! terminal-2 "The `mv` command is also how you rename files. Let's rename this file to make it clear that this is a backup:" -```output -SRR098026-backup.fastq -``` + ```bash + $ cd backup + $ mv SRR098026-copy.fastq SRR098026-backup.fastq + $ ls + ``` + + ```output + SRR098026-backup.fastq + ``` ### File Permissions @@ -468,15 +449,15 @@ We've now made a backup copy of our file, but just because we have two copies, i overwrite both copies. To make sure we can't accidentally mess up this backup file, we're going to change the permissions on the file so that we're only allowed to read (i.e. view) the file, not write to it (i.e. make new changes). -View the current permissions on a file using the `-l` (long) flag for the `ls` command: +!!! terminal-2 "View the current permissions on a file using the `-l` (long) flag for the `ls` command:" -```bash -$ ls -l -``` + ```bash + $ ls -l + ``` -```output --rw-r--r-- 1 dcuser dcuser 43332 Nov 15 23:02 SRR098026-backup.fastq -``` + ```output + -rw-r--r-- 1 dcuser dcuser 43332 Nov 15 23:02 SRR098026-backup.fastq + ``` The first part of the output for the `-l` flag gives you information about the file's current permissions. There are ten slots in the permissions list. The first character in this list is related to file type, not permissions, so we'll ignore it for now.
The next three @@ -493,22 +474,26 @@ talk more about this in [a later lesson](05-writing-scripts.md)). Our goal for now is to change permissions on this file so that you no longer have `w` or write permissions. We can do this using the `chmod` (change mode) command and subtracting (`-`) the write permission `-w`. -```bash -$ chmod -w SRR098026-backup.fastq -$ ls -l -``` +!!! terminal "code" -```output --r--r--r-- 1 dcuser dcuser 43332 Nov 15 23:02 SRR098026-backup.fastq -``` + ```bash + $ chmod -w SRR098026-backup.fastq + $ ls -l + ``` + + ```output + -r--r--r-- 1 dcuser dcuser 43332 Nov 15 23:02 SRR098026-backup.fastq + ``` ### Removing To prove to ourselves that you no longer have the ability to modify this file, try deleting it with the `rm` command: -```bash -$ rm SRR098026-backup.fastq -``` +!!! terminal "code" + + ```bash + $ rm SRR098026-backup.fastq + ``` You'll be asked if you want to override your file permissions: @@ -526,53 +511,46 @@ By default, `rm` will not delete directories. You can tell `rm` to delete a directory using the `-r` (recursive) option. Let's delete the backup directory we just made. -Enter the following command: +!!! terminal-2 "Enter the following command:" -```bash -$ cd .. -$ rm -r backup -``` - -This will delete not only the directory, but all files within the directory. If you have write-protected files in the directory, -you will be asked whether you want to override your permission settings. - -::::::::::::::::::::::::::::::::::::::: challenge + ```bash + $ cd .. + $ rm -r backup + ``` -## Exercise + This will delete not only the directory, but all files within the directory. If you have write-protected files in the directory, you will be asked whether you want to override your permission settings. -Starting in the `~/obss_2023/commandline/shell_data/untrimmed_fastq/` directory, do the following: -1. Make sure that you have deleted your backup directory and all files it contains. -2. 
Create a backup of each of your FASTQ files using `cp`. (Note: You'll need to do this individually for each of the two FASTQ files. We haven't - learned yet how to do this - with a wildcard.) -3. Use a wildcard to move all of your backup files to a new backup directory. -4. Change the permissions on all of your backup files to be write-protected. -::::::::::::::: solution +!!! dumbbell "Exercise" -## Solution - -1. `rm -r backup` -2. `cp SRR098026.fastq SRR098026-backup.fastq` and `cp SRR097977.fastq SRR097977-backup.fastq` -3. `mkdir backup` and `mv *-backup.fastq backup` -4. `chmod -w backup/*-backup.fastq` - It's always a good idea to check your work with `ls -l backup`. You should see something like: - -```output --r--r--r-- 1 dcuser dcuser 47552 Nov 15 23:06 SRR097977-backup.fastq --r--r--r-- 1 dcuser dcuser 43332 Nov 15 23:06 SRR098026-backup.fastq -``` + Starting in the `~/shell_data/untrimmed_fastq/` directory, do the following: + + 1. Make sure that you have deleted your backup directory and all files it contains. + 2. Create a backup of each of your FASTQ files using `cp`. (Note: You'll need to do this individually for each of the two FASTQ files. We haven't + learned yet how to do this + with a wildcard.) + 3. Use a wildcard to move all of your backup files to a new backup directory. + 4. Change the permissions on all of your backup files to be write-protected. + -::::::::::::::::::::::::: + ??? success "Solution" -:::::::::::::::::::::::::::::::::::::::::::::::::: + 1. `rm -r backup` + 2. `cp SRR098026.fastq SRR098026-backup.fastq` and `cp SRR097977.fastq SRR097977-backup.fastq` + 3. `mkdir backup` and `mv *-backup.fastq backup` + 4. `chmod -w backup/*-backup.fastq` + It's always a good idea to check your work with `ls -l backup`. 
You should see something like: + + ```output + -r--r--r-- 1 dcuser dcuser 47552 Nov 15 23:06 SRR097977-backup.fastq + -r--r--r-- 1 dcuser dcuser 43332 Nov 15 23:06 SRR098026-backup.fastq + ``` -:::::::::::::::::::::::::::::::::::::::: keypoints +!!! graduation-cap "keypoints" -- You can view file contents using `less`, `cat`, `head` or `tail`. -- The commands `cp`, `mv`, and `mkdir` are useful for manipulating existing files and creating new directories. -- You can view file permissions using `ls -l` and change permissions using `chmod`. -- The `history` command and the up arrow on your keyboard can be used to repeat recently used commands. + - You can view file contents using `less`, `cat`, `head` or `tail`. + - The commands `cp`, `mv`, and `mkdir` are useful for manipulating existing files and creating new directories. + - You can view file permissions using `ls -l` and change permissions using `chmod`. + - The `history` command and the up arrow on your keyboard can be used to repeat recently used commands. -::::::::::::::::::::::::::::::::::::::::::::::::::