Rework filtering options

cestef · Nov 22, 2023 · 7d99eb5
1 parent fccddb1
commit 7d99eb5

Showing 5 changed files with 268 additions and 67 deletions.
diff --git a/README.md b/README.md
@@ -90,16 +90,68 @@ Transformations:
   -S, --transform-suffix <SUFFIX>  Append a suffix to each word
   -C, --transform-capitalize       Capitalize each word
 
-Filtering:
-  -F, --filter-contains <STRING>     Contains the specified string
-      --filter-starts-with <STRING>  Start with the specified string
-      --filter-ends-with <STRING>    End with the specified string
-      --filter-regex <REGEX>         Filter out words that match the specified regex
-      --filter-max-length <LENGTH>   Maximum length
-      --filter-min-length <LENGTH>   Minimum length
-      --filter-length <LENGTH>       Exact length
+Wordlist Filtering:
+      --wordlist-filter-contains <STRING>
+          Contains the specified string [aliases: wfc]
+      --wordlist-filter-starts-with <STRING>
+          Start with the specified string [aliases: wfs]
+      --wordlist-filter-ends-with <STRING>
+          End with the specified string [aliases: wfe]
+      --wordlist-filter-regex <REGEX>
+          Match the specified regex [aliases: wfr]
+      --wordlist-filter-length <LENGTH>
+          Length range e.g.: 5, 5-10, 5,10,15, >5, <5 [aliases: wfl]
+
+Response Filtering:
+      --filter-status-code <CODE>    Response status code, e.g.: 200, 200-300, 200,300,400, >200, <200 [aliases: fsc]
+      --filter-contains <STRING>     Contains the specified string [aliases: fc]
+      --filter-starts-with <STRING>  Start with the specified string [aliases: fs]
+      --filter-ends-with <STRING>    End with the specified string [aliases: fe]
+      --filter-regex <REGEX>         Match the specified regex [aliases: fr]
+      --filter-length <LENGTH>       Response length e.g.: 100, >100, <100, 100-200, 100,200,300 [aliases: fl]
+      --filter-time <TIME>           Response time range in milliseconds e.g.: >1000, <1000, 1000-2000 [aliases: ft]
 ```
 
+### Inputting ranges
+
+In some cases (`<RANGE>`), you may want to input a range of values. 
+You can use the following formats:
+
+| Format     | Description                               |
+| :--------- | :---------------------------------------- |
+| `5`        | Exactly `5`                               |
+| `5-10`     | Between `5` and `10` (inclusive)          |
+| `5,10`     | Exactly `5` or `10`                       |
+| `>5`       | Greater than `5`                          |
+| `<5`       | Less than `5`                             |
+| `5,10,15`  | Exactly `5`, `10`, or `15`                |
+| `>5,10,15` | Greater than `5`, or exactly `10` or `15` |
+
+Comma-separated entries can be mixed freely, as in the last row.
+
+### Response Filtering
+
+To cherry-pick responses, use the `--filter-*` flags. For example, to only show responses that contain `admin`:
+
+```bash
+rwalk https://example.com path/to/wordlist.txt --filter-contains admin
+```
+
+Or, to only show responses that took longer than `1` second:
+
+```bash
+rwalk https://example.com path/to/wordlist.txt --filter-time ">1000"
+```
+
+Available filters:
+
+- `--filter-starts-with` _`<STRING>`_ or `--fs`
+- `--filter-ends-with` _`<STRING>`_ or `--fe`
+- `--filter-contains` _`<STRING>`_ or `--fc`
+- `--filter-regex` _`<REGEX>`_ or `--fr`
+- `--filter-length` _`<LENGTH>`_ or `--fl`
+- `--filter-status-code` _`<CODE>`_ or `--fsc`
+
 ### Wordlists
 
 You can pass multiple wordlists to rwalk. For example:
@@ -112,26 +164,30 @@ rwalk will merge the wordlists and remove duplicates. You can also apply filters
 
 > **Note:** A checksum is computed for the wordlists and stored in case you abort the scan. If you resume the scan, rwalk will only load the wordlists if the checksums match. See [Saving progress](#saving-and-resuming-scans) for more information.
 
-### Filters
 
-You can filter out words from the wordlist by using the `--filter-*` flags. For example, to filter out all words that start with `admin`:
+### Wordlist Filters
+
+> **Warning:** These options filter <u>**out**</u> the words from the wordlist. For example, if you use `--wordlist-filter-starts-with admin`, all words that start with `admin` will be <u>**removed**</u> from the wordlist.
+> Don't confuse this with [Response Filtering](#response-filtering) which filters <u>**in**</u> responses.
+
+You can filter <u>**out**</u> words from the wordlist by using the `--wordlist-filter-*` (`--wf*`) flags. For example, to filter out all words that start with `admin`:
 
 ```bash
-rwalk https://example.com path/to/wordlist.txt --filter-starts-with admin
+rwalk https://example.com path/to/wordlist.txt --wordlist-filter-starts-with admin
 ```
 
 Available filters:
 
-- `--filter-starts-with` _`<STRING>`_
-- `--filter-ends-with` _`<STRING>`_
-- `--filter-contains` _`<STRING>`_
-- `--filter-regex` _`<REGEX>`_
-- `--filter-length` _`<LENGTH>`_
-- `--filter-min-length` _`<LENGTH>`_
-- `--filter-max-length` _`<LENGTH>`_
+- `--wordlist-filter-starts-with` _`<STRING>`_ or `--wfs`
+- `--wordlist-filter-ends-with` _`<STRING>`_ or `--wfe` 
+- `--wordlist-filter-contains` _`<STRING>`_ or `--wfc`
+- `--wordlist-filter-regex` _`<REGEX>`_ or `--wfr` 
+- `--wordlist-filter-length` _`<LENGTH>`_ or `--wfl` 
+- `--wordlist-filter-min-length` _`<LENGTH>`_ or `--wfm`
+- `--wordlist-filter-max-length` _`<LENGTH>`_ or `--wfx`
 
 
-### Transformations
+### Wordlist Transformations
 
 To quickly modify the wordlist, you can use the `--transform-*` flags. For example, to add a prefix to all words in the wordlist:
 
@@ -141,11 +197,14 @@ rwalk https://example.com path/to/wordlist.txt --transform-prefix "."
 
 Available transformations:
 
-- `--transform-prefix` _`<PREFIX>`_
-- `--transform-suffix` _`<SUFFIX>`_
-- `--transform-upper`
-- `--transform-lower`
-- `--transform-capitalize`
+- `--transform-prefix` _`<PREFIX>`_ or `-P`
+- `--transform-suffix` _`<SUFFIX>`_ or `-S`
+- `--transform-upper` or `-U`
+- `--transform-lower` or `-L`
+- `--transform-capitalize` or `-C`
+
+
+
 
 ### Throttling
 
@@ -227,8 +286,6 @@ Each tool was run `10` times with `100` threads. The results are below:
 | `dirsearch` | 14.263 ± 0.250 |  13.861 |  14.719 | 2.70 ± 0.07 |
 | `ffuf`      |  5.285 ± 0.090 |   5.154 |   5.358 |        1.00 |
 
-[ffuf](https://github.com/ffuf/ffuf) is the fastest tool... but not by much. rwalk is only `1.15x` slower than ffuf and ~`2.5x` faster than dirsearch. Not bad for a first release!
-
 ## License
 
 Licensed under the [MIT License](LICENSE).
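
The README changes above distinguish two directions of filtering: wordlist filters remove matching words, while response filters keep matching responses. A minimal sketch of that difference, assuming plain string predicates; the function names `filter_out_starts_with` and `filter_in_contains` are hypothetical and do not come from rwalk:

```rust
/// Wordlist-style filtering: words that match the predicate are REMOVED.
/// (Hypothetical helper, illustrating `--wordlist-filter-starts-with`.)
fn filter_out_starts_with(words: &mut Vec<String>, prefix: &str) {
    // `retain` keeps elements for which the closure returns true,
    // so the predicate is negated to drop the matching words.
    words.retain(|w| !w.starts_with(prefix));
}

/// Response-style filtering: bodies that match the predicate are KEPT.
/// (Hypothetical helper, illustrating `--filter-contains`.)
fn filter_in_contains<'a>(bodies: &[&'a str], needle: &str) -> Vec<&'a str> {
    bodies
        .iter()
        .copied()
        .filter(|b| b.contains(needle))
        .collect()
}
```

With a wordlist of `admin`, `administrator`, and `login`, filtering out the prefix `admin` would leave only `login`; filtering response bodies in on `admin` would keep only the bodies that mention it.
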
diff --git a/src/cli.rs b/src/cli.rs
@@ -2,7 +2,7 @@ use clap::Parser;
 use lazy_static::lazy_static;
 use url::Url;
 
-use crate::constants::SAVE_FILE;
+use crate::{constants::SAVE_FILE, utils::parse_range_input};
 #[derive(Parser, Clone, Debug)]
 #[clap(
     version,
@@ -80,26 +80,51 @@ pub struct Opts {
     pub transform_capitalize: bool,
 
     /// Contains the specified string
-    #[clap(short='F', long, help_heading = Some("Filtering"), value_name = "STRING")]
+    #[clap(long, help_heading = Some("Wordlist Filtering"), value_name = "STRING", visible_alias = "wfc")]
+    pub wordlist_filter_contains: Option<String>,
+    /// Start with the specified string
+    #[clap(long, help_heading = Some("Wordlist Filtering"), value_name = "STRING", visible_alias = "wfs")]
+    pub wordlist_filter_starts_with: Option<String>,
+    /// End with the specified string
+    #[clap(long, help_heading = Some("Wordlist Filtering"), value_name = "STRING", visible_alias = "wfe")]
+    pub wordlist_filter_ends_with: Option<String>,
+    /// Match the specified regex
+    #[clap(long, help_heading = Some("Wordlist Filtering"), value_name = "REGEX", visible_alias = "wfr")]
+    pub wordlist_filter_regex: Option<String>,
+    /// Length range
+    /// e.g.: 5, 5-10, 5,10,15, >5, <5
+    #[clap(long, help_heading = Some("Wordlist Filtering"), value_name = "RANGE", visible_alias = "wfl", value_parser(parse_cli_range_input))]
+    pub wordlist_filter_length: Option<String>,
+
+    /// Response status code,
+    /// e.g.: 200, 200-300, 200,300,400, >200, <200
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "RANGE", visible_alias = "fsc", value_parser(parse_cli_range_input))]
+    pub filter_status_code: Option<String>,
+    /// Contains the specified string
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "STRING", visible_alias = "fc")]
     pub filter_contains: Option<String>,
     /// Start with the specified string
-    #[clap(long, help_heading = Some("Filtering"), value_name = "STRING")]
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "STRING", visible_alias = "fs")]
     pub filter_starts_with: Option<String>,
     /// End with the specified string
-    #[clap(long, help_heading = Some("Filtering"), value_name = "STRING")]
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "STRING", visible_alias = "fe")]
     pub filter_ends_with: Option<String>,
-    /// Filter out words that match the specified regex
-    #[clap(long, help_heading = Some("Filtering"), value_name = "REGEX")]
+    /// Match the specified regex
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "REGEX", visible_alias = "fr")]
     pub filter_regex: Option<String>,
-    /// Maximum length
-    #[clap(long, help_heading = Some("Filtering"), value_name = "LENGTH")]
-    pub filter_max_length: Option<usize>,
-    /// Minimum length
-    #[clap(long, help_heading = Some("Filtering"), value_name = "LENGTH")]
-    pub filter_min_length: Option<usize>,
-    /// Exact length
-    #[clap(long, help_heading = Some("Filtering"), value_name = "LENGTH")]
-    pub filter_length: Option<usize>,
+    /// Response length
+    /// e.g.: 100, >100, <100, 100-200, 100,200,300
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "RANGE", visible_alias = "fl", value_parser(parse_cli_range_input))]
+    pub filter_length: Option<String>,
+    /// Response time range in milliseconds
+    /// e.g.: >1000, <1000, 1000-2000
+    #[clap(long, help_heading = Some("Response Filtering"), value_name = "RANGE", visible_alias = "ft", value_parser(parse_cli_range_input))]
+    pub filter_time: Option<String>,
+}
+
+fn parse_cli_range_input(s: &str) -> Result<String, String> {
+    parse_range_input(s)?;
+    Ok(s.to_string())
 }
 
 fn parse_url(s: &str) -> Result<String, String> {
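
`parse_cli_range_input` delegates to `utils::parse_range_input`, whose definition is not part of the hunks shown here. As a rough illustration of the range grammar documented in the README table (`5`, `5-10`, `>5`, `<5`, and comma-separated combinations), a parser could look like the sketch below. The names `RangeEntry`, `parse_range_sketch`, and `range_matches` are assumptions made for this example, and the real function may return a different type:

```rust
/// One comma-separated entry of a range expression (hypothetical type).
#[derive(Debug, PartialEq)]
enum RangeEntry {
    Exact(usize),
    Between(usize, usize), // inclusive on both ends
    GreaterThan(usize),
    LessThan(usize),
}

fn parse_num(token: &str) -> Result<usize, String> {
    token
        .trim()
        .parse::<usize>()
        .map_err(|_| format!("invalid number `{}`", token.trim()))
}

fn parse_entry(part: &str) -> Result<RangeEntry, String> {
    let part = part.trim();
    if let Some(rest) = part.strip_prefix('>') {
        Ok(RangeEntry::GreaterThan(parse_num(rest)?))
    } else if let Some(rest) = part.strip_prefix('<') {
        Ok(RangeEntry::LessThan(parse_num(rest)?))
    } else if let Some((lo, hi)) = part.split_once('-') {
        let (lo, hi) = (parse_num(lo)?, parse_num(hi)?);
        if lo > hi {
            return Err(format!("range start {} is greater than end {}", lo, hi));
        }
        Ok(RangeEntry::Between(lo, hi))
    } else {
        Ok(RangeEntry::Exact(parse_num(part)?))
    }
}

/// Parses an expression such as `>5,10,15` into its entries.
fn parse_range_sketch(s: &str) -> Result<Vec<RangeEntry>, String> {
    s.split(',').map(parse_entry).collect()
}

/// A value matches when ANY entry accepts it.
fn range_matches(entries: &[RangeEntry], value: usize) -> bool {
    entries.iter().any(|entry| match *entry {
        RangeEntry::Exact(n) => value == n,
        RangeEntry::Between(lo, hi) => (lo..=hi).contains(&value),
        RangeEntry::GreaterThan(n) => value > n,
        RangeEntry::LessThan(n) => value < n,
    })
}
```

Returning `Result<_, String>` matches the error type that the `?` in `parse_cli_range_input` requires, which is why the sketch reports errors as plain strings.
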

diff --git a/src/core.rs b/src/core.rs
@@ -16,6 +16,7 @@ use crate::{
     cli::OPTS,
     constants::{ERROR, STATUS_CODES, SUCCESS, WARNING},
     tree::{Tree, TreeData},
+    utils::is_response_filtered,
 };
 
 pub async fn start(
@@ -143,13 +144,26 @@ pub async fn start(
                             tokio::time::sleep(sleep).await;
                         }
                         match response {
-                            Ok(response) => {
-                                if STATUS_CODES
-                                    .iter()
-                                    .any(|x| x.contains(&response.status().as_u16()))
+                            Ok(mut response) => {
+                                let mut text = String::new();
+                                while let Ok(chunk) = response.chunk().await {
+                                    if let Some(chunk) = chunk {
+                                        text.push_str(&String::from_utf8_lossy(&chunk));
+                                    } else {
+                                        break;
+                                    }
+                                }
+                                let status_code = response.status().as_u16();
+                                let filtered = is_response_filtered(
+                                    &text,
+                                    status_code,
+                                    t1.elapsed().as_millis().min(u16::MAX as u128) as u16,
+                                );
+
+                                if filtered && STATUS_CODES.iter().any(|x| x.contains(&status_code))
                                 {
                                     progress.println(format!(
-                                        "{} {} {}",
+                                        "{} {} {} {}",
                                         if response.status().is_success() {
                                             SUCCESS.to_string().green()
                                         } else if response.status().is_redirection() {
@@ -158,7 +172,12 @@ pub async fn start(
                                             ERROR.to_string().red()
                                         },
                                         response.status().as_str().bold(),
-                                        url
+                                        url,
+                                        format!(
+                                            "{}ms",
+                                            t1.elapsed().as_millis().to_string().bold()
+                                        )
+                                        .dimmed()
                                     ));
                                     // Check if this path is already in the tree
                                     let mut found = false;
@@ -175,7 +194,7 @@ pub async fn start(
                                                 url: url.clone(),
                                                 depth: data.depth + 1,
                                                 path: word.clone(),
-                                                status_code: response.status().as_u16(),
+                                                status_code,
                                             },
                                             Some(previous_node.clone()),
                                         );
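
The hunk above calls `is_response_filtered(&text, status_code, elapsed)`, but its body lives in `src/utils.rs`, which is not shown. One plausible shape, assuming every configured filter must pass and unset filters are ignored, is sketched below. `ResponseFilters`, `passes`, and `is_response_kept` are hypothetical names; the real function reads its settings from the global `OPTS` and also handles regex and range inputs, which this sketch leaves out:

```rust
/// Hypothetical subset of the response filters, passed explicitly
/// instead of being read from a global options struct.
#[derive(Default)]
struct ResponseFilters {
    contains: Option<String>,
    starts_with: Option<String>,
    ends_with: Option<String>,
    status_codes: Option<Vec<u16>>,
    min_time_ms: Option<u128>, // stand-in for a parsed `--filter-time ">N"`
}

/// An unset filter always passes; a set filter must satisfy `check`.
fn passes<T>(filter: &Option<T>, check: impl Fn(&T) -> bool) -> bool {
    filter.as_ref().map_or(true, check)
}

/// Returns true when the response should be shown (all set filters match).
fn is_response_kept(f: &ResponseFilters, body: &str, status: u16, elapsed_ms: u128) -> bool {
    passes(&f.contains, |s| body.contains(s.as_str()))
        && passes(&f.starts_with, |s| body.starts_with(s.as_str()))
        && passes(&f.ends_with, |s| body.ends_with(s.as_str()))
        && passes(&f.status_codes, |codes| codes.contains(&status))
        && passes(&f.min_time_ms, |min| elapsed_ms > *min)
}
```

Taking the elapsed time as `u128`, the type `Duration::as_millis` returns, means the caller does not need a narrowing cast before filtering.
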

diff --git a/src/main.rs b/src/main.rs
@@ -61,6 +61,12 @@ async fn main() -> Result<()> {
                 .to_string()
                 .bold()
         );
+    } else {
+        println!(
+            "{} {} words loaded",
+            INFO.to_string().blue(),
+            before.to_string().bold()
+        );
     }
     if words.len() == 0 {
         println!("{} No words found in wordlists", ERROR.to_string().red());