refactor: URL Rewriting, Writing to Output
I ran into an issue where a website had a malformed `href` in a `link` element so URL rewriting failed. I tried to just use `-r` to remove the nodes from the selected set, but htmlq still panicked.

I decided to look at the code to fix it and found some improvements along the way.

---

Changes

* Refactored code to only rewrite URLs when they are part of the selected elements
    * As part of this refactor, we also only remove nodes when they are part of the selected nodes.
    * Node removal CSS selectors were joined into a single selector, as they are semantically the same.
        * e.g. looping over `a` and `link` is the same as just using the single selector `a, link`
    * Refactored node removal to happen before URL rewrites.
    * Replaced `rewrite_relative_urls` with `rewrite_relative_url`
    * Removes leading forward slashes (`/`) from `href`s in `a`, `link`, or `area` elements, and doesn't join them if they have 4 or more leading forward slashes.

* Replaced `output.write_all` calls with `writeln!` calls.
    * This should cut down on string allocations that were made only to append a newline.
    * It also only operates on nodes we've already selected instead of the whole tree.

* Also replaced as many `.unwrap()` calls within the code as possible (left the ones in the tests, since they should be fine).
    * Mainly changed over to `.expect()`, `.unwrap_or_default()`, or `.unwrap_or_else()` calls
    * Some minor Rust changes to use newer constructs like [`let-else`](https://rust-lang.github.io/rfcs/3137-let-else.html)
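
To illustrate the last two groups of changes, here is a minimal, standard-library-only sketch. It is not the code from this commit: `write_nodes`, `href_value`, and `collect_hrefs` are hypothetical helpers, and the nodes are plain strings rather than parsed HTML. It only shows the general shape of writing with `writeln!` and of using `let-else` to skip a bad value rather than calling `.unwrap()`.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes each already-serialized node on its own line.
/// `writeln!` appends the newline itself, so there is no need to build a
/// temporary `String` such as `format!("{}\n", node)` before writing.
fn write_nodes<W: Write>(output: &mut W, nodes: &[&str]) -> io::Result<()> {
    for node in nodes {
        writeln!(output, "{}", node)?;
    }
    Ok(())
}

/// Hypothetical helper: returns the value of the first `href="..."` in a tag.
/// Deliberately naive string matching, for illustration only.
fn href_value(tag: &str) -> Option<&str> {
    let marker = "href=\"";
    let start = tag.find(marker)? + marker.len();
    let rest = &tag[start..];
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Hypothetical helper: collects hrefs and skips tags without one.
/// The `let-else` form (Rust 1.65+) replaces an `.unwrap()` that would panic.
fn collect_hrefs<'a>(tags: &[&'a str]) -> Vec<&'a str> {
    let mut hrefs = Vec::new();
    for &tag in tags {
        let Some(href) = href_value(tag) else {
            continue; // no usable href: skip this node instead of panicking
        };
        hrefs.push(href);
    }
    hrefs
}
```

Because `write_nodes` is generic over `Write`, the same function can target a file, standard output, or a `Vec<u8>` in tests.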
zamu-flowerpot authored and mgdm committed Apr 15, 2023
1 parent 739cd36 commit 19c7f3f
Showing 3 changed files with 357 additions and 195 deletions.