Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
refactor: URL Rewriting, Writing to Output
I ran into an issue where a website had a malformed `href` in a `link` element so URL rewriting failed. I tried to just use `-r` to remove the nodes from the selected set, but htmlq still panicked. I decided to look at the code to fix it and found some improvements along the way. --- Changes * Refactored code to only rewrite URLs when they are part of the selected elements * As part of this refactor, we also only remove nodes when they are part of the selected nodes. * Node Removal CSS Selectors were joined into a single selector as they are semantically the same. * eg. looping over `a` and `link` is the same as just using the single selector `a, link` * Refactored node removal to happen before URL rewrites. * Refactored `rewrite_relative_urls` with `rewrite_relative_url` * Removes leading forward slashes ('/') from `href`s in `a`, `link`, or `area` elems and doesn't join them if they have 4 or more leading forward slashes. * Replaced `output.write_all` calls with `writeln!` calls. * This should reduce string allocations to just include newlines. * It also only operates on nodes we've already selected instead of the whole tree. * Also replaced as many `.unwrap()` calls within the code (left the ones in the test since they should be fine. * Mainly changed over to `.expects()`, `.unwrap_or_default()`, or `.unwrap_or_else()` calls * Some minor Rust changes to use newer constructs like [`let-else`](https://rust-lang.github.io/rfcs/3137-let-else.html)
- Loading branch information