Update import.sh
NotaInutilis committed Nov 26, 2023
1 parent e747bd7 commit 35a30ee
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions scripts/import.sh
@@ -11,8 +11,8 @@
cp -a ./import/original/. ./import/modified/

# Cleanup imported sources (Same code in update.sh)
-## Special cleanup for imported sources of other formats (AdBlock, hosts, etc.)
-find ./import/modified -type f -name "*.txt" -exec sed -ri 's/^[^#[:alnum:]]/#&/; s/^0\.0\.0\.0[[:space:]]*//i' {} \;
+## Special cleanup for imported sources of other formats (match, hosts, AdBlock, etc.)
+find ./import/modified -type f -name "*.txt" -exec sed -ri 's/^\*\.//i; s/^\*\:\/\/\*\.//i; s/^0\.0\.0\.0[[:space:]]*//i; s/^[^#[:alnum:]]/#&/' {} \;
## Normalizes URLs into domains: lowercases, remove leading spaces, protocol (`x://`) `www.` subdomains, everything after `/`, only one space before `#`. Keeps comments intact
find ./import/modified -type f -name "*.txt" -exec sed -ri 'h; s/[^#]*//1; x; s/#.*//; s/.*/\L&/; s/^[[:space:]]*//i; s/^.*:\/\///i; s/^[.*]*//i; s/^www\.//i; s/\/[^[:space:]]*//i; s/[[:space:]].*$/ /i; G; s/(.*)\n/\1/' {} \;
find ./import/modified -type f -name "*.txt" -exec sed -ri 's/^www\.//i' {} \; # Removing "www." twice because unmaintained imported lists are weird.
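
The effect of the changed cleanup expression can be checked outside the repository with a small sketch like the one below. It copies the new sed expression from this commit into a hypothetical `cleanup` function that reads standard input instead of editing files in place, and feeds it a few made-up sample lines. It assumes GNU sed, as the script's use of `-r` and the `i` flag suggests. In the removed line the commenting-out substitution ran first, so a line such as `*.example.com` would have been prefixed with `#`; in the new order the `*.` and `*://*.` prefixes are stripped before that step.

```shell
#!/usr/bin/env bash
# Sketch only: `cleanup` is a hypothetical helper, not part of import.sh.
# It applies the cleanup expression added in this commit to stdin (GNU sed assumed).
cleanup() {
  sed -r 's/^\*\.//i; s/^\*\:\/\/\*\.//i; s/^0\.0\.0\.0[[:space:]]*//i; s/^[^#[:alnum:]]/#&/'
}

# Made-up sample lines in match-pattern, hosts, AdBlock, and comment style.
printf '%s\n' \
  '*.example.com' \
  '*://*.example.org/*' \
  '0.0.0.0 ads.example.net' \
  '||example.com^' \
  '# a comment' \
  | cleanup
```

With these inputs, the first three lines lose their `*.`, `*://*.`, and `0.0.0.0 ` prefixes, the AdBlock-style line is commented out as `#||example.com^`, and the existing comment is left unchanged. The trailing `/*` on the second line is not touched by this step; the normalization command that follows in the script is the one that drops everything after `/`.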