Merge pull request #59 from Hugo-C/feature/GH-55
Replace Alexa overlap by Tranco
Hugo-C authored Apr 21, 2024
2 parents 0172327 + 8364e6d commit 5cbfa7b
Showing 14 changed files with 1,603 additions and 445 deletions.
914 changes: 815 additions & 99 deletions Cargo.lock

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions Cargo.toml
@@ -16,6 +16,11 @@ openssl = { version = "~0.10.55", features = ["vendored"] } # Required for sent
sentry = "0.32"
rocket-sentry = "0.17"
redis = "0.25"
log = "0.4"
env_logger = "0.11"
async-std = { version = "1.12", features = ["attributes", "tokio1"] }
reqwest = { version = "0.12.2", features = ["stream"] }
tempfile = "3.10.1"

[dependencies.rocket_db_pools]
version = "0.1.0"
10 changes: 10 additions & 0 deletions README.md
@@ -42,12 +42,22 @@ It takes a required `host` parameter and optionally `port` (default to 443).

### Retrieve domains from alexa top 1 million that match a jarm hash

**DEPRECATED** see `tranco-overlap` instead

````http request
GET api/v1/alexa-overlap?jarm_hash=<jarm-hash>
````

The returned list is ordered by top alexa rank first

### Retrieve domains from tranco top 1 million that match a jarm hash

````http request
GET api/v1/tranco-overlap?jarm_hash=<jarm-hash>
````

The returned list is ordered by top tranco rank first
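
As a minimal sketch of how a client might build the request above: the helper name `tranco_overlap_url` and the `base_url` argument are assumptions for illustration and do not come from this repository. Only the `api/v1/tranco-overlap` path and the `jarm_hash` query parameter are taken from the README.

```rust
/// Builds the URL for the `tranco-overlap` endpoint described above.
/// `base_url` is a hypothetical deployment address (an assumption, not from the README).
fn tranco_overlap_url(base_url: &str, jarm_hash: &str) -> String {
    // Drop trailing slashes so the joined URL never contains "//api".
    let base = base_url.trim_end_matches('/');
    format!("{}/api/v1/tranco-overlap?jarm_hash={}", base, jarm_hash)
}
```

The resulting string can be passed to any HTTP client. No percent-encoding is applied here, on the assumption that a JARM hash contains only hexadecimal characters.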

### Retrieve recently scanned hosts

````http request
2 changes: 2 additions & 0 deletions docker-compose.playwright.yml
@@ -16,8 +16,10 @@ services:
    environment:
      - ROCKET_SENTRY_DSN=123 # this way sentry is disabled but do not crash
      - ALEXA_TOP1M_RAW_DATA_PATH=/alexa_top1M.csv
      - FORCE_TRANCO_TOP1M_RAW_DATA_PATH=/tranco_top1M.csv
    volumes:
      - ./tests/fixtures_data/alexa_top1M.csv:/alexa_top1M.csv
      - ./tests/fixtures_data/tranco_top1M.csv:/tranco_top1M.csv
    depends_on:
      - redis

29 changes: 29 additions & 0 deletions examples/tranco_top1m.url
@@ -0,0 +1,29 @@
# Request for no overlap
GET {{host}}/tranco-overlap?jarm_hash=123

HTTP 200
Content-Type: application/json
[Asserts]
jsonpath "$.overlapping_domains" isEmpty

# Request with 1 overlap
GET {{host}}/tranco-overlap?jarm_hash=3fd3fd20d3fd3fd21c3fd3fd3fd3fd2b66a312d81ed1efa0f55830f7490cb2

HTTP 200
Content-Type: application/json
[Asserts]
jsonpath "$.overlapping_domains" count == 1
jsonpath "$.overlapping_domains.[0].rank" == 9
jsonpath "$.overlapping_domains.[0].domain" == "zhihu.com"

# Request with 2 overlap
GET {{host}}/tranco-overlap?jarm_hash=21d19d00021d21d00021d19d21d21d1a46380b04d662f0848f508dd171125d

HTTP 200
Content-Type: application/json
[Asserts]
jsonpath "$.overlapping_domains" count == 2
jsonpath "$.overlapping_domains.[0].rank" == 11
jsonpath "$.overlapping_domains.[0].domain" == "fake_site_1.com"
jsonpath "$.overlapping_domains.[1].rank" == 12
jsonpath "$.overlapping_domains.[1].domain" == "fake_site_2.com"
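
The assertions above imply a response holding an `overlapping_domains` array whose entries carry a `rank` and a `domain`, ordered by rank. The following is only a sketch of that behaviour over an in-memory slice. The `OverlappingDomain` struct, the `tranco_overlap` function, and the `(rank, domain, jarm_hash)` row layout are hypothetical and are not the service's actual storage or implementation.

```rust
/// One entry of the `overlapping_domains` array (field names taken from the assertions above).
#[derive(Debug, Clone, PartialEq)]
struct OverlappingDomain {
    rank: u32,
    domain: String,
}

/// Returns the domains whose JARM hash equals `jarm_hash`, best (lowest) rank first.
/// `rows` is a hypothetical in-memory table of (rank, domain, jarm_hash) tuples.
fn tranco_overlap(rows: &[(u32, &str, &str)], jarm_hash: &str) -> Vec<OverlappingDomain> {
    let mut matches: Vec<OverlappingDomain> = rows
        .iter()
        // Keep only the rows whose hash matches exactly.
        .filter(|(_, _, hash)| *hash == jarm_hash)
        .map(|(rank, domain, _)| OverlappingDomain {
            rank: *rank,
            domain: domain.to_string(),
        })
        .collect();
    // Sort ascending so the top-ranked domain comes first.
    matches.sort_by_key(|entry| entry.rank);
    matches
}
```

An unknown hash such as `123` yields an empty vector, which corresponds to the `isEmpty` assertion in the first request.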