Commit

Adds detection for OWLer and BBC bots, improves version detection for iOS and macOS (matomo-org#7546)

* Improves version detection for iOS and macOS
* Adds detection for OWLer
* Adds detection for BBC bots
liviuconcioiu authored Dec 26, 2023
1 parent e99e861 commit 5b1e6d2
Showing 4 changed files with 80 additions and 0 deletions.
16 changes: 16 additions & 0 deletions Tests/Parser/fixtures/oss.yml
@@ -4102,3 +4102,19 @@
    version: 9.0.0
    platform:
    family: Android
-
  user_agent: Aloha/1 CFNetwork/1492.0.1 Darwin/23.3.0
  os:
    name: iOS
    short_name: IOS
    version: "17.3"
    platform:
    family: iOS
-
  user_agent: Safari/19617.1.17.11.9 CFNetwork/1490.0.4 Darwin/23.2.0
  os:
    name: Mac
    short_name: MAC
    version: "14.2"
    platform:
    family: Mac
36 changes: 36 additions & 0 deletions Tests/fixtures/bots.yml
@@ -5881,3 +5881,39 @@
    producer:
      name: Meltwater Deutschland GmbH
      url: https://www.meltwater.com/
-
  user_agent: [email protected]/1
  bot:
    name: OWLer
    category: Crawler
    url: https://openwebsearch.eu/owler/
    producer:
      name: Open Search Foundation e.V.
      url: https://openwebsearch.eu/
-
  user_agent: OWLer/0.1 (built with StormCrawler; https://ows.eu/owler; [email protected]
  bot:
    name: OWLer
    category: Crawler
    url: https://openwebsearch.eu/owler/
    producer:
      name: Open Search Foundation e.V.
      url: https://openwebsearch.eu/
-
  user_agent: Page Monitor (https://confluence.dev.bbc.co.uk/display/men/Page+Monitor)
  bot:
    name: BBC Page Monitor
    category: Site Monitor
    url: https://confluence.dev.bbc.co.uk/display/men/Page+Monitor
    producer:
      name: BBC
      url: https://www.bbc.com/
-
  user_agent: BBC-Forge-URL-Monitor-Twisted
  bot:
    name: BBC Forge URL Monitor
    category: Site Monitor
    url: https://www.bbc.com/
    producer:
      name: BBC
      url: https://www.bbc.com/
24 changes: 24 additions & 0 deletions regexes/bots.yml
@@ -3508,6 +3508,30 @@
    name: 'Meltwater Deutschland GmbH'
    url: 'https://www.meltwater.com/'

- regex: '(?:[email protected]|OWLer)/([\d+.]+)'
  name: 'OWLer'
  category: 'Crawler'
  url: 'https://openwebsearch.eu/owler/'
  producer:
    name: 'Open Search Foundation e.V.'
    url: 'https://openwebsearch.eu/'

- regex: 'bbc.co.uk/display/men/Page\+Monitor'
  name: 'BBC Page Monitor'
  category: 'Site Monitor'
  url: 'https://confluence.dev.bbc.co.uk/display/men/Page+Monitor'
  producer:
    name: 'BBC'
    url: 'https://www.bbc.com/'

- regex: 'BBC-Forge-URL-Monitor-Twisted'
  name: 'BBC Forge URL Monitor'
  category: 'Site Monitor'
  url: 'https://www.bbc.com/'
  producer:
    name: 'BBC'
    url: 'https://www.bbc.com/'

# Generic detections
- regex: '[a-z0-9\-_]*((?<!cu|power[ _]|m[ _])bot(?![ _]TAB|[ _]?5[0-9]|[ _]Senior|[ _]Junior)|crawler|crawl|checker|archiver|transcoder|spider|^firefox$|^chrome$)([^a-z]|$)'
  name: 'Generic Bot'
4 changes: 4 additions & 0 deletions regexes/oss.yml
@@ -871,6 +871,8 @@
- regex: '^(?!com.apple.Safari.SearchHelper|Safari).*CFNetwork/.+ Darwin/(\d+[\.\d]+)(?!.*(?:x86_64|i386|PowerMac|Power%20Macintosh))'
  name: 'iOS'
  versions:
    - regex: 'Darwin/23.3.0'
      version: '17.3'
    - regex: 'Darwin/23.2.0'
      version: '17.2'
    - regex: 'Darwin/23.1.0'
@@ -1098,6 +1100,8 @@
- regex: '(?:CFNetwork|StudioDisplay)/.+Darwin(?:/|; )(?:[\d\.]+).+(?:x86_64|i386|Power%20Macintosh)|(?:x86_64-apple-)?darwin(?:[\d\.]+)|PowerMac|com.apple.Safari.SearchHelper|^Safari'
  name: 'Mac'
  versions:
    - regex: '(?:x86_64-apple-)?Darwin(?:/|; )?23.3.0'
      version: '14.3'
    - regex: '(?:x86_64-apple-)?Darwin(?:/|; )?23.2.0'
      version: '14.2'
    - regex: '(?:x86_64-apple-)?Darwin(?:/|; )?23.1.0'
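
As a rough sanity check of the added rows, the sketch below runs a few of the new patterns against the fixture user agents from this commit. It is written in Python purely for illustration: the helper `first_match`, the table names, and the case-insensitive first-match loop are assumptions, not the library's own matching code. Only the BBC patterns and the Darwin version rows are included.

```python
import re

# Patterns copied from the regexes/bots.yml hunk in this commit.
BOT_PATTERNS = [
    (r"bbc.co.uk/display/men/Page\+Monitor", "BBC Page Monitor"),
    (r"BBC-Forge-URL-Monitor-Twisted", "BBC Forge URL Monitor"),
]

# Darwin -> OS version rows copied from the regexes/oss.yml hunks.
IOS_VERSIONS = [
    (r"Darwin/23.3.0", "17.3"),
    (r"Darwin/23.2.0", "17.2"),
]
MAC_VERSIONS = [
    (r"(?:x86_64-apple-)?Darwin(?:/|; )?23.3.0", "14.3"),
    (r"(?:x86_64-apple-)?Darwin(?:/|; )?23.2.0", "14.2"),
]


def first_match(user_agent, table):
    """Return the label of the first pattern found in user_agent, else None.

    Hypothetical helper: first match wins, case-insensitive (an assumption).
    """
    for pattern, label in table:
        if re.search(pattern, user_agent, re.IGNORECASE):
            return label
    return None


if __name__ == "__main__":
    print(first_match("BBC-Forge-URL-Monitor-Twisted", BOT_PATTERNS))
    print(first_match("Aloha/1 CFNetwork/1492.0.1 Darwin/23.3.0", IOS_VERSIONS))
    print(first_match(
        "Safari/19617.1.17.11.9 CFNetwork/1490.0.4 Darwin/23.2.0", MAC_VERSIONS))
```

Because the first matching row wins in this sketch, the order of the version rows matters; the commit places the newer Darwin/23.3.0 rows ahead of the existing 23.2.0 rows.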