Improve ingest-user-agent #22176

Closed
ThaDafinser opened this issue Dec 14, 2016 · 7 comments
Labels
:Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), discuss, >enhancement

Comments

@ThaDafinser

Since the ingest-user-agent plugin is based on https://github.com/ua-parser/uap-core, the results are quite good, but they could be better.

I created a comparison of all available user-agent-parsers (and an abstraction for them) in PHP here: http://thadafinser.github.io/UserAgentParserComparison/v5/index.html

At least some of those providers could be "easily" ported over to Java, since the data source is separated:

Any interest in this?

@clintongormley added the :Data Management/Ingest Node, discuss, and >enhancement labels on Dec 16, 2016
@clintongormley
Contributor

@talevy what do you think? It'd be worth aligning whatever we do in ES and Logstash.

@talevy
Contributor

talevy commented Dec 16, 2016

@clintongormley I'll discuss this with the Logstash team. Test coverage here is rather minimal. Although it would be awesome to have a parser that matches more strings, it would be unfortunate if whichever source we move to produced poorer results for existing matches.

@ThaDafinser
Author

@talevy if the current results need to stay, it would require a kind of "chain provider", as described here: https://github.com/ThaDafinser/UserAgentParser#chain-provider
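
The "chain provider" idea can be sketched in Java, the language a port would target. This is only a minimal illustration under stated assumptions, not the UserAgentParser API and not the ingest-user-agent plugin's code: the class name, `parse`, `parseWithDefaultChain`, and the two toy providers are all hypothetical stand-ins. The existing parser is asked first so that its current matches are preserved, and a second provider is consulted only when the first returns nothing.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class ChainParserSketch {

    // Hypothetical stand-in for the existing parser: it only knows Firefox here.
    static final Function<String, Optional<String>> EXISTING =
        ua -> ua.contains("Firefox") ? Optional.of("Firefox") : Optional.<String>empty();

    // Hypothetical stand-in for a newly ported provider with broader coverage.
    static final Function<String, Optional<String>> FALLBACK = ua -> {
        if (ua.contains("Vivaldi")) return Optional.of("Vivaldi");
        if (ua.contains("Firefox")) return Optional.of("Firefox (fallback)");
        return Optional.<String>empty();
    };

    // Tries each provider in order and returns the first non-empty result.
    static Optional<String> parse(List<Function<String, Optional<String>>> chain,
                                  String userAgent) {
        for (Function<String, Optional<String>> provider : chain) {
            Optional<String> result = provider.apply(userAgent);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    // Existing provider first, so its results never change; fallback second.
    static Optional<String> parseWithDefaultChain(String userAgent) {
        return parse(List.of(EXISTING, FALLBACK), userAgent);
    }

    public static void main(String[] args) {
        System.out.println(parseWithDefaultChain("Mozilla/5.0 (X11) Gecko/20100101 Firefox/50.0"));
        System.out.println(parseWithDefaultChain("Mozilla/5.0 (X11) Chrome/55.0 Vivaldi/1.6"));
        System.out.println(parseWithDefaultChain("SomeUnknownBot/1.0"));
    }
}
```

The ordering of the list is the whole design choice: anything the first provider already matches keeps its current result, and the chain only adds matches for strings that previously came back empty.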

@talevy
Contributor

talevy commented Dec 19, 2016

Thanks, good to know.

Regarding the two libraries you recommended, I believe piwik/device-detector is out of the question due to its LGPL license.

Most of the others (including Browsecap) look usable.

@ThaDafinser, given your testing experience, if you had to choose one to port, which would you choose to chain first?

@ThaDafinser
Author

ThaDafinser commented Dec 20, 2016

My favorites

  • WhichBrowser
  • PiwikDeviceDetector

Why? They both have a pretty solid data background.

They both try to give results only if they can really detect the browser. Others often return false positives through "catch 'em all" rules, so you get misleading information.

Read more here: ThaDafinser/UserAgentParserComparison#14

which would you choose to chain first?

The easier one is definitely PiwikDeviceDetector, because it has a separate data source:
https://github.com/piwik/device-detector/tree/master/regexes

Also, there have already been some intentions to port it:
matomo-org/device-detector#5509
matomo-org/device-detector#5336

@talevy
Contributor

talevy commented Dec 20, 2016

Cool, thanks @ThaDafinser. Since Piwik is out of the question due to licensing (if I am not mistaken), it may be worthwhile to explore WhichBrowser.

@talevy
Contributor

talevy commented Mar 15, 2018

Closing this; I will discuss the possibility of using WhichBrowser with the Logstash team.

@talevy closed this as completed on Mar 15, 2018

3 participants