Not actually fast #56
I use this library on my log files. Half of the time is spent looking up IP addresses in an on-disk database; the other half is spent in httpagentparser. The time spent parsing the log file itself is marginal. This change is obviously meant as a joke, but I suggest you do some profiling. Or I might even do some myself.
Extracted bits from a profile of a small sample run of my application.
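For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The functions `parse_user_agent` and `process_log` are hypothetical stand-ins, not code from this project; in a real run the stand-in would be replaced by the actual parsing call.

```python
import cProfile
import io
import pstats


def parse_user_agent(agent):
    # Hypothetical stand-in for the real user-agent parsing call.
    return {"browser": agent.split("/")[0]}


def process_log(lines):
    # Hypothetical log loop: assumes one user-agent string per line.
    return [parse_user_agent(line) for line in lines]


def profile_report(lines, top=5):
    """Profile process_log and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    process_log(lines)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(["Mozilla/5.0 (X11; Linux x86_64)"] * 1000)
print(report)
```

Sorting by cumulative time makes it easy to see which top-level call (IP lookup, agent parsing, or log parsing) dominates the run.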
It seems to be a result of how the library works. It invokes all the detectors one by one to see if they match, so run time grows linearly with the number of detectors. I actually made it twice as slow by contributing a ton of bots and mobile browsers. The only ways to make a real difference are to detect fewer browsers or to do a major refactoring. One could arrange the browsers in a tree. For example, if a mobile OS is detected, all desktop detectors could be skipped. Or if WebKit is detected, all Gecko and Trident detectors could be skipped. Another wild idea would be to flatten everything into one humongous regex/state machine. This requires more thought and design. Related: https://github.com/clojure/core.match/wiki/Understanding-the-algorithm
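To make the tree idea concrete, here is a rough sketch. It is not how the library is currently structured: the `Detector` class, the token strings, and the `TREE` layout are all hypothetical and chosen only for illustration. Child detectors are consulted only when their parent matches, so whole branches are skipped.

```python
class Detector:
    """Hypothetical detector: matches if `token` occurs in the agent string."""

    def __init__(self, name, token, children=()):
        self.name = name
        self.token = token
        self.children = list(children)

    def matches(self, agent):
        return self.token in agent


def detect_tree(agent, detectors, checked=None):
    """Return names of matching detectors, descending only into matching branches.

    If `checked` is a list, the name of every detector that was actually
    evaluated is appended to it, to show how much work was skipped.
    """
    found = []
    for detector in detectors:
        if checked is not None:
            checked.append(detector.name)
        if detector.matches(agent):
            found.append(detector.name)
            found.extend(detect_tree(agent, detector.children, checked))
    return found


# Hypothetical layout: rendering engines at the top, browsers underneath.
TREE = [
    Detector("WebKit", "AppleWebKit", [
        Detector("Chrome", "Chrome/"),
        Detector("Safari", "Safari/"),
    ]),
    Detector("Gecko", "Gecko/", [
        Detector("Firefox", "Firefox/"),
    ]),
    Detector("Trident", "Trident/", [
        Detector("Internet Explorer", "MSIE"),
    ]),
]

agent = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
checked = []
result = detect_tree(agent, TREE, checked)
print(result)   # ['Gecko', 'Firefox']
print(checked)  # ['WebKit', 'Gecko', 'Firefox', 'Trident']
```

In this toy layout only four of the seven detectors run for a Firefox agent. The saving in a real detector set would depend on how well the top-level nodes partition the agents.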
I will look into this once I have some free time from my current work. I am unsure how a regex-based solution will perform. Also, if you have any other ideas or proof-of-concept code, please feel free to share.
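As a starting point for experimenting with the regex approach, here is a small sketch, assuming each detector can be reduced to a single pattern. The `PATTERNS` list is hypothetical and far shorter than a real one. All patterns are joined into one alternation with a named group per browser, and `match.lastgroup` reports which alternative matched.

```python
import re

# Hypothetical (name, pattern) pairs; a real list would be much longer.
PATTERNS = [
    ("Firefox", r"Firefox/[\d.]+"),
    ("Chrome", r"Chrome/[\d.]+"),
    ("Googlebot", r"Googlebot/[\d.]+"),
]

# One alternation with a named group per browser, compiled once at import time.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS)
)


def detect_combined(agent):
    """Return the name of the alternative that matched first, or None."""
    match = COMBINED.search(agent)
    return match.lastgroup if match else None


firefox = "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"
print(detect_combined(firefox))     # Firefox
print(detect_combined("curl/8.0"))  # None
```

One caveat: `search` returns the leftmost match in the string, so an agent that carries several tokens yields only the first one found. Whether this is faster than calling the detectors one by one would need to be measured, for example with `timeit`.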
One quick solution is to move the less popular agents to the existing more.py and make them optional. It would be interesting to see whether that makes hap faster. This should really be an issue so that more people notice it.
That would make it faster, at the cost of detecting fewer agents.
I have to admit that it's very slow...