
Not actually fast #56

Open, wants to merge 1 commit into base: master



pepijndevos
Contributor

I use this library on my logfiles.
Half of the time is spent looking up IP addresses in an on-disk database, the other half is spent in httpagentparser.
The time spent parsing the log file is marginal.

This change is obviously meant as a joke, but I suggest you do some profiling. Or I might even do some myself.

@pepijndevos
Contributor Author

Extracted bits from a profile of a small sample run of my application.

Tue Aug  5 10:42:46 2014    prof

         36372505 function calls (34746356 primitive calls) in 25.747 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

  6482905    5.321    0.000    7.425    0.000 __init__.py:72(checkWords)
    99737    2.871    0.000   13.681    0.000 __init__.py:598(detect)
  6582642    2.843    0.000   10.721    0.000 __init__.py:59(detect)
    36635    0.124    0.000    0.196    0.000 __init__.py:84(getVersion)
    31946    0.072    0.000    0.128    0.000 __init__.py:488(getVersion)
    99737    0.055    0.000    0.055    0.000 __init__.py:218(checkWords)
    99737    0.052    0.000    0.069    0.000 __init__.py:30(__iter__)
...
lots of getVersion
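
For anyone wanting to reproduce a report like the one above, here is a minimal sketch using the standard library's cProfile and pstats. `parse_user_agents` is a hypothetical stand-in for the real per-line work (it is not part of httpagentparser), and `profile_report` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def parse_user_agents(lines):
    # Hypothetical stand-in for the application's per-line work.
    return [line.lower().split() for line in lines]


def profile_report(func, *args, top=10):
    """Run func(*args) under cProfile and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" sorts by time spent inside each function itself,
    # which pstats labels "internal time" in the report header.
    stats.sort_stats("tottime").print_stats(top)
    return out.getvalue()


report = profile_report(
    parse_user_agents, ["Mozilla/5.0 (X11; Linux x86_64)"] * 1000
)
print(report)
```

Sorting by `tottime` rather than `cumtime` puts the functions that burn the time themselves at the top of the list, instead of their callers.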

@pepijndevos
Contributor Author

It seems to be a result of how the library works. It invokes all the detectors one by one to see if they match. This means the time per user agent grows linearly as more browsers are added. So I actually made it twice as slow by contributing a ton of bots and mobile browsers.
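
To make the linear cost concrete, here is a rough sketch of the "try every detector" pattern. It is not the library's actual code: the detector tuples, the `BOTS` list, and `detect_linear` are made up for illustration, and a real detector does more than a substring test.

```python
def detect_linear(user_agent, detectors):
    """Try every detector in turn; return (matches, number_of_checks)."""
    matches = []
    checks = 0
    for name, keyword in detectors:
        checks += 1  # every detector is consulted, match or not
        if keyword in user_agent:
            matches.append(name)
    return matches, checks


# Hypothetical detector tables: (name, keyword to look for).
DETECTORS = [("Firefox", "Firefox"), ("Chrome", "Chrome"), ("Safari", "Safari")]
BOTS = [("Bot%d" % i, "bot%d/" % i) for i in range(3)]

ua = "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"

print(detect_linear(ua, DETECTORS))         # 3 checks
print(detect_linear(ua, DETECTORS + BOTS))  # 6 checks for the same answer
```

Doubling the number of detectors doubles the number of checks, even though the result for this user agent does not change.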

So the only way to make a real difference is to detect fewer browsers, or do a major refactoring.

One could arrange the browsers in a tree. For example, if a mobile OS is detected, all desktop detectors could be skipped. Or if WebKit is detected, all Gecko and Trident detectors could be skipped.
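
A minimal sketch of the tree idea, assuming a two-level tree keyed by rendering engine. The `TREE` table, its keywords, and `detect_tree` are illustrative only and are not how the library is organised today.

```python
# Hypothetical two-level tree: engine keyword -> browser keywords.
TREE = {
    "AppleWebKit": ["Chrome", "Safari"],
    "Gecko/": ["Firefox", "SeaMonkey"],
    "Trident": ["MSIE"],
}


def detect_tree(user_agent, tree):
    """Return (browser_or_None, number_of_checks)."""
    checks = 0
    for engine, browsers in tree.items():
        checks += 1
        if engine not in user_agent:
            continue  # the whole subtree is skipped with one check
        for browser in browsers:
            checks += 1
            if browser in user_agent:
                return browser, checks
        return None, checks  # engine matched, but no known browser
    return None, checks


firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
chrome_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
)

print(detect_tree(firefox_ua, TREE))
print(detect_tree(chrome_ua, TREE))
```

The saving comes from the `continue`: a failed engine check rules out every browser below it, so adding more WebKit browsers does not slow down Gecko lookups.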

Another wild idea would be to flatten everything into a humongous regex/state machine. This requires more thought and design.
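
As a very rough sketch of the single-regex idea, the alternatives can be joined into one pattern with a named group per browser, and `lastgroup` then tells which one matched. The `TOKENS` list and `detect_regex` are assumptions for illustration. Note that Python's `re` is a backtracking engine rather than a true state machine, so whether this is actually faster would need measuring.

```python
import re

# Hypothetical token list; each becomes one named alternative.
TOKENS = ["Firefox", "Chrome", "Safari", "Opera"]

COMBINED = re.compile(
    "|".join("(?P<%s>%s/)" % (token, re.escape(token)) for token in TOKENS)
)


def detect_regex(user_agent):
    """Return the name of the first token found in the string, or None."""
    match = COMBINED.search(user_agent)
    # lastgroup is the name of the named group that took part in the match.
    return match.lastgroup if match else None


firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
chrome_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
)

print(detect_regex(firefox_ua))
print(detect_regex(chrome_ua))
print(detect_regex("curl/7.35.0"))
```

Here the leftmost match in the string wins, so a Chrome user agent reports Chrome because `Chrome/` appears before `Safari/`. A real design would have to decide such precedence explicitly.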

Related: https://github.com/clojure/core.match/wiki/Understanding-the-algorithm

@shon
Owner

shon commented Aug 12, 2014

I will look into this once I have a little free time from my current work. I am unsure how a regex-based solution will perform. Also, if you have any other ideas or POC code, please feel free to share.

@shon
Owner

shon commented Oct 24, 2014

One quick solution is moving the less popular agents to the existing more.py and making them optional. It would be interesting to see if that makes hap faster.

This should really be an issue so more people can notice.
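
The opt-in idea for less popular agents could look roughly like the sketch below. The two lists, the `include_extra` flag, and both functions are hypothetical and do not reflect the actual contents or interface of more.py.

```python
# Hypothetical split: a small core set plus opt-in extras.
CORE_DETECTORS = ["Firefox", "Chrome", "Safari", "MSIE", "Opera"]
EXTRA_DETECTORS = ["Googlebot", "bingbot", "SeaMonkey"]


def active_detectors(include_extra=False):
    """Return the detector tokens to run; extras only on request."""
    detectors = list(CORE_DETECTORS)
    if include_extra:
        detectors.extend(EXTRA_DETECTORS)
    return detectors


def detect(user_agent, include_extra=False):
    """Return the first matching token, or None."""
    for token in active_detectors(include_extra):
        if token in user_agent:
            return token
    return None


bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(detect(bot_ua))                      # extras off: bot goes undetected
print(detect(bot_ua, include_extra=True))  # extras on: more checks, more hits
```

The default path runs only the core list, so callers who need bots or niche browsers pay for the extra checks and everyone else does not.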

@pepijndevos
Contributor Author

That would make it faster, at the cost of detecting fewer agents.
But yeah, if you just want to detect the 5 major browsers on 3 major OSes, that's fine.

@lenisko

lenisko commented Feb 21, 2018

I have to admit, it's very slow...


3 participants