Deadlocks on updating bot_db.botCount #54
Do you use realtime visits, or do you import the server log?
I process server logs from the last 24 hours. I have about 38k entries in the log file, and most of them are generated by bots (it is a dead site with lots of content).
Sorry, but that's the way Piwik works. I don't have a variable where I can remember the last visit, and I don't know whether the current hit is the last one, so I must do the update each time. The update should be no problem. The search before it is more of a threat, because the correct bot is searched for with a LIKE. The performance is strongly affected by the number of bot entries in your table.
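For readers who want to see the shape of that flow, here is a rough sketch of the SELECT-then-UPDATE round trip per hit. It is not the plugin's actual code: `bot_db`, `botCount` and `botLastVisit` appear in this issue, while `botId`, `botKeyword` and the connection details are assumptions:

```php
<?php
// Rough sketch (not the plugin's actual code) of the per-hit flow described
// above: find the matching bot via LIKE, then update its counters.
$pdo = new PDO('mysql:host=localhost;dbname=piwik', 'user', 'secret');
$userAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1)';

// One SELECT per tracked request, matching the user agent with LIKE.
$select = $pdo->prepare(
    "SELECT botId FROM bot_db WHERE :ua LIKE CONCAT('%', botKeyword, '%')"
);
$select->execute(['ua' => $userAgent]);
$botId = $select->fetchColumn();

if ($botId !== false) {
    // One UPDATE per tracked request - the statement that deadlocks when
    // two parallel import processes hit the same bot row.
    $pdo->prepare(
        "UPDATE bot_db SET botCount = botCount + 1, botLastVisit = NOW()
          WHERE botId = :id"
    )->execute(['id' => $botId]);
}
```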
You could use an object property as a counter and make the query only once, at the end of processing.
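A minimal sketch of that idea, assuming the import loop can call into a counter object per hit (all names here are hypothetical, not the plugin's API); the flush issues one UPDATE per bot, like the single-query example at the end of this thread:

```php
<?php
// Sketch of the proposed in-memory aggregation (hypothetical names, not
// the plugin's API): count hits per bot in a property, flush once at the end.
class BotCounter
{
    /** @var array<int, array{count: int, lastVisit: string}> keyed by botId */
    private array $pending = [];

    public function recordHit(int $botId, string $visitTime): void
    {
        if (!isset($this->pending[$botId])) {
            $this->pending[$botId] = ['count' => 0, 'lastVisit' => $visitTime];
        }
        $this->pending[$botId]['count']++;
        // Remember only the most recent visit time for this bot.
        $this->pending[$botId]['lastVisit'] =
            max($this->pending[$botId]['lastVisit'], $visitTime);
    }

    /** One UPDATE per bot instead of one per hit. */
    public function flush(PDO $pdo): void
    {
        $stmt = $pdo->prepare(
            "UPDATE bot_db
                SET botCount = botCount + :n,
                    botLastVisit = GREATEST(botLastVisit, :t)
              WHERE botId = :id"
        );
        foreach ($this->pending as $botId => $agg) {
            $stmt->execute(['n' => $agg['count'], 't' => $agg['lastVisit'], 'id' => $botId]);
        }
        $this->pending = [];
    }
}
```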
I strongly disagree. I have about 200 bots in my table - this is practically nothing; MySQL can load the whole table into memory and search it pretty quickly without involving the hard drive (I/O operations are the bottleneck in most cases). When you make an update/insert, MySQL has much more work to do (checking constraints, recalculating indexes, cache invalidation, etc.), and in the end the result must be written to the hard drive (which is slow). Inserting/updating data is almost always slower than simple selects.

MySQL also has a query cache, so even if you run 1k identical select queries, MySQL calculates the result only once and serves it from the cache the other 999 times, which increases performance a lot. But MySQL invalidates that cache every time you update data in one of the tables used by a cached query. So you are practically killing this feature, because after each select (which populates the cache) you make an update (which prunes that cache). Slow update queries are not the only problem here - you make the select queries slower too.
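The cache invalidation described here can be observed directly on MySQL 5.7 and older (the query cache was removed in 8.0, and must be enabled via `query_cache_type` to begin with). A rough check, with the same assumed names as above:

```php
<?php
// Rough way to observe the query-cache invalidation described above.
// Works only on MySQL <= 5.7 with the query cache enabled.
$pdo = new PDO('mysql:host=localhost;dbname=piwik', 'user', 'secret');

$qcacheHits = function () use ($pdo): int {
    $row = $pdo->query("SHOW STATUS LIKE 'Qcache_hits'")->fetch(PDO::FETCH_ASSOC);
    return (int) $row['Value'];
};

$sql = "SELECT botId FROM bot_db WHERE botKeyword = 'Googlebot'";
$pdo->query($sql);
$pdo->query($sql); // identical text, so this one is served from the cache
echo 'hits after repeated SELECT: ', $qcacheHits(), PHP_EOL;

// Any write to bot_db throws away every cached SELECT that touches it,
// so the next identical SELECT is computed from scratch again.
$pdo->exec("UPDATE bot_db SET botCount = botCount + 1 WHERE botKeyword = 'Googlebot'");
$pdo->query($sql);
echo 'hits after UPDATE: ', $qcacheHits(), PHP_EOL;
```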
You are right, 200 rows is nothing for a modern database engine. But 38k updates should be no problem either, even in a distributed environment. If you can find a way to improve performance, feel free to add the code. Help is always welcome.
I use 2 parallel processes. That gives me about 140 errors when processing a log file that is only 6 MB. I can't really imagine how I could import logs from bigger sites, which have over 1 GB of logs per day. You may say that 38k update queries should be no problem for a DBMS, but for me these 38k queries are unnecessary and they are creating a real problem: this plugin just makes log processing slow and unstable. Unfortunately, I don't have enough time for testing the solution that I proposed. For now I'm just disabling parallel processing on one Piwik instance and completely disabling the plugin on another installation (I need parallel processes on that one).
I normally work with a much bigger DBMS (DB2 on z/OS), and I would expect that when 2 threads want to update the same row, the first one gets the lock, makes the update, and releases the lock, and then the second thread gets its turn. It seems MySQL isn't that smart. And it seems not to be a problem for many users. If there are others who encounter deadlocks or want to help find a solution, please leave a comment. I will leave this issue open for discussion.
Piwik uses transactions for bulk processing, so a lock is released only after the whole transaction is committed. When one process locks one row in the bot table, the other process has to wait for that commit; if both transactions end up holding locks the other one needs, MySQL aborts one of them with a deadlock error.
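Until the updates are batched, one workaround on the import side could be to retry transactions that fail with MySQL error 1213 (ER_LOCK_DEADLOCK). A minimal sketch, not part of the plugin:

```php
<?php
// Generic deadlock-retry wrapper for the import side (a workaround
// sketch, not plugin code). MySQL reports deadlocks as error 1213.
function withDeadlockRetry(PDO $pdo, callable $work, int $maxRetries = 3): void
{
    for ($attempt = 0; ; $attempt++) {
        try {
            $pdo->beginTransaction();
            $work($pdo);
            $pdo->commit();
            return;
        } catch (PDOException $e) {
            $pdo->rollBack();
            // errorInfo[1] holds the driver error code; 1213 = ER_LOCK_DEADLOCK.
            if ((int)($e->errorInfo[1] ?? 0) !== 1213 || $attempt >= $maxRetries) {
                throw $e;
            }
            usleep(100000 * ($attempt + 1)); // short backoff before the retry
        }
    }
}

// Usage: wrap the per-bot update so a lost deadlock is simply retried.
// withDeadlockRetry($pdo, function (PDO $pdo) use ($botId) {
//     $pdo->prepare("UPDATE bot_db SET botCount = botCount + 1 WHERE botId = ?")
//         ->execute([$botId]);
// });
```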
FWIW, I am running into this issue doing log file processing now as well. However, it does confirm that the BotTracker plugin works for importing logs!
On every processing run I get multiple errors like:
Any chance of some optimization in this matter? Right now, when one bot visits my site 2k times, I get 2k update queries, but it could actually be a single query that increases `botCount` by the sum of visits and sets `botLastVisit` to the date of the last visit.
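For reference, the single aggregated statement suggested here could look roughly like this (`botId` and the variable values are assumed for illustration; `GREATEST` keeps whichever last-visit timestamp is newer):

```php
<?php
// The single aggregated statement suggested above: one UPDATE per bot
// instead of one per hit. Names and values are assumed for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=piwik', 'user', 'secret');
$botId = 1;                          // row of the bot seen in the log
$hits = 2000;                        // hits counted for this bot in one pass
$lastVisit = '2015-06-01 23:59:12';  // newest hit seen for this bot

$pdo->prepare(
    "UPDATE bot_db
        SET botCount = botCount + :n,
            botLastVisit = GREATEST(botLastVisit, :t)
      WHERE botId = :id"
)->execute(['n' => $hits, 't' => $lastVisit, 'id' => $botId]);
```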