Rewrite #55
This issue thread will be used to keep everyone apprised of the rewrite taking place.

Comments
I have run into trouble with MySQL, as the database isn't fast enough. On top of that, there are issues with the torrent scraper (which fetches metadata) that looks to be broken in the new rewrite. I have been working on finding a method to overcome both of these and went through a couple of iterations with no success. I am now working on a third iteration that I hope will be more successful. The new rewrite should fix a lot of the issues you are seeing with the tracker and scraper keeping up to speed.
I agree MySQL isn't that great for what we're running here, although I was able to get some queries down to milliseconds using indexes. Bear in mind that's only with a database of 3 million torrents; query times would increase greatly as we reach 20+ million. Could you tell us what the two iterations you tried were, and what your third is? I'm interested to see if we can provide any ideas. Kind regards
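For context, the kind of index that gets those lookups down to milliseconds might look like this. This is a sketch only: the `torrents` table, the `infohash` column, and the use of the `mysql2` client are assumptions, not the project's actual schema.

```js
// Sketch only: table/column names and connection details are illustrative.
const mysql = require('mysql2/promise');

(async () => {
	const db = await mysql.createConnection({
		host: 'localhost',
		user: 'root',
		database: 'alphareign',
	});

	// A unique index on the infohash turns the hot lookup path
	// ("is this torrent already known?") into an index seek
	// instead of a full table scan.
	await db.execute(
		'CREATE UNIQUE INDEX idx_torrents_infohash ON torrents (infohash)'
	);

	await db.end();
})();
```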
Both iterations were based on separating out the tracker, scraper, and torrent lookup (the first used MySQL, the second Redis). This next iteration is going to isolate the individual actions but run them in a single process that has access to the DHT server and the DHT nodes (for scraping metadata).
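A minimal sketch of that single-process layout, assuming a hypothetical event-emitting DHT server; the `DHTServer` stand-in, its event names, and the handlers are all illustrative, not the project's real API:

```js
// Illustrative only: `DHTServer` and its events are assumptions.
const EventEmitter = require('events');

class DHTServer extends EventEmitter {} // stand-in for the real DHT server

const dht = new DHTServer();

// Tracker concern: record which peers hold which infohash.
dht.on('announce_peer', (infohash, peer) => {
	// e.g. push { infohash, peer } into the peer store
});

// Lookup concern: watch what the network is searching for.
dht.on('get_peers', (infohash) => {
	// e.g. bump a "seen" counter for this infohash
});

// Scraper concern: fetch metadata from announcing peers,
// reusing the same DHT instance's view of the network.
dht.on('announce_peer', (infohash, peer) => {
	// e.g. open a peer-wire connection and request metadata (BEP 9)
});
```

The point of the single process is that all three concerns share one DHT instance and one routing table, instead of each maintaining its own view of the network.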
What data store do you plan to use now, still Redis? Have you looked at MongoDB? I've seen another DHT scraper using it on GitHub.
Redis and ElasticSearch are the two that I will probably be using: Redis for the peer/node information and ElasticSearch for the torrent information.
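That split might look something like the sketch below. It assumes the `ioredis` and v8 `@elastic/elasticsearch` clients, and all the key names, index name, and document shape are made up for illustration:

```js
// Sketch only: key names, index name, and document shape are illustrative.
const Redis = require('ioredis');
const { Client } = require('@elastic/elasticsearch');

const redis = new Redis(); // peer / node information (hot, ephemeral)
const es = new Client({ node: 'http://localhost:9200' }); // torrent info (searchable)

const recordPeer = async (infohash, peer) => {
	// Peers churn constantly, so keep them in Redis with a TTL
	// instead of writing every announce to a durable store.
	const key = `peers:${infohash}`;
	await redis.sadd(key, `${peer.host}:${peer.port}`);
	await redis.expire(key, 30 * 60); // 30 minutes
};

const recordTorrent = async (infohash, metadata) => {
	// Torrent metadata is what users search, so it goes to
	// ElasticSearch, keyed by infohash for idempotent upserts.
	await es.index({
		index: 'torrents',
		id: infohash,
		document: { name: metadata.name, files: metadata.files },
	});
};
```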
Just an update: I have completely rewritten the DHT server portion and put it into its own package here: https://github.com/AlphaReign/dht-server

This is a standalone DHT server that will work as the backbone of our scraper, but will also allow us to query the DHT network for peer information so that we can download. With this done, I can set up the initial code to just keep looking for peers and getting torrent announcements without tying it directly into the scraper.
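A standalone package opens up usage along these lines. This is an entirely hypothetical sketch: the constructor options, event names, and `getPeers` method are guesses at a plausible surface, not the package's documented API:

```js
// Hypothetical usage sketch; the real dht-server API may differ.
const DHTServer = require('dht-server');

const dht = new DHTServer({ port: 6881 }); // constructor options assumed

// Passive: collect announces flowing through the network.
dht.on('announce_peer', (infohash, peer) => {
	console.log(`announce ${infohash.toString('hex')} from ${peer.host}:${peer.port}`);
});

// Active: query the DHT for peers of a known torrent so it can be downloaded.
const infohash = Buffer.alloc(20); // placeholder 20-byte infohash
dht.getPeers(infohash, (peers) => {
	// hand the peers off to a downloader / metadata fetcher
});
```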
You can check out this branch here: https://github.com/AlphaReign/scraper/tree/split-fix and run `node ./src/index.js` to watch it find torrents.
Thanks, will check this out today :)
Awesome work! |
Running `node ./src/index.js` gives: Error: Cannot find module 'dht-server' @Prefinem

Never mind, I installed dht-server and bencode. Seems to be working.
If one were running the current
Not currently. The end goal of this project is to have a working dht-server in its own package. There are a few out there on npm already, but I have found most of them aren't suitable for a scraper, so I had planned on taking the pieces I have right now and finishing up with a full-fledged one. The majority of the DHT server is here: https://github.com/AlphaReign/scraper/blob/master/src/crawler.js

What it mainly lacks is hooks for each method, and public methods for the external hooks. A good data backend is also required for performance: I tested MongoDB, but it couldn't perform under the load, and the same was true of SQLite. My next stop will be Redis or another in-memory cache. This is actually a large reason the other dht-servers don't work: most of them a) don't maintain enough nodes and b) are slow to respond. This scraper works by being on the peer lists of thousands, if not tens of thousands, of nodes to get announcements from.

Ideally the project would be broken down into separate packages. That, or another idea I have had in mind is to set up AlphaReign nodes that are dht-servers but support a second protocol to share torrent information between each of the AlphaReign nodes, so that everyone using the AlphaReign scraper can help share the torrent information. A sketch of the per-method hooks idea follows.
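The "hooks for each method" idea could look something like this. It is a sketch only: the class and hook dispatch are assumptions about a design that isn't written yet, though the message fields (`q` for the query name, `a` for its arguments) follow the DHT protocol (BEP 5):

```js
// Sketch of per-method hooks on a DHT server; the class itself is illustrative.
const EventEmitter = require('events');

class HookableDHT extends EventEmitter {
	// Called for every incoming KRPC query; dispatches to one hook
	// per DHT method so a scraper can observe or extend behaviour.
	handleQuery(message, rinfo) {
		const method = message.q.toString(); // 'ping', 'find_node', 'get_peers', 'announce_peer'
		this.emit(method, message, rinfo);
	}
}

const dht = new HookableDHT();

// A scraper only subscribes to the hooks it cares about.
dht.on('announce_peer', (message, rinfo) => {
	const infohash = message.a.info_hash.toString('hex');
	console.log(`announce for ${infohash} from ${rinfo.address}`);
});
```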