Rewrite #55

Raxvis · 2019-02-10T02:00:17Z

This issue thread will be used to keep everyone apprised of the rewrite taking place.

Raxvis · 2019-02-10T02:03:49Z

I have run into trouble with MySQL, as the database isn't fast enough. On top of that, the torrent scraper (which fetches metadata) appears to be broken in the new rewrite.

I have been looking for a way to overcome both problems and have worked through two iterations with no success. I am now working on a third iteration that I hope will be more successful.

The new rewrite should fix a lot of the issues you guys are seeing with the tracker and scraper not keeping up.

ghost · 2019-02-10T02:18:15Z

I agree MySQL isn't that great for what we're running here, although I was able to get some queries down to milliseconds using indexes. Bear in mind that's only with a database of 3 million; query times would increase greatly as we reach 20+ million. Could you tell us what the two iterations you have tried were, and what your third is? I'm interested to see if we can provide any ideas.

Kind regards

Raxvis · 2019-02-10T02:46:21Z

Both iterations were based on separating out the tracker, scraper, and torrent lookup (the first one used MySQL and the second one used Redis). This next iteration is going to isolate the individual actions but run them in a single process that has access to the DHT server and the DHT nodes (for scraping metadata).

ghost · 2019-02-10T09:44:38Z

What data store do you plan to use now, still Redis? Have you looked at MongoDB? I've seen another DHT scraper on GitHub using it.

Raxvis · 2019-02-11T07:00:19Z

Redis and ElasticSearch are the two that I will probably be using.

Redis for the peer / node information and ElasticSearch for the torrent information.
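To illustrate the split described above, here is a minimal sketch with in-memory stand-ins. It uses no real Redis or ElasticSearch client; `PeerStore`, `TorrentIndex`, the 30 minute expiry, and the document fields are all assumptions made for illustration, not the project's actual schema.

```javascript
// Hypothetical in-memory stand-ins; no real Redis or ElasticSearch client is used.

// Short-lived peer data with an expiry time (the role planned for Redis).
class PeerStore {
  constructor(ttlMs = 30 * 60 * 1000) { // assumed TTL, not from the thread
    this.ttlMs = ttlMs;
    this.peers = new Map(); // infoHash -> Map("ip:port" -> expiry timestamp)
  }

  add(infoHash, ip, port, now = Date.now()) {
    if (!this.peers.has(infoHash)) this.peers.set(infoHash, new Map());
    this.peers.get(infoHash).set(`${ip}:${port}`, now + this.ttlMs);
  }

  // Return only the peers that have not expired yet.
  list(infoHash, now = Date.now()) {
    const entries = this.peers.get(infoHash) || new Map();
    return [...entries]
      .filter(([, expiry]) => expiry > now)
      .map(([peer]) => peer);
  }
}

// Searchable torrent documents (the role planned for ElasticSearch).
class TorrentIndex {
  constructor() {
    this.docs = new Map(); // infoHash -> { name, length }
  }

  index(infoHash, doc) {
    this.docs.set(infoHash, doc);
  }

  // Naive case-insensitive substring match on the torrent name.
  search(term) {
    const needle = term.toLowerCase();
    return [...this.docs]
      .filter(([, doc]) => doc.name.toLowerCase().includes(needle))
      .map(([infoHash, doc]) => ({ infoHash, ...doc }));
  }
}

// Usage with placeholder data.
const infoHash = 'aa'.repeat(20);
const peers = new PeerStore();
peers.add(infoHash, '10.0.0.1', 6881);
const torrents = new TorrentIndex();
torrents.index(infoHash, { name: 'Example Torrent', length: 1024 });
console.log(peers.list(infoHash));
console.log(torrents.search('example'));
```

The point of the split is that peer entries are written constantly and go stale quickly, while torrent documents are written once and then searched.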

Raxvis · 2019-02-11T07:02:05Z

Just an update: I have completely rewritten the DHT server portion and put it into its own package here: https://github.com/AlphaReign/dht-server

This is a standalone DHT server that will work as the backbone of our scraper, but will also allow us to query the DHT network for peer information so that we can download. With this done, I can set up the initial code to just keep looking for peers and getting torrent announcements without having it tied directly into the scraper.

Raxvis · 2019-02-11T07:04:42Z

You can check out this branch here: https://github.com/AlphaReign/scraper/tree/split-fix and run:

yarn
node ./src/index.js

to watch it find torrents.

ghost · 2019-02-11T07:42:26Z

Thanks, will check this out today :)

milezzz · 2019-02-11T07:59:15Z

Awesome work!

ghost · 2019-02-11T10:14:24Z

node ./src/index.js
module.js:550
throw err;
^

Error: Cannot find module 'dht-server'
at Function.Module._resolveFilename (module.js:548:15)
at Function.Module._load (module.js:475:25)
at Module.require (module.js:597:17)
at require (internal/module.js:11:18)
at Object.<anonymous> (/root/newscraper/src/index.js:1:75)
at Module._compile (module.js:653:30)
at Object.Module._extensions..js (module.js:664:10)
at Module.load (module.js:566:32)
at tryModuleLoad (module.js:506:12)
at Function.Module._load (module.js:498:3)

@Prefinem

ghost · 2019-02-11T12:11:14Z

Never mind, I installed dht-server and bencode.

milezzz · 2019-02-11T14:52:17Z

seems to be working:

onGetPeersQuery - new torrent: 8eff86639946d68f2cea7485c59a3790794f78b9
onGetPeersQuery - new torrent: ef719bfbe716bd970afb4e269eab5ccb8fc1b3f2
total nodes 2000
onGetPeersQuery - new torrent: fc9b2d35164542b5704cef777b3b2560fe485cf9
onGetPeersQuery - new torrent: ad4f9ce5aa00943c01da3fd551250bd367729a7a
onGetPeersQuery - new torrent: 1224b03c763dafedae76d1a2dfb16a0396c90e72
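For readers wondering where lines like these come from, here is a rough sketch of a handler that could produce them. It is not the scraper's actual code: only the name `onGetPeersQuery` and the log wording are taken from the output above, and the rest is an assumption. The message shape follows the `get_peers` query described in BEP 5, where the `info_hash` argument is a 20-byte value (a Buffer once bencode-decoded in Node).

```javascript
// Hypothetical sketch; not the actual scraper implementation.
const seen = new Set(); // info hashes already reported

// `message` is a decoded KRPC query as described in BEP 5:
// { t, y: 'q', q: 'get_peers', a: { id: Buffer, info_hash: Buffer } }
function onGetPeersQuery(message, log = console.log) {
  const infoHashBuffer = message && message.a && message.a.info_hash;
  if (!Buffer.isBuffer(infoHashBuffer) || infoHashBuffer.length !== 20) {
    return null; // malformed query, ignore it
  }
  const hex = infoHashBuffer.toString('hex');
  if (seen.has(hex)) {
    return null; // this torrent was already reported
  }
  seen.add(hex);
  log(`onGetPeersQuery - new torrent: ${hex}`);
  return hex;
}

// Usage with a hand-built query (hash copied from the log above).
const query = {
  t: Buffer.from('aa'),
  y: Buffer.from('q'),
  q: Buffer.from('get_peers'),
  a: {
    id: Buffer.alloc(20, 1),
    info_hash: Buffer.from('8eff86639946d68f2cea7485c59a3790794f78b9', 'hex'),
  },
};
onGetPeersQuery(query); // logs the hash the first time
onGetPeersQuery(query); // duplicate, logs nothing
```

Every `get_peers` query a node receives names the torrent the sender is looking for, which is why sitting in many routing tables surfaces new info hashes without asking for them.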

jangrewe · 2019-09-15T18:12:53Z

If one were running the current scraper, is the dht-server a fully working replacement (feature-wise, at least), or just a PoC for now?

Raxvis · 2019-09-15T21:24:53Z

Not currently. The end goal of this project is to have a working dht-server in its own package. There are a few currently out there on NPM, but I have found most of them aren't suitable for a scraper, so I had planned on taking the pieces I have right now and finishing them up into a full-fledged one.

The majority of the DHT server is here: https://github.com/AlphaReign/scraper/blob/master/src/crawler.js

What it mainly lacks is hooks for each method, and public methods for the external hooks. A good data backend is also required for performance. I had tested MongoDB but it couldn't perform under the load. Same with SQLite. My next stop will be Redis, or another in-memory cache. This is actually a large part of the reason the other dht-servers don't work: most of them a) don't maintain enough nodes and b) are slow to respond. This scraper works by being on the peer lists of thousands, if not tens of thousands, of nodes so that it receives their announcements.
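As a rough illustration of the two missing pieces mentioned above (hooks, and an in-memory store that can hold a large number of nodes), here is a minimal sketch built on Node's `EventEmitter`. The `NodeTable` class, the `nodeAdded` / `nodeEvicted` event names, and the eviction policy are all hypothetical and are not taken from crawler.js.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch; class and event names are made up for illustration.
class NodeTable extends EventEmitter {
  constructor(maxNodes = 100000) {
    super();
    this.maxNodes = maxNodes;
    this.nodes = new Map(); // node id (hex) -> { address, port, lastSeen }
  }

  // Add or refresh a node; evict the least recently seen one when full.
  upsert(id, address, port, now = Date.now()) {
    const isNew = !this.nodes.has(id);
    // Delete first so re-insertion moves the node to the end of the Map,
    // which keeps the Map ordered from least to most recently seen.
    this.nodes.delete(id);
    if (isNew && this.nodes.size >= this.maxNodes) {
      const oldest = this.nodes.keys().next().value;
      this.nodes.delete(oldest);
      this.emit('nodeEvicted', oldest); // external hook
    }
    this.nodes.set(id, { address, port, lastSeen: now });
    if (isNew) this.emit('nodeAdded', id); // external hook
    return isNew;
  }

  get size() {
    return this.nodes.size;
  }
}

// Usage: a tiny table so the eviction is easy to follow.
const table = new NodeTable(2);
table.on('nodeAdded', (id) => console.log('added', id));
table.on('nodeEvicted', (id) => console.log('evicted', id));
table.upsert('aa', '10.0.0.1', 6881);
table.upsert('bb', '10.0.0.2', 6881);
table.upsert('aa', '10.0.0.1', 6881); // refresh: 'aa' becomes most recent
table.upsert('cc', '10.0.0.3', 6881); // table is full, so 'bb' is evicted
```

Keeping the table in process memory avoids a database round trip per packet; whether that is enough for 100K+ nodes would need measuring.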

Ideally the project would be broken down into:
a) dht-server able to support 100K+ nodes
b) tracker (such as opentracker) that also helps maintain a list of torrents
c) api for torrent information / searching

That, or another idea I have had in mind is to set up AlphaReign nodes that are dht-servers but also support a second protocol for sharing torrent information between the AlphaReign nodes, so that everyone using the AlphaReign scraper can help share the torrent information.
