An example integration of Scrapy with TOR proxies and Playwright

dmg0345/scrapy-tor-playwright-demo

Scrapy, TOR and Playwright demo

This repository shows an example integration of Scrapy, TOR proxies and Playwright.

The targets used in this demo are the following authorized playgrounds:

The following websites are also used to get more information about the proxies and the web browser:

Refer to the documentation here for more details.
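
As a rough illustration of how the three tools can fit together, the sketch below builds a Scrapy settings dictionary that sends a Playwright-driven browser through a proxy. It assumes the scrapy-playwright plugin is used, which this README does not confirm, so the repository may wire things differently. The helper name build_settings and the proxy address (the default TOR SOCKS port on localhost) are placeholders, not values taken from this project.

```python
from typing import Any, Dict


def build_settings(proxy_server: str = "socks5://127.0.0.1:9050") -> Dict[str, Any]:
    """Return Scrapy settings that route a Playwright browser through a proxy.

    Assumption: the scrapy-playwright plugin is installed. The default proxy
    address is a placeholder and must match the TOR proxy actually in use.
    """
    handler = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
    return {
        # Let scrapy-playwright handle both HTTP and HTTPS downloads.
        "DOWNLOAD_HANDLERS": {"http": handler, "https": handler},
        # scrapy-playwright requires the asyncio-based Twisted reactor.
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "PLAYWRIGHT_BROWSER_TYPE": "chromium",
        # Playwright accepts a proxy definition in its browser launch options.
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"proxy": {"server": proxy_server}},
    }


if __name__ == "__main__":
    for key, value in build_settings().items():
        print(f"{key} = {value!r}")
```

The returned dictionary could be assigned to a spider's custom_settings attribute or copied into settings.py; requests then opt in to browser rendering by setting "playwright": True in their meta.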

Development

Clone the repository as:

git clone https://github.com/dmg0345/scrapy_tor_playwright_demo

Ensure the GitHub file with the relevant environment variables exists where the compose.yaml file expects it, and that the correct paths are set in the manage.ps1 file for your environment. The base Docker image for the development container is available at DockerHub.

To develop using devcontainers and Visual Studio Code:

docker pull dmg00345/scrapy_tor_playwright_demo:latest
docker pull pickapp/tor-proxy:latest
./manage.ps1 run

Create a release

To generate a release, follow the steps below:

  1. Create a release branch from the develop branch, e.g. release/X.Y.Z.
  2. Update the version in the conf.py and pyproject.toml files.
  3. Create a pull request from the release branch to master with the changes, titled Release X.Y.Z.
  4. Once merged into master, create the release and tag from GitHub, and check that the production workflow passes for deployment.
  5. Delete the release/X.Y.Z branch.
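
Steps 1 and 2 above are easy to get out of sync, so a small pre-merge check can help. The sketch below is not part of this repository. It assumes pyproject.toml contains a line of the form version = "X.Y.Z" and conf.py a line of the form release = "X.Y.Z"; both key names, and the helpers find_version and check_release, are hypothetical and should be adapted to the actual files.

```python
import re
from typing import Optional

VERSION_RE = r"\d+\.\d+\.\d+"


def find_version(text: str, key: str) -> Optional[str]:
    """Return the X.Y.Z value assigned to `key` at the start of a line, if any."""
    pattern = rf'^\s*{re.escape(key)}\s*=\s*["\']({VERSION_RE})["\']'
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def check_release(branch: str, pyproject_text: str, conf_text: str) -> bool:
    """True when the release/X.Y.Z branch, pyproject.toml and conf.py agree."""
    branch_match = re.fullmatch(rf"release/({VERSION_RE})", branch)
    if branch_match is None:
        # Not a release branch, so there is no version to compare against.
        return False
    expected = branch_match.group(1)
    return (
        find_version(pyproject_text, "version") == expected  # assumed key name
        and find_version(conf_text, "release") == expected  # assumed key name
    )
```

For example, check_release("release/1.2.3", pyproject_text, conf_text) returns True only if both files declare 1.2.3, where the two text arguments are the contents of the files read from disk.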