ScreenshotToR2 is a Node.js crawler that runs on AWS Lambda. Given an array of URLs, it collects basic page information, captures a screenshot of each page, and uploads the screenshots to Cloudflare R2 in a few minutes.
- URL Accessibility Check: Test if the input URL is still accessible.
- SEO & AI Summary Ready: If the URL is accessible, fetch the h1 and h2 headers for future AI summaries or SEO reference, and detect affiliate links.
- Screenshot Capture & Storage: Take a screenshot of the webpage and upload it to Cloudflare R2.
- Free Tier Utilization: Maximize the use of AWS Lambda's free tier (even for new users) and Cloudflare R2's free storage for screenshots.
- Step Function Integration: If you have many URLs, you can orchestrate this Lambda function with AWS Step Functions for better workflow management.
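As a rough illustration of the header and affiliate-link features listed above, here is a minimal sketch that works on a raw HTML string. The function names, the regex-based parsing, and the list of query parameters treated as affiliate markers are all assumptions made for this example; the project itself may read these values from the rendered page in a different way.

```javascript
// Hypothetical helper: collect the text of every <h1> and <h2> in an HTML string.
function extractHeaders(html) {
  const headers = { h1: [], h2: [] };
  const pattern = /<(h1|h2)\b[^>]*>([\s\S]*?)<\/\1>/gi;
  for (const match of html.matchAll(pattern)) {
    const tag = match[1].toLowerCase();
    // Strip nested tags and collapse whitespace inside the heading.
    const text = match[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    if (text) headers[tag].push(text);
  }
  return headers;
}

// Assumed affiliate markers; adjust to the affiliate programs you care about.
const AFFILIATE_PARAMS = ['ref', 'aff', 'affiliate', 'tag'];

// Hypothetical helper: return absolute links whose query string has an affiliate marker.
function findAffiliateLinks(html) {
  const links = [];
  for (const match of html.matchAll(/href\s*=\s*["']([^"']+)["']/gi)) {
    let url;
    try {
      url = new URL(match[1]);
    } catch {
      continue; // relative or malformed link, skipped in this sketch
    }
    if (AFFILIATE_PARAMS.some((param) => url.searchParams.has(param))) {
      links.push(url.href);
    }
  }
  return links;
}
```

Regex parsing is fragile on real-world markup, so treat this only as a picture of the data being collected, not as a parser to rely on.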
Service | Free Tier Details | Link |
---|---|---|
Cloudflare R2 | 10 GB storage, 1 million Class A operations, and 10 million Class B operations per month. Egress (data transfer to Internet) is free. | Cloudflare R2 Pricing |
AWS Lambda | 1 million free requests and 400,000 GB-seconds of compute time per month. | AWS Lambda Free Tier |
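To get a feel for how far the Lambda free tier in the table stretches, the arithmetic can be sketched as below. The memory size and average duration are assumptions you would replace with your own function settings and measurements; only the two free-tier limits come from the table.

```javascript
// Free-tier limits taken from the table above.
const FREE_GB_SECONDS = 400000;
const FREE_REQUESTS = 1000000;

// Estimate how many invocations fit in the monthly free tier.
// memoryMb and avgDurationSeconds are assumed inputs, not project defaults.
function estimateFreeInvocations(memoryMb, avgDurationSeconds) {
  const gbSecondsPerInvocation = (memoryMb / 1024) * avgDurationSeconds;
  const byCompute = Math.floor(FREE_GB_SECONDS / gbSecondsPerInvocation);
  // The request quota caps the result when invocations are cheap.
  return Math.min(byCompute, FREE_REQUESTS);
}

console.log(estimateFreeInvocations(2048, 10)); // 20000
```

With 2 GB of memory and 10 seconds per invocation, each run costs 20 GB-seconds, so compute time rather than the request count is the limiting quota.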
- Node.js (version 20.x or higher)
- Serverless Framework CLI, installed with:
npm install -g serverless
- AWS account with permissions to create Lambda functions (Sign up for AWS)
- Cloudflare account for R2 (Sign up for Cloudflare)
git clone https://github.com/yourusername/ScreenshotToR2.git
cd ScreenshotToR2
npm install
Create a .env file based on .env_example and fill in your credentials.
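The sketch below shows one way the credentials from .env could be turned into connection settings for R2's S3-compatible API. The variable names (R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET) and the function name are hypothetical; use whatever names .env_example actually defines.

```javascript
// Hypothetical variable names; check .env_example for the real ones.
const REQUIRED_VARS = ['R2_ACCOUNT_ID', 'R2_ACCESS_KEY_ID', 'R2_SECRET_ACCESS_KEY', 'R2_BUCKET'];

// Build S3-compatible client options for Cloudflare R2 from an env-like object.
function buildR2Config(env) {
  const missing = REQUIRED_VARS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    bucket: env.R2_BUCKET,
    clientOptions: {
      region: 'auto', // R2 uses "auto" as its region
      endpoint: `https://${env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
      credentials: {
        accessKeyId: env.R2_ACCESS_KEY_ID,
        secretAccessKey: env.R2_SECRET_ACCESS_KEY,
      },
    },
  };
}
```

Calling buildR2Config(process.env) at startup fails fast when a value is missing. If the project uploads through an S3 client such as the one in @aws-sdk/client-s3 (an assumption here), clientOptions has the shape that client expects.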
- Configure your AWS credentials:
serverless config credentials --provider aws --key YOUR_ACCESS_KEY --secret YOUR_SECRET_KEY
Alternatively, if you have an AWS profile set up, you can specify the profile in your serverless.yml:
provider:
  name: aws
  runtime: nodejs20.x
  profile: your-aws-profile
  region: us-east-1
- Deploy the service:
serverless deploy
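Invoke the function with a JSON payload containing an array of URLs to capture screenshots and upload them to Cloudflare R2. Each entry has a url and a name. For example, save the following as data.json and pass it to the invoke command: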
{
  "urls": [
    { "url": "https://example.com", "name": "example" },
    { "url": "https://another-example.com", "name": "another-example" }
  ]
}
serverless invoke -f screencapturesToR2 -p data.json
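Because the payload is hand-written JSON, it can help to validate it before any browser work starts. The following is a minimal sketch of such a check; validatePayload is a hypothetical helper, and the rules it applies (http or https URLs only, non-empty name) are assumptions rather than the handler's documented behavior.

```javascript
// Hypothetical helper: split payload entries into usable and rejected ones.
function validatePayload(event) {
  const valid = [];
  const rejected = [];
  const entries = Array.isArray(event && event.urls) ? event.urls : [];
  for (const entry of entries) {
    const name = entry && typeof entry.name === 'string' ? entry.name.trim() : '';
    let parsed = null;
    try {
      parsed = new URL(entry && entry.url);
    } catch {
      parsed = null; // not an absolute URL
    }
    const isHttp = parsed !== null &&
      (parsed.protocol === 'http:' || parsed.protocol === 'https:');
    if (isHttp && name) {
      valid.push({ url: parsed.href, name });
    } else {
      rejected.push(entry);
    }
  }
  return { valid, rejected };
}
```

Returning the rejected entries alongside the valid ones lets the caller report bad input without failing the whole invocation.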
I opted not to use TypeScript in this project due to compatibility issues with the @sparticuz/chromium package, which can lead to errors during implementation. For more details, refer to the discussion here.
The batching logic (results.length >= BATCH_SIZE) is optional, but it helps manage memory usage and lets the function handle large sets of URLs efficiently.
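The batching idea can be sketched as follows: results accumulate in an array and are handed off whenever results.length >= BATCH_SIZE, so the array never grows beyond one batch. The BATCH_SIZE value and the processOne and flushBatch callbacks are placeholders for this example, not the project's actual functions.

```javascript
const BATCH_SIZE = 10; // assumed value for illustration

// Process entries one by one, flushing accumulated results in batches.
// processOne and flushBatch are hypothetical async callbacks supplied by the caller.
async function processUrls(entries, processOne, flushBatch, batchSize = BATCH_SIZE) {
  let results = [];
  let flushed = 0;
  for (const entry of entries) {
    results.push(await processOne(entry));
    if (results.length >= batchSize) {
      await flushBatch(results);
      flushed += results.length;
      results = []; // release the batch so memory stays bounded
    }
  }
  // Flush whatever is left after the last full batch.
  if (results.length > 0) {
    await flushBatch(results);
    flushed += results.length;
  }
  return flushed;
}
```

In this sketch flushBatch might upload or persist a batch of results, and the return value is the total number of results handed off.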