Agentic Scraper 🕷️

Agentic Scraper is a web scraper designed to extract information from websites using AgentQL and Playwright, storing the results in CSV and JSON formats for easy data analysis.

Features

Field Selection: Allows users to select specific fields to scrape (e.g., product name, price, number of reviews, and rating).
Pagination Handling: Automatically scrapes multiple pages of results.
Customizable AgentQL Queries: Users can adjust the GraphQL-like queries to target the desired data fields.
Streamlit UI: User-friendly interface for configuring scraping settings.
Data Download Options: Export scraped data to CSV and JSON formats.

Prerequisites

Before you begin, ensure you have the following:

Python 3.9+
Playwright for browser automation
AgentQL account and API key from the AgentQL Dashboard

Installation

Clone this repository:

git clone https://github.com/Hassn11q/Agentic-Scraper.git
cd Agentic-Scraper

Install the required Python packages:
```
pip install -r requirements.txt
```
Install Playwright browsers:
```
playwright install
```
Get your AgentQL API key from the AgentQL Dashboard.
Create a .env file in the project root and add your AgentQL API key:
```
AGENTQL_API_KEY=your_api_key_here
```
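The README does not say how app.py reads this file. A common choice is the python-dotenv package, but that is an assumption. As a stdlib-only sketch, the hypothetical helpers below (`load_env_file` and `require_api_key` are illustrative names, not part of this project) load simple KEY=VALUE lines from .env into the process environment and fail fast if the key is missing:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper: blank lines and '#' comments are skipped, surrounding
    quotes are stripped, and variables already set in the environment win.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # No .env file: rely on the existing environment.
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)  # Do not override real env vars.
    return loaded


def require_api_key(name: str = "AGENTQL_API_KEY") -> str:
    """Return the API key, or raise a clear error if it is not configured."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file.")
    return value
```

Calling `load_env_file()` and then `require_api_key()` at startup gives a clear error before any browser is launched. This parser does not handle `export` prefixes or multi-line values, which is one reason to prefer python-dotenv in practice.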

Usage

Start the Streamlit App:
```
streamlit run app.py
```
Configure Scraper Settings:

Enter the target URL.
Use the Field Selection toggle in the sidebar to add desired fields.
Set pagination options to specify the number of pages to scrape.

Run the Scraper:

Click the Scrape button in the sidebar to initiate scraping.
The scraping progress and data extraction details are displayed in the main area.

Download the Results: Once the scraping is complete, download the data as a CSV or JSON file.
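How app.py builds the download files is not shown here. Assuming the scraped results are a list of dictionaries (one per product), a minimal stdlib sketch of the CSV and JSON export could look like this; `to_csv_bytes` and `to_json_bytes` are hypothetical names:

```python
import csv
import io
import json


def to_csv_bytes(rows: list[dict]) -> bytes:
    """Serialize rows to UTF-8 CSV; columns are the union of all keys, in first-seen order."""
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # DictWriter fills missing keys with "" (its default restval).
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def to_json_bytes(rows: list[dict]) -> bytes:
    """Serialize rows to pretty-printed UTF-8 JSON, keeping non-ASCII characters as-is."""
    return json.dumps(rows, ensure_ascii=False, indent=2).encode("utf-8")
```

The returned bytes could be passed, for example, to Streamlit's `st.download_button(label=..., data=..., file_name=..., mime=...)`. Building the CSV header from the union of keys keeps a row with a missing field (say, a product without a rating) from breaking the export.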

Configuration

You can customize the following variables in app.py to adjust the scraper's behavior:

url_input: Target URL for the e-commerce website.
items_query: AgentQL query (GraphQL-like syntax) that fetches specific product data fields.
pagination_query: AgentQL query used to detect and handle pagination.
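AgentQL queries use a GraphQL-like syntax: braces group fields, and a `[]` suffix asks for a list of matching items. The sketch below shows one hypothetical way the selected fields could be turned into `items_query`. The `build_items_query` helper, the `products` container name, the example URL, and the pagination field names are all illustrative assumptions, not values taken from app.py:

```python
def build_items_query(fields: list[str], container: str = "products") -> str:
    """Build an AgentQL-style list query from user-selected field names.

    Hypothetical helper: each name is lowercased and its whitespace runs are
    joined with underscores; duplicates are dropped, keeping the user's order.
    """
    names: list[str] = []
    for field in fields:
        name = "_".join(field.strip().lower().split())
        if name and name not in names:
            names.append(name)
    if not names:
        raise ValueError("Select at least one field to scrape.")
    body = "\n".join(f"        {name}" for name in names)
    return f"{{\n    {container}[] {{\n{body}\n    }}\n}}"


# Illustrative values for the variables listed above (hypothetical, not copied from app.py).
url_input = "https://example.com/search?q=headphones"
items_query = build_items_query(["Product name", "Price", "Number of reviews", "Rating"])
pagination_query = """
{
    next_page_button_enabled
    next_page_button_disabled
}
"""
```

Because the helper only lowercases names and joins whitespace with underscores, a field containing punctuation (for example `price ($)`) would need extra cleanup before it forms a valid query term.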

Example Queries

Product Query: Customize the fields in items_query based on the available product attributes.
Pagination Query: Use pagination_query to check pagination status and handle multi-page scraping.
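The pagination handling described above can be sketched as a loop that scrapes the current page and then asks whether a next page exists. In this hypothetical version, two callables stand in for the real AgentQL/Playwright work: `scrape_page` might run `items_query` (for instance through `page.query_data(items_query)` on an AgentQL-wrapped Playwright page), and `go_to_next_page` might evaluate `pagination_query` and click the next button. Neither function is taken from app.py:

```python
from typing import Callable, Iterable


def scrape_all_pages(
    scrape_page: Callable[[], Iterable[dict]],
    go_to_next_page: Callable[[], bool],
    max_pages: int,
) -> list[dict]:
    """Collect rows from up to max_pages consecutive result pages.

    Hypothetical hooks: scrape_page returns the current page's items;
    go_to_next_page advances if possible and returns False when there is
    no next page.
    """
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    rows: list[dict] = []
    for page_number in range(1, max_pages + 1):
        rows.extend(scrape_page())
        # Stop at the requested page limit, or when pagination reports no next page.
        if page_number == max_pages or not go_to_next_page():
            break
    return rows
```

Passing the page-level operations in as functions keeps the stopping logic (page limit reached, or no next page) testable without launching a browser.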

Acknowledgments

AgentQL for providing the querying capabilities
Playwright for browser automation