An automated web crawler to fetch articles across various websites

samarthbc/python_web_crawler

Python Web Crawler

Python Web Crawler is a lightweight tool for fetching articles from websites. It uses Requests to download pages and Beautiful Soup to parse the HTML and extract content.

Features

  • Fetch articles from multiple websites.
  • Parse HTML pages and extract article data.
  • Customizable to suit specific website structures and requirements.

Technologies Used

  • Beautiful Soup: For parsing and navigating the HTML structure of websites.
  • Requests: For making HTTP requests to fetch web pages.
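
As a rough illustration of how these two libraries usually fit together, here is a minimal sketch. It is not taken from crawler.py: the helper names fetch_html and extract_title_and_paragraphs, the User-Agent string, and the sample HTML are assumptions made for this example only.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(
        url,
        headers={"User-Agent": "python-web-crawler-example/0.1"},  # assumed value
        timeout=timeout,
    )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


def extract_title_and_paragraphs(html: str) -> dict:
    """Parse HTML and return the page title plus all non-empty <p> texts."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return {"title": title, "paragraphs": [text for text in paragraphs if text]}


# Inline sample so the parsing step can be tried without network access.
SAMPLE_HTML = """
<html><head><title>Example Article</title></head>
<body><p>First paragraph.</p><p>  </p><p>Second <b>bold</b> paragraph.</p></body></html>
"""

if __name__ == "__main__":
    print(extract_title_and_paragraphs(SAMPLE_HTML))
```

Keeping the download step separate from the parsing step means the parsing logic can be checked against saved HTML without contacting a live site.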

Getting Started

Prerequisites

  • Python 3.8 or later installed on your system.

Installation

  1. Clone the repository:
    git clone https://github.com/samarthbc/python_web_crawler.git
  2. Navigate to the project directory:
    cd python_web_crawler
  3. Install required libraries:
    pip install -r requirements.txt

Usage

  1. Configure the target websites and parsing logic in crawler.py to match your requirements.
  2. Run the crawler script:
    python crawler.py
  3. Extracted data is stored or displayed, depending on how the script is configured.
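
One possible way to keep site-specific parsing logic configurable is a small table of CSS selectors per site. The sketch below is an assumption about how that could look, not a description of what crawler.py actually contains: SITE_CONFIG, the "example-news" entry, its selectors, and parse_articles are all hypothetical names.

```python
from bs4 import BeautifulSoup

# Hypothetical per-site settings: each entry maps a site name to the CSS
# selectors that locate an article container, its title, and its link.
SITE_CONFIG = {
    "example-news": {
        "article": "article.post",
        "title": "h2",
        "link": "a",
    },
}


def parse_articles(html: str, selectors: dict) -> list:
    """Return a list of {"title", "url"} dicts for each matching article node."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for node in soup.select(selectors["article"]):
        title_node = node.select_one(selectors["title"])
        link_node = node.select_one(selectors["link"])
        if title_node is None:
            continue  # skip containers that have no recognizable title
        articles.append({
            "title": title_node.get_text(strip=True),
            "url": link_node.get("href") if link_node else None,
        })
    return articles


# Inline sample page; the third block does not match "article.post".
SAMPLE_PAGE = """
<article class="post"><h2><a href="/a1">First story</a></h2></article>
<article class="post"><h2><a href="/a2">Second story</a></h2></article>
<article class="ad"><h2>Sponsored</h2></article>
"""

if __name__ == "__main__":
    for item in parse_articles(SAMPLE_PAGE, SITE_CONFIG["example-news"]):
        print(item["title"], item["url"])
```

With this layout, supporting another website would mean adding one more entry to the table rather than writing a new parsing function, provided the site's articles can be located with plain CSS selectors.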

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature/your-feature-name
  3. Commit your changes:
    git commit -m "Add your message here"
  4. Push the branch:
    git push origin feature/your-feature-name
  5. Open a Pull Request.

License

This project is licensed under the MIT License.

Contact

For questions or support, please contact [email protected].
