Academic paper scraping with abstracts from Google Scholar and more...

amirbabaei97/paper_scrapper

Scholarly Paper Scraper

Introduction

🔍 A simple, fast tool that searches scholarly search engines for keywords and retrieves relevant academic information, including titles, authors, and abstracts. It then ranks each result with a predefined scoring function, which users can tune.
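The README does not describe how the scoring function works, so the following is only a minimal sketch of one possible approach: a weighted count of keyword occurrences in the title and abstract. The `Paper` record, the `score` and `rank` functions, and the weight parameters are all hypothetical names chosen for illustration and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record shape; the project's real fields may differ.
    title: str
    authors: list[str]
    abstract: str


def score(paper: Paper, keyword: str, title_weight: float = 2.0,
          abstract_weight: float = 1.0) -> float:
    """Weighted count of case-insensitive keyword occurrences (illustrative only)."""
    needle = keyword.lower()
    if not needle:
        return 0.0
    title_hits = paper.title.lower().count(needle)
    abstract_hits = paper.abstract.lower().count(needle)
    return title_weight * title_hits + abstract_weight * abstract_hits


def rank(papers: list[Paper], keyword: str, **weights: float) -> list[Paper]:
    """Return the papers sorted from highest to lowest score."""
    return sorted(papers, key=lambda p: score(p, keyword, **weights), reverse=True)
```

In this sketch, tuning the ranking means passing different weights, for example `rank(papers, "machine learning", title_weight=3.0)`.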

Features

Current Features

  • 🌐 Keyword search on Google Scholar.
  • 📑 Extraction of titles, authors, and abstracts.

Upcoming Milestones

  • 🛡️ Implement proxies to prevent blocking.
  • 💬 Implement an API for easier handling.
  • 💬 Develop a custom ChatGPT interface for the scraper.
  • 📄 Implement a scoring function to rank the papers.

Getting Started

Prerequisites

  • Python 3.x
  • BeautifulSoup

Installation

# Clone the repository and install its dependencies
git clone https://github.com/amirbabaei97/paper_scrapper
cd paper_scrapper
pip install -r requirements.txt

Usage

🚀 How to use the tool:

# Search for papers matching a keyword
python scraper.py --keyword "machine learning"

Output format: Results are presented in a structured JSON format.
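The exact JSON schema is not documented here. As a rough sketch, assuming each result carries the title, authors, and abstract listed under Current Features, the `--keyword` flag and the JSON output could be wired together as follows. The function names `build_parser` and `to_json` and the field names `keyword` and `results` are assumptions made for illustration, not the project's actual interface.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser mirroring the --keyword flag shown above."""
    parser = argparse.ArgumentParser(description="Search scholarly papers by keyword.")
    parser.add_argument("--keyword", required=True, help="search phrase")
    return parser


def to_json(keyword: str, papers: list[dict]) -> str:
    """Serialize results; the top-level field names here are hypothetical."""
    payload = {"keyword": keyword, "results": papers}
    # ensure_ascii=False keeps non-ASCII author names readable in the output.
    return json.dumps(payload, indent=2, ensure_ascii=False)


# Passing argv explicitly so the example runs without a real command line.
args = build_parser().parse_args(["--keyword", "machine learning"])
example = [{"title": "An Example Paper", "authors": ["A. Author"], "abstract": "..."}]
print(to_json(args.keyword, example))
```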

Roadmap

🚧 Future enhancements:

  • Integration with additional scientific paper search engines:
    • Google Scholar
    • arXiv
    • Semantic Scholar
    • Open Review
    • Science.gov
    • core.ac.uk
    • Science Direct
    • PubMed
    • Scopus
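As an illustration of what one such integration might involve, here is a hedged sketch for arXiv, whose public query API returns results as an Atom feed. The helper names `arxiv_query_url` and `parse_arxiv_feed` and the output field names are hypothetical. The sketch only builds the request URL and parses a feed that has already been downloaded, so it makes no network calls itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(keyword: str, max_results: int = 10) -> str:
    """Build a query URL that searches all fields for the quoted phrase."""
    params = {
        "search_query": f'all:"{keyword}"',
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_arxiv_feed(xml_text: str) -> list[dict]:
    """Extract title, authors, and abstract from each Atom entry."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        authors = [
            author.findtext("atom:name", default="", namespaces=ATOM)
            for author in entry.findall("atom:author", ATOM)
        ]
        papers.append({
            # Collapse the line breaks and indentation found in feed text.
            "title": " ".join(title.split()),
            "authors": authors,
            "abstract": " ".join(summary.split()),
        })
    return papers
```

Returning the same title, authors, and abstract fields for every source would let results from different engines share one ranking step.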

Contributing

🤝 We welcome contributions!

  • Please fork the repository and open a PR that explains your changes.
  • For major changes, please open an issue first to discuss what you would like to change.

License

📄 This project is licensed under the GNU General Public License.

Acknowledgments

  • Hat tip to ChatGPT for helping in the development process.
  • Thank you to arXiv for use of its open access interoperability.
  • Thank you to Semantic Scholar for providing a free API key for this project.
