This repository contains a Python script and a Jupyter Notebook for scraping startup data from the website of the National Incubation Center (NIC) Hyderabad in Sindh, Pakistan. The data covers startups incubated in the various cohorts at NIC Hyderabad.
The National Incubation Center (NIC) Hyderabad is a hub for fostering innovation and entrepreneurship in Sindh, Pakistan. This project aims to scrape and compile data on startups from the NIC Hyderabad website. The objective is to gather information on the startups, including their names, descriptions, and links to their detailed profiles, which can be used for further analysis and research.
The dataset generated from this project includes:
- Startup names
- NIC profile URLs
- Cohort numbers
- Additional details from their profile pages
The dataset can be accessed from GitHub: NIC Startups Data
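As a rough illustration of how records with the fields listed above could be extracted, here is a minimal sketch using BeautifulSoup. It runs on an inline HTML sample rather than the live site. The markup, the CSS classes, the `data-cohort` attribute, the `BASE_URL` placeholder, and the `parse_startups` helper are all assumptions made for this example; the real NIC Hyderabad pages and the script in this repository may work differently.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the real NIC Hyderabad pages may be structured differently.
SAMPLE_HTML = """
<div class="cohort" data-cohort="1">
  <a class="startup" href="/startups/alpha">Alpha Labs</a>
  <a class="startup" href="/startups/beta">Beta Works</a>
</div>
<div class="cohort" data-cohort="2">
  <a class="startup" href="/startups/gamma">Gamma Tech</a>
</div>
"""

BASE_URL = "https://example.org"  # placeholder, not the real site address


def parse_startups(html, base_url=BASE_URL):
    """Return one dict per startup link found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for cohort in soup.select("div.cohort"):
        cohort_no = int(cohort["data-cohort"])
        for link in cohort.select("a.startup"):
            rows.append(
                {
                    "name": link.get_text(strip=True),
                    # Resolve the relative link against the site root.
                    "profile_url": urljoin(base_url, link["href"]),
                    "cohort": cohort_no,
                }
            )
    return rows


records = parse_startups(SAMPLE_HTML)
for record in records:
    print(record)
```

In a real run, the HTML would come from a `requests.get(...)` response instead of `SAMPLE_HTML`, and the selectors would need to match the actual page structure.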
To run the script and notebook in this repository, you'll need to have Python and the following libraries installed:
- numpy
- pandas
- requests
- beautifulsoup4
You can install the required libraries using pip:
pip install numpy pandas requests beautifulsoup4
- Clone the repository:
  git clone https://github.com/YOUR_USERNAME/YOUR_REPOSITORY.git
  cd YOUR_REPOSITORY
- Run the Jupyter Notebook:
  - Open the Jupyter Notebook NIC_Hyderabad_Scraper.ipynb using JupyterLab or Jupyter Notebook.
  - Execute the cells in the notebook to scrape data from the NIC Hyderabad website and save it to a CSV file.
- Run the Python script:
  - Alternatively, you can run the Python script scrape_nic_data.py directly:
    python scrape_nic_data.py
- Access the Data:
  - Once the script or notebook has been executed, the scraped data will be saved as NIC_startups_data.csv in the project directory.
  - You can open this CSV file using any spreadsheet software or load it into a data analysis tool such as Pandas for further analysis.
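Loading the CSV into Pandas might look like the sketch below. It reads an in-memory stand-in so that it runs without the real file. The column names (`name`, `profile_url`, `cohort`) and the `startups_per_cohort` helper are assumptions for illustration; check the header row of NIC_startups_data.csv for the actual names.

```python
import io

import pandas as pd

# Stand-in for NIC_startups_data.csv; the column names here are assumed.
SAMPLE_CSV = io.StringIO(
    "name,profile_url,cohort\n"
    "Alpha Labs,https://example.org/startups/alpha,1\n"
    "Beta Works,https://example.org/startups/beta,1\n"
    "Gamma Tech,https://example.org/startups/gamma,2\n"
)


def startups_per_cohort(csv_source):
    """Count how many startups appear in each cohort."""
    df = pd.read_csv(csv_source)
    return df.groupby("cohort")["name"].count().to_dict()


counts = startups_per_cohort(SAMPLE_CSV)
print(counts)
```

To run this against the scraped data, pass the file path instead of the sample, for example `startups_per_cohort("NIC_startups_data.csv")`, after adjusting the column names if they differ.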
- Thanks to the National Incubation Center (NIC) Hyderabad for providing access to the startup information.
- This project uses the following libraries: NumPy, Pandas, Requests, and BeautifulSoup.