Merge branch 'main' into main

recodehive · May 11, 2024 · 28033e0
2 parents c73256f + 14c1d75
commit 28033e0

Showing 3 changed files with 288 additions and 2 deletions.
diff --git a/.github/workflows/greetings.yml b/.github/workflows/greetings.yml
@@ -0,0 +1,15 @@
+name: Greetings
+
+on: [pull_request_target, issues]
+
+jobs:
+  greeting:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      pull-requests: write
+    steps:
+    - uses: actions/first-interaction@v1
+      with:
+        repo-token: ${{ secrets.GITHUB_TOKEN }}
+        issue-message: "Thank you for raising an issue! We hope you're enjoying open source. We will try to reply or assign it as soon as possible. Feel free to connect with a mentor."
diff --git a/README.md b/README.md
@@ -2,24 +2,34 @@
 
 <h1 align="center">IMDB Movie review Scraping</h1>
 <blockquote align="center">Scraping the movie review ✏️ using python programming language💻.  </blockquote>
+
 <p align="center">For new data generation <b>Semi-supervised-sequence-learning-Project</b> we have written a python script to fetch📊, data from the 💻, imdb website and converted into txt files. </p>
+<p align="center">This project aims to replicate the Semi-supervised-sequence-learning-Project on a new dataset generated by scraping IMDb movie reviews. The generated data will be used for further analysis and exploration.
+ </p>
+
 
 
 
 # Introduction
 
-**`Semi-supervised-sequence-learning-Project`** :computer: replication process is done over here and for further analysis creation of new data is required.
+**`Semi-supervised-sequence-learning-Project`** :computer: The IMDb Movie Review Scraping project aims to gather a new dataset by automatically extracting movie reviews from IMDb. The dataset can support natural language processing tasks such as sentiment analysis and recommendation systems. Reviews are collected by web scraping with the Beautiful Soup library, preprocessed, and structured into a CSV file suitable for analysis, including Support Vector Machine (SVM) classification.
 
 - The following script includes the following.
+
 - `Movie_review_imdb_scrapping.ipynb` - Script to scrape the data from imdb website
 - `rename_files.ipynb` - Script to rename the scrapped text files as per the requirements
 - `convert_texts_to_csv.ipynb` - Python script to make a CSV file from the txt files for SVM processing
 
+- `Movie_review_imdb_scrapping.ipynb` - Script to scrape the data from the IMDb website
+- `rename_files.ipynb` - Script to rename the scraped text files as per the requirements
+- `convert_texts_to_csv.ipynb` - Python script for converting the scraped text files into a CSV format suitable for SVM processing
+
+
 
 
 ## Dependencies
 
-install Beautifulsoup using `pip install beautifulsoup4`
+Ensure Beautiful Soup is installed using `pip install beautifulsoup4`.
 
 ## Installation
 

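The README describes a two-step pipeline: scraped reviews are saved as text files, then `convert_texts_to_csv.ipynb` turns them into a CSV for SVM processing. Below is a minimal standard-library sketch of that conversion step. It assumes a hypothetical layout with one folder per label (for example `pos/` and `neg/`) holding one review per `.txt` file; `texts_to_csv` and the `text`/`label` column names are illustrative and are not taken from the notebook itself.

```python
import csv
from pathlib import Path


def texts_to_csv(input_dir: Path, output_csv: Path) -> int:
    """Write one CSV row per .txt file under input_dir and return the row count.

    Hypothetical layout (an assumption, not the notebook's): input_dir/<label>/<name>.txt,
    so the parent folder name (e.g. pos or neg) becomes the label column.
    """
    rows = 0
    with output_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        # sorted() keeps the row order deterministic across runs and platforms
        for path in sorted(input_dir.glob("*/*.txt")):
            text = path.read_text(encoding="utf-8").strip()
            writer.writerow([text, path.parent.name])
            rows += 1
    return rows
```

Usage would look like `texts_to_csv(Path("reviews"), Path("reviews.csv"))`. Letting `csv.writer` handle quoting means reviews that contain commas or line breaks still land in a single cell, which a hand-built `",".join(...)` would not guarantee.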
diff --git a/School_Web Scrapping.ipynb b/School_Web Scrapping.ipynb
@@ -0,0 +1,261 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "00cd96ca",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import requests\n",
+    "from bs4 import BeautifulSoup\n",
+    "\n",
+    "new=pd.DataFrame()\n",
+    "\n",
+    "for j in range(1,6):\n",
+    "    url=f\"https://school.careers360.com/schools/schools-in-india?page={j}\"\n",
+    "    # Browser-like User-Agent; some sites reject the default python-requests agent\n",
+    "    headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}\n",
+    "    # Pass headers by keyword: the second positional argument of requests.get is params\n",
+    "    webpage=requests.get(url,headers=headers).text\n",
+    "    \n",
+    "    soup=BeautifulSoup(webpage,'lxml')\n",
+    "    schools=soup.find_all('div', class_=\"schoolListing_card position-relative\")\n",
+    "    \n",
+    "    name=[]\n",
+    "    fee=[]\n",
+    "    System=[]\n",
+    "    Place=[]  # never filled below, so the Place column is all None\n",
+    "    rating=[]\n",
+    "    types=[]\n",
+    "    for i in schools:\n",
+    "        name.append(i.find('h2',class_='school_Name').text.strip())\n",
+    "        # Fixed character offsets ([15:22], [19:24]) are brittle and can cut values short if the page text changes\n",
+    "        fee.append(i.find('div',class_=\"schoolList_Info d-flex align-items-center gap-1 flex-wrap mb-1\").text.strip()[15:22])\n",
+    "        rating.append(i.find('div',class_='school-overview').text[19:24])\n",
+    "        System.append(i.find('span',class_='comma').text.strip())\n",
+    "        types.append((i.find_all('div', class_ = 'schoolList_Info d-flex align-items-center gap-1 flex-wrap mb-1')[1]).find_all('span')[0].text.strip())\n",
+    "    \n",
+    "    d={\"NAME\":name,\n",
+    "        \"FEE\":fee, \n",
+    "        \"System\":System,\n",
+    "        \"Place\":Place,\n",
+    "        \"Type\":types,\n",
+    "        \"Rating\":rating}\n",
+    "    # orient='index' followed by .T pads lists of unequal length (Place is empty) with None\n",
+    "    df = pd.DataFrame.from_dict(d, orient='index')\n",
+    "    df = df.T\n",
+    "    new=pd.concat([new,df],ignore_index=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "d52d5ee9",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>NAME</th>\n",
+       "      <th>FEE</th>\n",
+       "      <th>System</th>\n",
+       "      <th>Place</th>\n",
+       "      <th>Type</th>\n",
+       "      <th>Rating</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>Shri Ram Centennial School,  Jaganpura</td>\n",
+       "      <td>127,800</td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAAA</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>Vivek High School,  Sector-38-B</td>\n",
+       "      <td></td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAAA</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>DAV Senior Secondary School (Lahore),  Sector-8C</td>\n",
+       "      <td></td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAAA</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>Government Model Senior Secondary School,  Sec...</td>\n",
+       "      <td>682 (CB</td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAAA</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>Carmel Convent School,  Sector 9B</td>\n",
+       "      <td>30,090</td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAAA</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>195</th>\n",
+       "      <td>Salwan Public School,  Mayur Vihar Phase-III</td>\n",
+       "      <td>88,241</td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAA+</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>196</th>\n",
+       "      <td>N K Bagrodia Global School,  Sector-17, Dwarka</td>\n",
+       "      <td></td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAA+</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>197</th>\n",
+       "      <td>Delhi Public School,  Gavier</td>\n",
+       "      <td>170,530</td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAA+</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>198</th>\n",
+       "      <td>Delhi Public School,  kalali</td>\n",
+       "      <td>125,075</td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAA+</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>199</th>\n",
+       "      <td>Atmiya Vidya Mandir,  Kamrej</td>\n",
+       "      <td></td>\n",
+       "      <td>CBSE</td>\n",
+       "      <td>None</td>\n",
+       "      <td>Sr. Secondary/Higher Secondary School</td>\n",
+       "      <td>AAAA+</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>200 rows × 6 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                                                  NAME      FEE System Place  \\\n",
+       "0               Shri Ram Centennial School,  Jaganpura  127,800   CBSE  None   \n",
+       "1                      Vivek High School,  Sector-38-B            CBSE  None   \n",
+       "2     DAV Senior Secondary School (Lahore),  Sector-8C            CBSE  None   \n",
+       "3    Government Model Senior Secondary School,  Sec...  682 (CB   CBSE  None   \n",
+       "4                    Carmel Convent School,  Sector 9B  30,090    CBSE  None   \n",
+       "..                                                 ...      ...    ...   ...   \n",
+       "195       Salwan Public School,  Mayur Vihar Phase-III  88,241    CBSE  None   \n",
+       "196     N K Bagrodia Global School,  Sector-17, Dwarka            CBSE  None   \n",
+       "197                       Delhi Public School,  Gavier  170,530   CBSE  None   \n",
+       "198                       Delhi Public School,  kalali  125,075   CBSE  None   \n",
+       "199                       Atmiya Vidya Mandir,  Kamrej            CBSE  None   \n",
+       "\n",
+       "                                      Type Rating  \n",
+       "0    Sr. Secondary/Higher Secondary School  AAAAA  \n",
+       "1    Sr. Secondary/Higher Secondary School  AAAAA  \n",
+       "2    Sr. Secondary/Higher Secondary School  AAAAA  \n",
+       "3    Sr. Secondary/Higher Secondary School  AAAAA  \n",
+       "4    Sr. Secondary/Higher Secondary School  AAAAA  \n",
+       "..                                     ...    ...  \n",
+       "195  Sr. Secondary/Higher Secondary School  AAAA+  \n",
+       "196  Sr. Secondary/Higher Secondary School  AAAA+  \n",
+       "197  Sr. Secondary/Higher Secondary School  AAAA+  \n",
+       "198  Sr. Secondary/Higher Secondary School  AAAA+  \n",
+       "199  Sr. Secondary/Higher Secondary School  AAAA+  \n",
+       "\n",
+       "[200 rows x 6 columns]"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "new"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8f9cf05c",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
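The FEE column in the notebook output shows the cost of fixed-offset slicing: row 3 reads `682 (CB`, and several rows are empty. A minimal sketch of a regex-based alternative follows, assuming the fee is the first comma-grouped number in the card's info text. That assumption has not been checked against the live page, and `parse_fee` is a hypothetical helper, not part of the notebook.

```python
import re
from typing import Optional

# Assumption: the fee is the first run of digits in the text, optionally
# comma-grouped (matches both 127,800 and Indian-style 1,27,800).
_AMOUNT = re.compile(r"\d+(?:,\d+)*")


def parse_fee(info_text: str) -> Optional[int]:
    """Return the first comma-grouped number in info_text as an int, or None if absent."""
    match = _AMOUNT.search(info_text)
    if match is None:
        return None
    # Drop the grouping commas before converting
    return int(match.group().replace(",", ""))
```

Inside the loop this would replace the `[15:22]` slice, for example `fee.append(parse_fee(info.text))`, where `info` stands for the `schoolList_Info` div the notebook already locates. Missing fees then become `None` instead of empty strings. If that text can contain another number before the fee (a year, a class level), the first-match assumption breaks, and the pattern would need anchoring to a label on the page.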