DiamondScraper is a simple Python web scraper for BrilliantEarth.com. It scrapes data on the site's natural and lab-created diamond selections and writes the results to a CSV file.
Buying a diamond can be frustrating and expensive.
I built DiamondScraper to create a dataset of natural and lab-created diamonds to demystify the value of the 4 Cs: cut, color, clarity, carat.
Requirements:
- Firefox browser & geckodriver
- `pip install gazpacho==1.1`
- `conda install pandas=1.1.3`
- `conda install selenium=3.141.0`
To run the scraper:
- Clone this repo
- Move to the `DiamondScraper` directory
- Run `scraper.py`
The repo also includes a script, `processing.py`, that casts the categorical columns of the scraped DataFrame to pandas categorical data types.
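A minimal sketch of that kind of cast, assuming the column names from the data dictionary (`shape`, `cut`, `color`, `clarity`). It is not the actual `processing.py`: the grade orders are assumptions (the cut labels in particular are hypothetical, while the color and clarity orders follow the GIA scales), and `cast_categories` is an illustrative name. Shape is left unordered, while the 4C grades become ordered categoricals so they can be compared and sorted by quality.

```python
import pandas as pd

# Hypothetical grade orders, worst -> best. processing.py's real categories
# may differ; the cut labels below are an assumption.
CUT_ORDER = ["Fair", "Good", "Very Good", "Ideal", "Super Ideal"]
# GIA color scale runs from D (colorless) to Z, so list Z first to go worst -> best.
COLOR_ORDER = [chr(c) for c in range(ord("Z"), ord("D") - 1, -1)]
# GIA clarity grades, worst -> best.
CLARITY_ORDER = ["I3", "I2", "I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF", "FL"]

ORDERED = {"cut": CUT_ORDER, "color": COLOR_ORDER, "clarity": CLARITY_ORDER}


def cast_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with shape as unordered and 4C grades as ordered categoricals."""
    out = df.copy()
    out["shape"] = out["shape"].astype("category")  # shapes have no natural ranking
    for col, order in ORDERED.items():
        # Values not listed in `order` become NaN instead of raising an error.
        out[col] = pd.Categorical(out[col], categories=order, ordered=True)
    return out


# Illustrative rows, not real scraped data.
sample = pd.DataFrame({
    "shape": ["Round", "Oval"],
    "cut": ["Ideal", "Very Good"],
    "color": ["D", "H"],
    "clarity": ["VS1", "SI2"],
})
typed = cast_categories(sample)
# Ordered categoricals support grade comparisons: keep colors G or better.
print(typed[typed["color"] >= "G"])
```

Using ordered categoricals rather than plain strings means comparisons follow grading order instead of alphabetical order, and they also reduce memory use on large scrapes.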
Attribute | Description | Data Type |
---|---|---|
id | Diamond identification number provided by Brilliant Earth | int |
url | URL for the diamond details page | string |
shape | External geometric appearance of a diamond | string/categorical |
price | Price in U.S. dollars | int |
carat | Unit of measurement used to describe the weight of a diamond | float |
cut | Facets, symmetry, and reflective qualities of a diamond | string/categorical |
color | Natural color or lack of color visible within a diamond, based on the GIA grade scale | string/categorical |
clarity | Visibility of natural microscopic inclusions and imperfections within a diamond | string/categorical |
report | Diamond certificate or grading report provided by an independent gemology lab | string |
type | Whether the diamond is natural or lab-created | string |
date_fetched | Date the data was fetched | date |
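The table above maps directly onto pandas dtypes. Here is a sketch of loading a CSV with those types applied, assuming the file's header uses exactly the attribute names listed. `load_diamonds`, the in-memory sample rows, and their `type` values are hypothetical stand-ins for the scraper's real output file.

```python
import io

import pandas as pd

# Column -> dtype mapping based on the data dictionary table.
DTYPES = {
    "id": "int64",
    "url": "string",
    "shape": "category",
    "price": "int64",
    "carat": "float64",
    "cut": "category",
    "color": "category",
    "clarity": "category",
    "report": "string",
    "type": "string",
}


def load_diamonds(source) -> pd.DataFrame:
    """Read a DiamondScraper-style CSV (path or file-like), applying the documented types."""
    return pd.read_csv(source, dtype=DTYPES, parse_dates=["date_fetched"])


# Hypothetical sample standing in for the scraper's CSV output.
SAMPLE = io.StringIO(
    "id,url,shape,price,carat,cut,color,clarity,report,type,date_fetched\n"
    "1,https://example.com/1,Round,1000,0.5,Ideal,G,VS1,GIA,natural,2020-01-01\n"
    "2,https://example.com/2,Oval,800,0.7,Very Good,H,SI1,IGI,lab,2020-01-01\n"
)
diamonds = load_diamonds(SAMPLE)
# Price per carat is a quick way to compare value across the 4 Cs.
diamonds["price_per_carat"] = diamonds["price"] / diamonds["carat"]
print(diamonds.dtypes)
```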
Author: Miguel Corral Jr.
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/iMiguel
GitHub: https://github.com/corralm
Distributed under the GNU General Public License v3.0. See LICENSE for more information.