Scrape current PCGS coin prices from https://www.pcgs.com/prices and save them to a lookup table for easy price lookup or other manipulation
This repo is not sponsored or endorsed by PCGS. Logo for stylistic purposes, following PCGS Brand Guidelines
If you use this repo, send me an email! I'd love to hear how you used it and what I can improve
- Clone the repository to the directory of your choice with
$ git clone https://github.com/ryanamannion/pcgs_scraper.git
$ cd pcgs_scraper
- If you are using a venv or other environment, activate it
$ pip install .
- Navigate to the pcgs_scraper subdirectory
$ python scraper.py
$ python pcgs_query.py -q '1909-S VDB Cent'
- Download the latest release from
Releases
on GitHub - Activate your environment
$ pip install pcgs_scraper-X.Y.Z.tar.gz
- Use pcgs_scraper functions in your own scripts
- If you need to access the downloaded files:
- find directory with
$ pip list -v | grep pcgs
- copy directory path and
cd
to it
- find directory with
scraper.py
is the main file in this library, and handles the dispatching of the scraping scripts. Additionally, this file handles postprocessing of those scraping scripts.
Running it from the command line calls the cli() function. The interface prompts the user to download the necessary files if they have not been scraped already. It will download both the PCGS#-->Description information from www.pcgs.com/pcgsnolookup as well as the PCGS#-->Price information from www.pcgs.com/prices.
scraper.py
then postprocesses that data and combines them to create a data-rich free table
(list of dictionaries) where each list item represents a coin. The free table has details about each coin, including the
PCGS Number, Year, Denomination, Mint Mark (if applicable), Detail information (e.g. Full Bands or other details
relevant to that particular coin), Price data at time of scraping, and metadata for the purposes of debugging (e.g. the URL it was
scraped from, etc.).
Please note, during this step entries from the price data which do not have a description from the number data are excluded. These are mostly type coins, as well as type sets and other subsets of coins which can be given a valuation.
The final free table is saved to data/pcgs_price_guide.{pkl, json}
, based on what the user selects in the CLI
The first step in creating the price guide is to scrape the prices from www.pcgs.com/prices.
The PCGS coin prices website is a labyrinth of html. This script first navigates to https://www.pcgs.com/prices, where it navigates through each category (e.g. Type Coins, Half-Cents and Cents, etc.) and saves urls for each subcategory. The script then follows each subcategory URL and scrapes all price information from the table, including the PCGS# as well as the prices for each grade. The website divides the grades into different pages, or bins of grades: 1-20, 25-60, and 61-70. That means that for each subcategory there are three pages to scrape data from. (Note: I skip the "Most Active" page because it is redundant).
In order to ensure that a rogue error at a later step won't cause the user to lose all the data from scraping, which can
take some time, the data is saved to a pickle file at the end of the preliminary scraping function, and before the
processing step that combines the data from the three bins into one lookup table.
This file is saved in the pcgs_scraper directory using the date and time upon
completion to name the file data/pcgs_prices-DD-MM-YYY-HH:MM:SS.pkl
. This file serves as the input to the
processing function, which merges rows with the same PCGS# to creates the lookup table. It can be used any time with the
-p
command line option to be reprocessed should you need historical price data
The processing function saves the price data both as a pickle file and as a json file (because why not). These files are saved to the same directory and named data/scraped_pcgs_prices.{json, pkl}
The resulting data structure is a dictionary. The keys are the PCGS Numbers, and the values are data extracted from the tables.
- pcgs_num: (str) PCGS Number
- desig: (str) Designation (More info: https://www.pcgs.com/news/a-look-at-pcgs-designations)
- prices: (dict) Price by grade, each grade points to a tuple (n=2) of prices (each a str). The top price in the table is index 0, the bottom is index 1. See Known Issues #1
- merged_from: the three entries for each grade bin used to create this merged entry (see paragraph one of this section), retained for debugging purposes
merged_entry = {
'pcgs_num': pcgs_num, # PCGS Number
'desig': merged_desig, # Designation (BN, RB, RD)
'prices': price_by_grade, # Price dictionary {Grade: [Price, Price+]}
'merged_from': entries, # History for merge, for debugging
}
price_guide[pcgs_num] = merged_entry
The second step in creating the price guide is to scrape the mappings of PCGS Numbers to detailed descriptions, which
adds high quality information about a coin's year, mint mark, denomination, and other details. This script uses the same
function from pcgs_prices.py
to scrape URLs from the main page of the PCGS# lookup page by category and subcategory.
Each subcategory URL is then followed and the number:description pair is scraped and stored in a free table and saved to
number_data.pkl
.
One a user has compiled the pcgs_price_guide.pkl
binary, it can be queried from the command line with:
$ python pcgs_query.py -q '1909-S VDB Wheat Cent'
or similar inputs. The query function uses some regex to determine the necessary elements of the query: the year and denomination. It can then also find the mint mark (specified with -M where M stands for mint) to help narrow down the search. Once the search algorithm has a target year and denomination (and possibly mint), it will rank the results by Levenshtein Edit Distance from the user-generated input string. The user can then simply choose from the results list (if there are more than one option) and the price data will be displayed.
Here are some general query guidelines:
- Always specify a year
- Always specify a denomination
- Can be of the form 1C, 3CS, 3 cent silver, $1, Dollar, half dollar, $2.50 etc.
- If you want to specify a mint mark, do so with a hyphen following the year, e.g.
-q '1909-S VDB Cent'
- To show help dialogue:
$ python pcgs_prices.py --help
- To scrape all prices and clean up the data:
$ python pcgs_prices.py --all
- To just scrape data, create new unprocessed data binary:
$ python pcgs_prices.py --scrape_only
- This will save a file called
pcgs_prices-DD-MM-YYY-HH:MM:SS.pkl
with the current date and time
- This will save a file called
- To just turn unprocessed binary into a lookup table:
$ python pcgs_prices.py --process path/to/pcgs_prices-DD-MM-YYY-HH:MM:SS.pkl
- This saves two files:
pcgs_price_guide.{json, pkl}
, both are of the same object
- This saves two files:
pcgs_nums.py
has no CLI options. Running $ python pcgs_nums.py
will download the number data and save it to
number_data.pkl
- Specify a query with
-q
- Specify a source price_guide binary with
-p
You can find current issues and enhancement ideas in the Issues
tab of GitHub