Patent classification 


This project implements patent classification at the subclass level
of the IPC and CPC systems. There are more than 600 classes in total.
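To make "subclass level" concrete: in both IPC and CPC, a subclass symbol is the first four characters of a full classification symbol (section letter, two-digit class, subclass letter). The sketch below illustrates that truncation. The helper name `subclass_of` is hypothetical and is not taken from the project's notebooks.

```python
import re

# Section letter (A-H, plus Y in CPC), two-digit class, subclass letter.
_SUBCLASS = re.compile(r"^[A-HY]\d{2}[A-Z]")


def subclass_of(symbol: str) -> str:
    """Return the four-character subclass of a full IPC/CPC symbol.

    Hypothetical helper for illustration only.
    """
    match = _SUBCLASS.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not an IPC/CPC symbol: {symbol!r}")
    return match.group(0)


print(subclass_of("H01M 10/0525"))
```

A symbol that does not start with a valid subclass prefix raises `ValueError` rather than returning a partial label.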

The project pipeline is as follows:

  1. Extract dataset
  2. Exploratory data analysis (EDA) of the dataset
  3. Train a model 

A Jupyter notebook is provided for each of the above tasks.

The dataset for the classification task is generated with Google BigQuery and stored as CSV files, one file per year from 2009 to 2019. The dataset is publicly available for experimentation. The attributes of these CSV files are shown in the table below:

| ID | Date | Title | Claim | cpc_subclass |
| --- | --- | --- | --- | --- |
| 8844051 | 2014-09-23 | Lithium-ion secondary battery | A lithium-ion secondary battery comprising ... | H01M,Y02E,Y02T |
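Because `cpc_subclass` can hold several comma-separated codes, each patent is effectively a multi-label example. The following is a minimal sketch of reading such a file and splitting that column into a list of labels. It assumes the column names shown in the table and assumes the multi-code field is quoted in the CSV. The sample text and the helper `read_patents` are hypothetical stand-ins, not the project's actual files or code.

```python
import csv
import io

# Hypothetical in-memory sample mirroring the documented columns.
# The quoting of the multi-code field is an assumption about the real files.
SAMPLE_CSV = """ID,Date,Title,Claim,cpc_subclass
8844051,2014-09-23,Lithium-ion secondary battery,A lithium-ion secondary battery comprising ...,"H01M,Y02E,Y02T"
"""


def read_patents(handle):
    """Yield one dict per patent, with cpc_subclass split into a list of codes."""
    for row in csv.DictReader(handle):
        codes = row["cpc_subclass"].split(",")
        row["cpc_subclass"] = [code.strip() for code in codes if code.strip()]
        yield row


patents = list(read_patents(io.StringIO(SAMPLE_CSV)))

# The label vocabulary is the set of distinct subclass codes seen in the data.
labels = sorted({code for patent in patents for code in patent["cpc_subclass"]})
print(labels)
```

To read a downloaded file instead, pass `open(path, newline="", encoding="utf-8")` to `read_patents`, where `path` points at one of the per-year CSV files.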

Download links for each year's dataset are provided below.
