Find Duplicates

You can find duplicate records with respect to a subset of columns using the duplicated method. For example, to find records that share the same name and phone number while ignoring all other fields, you can do the following.

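import pandas

# Load the address book into a DataFrame.
df = pandas.read_csv('address-book.csv')
# Select rows whose (name, phone_number) pair already appeared in an earlier row.
dupes = df[df.duplicated(subset=['name', 'phone_number'])]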
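By default, duplicated uses keep='first', so the first occurrence of each repeated record is not flagged. A minimal sketch of the difference follows, assuming a small made-up in-memory table in place of address-book.csv (the sample rows and the city column are hypothetical). Passing keep=False flags every member of a duplicate group instead:

```python
import pandas

# Hypothetical sample data standing in for address-book.csv.
df = pandas.DataFrame({
    'name': ['Ann', 'Bob', 'Ann', 'Cid'],
    'phone_number': ['555-0100', '555-0101', '555-0100', '555-0102'],
    'city': ['Oslo', 'Rome', 'Bergen', 'Lima'],
})

# Default keep='first': only the later repeat of a record is flagged.
later_repeats = df[df.duplicated(subset=['name', 'phone_number'])]

# keep=False: every row that belongs to a duplicate group is flagged.
all_dupes = df[df.duplicated(subset=['name', 'phone_number'], keep=False)]

print(later_repeats)
print(all_dupes)
```

The keep=False form is usually the more convenient one when you want to review the conflicting records side by side before deciding which to retain.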

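Similarly, you can drop duplicates with the drop_duplicates method, which by default keeps the first occurrence of each duplicated record.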

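import pandas

# Load the address book into a DataFrame.
df = pandas.read_csv('address-book.csv')
# Keep one row per (name, phone_number) pair and discard the later repeats.
df = df.drop_duplicates(subset=['name', 'phone_number'])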
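drop_duplicates also accepts a keep argument. The sketch below, assuming a small made-up table in place of address-book.csv (the sample rows and the city column are hypothetical), keeps the last occurrence of each record instead of the first and then renumbers the index, since dropped rows otherwise leave gaps in it:

```python
import pandas

# Hypothetical sample data standing in for address-book.csv.
df = pandas.DataFrame({
    'name': ['Ann', 'Bob', 'Ann', 'Cid'],
    'phone_number': ['555-0100', '555-0101', '555-0100', '555-0102'],
    'city': ['Oslo', 'Rome', 'Bergen', 'Lima'],
})

# keep='last' retains the final occurrence of each (name, phone_number) pair.
deduped = df.drop_duplicates(subset=['name', 'phone_number'], keep='last')

# Renumber the rows so the index runs 0..n-1 again.
deduped = deduped.reset_index(drop=True)

print(deduped)
```

Whether to keep the first or the last occurrence depends on which row you trust more, for example the most recently appended entry in an address book.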