Find Duplicates

You can find duplicate records with respect to a subset of columns using the duplicated method. For example, to find records that share the same name and phone number, ignoring all other fields, do the following. By default, the first occurrence in each group of duplicates is not flagged; only the later repeats are returned.

import pandas

df = pandas.read_csv('address-book.csv')
# Boolean mask is True for rows whose (name, phone_number) pair was already seen
dupes = df[df.duplicated(subset=['name', 'phone_number'])]
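
If you want every member of a duplicate group rather than only the later repeats, the keep parameter of duplicated controls which rows are flagged. The following is a minimal sketch; the in-memory sample rows and the city column are made up for illustration and stand in for address-book.csv.

```python
import pandas

# Hypothetical sample data standing in for address-book.csv
df = pandas.DataFrame({
    'name': ['Ann', 'Ann', 'Bob', 'Ann'],
    'phone_number': ['555-0100', '555-0100', '555-0101', '555-0199'],
    'city': ['Oslo', 'Bergen', 'Oslo', 'Oslo'],
})

subset = ['name', 'phone_number']

# Default keep='first': only the later repeat (row 1) is flagged
later_copies = df[df.duplicated(subset=subset)]

# keep=False: every row in a duplicate group is flagged (rows 0 and 1)
all_copies = df[df.duplicated(subset=subset, keep=False)]
```

Row 3 is not flagged in either case: it shares a name with rows 0 and 1 but has a different phone number, so it is not a duplicate on the chosen subset.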

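Similarly, you can drop duplicates with the drop_duplicates method, which keeps the first occurrence of each duplicate group by default.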

import pandas

df = pandas.read_csv('address-book.csv')
# Keep one row per distinct (name, phone_number) pair
df = df.drop_duplicates(subset=['name', 'phone_number'])
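
Which row survives is also governed by the keep parameter. As a rough sketch, again using made-up sample rows in place of address-book.csv, the three settings behave as follows.

```python
import pandas

# Hypothetical sample data standing in for address-book.csv
df = pandas.DataFrame({
    'name': ['Ann', 'Ann', 'Bob', 'Ann'],
    'phone_number': ['555-0100', '555-0100', '555-0101', '555-0199'],
    'city': ['Oslo', 'Bergen', 'Oslo', 'Oslo'],
})

subset = ['name', 'phone_number']

# keep='first' (the default): the earliest row of each group survives
keep_first = df.drop_duplicates(subset=subset)

# keep='last': the latest row of each group survives
keep_last = df.drop_duplicates(subset=subset, keep='last')

# keep=False: all rows that have any duplicate are dropped
unique_only = df.drop_duplicates(subset=subset, keep=False)
```

The original index labels are preserved in each result; call reset_index(drop=True) afterwards if you need a contiguous index.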
