Skip to content

FilipDrabant/data-cleaning-python-pandas-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

data-cleaning-python-pandas-project

A project using Python Pandas library to clean-up a small dataset. Created to practice data cleanup with Python/Pandas with Jupyter Notebook IDE.

Generated by exporting Jupyter file as Markdown. Recommend to use Jupyter Notebook to view uploaded file.

##Goals

  1. Cleanup a small customer dataset from duplicates/normalize data.
  2. Provide the Call Centre a table only with customers that they can call.

Code

1. Import pandas and read excel file into dataframe df, print table

import pandas as pd
df = pd.read_excel(r"C:\Users\Filip\Desktop\JupyterProjects\datasets\Input.xlsx")
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Not_Useful_Column
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Yes No True
1 1002 Abed Nadir 123/643/9775 93 West Main Street No Yes False
2 1003 Walter /White 7066950392 298 Drugs Driveway N NaN True
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Yes Y True
4 1005 Jon Snow 876|678|3469 123 Dragons Road Y No True
5 1006 Ron Swanson 304-762-2467 768 City Parkway Yes Yes True
6 1007 Jeff Winger NaN 1209 South Street No No False
7 1008 Sherlock Holmes 876|678|3469 98 Clue Drive N No False
8 1009 Gandalf NaN N/a 123 Middle Earth Yes NaN False
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Yes No True
10 1011 Samwise Gamgee NaN 612 Shire Lane, Shire Yes No True
11 1012 Harry ...Potter 7066950392 2394 Hogwarts Avenue Y NaN True
12 1013 Don Draper 123-543-2345 2039 Main Street Yes N False
13 1014 Leslie Knope 876|678|3469 343 City Parkway Yes No False
14 1015 Toby Flenderson_ 304-762-2467 214 HR Avenue N No False
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue No N False
16 1017 Michael Scott 123/643/9775 121 Paper Avenue, Pennsylvania Yes No False
17 1018 Clark Kent 7066950392 3498 Super Lane Y NaN True
18 1019 Creed Braton N/a N/a N/a Yes True
19 1020 Anakin Skywalker 876|678|3469 910 Tatooine Road, Tatooine Yes N True
20 1020 Anakin Skywalker 876|678|3469 910 Tatooine Road, Tatooine Yes N True

2. Clean dataframe from Duplicates

df = df.drop_duplicates()
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Not_Useful_Column
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Yes No True
1 1002 Abed Nadir 123/643/9775 93 West Main Street No Yes False
2 1003 Walter /White 7066950392 298 Drugs Driveway N NaN True
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Yes Y True
4 1005 Jon Snow 876|678|3469 123 Dragons Road Y No True
5 1006 Ron Swanson 304-762-2467 768 City Parkway Yes Yes True
6 1007 Jeff Winger NaN 1209 South Street No No False
7 1008 Sherlock Holmes 876|678|3469 98 Clue Drive N No False
8 1009 Gandalf NaN N/a 123 Middle Earth Yes NaN False
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Yes No True
10 1011 Samwise Gamgee NaN 612 Shire Lane, Shire Yes No True
11 1012 Harry ...Potter 7066950392 2394 Hogwarts Avenue Y NaN True
12 1013 Don Draper 123-543-2345 2039 Main Street Yes N False
13 1014 Leslie Knope 876|678|3469 343 City Parkway Yes No False
14 1015 Toby Flenderson_ 304-762-2467 214 HR Avenue N No False
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue No N False
16 1017 Michael Scott 123/643/9775 121 Paper Avenue, Pennsylvania Yes No False
17 1018 Clark Kent 7066950392 3498 Super Lane Y NaN True
18 1019 Creed Braton N/a N/a N/a Yes True
19 1020 Anakin Skywalker 876|678|3469 910 Tatooine Road, Tatooine Yes N True

3. Remove useless columns from dataframe

df = df.drop(columns = "Not_Useful_Column")
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Yes No
1 1002 Abed Nadir 123/643/9775 93 West Main Street No Yes
2 1003 Walter /White 7066950392 298 Drugs Driveway N NaN
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Yes Y
4 1005 Jon Snow 876|678|3469 123 Dragons Road Y No
5 1006 Ron Swanson 304-762-2467 768 City Parkway Yes Yes
6 1007 Jeff Winger NaN 1209 South Street No No
7 1008 Sherlock Holmes 876|678|3469 98 Clue Drive N No
8 1009 Gandalf NaN N/a 123 Middle Earth Yes NaN
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Yes No
10 1011 Samwise Gamgee NaN 612 Shire Lane, Shire Yes No
11 1012 Harry ...Potter 7066950392 2394 Hogwarts Avenue Y NaN
12 1013 Don Draper 123-543-2345 2039 Main Street Yes N
13 1014 Leslie Knope 876|678|3469 343 City Parkway Yes No
14 1015 Toby Flenderson_ 304-762-2467 214 HR Avenue N No
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue No N
16 1017 Michael Scott 123/643/9775 121 Paper Avenue, Pennsylvania Yes No
17 1018 Clark Kent 7066950392 3498 Super Lane Y NaN
18 1019 Creed Braton N/a N/a N/a Yes
19 1020 Anakin Skywalker 876|678|3469 910 Tatooine Road, Tatooine Yes N

3. Remove unwanted characters from the sides of surnames

df["Last_Name"] = df["Last_Name"].str.strip("./_")
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Yes No
1 1002 Abed Nadir 123/643/9775 93 West Main Street No Yes
2 1003 Walter White 7066950392 298 Drugs Driveway N NaN
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Yes Y
4 1005 Jon Snow 876|678|3469 123 Dragons Road Y No
5 1006 Ron Swanson 304-762-2467 768 City Parkway Yes Yes
6 1007 Jeff Winger NaN 1209 South Street No No
7 1008 Sherlock Holmes 876|678|3469 98 Clue Drive N No
8 1009 Gandalf NaN N/a 123 Middle Earth Yes NaN
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Yes No
10 1011 Samwise Gamgee NaN 612 Shire Lane, Shire Yes No
11 1012 Harry Potter 7066950392 2394 Hogwarts Avenue Y NaN
12 1013 Don Draper 123-543-2345 2039 Main Street Yes N
13 1014 Leslie Knope 876|678|3469 343 City Parkway Yes No
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N No
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue No N
16 1017 Michael Scott 123/643/9775 121 Paper Avenue, Pennsylvania Yes No
17 1018 Clark Kent 7066950392 3498 Super Lane Y NaN
18 1019 Creed Braton N/a N/a N/a Yes
19 1020 Anakin Skywalker 876|678|3469 910 Tatooine Road, Tatooine Yes N

4. Remove everything but letters and numbers from phone numbers

df["Phone_Number"] = df["Phone_Number"].str.replace("[^a-zA-Z0-9]", "", regex=True)
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact
0 1001 Frodo Baggins 1235455421 123 Shire Lane, Shire Yes No
1 1002 Abed Nadir 1236439775 93 West Main Street No Yes
2 1003 Walter White NaN 298 Drugs Driveway N NaN
3 1004 Dwight Schrute 1235432345 980 Paper Avenue, Pennsylvania, 18503 Yes Y
4 1005 Jon Snow 8766783469 123 Dragons Road Y No
5 1006 Ron Swanson 3047622467 768 City Parkway Yes Yes
6 1007 Jeff Winger NaN 1209 South Street No No
7 1008 Sherlock Holmes 8766783469 98 Clue Drive N No
8 1009 Gandalf NaN Na 123 Middle Earth Yes NaN
9 1010 Peter Parker 1235455421 25th Main Street, New York Yes No
10 1011 Samwise Gamgee NaN 612 Shire Lane, Shire Yes No
11 1012 Harry Potter NaN 2394 Hogwarts Avenue Y NaN
12 1013 Don Draper 1235432345 2039 Main Street Yes N
13 1014 Leslie Knope 8766783469 343 City Parkway Yes No
14 1015 Toby Flenderson 3047622467 214 HR Avenue N No
15 1016 Ron Weasley 1235455421 2395 Hogwarts Avenue No N
16 1017 Michael Scott 1236439775 121 Paper Avenue, Pennsylvania Yes No
17 1018 Clark Kent NaN 3498 Super Lane Y NaN
18 1019 Creed Braton Na N/a N/a Yes
19 1020 Anakin Skywalker 8766783469 910 Tatooine Road, Tatooine Yes N

5. Add dashes to the Phone number

#df["Phone_Number"] = df["Phone_Number"].apply(lambda x : x[0:3] + "-" + x[3:6] + x[6:10]) Doesnt work because phone numbers need to be strings first
df["Phone_Number"] = df["Phone_Number"].apply(lambda x : str(x))
df["Phone_Number"] = df["Phone_Number"].apply(lambda x : x[0:3] + "-" + x[3:6] + "-" + x[6:10])
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Yes No
1 1002 Abed Nadir 123-643-9775 93 West Main Street No Yes
2 1003 Walter White nan-- 298 Drugs Driveway N NaN
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Yes Y
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y No
5 1006 Ron Swanson 304-762-2467 768 City Parkway Yes Yes
6 1007 Jeff Winger nan-- 1209 South Street No No
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N No
8 1009 Gandalf NaN Na-- 123 Middle Earth Yes NaN
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Yes No
10 1011 Samwise Gamgee nan-- 612 Shire Lane, Shire Yes No
11 1012 Harry Potter nan-- 2394 Hogwarts Avenue Y NaN
12 1013 Don Draper 123-543-2345 2039 Main Street Yes N
13 1014 Leslie Knope 876-678-3469 343 City Parkway Yes No
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N No
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue No N
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Yes No
17 1018 Clark Kent nan-- 3498 Super Lane Y NaN
18 1019 Creed Braton Na-- N/a N/a Yes
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Yes N

6. Split street_name column into three seperate columns by comma

df[['Street_Name', 'State', 'Zip_Code']] = df['Address'].str.split(',',n=2,expand=True)
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Street_Name State Zip_Code
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Yes No 123 Shire Lane Shire None
1 1002 Abed Nadir 123-643-9775 93 West Main Street No Yes 93 West Main Street None None
2 1003 Walter White nan-- 298 Drugs Driveway N NaN 298 Drugs Driveway None None
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Yes Y 980 Paper Avenue Pennsylvania 18503
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y No 123 Dragons Road None None
5 1006 Ron Swanson 304-762-2467 768 City Parkway Yes Yes 768 City Parkway None None
6 1007 Jeff Winger nan-- 1209 South Street No No 1209 South Street None None
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N No 98 Clue Drive None None
8 1009 Gandalf NaN Na-- 123 Middle Earth Yes NaN 123 Middle Earth None None
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Yes No 25th Main Street New York None
10 1011 Samwise Gamgee nan-- 612 Shire Lane, Shire Yes No 612 Shire Lane Shire None
11 1012 Harry Potter nan-- 2394 Hogwarts Avenue Y NaN 2394 Hogwarts Avenue None None
12 1013 Don Draper 123-543-2345 2039 Main Street Yes N 2039 Main Street None None
13 1014 Leslie Knope 876-678-3469 343 City Parkway Yes No 343 City Parkway None None
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N No 214 HR Avenue None None
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue No N 2395 Hogwarts Avenue None None
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Yes No 121 Paper Avenue Pennsylvania None
17 1018 Clark Kent nan-- 3498 Super Lane Y NaN 3498 Super Lane None None
18 1019 Creed Braton Na-- N/a N/a Yes N/a None None
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Yes N 910 Tatooine Road Tatooine None

7. Normalize Yes and No columns to just Y and N

df["Do_Not_Contact"] = df["Do_Not_Contact"].str.replace("Yes","Y")
df["Do_Not_Contact"] = df["Do_Not_Contact"].str.replace("No","N")
df["Paying Customer"] = df["Paying Customer"].str.replace("No","N")
df["Paying Customer"] = df["Paying Customer"].str.replace("Yes","Y")
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Street_Name State Zip_Code
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Y N 123 Shire Lane Shire None
1 1002 Abed Nadir 123-643-9775 93 West Main Street N Y 93 West Main Street None None
2 1003 Walter White nan-- 298 Drugs Driveway N NaN 298 Drugs Driveway None None
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Y Y 980 Paper Avenue Pennsylvania 18503
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y N 123 Dragons Road None None
5 1006 Ron Swanson 304-762-2467 768 City Parkway Y Y 768 City Parkway None None
6 1007 Jeff Winger nan-- 1209 South Street N N 1209 South Street None None
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N N 98 Clue Drive None None
8 1009 Gandalf NaN Na-- 123 Middle Earth Y NaN 123 Middle Earth None None
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Y N 25th Main Street New York None
10 1011 Samwise Gamgee nan-- 612 Shire Lane, Shire Y N 612 Shire Lane Shire None
11 1012 Harry Potter nan-- 2394 Hogwarts Avenue Y NaN 2394 Hogwarts Avenue None None
12 1013 Don Draper 123-543-2345 2039 Main Street Y N 2039 Main Street None None
13 1014 Leslie Knope 876-678-3469 343 City Parkway Y N 343 City Parkway None None
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N N 214 HR Avenue None None
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue N N 2395 Hogwarts Avenue None None
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Y N 121 Paper Avenue Pennsylvania None
17 1018 Clark Kent nan-- 3498 Super Lane Y NaN 3498 Super Lane None None
18 1019 Creed Braton Na-- N/a N/a Y N/a None None
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Y N 910 Tatooine Road Tatooine None

8. Replace unwanted "NaN" cells with empty string

df = df.replace('N/a','')
df = df.replace('nan--','')
df = df.replace('Na--','')
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Street_Name State Zip_Code
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Y N 123 Shire Lane Shire None
1 1002 Abed Nadir 123-643-9775 93 West Main Street N Y 93 West Main Street None None
2 1003 Walter White 298 Drugs Driveway N NaN 298 Drugs Driveway None None
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Y Y 980 Paper Avenue Pennsylvania 18503
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y N 123 Dragons Road None None
5 1006 Ron Swanson 304-762-2467 768 City Parkway Y Y 768 City Parkway None None
6 1007 Jeff Winger 1209 South Street N N 1209 South Street None None
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N N 98 Clue Drive None None
8 1009 Gandalf NaN 123 Middle Earth Y NaN 123 Middle Earth None None
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Y N 25th Main Street New York None
10 1011 Samwise Gamgee 612 Shire Lane, Shire Y N 612 Shire Lane Shire None
11 1012 Harry Potter 2394 Hogwarts Avenue Y NaN 2394 Hogwarts Avenue None None
12 1013 Don Draper 123-543-2345 2039 Main Street Y N 2039 Main Street None None
13 1014 Leslie Knope 876-678-3469 343 City Parkway Y N 343 City Parkway None None
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N N 214 HR Avenue None None
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue N N 2395 Hogwarts Avenue None None
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Y N 121 Paper Avenue Pennsylvania None
17 1018 Clark Kent 3498 Super Lane Y NaN 3498 Super Lane None None
18 1019 Creed Braton Y None None
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Y N 910 Tatooine Road Tatooine None

9. Replace actual NaN cells with empty string

df = df.fillna('')
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Street_Name State Zip_Code
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Y N 123 Shire Lane Shire
1 1002 Abed Nadir 123-643-9775 93 West Main Street N Y 93 West Main Street
2 1003 Walter White 298 Drugs Driveway N 298 Drugs Driveway
3 1004 Dwight Schrute 123-543-2345 980 Paper Avenue, Pennsylvania, 18503 Y Y 980 Paper Avenue Pennsylvania 18503
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y N 123 Dragons Road
5 1006 Ron Swanson 304-762-2467 768 City Parkway Y Y 768 City Parkway
6 1007 Jeff Winger 1209 South Street N N 1209 South Street
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N N 98 Clue Drive
8 1009 Gandalf 123 Middle Earth Y 123 Middle Earth
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Y N 25th Main Street New York
10 1011 Samwise Gamgee 612 Shire Lane, Shire Y N 612 Shire Lane Shire
11 1012 Harry Potter 2394 Hogwarts Avenue Y 2394 Hogwarts Avenue
12 1013 Don Draper 123-543-2345 2039 Main Street Y N 2039 Main Street
13 1014 Leslie Knope 876-678-3469 343 City Parkway Y N 343 City Parkway
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N N 214 HR Avenue
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue N N 2395 Hogwarts Avenue
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Y N 121 Paper Avenue Pennsylvania
17 1018 Clark Kent 3498 Super Lane Y 3498 Super Lane
18 1019 Creed Braton Y
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Y N 910 Tatooine Road Tatooine

10. Drop all customers who do not wish to be contacted

for x in df.index:
    if df.loc[x, 'Do_Not_Contact'] == 'Y':
        df.drop(x, inplace=True)
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Street_Name State Zip_Code
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Y N 123 Shire Lane Shire
2 1003 Walter White 298 Drugs Driveway N 298 Drugs Driveway
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y N 123 Dragons Road
6 1007 Jeff Winger 1209 South Street N N 1209 South Street
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N N 98 Clue Drive
8 1009 Gandalf 123 Middle Earth Y 123 Middle Earth
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Y N 25th Main Street New York
10 1011 Samwise Gamgee 612 Shire Lane, Shire Y N 612 Shire Lane Shire
11 1012 Harry Potter 2394 Hogwarts Avenue Y 2394 Hogwarts Avenue
12 1013 Don Draper 123-543-2345 2039 Main Street Y N 2039 Main Street
13 1014 Leslie Knope 876-678-3469 343 City Parkway Y N 343 City Parkway
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N N 214 HR Avenue
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue N N 2395 Hogwarts Avenue
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Y N 121 Paper Avenue Pennsylvania
17 1018 Clark Kent 3498 Super Lane Y 3498 Super Lane
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Y N 910 Tatooine Road Tatooine

11. Drop all users who do not have a phone number

#df.dropna(subset='Phone_Number', inplace=True) Could have done this if didn't replace NaN cells with empty string earlier
for x in df.index:
    if df.loc[x, 'Phone_Number'] == '':
        df.drop(x, inplace=True)
df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
CustomerID First_Name Last_Name Phone_Number Address Paying Customer Do_Not_Contact Street_Name State Zip_Code
0 1001 Frodo Baggins 123-545-5421 123 Shire Lane, Shire Y N 123 Shire Lane Shire
4 1005 Jon Snow 876-678-3469 123 Dragons Road Y N 123 Dragons Road
7 1008 Sherlock Holmes 876-678-3469 98 Clue Drive N N 98 Clue Drive
9 1010 Peter Parker 123-545-5421 25th Main Street, New York Y N 25th Main Street New York
12 1013 Don Draper 123-543-2345 2039 Main Street Y N 2039 Main Street
13 1014 Leslie Knope 876-678-3469 343 City Parkway Y N 343 City Parkway
14 1015 Toby Flenderson 304-762-2467 214 HR Avenue N N 214 HR Avenue
15 1016 Ron Weasley 123-545-5421 2395 Hogwarts Avenue N N 2395 Hogwarts Avenue
16 1017 Michael Scott 123-643-9775 121 Paper Avenue, Pennsylvania Y N 121 Paper Avenue Pennsylvania
19 1020 Anakin Skywalker 876-678-3469 910 Tatooine Road, Tatooine Y N 910 Tatooine Road Tatooine

12. Export new table into excel file

#df.reset_index(inplace=True) No need to do this since we dont export Index = false
df.to_excel('output.xlsx', index=False)

About

A project using Python Pandas library to clean-up a small dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published