-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathassignment1399999.py
94 lines (68 loc) · 4.74 KB
/
assignment1399999.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
##
## File: assignment13.py (STAT 3250)
## Topic: Assignment 13
##
## Two *very* important rules that must be followed in order for your assignment
## to be graded correctly:
##
## a) The Python file name must be exactly "assignment13.py" (without the quotes)
## b) The variable names q1, q2, ... must not be changed. These variable
## names should not be used anywhere else in your file. Do not delete these
## variables. If you don't know how to find a value for a variable, just
## leave the corresponding variable with "= None". (If any variable is
## missing the autograder will not grade your assignment.)
## The Gradescope autograder will be evaluating your code on a reduced
## version of the assignment13 data that includes only a fraction of the
## records. Your code needs to automatically handle all assignments
## to the variables q1, q2, ... to accommodate the reduced data set,
## so do not copy/paste things from the console window, and take care
## with hard-coding values.
## Submission Instructions: Submit your code file in Gradescope under
## 'Assignment 13'. The autograder will evaluate your answers to
## Questions 1-6.
## All questions on this assignment use the data 'assignment13-data.csv'.
## This data set consists of 100,000 records from a larger data set of
## New York city parking violation citations. There are 43 variables
## (columns) in the data set. You can see more information about the
## data at
##
## https://www.opendatanetwork.com/dataset/data.cityofnewyork.us/pvqr-7yc4
##
## For this assignment, use the "pipe version" of code descibed in the
## assignment demo file and video to the extent possible. The coding method
## will be included in the Code Efficiency score.
import pandas as pd #Import package
df = pd.read_csv('assignment13-data.csv') #Read in file
## 1. From the data frame, determine the number of records for which the
## Violation County code is BX.
q1 = len(df[df['Violation County']=='BX']) # number of records with Violation County = BX
## 2. From the data frame, determine the number of records for which the
## Issuing Agency code is P, the Vehicle Body Type is SUBN, and the
## Vehicle Make is Ford.
q2 = len(df[(df['Issuing Agency']=='P') & (df['Vehicle Body Type']=='SUBN') & (df['Vehicle Make']=='FORD')]) # number of records with conditions described above
## 3. From the data frame, extract the records for which Plate Type is COM.
## From among these, determine the number of each type of Vehicle Body
## Type. Submit a Series of the top 10 value counts (including ties), sorted
## from most to least. (The index can be the Vehicle Body Type.)
comPlateDf=df.loc[df['Plate Type']=='COM'] #Subset dataframe to COM plate type
q3 = comPlateDf['Vehicle Body Type'].groupby(comPlateDf['Vehicle Body Type']).count().nlargest(10,keep='all') # data frame of violation code counts, plate type = COM
## 4. From the data frame, extract the records for which Plate Type is PAS,
## the Registration State is NJ, and the Law Section is 408.
## From among these, determine the number of each type of Violation County.
## Submit a Series of the count values, sorted
## from most to least. (The index can be the violation county.)
pasNJ408PlateDf=df.loc[(df['Plate Type']=='PAS') & (df['Registration State']=='NJ') & (df['Law Section']==408)] #Subset to PAS plate in NJ with 408 law
q4 = pasNJ408PlateDf['Violation County'].groupby( pasNJ408PlateDf['Violation County']).count().sort_values(ascending=False) # data frame of violation county counts
## 5. From the data frame, extract the records for which Feet From Curb is
## greater than 0, and then determine the number of each Vehicle Body
## Type. Submit a Series of the count values, sorted
## from most to least, giving the top-10 (including ties). (The index can be the vehicle
## body type.)
ftFromCurbOver0Df=df.loc[df['Feet From Curb']>0] #Subset to greater than 0 feet from curb
q5 = ftFromCurbOver0Df['Vehicle Body Type'].groupby(ftFromCurbOver0Df['Vehicle Body Type']).count().nlargest(10,keep='all') # data frame of vehicle body type counts, feet from curb > 0
## 6. From the data frame, extract the records for which Feet From Curb is
## greater than 0, and then from among these determine the average number
## of feet from the curb for each Registration State. Submit a Series
## of the top-10 mean values (including ties), sorted from largest to smallest.
## (The index can be the Registration State.)
q6 = ftFromCurbOver0Df['Feet From Curb'].groupby(ftFromCurbOver0Df['Registration State']).mean().nlargest(10,keep='all') # data frame of feet from curb averages