This project is focused on analysing a dataset of CVs from the job search platform HeadHunter.
The analysis steps include:
- Exploring data
- Transforming data and engineering useful features
- Visualizing resulting data and looking for dependencies
- Cleaning data by removing duplicates and outliers using the three-sigma rule and a Z-score on log-scaled values
The outcome of the project is a dataset ready to be used as part of a training sample for ML models such as linear or logistic regression, for example to predict a competitive salary for a candidate with specific experience and preferences.
- Пол -> Gender
- возраст -> Age
- ЗП -> Salary (desired salary in RUB)
- Ищет работу на должность -> Desired position
- Город -> City
- переезд -> Relocation (readiness to relocate)
- командировки -> Travel (readiness to travel)
- Занятость -> Availability (full-time, part-time etc.)
- График -> Schedule (workdays schedule)
- Опыт работы -> Experience
- Последнее/нынешнее место работы -> Last/current company
- Последняя/нынешняя должность -> Last/current position
- Образование и ВУЗ -> Level of education and university
- Обновление резюме -> CV last updated on
- Авто -> Car
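
The mapping above can be written down as a lookup table. The following is only a minimal sketch in plain Python: it assumes the raw column names are spelled exactly as in the list, and `COLUMN_NAMES` and `rename_record` are hypothetical names introduced here for illustration, not part of the project.

```python
# Mapping from the original Russian column names to the English labels
# listed above. Assumption: the raw data uses exactly these spellings.
COLUMN_NAMES = {
    "Пол": "Gender",
    "возраст": "Age",
    "ЗП": "Salary",
    "Ищет работу на должность": "Desired position",
    "Город": "City",
    "переезд": "Relocation",
    "командировки": "Travel",
    "Занятость": "Availability",
    "График": "Schedule",
    "Опыт работы": "Experience",
    "Последнее/нынешнее место работы": "Last/current company",
    "Последняя/нынешняя должность": "Last/current position",
    "Образование и ВУЗ": "Level of education and university",
    "Обновление резюме": "CV last updated on",
    "Авто": "Car",
}


def rename_record(record):
    """Return a copy of one CV record with known column names translated.

    Keys that are not in COLUMN_NAMES are kept unchanged, so any extra
    engineered features survive the renaming.
    """
    return {COLUMN_NAMES.get(key, key): value for key, value in record.items()}
```

With a dataframe library the same dictionary could be passed to its column-renaming function instead of being applied record by record.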
- currency
- per (time interval of the measurement, e.g. 'D' is a day)
- date
- time
- close (closing price in RUB)
- vol (trading volume)
- proportion (how many units of the currency the close price covers. E.g. if close for USD $= 120$ and proportion $= 2$, then the USD<>RUB rate is $120 / 2 = 60$)
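
The role of proportion can be illustrated with a short sketch. It only restates the arithmetic of the example above. The function names `rub_rate` and `to_rub` are hypothetical, and using the rate to convert a salary into RUB is an assumption about how the rate is applied, not something taken from the project code.

```python
def rub_rate(close, proportion):
    """RUB price of a single unit of the currency.

    `close` is the closing price in RUB for `proportion` units,
    so the per-unit rate is close / proportion.
    """
    if proportion <= 0:
        raise ValueError("proportion must be positive")
    return close / proportion


def to_rub(amount, close, proportion):
    """Convert an amount in the foreign currency to RUB (assumed usage)."""
    return amount * rub_rate(close, proportion)


# The example from the text: close = 120, proportion = 2.
print(rub_rate(120, 2))  # 60.0
```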
Candidates with higher education expect the highest salaries, while general and special school graduates expect the lowest. Candidates who have not yet finished their higher education fall in between.
Candidates in Moscow expect the highest pay, while expectations in the other cities are distributed similarly to one another.
The highest expected salary is among candidates who are willing to relocate but not willing to go on business trips. The second highest is among professionals who are ready to do both, and the lowest is observed among those who aren't ready to relocate or travel.
Among candidates with a higher education degree, salary expectations tend to grow with age, which is quite logical.
At the same time, professionals between 18 and 22 years old hold similar salary expectations regardless of their education level. That can potentially be explained by their focus on gaining experience rather than on a high paycheck.
We can clearly see 7 outliers whose work experience is greater than or equal to their age, which is impossible.
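
A check like this can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: it assumes age and experience are expressed in the same unit (years here; experience stored in months would need to be divided by 12 first), and the field names and sample records are made up.

```python
def has_impossible_experience(age_years, experience_years):
    """True when the stated experience is at least as long as the person's age."""
    return experience_years >= age_years


# Made-up sample records, for illustration only.
candidates = [
    {"age": 30, "experience": 8},
    {"age": 25, "experience": 25},  # experience equal to age
    {"age": 40, "experience": 52},  # experience greater than age
]

# Keep only the records that pass the sanity check.
plausible = [
    c for c in candidates
    if not has_impossible_experience(c["age"], c["experience"])
]
```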
Since the age distribution was skewed in a way that a logarithmic scale could straighten out, I log-scaled this feature and used the three-sigma rule to find potential outliers.
After that, I ran a Z-score analysis to verify and remove the outliers by age.
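
The log-scaling and three-sigma step can be sketched as follows. This is one possible reading of the procedure, not the project's exact implementation: it takes the natural log of each value, computes a Z-score on that scale using the population standard deviation, and flags values more than three standard deviations from the mean. The function name and the `n_sigma` parameter are assumptions introduced for this example.

```python
import math
import statistics


def log_three_sigma_outliers(values, n_sigma=3.0):
    """Return indices of values that are outliers on the log scale.

    A value is flagged when the Z-score of its natural log exceeds
    n_sigma in absolute value. All values must be positive.
    """
    logs = [math.log(v) for v in values]
    mean = statistics.fmean(logs)
    std = statistics.pstdev(logs)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, x in enumerate(logs) if abs(x - mean) / std > n_sigma]


def drop_outliers(values, n_sigma=3.0):
    """Return the values with the flagged outliers removed."""
    flagged = set(log_three_sigma_outliers(values, n_sigma))
    return [v for i, v in enumerate(values) if i not in flagged]
```

Note that with very small samples no value can reach three standard deviations from the mean, so the rule is only meaningful on a reasonably large dataset.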