Team Science

Our team collaborated to investigate the most desirable soft skills for data scientists to get hired. In our effort to determine this, we scraped Linkedin and Indeed for their job postings and then parsed the keyphrases with the highest occurrence.

For notebooks on how to use this project, see the vignettes/ folder. For source code, see R/ and Python/ to find the files that suits your needs. Additional documentation for modifying the source code can be found in the man/ folder. Finally, you can find the whitepapers in html and pdf formats in the docs/ folder.

Our data pipeline started with using Selenium and data cleaning methods to generate a json data file. This json data was then parsed for keywords based on its linguistic meaning. Finally, the keyphrases were aggregated and summed for every job posting by source.

A rough schema of the available data is available from our early drafts for the data storage. The left side shows all incorporated data structures in a general sense, whereas the company data is unavailable at this time.

We concluded from the complete aggregated data that the most valuable data scientist skills, in order, are:

Data Analysis
Data Analytics
Machine Learning
Modeling
Statistical Methodology
Python
SQL
Communication skills
Computer Science knowledge

Authors

Anthony Arroyo (Anthogonyst)

Josh Forster (jforster19)

Andrew Bowen (andrewbowen19)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Team Science

Authors

Files

README.md

Latest commit

History

README.md

File metadata and controls

Team Science

Authors