This project uses spaCy and other NLP tools to analyze and extract skills from resumes. It leverages various Python libraries for data manipulation, visualization, and machine learning.
- Clone the repository:
git clone https://github.com/GabrielTelles4K/spaCy-Resume-Analysis.git
cd spaCy-Resume-Analysis
- Create a virtual environment:
python -m venv env
- Activate the virtual environment:
- On Windows:
.\env\Scripts\activate
- On macOS/Linux:
source env/bin/activate
- Install the required packages:
pip install -r requirements.txt
- Load the resume dataset:
data = pd.read_csv("path/to/resume-dataset/Resume.csv")
- Process the resumes and extract skills:
import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lm = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

clean = []
for i in range(data.shape[0]):
    # Strip @mentions, URLs, and non-alphanumeric characters
    review = re.sub(
        r'(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?',
        " ",
        data["Resume_str"].iloc[i],
    )
    review = review.lower().split()
    review = [lm.lemmatize(word) for word in review if word not in stop_words]
    clean.append(" ".join(review))

data["Clean_Resume"] = clean
data["skills"] = data["Clean_Resume"].str.lower().apply(get_skills)
data["skills"] = data["skills"].apply(unique_skills)
data.head()
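Note that `get_skills` and `unique_skills` are called above but not defined in this snippet. The project likely matches resume tokens against a curated skill list (spaCy's `PhraseMatcher` is a common choice for this); a minimal pure-Python sketch, using a hypothetical skill list:

```python
# Hypothetical skill list; the project's actual list is not shown here.
SKILLS_DB = ["python", "machine learning", "sql", "excel", "java"]

def get_skills(text):
    """Return every known skill that appears in the (lowercased) resume text."""
    return [skill for skill in SKILLS_DB if skill in text]

def unique_skills(skills):
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(skills))

resume = "experienced in python and sql, python scripting, machine learning"
print(unique_skills(get_skills(resume)))  # ['python', 'machine learning', 'sql']
```

A substring check like this can over-match (e.g. "java" inside "javascript"); a token- or phrase-based matcher avoids that, which is one reason to prefer spaCy's `PhraseMatcher` in practice.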
- Visualize the results:
fig = px.histogram(
    data,
    x="Category",
    title="Distribution of Job Categories",
).update_xaxes(categoryorder="total descending")
fig.show()
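The feature list below also mentions skill distributions. Plotting those requires flattening the per-resume skill lists into overall counts first, which `collections.Counter` handles; a sketch with placeholder data standing in for the real `data["skills"]` column:

```python
from collections import Counter

# Placeholder for data["skills"]: one list of extracted skills per resume.
skills_per_resume = [["python", "sql"], ["python", "excel"], ["sql"]]

counts = Counter(skill for skills in skills_per_resume for skill in skills)
print(counts.most_common())  # [('python', 2), ('sql', 2), ('excel', 1)]
```

The resulting pairs can be fed straight into `px.bar` (skill on the x-axis, count on the y-axis) to mirror the category histogram above.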
- Extracts and visualizes skills from resumes.
- Uses spaCy's NLP capabilities.
- Visualizes job categories and skill distributions.
This project is licensed under the MIT License.