
Package/Script Name

--> Package installed: NLTK

  • NLTK stands for 'Natural Language Toolkit'. It provides implementations of common NLP algorithms such as tokenization, part-of-speech tagging, stemming, sentiment analysis, topic segmentation, and named entity recognition. NLTK helps the computer analyze, preprocess, and understand written text.
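As a small illustration of two of these features (tokenizing and stemming), here is a minimal sketch assuming NLTK is installed. It uses `wordpunct_tokenize` because that tokenizer is regex-based and needs no downloaded data (unlike `word_tokenize`, which relies on the Punkt models). The sample sentence is made up for illustration and is not taken from the script.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

sentence = "The cats were running, quickly!"

# Split into word and punctuation tokens with a regex (no NLTK data download needed)
tokens = wordpunct_tokenize(sentence)

# Reduce only the alphabetic tokens to their Porter stems
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens if token.isalpha()]

print(tokens)
print(stems)
```

Here `tokens` keeps punctuation as separate tokens, while stemming maps words such as "running" to "run" and "cats" to "cat".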

--> Pandas

  • pandas is a library for storing, analyzing, and processing data in a tabular (row and column) representation.
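A minimal sketch of that row and column representation, using made-up word counts purely for illustration:

```python
import pandas as pd

# One row per word, one column per attribute
df = pd.DataFrame({"word": ["cat", "dog", "mat"], "count": [2, 1, 1]})

# Analyze: total count across all rows
total = df["count"].sum()

# Process: keep only the rows whose count is greater than 1
frequent = df[df["count"] > 1]

print(df)
print(total)
print(frequent)
```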

--> from sklearn.feature_extraction.text import CountVectorizer

  • Scikit-learn's CountVectorizer converts a collection of text documents into a matrix of token counts (one row per document, one column per term). It can also preprocess the text (for example lowercasing and stop-word removal) before building the vector representation, which makes it a flexible feature-extraction module for text.

Setup instructions

  1. Input the sentences you would like to vectorize.
  2. The script tokenizes the sentences.
  3. It transforms the text into vectors in which each word is a feature and its count is the feature value.
  4. The bag-of-words model is then ready.
  5. The script creates a DataFrame from the vectors; a DataFrame is analogous to an Excel spreadsheet.
  6. Open 'bowp.xlsx' in Excel and check the sheet named 'data', where the DataFrame is stored.

Output


Author(s)

Disclaimers, if any

There are no disclaimers for this script.