TED-Talks-Analysis

An approach to kick-starting exploratory data analysis (EDA) on a dataset that has many text, numerical, categorical, and datetime features, such as TED Talks, when domain knowledge is limited. The idea, approach, and code are generic, so they should carry over to almost any dataset.

Contents:

  • Text Preprocessing
  • Creation of 200+ features, mostly from text columns, using basic NLP such as character/token counts, POS and NER tags, and sentiment
  • Understanding relationships among columns by
    • Visualizing correlation as interactive graphs (currently unweighted)
    • Clustering features based on correlation
  • n-grams and Keyphrase extraction
  • A Talks Recommendation Engine
  • Topic Modelling and Text Clustering
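
To give a flavour of the simpler steps listed above, here is a minimal, standard-library-only sketch of count-based text features, n-gram counting, and a bag-of-words recommender. It is not the repository's actual code: the function names (`tokenize`, `basic_text_features`, `ngram_counts`, `recommend`), the regex tokenizer, and the sample descriptions are all assumptions made for illustration. POS/NER tags and sentiment need an NLP library and are not covered here.

```python
import math
import re
from collections import Counter

# Assumption: a simple regex tokenizer stands in for whatever the project uses.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def basic_text_features(text):
    """Return a few count-based features for one text value."""
    tokens = tokenize(text)
    return {
        "char_count": len(text),
        "token_count": len(tokens),
        "unique_token_count": len(set(tokens)),
    }


def ngram_counts(text, n=2):
    """Count contiguous n-grams (tuples of n tokens) in one text value."""
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(talks, title, k=1):
    """Return the k titles whose descriptions are most similar to `title`.

    `talks` maps a title to its description text (hypothetical structure).
    """
    vectors = {t: Counter(tokenize(d)) for t, d in talks.items()}
    query = vectors[title]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in vectors.items()
        if other != title
    ]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [other for _, other in scored[:k]]


# Made-up sample data, purely for illustration.
sample_talks = {
    "talk_a": "the science of sleep",
    "talk_b": "the science of memory",
    "talk_c": "how cities shape music",
}

features = basic_text_features(sample_talks["talk_a"])
bigrams = ngram_counts("the science of the science", n=2)
similar = recommend(sample_talks, "talk_a", k=1)
```

Raw token counts are the crudest possible representation; a TF-IDF weighting or embeddings would usually give better recommendations, but the overall shape (vectorize each talk, rank the others by similarity) stays the same.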

Please find other input/intermediate files here