notnews/toi


The Life and Times of The Times of India

Using the ProQuest Times of India corpus, which spans 18XX--2008, we examine a range of questions about the newspaper's articles, its contributors, and the people and places it covers.

Scripts

  • Parse ToI: parses the corpus XML to CSV. See the data dictionary for the output columns.
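
    The XML-to-CSV step can be sketched roughly as follows. This is a minimal illustration, not the repository's script: the tag names in `FIELDS` and the sample record are hypothetical stand-ins, and the real names should come from the data dictionary.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tag names; replace them with the ones in the data dictionary.
FIELDS = ["RecordID", "RecordTitle", "NumericPubDate",
          "Contributor", "StartPage", "FullText"]


def record_to_row(xml_text):
    """Flatten one XML record into a dict keyed by FIELDS."""
    root = ET.fromstring(xml_text)
    row = {}
    for field in FIELDS:
        # Repeated elements (e.g. several contributors) are joined with ';'.
        values = [(el.text or "").strip() for el in root.iter(field)]
        row[field] = ";".join(v for v in values if v)
    return row


def records_to_csv(xml_texts, out):
    """Write one CSV row per XML record to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for xml_text in xml_texts:
        writer.writerow(record_to_row(xml_text))


# Made-up record, used only to exercise the functions above.
sample = """<Record>
  <RecordID>1</RecordID>
  <RecordTitle>Monsoon arrives</RecordTitle>
  <NumericPubDate>19500612</NumericPubDate>
  <Contributor>A Staff Reporter</Contributor>
  <StartPage>1</StartPage>
  <FullText>Rain fell over Bombay.</FullText>
</Record>"""

buffer = io.StringIO()
records_to_csv([sample], buffer)
print(buffer.getvalue())
```

    Joining repeated elements with ';' keeps one row per article, which makes the later per-article counts (such as contributors per article) a simple split.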

  • Basic Analyses

    1. number of articles per issue (by pub_date, year, month, and weekday vs. weekend)
    2. number of words per article and per title over time
    3. number (proportion) of articles whose contributor is TNN (the rest are presumably sourced from wire services such as AP, but it is still worth grouping by contributor)
    4. gender, religion, etc. of contributors, inferred using naampy and pranaam; histogram of the top 50 names and surnames
    5. number of contributors per article
    6. number of editorials vs. news articles
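
    A few of the counts above can be sketched with the standard library alone. The column names (`pub_date`, `contributor`, `text`) and the three rows are assumptions made up for illustration; the parsed CSV may name and format them differently.

```python
from collections import Counter
from datetime import datetime

# Made-up rows standing in for the parsed CSV; column names are assumptions.
articles = [
    {"pub_date": "2005-01-01", "contributor": "TNN", "text": "one two three"},
    {"pub_date": "2005-01-01", "contributor": "AP", "text": "one two"},
    {"pub_date": "2005-01-03", "contributor": "TNN", "text": "one two three four"},
]


def is_weekend(pub_date):
    # datetime.weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
    return datetime.strptime(pub_date, "%Y-%m-%d").weekday() >= 5


def mean_articles_per_issue(rows):
    """Average number of articles per issue, split by weekday vs. weekend."""
    per_issue = Counter(row["pub_date"] for row in rows)
    totals, issues = Counter(), Counter()
    for pub_date, n_articles in per_issue.items():
        key = "weekend" if is_weekend(pub_date) else "weekday"
        totals[key] += n_articles
        issues[key] += 1
    return {key: totals[key] / issues[key] for key in totals}


def tnn_share(rows):
    """Proportion of articles whose contributor field is exactly TNN."""
    hits = sum(row["contributor"].strip().upper() == "TNN" for row in rows)
    return hits / len(rows)


def mean_words(rows, column="text"):
    """Average whitespace-delimited word count of one column."""
    return sum(len(row[column].split()) for row in rows) / len(rows)


print(mean_articles_per_issue(articles))
print(round(tnn_share(articles), 2))
print(mean_words(articles))
```

    Treating each distinct `pub_date` as one issue is itself an assumption; it would break if more than one edition shares a date.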
  • NER

  • NER analyses

    1. Histogram of top 50 people covered.
    2. Histogram of top 50 places covered.
    3. Gender and religion of people mentioned, inferred using naampy and pranaam.
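
    The top-50 tallies behind those histograms could be computed along these lines. The sketch assumes the NER step has already produced `(text, label)` pairs; the `PERSON` and `GPE` labels follow a common NER convention and are an assumption here, as are the sample entities.

```python
from collections import Counter


def top_entities(entities, label, n=50):
    """Return the n most frequent entity strings carrying the given label."""
    counts = Counter(
        text.strip() for text, ent_label in entities if ent_label == label
    )
    return counts.most_common(n)


# Made-up NER output; labels are an assumed convention, not the repo's.
entities = [
    ("Jawaharlal Nehru", "PERSON"),
    ("Bombay", "GPE"),
    ("Jawaharlal Nehru", "PERSON"),
    ("Delhi", "GPE"),
    ("Bombay", "GPE"),
    ("Indira Gandhi", "PERSON"),
]

top_people = top_entities(entities, "PERSON")
top_places = top_entities(entities, "GPE")

# A plain-text bar per entity is enough to eyeball the distribution.
for name, count in top_people:
    print(f"{name:<20} {'#' * count}")
```

    In practice, spelling variants of the same person or place would need to be merged before counting, otherwise the tallies are split across variants.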
  • Other Ideas

    1. number of classified ads (on startpage == 1)
    2. number of ads (on startpage == 1)
    3. coverage of the US vs. USSR/Russia, etc.
    4. matrimonial ads: 'caste no bar', 'fair complexion', etc.
    5. ideas that need annotation
      • news vs. not news
      • government vs. non-government ads
      • 'episodic' (x happened) vs. 'thematic' (a more detailed, contextual piece)
      • local/national/foreign news
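
The matrimonial-ad idea above amounts to counting how many ads per year contain a given phrase. A minimal sketch, assuming each ad is a dict with hypothetical `year` and `text` keys; the sample ads are invented for illustration.

```python
import re
from collections import Counter, defaultdict

PHRASES = ["caste no bar", "fair complexion"]


def phrase_counts_by_year(ads, phrases=PHRASES):
    """Count, per year, the ads that contain each phrase at least once."""
    counts = defaultdict(Counter)
    for ad in ads:
        # Lowercase and collapse whitespace so line breaks do not hide a phrase.
        text = re.sub(r"\s+", " ", ad["text"].lower())
        for phrase in phrases:
            if phrase in text:
                counts[ad["year"]][phrase] += 1
    return counts


# Invented ads, used only to exercise the function.
ads = [
    {"year": 1985, "text": "Suitable match wanted. Caste  no bar."},
    {"year": 1985, "text": "Bride of fair complexion sought;\ncaste no bar."},
    {"year": 1990, "text": "Groom wanted for graduate girl."},
]

counts = phrase_counts_by_year(ads)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Exact substring matching will miss OCR errors and spelling variants, so the resulting counts are best read as lower bounds.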
