
Text structuring for import/manipulation/analysis in R | TXM | IRaMuTeQ


jtmart/text-processing

39 Commits

This repository contains a series of scripts for batch-processing textual data ahead of analysis in R, Python, TXM and IRaMuTeQ. It consists mainly of tools that extract text and its metadata from digital sources (PDF, HTML, SRT), clean it (layout and OCR corrections), and format it as CSV+TXT for analysis.
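
As a rough illustration of the clean-and-format steps described above, here is a minimal Python sketch. It is not taken from the repository's scripts: the function names `clean_text` and `export_corpus`, the `metadata.csv` file name, and the column layout (`id`, `title`, `file`) are all assumptions made for the example, as is the reading of "CSV+TXT" as one metadata table plus one plain-text file per document.

```python
import csv
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Undo a few common layout artifacts left by PDF/OCR extraction (hypothetical helper)."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)   # rejoin words hyphenated across a line break
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)  # single newlines become spaces; blank lines are kept
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    return text.strip()


def export_corpus(docs, out_dir):
    """Write one TXT file per document plus a metadata.csv index (assumed layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "metadata.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "title", "file"])
        for doc in docs:
            name = f"{doc['id']}.txt"
            (out / name).write_text(clean_text(doc["text"]), encoding="utf-8")
            writer.writerow([doc["id"], doc["title"], name])
```

Calling `export_corpus([{"id": "d1", "title": "Sample", "text": raw}], "corpus")` would produce `corpus/d1.txt` and `corpus/metadata.csv`. Keeping the metadata in a separate table means the text files stay free of markup, while the CSV can be joined back to them by file name in whichever tool performs the analysis.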
