ABOUT

The premise of this project is to examine the C functions invoked by web browsers and see what information can be extracted from them. Variations on this concept include classifying based on limited information (small windows of function calls).

This repo contains the standard program, which reads an entire strace file, as well as a program that reads small chunks of strace files and determines which site they were generated from.
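
As a rough illustration of the idea (not this repo's actual code), the sketch below counts syscall names in an strace log and turns per-trace counts into a feature matrix for scikit-learn. The regex, the `trace_paths` name, and the feature set are assumptions for illustration.

```python
# Sketch: turn an strace log into a syscall-count feature vector.
# The line format assumed here is strace's default "name(args) = ret".
import re
from collections import Counter

from sklearn.feature_extraction import DictVectorizer

SYSCALL_RE = re.compile(r"^(\w+)\(")  # strace lines start with the syscall name

def syscall_counts(path):
    """Count how often each syscall name appears in one strace log."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            m = SYSCALL_RE.match(line)
            if m:  # skips signal/exit lines, which are not syscalls
                counts[m.group(1)] += 1
    return counts

# Per-trace counts -> fixed-width matrix for scikit-learn.
# trace_paths is a placeholder for wherever the logs are stored.
# vec = DictVectorizer(sparse=False)
# X = vec.fit_transform([syscall_counts(p) for p in trace_paths])
```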

REQUIREMENTS

  • SciPy
  • NumPy
  • Scikit-learn

PROJECT NOTES

Sites:

Command to use:

strace -o ./[etc.] wget -e robots=off --wait 1 --page-requisites [link]
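
For batch collection, something like the following Python sketch could invoke that command once per site. The site list and output paths are placeholders, not part of this repo.

```python
# Sketch: collect one strace log per site using the command above.
import subprocess

SITES = ["https://home.cern", "https://soundcloud.com"]  # placeholders

for i, link in enumerate(SITES):
    subprocess.run(
        ["strace", "-o", f"trace_{i}.txt", "wget", "-e", "robots=off",
         "--wait", "1", "--page-requisites", link],
        check=False,  # wget may return nonzero on partial fetches; keep going
    )
```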

SoundCloud test links (streaming site example):

CERN test links (lightweight site example):

TODO:

Group websites into categories, e.g. university sites, streaming sites, news sites, Wikipedia pages (lists vs. articles), etc.

Can the classifier differentiate between website types? (A possible setup is sketched below.)
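
A minimal sketch of what category-level classification could look like with the listed dependencies. The category labels, the choice of classifier (multinomial naive Bayes suits count data), and the `syscall_counts()` helper from the earlier sketch are all assumptions, not this repo's pipeline.

```python
# Sketch: classify traces by site category rather than individual site.
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def category_accuracy(trace_counts, labels):
    """Cross-validated accuracy for category labels ("university",
    "streaming", ...) given per-trace syscall count dicts."""
    X = DictVectorizer(sparse=False).fit_transform(trace_counts)
    return cross_val_score(MultinomialNB(), X, labels, cv=5).mean()
```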

Search site terms:

  • homepage
  • wake forest
  • nyu
  • linux
  • computer science

University sites:

  • wfu.edu
  • nyu.edu
  • duke.edu
  • unc.edu
  • utexas.edu
  • berkeley.edu
  • usc.edu
  • ucla.edu
  • cornell.edu
  • uchicago.edu

Valgrind on server:

  • Run strace on valgrind runs
  • Check whether high and low memory usage can be distinguished from the traces (see the sketch below)
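
A minimal sketch of one way to make that comparison, assuming memory pressure shows up in allocation-related syscalls. The syscall set and file names are assumptions, and `syscall_counts()` is the helper from the earlier sketch.

```python
# Sketch: compare allocation-related syscall activity between two traces.
MEMORY_SYSCALLS = {"brk", "mmap", "mremap", "munmap"}

def memory_profile(path):
    """Counts of allocation-related syscalls in one strace log."""
    counts = syscall_counts(path)  # helper from the earlier sketch
    return {name: counts.get(name, 0) for name in MEMORY_SYSCALLS}

# Hypothetical file names for a high- and a low-memory valgrind run:
# print(memory_profile("valgrind_high.txt"))
# print(memory_profile("valgrind_low.txt"))
```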