v0.2.0

@edsu edsu released this 30 Jan 03:38

v0.2.0 of twarc includes big changes to both the command-line interface and the programmatic API. You now invoke twarc from the command line using one of three modes:

  • search: twarc.py --search ferguson > tweets.json
  • stream: twarc.py --stream ferguson > tweets.json
  • hydrate: twarc.py --hydrate ids.txt > tweets.json

Notice that twarc no longer decides what filename to use, nor does it attempt to pick up where it left off by reading the last tweet id from a previous file. The reason is that this functionality predated the ability to stream directly. twarc.py now just writes line-oriented JSON to stdout, which you can send wherever you want, for example compressing it on the way:

twarc.py --search ferguson | gzip - > tweets.json.gz
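Because the output is line-oriented JSON, reading it back takes only a few lines. The following is a minimal sketch, not part of twarc: read_tweets is a hypothetical helper name, it assumes Python 3, and it assumes each non-empty line of the file holds exactly one JSON document.

```python
import gzip
import json


def read_tweets(path):
    """Yield one decoded JSON document per non-empty line of the file.

    Hypothetical helper, not part of twarc. Files whose names end in
    ".gz" are decompressed on the fly.
    """
    # Pick the opener from the file extension; both are opened in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, list(read_tweets('tweets.json.gz')) would load the compressed file produced by the command above into memory, while iterating over read_tweets directly processes one tweet at a time.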

The three command-line modes map directly onto the programmatic usage. You first create a Twarc instance and then call its search, stream and hydrate methods:

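from twarc import Twarc

t = Twarc()

# search: iterate over tweets matching the query
for tweet in t.search('ferguson'):
    print(tweet)

# stream: iterate over matching tweets as they arrive (runs until interrupted)
for tweet in t.stream('ferguson'):
    print(tweet)

# hydrate: iterate over the tweets for the ids read from ids.txt
for tweet in t.hydrate(open('ids.txt')):
    print(tweet)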

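If you use the programmatic API and want the same line-oriented JSON that twarc.py writes to stdout, a small helper can serialize whatever the iterators yield. This is a sketch under assumptions rather than twarc's own code: write_tweets is a hypothetical name, and it assumes each tweet is a JSON-serializable object such as a dict.

```python
import json
import sys


def write_tweets(tweets, out=None):
    """Write each tweet as one JSON document per line and return the count.

    Hypothetical helper, not part of twarc. `tweets` is any iterable of
    JSON-serializable objects; `out` is a writable text file object and
    defaults to stdout.
    """
    if out is None:
        out = sys.stdout
    count = 0
    for tweet in tweets:
        # json.dumps never emits raw newlines, so one tweet stays on one line.
        out.write(json.dumps(tweet) + "\n")
        count += 1
    return count
```

Assuming the iterators above yield dicts, write_tweets(t.search('ferguson'), open('tweets.json', 'w')) would produce a file in the same shape as the command-line search mode.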
The nice thing about these changes is that they consolidate and simplify the rate-limiting logic and remove about a third of the code base. Please give it a try and let us know how it goes!