v0.2.0
v0.2.0 of twarc includes big changes to both the command line interface and the programmatic API. You now invoke twarc from the command line using one of three modes:
- search:
twarc.py --search ferguson > tweets.json
- stream:
twarc.py --stream ferguson > tweets.json
- hydrate:
twarc.py --hydrate ids.txt > tweets.json
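If you already have a file of tweets and want an ids.txt to feed to the hydrate mode, something like the following sketch may help. It assumes ids.txt holds one tweet id per line and that each tweet carries Twitter's id_str field; extract_ids is a hypothetical helper, not part of twarc.

```python
import io
import json


def extract_ids(lines):
    """Yield the id_str of each tweet in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)["id_str"]


# Stand-in for open('tweets.json'); any iterable of lines works.
sample = io.StringIO('{"id_str": "1"}\n{"id_str": "2"}\n')
ids = list(extract_ids(sample))
```

Writing each id followed by a newline to a file gives you something you can pass to --hydrate.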
Notice that twarc no longer decides what filename to use, nor does it attempt to pick up where it left off by reading the last tweet id from a previous file. That functionality was dropped because it predated the ability to stream directly. twarc.py now just writes line-oriented JSON to stdout, which you can send wherever you want, for example compressing it on the way:
twarc.py --search ferguson | gzip - > tweets.json.gz
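Reading the compressed output back is straightforward because each line is a complete JSON document. Here is a minimal sketch using only the standard library; read_tweets is a hypothetical helper name, not something twarc provides.

```python
import gzip
import json


def read_tweets(path):
    """Yield one parsed tweet (a dict) per non-empty line of a gzipped file."""
    # "rt" opens the compressed file in text mode, so each line is a str.
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

You would call it as read_tweets('tweets.json.gz') and loop over the result, which keeps only one tweet in memory at a time.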
The three command line modes map directly on to the programmatic usage. You first create a Twarc instance and then call search, stream and hydrate methods:
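from twarc import Twarc

t = Twarc()

# search for tweets matching the query
for tweet in t.search('ferguson'):
    print(tweet)

# stream matching tweets as they are posted (runs until interrupted)
for tweet in t.stream('ferguson'):
    print(tweet)

# hydrate tweet ids read from a file into full tweets
for tweet in t.hydrate(open('ids.txt')):
    print(tweet)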
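If you want the same line-oriented JSON that the command line produces, you can serialize each tweet yourself. The sketch below assumes the tweets yielded by search, stream and hydrate are plain dicts; write_tweets is a hypothetical helper, and a small list stands in for a real Twarc call so the example runs without credentials.

```python
import io
import json


def write_tweets(tweets, out):
    """Write each tweet as one JSON document per line; return the count."""
    count = 0
    for tweet in tweets:
        # json.dumps without indent emits no newlines, so one tweet = one line.
        out.write(json.dumps(tweet) + "\n")
        count += 1
    return count


# Stand-in data; with twarc this would be something like t.search('ferguson').
sample = [{"id_str": "1", "text": "hello"}, {"id_str": "2", "text": "world"}]
buffer = io.StringIO()
written = write_tweets(sample, buffer)
```

Passing an open file or sys.stdout instead of the StringIO buffer gives you output you can pipe or compress just like the command line examples.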
The nice thing about these changes is that they have consolidated and simplified the rate limiting logic, and have removed about 1/3 of the code base. Please give it a try and let us know how it goes!