Release v0.2.0 · DocNow/twarc

v0.2.0 of twarc includes big changes to both the command-line interface and the programmatic API. You now invoke twarc from the command line using one of three modes:

search: twarc.py --search ferguson > tweets.json
stream: twarc.py --stream ferguson > tweets.json
hydrate: twarc.py --hydrate ids.txt > tweets.json

Notice that twarc no longer decides what filename to use, nor does it attempt to pick up where it left off by reading the last tweet id from a previous file. That behavior was removed because it predated the ability to stream directly. twarc.py now just writes line-oriented JSON to stdout, which you can send wherever you want, including through a compressor:

twarc.py --search ferguson | gzip - > tweets.json.gz
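Because the output is one JSON document per line, reading the compressed file back needs nothing beyond the standard library. The following is a minimal sketch, assuming Python 3 and a file produced as shown above; `read_tweets` is a hypothetical helper name and not part of twarc:

```python
import gzip
import json


def read_tweets(path):
    """Yield one decoded tweet (a dict) per non-empty line of a gzipped file.

    Hypothetical helper for illustration; not part of twarc.
    """
    # "rt" opens the gzip stream in text mode so it can be read line by line.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `for tweet in read_tweets("tweets.json.gz"): ...` would process the tweets one at a time without loading the whole file into memory.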

The three command-line modes map directly onto the programmatic usage. You first create a Twarc instance and then call its search, stream and hydrate methods:

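from twarc import Twarc

t = Twarc()

# search: tweets matching a query
for tweet in t.search('ferguson'):
    print(tweet)

# stream: tweets matching a query as they arrive
for tweet in t.stream('ferguson'):
    print(tweet)

# hydrate: full tweets for the ids listed in a file
for tweet in t.hydrate(open('ids.txt')):
    print(tweet)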
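To reproduce the command line's line-oriented output from the programmatic API, each tweet can be serialized with `json.dumps` and written on its own line. This is only a sketch, assuming the methods above yield JSON-serializable dicts; `write_tweets` is a hypothetical helper and not part of twarc:

```python
import json
import sys


def write_tweets(tweets, out=sys.stdout):
    """Write each tweet as one line of JSON; return the number written.

    Hypothetical helper for illustration; `tweets` can be any iterable of
    JSON-serializable objects.
    """
    count = 0
    for tweet in tweets:
        out.write(json.dumps(tweet) + "\n")
        count += 1
    return count
```

Under those assumptions, `write_tweets(t.search('ferguson'), open('tweets.json', 'w'))` would produce a file in the same shape as the command-line output.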

The nice thing about these changes is that they consolidate and simplify the rate limiting logic and remove about a third of the codebase. Please give it a try and let us know how it goes!

