Releases · DocNow/twarc

15 Jan 21:17

edsu

v0.5.1

2db5a71

v0.5.1

I've been seeing intermittent 500 errors from the Twitter API search endpoint. This small update catches them and backs off before retrying; if the errors persist, it eventually logs the error and gives up.
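
The catch-and-back-off behaviour can be sketched roughly as follows. This is a minimal illustration and not twarc's actual code: `fetch_with_backoff`, `max_tries`, `base_delay` and the `sleep` argument are hypothetical names, `fetch` stands in for whatever function performs the HTTP request, and doubling the delay is an assumed policy.

```python
import logging
import time


def fetch_with_backoff(fetch, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning a 5xx status or tries run out.

    fetch is a hypothetical callable returning (status_code, body).
    Returns the body of the first non-5xx response, or None after
    logging the final failure.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        status, body = fetch()
        if status < 500:
            return body
        if attempt == max_tries:
            break
        logging.warning("got %s, retrying in %.1fs", status, delay)
        sleep(delay)
        delay *= 2  # assumed policy: double the wait after each failure
    logging.error("giving up after %d tries", max_tries)
    return None
```

Passing `sleep` as an argument keeps the sketch easy to exercise without real waiting.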


02 Dec 17:06

edsu

v0.5.0

58806ae

v0.5.0

The --stream option has been split into --track, --follow and
--locations to better match Twitter's filter stream API.

Similarly the twarc.stream function has been renamed to twarc.filter
and it now takes three parameters: track, follow and locations.
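
As a rough sketch of how the three parameters line up with the request parameters of Twitter's filter stream, the helper below builds a parameter dictionary. `build_filter_params` is a hypothetical name used only for illustration and is not part of twarc. It assumes each value is passed through as a comma-separated string.

```python
def build_filter_params(track=None, follow=None, locations=None):
    """Build request parameters for a filter stream request.

    Hypothetical helper: each argument is assumed to be a comma-separated
    string (keywords, user ids, or bounding-box coordinates), and at
    least one of them must be supplied.
    """
    params = {}
    if track:
        params["track"] = track
    if follow:
        params["follow"] = follow
    if locations:
        params["locations"] = locations
    if not params:
        raise ValueError("need at least one of track, follow, locations")
    return params
```

For example, `build_filter_params(track="ferguson")` yields a dictionary containing only the `track` key.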


09 Nov 17:16

edsu

v0.4.0

900f765

v0.4.0

Added --warnings flag to log warnings from the Twitter API about dropped tweets during streaming.


07 Oct 17:18

edsu

v0.3.4

666fa78

v0.3.4

In this release the utils/archive.py script has been renamed to utils/twarc-archive.py, and pip install will now make it available on the command line just like twarc.py. See #80 for context.


03 Aug 10:29

edsu

v0.3.3

9e88f84

v0.3.3

Now handles the unexpected 404 responses that have been observed from the Twitter API.


03 Jul 21:13

edsu

v0.3.1

6709242

v0.3.1

handle connection reset errors during hydrate
updated utils/archive.py to use config file


10 Jun 06:23

edsu

v0.3.0

37bb459

v0.3.0

This release adds support for managing keys in a .twarc config file. You can also keep multiple sets of credentials in your config and choose between them with the --profile command line option.
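
To illustrate the idea of named credential sets, here is a minimal sketch that reads one profile from config text. The layout is an assumption, not something these notes specify: it treats the file as INI-style with one section per profile, and the key names, profile names and `load_credentials` helper are all hypothetical.

```python
import configparser

# Assumed key names for one set of credentials (hypothetical).
KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")

# Assumed layout: one INI section per profile.
EXAMPLE = """
[main]
consumer_key = aaa
consumer_secret = bbb
access_token = ccc
access_token_secret = ddd

[work]
consumer_key = eee
consumer_secret = fff
access_token = ggg
access_token_secret = hhh
"""


def load_credentials(config_text, profile="main"):
    """Return the credentials stored under one profile (an INI section)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(profile):
        raise KeyError("no such profile: %s" % profile)
    return {key: parser.get(profile, key) for key in KEYS}
```

Under these assumptions, `load_credentials(EXAMPLE, "work")` returns the second set of keys, which is roughly what selecting a profile on the command line would need to do.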


06 May 15:50

edsu

v0.2.7

5cd881a

v0.2.7

handle connection reset errors, which are now occurring during search
added zenodo integration for citing twarc by DOI
minor changes to utilities for python3


17 Feb 01:12

edsu

v0.2.2

e514fc6

v0.2.2

Python3 support
now accepts twitter credentials on the command line


30 Jan 03:38

edsu

v0.2.0

9e2dd0b

v0.2.0

v0.2.0 of twarc includes big changes to both the command line API and the programmatic API. You now invoke twarc from the command line using one of three modes:

search: twarc.py --search ferguson > tweets.json
stream: twarc.py --stream ferguson > tweets.json
hydrate: twarc.py --hydrate ids.txt > tweets.json

Notice that twarc no longer decides what filename to use, and no longer attempts to pick up where it left off by reading the last tweet id from a previous file. That functionality predated the ability to stream directly. twarc.py now just writes line-oriented JSON to stdout, which you can send wherever you want, including compressing it:

twarc.py --search ferguson | gzip - > tweets.json.gz
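
Because the output is one JSON document per line, a compressed file can be read back one tweet at a time. The following is a minimal sketch using only the Python standard library; `read_tweets` is a hypothetical helper name, and it assumes the file is UTF-8 encoded.

```python
import gzip
import json


def read_tweets(path):
    """Yield one parsed tweet per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Iterating with `for tweet in read_tweets("tweets.json.gz")` keeps only one tweet in memory at a time, which matters for large archives.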

The three command line modes map directly onto the programmatic usage. You first create a Twarc instance and then call its search, stream and hydrate methods:

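from twarc import Twarc

t = Twarc()

# search: iterate over tweets matching a query
for tweet in t.search('ferguson'):
    print(tweet)

# stream: iterate over matching tweets as they arrive
for tweet in t.stream('ferguson'):
    print(tweet)

# hydrate: iterate over the tweets for the ids listed in a file
for tweet in t.hydrate(open('ids.txt')):
    print(tweet)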

The nice thing about these changes is that they have consolidated and simplified the rate limiting logic, and have removed about 1/3 of the code base. Please give it a try and let us know how it goes!
