fixes #80
edsu committed Oct 7, 2015
1 parent 7117ec6 commit 666fa78
Showing 3 changed files with 19 additions and 7 deletions.
13 changes: 13 additions & 0 deletions README.md
@@ -101,6 +101,19 @@ fetch the full JSON for each tweet and write it to stdout as line-oriented JSON:

twarc.py --hydrate ids.txt > tweets.json
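The `--hydrate` output above is line-oriented JSON: one complete tweet object per line. Here is a minimal sketch of reading such a file back in Python. It assumes each non-blank line parses as one JSON object, and it uses Twitter's `id_str` field purely for illustration. `read_line_oriented_json` is a hypothetical helper, not part of twarc.

```python
import io
import json


def read_line_oriented_json(fh):
    """Yield one parsed object per non-blank line of a line-oriented JSON stream."""
    for line in fh:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Usage with an in-memory stand-in for tweets.json (hypothetical sample data).
sample = io.StringIO('{"id_str": "2"}\n{"id_str": "1"}\n')
ids = [tweet["id_str"] for tweet in read_line_oriented_json(sample)]
print(ids)  # ['2', '1']
```

With a real file you would pass `open("tweets.json")` instead of the `StringIO`. Because each line is parsed independently, a large archive can be streamed without loading it all into memory.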

## Archive

In addition to `twarc.py`, when you install twarc you will also get a
`twarc-archive.py` command line tool. This uses twarc as a library to
periodically collect data matching a particular search query. It's useful if you
don't necessarily want to collect tweets as they happen with the streaming
API, and are content to run it on a schedule (perhaps daily from cron) to collect
what you can. The script keeps the files organized, and is smart enough to
use the most recent file to determine when it can stop collecting, so there are
no duplicates.

twarc-archive.py
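A minimal sketch of the "use the most recent file to know when to stop" idea follows. It is not the actual twarc-archive.py code. It assumes the `tweets-NNNN.json` naming shown in the script's docstring, that the first line of the newest file holds the tweet whose id becomes the lower bound for the next search (Twitter's `since_id`), and that each tweet carries Twitter's `id_str` field. `last_archive_file` and `since_id_from` are hypothetical helpers.

```python
import json
import os
import re
import tempfile

# Naming pattern assumed from the docstring example (tweets-0001.json, tweets-0002.json).
ARCHIVE_FILE = re.compile(r"^tweets-(\d+)\.json$")


def last_archive_file(archive_dir):
    """Return the path of the highest-numbered tweets-NNNN.json file, or None if there is none."""
    numbered = []
    for name in os.listdir(archive_dir):
        match = ARCHIVE_FILE.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    # Compare parsed integers, not names, so ordering does not depend on zero padding.
    return os.path.join(archive_dir, max(numbered)[1])


def since_id_from(path):
    """Return the id_str of the tweet on the first line of an archive file, or None if it is empty."""
    with open(path) as fh:
        first = fh.readline()
    return json.loads(first)["id_str"] if first.strip() else None


# Usage: a throwaway archive directory holding two small files.
with tempfile.TemporaryDirectory() as archive_dir:
    for number, tweet_id in [(1, "100"), (2, "250")]:
        path = os.path.join(archive_dir, "tweets-%04d.json" % number)
        with open(path, "w") as fh:
            fh.write(json.dumps({"id_str": tweet_id}) + "\n")
    latest = last_archive_file(archive_dir)
    result = (os.path.basename(latest), since_id_from(latest))

print(result)  # ('tweets-0002.json', '250')
```

In this sketch, passing the returned id as the search's `since_id` asks Twitter only for tweets newer than it, which is what would keep successive runs from writing duplicates.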

## Use as a Library

If you want, you can use twarc programmatically as a library to collect
4 changes: 2 additions & 2 deletions setup.py
@@ -28,12 +28,12 @@ def run(self):

setup(
name='twarc',
-version='0.3.3',
+version='0.3.4',
url='http://github.com/edsu/twarc',
author='Ed Summers',
author_email='[email protected]',
py_modules=['twarc', ],
-scripts=['twarc.py'],
+scripts=['twarc.py', 'utils/twarc-archive.py'],
description='command line utility to archive Twitter search results as line-oriented-json',
cmdclass={'test': PyTest},
install_requires=dependencies,
9 changes: 4 additions & 5 deletions utils/archive.py → utils/twarc-archive.py
@@ -8,7 +8,7 @@
So for example if you want to search for tweets mentioning "ferguson" you can
run it:
-./archive.py ferguson /mnt/tweets/ferguson
+% twarc-archive.py ferguson /mnt/tweets/ferguson
The first time you run this it will search Twitter for tweets matching
"ferguson" and write them to a file:
@@ -17,16 +17,15 @@
When you run the exact same command again:
-./archive.py ferguson /mnt/tweets/ferguson
+% twarc-archive.py ferguson /mnt/tweets/ferguson
it will get the first tweet id in tweets-0001.json and use it to write another
file which includes any new tweets since that tweet:
/mnt/tweets/ferguson/tweets-0002.json
-This functionality was initially part of twarc.py itself (not in a utility).
-If it proves useful perhaps it can go back in. But for now twarc.py writes
-to stdout to let you manage your data the way you want to.
+This functionality was initially part of twarc.py itself, but has been split out
+into a separate utility.
"""
