fixes #80

DocNow · Oct 7, 2015 · 666fa78
1 parent 7117ec6
commit 666fa78

Showing 3 changed files with 19 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -101,6 +101,19 @@ fetch the full JSON for each tweet and write it to stdout as line-oriented JSON:
 
     twarc.py --hydrate ids.txt > tweets.json
 
+## Archive
+
+In addition to `twarc.py`, installing twarc also gives you a
+`twarc-archive.py` command line tool. It uses twarc as a library to
+periodically collect data matching a particular search query. It's useful if
+you don't need to collect tweets as they happen with the streaming API, and
+are content to run it periodically (perhaps daily from cron) to collect what
+you can. The script keeps the files organized, and uses the most recent file
+to determine when it can stop collecting, so there are no duplicate
+tweets.
+
+    twarc-archive.py ferguson /mnt/tweets/ferguson
+
 ## Use as a Library
 
 If you want you can use twarc programmatically as a library to collect

diff --git a/setup.py b/setup.py
@@ -28,12 +28,12 @@ def run(self):
 
 setup(
     name='twarc',
-    version='0.3.3',
+    version='0.3.4',
     url='http://github.com/edsu/twarc',
     author='Ed Summers',
     author_email='[email protected]',
     py_modules=['twarc', ],
-    scripts=['twarc.py'],
+    scripts=['twarc.py', 'utils/twarc-archive.py'],
     description='command line utility to archive Twitter search results as line-oriented-json',
     cmdclass={'test': PyTest},
     install_requires=dependencies,

diff --git a/utils/archive.py → utils/twarc-archive.py b/utils/archive.py → utils/twarc-archive.py
@@ -8,7 +8,7 @@
 So for example if you want to search for tweets mentioning "ferguson" you can 
 run it:
 
-    ./archive.py ferguson /mnt/tweets/ferguson
+    % twarc-archive.py ferguson /mnt/tweets/ferguson
 
 The first time you run this it will search twitter for tweets matching 
 "ferguson" and write them to a file:
@@ -17,16 +17,15 @@
 
 When you run the exact same command again:
 
-    ./archive.py ferguson /mnt/tweets/ferguson
+    % twarc-archive.py ferguson /mnt/tweets/ferguson
 
 it will get the first tweet id in tweets-0001.json and use it to write another 
 file which includes any new tweets since that tweet:
 
     /mnt/tweets/ferguson/tweets-0002.json
 
-This functionality was initially part of twarc.py itself (not in a utility).
-If it proves useful perhaps it can go back in. But for now twarc.py writes
-to stdout to let you manage your data the way you want to.
+This functionality was initially part of twarc.py itself, but has been split out
+into a separate utility.
 
 """
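
The docstring above describes how a repeat run picks up where the previous one stopped. A minimal sketch of that bookkeeping follows. It is not the code in `utils/twarc-archive.py`, and the helper names (`latest_archive`, `first_tweet_id`, `plan_next_run`) are hypothetical. It assumes files are named `tweets-NNNN.json` as in the docstring, that each line is one tweet as JSON with an `id_str` field, and that the first line of a file holds the newest tweet.

```python
import glob
import json
import os
import re


def latest_archive(archive_dir):
    """Return (path, sequence number) of the newest tweets-NNNN.json, or (None, 0)."""
    best_path, best_seq = None, 0
    for path in glob.glob(os.path.join(archive_dir, "tweets-*.json")):
        match = re.search(r"tweets-(\d+)\.json$", path)
        if match and int(match.group(1)) > best_seq:
            best_path, best_seq = path, int(match.group(1))
    return best_path, best_seq


def first_tweet_id(path):
    """Return the id of the first tweet in a line-oriented JSON file, or None if empty."""
    with open(path) as fh:
        line = fh.readline()
    # Assumption: every tweet object carries its id as the string field "id_str".
    return json.loads(line)["id_str"] if line.strip() else None


def plan_next_run(archive_dir):
    """Return (since_id, next_path) for the next collection run."""
    path, seq = latest_archive(archive_dir)
    # No earlier file means a first run: there is no lower bound on tweet ids.
    since_id = first_tweet_id(path) if path else None
    next_path = os.path.join(archive_dir, "tweets-%04d.json" % (seq + 1))
    return since_id, next_path
```

On an empty directory, `plan_next_run` returns no `since_id` and a path ending in `tweets-0001.json`. Once that file exists, the next call returns the id on its first line and a path ending in `tweets-0002.json`. The `since_id` would then be handed to whatever search call does the collecting, so that only newer tweets are written.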