diff --git a/README.md b/README.md
index f43ad6c..14f8d4b 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,3 @@
-> [!NOTE]
-> Note: Downloading the audio in tweets is not supported yet.
-
 > [!WARNING]
 > Do not leak your `cookie.json`, this will lead to your Twitter account being stolen.
 
@@ -9,26 +6,12 @@
 ## Introduction
 - This tool can automatically simulate browser operations to crawl all users' tweets and save all static resources (videos, pictures) locally without calling the Twitter API.
 - At the same time, sqlite3 is used to save the crawled data as an index file for easy query.
-## Install
-### Install dependencies
+## Installation & Configuration
 - Install `Python3.10+`
 - Install `Poetry`
 - Install `Chrome 119.0+`
 - Run the command `poetry install` in the directory with `pyproject.toml`
-### Prepare configuration
 - Configure `config.yaml`
-- Edit line 69 of `/twitter_user_tweet_crawler/__main__.py`
-- Prepare Chrome user data folder (set data_dir to `/twitter_user_tweet_crawler/userdata/` as an example)
-  1. Create a new folder under `/twitter_user_tweet_crawler/userdata/`
-  2. If you need ***n browser instances at the same time***, create ***n+1 folders***
-  3. For example, you need 3 threads to work at the same time
-  4. Just create new `/twitter_user_tweet_crawler/userdata/1` `/twitter_user_tweet_crawler/userdata/2` `/twitter_user_tweet_crawler/userdata/3` `/twitter_user_tweet_crawler/userdata/4`
-- Pre-configured Chrome
-  1. Execute the command `/usr/bin/google-chrome-stable --user-data-dir=/1`
-  2. Install Tampermonkey extension
-  3. Open the `Tampermonkey extension` interface to create a new js, copy the content in `script.js`Ctrl+S
-  4. Change the browser save path to `/twitter_user_tweet_crawler/output/res`
-  5. ...and so on until all configurations are completed
 ## Run
 1. Run the command in the upper-level directory with `pyproject.toml`
 ```commandline
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 4af3b7a..dc2152b 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -1,6 +1,3 @@
-> [!NOTE]
-> 注意:下载推文中的音频该功能暂未支持。
-
 > [!WARNING]
 > 不要泄漏自己的`cookie.json`,这将导致你的推特账号被盗。
 
@@ -9,26 +6,12 @@
 ## 简介
 - 此工具能够自动的模拟浏览器操作爬取用户的全部推文并将全部静态资源(视频、图片)保存在本地,无需调用Twitter API
 - 同时利用sqlite3将爬取到的数据保存为索引文件,方便查询。
-## 安装
-### 安装依赖
+## 安装 & 配置
 - 安装`Python3.10+`
 - 安装`Poetry`
 - 安装`Chrome 119.0+`
 - 在有`pyproject.toml`的目录运行指令`poetry install`
-### 准备配置
 - 配置`config.yaml`
-- 编辑`/twitter_user_tweet_crawler/__main__.py`的69行
-- 准备Chrome用户数据文件夹 (将data_dir设置为`/twitter_user_tweet_crawler/userdata/`为例)
-  1. 在`/twitter_user_tweet_crawler/userdata/`下新建文件夹
-  2. 你需要同时进行***n个浏览器实例***就新建***n+1个文件夹***
-  3. 比方说你需要3个线程同时工作
-  4. 就新建`/twitter_user_tweet_crawler/userdata/1` `/twitter_user_tweet_crawler/userdata/2` `/twitter_user_tweet_crawler/userdata/3` `/twitter_user_tweet_crawler/userdata/4`
-- 预配置 Chrome
-  1. 执行指令`/usr/bin/google-chrome-stable --user-data-dir=/1`
-  2. 安装Tampermonkey拓展
-  3. 打开 `Tampermonkey扩展` 界面新建js,拷贝`script.js`中的内容之后Ctrl+S
-  4. 更改浏览器保存路径为`/twitter_user_tweet_crawler/output/res`
-  5. ...依次类推,直至全部配置完毕
 ## 运行
 1. 在有`pyproject.toml`的上级目录运行指令
 ```commandline