📝 README
kaixinol committed Nov 20, 2023
1 parent cc2a6e6 commit 8ed55d4
Showing 2 changed files with 2 additions and 36 deletions.
19 changes: 1 addition & 18 deletions README.md
> [!NOTE]
> Downloading the audio in tweets is not supported yet.

> [!WARNING]
> Do not leak your `cookie.json`; anyone who obtains it can take over your Twitter account.

## Introduction
- This tool automatically simulates browser operations to crawl all of a user's tweets and save every static resource (videos, pictures) locally, without calling the Twitter API.
- It also uses sqlite3 to store the crawled data as an index file for easy querying.
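As a rough illustration of how such a SQLite index file can be queried, here is a minimal sketch. The `tweets` table and its columns are hypothetical stand-ins, not the tool's actual schema:

```python
import sqlite3

# Build a tiny stand-in index in memory; the real tool writes its own schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tweets (id TEXT PRIMARY KEY, text TEXT, media_path TEXT)")
con.execute(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    ("1729", "example tweet", "output/res/1729.jpg"),
)

# Query the index for tweets that have media saved locally
rows = con.execute(
    "SELECT id, media_path FROM tweets WHERE media_path IS NOT NULL"
).fetchall()
print(rows)  # → [('1729', 'output/res/1729.jpg')]
```

The point is only that the index is a plain SQLite file, so any sqlite3 client can search the crawled data without re-running the crawler.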
## Installation & Configuration
### Install dependencies
- Install `Python 3.10+`
- Install `Poetry`
- Install `Chrome 119.0+`
- Run `poetry install` in the directory containing `pyproject.toml`
### Prepare configuration
- Configure `config.yaml`
- Edit line 69 of `/twitter_user_tweet_crawler/__main__.py`
- Prepare Chrome user data folders <i><u>(this example sets `data_dir` to `/twitter_user_tweet_crawler/userdata/`)</u></i>
  1. Create new folders under `/twitter_user_tweet_crawler/userdata/`
  2. If you need ***n* browser instances at the same time**, create ***n + 1* folders**
  3. For example, if you need 3 threads working at the same time, create `/twitter_user_tweet_crawler/userdata/1` `/twitter_user_tweet_crawler/userdata/2` `/twitter_user_tweet_crawler/userdata/3` `/twitter_user_tweet_crawler/userdata/4`
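The folder layout above can be created in one step. This sketch assumes a local `userdata/` directory as a stand-in for your actual `data_dir`, and `n = 3` concurrent instances:

```python
from pathlib import Path

n = 3  # desired number of concurrent browser instances (example value)
data_dir = Path("userdata")  # stand-in for your actual data_dir

# n simultaneous instances require n + 1 profile folders, named 1 .. n+1
for i in range(1, n + 2):
    (data_dir / str(i)).mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in data_dir.iterdir()))  # → ['1', '2', '3', '4']
```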
- Pre-configure Chrome
  1. Run the command `/usr/bin/google-chrome-stable --user-data-dir=<data_dir>/1`
  2. Install the Tampermonkey extension
  3. Open the `Tampermonkey extension` interface, create a new script, paste in the contents of `script.js`, then press <kbd>Ctrl+S</kbd>
  4. Change the browser's download path to `/twitter_user_tweet_crawler/output/res`
  5. Repeat the steps above for `<data_dir>/2`, `<data_dir>/3`, and so on until every user data folder is configured
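Since step 1 must be repeated once per profile folder, assembling the launch commands can be scripted. This sketch only builds and prints the command strings (it does not start Chrome), assuming the binary path above and four profile folders under a local `userdata/` stand-in directory:

```python
from pathlib import Path

chrome = "/usr/bin/google-chrome-stable"  # binary path from step 1
data_dir = Path("userdata")  # stand-in for your actual data_dir

# One launch command per profile folder; run each in its own terminal.
commands = [f"{chrome} --user-data-dir={data_dir / str(i)}" for i in range(1, 5)]
for cmd in commands:
    print(cmd)
```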
## Run
1. From the top-level directory containing `pyproject.toml`, run:
```commandline
…
```
19 changes: 1 addition & 18 deletions README_zh_CN.md
