large update. added twitter spaces live data scrapers
trevorhobenshield committed Jun 2, 2023
1 parent d1c6fc2 commit ed30cfb
Showing 11 changed files with 1,272 additions and 615 deletions.
Binary file added assets/spaces-audio.gif
Binary file added assets/spaces-transcript-01.gif
Binary file added assets/spaces-transcript-02.gif
111 changes: 101 additions & 10 deletions readme.md
@@ -1,16 +1,23 @@
> Currently affected by layoffs. If anyone is hiring for Software Developer or Machine Learning Engineer roles in **Vancouver, BC** or remotely in **Canada**, please feel free to send me a message at `[email protected]`. Thanks!
### Implementation of Twitter's v1, v2, and GraphQL APIs
## Implementation of Twitter's v1, v2, and GraphQL APIs

Tools include: [Scraping](#scraping), [Account Automation](#automation), [Search](#search)

Automated email challenge solvers are supported for **Proton Mail** accounts using [proton-python-client](https://github.com/ProtonMail/proton-python-client). See [here](#automated-solvers) for more information.

## Table of Contents

* [Installation](#installation)
* [Automation](#automation)
* [Scraping](#scraping)
* [Users/Tweets data](#get-all-usertweet-data)
* [Search](#search)
* [Get all user/tweet data](#get-all-usertweet-data)
* [Resume Pagination](#resume-pagination)
* [Search](#search)
* [Spaces](#spaces)
* [Live Audio Capture](#live-audio-capture)
* [Live Transcript Capture](#live-transcript-capture)
* [Search and Metadata](#search-and-metadata)
* [Automated Solvers](#automated-solvers)
* [Example API Responses](#example-api-responses)

@@ -190,7 +197,7 @@ scraper.tweet_stats([111111, 222222, 333333])

# get recommended users based on user
scraper.recommended_users()
scraper.recommended_users(123)
scraper.recommended_users([123])

# tweet data
tweets_by_ids = scraper.tweets_by_id([987, 876, 754])
@@ -210,20 +217,18 @@ scraper.trends()
```

#### Resume Pagination
Pagination is already done by default; however, there are circumstances where you may need to resume pagination from a specific cursor. For example, the `Followers` endpoint only allows 50 requests every 15 minutes. In this case, we can resume from where we left off by providing a specific cursor value. This technique applies to any other endpoint defined in `twitter.constants.Operation`.
**Pagination is already done by default**; however, there are circumstances where you may need to resume pagination from a specific cursor. For example, the `Followers` endpoint only allows 50 requests every 15 minutes. In this case, we can resume from where we left off by providing a specific cursor value.
```python
from twitter.scraper import Scraper
from twitter.constants import Operation

email, username, password = ..., ..., ...
scraper = Scraper(email, username, password, debug=1, save=True)

operation = Operation.Followers
user_id = 44196397
cursor = '1765001818576065118|1654241854176100129' # example cursor
limit = 250 # arbitrary limit for demonstration
cursor = '1767341853908517597|1663601806447476672' # example cursor
limit = 100 # arbitrary limit for demonstration
follower_subset, last_cursor = scraper.followers([user_id], limit=limit, cursor=cursor)

follower_subset, last_cursor = scraper.resume_pagination(scraper.session, user_id, operation, limit=limit, cursor=cursor)
# use last_cursor to resume pagination
```
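
If a run is cut short by the rate limit, the returned `last_cursor` can be passed straight back in. A minimal sketch (assuming the `scraper`, `user_id`, and `limit` from the block above):

```python
import time

# hypothetical resume pattern: wait out the 15-minute rate-limit window,
# then continue paginating from the cursor returned by the previous call
time.sleep(15 * 60)
more_followers, last_cursor = scraper.followers([user_id], limit=limit, cursor=last_cursor)
```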

@@ -267,6 +272,92 @@ https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/search-

https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query

### Spaces

#### Live Audio Capture

Capture live audio for up to 500 streams per IP.

![](assets/spaces-audio.gif)

```python
from twitter.scraper import Scraper
from twitter.util import init_session

session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)

rooms = [...]
scraper.spaces_live(rooms=rooms) # capture live audio from list of rooms
```
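
For example (hypothetical usage; the room IDs below are the sample Space IDs used elsewhere in this readme, and a room must be live for capture to work):

```python
# hypothetical room list, borrowing the sample Space IDs
# from the transcript and metadata examples below
scraper.spaces_live(rooms=['1zqKVPlQNApJB', '1eaJbrAPnBVJX', '1eaJbrAlZjjJX'])
```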

#### Live Transcript Capture

**Raw transcript chunks**

![](assets/spaces-transcript-02.gif)

```python
from twitter.scraper import Scraper
from twitter.util import init_session

session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)

# room must be live, i.e. in "Running" state
scraper.space_live_transcript('1zqKVPlQNApJB', frequency=2) # word-level live transcript (dirty, on-the-fly transcription before post-processing)
```


**Processed (final) transcript chunks**

![](assets/spaces-transcript-01.gif)


```python
from twitter.scraper import Scraper
from twitter.util import init_session

session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)

# room must be live, i.e. in "Running" state
scraper.space_live_transcript('1zqKVPlQNApJB', frequency=1) # finalized live transcript (clean)
```

#### Search and Metadata
```python
from twitter.scraper import Scraper
from twitter.util import init_session
from twitter.constants import SpaceCategory

session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)

# download audio and chat-log from space
spaces = scraper.spaces(rooms=['1eaJbrAPnBVJX', '1eaJbrAlZjjJX'], audio=True, chat=True)

# pull metadata only
spaces = scraper.spaces(rooms=['1eaJbrAPnBVJX', '1eaJbrAlZjjJX'])

# search for spaces in "Upcoming", "Top" and "Live" categories
spaces = scraper.spaces(search=[
{
'filter': SpaceCategory.Upcoming,
'query': 'hello'
},
{
'filter': SpaceCategory.Top,
'query': 'world'
},
{
'filter': SpaceCategory.Live,
'query': 'foo bar'
}
])
```
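
The examples above leave the return shape undocumented; as a minimal sketch (assuming `spaces` is a list of JSON-serializable dicts), results can be persisted with `orjson`, which is already one of this package's dependencies:

```python
import orjson

# sketch: write the collected Space metadata/search results to disk
# (assumes `spaces` from the calls above is JSON-serializable)
with open('spaces.json', 'wb') as f:
    f.write(orjson.dumps(spaces, option=orjson.OPT_INDENT_2))
```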



### Automated Solvers
To set up automated email confirmation/verification solvers, add your Proton Mail credentials below as shown.
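
A minimal sketch, mirroring the snippet in this commit's `setup.py` long description (`email`, `username`, and `password` are the Twitter account credentials used throughout this readme):

```python
from twitter.scraper import Scraper

email, username, password = ..., ..., ...
proton_email, proton_password = ..., ...

# solvers run automatically when an email challenge is encountered
account = Scraper(email, username, password, debug=1, save=True,
                  protonmail={'email': proton_email, 'password': proton_password})
```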
112 changes: 98 additions & 14 deletions setup.py
@@ -3,35 +3,44 @@
from setuptools import find_packages, setup

install_requires = [
"aiofiles",
"websockets",
"nest_asyncio",
"aiohttp",
"httpx",
"tqdm",
"orjson",
"requests",
"bcrypt",
"python-gnupg",
"pyopenssl",
"requests",
'uvloop; platform_system != "Windows"',
]

setup(
name="twitter-api-client",
version="0.8.3",
version="0.8.4",
python_requires=">=3.10.10",
description="Twitter API",
long_description=dedent('''
Implementation of Twitter's v1, v2, and GraphQL APIs
## Implementation of Twitter's v1, v2, and GraphQL APIs
Tools include: [Scraping](#scraping), [Account Automation](#automation), [Search](#search)
Automated email challenge solvers are supported for Proton Mail accounts. See [here](#automated-solvers) for more information.
Automated email challenge solvers are supported for **Proton Mail** accounts using [proton-python-client](https://github.com/ProtonMail/proton-python-client). See [here](#automated-solvers) for more information.
## Table of Contents
* [Installation](#installation)
* [Automation](#automation)
* [Scraping](#scraping)
* [Users/Tweets data](#get-all-usertweet-data)
* [Search](#search)
* [Get all user/tweet data](#get-all-usertweet-data)
* [Resume Pagination](#resume-pagination)
* [Search](#search)
* [Spaces](#spaces)
* [Live Audio Capture](#live-audio-capture)
* [Live Transcript Capture](#live-transcript-capture)
* [Search and Metadata](#search-and-metadata)
* [Automated Solvers](#automated-solvers)
* [Example API Responses](#example-api-responses)
@@ -207,7 +216,7 @@
# get recommended users based on user
scraper.recommended_users()
scraper.recommended_users(123)
scraper.recommended_users([123])
# tweet data
tweets_by_ids = scraper.tweets_by_id([987, 876, 754])
@@ -227,20 +236,18 @@
```
#### Resume Pagination
Pagination is already done by default; however, there are circumstances where you may need to resume pagination from a specific cursor. For example, the `Followers` endpoint only allows 50 requests every 15 minutes. In this case, we can resume from where we left off by providing a specific cursor value. This technique applies to any other endpoint defined in `twitter.constants.Operation`.
**Pagination is already done by default**; however, there are circumstances where you may need to resume pagination from a specific cursor. For example, the `Followers` endpoint only allows 50 requests every 15 minutes. In this case, we can resume from where we left off by providing a specific cursor value.
```python
from twitter.scraper import Scraper
from twitter.constants import Operation
email, username, password = ..., ..., ...
scraper = Scraper(email, username, password, debug=1, save=True)
operation = Operation.Followers
user_id = 44196397
cursor = '1765001818576065118|1654241854176100129' # example cursor
limit = 250 # arbitrary limit for demonstration
cursor = '1767341853908517597|1663601806447476672' # example cursor
limit = 100 # arbitrary limit for demonstration
follower_subset, last_cursor = scraper.followers([user_id], limit=limit, cursor=cursor)
follower_subset, last_cursor = scraper.resume_pagination(scraper.session, user_id, operation, limit=limit, cursor=cursor)
# use last_cursor to resume pagination
```
@@ -281,6 +288,82 @@
https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
### Spaces
#### Live Audio Capture
Capture live audio for up to 500 streams per IP.
```python
from twitter.scraper import Scraper
from twitter.util import init_session
session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)
rooms = [...]
scraper.spaces_live(rooms=rooms) # capture live audio from list of rooms
```
#### Live Transcript Capture
**Raw transcript chunks**
```python
from twitter.scraper import Scraper
from twitter.util import init_session
session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)
# room must be live, i.e. in "Running" state
scraper.space_live_transcript('1zqKVPlQNApJB', frequency=2) # word-level live transcript (dirty, on-the-fly transcription before post-processing)
```
**Processed (final) transcript chunks**
```python
from twitter.scraper import Scraper
from twitter.util import init_session
session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)
# room must be live, i.e. in "Running" state
scraper.space_live_transcript('1zqKVPlQNApJB', frequency=1) # finalized live transcript (clean)
```
#### Search and Metadata
```python
from twitter.scraper import Scraper
from twitter.util import init_session
from twitter.constants import SpaceCategory
session = init_session() # initialize guest session, no login required
scraper = Scraper(session=session, debug=1, save=True)
# download audio and chat-log from space
spaces = scraper.spaces(rooms=['1eaJbrAPnBVJX', '1eaJbrAlZjjJX'], audio=True, chat=True)
# pull metadata only
spaces = scraper.spaces(rooms=['1eaJbrAPnBVJX', '1eaJbrAlZjjJX'])
# search for spaces in "Upcoming", "Top" and "Live" categories
spaces = scraper.spaces(search=[
{
'filter': SpaceCategory.Upcoming,
'query': 'hello'
},
{
'filter': SpaceCategory.Top,
'query': 'world'
},
{
'filter': SpaceCategory.Live,
'query': 'foo bar'
}
])
```
### Automated Solvers
To set up automated email confirmation/verification solvers, add your Proton Mail credentials below as shown.
@@ -295,6 +378,7 @@
proton_email, proton_password = ..., ...
account = Scraper(email, username, password, debug=1, save=True, protonmail={'email':proton_email, 'password':proton_password})
```
'''),
long_description_content_type='text/markdown',
author="Trevor Hobenshield",