📁 repo2file4gpt

Expanding LLMs' Knowledge through Public Codebases

repo2file4gpt ingests open source GitHub repositories for use with AI systems, making the technical knowledge in public code available to machine learning pipelines.

repo2file4gpt structures and aggregates repository content into an indexed markdown format, ready for direct integration into large language models and Q&A systems. The full hierarchy of the original GitHub tree is preserved to retain useful context.
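To make "indexed markdown that preserves the tree" concrete, here is a minimal sketch of the idea applied to a local directory. It is not the package's implementation: the function name build_indexed_markdown and the exact output layout are assumptions made for illustration only.

```python
from pathlib import Path


def build_indexed_markdown(root, filetypes):
    """Aggregate matching files under `root` into one indexed markdown string.

    Hypothetical helper for illustration; not part of repo2file4gpt.
    """
    root = Path(root)
    suffixes = {"." + ext.lstrip(".") for ext in filetypes}

    # Sort for a stable index; keep paths relative so the tree is preserved.
    files = sorted(
        p.relative_to(root)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )

    # First an index listing every file by its path in the tree.
    lines = ["# Index", ""]
    for number, rel in enumerate(files, start=1):
        lines.append(f"{number}. {rel.as_posix()}")
    lines.append("")

    # Then one section per file, headed by the same number and path.
    for number, rel in enumerate(files, start=1):
        lines.append(f"## {number}. {rel.as_posix()}")
        lines.append("")
        lines.append((root / rel).read_text(encoding="utf-8").rstrip("\n"))
        lines.append("")

    return "\n".join(lines)
```

Calling build_indexed_markdown("./some_checkout", ["py", "js"]) would return a single string with an index followed by one section per file, each headed by its path relative to the root.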

Installation

You can install repo2file4gpt directly from PyPI:

pip install repo2file4gpt

Quick Start

Command Line Interface

After installing repo2file4gpt, you can use it from the command line as follows:

repo2file4gpt --token YOUR_GITHUB_TOKEN --repos user/repo1 user/repo2 --filetypes py js --output_dir ./outputs/

Replace YOUR_GITHUB_TOKEN with your actual GitHub token, and user/repo1 and user/repo2 with the actual repositories you want to process.
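If you would rather not type the token into the command itself, one option is to read it from an environment variable and assemble the same command from a small script. The sketch below uses only the flags shown above; the helper name build_cli_args and the variable name GITHUB_TOKEN are assumptions, not something the package defines.

```python
import os


def build_cli_args(repos, filetypes, output_dir, token_var="GITHUB_TOKEN"):
    """Build the argument list for the repo2file4gpt command shown above.

    Hypothetical helper; the token is read from the environment variable
    named by `token_var` (an assumed name) instead of being hard-coded.
    """
    token = os.environ.get(token_var)
    if not token:
        raise RuntimeError(f"Set the {token_var} environment variable first")
    return [
        "repo2file4gpt",
        "--token", token,
        "--repos", *repos,
        "--filetypes", *filetypes,
        "--output_dir", output_dir,
    ]
```

The resulting list can be passed to subprocess.run(args, check=True). This keeps the token out of source files and shell history, although it is still visible in the process arguments while the command runs.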

Python Code

You can also use repo2file4gpt in your Python code:

import repo2file4gpt

# Specify the GitHub token, list of repositories, file types, and output directory
token = "YOUR_GITHUB_TOKEN"
repos = ["user/repo1", "user/repo2"]
filetypes = ["py", "js"]
output_dir = "./outputs/"

# Create a RepositoryScraper instance
processor = repo2file4gpt.RepositoryScraper(token, filetypes, repo2file4gpt.LINE_LIMITS, output_dir)

# Process the repositories
processor.process_repositories(repos)

As with the command line, replace YOUR_GITHUB_TOKEN with your actual GitHub token, and user/repo1 and user/repo2 with the repositories you want to process.
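Since repositories are given as user/repo strings, it can help to check their shape before passing the list to process_repositories. The following is a rough sketch: the helper split_valid_repos is hypothetical, and the pattern only approximates GitHub's naming rules.

```python
import re

# Assumed shape of a repository identifier: "owner/name".
# This approximates GitHub's naming rules; it is not authoritative.
REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def split_valid_repos(repos):
    """Return (valid, invalid) lists, preserving input order."""
    valid, invalid = [], []
    for repo in repos:
        # fullmatch rejects full URLs and names without a slash.
        (valid if REPO_PATTERN.fullmatch(repo) else invalid).append(repo)
    return valid, invalid
```

You could then pass only the valid list to process_repositories and report the invalid entries to the user, so that a pasted URL or a missing owner is caught before any request is made.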

TODO

Add support for more file types.
Improve error handling for robustness.
Optimize performance for large repositories.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
