Updating Readme
siddharth7113 committed May 7, 2024
1 parent cc5cd70 commit 89edfe1
Showing 1 changed file with 109 additions and 14 deletions.
123 changes: 109 additions & 14 deletions readme.md
@@ -1,23 +1,118 @@
# Data Insights from SEC-10K filings
# FinTech-Lab-Summer-2024

### Insights from SEC-10K filings using LLM
## Project Overview
This project focuses on extracting, analyzing, and visualizing key financial insights from 10-K filings of public firms using a large language model (LLM). The insights are presented through a web interface built with Streamlit, which provides a dynamic way to explore financial trends and data-driven insights.

## Features
- **Data Extraction**:
1. The Python library `sec-edgar-downloader` is used to download SEC filings.
2. Filings for any firm can be downloaded using the script located at
`src/data/raw_data/download_sec_filings.py`
3. `Downloader.log` records which files were downloaded and which are missing.

- **Data Pre-processing**:

Pre-processing takes place in three steps:
1. First, the raw HTML and CSS are removed from the downloaded filings, and the cleaned text is stored in `src\data\pre_processed_data\cleaned-sec-edgar-filings`
2. Then lemmatization and stemming are performed on the cleaned data, and only the 5 words before and after each numerical feature are kept and stored in
`src\data\pre_processed_data\processed-numeric-contexts`
3. Finally, the features considered financially insightful (*listed below*) are extracted using regular expressions and stored in
`src\data\pre_processed_data\feature`
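
The three steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual scripts: the function names, the number pattern, and the short `FEATURE_TERMS` list are assumptions made for the example, and the lemmatization and stemming step is left out because it needs an NLP library.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def strip_markup(raw_html):
    """Step 1: drop HTML tags and CSS, then normalize whitespace."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical pattern for numeric tokens such as 2023, $1,250 or 4.5%.
NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")


def numeric_contexts(text, window=5):
    """Step 2: keep `window` words on each side of every numeric token."""
    tokens = text.split()
    contexts = []
    for i, token in enumerate(tokens):
        if NUMBER.fullmatch(token.strip(".,;:()")):
            start = max(0, i - window)
            contexts.append(" ".join(tokens[start:i + window + 1]))
    return contexts


# Hypothetical subset of the financial terms listed later in this README.
FEATURE_TERMS = re.compile(r"\b(revenue|net income|ebitda|debt)\b", re.IGNORECASE)


def select_feature_contexts(contexts):
    """Step 3: keep only contexts that mention a financial term."""
    return [c for c in contexts if FEATURE_TERMS.search(c)]
```

For example, `select_feature_contexts(numeric_contexts(strip_markup(html)))` chains the three steps on one filing's raw HTML string.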

## Installation
```bash
git clone https://github.com/siddharth7113/Fintech-Lab-Summer-2024
cd Fintech-Lab-Summer-2024
pip install -r requirements.txt
```

- **Text Analysis**:

1. Each of these feature files is sent to the LLM (**Mixtral-7b-Instruct**) via the OpenRouter API, and the response is saved in `src/output/output-responses`
2. A Python script then combines the annual files for each firm into a single text file, stored in `src/output/pre-analysis_combined`.
3. This combined text is sent to the LLM (**Mixtral-7b-Instruct**) again via the OpenRouter API to obtain two types of files: text_insights and csv_insights.
(These files are used to )

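
The combining step of the text analysis (merging each firm's annual response files into one text file) might look like the sketch below. The `<FIRM>_<YEAR>.txt` file naming, the `_combined.txt` suffix, the section header format, and the function name are all assumptions for illustration; the repository's actual script may differ.

```python
from collections import defaultdict
from pathlib import Path


def combine_annual_responses(responses_dir, combined_dir):
    """Merge per-year response files into one text file per firm.

    Assumes hypothetical file names of the form <FIRM>_<YEAR>.txt.
    Returns a dict mapping each firm to the path of its combined file.
    """
    responses_dir, combined_dir = Path(responses_dir), Path(combined_dir)
    combined_dir.mkdir(parents=True, exist_ok=True)

    # Group the response files by firm, remembering each file's year.
    by_firm = defaultdict(list)
    for path in responses_dir.glob("*_*.txt"):
        firm, _, year = path.stem.rpartition("_")
        by_firm[firm].append((year, path))

    written = {}
    for firm, files in by_firm.items():
        # Sort by year so the combined text reads chronologically.
        sections = [
            f"=== {firm} {year} ===\n{path.read_text(encoding='utf-8').strip()}"
            for year, path in sorted(files)
        ]
        out_path = combined_dir / f"{firm}_combined.txt"
        out_path.write_text("\n\n".join(sections) + "\n", encoding="utf-8")
        written[firm] = out_path
    return written
```

Under these assumptions it would be called as `combine_annual_responses("src/output/output-responses", "src/output/pre-analysis_combined")`.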
## Usage
```
python main.py
```
## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.
- **Data Visualization**: Interactive charts and graphs to display financial metrics and trends.
- **Web Interface**: A Streamlit application that allows users to select different firms and view corresponding financial insights and visualizations.

---

### Important terms and why they are needed:

### Financial Statements and Performance Metrics
- **Revenue**: Represents the total income generated by a business.
- **Expenses**: Refers to the costs incurred by a business in its operations.
- **Net Income**: Calculated as revenue minus expenses, indicating the profitability of a company.
- **Assets**: Include all resources owned by a company that have economic value.
- **Liabilities**: Represent the company's debts or obligations.
- **Equity**: Reflects the ownership interest in a company's assets after deducting liabilities.
- **Cash Flow**: Shows the movement of cash in and out of a business.
- **Operating Margin**: Operating income as a percentage of revenue, indicating the profitability of a company's core business activities.
- **Gross Margin**: Represents the percentage of revenue that exceeds the cost of goods sold.
- **EBITDA**: Stands for Earnings Before Interest, Taxes, Depreciation, and Amortization.
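
As a small worked illustration of how the metrics above relate to each other, the following sketch computes them from made-up figures. The function names and all numbers are hypothetical and are not taken from any filing or from the project's code.

```python
def net_income(revenue, expenses):
    """Net income = revenue - expenses."""
    return revenue - expenses


def equity(assets, liabilities):
    """Equity = assets - liabilities."""
    return assets - liabilities


def gross_margin(revenue, cost_of_goods_sold):
    """Share of revenue left after the cost of goods sold."""
    return (revenue - cost_of_goods_sold) / revenue


def operating_margin(operating_income, revenue):
    """Operating income as a share of revenue."""
    return operating_income / revenue


def ebitda(net_income_value, interest, taxes, depreciation, amortization):
    """Net income with interest, taxes, depreciation and amortization added back."""
    return net_income_value + interest + taxes + depreciation + amortization


# Hypothetical figures, in millions.
profit = net_income(revenue=1000, expenses=850)            # 150
book_equity = equity(assets=2000, liabilities=1200)        # 800
gm = gross_margin(revenue=1000, cost_of_goods_sold=600)    # 0.4, i.e. 40%
om = operating_margin(operating_income=200, revenue=1000)  # 0.2, i.e. 20%
```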

### Financial Analysis and Reporting
- **Financial Ratios**: Include metrics like debt-to-equity ratio and return on equity used to assess a company's financial health.
- **Earnings Per Share**: Calculated as net income divided by the number of outstanding shares.
- **Tax Rate**: Refers to the percentage of income that a company pays in taxes.
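
The ratios above can be computed along these lines. This sketch uses the simplified definitions given in this list (for instance, earnings per share as net income over outstanding shares) and takes debt-to-equity as total liabilities over shareholders' equity, which is one common convention; the function names and figures are hypothetical.

```python
def debt_to_equity(total_liabilities, shareholders_equity):
    """Debt-to-equity ratio, here taken as total liabilities / equity."""
    return total_liabilities / shareholders_equity


def return_on_equity(net_income, shareholders_equity):
    """Return on equity = net income / equity."""
    return net_income / shareholders_equity


def earnings_per_share(net_income, shares_outstanding):
    """Simplified EPS = net income / outstanding shares."""
    return net_income / shares_outstanding


def effective_tax_rate(income_tax_expense, pre_tax_income):
    """Share of pre-tax income paid in income taxes."""
    return income_tax_expense / pre_tax_income


# Hypothetical figures, in millions (shares in millions as well).
d_e = debt_to_equity(total_liabilities=1200, shareholders_equity=800)    # 1.5
roe = return_on_equity(net_income=150, shareholders_equity=800)          # 0.1875
eps = earnings_per_share(net_income=150, shares_outstanding=50)          # 3.0
tax = effective_tax_rate(income_tax_expense=50, pre_tax_income=200)      # 0.25
```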

### Investment and Risk Management
- **Debt**: Represents borrowed funds that a company must repay.
- **Investment Gains/Losses**: Reflect the profits or losses from investment activities.
- **Hedging Activities**: Strategies used to reduce risks associated with price fluctuations.
- **Derivative Instruments**: Financial contracts whose value is derived from an underlying asset.

### Other Financial Terms
- **Common Stock**: Represents ownership in a company and typically carries voting rights.
- **Subsequent Events**: Events occurring after the end of a reporting period that may impact financial statements.
- **Fair Value Measurements**: Refers to the estimated value of an asset or liability based on market conditions.
- **Geographic Concentration Risk**: Risk associated with a company's heavy reliance on a particular geographic region.

---

## Directory Structure



FinTech-Lab-Summer-2024/
├── .github/workflows               # CI/CD pipelines for automated testing and deployment.
│   └── python-ci.yml
├── docs                            # Documentation related to the project.
├── src                             # Source code for the project.
│   ├── analysis                    # Scripts for data analysis.
│   │   ├── csv                     # CSV files with analyzed financial data.
│   │   ├── text-summaries          # Textual summaries extracted from 10-K filings.
│   │   └── txt_to_csv.py           # Script to convert text data to CSV.
│   │
│   ├── app                         # Streamlit application.
│   │   └── streamlit_app.py        # Main application script.
│   │
│   ├── scripts                     # Utility scripts for data processing.
│   │   ├── data_extraction.py      # Script for downloading SEC filings.
│   │   ├── feature_extraction.py   # Script for feature extraction from text.
│   │   └── lemmitization.py        # Script for text normalization.
│   │
│   └── data                        # Data used or generated by the scripts.
│       ├── pre-processed_data      # Preprocessed datasets.
│       └── pre-processing_scripts  # Scripts that preprocess data.
├── tests                           # Automated tests for the project.
│   └── test_analysis.py            # Test cases for data analysis scripts.
├── requirements.txt                # Project dependencies.
└── README.md                       # Project overview and setup instructions.

---

## **Streamlit Link**

[url-link](www.google.com)

## Tech-Stack


## Contributing

Contributions are welcome! Please fork the repository and open a pull request with your features or fixes.

## License
This project is licensed under the terms of the MIT license.
