docs: add architecture documentation and update README
- Add comprehensive system architecture documentation
- Add component diagrams using mermaid
- Document data flow and security architecture
- Add database schema and deployment architecture
- Update README with prominent links to documentation
- Add Python badge to tech stack
jackccrawford committed Nov 21, 2024
1 parent 355310c commit 5093ee3
Showing 2 changed files with 320 additions and 0 deletions.
9 changes: 9 additions & 0 deletions README.md
@@ -7,6 +7,12 @@ Transform GPU monitoring from complex metrics into intuitive visual patterns. En
![Dark Mode Dashboard](images/DarkMode-Stressed.png)
*Real-time GPU metrics visualized for instant comprehension*

## Project Overview
- [Requirements & User Stories](docs/requirements/REQUIREMENTS.md)
- [Technical Architecture](docs/architecture/ARCHITECTURE.md)
- [Development Guide](docs/requirements/DEVELOPMENT_GUIDE.md)
- [API Documentation](docs/API.md)

## Why GPU Sentinel Pro?

Do you find yourself:
@@ -86,13 +92,16 @@ See [Installation Guide](docs/INSTALLATION.md) for detailed setup instructions.

## Documentation

- [Requirements & User Stories](docs/requirements/REQUIREMENTS.md)
- [Technical Architecture](docs/architecture/ARCHITECTURE.md)
- [API Reference](docs/API.md)
- [Installation Guide](docs/INSTALLATION.md)
- [Contributing Guide](CONTRIBUTING.md)
- [Security Policy](SECURITY.md)

## Tech Stack

![Python](https://img.shields.io/badge/Python-3.10%2B-blue?style=for-the-badge&logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-005571?style=for-the-badge&logo=fastapi)
![React](https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB)
![TypeScript](https://img.shields.io/badge/TypeScript-007ACC?style=for-the-badge&logo=typescript&logoColor=white)
311 changes: 311 additions & 0 deletions docs/architecture/ARCHITECTURE.md
@@ -0,0 +1,311 @@
# GPU Sentinel Pro - System Architecture

## System Overview

```mermaid
graph TB
    subgraph "Frontend Layer"
        R[React Application]
        V[Vite Dev Server]
    end
    subgraph "Backend Layer"
        F[FastAPI Server]
        N[NVIDIA SMI Interface]
        A[Alert Manager]
    end
    subgraph "Data Layer"
        S[(Supabase DB)]
        C[Cache Layer]
    end
    R -->|HTTP/WebSocket| F
    F -->|Query| S
    F -->|Commands| N
    F -->|Triggers| A
    A -->|Store| S
    F -->|Cache| C
```
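The `HTTP/WebSocket` edge above implies one metrics producer feeding many browser clients. A minimal fan-out sketch, assuming one asyncio queue per connected client; the `MetricsHub` class and its drop-oldest policy for slow clients are illustrative choices, not taken from the project:

```python
import asyncio


class MetricsHub:
    """Hypothetical fan-out: copy each metrics snapshot into every client's queue."""

    def __init__(self, max_pending: int = 8):
        self.max_pending = max_pending
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self.max_pending)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, snapshot: dict) -> None:
        for queue in self._subscribers:
            if queue.full():
                queue.get_nowait()  # drop the oldest snapshot so a slow client never blocks the producer
            queue.put_nowait(snapshot)
```

A FastAPI WebSocket endpoint would call `subscribe()` on connect, loop on `await queue.get()` followed by `websocket.send_json(...)`, and call `unsubscribe()` in a `finally` block. Dropping stale snapshots suits real-time gauges, where only the newest reading matters.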

## Component Architecture

### Frontend Components

```mermaid
graph TB
    subgraph "UI Layer"
        D[Dashboard]
        M[Metrics Display]
        A[Alert Panel]
        H[History View]
    end
    subgraph "State Management"
        Q[Query Client]
        S[State Store]
    end
    subgraph "Data Layer"
        AP[API Client]
        WS[WebSocket Client]
    end
    D --> M
    D --> A
    D --> H
    M --> Q
    A --> Q
    H --> Q
    Q --> AP
    Q --> WS
    Q --> S
```

### Backend Services

```mermaid
graph LR
    subgraph "API Layer"
        E[Endpoints]
        M[Middleware]
        A[Auth]
    end
    subgraph "Core Services"
        GM[GPU Monitor]
        AM[Alert Manager]
        HM[History Manager]
    end
    subgraph "Infrastructure"
        DB[Database]
        C[Cache]
        N[NVIDIA SMI]
    end
    E --> M
    M --> A
    M --> GM
    M --> AM
    M --> HM
    GM --> N
    AM --> DB
    HM --> DB
    GM --> C
```

## Data Flow

### Real-time Metrics Flow
1. Backend polls GPU metrics via NVIDIA SMI (250 ms intervals)
2. Backend processes and validates data
3. WebSocket pushes updates to frontend
4. React components re-render with new data
5. Metrics persisted to the `gpu_metrics` time-series table in Supabase
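Steps 1 and 2 can be sketched as below. This is an illustration, not the project's collector: it assumes the backend shells out to `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and maps each CSV row onto the `gpu_metrics` columns from the Database Schema section. `parse_gpu_csv` and `poll_once` are hypothetical names; unavailable fields, which `nvidia-smi` prints in brackets such as `[N/A]`, become `None`.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu`, in output order.
QUERY_FIELDS = [
    "index", "temperature.gpu", "memory.used", "memory.total",
    "utilization.gpu", "power.draw", "power.limit", "fan.speed",
]
# gpu_metrics column and Python type for each field above (same order).
# With `nounits`, memory is in MiB, power in W, temperature in degrees C.
COLUMNS = [
    ("gpu_id", int), ("temperature", float), ("memory_used", int),
    ("memory_total", int), ("gpu_utilization", int), ("power_draw", float),
    ("power_limit", float), ("fan_speed", int),
]


def _convert(raw: str, cast):
    """Parse one CSV cell; bracketed values such as "[N/A]" mean unavailable."""
    value = raw.strip()
    if not value or value.startswith("["):
        return None
    return cast(float(value))


def parse_gpu_csv(text: str) -> list[dict]:
    """Validate csv,noheader,nounits output and map it to gpu_metrics rows."""
    rows = []
    for line in text.strip().splitlines():
        cells = line.split(",")
        if len(cells) != len(COLUMNS):
            raise ValueError(f"unexpected nvidia-smi row: {line!r}")
        row = {name: _convert(cell, cast) for (name, cast), cell in zip(COLUMNS, cells)}
        if row["gpu_id"] is None:
            raise ValueError(f"missing GPU index: {line!r}")
        rows.append(row)
    return rows


def poll_once() -> list[dict]:
    """Run nvidia-smi once; a collector loop would call this every 250 ms."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, timeout=5,
    )
    return parse_gpu_csv(result.stdout)
```

Spawning a process every 250 ms has a cost; `nvidia-smi` can also stream output with `--loop-ms`, and NVML bindings (for example the `nvidia-ml-py` package) avoid the subprocess entirely.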

### Alert Flow
1. Backend evaluates metrics against thresholds
2. Alert triggered if threshold exceeded
3. Alert stored in database
4. WebSocket pushes alert to frontend
5. Alert notification displayed
6. External notifications sent (email/webhook)

## Technical Components

### Frontend Stack
- **Framework**: React 18+
- **Language**: TypeScript 5+
- **Build Tool**: Vite
- **State Management**: React Query
- **UI Components**: Custom components
- **Data Visualization**: Custom charts
- **WebSocket Client**: Native WebSocket

### Backend Stack
- **Framework**: FastAPI
- **Language**: Python 3.10+
- **ASGI Server**: Uvicorn
- **Task Queue**: FastAPI background tasks (`BackgroundTasks`)
- **Caching**: In-memory + Redis
- **Monitoring**: Custom metrics

### Database Schema

#### GPU Metrics Table
```sql
CREATE TABLE gpu_metrics (
    id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    gpu_id INTEGER NOT NULL,
    temperature FLOAT,
    memory_used BIGINT,
    memory_total BIGINT,
    gpu_utilization INTEGER,
    power_draw FLOAT,
    power_limit FLOAT,
    fan_speed INTEGER,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_gpu_metrics_timestamp
    ON gpu_metrics (timestamp DESC);
CREATE INDEX idx_gpu_metrics_gpu_id
    ON gpu_metrics (gpu_id);
```

#### Alerts Table
```sql
CREATE TABLE alerts (
    id BIGSERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    gpu_id INTEGER NOT NULL,
    alert_type VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,
    message TEXT NOT NULL,
    value FLOAT,
    threshold FLOAT,
    acknowledged BOOLEAN DEFAULT FALSE,
    acknowledged_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_alerts_timestamp
    ON alerts (timestamp DESC);
CREATE INDEX idx_alerts_gpu_id
    ON alerts (gpu_id);
```

## Security Architecture

### Authentication Flow
1. Client submits credentials
2. Server validates credentials
3. JWT token issued
4. Token included in subsequent requests
5. Token renewed through a refresh mechanism before it expires

### Authorization Levels
- **Admin**: Full system access
- **User**: View and acknowledge alerts
- **Reader**: View-only access
- **API**: Programmatic access
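One way to encode these levels is a role-to-permission map checked on every request. The permission names are hypothetical, and since the document does not say which operations the **API** role may perform, the sketch assumes read-only access for it.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW = auto()         # read metrics, history, and alerts
    ACKNOWLEDGE = auto()  # mark alerts as acknowledged
    ADMINISTER = auto()   # configuration, users, thresholds


ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    "admin": frozenset(Permission),  # every permission
    "user": frozenset({Permission.VIEW, Permission.ACKNOWLEDGE}),
    "reader": frozenset({Permission.VIEW}),
    "api": frozenset({Permission.VIEW}),  # assumption: read-only programmatic access
}


def require(role: str, permission: Permission) -> None:
    """Raise PermissionError unless the role grants the permission; unknown roles get nothing."""
    if permission not in ROLE_PERMISSIONS.get(role, frozenset()):
        raise PermissionError(f"role {role!r} lacks {permission.name}")
```

In FastAPI, a check like this would typically sit in a dependency (`Depends`) that reads the caller's role from the verified token and rejects the request with HTTP 403.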

### Data Security
- Encryption at rest
- TLS for data in transit
- Secure WebSocket connections
- Rate limiting
- Input validation
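Rate limiting is commonly implemented as a token bucket per client. The sketch below is in-memory and single-process; with several backend instances the counters would need to live in a shared store such as the Redis cache listed in the backend stack. The rate and burst values and the `check_rate_limit` helper are placeholders.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def check_rate_limit(client_id: str, rate: float = 10.0, capacity: int = 20) -> bool:
    """Return False when the client should receive HTTP 429 Too Many Requests."""
    bucket = buckets.get(client_id)
    if bucket is None:
        bucket = buckets[client_id] = TokenBucket(rate, capacity)
    return bucket.allow()
```

The injectable `clock` makes the refill logic testable without sleeping.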

## Deployment Architecture

### Development Environment
```mermaid
graph LR
D[Developer] --> L[Local Environment]
L --> T[Tests]
T --> G[Git]
G --> A[GitHub Actions]
```

### Production Environment
```mermaid
graph LR
G[GitHub] --> A[GitHub Actions]
A --> B[Build]
B --> T[Test]
T --> D[Deploy]
D --> P[Production]
```

## Performance Considerations

### Frontend Optimization
- Component memoization
- Virtual scrolling for large datasets
- Efficient re-rendering
- Asset optimization
- Code splitting

### Backend Optimization
- Connection pooling
- Query optimization
- Caching strategy
- Async operations
- Resource limits
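For the caching strategy, latest-reading lookups suit a short-lived in-memory cache in front of the GPU monitor. This is a sketch of a TTL cache with hit/miss counters (which also feed the cache hit rate listed under System Metrics); the key format and a TTL matching the 250 ms polling interval are assumptions, and a multi-instance deployment would use Redis instead.

```python
import time
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache values for `ttl` seconds, reloading lazily once an entry expires."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[float, T]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[], T]) -> T:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader()
        self._entries[key] = (now, value)
        return value
```

Usage might look like `cache.get_or_load("gpu:0", lambda: read_latest(0))`, where `read_latest` is a hypothetical loader.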

### Database Optimization
- Partitioning strategy
- Index optimization
- Query performance
- Data retention
- Backup strategy

## Monitoring and Logging

### System Metrics
- API response times
- WebSocket performance
- Database query times
- Cache hit rates
- Error rates

### Application Logs
- Request/response logging
- Error tracking
- Performance metrics
- Security events
- System health

## Scalability Considerations

### Horizontal Scaling
- Stateless backend
- Load balancing
- Session management
- Cache distribution
- Database replication

### Vertical Scaling
- Resource optimization
- Memory management
- Connection pooling
- Query optimization
- Batch processing
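With metrics arriving every 250 ms per GPU, batching database writes is one way to keep load flat. The buffer below flushes when it reaches a row limit or when the oldest buffered row passes a maximum age; `flush_fn` stands in for whatever bulk insert the backend uses (for example psycopg's `executemany`), and both limits are placeholders.

```python
import time
from typing import Callable


class BatchWriter:
    """Buffer rows and hand them to `flush_fn` in batches."""

    def __init__(self, flush_fn: Callable[[list], None], max_rows: int = 500,
                 max_age: float = 2.0, clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age = max_age
        self.clock = clock
        self._rows: list = []
        self._oldest: float | None = None

    def add(self, row) -> None:
        if not self._rows:
            self._oldest = self.clock()  # age is measured from the first buffered row
        self._rows.append(row)
        if len(self._rows) >= self.max_rows or self.clock() - self._oldest >= self.max_age:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows, self._oldest = self._rows, [], None
            self.flush_fn(batch)
```

The age check only runs when a row arrives, so a periodic timer and shutdown hook should also call `flush()` to avoid stranding the last partial batch.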

## Future Architecture Considerations

### Planned Enhancements
- Kubernetes integration
- Cloud provider metrics
- ML-based predictions
- Advanced analytics
- Custom dashboards

### Technical Debt Management
- Code quality metrics
- Performance monitoring
- Security scanning
- Dependency updates
- Documentation updates

## Development Workflow

### Code Pipeline
```mermaid
graph LR
F[Feature Branch] --> T[Tests]
T --> R[Review]
R --> M[Main Branch]
M --> D[Deploy]
```

### Quality Assurance
- Automated testing
- Code review process
- Performance testing
- Security scanning
- Documentation review
