Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): enhance config loader with embedding support and impro… #74

Merged
merged 20 commits into from
Jan 6, 2025
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
78a3545
feat(backend): enhance config loader with embedding support and impro…
Sma1lboy Dec 16, 2024
72c6dbc
feat(backend): implement ModelStatusManager for tracking model downlo…
Sma1lboy Dec 16, 2024
1340532
chore: adding uitls ufc
Sma1lboy Dec 16, 2024
0ee6b84
Merge branch 'main' of https://github.com/Sma1lboy/codefox into feat-…
NarwhalChen Dec 25, 2024
0b412b0
feat: adding embedding provider and openai-embedding provider
NarwhalChen Dec 25, 2024
b378185
[autofix.ci] apply automated fixes
autofix-ci[bot] Dec 25, 2024
4c66d05
feat: implmenting dynamic loader for embedding and models in hugging …
NarwhalChen Dec 28, 2024
ad74c49
Merge branch 'feat-adding-embedding-config' of https://github.com/Sma…
NarwhalChen Dec 28, 2024
97cae53
Merge branch 'main' into feat-adding-embedding-config
Sma1lboy Dec 28, 2024
e579146
feat(backend): refactor model downloading and configuration handling
Sma1lboy Dec 28, 2024
8b25760
Merge branch 'feat-adding-embedding-config' of https://github.com/Sma…
NarwhalChen Dec 30, 2024
b7a8b30
to: using fastemb to implement embedding downloader
NarwhalChen Dec 30, 2024
7e68b3a
to: moving embedding provider to backend
NarwhalChen Dec 30, 2024
2c2027d
[autofix.ci] apply automated fixes
autofix-ci[bot] Dec 30, 2024
fb9be7d
feat: request llm-server with openai/some embedding provider
NarwhalChen Dec 30, 2024
0b85081
Merge branch 'feat-adding-embedding-config' of https://github.com/Sma…
NarwhalChen Dec 30, 2024
6ad8b9b
to: adding return type and comments for generateEmbResponse
NarwhalChen Dec 31, 2024
56c6ffc
fix: fixing the change of env variable
NarwhalChen Jan 6, 2025
b9c1909
fix: deleting useless embedding index.ts
NarwhalChen Jan 6, 2025
50d3bd1
Merge branch 'main' into feat-adding-embedding-config
ZHallen122 Jan 6, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 19 additions & 5 deletions backend/src/config/common-path.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import * as path from 'path';
import { existsSync, mkdirSync, promises } from 'fs-extra';
import { existsSync, mkdirSync, promises, writeFileSync } from 'fs-extra';
import { ConfigType } from '@nestjs/config';

// Constants for base directories
const APP_NAME = 'codefox';
Expand Down Expand Up @@ -27,17 +28,30 @@ const ensureDir = (dirPath: string): string => {
export const getRootDir = (): string => ensureDir(ROOT_DIR);

// Configuration Paths
export const getConfigDir = (): string =>
ensureDir(path.join(getRootDir(), 'config'));
export const getConfigPath = (configName: string): string =>
path.join(getConfigDir(), `${configName}.json`);
export const getConfigPath = (): string => {
const rootPath = ensureDir(path.join(getRootDir()));
return path.join(rootPath, 'config.json');
};

export const getModelStatusPath = (): string => {
const rootPath = ensureDir(getRootDir());
const modelStatusPath = path.join(rootPath, `model-status.json`);
writeFileSync(modelStatusPath, '{}');
return modelStatusPath;
};

// Models Directory
export const getModelsDir = (): string =>
ensureDir(path.join(getRootDir(), 'models'));
export const getModelPath = (modelName: string): string =>
path.join(getModelsDir(), modelName);

// Embs Directory
export const getEmbDir = (): string =>
ensureDir(path.join(getRootDir(), 'embeddings'));
export const getEmbPath = (modelName: string): string =>
path.join(getModelsDir(), modelName);

Comment on lines +51 to +56
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use getEmbDir() instead of getModelsDir() in getEmbPath to avoid confusion.
Currently, getEmbPath points to the main models directory rather than the dedicated embeddings directory, which might lead to unintended overwriting or mixing of files.

 export const getEmbPath = (modelName: string): string =>
-  path.join(getModelsDir(), modelName);
+  path.join(getEmbDir(), modelName);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Embs Directory
export const getEmbDir = (): string =>
ensureDir(path.join(getRootDir(), 'embeddings'));
export const getEmbPath = (modelName: string): string =>
path.join(getModelsDir(), modelName);
// Embs Directory
export const getEmbDir = (): string =>
ensureDir(path.join(getRootDir(), 'embeddings'));
export const getEmbPath = (modelName: string): string =>
path.join(getEmbDir(), modelName);

// Project-Specific Paths
export const getProjectsDir = (): string =>
ensureDir(path.join(getRootDir(), 'projects'));
Expand Down
228 changes: 213 additions & 15 deletions backend/src/config/config-loader.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,54 +2,252 @@ import * as fs from 'fs';
import * as path from 'path';
import * as _ from 'lodash';
import { getConfigPath } from './common-path';
export interface ChatConfig {
import { ConfigType } from 'src/downloader/universal-utils';
import { Logger } from '@nestjs/common';

export interface ModelConfig {
model: string;
endpoint?: string;
token?: string;
default?: boolean;
task?: string;
}

export class ConfigLoader {
private chatsConfig: ChatConfig[];
export interface EmbeddingConfig {
model: string;
endpoint?: string;
default?: boolean;
token?: string;
}

export interface AppConfig {
models?: ModelConfig[];
embeddings?: EmbeddingConfig[];
}

export const exampleConfigContent = `{
// Chat models configuration
// You can configure multiple chat models
"models": [
// Example of OpenAI GPT configuration
{
"model": "gpt-3.5-turbo",
"endpoint": "https://api.openai.com/v1",
"token": "your-openai-token", // Replace with your OpenAI token
"default": true // Set as default chat model
},

// Example of local model configuration
{
"model": "llama2",
"endpoint": "http://localhost:11434/v1"
}
],

// Embedding model configuration (optional)
"embeddings": [{
"model": "text-embedding-ada-002",
"endpoint": "https://api.openai.com/v1",
"token": "your-openai-token", // Replace with your OpenAI token
"default": true // Set as default embedding
}]
}`;

export class ConfigLoader {
readonly logger = new Logger(ConfigLoader.name);
private type: string;
private static instances: Map<ConfigType, ConfigLoader> = new Map();
private static config: AppConfig;
private readonly configPath: string;

constructor() {
this.configPath = getConfigPath('config');
private constructor(type: ConfigType) {
this.type = type;
this.configPath = getConfigPath();
this.initConfigFile();
this.loadConfig();
}

public static getInstance(type: ConfigType): ConfigLoader {
if (!ConfigLoader.instances.has(type)) {
ConfigLoader.instances.set(type, new ConfigLoader(type));
}
return ConfigLoader.instances.get(type)!;
}

public initConfigFile(): void {
Logger.log('Creating example config file', 'ConfigLoader');

const config = getConfigPath();
if (fs.existsSync(config)) {
return;
}

if (!fs.existsSync(config)) {
//make file
fs.writeFileSync(config, exampleConfigContent, 'utf-8');
}
Logger.log('Creating example config file', 'ConfigLoader');
}

public reload(): void {
this.loadConfig();
}

private loadConfig() {
const file = fs.readFileSync(this.configPath, 'utf-8');
try {
Logger.log(
`Loading configuration from ${this.configPath}`,
'ConfigLoader',
);
const file = fs.readFileSync(this.configPath, 'utf-8');
const jsonContent = file.replace(
/\\"|"(?:\\"|[^"])*"|(\/\/.*|\/\*[\s\S]*?\*\/)/g,
(m, g) => (g ? '' : m),
);
ConfigLoader.config = JSON.parse(jsonContent);
this.validateConfig();
} catch (error) {
if (
error.code === 'ENOENT' ||
error.message.includes('Unexpected end of JSON input')
) {
ConfigLoader.config = {};
this.saveConfig();
} else {
throw error;
}
}

this.chatsConfig = JSON.parse(file);
console.log('Raw file content:', this.chatsConfig);
this.logger.log(ConfigLoader.config);
}

get<T>(path: string) {
get<T>(path?: string): T {
if (!path) {
return this.chatsConfig as unknown as T;
return ConfigLoader.config as unknown as T;
}
return _.get(this.chatsConfig, path) as T;
return _.get(ConfigLoader.config, path) as T;
}

set(path: string, value: any) {
_.set(this.chatsConfig, path, value);
_.set(ConfigLoader.config, path, value);
this.saveConfig();
}

private saveConfig() {
const configDir = path.dirname(this.configPath);
if (!fs.existsSync(configDir)) {
fs.mkdirSync(configDir, { recursive: true });
}
fs.writeFileSync(
this.configPath,
JSON.stringify(this.chatsConfig, null, 4),
JSON.stringify(ConfigLoader.config, null, 2),
'utf-8',
);
}

addConfig(config: ModelConfig | EmbeddingConfig) {
if (!ConfigLoader.config[this.type]) {
ConfigLoader.config[this.type] = [];
}
this.logger.log(ConfigLoader.config);
const index = ConfigLoader.config[this.type].findIndex(
(chat) => chat.model === config.model,
);
if (index !== -1) {
ConfigLoader.config[this.type].splice(index, 1);
}

if (config.default) {
ConfigLoader.config.models.forEach((chat) => {
chat.default = false;
});
}

ConfigLoader.config[this.type].push(config);
this.saveConfig();
}
Comment on lines +148 to +168
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential mix-up with this.type and models.
When adding a default embedding, the loop explicitly sets ConfigLoader.config.models.forEach(...) to disable existing defaults. This might unintentionally alter models when this.type is ConfigType.EMBEDDINGS.

+ // If adding an embedding config, ensure we only reset defaults among embeddings
 if (config.default) {
-  ConfigLoader.config.models.forEach((chat) => {
+  (ConfigLoader.config[this.type] as (ModelConfig | EmbeddingConfig)[]).forEach((c) => {
     c.default = false;
   });
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
addConfig(config: ModelConfig | EmbeddingConfig) {
if (!ConfigLoader.config[this.type]) {
ConfigLoader.config[this.type] = [];
}
this.logger.log(ConfigLoader.config);
const index = ConfigLoader.config[this.type].findIndex(
(chat) => chat.model === config.model,
);
if (index !== -1) {
ConfigLoader.config[this.type].splice(index, 1);
}
if (config.default) {
ConfigLoader.config.models.forEach((chat) => {
chat.default = false;
});
}
ConfigLoader.config[this.type].push(config);
this.saveConfig();
}
addConfig(config: ModelConfig | EmbeddingConfig) {
if (!ConfigLoader.config[this.type]) {
ConfigLoader.config[this.type] = [];
}
this.logger.log(ConfigLoader.config);
const index = ConfigLoader.config[this.type].findIndex(
(chat) => chat.model === config.model,
);
if (index !== -1) {
ConfigLoader.config[this.type].splice(index, 1);
}
if (config.default) {
(ConfigLoader.config[this.type] as (ModelConfig | EmbeddingConfig)[]).forEach((c) => {
c.default = false;
});
}
ConfigLoader.config[this.type].push(config);
this.saveConfig();
}


removeConfig(modelName: string): boolean {
if (!ConfigLoader.config[this.type]) {
return false;
}

const initialLength = ConfigLoader.config[this.type].length;
ConfigLoader.config.models = ConfigLoader.config[this.type].filter(
(chat) => chat.model !== modelName,
);

if (ConfigLoader.config[this.type].length !== initialLength) {
this.saveConfig();
return true;
}

return false;
}
Comment on lines +170 to +186
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Inconsistent usage of this.type in removeConfig.
Updating ConfigLoader.config.models even if this.type is 'embeddings' poses a risk of accidentally removing clone items from models.

- ConfigLoader.config.models = ConfigLoader.config[this.type].filter(
-   (chat) => chat.model !== modelName,
- );
+ ConfigLoader.config[this.type] = ConfigLoader.config[this.type].filter(
+   (chat: ModelConfig | EmbeddingConfig) => chat.model !== modelName,
+ );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
removeConfig(modelName: string): boolean {
if (!ConfigLoader.config[this.type]) {
return false;
}
const initialLength = ConfigLoader.config[this.type].length;
ConfigLoader.config.models = ConfigLoader.config[this.type].filter(
(chat) => chat.model !== modelName,
);
if (ConfigLoader.config[this.type].length !== initialLength) {
this.saveConfig();
return true;
}
return false;
}
removeConfig(modelName: string): boolean {
if (!ConfigLoader.config[this.type]) {
return false;
}
const initialLength = ConfigLoader.config[this.type].length;
ConfigLoader.config[this.type] = ConfigLoader.config[this.type].filter(
(chat: ModelConfig | EmbeddingConfig) => chat.model !== modelName,
);
if (ConfigLoader.config[this.type].length !== initialLength) {
this.saveConfig();
return true;
}
return false;
}


getAllConfigs(): EmbeddingConfig[] | ModelConfig[] | null {
const res = ConfigLoader.config[this.type];
return Array.isArray(res) ? res : null;
}

validateConfig() {
if (!this.chatsConfig) {
throw new Error("Invalid configuration: 'chats' section is missing.");
if (!ConfigLoader.config) {
ConfigLoader.config = {};
}

if (typeof ConfigLoader.config !== 'object') {
throw new Error('Invalid configuration: Must be an object');
}

if (ConfigLoader.config.models) {
if (!Array.isArray(ConfigLoader.config.models)) {
throw new Error("Invalid configuration: 'chats' must be an array");
}

ConfigLoader.config.models.forEach((chat, index) => {
if (!chat.model) {
throw new Error(
`Invalid chat configuration at index ${index}: 'model' is required`,
);
}
});

const defaultChats = ConfigLoader.config.models.filter(
(chat) => chat.default,
);
if (defaultChats.length > 1) {
throw new Error(
'Invalid configuration: Multiple default chat configurations found',
);
}
}

if (ConfigLoader.config[ConfigType.EMBEDDINGS]) {
this.logger.log(ConfigLoader.config[ConfigType.EMBEDDINGS]);
if (!Array.isArray(ConfigLoader.config[ConfigType.EMBEDDINGS])) {
throw new Error("Invalid configuration: 'embeddings' must be an array");
}

ConfigLoader.config.models.forEach((emb, index) => {
if (!emb.model) {
throw new Error(
`Invalid chat configuration at index ${index}: 'model' is required`,
);
}
});

const defaultChats = ConfigLoader.config[ConfigType.EMBEDDINGS].filter(
(chat) => chat.default,
);
if (defaultChats.length > 1) {
throw new Error(
'Invalid configuration: Multiple default emb configurations found',
);
}
}
}

getConfig(): AppConfig {
return ConfigLoader.config;
}
}
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
import path from 'path';
import * as fs from 'fs';
import { ConfigLoader } from '../../config/config-loader';
import { ModelDownloader } from '../model-downloader';
import { downloadAllModels } from '../utils';
import { getConfigDir, getConfigPath } from 'src/config/common-path';
import {
ConfigLoader,
ModelConfig,
EmbeddingConfig,
} from '../../config/config-loader';
import { UniversalDownloader } from '../model-downloader';
import { ConfigType, downloadAll, TaskType } from '../universal-utils';

const originalIsArray = Array.isArray;

Expand Down Expand Up @@ -36,27 +39,35 @@ Array.isArray = jest.fn((type: any): type is any[] => {
// });

describe('loadAllChatsModels with real model loading', () => {
let configLoader: ConfigLoader;
let modelConfigLoader: ConfigLoader;
let embConfigLoader: ConfigLoader;
beforeAll(async () => {
const testConfig = [
{
model: 'Felladrin/onnx-flan-alpaca-base',
task: 'text2text-generation',
},
];
const configPath = getConfigPath('config');
fs.writeFileSync(configPath, JSON.stringify(testConfig, null, 2), 'utf8');
modelConfigLoader = ConfigLoader.getInstance(ConfigType.CHATS);
embConfigLoader = ConfigLoader.getInstance(ConfigType.EMBEDDINGS);
const modelConfig: ModelConfig = {
model: 'Xenova/flan-t5-small',
endpoint: 'http://localhost:11434/v1',
token: 'your-token-here',
task: 'text2text-generation',
};
modelConfigLoader.addConfig(modelConfig);

configLoader = new ConfigLoader();
await downloadAllModels();
}, 600000);
const embConfig: EmbeddingConfig = {
model: 'fast-bge-base-en-v1.5',
endpoint: 'http://localhost:11434/v1',
token: 'your-token-here',
};
embConfigLoader.addConfig(embConfig);
console.log('preload starts');
await downloadAll();
console.log('preload successfully');
}, 60000000);

it('should load real models specified in config', async () => {
const downloader = ModelDownloader.getInstance();

const downloader = UniversalDownloader.getInstance();
const chat1Model = await downloader.getLocalModel(
'text2text-generation',
'Felladrin/onnx-flan-alpaca-base',
TaskType.CHAT,
'Xenova/flan-t5-small',
);
expect(chat1Model).toBeDefined();
console.log('Loaded Model:', chat1Model);
Expand All @@ -82,7 +93,7 @@ describe('loadAllChatsModels with real model loading', () => {
} catch (error) {
console.error('Error during model inference:', error);
}
}, 600000);
}, 6000000);
});

afterAll(() => {
Expand Down
Loading
Loading