diff --git a/README.md b/README.md
index 1c26fbca..6ded9648 100644
--- a/README.md
+++ b/README.md
@@ -1,145 +1 @@
-# Atoma Node infrastructure
-
-## Introduction
-
-This repository contains the logic to run an Atoma node. Atoma nodes empower the Atoma Network, a decentralized network
-for verifiable AI inference. Nodes can lend their GPU compute to the protocol, so that large language models (LLMs for short) can
-run across a decentralized network at a lower cost. Moreover, nodes can be organized to provide verifiability guarantees
-for the correctness of their generated outputs. This means that the Atoma Network can empower smart contracts, deployed on any blockchain,
-to run verifiable inference and provide an intelligence layer for Web3.
-
-## Run a node
-
-In order to run an Atoma node, you should provide sufficient evidence of holding a machine powerful enough for AI inference. We
-allow a wide range of hardware, including the NVIDIA GPU series (RTX3090, RTX4090, A100, etc.), as well as a sufficiently fast
-Internet connection. We further highly encourage nodes to register on the Atoma contracts on supported blockchains
-(these include Arbitrum, Solana, Sui, etc.). In order to register, we suggest following the instructions in the
-Atoma contracts [repo](https://github.com/atoma-network/atoma-contracts).
-
-Once registration has been completed, the user is required to:
-
-1. Clone this repo (it is assumed that Rust is installed; otherwise, follow the instructions [here](https://www.rust-lang.org/tools/install)).
-2. Create configuration files for the model inference service, the event listener service, and the blockchain client service. Notice
-that the event listener and blockchain client services need to target the same blockchain, on which the node has previously registered. That said, a single node can be registered on multiple blockchains and listen to events on each of these (for higher accrued rewards).
-3. The model inference service configuration file is specified schematically as (in toml format):
-
-```toml
-api_key = "YOUR_HUGGING_FACE_API_KEY" # for downloading models
-cache_dir = "PATH_FOR_MODEL_STORAGE" # where you want the downloaded models to be stored
-flush_storage = FLUSH_STORAGE # bool value; whether or not to flush the downloaded models when the user stops the Atoma node
-jrpc_port = JRPC_PORT # Atoma node's port for the JSON-RPC service
-models = [MODEL_CONFIG] # specifications for each model the user wants to operate, as an Atoma node
-tracing = TRACING # bool value; enables tracing
-```
-
-4. In the above point, `MODEL_CONFIG` refers to a supported model configuration, of the form
-
-```
-[DEVICE, PRECISION, MODEL_TYPE, USE_FLASH_ATTENTION]
-```
-
-where `DEVICE` is an integer referring to the device index of the GPU running the model (if your machine has only a single GPU, or if you are using a CPU or Metal device, the index should be 0). `PRECISION` refers to the model inference precision; supported values are
-`"f32"`, `"bf16"` and `"f16"` (if you host quantized models, this field is not relevant). `MODEL_TYPE` is a string referring to the name
-of the model to be hosted (on the given device); a full list of model names can be found in the Supported models section below. `USE_FLASH_ATTENTION` is a boolean value which allows running inference with the optimized flash attention algorithm. A concrete example is given below.
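-
-For example, a hypothetical configuration hosting the `llama3_8b` model on a single GPU (device index 0), at `bf16` precision and with flash attention enabled, would be
-
-```
-[0, "bf16", "llama3_8b", true]
-```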
-5. The event subscriber service configuration file is specified as (in toml format):
-
-```toml
-http_url = "RPC_NODE_HTTP_URL" # to connect via http to an rpc node on the blockchain
-ws_url = "RPC_NODE_WEB_SOCKET_URL" # to connect via web socket to an rpc node on the blockchain, relevant for listening to events
-package_id = "SUI_PACKAGE_ID" # the Atoma contract object id, on Sui
-small_id = SMALL_ID # a unique identifier provided to the node, upon on-chain registration
-
-[request_timeout] # a request timeout parameter
-secs = 300
-nanos = 0
-```
-
-6. The Atoma blockchain client service configuration file is specified as (in toml format):
-
-```toml
-config_path = "SUI_CLIENT_CONFIG_PATH" # the path to the sui client configuration file (for connecting the user's wallet)
-atoma_db_id = "ATOMA_DB_ID" # the Atoma db object id; this value is publicly available (see below)
-node_badge_id = "NODE_BADGE_ID" # the node's own badge object id; this value is provided to the node upon registration
-package_id = "ATOMA_CALL_CONTRACT_ID" # the Atoma contract package id; this value is publicly available (see below)
-small_id = SMALL_ID # a unique identifier provided to the node, upon on-chain registration (same as above)
-
-max_concurrent_requests = 1000 # how many concurrent requests are supported by the Sui client service
-
-[request_timeout] # a request timeout parameter
-secs = 300
-nanos = 0
-```
-
-7. Once the node is registered and the configuration files are set, the node operator then just needs to run the following commands:
-
-```sh
-$ cd atoma-node
-$ RUST_LOG=info cargo run --release --features <YOUR_GPU_ENV> -- --atoma-sui-client-config-path <PATH> --model-config-path <PATH> --sui-subscriber-path <PATH>
-```
-
-The value `<YOUR_GPU_ENV>` can be either `cuda`, `metal` or `flash-attn`; if you wish to run inference on the CPU, remove the `--features <YOUR_GPU_ENV>` flag altogether.
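-
-For example, on a machine with a CUDA-capable GPU, the invocation might look as follows (the configuration file paths are hypothetical placeholders):
-
-```sh
-$ cd atoma-node
-$ RUST_LOG=info cargo run --release --features cuda -- --atoma-sui-client-config-path sui_client.toml --model-config-path model.toml --sui-subscriber-path sui_subscriber.toml
-```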
-If you set `use_flash_attention = true` in point 4. above, you should execute the binary as
-
-```sh
-$ RUST_LOG=info cargo run --release --features flash-attn -- --atoma-sui-client-config-path <PATH> --model-config-path <PATH> --sui-subscriber-path <PATH>
-```
-
-## Supported models
-
-The currently supported models are:
-
-| Model Type                     | Hugging Face model name        |
-|--------------------------------|--------------------------------|
-| falcon_7b                      | tiiuae/falcon-7b               |
-| falcon_40b                     | tiiuae/falcon-40b              |
-| falcon_180b                    | tiiuae/falcon-180b             |
-| llama_v1                       | Narsil/amall-7b                |
-| llama_v2                       | meta-llama/Llama-2-7b-hf       |
-| llama_solar_10_7b              | upstage/SOLAR-10.7B-v1.0       |
-| llama_tiny_llama_1_1b_chat     | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
-| llama3_8b                      | meta-llama/Meta-Llama-3-8B     |
-| llama3_instruct_8b             | meta-llama/Meta-Llama-3-8B-Instruct |
-| llama3_70b                     | meta-llama/Meta-Llama-3-70B    |
-| mamba_130m                     | state-spaces/mamba-130m        |
-| mamba_370m                     | state-spaces/mamba-370m        |
-| mamba_790m                     | state-spaces/mamba-790m        |
-| mamba_1-4b                     | state-spaces/mamba-1.4b        |
-| mamba_2-8b                     | state-spaces/mamba-2.8b        |
-| mistral_7bv01                  | mistralai/Mistral-7B-v0.1      |
-| mistral_7bv02                  | mistralai/Mistral-7B-v0.2      |
-| mistral_7b-instruct-v01        | mistralai/Mistral-7B-Instruct-v0.1 |
-| mistral_7b-instruct-v02        | mistralai/Mistral-7B-Instruct-v0.2 |
-| mixtral_8x7b                   | mistralai/Mixtral-8x7B-v0.1    |
-| phi_3-mini                     | microsoft/Phi-3-mini-4k-instruct |
-| stable_diffusion_v1-5          | runwayml/stable-diffusion-v1-5 |
-| stable_diffusion_v2-1          | stabilityai/stable-diffusion-2-1 |
-| stable_diffusion_xl            | stabilityai/stable-diffusion-xl-base-1.0 |
-| stable_diffusion_turbo         | stabilityai/sdxl-turbo         |
-| quantized_7b                   | TheBloke/Llama-2-7B-GGML       |
-| quantized_13b                  | TheBloke/Llama-2-13B-GGML      |
-| quantized_70b                  | TheBloke/Llama-2-70B-GGML      |
-| quantized_7b-chat              | TheBloke/Llama-2-7B-Chat-GGML  |
-| quantized_13b-chat             | TheBloke/Llama-2-13B-Chat-GGML |
-| quantized_70b-chat             | TheBloke/Llama-2-70B-Chat-GGML |
-| quantized_7b-code              | TheBloke/CodeLlama-7B-GGUF     |
-| quantized_13b-code             | TheBloke/CodeLlama-13B-GGUF    |
-| quantized_32b-code             | TheBloke/CodeLlama-34B-GGUF    |
-| quantized_7b-leo               | TheBloke/leo-hessianai-7B-GGUF |
-| quantized_13b-leo              | TheBloke/leo-hessianai-13B-GGUF |
-| quantized_7b-mistral           | TheBloke/Mistral-7B-v0.1-GGUF  |
-| quantized_7b-mistral-instruct  | TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF |
-| quantized_7b-mistral-instruct-v0.2 | TheBloke/Mistral-7B-Instruct-v0.2-GGUF |
-| quantized_7b-zephyr-a          | TheBloke/zephyr-7B-alpha-GGUF  |
-| quantized_7b-zephyr-b          | TheBloke/zephyr-7B-beta-GGUF   |
-| quantized_7b-open-chat-3.5     | TheBloke/openchat_3.5-GGUF     |
-| quantized_7b-starling-a        | TheBloke/Starling-LM-7B-alpha-GGUF |
-| quantized_mixtral              | TheBloke/Mixtral-8x7B-v0.1-GGUF |
-| quantized_mixtral-instruct     | TheBloke/Mistral-7B-Instruct-v0.1-GGUF |
-| quantized_llama3-8b            | QuantFactory/Meta-Llama-3-8B-GGUF |
-
-For example, if a user wants to run a node hosting a quantized 7b Mistral model, they can do so simply by setting
-
-```
-MODEL_TYPE = "quantized_7b-mistral"
-```
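-
-Putting this together with the configuration in point 3. above, a hypothetical `models` entry for this choice could be (the precision field is not relevant for quantized models, and flash attention is disabled here):
-
-```toml
-models = [[0, "f16", "quantized_7b-mistral", false]]
-```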
+# atoma-node
diff --git a/atoma-types/src/lib.rs b/atoma-types/src/lib.rs
index ad0f6eb3..895e340f 100644
--- a/atoma-types/src/lib.rs
+++ b/atoma-types/src/lib.rs
@@ -4,14 +4,6 @@ use serde_json::{json, Value};
 
 pub type SmallId = u64;
 
-/// Represents a request object containing information about a request
-/// Prompt event, emitted on a given blockchain, see https://github.com/atoma-network/atoma-contracts/blob/main/sui/packages/atoma/sources/gate.move#L45.
-/// It includes information about a ticket ID, sampled nodes, and request parameters.
-///
-/// Fields:
-///   id: Vec<u8> - The ticket ID associated with the request (or event).
-///   sampled_nodes: Vec<SmallId> - A vector of sampled nodes, each represented by a SmallId structure.
-///   body: serde_json::Value - JSON value containing the request parameters.
 #[derive(Clone, Debug, Deserialize)]
 pub struct Request {
     #[serde(rename(deserialize = "ticket_id"))]
@@ -69,9 +61,6 @@ impl TryFrom for Request {
     }
 }
 
-/// Parses the body of a JSON value. This JSON value is supposed to be obtained
-/// from a Sui `Text2TextPromptEvent`,
-/// see https://github.com/atoma-network/atoma-contracts/blob/main/sui/packages/atoma/sources/gate.move#L28
 fn parse_body(json: Value) -> Result<Value> {
     let output = json!({
         "max_tokens": parse_u64(&json["max_tokens"])?,
@@ -101,8 +90,6 @@ fn parse_f32_from_le_bytes(value: &Value) -> Result<f32> {
     Ok(f32::from_le_bytes(f32_le_bytes))
 }
 
-/// Parses an appropriate JSON value, representing a `u64` number, from a Sui
-/// `Text2TextPromptEvent`'s `u64` fields.
 fn parse_u64(value: &Value) -> Result<u64> {
     value
         .as_str()
@@ -111,12 +98,6 @@
         .map_err(|e| anyhow!("Failed to parse `u64` from string, with error: {e}"))
 }
 
-/// Represents a response object containing information about a response, including an ID, sampled nodes, and the response data.
-///
-/// Fields:
-///   id: Vec<u8> - The ticket id associated with the request that led to the generation of this response.
-///   sampled_nodes: Vec<SmallId> - A vector of sampled nodes, each represented by a SmallId structure.
-///   response: serde_json::Value - JSON value containing the response data.
 #[derive(Debug)]
 pub struct Response {
     id: Vec<u8>,