Release 3.4.2 (#970)
* [feature] external axon flags (#887)

* add external axon changes

* add defaults for new axon flags

* fix args to axon

* default to internal ip and port if not specified

* add new args and defaults

* add axon unit tests

* add description for subtensor integration test

* move test to unit test

* create new test file
add/update copyright notices

* don't default to internal ip

* add tests for setting the full_address

* add tests for subtensor.serve w/external axon info

* allow external port config to be None

* switch to mock instead of patch

* fix test mocks

* change mock config create

* fix/add default config

* change asserts, add message

* fix check call args

* fix mock config set

* only call once

* fix help wording

* should be True

* [fix] fixes unstake with max-stake flag (#905)

* add equality to None to the balance class

* add tests for the None case
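The #905 fix above makes a Balance comparable against None. A hypothetical sketch of that behavior (the class shape and field names here are illustrative, not the actual bittensor `Balance` implementation):

```python
# Illustrative sketch: equality against None returns False instead of raising.
class Balance:
    def __init__(self, tao: float):
        self.tao = tao

    def __eq__(self, other) -> bool:
        if other is None:
            return False              # a Balance is never equal to None
        if isinstance(other, Balance):
            return self.tao == other.tao
        return NotImplemented         # let other types define their own comparison
```

With this, both `balance == None` and `balance != None` work as expected, since Python derives `!=` from `__eq__` by default.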

* local train bug fix (#906)

* [feature] [CUDA solver] Add multi-GPU and ask for CUDA during btcli run (#893)

* added cuda solver

* boost versions to fix pip error

* allow choosing device id

* fix solution check to use keccak

* adds params for cuda and dev_id to register

* list devices by name during selection

* add block number logging

* fix calculation of hashrate

* fix update interval default

* add --TPB arg to register

* add update_interval flag

* switch back to old looping/work structure

* change typing

* device count is a function

* stop early if wallet registered

* add update interval and num proc flag

* add better number output

* optimize multiproc cpu reg
keeping proc until solution

* fix test

* change import to cubit

* fix import and default

* up default
should have default in CLI call

* add comments about params

* fix config var access

* add cubit as extra

* handle stale pow differently
check registration after failure

* restrict number of processes for integration test

* fix stale check

* use wallet.is_registered instead

* attempt to fix test issue

* fix my test

* oops typo

* typo again ugh

* remove print out

* fix partly reg test

* fix if solution None

* fix test?

* fix patch

* add args for cuda to subtensor

* add cuda args to reregister call

* add to wallet register the cuda args

* fix refs and tests

* add for val test also

* fix tests with rereg

* fix patch for tests

* add mock_register to subtensor passed instead

* move register under the check for is_registered

* use patch obj instead

* fix patch object

* fix prompt

* remove unneeded if

* modify POW submit to use rolling submit again

* add backoff to block get from network

* add test for backoff get block

* suppress the dev id flag if not set

* remove dest so it uses first arg

* fix pow submit loop

* move registration status with

* fix max attempts check

* remove status in subtensor.register

* add submit status

* change to neuron get instead

* fix count

* try to patch live display

* fix patch

* .

* separate test cases

* add POWNotStale and tests

* add more test cases for block get with retry

* fix return to None

* fix arg order

* fix indent

* add test to verify solution is submitted

* fix mock call

* patch hex bytes instead

* typo :/

* fix print out for unstake

* fix indexing into mock call

* call indexing

* access dict not with dot

* fix other indent

* add CUDAException for cubit

* up cubit version
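One of the bullets above adds backoff to fetching the current block from the network. A minimal sketch of that pattern, assuming any flaky callable (`get_block` here is a stand-in, not a real bittensor API):

```python
# Illustrative sketch of retrying a transient network call with exponential backoff.
import time

def get_block_with_retry(get_block, retries: int = 3, base_delay: float = 1.0):
    """Retry a flaky call, sleeping base_delay * 2**attempt between failures."""
    for attempt in range(retries):
        try:
            return get_block()
        except Exception:
            if attempt == retries - 1:
                raise                 # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))
```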

* [Feature] ask cuda during btcli run (#890)

* add ask for cuda reg config in btcli run

* suppress unset arg

* [Feature] [cuda solver] multi gpu (#891)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* Feature/cuda solver multi gpu (#892)

* change diff display out

* remove logging

* check cubit support in the check config

* allow 1 or more devices in flag

* cuda flag should be suppress

* modify how cpu count is found

* make a solver base class

* add a solverbase for CUDA

* use multi-process kernel launching, one per GPU

* move check under dot get accessor

* add All gpus specification

* continue trying reg after Stale

* catch for OSX

* don't use qsize

* add test for continue after being stale

* patch get_nowait instead of qsize

* [Docs] Update old docs link to new link. Change discord invite to custom link (#915)

* Update old docs link to new one

This change deletes the old gitbooks documentation link and replaces it with the new one.

* fix discord links

Co-authored-by: Mac Thrasher <[email protected]>

* Fix for test_neuron.py (#917)

prevents downloading from huggingface

* [feature] add --seed option to regen_hotkey (#916)

* add seed option to regen hotkey

* make seed optional and fix docstring

* add tests for both coldkey and hotkey regen w/seed

* oops, make seed optional

* fix old test, add config.seed

* circle ci version update and fix (#920)

* Add test_phrases_split unit test

Asserts that randomly instantiated compact_topk encodings can be correctly decoded to recover the original topk_tensor.

* Update unravel_topk_token_phrases with faster implementation

Replaces .tensor_split() with block indexing to avoid extra copy operations.

* Rename test_phrases_split to test_random_topk_token_phrases

* Unit tests cleanup (#922)

* circle ci version update and fix

* Test clean up

* uncomment test and remove specific test

* remove loguru and fix flaky tests

* fix syncing

* removing tokenizer equivalence + some bug fixes

* moving old dataset test

* Deactivate test_random_topk_token_phrases unit test

* Create topk_tensor on origin device

* Normalization Update (#909)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* Adding development workflow documentation and script for bumping the version (#918)

BIT-582 Adding development workflow documentation and script for bumping the version

* Revert "Normalization Update (#909)"

This reverts commit 3990a28.

* Parachain registration (#912)

* removed ws assumption

* removing check

* never registered

* Fixed sched_getaffinity for mac osx

* Started adding parachain support

* [hot-fix] fix indent again. add test (#907)

fix indent again. add test

* Fixed registration check and first time registration

* Removed old entrypoint list structure

* Fixed unit tests

Co-authored-by: Eugene <[email protected]>
Co-authored-by: Ala Shaabana <[email protected]>
Co-authored-by: Cameron Fairchild <[email protected]>

* Bit 583 memory optimization v4 (#929)

* set allowed receptor to be 0 in validator to not store any receptor

* max_active receptor to 0

* fix

* feature/BIT-579/Adding Prometheus (#928)

* BIT-582 Adding development workflow documentation and script for bumping the version

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* BIT-579 Adding prometheus_client==0.14.1 to requirements

* BIT-579 Removing wandb defaults from sample_configs

* Revert "BIT-579 Removing wandb defaults from sample_configs"

This reverts commit 2940cc7.

* BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory

* Revert "BIT-579 Starting prometheus code. Adding metric_exporter concept/element and its MetricsExporterFactory"

This reverts commit 8742d7f.

* BIT-579 Adding _prometheus to bittensor

* BIT-579 Adding prometheus code to bittensor/_neuron/text/core_*

* BIT-579 Adding prometheus code to bittensor/_config/config_impl.py. Sends the config to the inprocess prometheus server if it exists.

* BIT-579 Adding prometheus code to bittensor/_axon/*

* BIT-579 Adding prometheus code to bittensor/_dendrite/*

* BIT-579 Fixing syntax error

* BIT-579 Fixing missing import: time

* BIT-579 fixing typo

* BIT-579 fixing test: unit_tests/bittensor_tests/test_neuron.py

Co-authored-by: Unconst <[email protected]>

* Dendrite Text Generate (#941)

* adds generate to dendrite

* vuln fixes

* extend readme

Co-authored-by: unconst <[email protected]>

* Subtensor and Normalization updates (#936)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* Prometheus bug fix (#942)

* local train bug fix

* normalization update

* fix tests

* remove test

* updated normalization

* Naming changes, bug fixes

* subtensor update for max clip

* max weight to a million

* Fixes for ordering and comments

* additional tests

* string fix

* numerical stability and testing updates

* minor update for division by zero

* Naming and spacing fixes

* epsilon update

* small fix

* additional subtensor parameters

* remove print

* help string fixes

* small bug fix

* [Fix] only reregister if flag is set (#937)

* add test for expected reregister behaviour

* add fix

* pass passed args into earlier parse

* fix test by using args

* exit before actual register

* use strtobool

Co-authored-by: Unconst <[email protected]>

* [BIT 584] [feature] btcli register output stats not in place (#923)

* add flags for output_in_place during registration

* stop tracking best

* refactor registration logging output

* fix reregister from type bool

* change in_place and use_cuda to strtobool

* add param and defaults

* fix reference before assignment

* add new logger to cuda reg

* pass param to btcli register call

* oops

* fix init

* try slight timeout

* try fix

* oop

* ?

* fix use_cuda flag

* add test for new use_cuda flag setup

* use create pow to patch

* allow no-prompt dev id

* fix console.error

* use lower for str comparison

* call self register instead

* add test for wallet register call

* tests are for wallet reregister

* fix typo

* no self on top-level test

* fix tests?

* use reregister

* typo in test

* fix assert

* fix assert

* should be False

* fix time output to use timedelta

* add log verbose as option to reg output

* should be action

* fix typo

* add missing function arg

* fix spacing

* fix flags

* fix flags

* fix test

* should pass in args to config pre-parse

* use None instead of NA

Co-authored-by: isabella618033 <[email protected]>
Co-authored-by: Unconst <[email protected]>

* [Fix] multi cuda fix (#940)

* adjust nonce end calculation

* attempt to fix stop issue

* modify stop

* update nonce_start by correct amount

* fix nonce init to only random and update

* fix update amount

* add start values

* add test

* try different hashrate calc

* try EWMA for hash_rate

* oops bad import

* change name to worker

* extract helper and modify comment

* fix time now

* catch Full

* use a finished queue instead of times

* move constants to function params

* fix name of n

* fix verbose log

* allow --output_in_place

* fix n

* change to --no_output_in_place

* fix test
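The "try EWMA for hash_rate" bullet above replaces a raw per-interval rate with an exponentially weighted moving average. An illustrative sketch (the alpha value and sample stream are assumptions, not the solver's actual numbers):

```python
# Illustrative sketch: smooth noisy per-interval hash-rate samples with an EWMA.
def ewma(prev: float, sample: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of hash-rate samples."""
    return alpha * sample + (1.0 - alpha) * prev

samples = [1000.0, 1400.0, 600.0]   # hashes/sec measured per interval
rate = samples[0]                   # seed with the first measurement
for s in samples[1:]:
    rate = ewma(rate, s)
```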

* Fix/pin wandb (#945)

pin below 0.13.4

* [Fix] change bellagene entrypoint string (#938)

don't add special case for network endpoint

Co-authored-by: Ala Shaabana <[email protected]>

* Update dockerfile to current on dockerhub (#934)

* update dockerfile to current on dockerhub

* add netcat

* move nvm install up to take advantage of caching

* use pip

* add nvm install checksum

Co-authored-by: Ala Shaabana <[email protected]>

* Minor fixes (#955)

minor fixes

Co-authored-by: unconst <[email protected]>

* Remove locals from cli and bittensor common (#947)

remove locals from cli and bittensor common

Co-authored-by: unconst <[email protected]>
Co-authored-by: Ala Shaabana <[email protected]>

* [feature] Improve dataloader performance (#950)

* use threadpool and futures for dataloader

* add cli arg for max directories

Co-authored-by: Joey Legere <[email protected]>
Co-authored-by: Ala Shaabana <[email protected]>
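The dataloader change above uses a thread pool and futures to overlap I/O-bound directory fetches. A minimal sketch of the idea (`fetch_one` and the directory list are illustrative assumptions, not the actual bittensor dataloader API):

```python
# Illustrative sketch: map an I/O-bound fetch over directories with threads.
from concurrent.futures import ThreadPoolExecutor

def fetch_all(directories, fetch_one, max_workers: int = 8):
    """Fetch each directory concurrently; results preserve input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, directories))
```

Threads (rather than processes) fit here because directory listing is network/disk bound, so the GIL is released while waiting.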

* No set weights (#959)

* add no set weights

* add no_set_weights

* fix logging

* comments fix;

Co-authored-by: unconst <[email protected]>

* Bit 590 backward fix (#957)

* init

* no local forward and remote forward overlap

* clean up

* saving remote

* fix local size mismatch

* clean up

* fix

* hidden state and causalLM deterministicness

* rm backward

* default to have dendrite backward

* [Fix] add perpetual hash rate and adjust alpha (#960)

* perpetual hash rate and adjust alpha

* move reg code to registration.py

* try different calc

* fix div by 0

* fix for cpu too

* fix race

* modify reg metrics output

* fix test mock

* oops

* [Fix] stake conversion issue (#958)

* modify balance arithm to cast to float first

* fix tests to model this behavior

* fix prompt spacing

* should be value error

* add test for eq balance other

* add comment to explain change

* fix tests

* .

* fix class

* balance fix

* try fix to staking

* fix comments

* add test for fix

* fix test

* fix impl

* add tests with bad types

* catch TypeError and NotImplementedError too

* catch TypeError

* .

* catch ValueError also
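The #958 stake-conversion bullets above cast to float before Balance arithmetic and catch bad operand types. A hedged sketch of that combination (`RAO_PER_TAO` and the class shape are illustrative, not the actual bittensor implementation):

```python
# Illustrative sketch: cast amounts to float first, reject bad operand types.
RAO_PER_TAO = 1_000_000_000

class Balance:
    def __init__(self, tao):
        self.tao = float(tao)                     # cast first: int input no longer truncates
        self.rao = int(self.tao * RAO_PER_TAO)

    def __add__(self, other):
        if isinstance(other, Balance):
            return Balance(self.tao + other.tao)
        try:
            return Balance(self.tao + float(other))   # accept raw int/float amounts
        except (TypeError, ValueError):               # mirrors the "catch TypeError/ValueError" bullets
            return NotImplemented
```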

* initial commit

* fix manager server no return

* Dasyncio (#967)

* initial commit

* fix manager server no return

Co-authored-by: unconst <[email protected]>

* Update __init__.py

* Moving to release

* Release 3.4.2 (#969)

* initial commit

* fix manager server no return

* Moving to release

Co-authored-by: unconst <[email protected]>

* fix failing test_forward_priority_2nd_request_timeout

* remove test_receptor test

* fix tests

* Decrease validator moving average window

Decrease validator moving average window from 20 (alpha=0.05) to 10 (alpha=0.1) steps. This parameter could probably eventually be set to alpha=0.2.

The current 20-step window means that a server model change will take 20 steps * ~250 blocks/epoch * 12 sec = approx. 17 hours to reach full score in the validator neuron stats, because of the moving average slowly weighing in new model performance. 17 hours is probably too long, and it is also likely affecting registration immunity.
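The window arithmetic above can be sanity-checked with a back-of-envelope calculation. This sketch assumes the usual EMA update `new = (1 - alpha) * old + alpha * sample`, whose effective window is roughly `1/alpha` steps; the block and epoch figures are the approximations quoted in the note, not exact chain parameters:

```python
# Back-of-envelope: wall-clock time for an EMA window of ~1/alpha steps.
BLOCKS_PER_EPOCH = 250    # "~250 blocks/epoch" from the note above
SECONDS_PER_BLOCK = 12

def hours_to_full_score(alpha: float) -> float:
    """Approximate hours for ~1/alpha validator steps to elapse."""
    steps = 1.0 / alpha
    return steps * BLOCKS_PER_EPOCH * SECONDS_PER_BLOCK / 3600.0

print(round(hours_to_full_score(0.05), 1))   # 20-step window: ~16.7 h ("approx. 17 hours")
print(round(hours_to_full_score(0.10), 1))   # 10-step window: ~8.3 h
```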

* Release 3.4.2 (#972)

* remove test_receptor test

* fix tests

Co-authored-by: unconst <[email protected]>

* No version checking (#974)

* no version checking

* fix integration tests

* remove print

Co-authored-by: Thebes <[email protected]>

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <[email protected]>

* Update bittensor/VERSION

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <[email protected]>

* Promo suffix (#977) (#981)

* Promo suffix (#977)

* initial commit

* promo change to axon and dendrite

Co-authored-by: Thebes <[email protected]>

* Validator exit (#980)

* remove test_receptor test

* fix tests

* fix validator exit

Co-authored-by: unconst <[email protected]>

Co-authored-by: Thebes <[email protected]>

Co-authored-by: Cameron Fairchild <[email protected]>
Co-authored-by: Eugene <[email protected]>
Co-authored-by: Eugene-hu <[email protected]>
Co-authored-by: Mac Thrasher <[email protected]>
Co-authored-by: opentaco <[email protected]>
Co-authored-by: opentaco <[email protected]>
Co-authored-by: Eduardo García <[email protected]>
Co-authored-by: Ala Shaabana <[email protected]>
Co-authored-by: Ala Shaabana <[email protected]>
Co-authored-by: isabella618033 <[email protected]>
Co-authored-by: unconst <[email protected]>
Co-authored-by: Cameron Fairchild <[email protected]>
Co-authored-by: joeylegere <[email protected]>
Co-authored-by: Joey Legere <[email protected]>
15 people authored Nov 9, 2022
1 parent 68a8c0a commit f5c5e1d
Showing 34 changed files with 1,917 additions and 1,453 deletions.
37 changes: 23 additions & 14 deletions Dockerfile
@@ -1,4 +1,5 @@
FROM nvidia/cuda:11.2.1-base
# syntax=docker/dockerfile:1
FROM pytorch/pytorch:1.12.0-cuda11.3-cudnn8-devel

LABEL bittensor.image.authors="bittensor.com" \
bittensor.image.vendor="Bittensor" \
@@ -14,22 +15,30 @@ ARG DEBIAN_FRONTEND=noninteractive
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub
# Update the base image
RUN apt update && apt upgrade -y
# Install bittensor
## Install dependencies
RUN apt install -y curl sudo nano git htop netcat wget unzip python3-dev python3-pip tmux apt-utils cmake build-essential
## Upgrade pip
RUN pip3 install --upgrade pip

RUN apt-get update && apt-get install --no-install-recommends --no-install-suggests -y apt-utils curl git cmake build-essential unzip python3-pip wget iproute2 software-properties-common
# Install nvm and pm2
RUN curl -o install_nvm.sh https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh && \
echo 'fabc489b39a5e9c999c7cab4d281cdbbcbad10ec2f8b9a7f7144ad701b6bfdc7 install_nvm.sh' | sha256sum --check && \
bash install_nvm.sh

RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update
RUN apt-get install python3 python3-dev -y
RUN python3 -m pip install --upgrade pip
RUN bash -c "source $HOME/.nvm/nvm.sh && \
# use node 16
nvm install 16 && \
# install pm2
npm install --location=global pm2"

# add Bittensor code to docker image
RUN mkdir /bittensor
RUN mkdir /home/.bittensor
COPY . /bittensor
RUN mkdir -p /root/.bittensor/bittensor
RUN cd ~/.bittensor/bittensor && \
python3 -m pip install bittensor

WORKDIR /bittensor
RUN pip install --upgrade numpy pandas setuptools "tqdm>=4.27,<4.50.0" wheel
RUN pip install -r requirements.txt
RUN pip install .
# Increase ulimit to 1,000,000
RUN prlimit --pid=$PPID --nofile=1000000

EXPOSE 8091
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
3.4.1
3.4.2
19 changes: 16 additions & 3 deletions bittensor/__init__.py
@@ -16,16 +16,29 @@
# DEALINGS IN THE SOFTWARE.

from rich.console import Console
from rich.traceback import install
from prometheus_client import Info

import nest_asyncio
nest_asyncio.apply()

# Bittensor code and protocol version.
__version__ = '3.4.1'
__version__ = '3.4.2'
version_split = __version__.split(".")
__version_as_int__ = (100 * int(version_split[0])) + (10 * int(version_split[1])) + (1 * int(version_split[2]))


# Turn off rich console locals trace.
from rich.traceback import install
install(show_locals=False)

# Rich console.
__console__ = Console()
__use_console__ = True

# Remove overdue locals in debug training.
install(show_locals=False)

def turn_console_off():
from io import StringIO
__use_console__ = False
@@ -62,8 +75,8 @@ def turn_console_off():

__nobunaga_entrypoint__ = "staging.nobunaga.opentensor.ai:9944"


__bellagene_entrypoint__ = "parachain.opentensor.ai:443"
# Needs to use wss://
__bellagene_entrypoint__ = "wss://parachain.opentensor.ai:443"


__local_entrypoint__ = "127.0.0.1:9944"
70 changes: 33 additions & 37 deletions bittensor/_axon/axon_impl.py
@@ -27,18 +27,33 @@
import grpc
import wandb
import pandas
import uuid
from loguru import logger
import torch.nn.functional as F
import concurrent

from prometheus_client import Counter, Histogram, Enum, CollectorRegistry

import bittensor
import bittensor.utils.stats as stat_utils
from datetime import datetime

logger = logger.opt(colors=True)

from prometheus_client import Counter, Histogram, Enum, CollectorRegistry
PROM_axon_is_started = Enum('axon_is_started', 'is_started', states=['stopped', 'started'])
PROM_total_forward = Counter('axon_total_forward', 'total_forward', ['wallet', 'identifier'])
PROM_total_backward = Counter('axon_total_backward', 'total_backward', ['wallet', 'identifier'])
PROM_forward_latency = Histogram('axon_forward_latency', 'forward_latency', ['wallet', 'identifier'], buckets=list(range(0,bittensor.__blocktime__,1)))
PROM_backward_latency = Histogram('axon_backward_latency', 'backward_latency', ['wallet', 'identifier'], buckets=list(range(0,bittensor.__blocktime__,1)))
PROM_forward_synapses = Counter('axon_forward_synapses', 'forward_synapses', ['wallet', 'identifier', "synapse"])
PROM_backward_synapses = Counter('axon_backward_synapses', 'backward_synapses', ['wallet', 'identifier', "synapse"])
PROM_forward_codes = Counter('axon_forward_codes', 'forward_codes', ['wallet', 'identifier', "code"])
PROM_backward_codes = Counter('axon_backward_codes', 'backward_codes', ['wallet', 'identifier', "code"])
PROM_forward_hotkeys = Counter('axon_forward_hotkeys', 'forward_hotkeys', ['wallet', 'identifier', "hotkey"])
PROM_backward_hotkeys = Counter('axon_backward_hotkeys', 'backward_hotkeys', ['wallet', 'identifier', "hotkey"])
PROM_forward_bytes = Counter('axon_forward_bytes', 'forward_bytes', ['wallet', 'identifier', "hotkey"])
PROM_backward_bytes = Counter('axon_backward_bytes', 'backward_bytes', ['wallet', 'identifier', "hotkey"])

class Axon( bittensor.grpc.BittensorServicer ):
r""" Services Forward and Backward requests from other neurons.
"""
@@ -103,27 +118,8 @@ def __init__(

# -- Priority
self.priority = priority
self.priority_threadpool= priority_threadpool

# == Prometheus
# We are running over various suffix values in the event that there are multiple axons in the same process.
# The first axon is created with a null suffix and subsequent values are ordered like so: axon_is_started, axon_is_started_1, axon_is_started_2 etc...

if self.prometheus_level != bittensor.prometheus.level.OFF.name:
registry = CollectorRegistry()
self.is_started = Enum('axon_is_started', 'is_started', states=['stopped', 'started'], registry=registry)
self.total_forward = Counter('axon_total_forward', 'total_forward', registry=registry)
self.total_backward = Counter('axon_total_backward', 'total_backward', registry=registry)
self.forward_latency = Histogram('axon_forward_latency', 'forward_latency', buckets=list(range(0,bittensor.__blocktime__,1)), registry=registry)
self.backward_latency = Histogram('axon_backward_latency', 'backward_latency', buckets=list(range(0,bittensor.__blocktime__,1)), registry=registry)
self.forward_synapses = Counter('axon_forward_synapses', 'forward_synapses', ["synapse"], registry=registry)
self.backward_synapses = Counter('axon_backward_synapses', 'backward_synapses', ["synapse"], registry=registry)
self.forward_codes = Counter('axon_forward_codes', 'forward_codes', ["code"], registry=registry)
self.backward_codes = Counter('axon_backward_codes', 'backward_codes', ["code"], registry=registry)
self.forward_hotkeys = Counter('axon_forward_hotkeys', 'forward_hotkeys', ["hotkey"], registry=registry)
self.backward_hotkeys = Counter('axon_backward_hotkeys', 'backward_hotkeys', ["hotkey"], registry=registry)
self.forward_bytes = Counter('axon_forward_bytes', 'forward_bytes', ["hotkey"], registry=registry)
self.backward_bytes = Counter('axon_backward_bytes', 'backward_bytes', ["hotkey"], registry=registry)
self.priority_threadpool = priority_threadpool
self._prometheus_uuid = uuid.uuid1()

def __str__(self) -> str:
return "Axon({}, {}, {}, {})".format( self.ip, self.port, self.wallet.hotkey.ss58_address, "started" if self.started else "stopped")
@@ -239,17 +235,17 @@ def check_if_should_return() -> bool:
def finalize_codes_stats_and_logs( message = None):
# === Prometheus
if self.prometheus_level != bittensor.prometheus.level.OFF.name:
self.total_forward.inc()
self.forward_latency.observe( clock.time() - start_time )
PROM_total_forward.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid ).inc()
PROM_forward_latency.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid ).observe( clock.time() - start_time )
if self.prometheus_level == bittensor.prometheus.level.DEBUG.name:
self.forward_hotkeys.labels( request.hotkey ).inc()
self.forward_bytes.labels( request.hotkey ).inc( sys.getsizeof( request ) )
PROM_forward_hotkeys.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, hotkey = request.hotkey ).inc()
PROM_forward_bytes.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, hotkey = request.hotkey ).inc( sys.getsizeof( request ) )

for index, synapse in enumerate( synapses ):
# === Prometheus
if self.prometheus_level != bittensor.prometheus.level.OFF.name:
self.forward_synapses.labels( str(synapse) ).inc()
self.forward_codes.labels( str(synapse_codes[ index ]) ).inc()
PROM_forward_synapses.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, synapse = str(synapse) ).inc()
PROM_forward_codes.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, code = str(synapse_codes[ index ]) ).inc()

# === Logging
request.synapses [ index ].return_code = synapse_codes[ index ] # Set synapse wire proto codes.
@@ -261,7 +257,7 @@ def finalize_codes_stats_and_logs( message = None):
code = synapse_codes[ index ],
call_time = synapse_call_times[ index ],
pubkey = request.hotkey,
inputs = synapse_inputs [index] ,
inputs = deserialized_forward_tensors [index].shape if deserialized_forward_tensors [index] != None else None ,
outputs = None if synapse_responses[index] == None else list( synapse_responses[index].shape ),
message = synapse_messages[ index ] if message == None else message,
synapse = synapse.synapse_type
@@ -471,17 +467,17 @@ def check_if_should_return() -> bool:
def finalize_codes_stats_and_logs():
# === Prometheus
if self.prometheus_level != bittensor.prometheus.level.OFF.name:
self.total_backward.inc()
self.backward_latency.observe( clock.time() - start_time )
PROM_total_backward.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid ).inc()
PROM_backward_latency.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid ).observe( clock.time() - start_time )
if self.prometheus_level == bittensor.prometheus.level.DEBUG.name:
self.backward_hotkeys.labels( request.hotkey ).inc()
self.backward_bytes.labels( request.hotkey ).inc( sys.getsizeof( request ) )
PROM_backward_hotkeys.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, hotkey = request.hotkey ).inc()
PROM_backward_bytes.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, hotkey = request.hotkey ).inc( sys.getsizeof( request ) )

for index, synapse in enumerate( synapses ):
# === Prometheus
if self.prometheus_level != bittensor.prometheus.level.OFF.name:
self.backward_synapses.labels( str(synapse) ).inc()
self.backward_codes.labels( str(synapse_codes[ index ]) ).inc()
PROM_backward_synapses.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, synapse = str(synapse) ).inc()
PROM_backward_codes.labels( wallet = self.wallet.hotkey.ss58_address, identifier = self._prometheus_uuid, code = str(synapse_codes[ index ]) ).inc()

# === Logging
request.synapses [ index ].return_code = synapse_codes[ index ] # Set synapse wire proto codes.
@@ -818,7 +814,7 @@ def start(self) -> 'Axon':

# Switch prometheus ENUM.
if self.prometheus_level != bittensor.prometheus.level.OFF.name:
self.is_started.state('started')
PROM_axon_is_started.state('started')

return self

@@ -832,7 +828,7 @@ def stop(self) -> 'Axon':

# Switch prometheus ENUM.
if self.prometheus_level != bittensor.prometheus.level.OFF.name:
self.is_started.state('stopped')
PROM_axon_is_started.state('stopped')

return self

