v0.0.0beta20
What's Changed
- Patch `post_file` client method by @song-william in #323
- Add pod disruption budget to all endpoints by @yunfeng-scale in #328
- Create celery worker with inference worker profile by @saiatmakuri in #327
- Bump http forwarder request CPU by @yunfeng-scale in #330
- [Docs] Clarify get-events API usage by @seanshi-scale in #320
- Enable additional Datadog tagging for jobs by @song-william in #324
- fix celery worker profile for s3 access by @saiatmakuri in #333
- Hardcode number of forwarder workers by @yunfeng-scale in #334
- Standardize logging initialization by @song-william in #337
- Fix up the mammoth max length issue by @sam-scale in #335
- Add docs for `Model.create`, update default values, and fix `per_worker` concurrency by @yunfeng-scale in #332
- Update docs to add codellama models by @ian-scale in #343
- Add PodDisruptionBudget to model engine by @yunfeng-scale in #342
- Allow auth to accept API keys by @saiatmakuri in #326
- Add job_name in build logs for easier debugging by @song-william in #340
- Make PDB optional by @yunfeng-scale in #344
- Revert "fix celery worker profile for s3 access" by @yixu34 in #345
- Revert "Revert "fix celery worker profile for s3 access"" by @saiatmakuri in #346
- Pass file ID to fine-tuning script by @squeakymouse in #347
- Llama should have None max length by @sam-scale in #348
- Remove codellama 13b and 34b by @ian-scale in #349
- Change DATADOG_TRACE_ENABLED to DD_TRACE_ENABLED by @edwardpark97 in #350
- Allow fine-tuning hyperparameter to be Dict by @squeakymouse in #353 (see the sketch after this list)
- Add real auth to integration tests by @ian-scale in #352
- Add new llm-jp models to llm-engine by @ian-scale in #354
- Generalize SQS region by @jaisanliang in #355
- Track LLM Metrics by @saiatmakuri in #356
- Remove extra trace facet "launch.resource_name" by @saiatmakuri in #359
- Add codellama instruct and refactor codellama models by @ian-scale in #360
- Various changes/bugfixes to chart/code to streamline deployment on different forms of infra by @seanshi-scale in #339
- Add PR template by @song-william in #341
- Unmount aws config from root by @song-william in #361
- Implement automated code coverage for CI by @tiffzhao5 in #362
- Download only known files by @squeakymouse in #364
- Documentation fix by @squeakymouse in #365
- Change more AWS config mount paths by @squeakymouse in #367
- Validating inference framework image tags by @tiffzhao5 in #357
- Add codellama 34b by @ian-scale in #369
- Better error when model is not ready for predictions by @tiffzhao5 in #368
- Improve metrics route team tags by @saiatmakuri in #371
- Enable custom istio metric tags with Telemetry API by @song-william in #373
- Use Variable name for Telemetry Helm Resources by @song-william in #374
- Forward HTTP status code for sync requests by @yunfeng-scale in #375
- Integrate TensorRT-LLM by @yunfeng-scale in #358
- Fine-tuning e2e integration test by @tiffzhao5 in #372
- Found a bug in the codellama vllm model_len logic. by @sam-scale in #380
- Fix sample.yaml by @yunfeng-scale in #381
- Count prompt tokens by @saiatmakuri in #366
- Fix integration test by @yunfeng-scale in #383
- Emit metrics on token counts by @saiatmakuri in #382
- Increase llama-2 max_input_tokens by @sam-scale in #384
- Revert "Found a bug in the codellama vllm model_len logic." by @yunfeng-scale in #386
- Some updates to integration tests by @yunfeng-scale in #385
- Celery autoscaler by @squeakymouse in #378
- Don't install Celery autoscaler for test deployments by @squeakymouse in #388
- LLM update API route by @squeakymouse in #387
- Add zephyr 7b by @ian-scale in #389
- Update TensorRT-LLM in enum by @ian-scale in #390
- PyPI version bump by @ian-scale in #391
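
For reference on #353: fine-tuning hyperparameters can now take dictionary values rather than only scalars. Below is a minimal sketch using the `llmengine` Python client; the hyperparameter keys and the nested `peft_config` shape are illustrative assumptions, not a documented schema.

```python
from llmengine import FineTune

# Hyperparameters may now be dict-valued (#353). The key names and nested
# structure below are assumptions for illustration, not a definitive schema.
response = FineTune.create(
    model="llama-2-7b",
    training_file="file-abc123",  # a file ID, passed through to the fine-tuning script per #347
    hyperparameters={
        "epochs": 1,
        "peft_config": {"r": 8, "lora_alpha": 16},  # nested dict value
    },
)
print(response.id)
```
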
New Contributors
- @edwardpark97 made their first contribution in #350
- @jaisanliang made their first contribution in #355
- @tiffzhao5 made their first contribution in #362
Full Changelog: v0.0.0beta19...v0.0.0beta20