Context retrieval only works for first user message #444
@wukaixingxp, @ashwinb I've just had a look at this.
I've done a bit of testing, and the RAG query that is generated actually joins together all of the messages. In your case, for messages:
it generates:
I added this print statement. My llama-stack-apps code is here; in some cases, this works for me:
It's branched off of your branch here. The print statement in llama-stack is here. IIUC, the problem is that the search results are inconsistent or a bit poor. I've run some of my own queries against the FAISS index and they're a bit inconsistent: Query: "Llama 3.2 3B Instruct" — the top 2 results are:
Query: "What are some small Llama models I can run on small devices like my phone?"
(The last result is relevant, but the first 3 aren't that useful.) If I have a bit of time I might see how we could improve them. Maybe adding keyword search [1], trying different/bigger embedding models [2], or different chunking schemes [3] might help here? A rough illustration of these ideas is sketched below.
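As a minimal sketch of those ideas only — the embedding model, the placeholder chunks, and the keyword-bonus scoring are assumptions for illustration, not the actual llama-stack memory provider:

```python
# Sketch: cosine-similarity search over a small FAISS index, plus a naive
# keyword-overlap bonus to imitate the "keyword search" idea above. Swapping
# the SentenceTransformer model name is how you'd try different embeddings.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = [
    "Llama 3.2 model card text ...",   # placeholder chunks for illustration
    "Llama 3.1 model card text ...",
    "Llama Guard model card text ...",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
emb = model.encode(chunks, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])           # inner product == cosine on normalized vectors
index.add(emb)

def search(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True).astype("float32")
    scores, ids = index.search(q, k)
    q_tokens = set(query.lower().split())
    ranked = []
    for score, i in zip(scores[0], ids[0]):
        # crude keyword bonus: reward chunks sharing tokens with the query
        overlap = len(q_tokens & set(chunks[i].lower().split()))
        ranked.append((float(score) + 0.05 * overlap, chunks[i]))
    return sorted(ranked, reverse=True)

print(search("What are some small Llama models I can run on my phone?"))
```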
llama-stack install from source: https://github.com/meta-llama/llama-stack/tree/cherrypick-working
System Info
python -m "torch.utils.collect_env"
/home/kaiwu/miniconda3/envs/llama/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: CentOS Stream 9 (x86_64)
GCC version: (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2)
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.34
Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.4.3-0_fbk14_zion_2601_gcd42476b84e9-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100
GPU 1: NVIDIA H100
GPU 2: NVIDIA H100
GPU 3: NVIDIA H100
GPU 4: NVIDIA H100
GPU 5: NVIDIA H100
GPU 6: NVIDIA H100
GPU 7: NVIDIA H100
Nvidia driver version: 535.154.05
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.9.2
/usr/lib64/libcudnn_adv_infer.so.8.9.2
/usr/lib64/libcudnn_adv_train.so.8.9.2
/usr/lib64/libcudnn_cnn_infer.so.8.9.2
/usr/lib64/libcudnn_cnn_train.so.8.9.2
/usr/lib64/libcudnn_ops_infer.so.8.9.2
/usr/lib64/libcudnn_ops_train.so.8.9.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-383
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 82%
CPU max MHz: 3707.8120
CPU min MHz: 1500.0000
BogoMIPS: 4792.80
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization: AMD-V
L1d cache: 6 MiB (192 instances)
L1i cache: 6 MiB (192 instances)
L2 cache: 192 MiB (192 instances)
L3 cache: 768 MiB (24 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-95,192-287
NUMA node1 CPU(s): 96-191,288-383
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Vulnerable: eIBRS with unprivileged eBPF
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] onnx==1.16.2
[pip3] onnxruntime==1.19.2
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] triton==3.0.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.4.0 pypi_0 pypi
[conda] torchvision 0.19.0 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
Information
🐛 Describe the bug
There are a llama3.1 model card and a llama3.2 model card in the database, and I tried to ask:
The RAG only retrieves context from the llama3.2 model card for the first message, but it does not do retrieval for the second message; the context is still the llama3.2 model card from the first message. It would be great if we could have context retrieval for every user message.
My code is here; use
python rag_main.py localhost 5000 ./example_data/
to start this example.
Error logs
Inserted 3 documents into bank: rag_agent_docs
Created bank: rag_agent_docs
Found 2 models [ModelDefWithProvider(identifier='Llama3.2-11B-Vision-Instruct', llama_model='Llama3.2-11B-Vision-Instruct', metadata={}, provider_id='meta-reference', type='model'), ModelDefWithProvider(identifier='Llama-Guard-3-1B', llama_model='Llama-Guard-3-1B', metadata={}, provider_id='meta1', type='model')]
Use model: Llama3.2-11B-Vision-Instruct
Generating response for: What is the name of the llama model released on October 24, 2024?
messages [{'role': 'user', 'content': 'What is the name of the llama model released on October 24, 2024?'}]
----input_query------- What is the name of the llama model released on October 24, 2024?
Turn(input_messages=[UserMessage(content='What is the name of the llama model released on October 24, 2024?', role='user', context="Here are the retrieved documents for relevant context:\n=== START-RETRIEVED-CONTEXT ===\n\nid:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family: Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use\nid:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. 
For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use\n\n=== END-RETRIEVED-CONTEXT ===\n")], output_attachments=[], output_message=CompletionMessage(content='The name of the llama model released on October 24, 2024, is not explicitly mentioned in the provided documents. However, the document mentions that the model is "Llama 3.2", but it does not indicate if "Llama 3.2" is the name of the specific model released on October 24, 2024, or if it is a version or variant of the model.\n\nIt does mention the Model Release Date as Oct 24, 2024, but this refers to the release of Llama 3.2, not the name of the specific model.\n\nTo answer your question accurately, I don't know the name of the llama model released on October 24, 2024, as this information is not explicitly mentioned in the provided documents.', role='assistant', stop_reason='end_of_turn', tool_calls=[]), session_id='de83a6c2-5643-42b0-9c89-01640439b524', started_at=datetime.datetime(2024, 11, 13, 9, 48, 44, 297982), steps=[MemoryRetrievalStep(inserted_context=['Here are the retrieved documents for relevant context:\n=== START-RETRIEVED-CONTEXT ===\n', "id:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. 
For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use", "id:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. 
Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use", '\n=== END-RETRIEVED-CONTEXT ===\n'], memory_bank_ids=['rag_agent_docs'], step_id='d916a947-4dee-42e2-ac1a-410d54c7da3d', step_type='memory_retrieval', turn_id='4efeaab0-d7f1-495f-b653-3fd173a59db3', completed_at=None, started_at=None), InferenceStep(inference_model_response=CompletionMessage(content='The name of the llama model released on October 24, 2024, is not explicitly mentioned in the provided documents. However, the document mentions that the model is "Llama 3.2", but it does not indicate if "Llama 3.2" is the name of the specific model released on October 24, 2024, or if it is a version or variant of the model.\n\nIt does mention the Model Release Date as Oct 24, 2024, but this refers to the release of Llama 3.2, not the name of the specific model.\n\nTo answer your question accurately, I don't know the name of the llama model released on October 24, 2024, as this information is not explicitly mentioned in the provided documents.', role='assistant', stop_reason='end_of_turn', tool_calls=[]), step_id='603d12ab-f127-46de-9ccb-4e07bdccc7e3', step_type='inference', turn_id='4efeaab0-d7f1-495f-b653-3fd173a59db3', completed_at=None, started_at=None)], turn_id='4efeaab0-d7f1-495f-b653-3fd173a59db3', completed_at=datetime.datetime(2024, 11, 13, 9, 48, 50, 996089))
Generating response for: What about Llama 3.1 model, what is the release date for it?
messages [{'role': 'user', 'content': 'What about Llama 3.1 model, what is the release date for it?'}]
----input_query------- What about Llama 3.1 model, what is the release date for it?
Turn(input_messages=[UserMessage(content='What about Llama 3.1 model, what is the release date for it?', role='user', context="Here are the retrieved documents for relevant context:\n=== START-RETRIEVED-CONTEXT ===\n\nid:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use\nid:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. 
For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use\n\n=== END-RETRIEVED-CONTEXT ===\n")], output_attachments=[], output_message=CompletionMessage(content="The release date for Llama 3.1 model is not mentioned in the provided documents. However, there is information about Llama 3.2 model's release date, which is October 24, 2024.\n\nIt appears that there is no information about the Llama 3.1 model in the provided documents.", role='assistant', stop_reason='end_of_turn', tool_calls=[]), session_id='de83a6c2-5643-42b0-9c89-01640439b524', started_at=datetime.datetime(2024, 11, 13, 9, 48, 51, 113170), steps=[MemoryRetrievalStep(inserted_context=['Here are the retrieved documents for relevant context:\n=== START-RETRIEVED-CONTEXT ===\n', "id:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. 
Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\nTraining Energy Use", "id:llama_3.2.md; content:. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.\n\nLlama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.\n\nModel Release Date: Oct 24, 2024\n\nStatus: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.\n\nLicense: Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).\n\nFeedback: Instructions on how to provide feedback or comments on the model can be found in the Llama Models README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.\n\n## Intended Use\n\nIntended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat and agentic applications like knowledge retrieval and summarization, mobile AI powered writing assistants and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use-cases with limited compute resources.\n\nOut of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.\n\n## Hardware and Software\n\nTraining Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.\n\n**Training Energy Use", '\n=== END-RETRIEVED-CONTEXT ===\n'], memory_bank_ids=['rag_agent_docs'], step_id='e41a178b-182c-444c-8cb6-544979d75a17', step_type='memory_retrieval', turn_id='5b91a548-219f-4805-833f-5535b84abe29', completed_at=None, started_at=None), InferenceStep(inference_model_response=CompletionMessage(content="The release date for Llama 3.1 model is not mentioned in the provided documents. 
However, there is information about Llama 3.2 model's release date, which is October 24, 2024.\n\nIt appears that there is no information about the Llama 3.1 model in the provided documents.", role='assistant', stop_reason='end_of_turn', tool_calls=[]), step_id='dc72b93c-8f17-44e4-b50f-5f272b11327a', step_type='inference', turn_id='5b91a548-219f-4805-833f-5535b84abe29', completed_at=None, started_at=None)], turn_id='5b91a548-219f-4805-833f-5535b84abe29', completed_at=datetime.datetime(2024, 11, 13, 9, 48, 54, 441075))
The name of the llama model released on October 24, 2024, is not explicitly mentioned in the provided documents. However, the document mentions that the model is "Llama 3.2", but it does not indicate if "Llama 3.2" is the name of the specific model released on October 24, 2024, or if it is a version or variant of the model.
It does mention the Model Release Date as Oct 24, 2024, but this refers to the release of Llama 3.2, not the name of the specific model.
To answer your question accurately, I don't know the name of the llama model released on October 24, 2024, as this information is not explicitly mentioned in the provided documents.
The release date for Llama 3.1 model is not mentioned in the provided documents. However, there is information about Llama 3.2 model's release date, which is October 24, 2024.
It appears that there is no information about the Llama 3.1 model in the provided documents.
Expected behavior
It would be great if we could have context retrieval for every user message.
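One possible direction, sketched below purely as a hypothetical (not the current llama-stack behavior or a confirmed fix), is to rebuild the retrieval query on every turn from the most recent user message(s) rather than carrying the first turn's context forward:

```python
# Hypothetical sketch: run retrieval on every turn, using only the latest
# user message(s) as the query. `search` stands in for any memory-bank /
# vector-store query function; it is an assumption, not a llama-stack API.
from typing import Callable, Dict, List

def build_rag_query(history: List[Dict[str, str]], window: int = 1) -> str:
    """Build the retrieval query from the last `window` user messages only."""
    user_msgs = [m["content"] for m in history if m["role"] == "user"]
    return " ".join(user_msgs[-window:])

def run_turn(history: List[Dict[str, str]],
             user_msg: str,
             search: Callable[[str], List[str]]) -> List[str]:
    history.append({"role": "user", "content": user_msg})
    context = search(build_rag_query(history))  # retrieval happens every turn
    # ... attach `context` to the user message and call the model here ...
    return context
```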