diff --git a/README.md b/README.md
index edc29ba..577b568 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,8 @@ For more detailed information, please refer to the [INSTALL.md](INSTALL.md) file
## Examples
+You can find more detailed examples in our [Examples Documentation](docs/EXAMPLES.md).
+
### Example 1: Filter List
Run the **filter list** example to see how AgentM filters a list of items based on whether they are considered healthy snacks:
@@ -92,6 +94,17 @@ Chain of Thought Reasoning: To find the square root of 144 step-by-step, follow
4. **Testing Numbers**:
- Let's start with smaller numbers:
+ - 1 * 1 = 1 (not the answer)
+ - 2 * 2 = 4 (not the answer)
+ - 3 * 3 = 9 (not the answer)
+ - 4 * 4 = 16 (not the answer)
+ - 5 * 5 = 25 (not the answer)
+ - 6 * 6 = 36 (not the answer)
+ - 7 * 7 = 49 (not the answer)
+ - 8 * 8 = 64 (not the answer)
+ - 9 * 9 = 81 (not the answer)
+ - 10 * 10 = 100 (not the answer)
+ - 11 * 11 = 121 (not the answer)
- 12 * 12 = 144 (this is the answer)
5. **Conclusion**: The square root of 144 is 12.
@@ -101,4 +114,4 @@ Thus, the final answer is:
\[
\sqrt{144} = 12
\]
-```
\ No newline at end of file
+```
diff --git a/docs/EXAMPLES.md b/docs/EXAMPLES.md
new file mode 100644
index 0000000..4d4ba34
--- /dev/null
+++ b/docs/EXAMPLES.md
@@ -0,0 +1,197 @@
+# AgentM Examples
+
+This document walks through the AgentM micro-agent examples. For each one, it shows the command to run and a sample of the output, so you can see how the library behaves in practice. The sample outputs were generated by an LLM, so the text you get will likely differ.
+
+For more general information and a quick overview, refer to the [README](../README.md).
+
+## Example 1: Filter List
+
+Run the **filter list** example to see how AgentM filters a list of items based on whether they are considered healthy snacks:
+
+```bash
+python examples/filter_list_example.py
+```
+
+### Sample Output:
+```text
+Original list: ['Apple', 'Chocolate bar', 'Carrot', 'Chips', 'Orange']
+
+Filtered results:
+
+{'explanation': 'The apple is a healthy snack option, as it is low in calories, rich in fiber, and provides essential vitamins such as vitamin C.', 'remove_item': False}
+{'explanation': 'A chocolate bar is generally considered an unhealthy snack because it is high in sugar and saturated fats, which can contribute to weight gain and other health issues if consumed in excess.', 'remove_item': True}
+{'explanation': 'Carrots are a healthy snack option, as they are low in calories, high in fiber, and rich in vitamins and minerals.', 'remove_item': False}
+{'explanation': 'Chips are typically high in unhealthy fats, sodium, and calories, making them a less nutritious snack choice...', 'remove_item': True}
+{'explanation': 'The orange is a healthy snack option, as it is low in calories and high in vitamin C and dietary fiber, making it a nutritious choice.', 'remove_item': False}
+
+Final Filtered List: ['Apple', 'Carrot', 'Orange']
+```
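The final list follows from the per-item results: an item is kept only when its result has `'remove_item': False`. The sketch below shows that last step, assuming the results come back in the same order as the input, as they do in the sample. `apply_filter_decisions` is a hypothetical helper for illustration, not part of AgentM's API.

```python
def apply_filter_decisions(items, decisions):
    """Keep each item whose decision dict says it should not be removed.

    Hypothetical helper: assumes decisions[i] describes items[i].
    """
    if len(items) != len(decisions):
        raise ValueError("expected one decision per item")
    return [item for item, decision in zip(items, decisions) if not decision["remove_item"]]


# remove_item flags copied from the sample output (explanations shortened).
items = ['Apple', 'Chocolate bar', 'Carrot', 'Chips', 'Orange']
decisions = [
    {'explanation': 'Low in calories, rich in fiber.', 'remove_item': False},
    {'explanation': 'High in sugar and saturated fats.', 'remove_item': True},
    {'explanation': 'Low in calories, high in fiber.', 'remove_item': False},
    {'explanation': 'High in unhealthy fats and sodium.', 'remove_item': True},
    {'explanation': 'High in vitamin C and fiber.', 'remove_item': False},
]
print(apply_filter_decisions(items, decisions))  # ['Apple', 'Carrot', 'Orange']
```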
+
+## Example 2: Sort List
+
+Run the **sort list** example to see how AgentM sorts a list of items by asking an LLM to put pairs of items in order:
+
+```bash
+python examples/sort_list_example.py
+```
+
+### Sample Output:
+```text
+2024-09-11 10:46:22,401 - INFO - Sending batch comparison request with prompt: Compare Apple and Orange and return the items in the correct order as 'item1,item2'.
+2024-09-11 10:46:22,729 - INFO - Received response: Apple,Orange
+2024-09-11 10:46:22,730 - INFO - Sending batch comparison request with prompt: Compare Grape and Pineapple and return the items in the correct order as 'item1,item2'.
+2024-09-11 10:46:23,094 - INFO - Received response: Grape,Pineapple
+2024-09-11 10:46:23,094 - INFO - Sending batch comparison request with prompt: Compare Banana and Grape and return the items in the correct order as 'item1,item2'.
+2024-09-11 10:46:23,539 - INFO - Received response: Banana,Grape
+2024-09-11 10:46:23,540 - INFO - Sending batch comparison request with prompt: Compare Apple and Banana and return the items in the correct order as 'item1,item2'.
+Compare Orange and Grape and return the items in the correct order as 'item1,item2'.
+2024-09-11 10:46:24,067 - INFO - Received response: Apple,Banana
+Grape,Orange
+Original list: ['Apple', 'Orange', 'Banana', 'Grape', 'Pineapple']
+Sorted list: ['Apple', 'Banana', 'Orange', 'Grape', 'Pineapple']
+```
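The log suggests that the sort is built from pairwise comparisons, each answered by the LLM as `'item1,item2'`. The sketch below shows one way such a sort can work: a merge sort whose comparison step sends a prompt in the format seen in the log and reads the first item of the reply. This is an assumption about the approach, not AgentM's implementation. It makes one request per comparison rather than batching several prompts into one request as the log shows, and `fake_llm`, `llm_prefers_first`, and `llm_merge_sort` are hypothetical names. `fake_llm` orders names alphabetically so that the code runs offline.

```python
import asyncio


async def fake_llm(prompt):
    # Hypothetical stand-in for a model call: parses prompts shaped like the log
    # ("Compare A and B and return ...") and answers alphabetically as "A,B".
    names = prompt.split("Compare ", 1)[1].split(" and return", 1)[0]
    first, second = names.split(" and ")
    return ",".join(sorted([first, second]))


async def llm_prefers_first(a, b, ask):
    # Ask which of two items comes first; the reply is expected as 'item1,item2'.
    prompt = f"Compare {a} and {b} and return the items in the correct order as 'item1,item2'."
    reply = await ask(prompt)
    return reply.split(",")[0].strip() == a


async def llm_merge_sort(items, ask):
    # Merge sort in which every comparison is delegated to the model.
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    # Sort both halves concurrently, since their comparisons are independent.
    left, right = await asyncio.gather(
        llm_merge_sort(items[:mid], ask), llm_merge_sort(items[mid:], ask)
    )
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if await llm_prefers_first(left[i], right[j], ask):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


fruits = ['Apple', 'Orange', 'Banana', 'Grape', 'Pineapple']
print(asyncio.run(llm_merge_sort(fruits, fake_llm)))
# ['Apple', 'Banana', 'Grape', 'Orange', 'Pineapple']
```

Because the order comes entirely from the model's answers, the result is only as consistent as those answers. A model that contradicts itself across comparisons can produce an order that matches no single criterion.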
+
+## Example 3: Chain of Thought
+
+Run the **chain of thought** example to see how AgentM solves problems using a step-by-step reasoning approach:
+
+```bash
+python examples/chain_of_thought_example.py
+```
+
+### Sample Output:
+```text
+Question: What is the square root of 144?
+Chain of Thought Reasoning: To find the square root of 144 step-by-step, follow these steps:
+
+1. **Understanding Square Roots**: The square root of a number is a value that, when multiplied by itself, gives that number. For example, if x is the square root of y, then x * x = y.
+
+2. **Identifying the Number**: In this case, we need to find the square root of 144.
+
+3. **Finding Factors**: We'll look for a number that, when multiplied by itself, equals 144.
+
+4. **Testing Numbers**:
+ - Let's start with smaller numbers:
+ - 1 * 1 = 1 (not the answer)
+ - 2 * 2 = 4 (not the answer)
+ - 3 * 3 = 9 (not the answer)
+ - 4 * 4 = 16 (not the answer)
+ - 5 * 5 = 25 (not the answer)
+ - 6 * 6 = 36 (not the answer)
+ - 7 * 7 = 49 (not the answer)
+ - 8 * 8 = 64 (not the answer)
+ - 9 * 9 = 81 (not the answer)
+ - 10 * 10 = 100 (not the answer)
+ - 11 * 11 = 121 (not the answer)
+ - 12 * 12 = 144 (this is the answer)
+
+5. **Conclusion**: The square root of 144 is 12.
+
+Thus, the final answer is:
+
+\[
+\sqrt{144} = 12
+\]
+```
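The trial-multiplication step in the reasoning can be checked directly. The sketch below tests candidates in increasing order, as step 4 does, and compares the result against the standard library's `math.isqrt`. It illustrates the arithmetic only, not how AgentM prompts the model. `square_root_by_trial` is a hypothetical name.

```python
import math


def square_root_by_trial(n):
    """Return the integer whose square is n by testing 1, 2, 3, ..., or None if there is none."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    candidate = 1
    while candidate * candidate < n:
        candidate += 1  # e.g. 11 * 11 = 121 is still too small, so try 12
    return candidate if candidate * candidate == n else None


print(square_root_by_trial(144))  # 12
print(square_root_by_trial(150))  # None, since 150 is not a perfect square
print(math.isqrt(144))            # 12, the standard library's integer square root
```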
+
+## Example 4: Binary Classification List
+
+Run the **binary classification list** example to classify each item as true or false against a criterion (here, healthy or unhealthy):
+
+```bash
+python examples/binary_classify_list_example.py
+```
+
+### Sample Output:
+```text
+Classifying item: Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Apple' as true or false.
+Received response for item: Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Apple' as true or false. -> True
+Classifying item: Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Chocolate' as true or false.
+Received response for item: Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Chocolate' as true or false. -> Chocolate can be classified as unhealthy (false) if it contains high levels of sugar and fat, such as most commercially available milk chocolates. However, dark chocolate with a high cocoa content (70% or more) has health benefits and can be considered healthy in moderation.
+
+Without specific context, the classification can vary. Generally:
+
+- If we consider typical chocolate, it would be classified as unhealthy (false).
+- If it is high-quality dark chocolate and consumed in moderation, it could be classified as healthy (true).
+
+For general classification, I would classify 'Chocolate' as unhealthy (false).
+Classifying item: Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Carrot' as true or false.
+Received response for item: Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Carrot' as true or false. -> True
+Original list: ['Apple', 'Chocolate', 'Carrot']
+Binary classified results: [{'item': "Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Apple' as true or false.", 'classification': 'True'}, {'item': "Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Chocolate' as true or false.", 'classification': "Chocolate can be classified as unhealthy (false) if it contains high levels of sugar and fat, such as most commercially available milk chocolates. However, dark chocolate with a high cocoa content (70% or more) has health benefits and can be considered healthy in moderation. \n\nWithout specific context, the classification can vary. Generally:\n\n- If we consider typical chocolate, it would be classified as unhealthy (false).\n- If it is high-quality dark chocolate and consumed in moderation, it could be classified as healthy (true).\n\nFor general classification, I would classify 'Chocolate' as unhealthy (false)."}, {'item': "Based on the following criteria 'Classify each item as either healthy (true) or unhealthy (false)', classify the item 'Carrot' as true or false.", 'classification': 'True'}]
+```
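The `classification` values above are raw model text. Two of them are `'True'`, but the Chocolate entry is a paragraph that mentions both `true` and `false`. If you need actual booleans, one heuristic is to take whichever of the two words appears last, because in this sample the model states its verdict at the end. `response_to_bool` is a hypothetical helper, not part of AgentM. The heuristic can misfire, so treat it as a starting point.

```python
from typing import Optional


def response_to_bool(response: str) -> Optional[bool]:
    """Hypothetical heuristic: the later of 'true'/'false' in the reply wins.

    Returns None when neither word appears. Matching is case-insensitive and
    substring-based, so a word such as 'untrue' would also count as 'true'.
    """
    text = response.lower()
    last_true = text.rfind("true")
    last_false = text.rfind("false")
    if last_true == -1 and last_false == -1:
        return None
    return last_true > last_false


chocolate_reply = (
    "- If it is high-quality dark chocolate and consumed in moderation, "
    "it could be classified as healthy (true).\n\n"
    "For general classification, I would classify 'Chocolate' as unhealthy (false)."
)
print(response_to_bool("True"))           # True
print(response_to_bool(chocolate_reply))  # False
print(response_to_bool("It depends."))    # None
```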
+
+## Example 5: Project List
+
+Run the **project list** example to project each item through a rule (here, describing its vitamin content):
+
+```bash
+python examples/project_list_example.py
+```
+
+### Sample Output:
+
+```text
+Original list: ['Apple', 'Banana', 'Carrot']
+Projected results: [{'item': "Project the following item based on the rule 'Project these items as their vitamin content': Apple.", 'projection': "Apples are known for their vitamin content, particularly vitamin C. A medium-sized apple typically contains about 14% of the daily value of vitamin C. Additionally, apples provide small amounts of other vitamins such as vitamin A and some B vitamins (like B6 and riboflavin), but their overall vitamin content is modest compared to other fruits. \n\nTherefore, based on the rule 'Project these items as their vitamin content,' the projected vitamin content for an apple would primarily highlight its contribution of vitamin C."}, {'item': "Project the following item based on the rule 'Project these items as their vitamin content': Banana.", 'projection': "Bananas are known for their nutrient content, particularly their vitamins. Here's a projection of the vitamin content found in a banana:\n\n- **Vitamin B6**: Bananas are an excellent source of Vitamin B6, which plays a crucial role in metabolism and brain health.\n- **Vitamin C**: They contain a moderate amount of Vitamin C, which is important for immune function and skin health.\n- **Vitamin A**: Present in smaller amounts, Vitamin A supports vision and immune health.\n- **Folate (Vitamin B9)**: Bananas provide some folate, which is essential for cell division and DNA synthesis.\n\nIn summary, bananas are primarily valued for their high Vitamin B6 content, along with contributions of Vitamin C, Vitamin A, and Folate."}, {'item': "Project the following item based on the rule 'Project these items as their vitamin content': Carrot.", 'projection': 'Carrots are known for their high vitamin content, particularly vitamin A. Here’s the projected vitamin content based on that focus:\n\n- **Vitamin A**: Carrots are rich in beta-carotene, which the body converts into vitamin A. This vitamin is important for vision, immune function, and skin health.\n- **Vitamin K1**: Essential for blood clotting and bone health.\n- **Vitamin B6**: Important for metabolism and brain health.\n- **Vitamin C**: Contributes to immune function and skin health.\n\nOverall, carrots can be projected as a nutrient-dense food with a strong emphasis on vitamin A and a good mix of other vitamins.'}]
+```
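Each `item` field in the results holds the full prompt sent for that item rather than the bare item name. The sketch below reproduces that shape. The prompt template is inferred from the sample output, not taken from AgentM's source. `build_projection_prompt`, `project_list`, and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model call. `asyncio.gather` is used because it returns results in the same order as its inputs, so each prompt stays paired with its answer.

```python
import asyncio


def build_projection_prompt(item, rule):
    # Template inferred from the 'item' values in the sample output.
    return f"Project the following item based on the rule '{rule}': {item}."


async def fake_llm(prompt):
    # Hypothetical stand-in for a model call.
    return f"(model answer for: {prompt})"


async def project_list(items, rule, ask):
    prompts = [build_projection_prompt(item, rule) for item in items]
    # gather runs the calls concurrently and keeps results in input order.
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    return [{'item': p, 'projection': a} for p, a in zip(prompts, answers)]


results = asyncio.run(
    project_list(['Apple', 'Banana', 'Carrot'], 'Project these items as their vitamin content', fake_llm)
)
print(results[0]['item'])
# Project the following item based on the rule 'Project these items as their vitamin content': Apple.
```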
+
+## Example 6: Generate Object
+
+Run the **generate object** example to create an object based on a description:
+
+```bash
+python examples/generate_object_example.py
+```
+
+### Sample Output:
+
+```text
+Object description: A machine that can sort fruits.
+Generated object: {'object_description': 'A machine that can sort fruits.', 'generated_object': '**High-Level Design of a Fruit Sorting Machine**\n\n**Machine Overview:**\nThe Fruit Sorting Machine is an automated system designed to efficiently categorize and sort various types of fruit based on size, color, weight, and quality. The machine aims to streamline the sorting process, reduce manual labor, and enhance consistency in fruit processing.\n\n**Components:**\n\n1. **Fruit Input Hopper:**\n - Description: A large funnel-shaped container where fruits are initially loaded.\n - Features: Adjustable height, vibration mechanism to prevent jamming, and a safety cover.\n\n2. **Conveyor Belt System:**\n - Description: A series of conveyor belts that transport fruits from the input hopper to the sorting area.\n - Features: Adjustable speed settings, rubberized surface for grip, and incline/decline options for gravity-assisted sorting.\n\n3. **Image Recognition System:**\n - Description: A series of high-resolution cameras positioned above the conveyor belt.\n - Function: Capture real-time images of the fruits as they pass by for size, color, and quality analysis using advanced image processing software.\n\n4. **Weight Measurement Station:**\n - Description: A designated area on the conveyor belt equipped with load cells.\n - Function: Measure the weight of each fruit to sort them according to predefined weight categories.\n\n5. **Sorting Mechanism:**\n - Description: A combination of pneumatic actuators and mechanical arms.\n - Function: Based on data from image recognition and weight measurement, fruits are directed to specific sorting paths.\n - Features: Quick-release mechanisms for rapid sorting and adjustable sorting paths for various fruit types.\n\n6. **Output Bins:**\n - Description: Several storage bins placed at the end of the sorting path to collect sorted fruits.\n - Function: Each bin is designated for a specific category based on size, weight, and quality.\n - Features: Labeling system for easy identification and removable for cleaning and maintenance.\n\n7. **Control Panel:**\n - Description: A user-friendly interface for operators to set sorting parameters, monitor performance, and troubleshoot.\n - Features: Touchscreen display, emergency stop button, and connectivity for remote monitoring.\n\n8. **Power Supply System:**\n - Description: A robust electrical system to power all components of the machine.\n - Features: Energy-efficient motors and backup battery support to ensure continuous operation.\n\n9. **Support Frame:**\n - Description: The structural framework supporting all components.\n - Features: Adjustable height settings, made of durable materials (stainless steel/aluminum) to resist wear and corrosion.\n\n**Workflow Summary:**\n1. Fruits are loaded into the input hopper.\n2. The conveyor belt transports the fruits to the image recognition system, where they are scanned for size and color.\n3. Fruits then pass over the weight measurement station to determine their weight.\n4. The sorting mechanism utilizes data from the recognition and weighing systems to direct each fruit into the appropriate output bin.\n5. Sorted fruits collected in designated bins can then be packaged or processed further.\n\n**Additional Considerations:**\n- Potential integration with a data management system to log sorting data for future analysis.\n- Implementation of a cleaning system to ensure hygiene, especially for perishable fruits.\n- Safety features to protect operators, such as shielded moving parts and emergency stops.\n\nThis high-level design outlines the fundamental components and workflow of a fruit sorting machine that works efficiently, accurately, and safely, fulfilling the needs of agricultural and food processing industries.'}
+```
+
+## Example 7: Map List
+
+Run the **map list** example to apply a transformation to each item in a list (here, converting it to uppercase):
+
+```bash
+python examples/map_list_example.py
+```
+
+### Sample Output:
+
+```text
+Original list: ['Apple', 'Banana', 'Carrot']
+Transformed list: ['APPLE', 'BANANA', 'CARROT']
+```
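When mapping a longer list, you may want to cap how many model requests run at once, for example to stay under a provider's rate limit. The sketch below shows a concurrent map with such a cap, built on `asyncio.Semaphore`. It does not describe how AgentM's map agent is implemented. `map_list_limited` and `fake_transform` are hypothetical names, and `fake_transform` uppercases its input so that the output matches the sample.

```python
import asyncio


async def fake_transform(item):
    # Hypothetical stand-in for a per-item model call; uppercases like the sample output.
    await asyncio.sleep(0.01)
    return item.upper()


async def map_list_limited(items, transform, max_concurrency=2):
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:  # at most max_concurrency calls in flight
            return await transform(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))


print(asyncio.run(map_list_limited(['Apple', 'Banana', 'Carrot'], fake_transform)))
# ['APPLE', 'BANANA', 'CARROT']
```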
+
+## Example 8: Reduce List
+
+Run the **reduce list** example to reduce each item according to a goal (here, a single word representing its nutritional value):
+
+```bash
+python examples/reduce_list_example.py
+```
+
+### Sample Output:
+
+```text
+Original list: ['Banana', 'Apple', 'Carrot']
+Reduced results: [{'item': "Reduce the item 'Banana' to achieve the goal: Reduce these items to a single word representing their nutritional value.", 'reduced_item': 'Fruit'}, {'item': "Reduce the item 'Apple' to achieve the goal: Reduce these items to a single word representing their nutritional value.", 'reduced_item': 'Fruit'}, {'item': "Reduce the item 'Carrot' to achieve the goal: Reduce these items to a single word representing their nutritional value.", 'reduced_item': 'Beta-carotene'}]
+```
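Because each `item` field holds the full prompt, the reduced values are easiest to use once they are paired back with the original items. The sketch below groups the input items by the value the model reduced them to. The values are copied from the sample output. Grouping makes it easy to spot when the model answers at different levels of detail, as with `'Fruit'` versus `'Beta-carotene'` here. `group_by_reduction` is a hypothetical helper, and it assumes the results are in the same order as the input list.

```python
from collections import defaultdict


def group_by_reduction(items, reduced_results):
    """Group original items by the value the model reduced them to.

    Hypothetical helper: assumes reduced_results[i] belongs to items[i].
    """
    groups = defaultdict(list)
    for item, result in zip(items, reduced_results):
        groups[result['reduced_item']].append(item)
    return dict(groups)


items = ['Banana', 'Apple', 'Carrot']
# reduced_item values copied from the sample output (prompt text omitted).
reduced_results = [
    {'reduced_item': 'Fruit'},
    {'reduced_item': 'Fruit'},
    {'reduced_item': 'Beta-carotene'},
]
print(group_by_reduction(items, reduced_results))
# {'Fruit': ['Banana', 'Apple'], 'Beta-carotene': ['Carrot']}
```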
+
+## Example 9: Summarize List
+
+Run the **summarize list** example to generate a concise summary of each item in a list:
+
+```bash
+python examples/summarize_list_example.py
+```
+
+### Sample Output:
+
+```text
+Original list: ['The quick brown fox jumps over the lazy dog.', 'Python is a popular programming language.']
+Summarized results: [{'item': 'Summarize the following: The quick brown fox jumps over the lazy dog..', 'summary': 'A swift brown fox leaps over a sluggish dog.'}, {'item': 'Summarize the following: Python is a popular programming language..', 'summary': 'Python is a widely used programming language known for its simplicity and versatility.'}]
+```
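In the sample output, each `item` ends with `..` because a period is appended to text that already ends with one. This suggests the prompt comes from a template like `Summarize the following: {text}.`, which is inferred from the output and not confirmed from AgentM's source. The sketch below shows a hypothetical prompt builder that avoids the doubled period by adding one only when the text lacks closing punctuation. `build_summary_prompt` is an illustrative name.

```python
def build_summary_prompt(text):
    # Add a closing period only when the text does not already end with punctuation.
    text = text.strip()
    suffix = "" if text.endswith((".", "!", "?")) else "."
    return f"Summarize the following: {text}{suffix}"


print(build_summary_prompt('Python is a popular programming language.'))
# Summarize the following: Python is a popular programming language.
print(build_summary_prompt('Hello world'))
# Summarize the following: Hello world.
```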
\ No newline at end of file
diff --git a/examples/binary_classify_list_example.py b/examples/binary_classify_list_example.py
new file mode 100644
index 0000000..8bd5ff9
--- /dev/null
+++ b/examples/binary_classify_list_example.py
@@ -0,0 +1,14 @@
+import asyncio
+from core.binary_classify_list_agent import BinaryClassifyListAgent
+
+async def run_binary_classify_list_example():
+ items_to_classify = ['Apple', 'Chocolate', 'Carrot']
+ criteria = 'Classify each item as either healthy (true) or unhealthy (false)'
+ agent = BinaryClassifyListAgent(list_to_classify=items_to_classify, criteria=criteria)
+ classified_items = await agent.classify_list()
+
+ print("Original list:", items_to_classify)
+ print("Binary classified results:", classified_items)
+
+if __name__ == "__main__":
+ asyncio.run(run_binary_classify_list_example())
\ No newline at end of file
diff --git a/examples/classify_list_example.py b/examples/classify_list_example.py
new file mode 100644
index 0000000..8505795
--- /dev/null
+++ b/examples/classify_list_example.py
@@ -0,0 +1,14 @@
+import asyncio
+from core.classify_list_agent import ClassifyListAgent
+
+async def run_classify_list_example():
+ items_to_classify = ['Apple', 'Chocolate', 'Carrot']
+ classification_criteria = 'Classify each item as healthy or unhealthy snack'
+ agent = ClassifyListAgent(list_to_classify=items_to_classify, classification_criteria=classification_criteria)
+ classified_items = await agent.classify_list()
+
+ print("Original list:", items_to_classify)
+ print("Classified results:", classified_items)
+
+if __name__ == "__main__":
+ asyncio.run(run_classify_list_example())
\ No newline at end of file
diff --git a/examples/data/news-feed.xml b/examples/data/news-feed.xml
new file mode 100644
index 0000000..ce5b051
--- /dev/null
+++ b/examples/data/news-feed.xml
@@ -0,0 +1,359 @@
+
+
+
+ cs.AI updates on arXiv.org
+ http://rss.arxiv.org/rss/cs.AI
+ cs.AI updates on the arXiv.org e-print archive.
+
+ http://www.rssboard.org/rss-specification
+ en-us
+ Mon, 02 Sep 2024 04:00:05 +0000
+ rss-help@arxiv.org
+ Mon, 02 Sep 2024 00:00:00 -0400
+
+ Sunday
+ Saturday
+
+ -
+ Beyond Preferences in AI Alignment
+ https://arxiv.org/abs/2408.16984
+ arXiv:2408.16984v1 Announce Type: new
+Abstract: The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values.
+ oai:arXiv.org:2408.16984v1
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton
+
+ -
+ Strategic Arms with Side Communication Prevail Over Low-Regret MAB Algorithms
+ https://arxiv.org/abs/2408.17101
+ arXiv:2408.17101v1 Announce Type: new
+Abstract: In the strategic multi-armed bandit setting, when arms possess perfect information about the player's behavior, they can establish an equilibrium where: 1. they retain almost all of their value, 2. they leave the player with a substantial (linear) regret. This study illustrates that, even if complete information is not publicly available to all arms but is shared among them, it is possible to achieve a similar equilibrium. The primary challenge lies in designing a communication protocol that incentivizes the arms to communicate truthfully.
+ oai:arXiv.org:2408.17101v1
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ 10.1109/ICASSP48485.2024.10446895
+ ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp.7435-7439
+ Ahmed Ben Yahmed (CREST, ENSAE Paris), Cl\'ement Calauz\`enes (CREST, ENSAE Paris), Vianney Perchet (CREST, ENSAE Paris)
+
+ -
+ Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis
+ https://arxiv.org/abs/2408.17180
+ arXiv:2408.17180v1 Announce Type: new
+Abstract: How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions-such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games-is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique to enhance codebook utilization in discrete representation with a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including Age of Empires II, Hearthstone, Brawl Stars, and League of Legends. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while also offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.
+ oai:arXiv.org:2408.17180v1
+ cs.AI
+ cs.GT
+ cs.IR
+ cs.LG
+ cs.MA
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu
+
+ -
+ Reasoning with maximal consistent signatures
+ https://arxiv.org/abs/2408.17190
+ arXiv:2408.17190v1 Announce Type: new
+Abstract: We analyse a specific instance of the general approach of reasoning based on forgetting by Lang and Marquis. More precisely, we discuss an approach for reasoning with inconsistent information using maximal consistent subsignatures, where a maximal consistent subsignature is a maximal set of propositions such that forgetting the remaining propositions restores consistency. We analyse maximal consistent subsignatures and the corresponding minimal inconsistent subsignatures in-depth and show, among others, that the hitting set duality applies for them as well. We further analyse inference relations based on maximal consistent subsignatures wrt. rationality postulates from non-monotonic reasoning and computational complexity. We also consider the relationship of our approach with inconsistency measurement and paraconsistent reasoning.
+ oai:arXiv.org:2408.17190v1
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://creativecommons.org/licenses/by-nc-sa/4.0/
+ Matthias Thimm, Jandson Santos Ribeiro Santos
+
+ -
+ Towards Symbolic XAI -- Explanation Through Human Understandable Logical Relationships Between Features
+ https://arxiv.org/abs/2408.17198
+ arXiv:2408.17198v1 Announce Type: new
+Abstract: Explainable Artificial Intelligence (XAI) plays a crucial role in fostering transparency and trust in AI systems, where traditional XAI approaches typically offer one level of abstraction for explanations, often in the form of heatmaps highlighting single or multiple input features. However, we ask whether abstract reasoning or problem-solving strategies of a model may also be relevant, as these align more closely with how humans approach solutions to problems. We propose a framework, called Symbolic XAI, that attributes relevance to symbolic queries expressing logical relationships between input features, thereby capturing the abstract reasoning behind a model's predictions. The methodology is built upon a simple yet general multi-order decomposition of model predictions. This decomposition can be specified using higher-order propagation-based relevance methods, such as GNN-LRP, or perturbation-based explanation methods commonly used in XAI. The effectiveness of our framework is demonstrated in the domains of natural language processing (NLP), vision, and quantum chemistry (QC), where abstract symbolic domain knowledge is abundant and of significant interest to users. The Symbolic XAI framework provides an understanding of the model's decision-making process that is both flexible for customization by the user and human-readable through logical formulas.
+ oai:arXiv.org:2408.17198v1
+ cs.AI
+ cs.LG
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Thomas Schnake, Farnoush Rezaei Jafaria, Jonas Lederer, Ping Xiong, Shinichi Nakajima, Stefan Gugler, Gr\'egoire Montavon, Klaus-Robert M\"uller
+
+ -
+ A methodological framework for Resilience as a Service (RaaS) in multimodal urban transportation networks
+ https://arxiv.org/abs/2408.17233
+ arXiv:2408.17233v1 Announce Type: new
+Abstract: Public transportation systems are experiencing an increase in commuter traffic. This increase underscores the need for resilience strategies to manage unexpected service disruptions, ensuring rapid and effective responses that minimize adverse effects on stakeholders and enhance the system's ability to maintain essential functions and recover quickly. This study aims to explore the management of public transport disruptions through resilience as a service (RaaS) strategies, developing an optimization model to effectively allocate resources and minimize the cost for operators and passengers. The proposed model includes multiple transportation options, such as buses, taxis, and automated vans, and evaluates them as bridging alternatives to rail-disrupted services based on factors such as their availability, capacity, speed, and proximity to the disrupted station. This ensures that the most suitable vehicles are deployed to maintain service continuity. Applied to a case study in the Ile de France region, Paris and suburbs, complemented by a microscopic simulation, the model is compared to existing solutions such as bus bridging and reserve fleets. The results highlight the model's performance in minimizing costs and enhancing stakeholder satisfaction, optimizing transport management during disruptions.
+ oai:arXiv.org:2408.17233v1
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://creativecommons.org/licenses/by-nc-nd/4.0/
+ Sara Jaber (Univ. Gustave Eiffel, COSYS, GRETTIA, Paris, France,VEDECOM, mobiLAB, Department of new solutions of mobility services,shared energy, Versailles, France), Mostafa Ameli (Univ. Gustave Eiffel, COSYS, GRETTIA, Paris, France), S. M. Hassan Mahdavi (VEDECOM, mobiLAB, Department of new solutions of mobility services,shared energy, Versailles, France), Neila Bhouri (Univ. Gustave Eiffel, COSYS, GRETTIA, Paris, France)
+
+ -
+ Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts
+ https://arxiv.org/abs/2408.17280
+ arXiv:2408.17280v1 Announce Type: new
+Abstract: We present a toolkit for creating low-cost Mixture-of-Domain-Experts (MOE) from trained models. The toolkit can be used for creating a mixture from models or from adapters. We perform extensive tests and offer guidance on defining the architecture of the resulting MOE using the toolkit. A public repository is available.
+ oai:arXiv.org:2408.17280v1
+ cs.AI
+ cs.CL
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://creativecommons.org/licenses/by-sa/4.0/
+ Rhui Dih Lee, Laura Wynter, Raghu Kiran Ganti
+
+ -
+ Bridging Domain Knowledge and Process Discovery Using Large Language Models
+ https://arxiv.org/abs/2408.17316
+ arXiv:2408.17316v1 Announce Type: new
+Abstract: Discovering good process models is essential for different process analysis tasks such as conformance checking and process improvements. Automated process discovery methods often overlook valuable domain knowledge. This knowledge, including insights from domain experts and detailed process documentation, remains largely untapped during process discovery. This paper leverages Large Language Models (LLMs) to integrate such knowledge directly into process discovery. We use rules derived from LLMs to guide model construction, ensuring alignment with both domain knowledge and actual process executions. By integrating LLMs, we create a bridge between process knowledge expressed in natural language and the discovery of robust process models, advancing process discovery methodologies significantly. To showcase the usability of our framework, we conducted a case study with the UWV employee insurance agency, demonstrating its practical benefits and effectiveness.
+ oai:arXiv.org:2408.17316v1
+ cs.AI
+ cs.CL
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://creativecommons.org/licenses/by/4.0/
+ Ali Norouzifar, Humam Kourani, Marcus Dees, Wil van der Aalst
+
+ -
+ Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
+ https://arxiv.org/abs/2408.17380
+ arXiv:2408.17380v1 Announce Type: new
+Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency compared to model-free RL by utilizing a virtual environment model. However, it is challenging to obtain sufficiently accurate representations of the environmental dynamics due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework aimed at enhancing learning efficiency by infusing established expert knowledge into the learning process and avoiding the issue of beginning from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to CAV trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flow. Experimental results demonstrate that our proposed approach enables the CAV agent to achieve superior performance in trajectory control compared to the baseline agents in terms of sample efficiency, traffic flow smoothness and traffic mobility. The source code and supplementary materials are available at https://github.com/zihaosheng/traffic-expertise-RL/.
+ oai:arXiv.org:2408.17380v1
+ cs.AI
+ cs.LG
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Zihao Sheng, Zilin Huang, Sikai Chen
+
+ -
+ Exploring the Effect of Explanation Content and Format on User Comprehension and Trust
+ https://arxiv.org/abs/2408.17401
+ arXiv:2408.17401v1 Announce Type: new
+Abstract: In recent years, various methods have been introduced for explaining the outputs of "black-box" AI models. However, it is not well understood whether users actually comprehend and trust these explanations. In this paper, we focus on explanations for a regression tool for assessing cancer risk and examine the effect of the explanations' content and format on the user-centric metrics of comprehension and trust. Regarding content, we experiment with two explanation methods: the popular SHAP, based on game-theoretic notions and thus potentially complex for everyday users to comprehend, and occlusion-1, based on feature occlusion which may be more comprehensible. Regarding format, we present SHAP explanations as charts (SC), as is conventional, and occlusion-1 explanations as charts (OC) as well as text (OT), to which their simpler nature also lends itself. The experiments amount to user studies questioning participants, with two different levels of expertise (the general population and those with some medical training), on their subjective and objective comprehension of and trust in explanations for the outputs of the regression tool. In both studies we found a clear preference in terms of subjective comprehension and trust for occlusion-1 over SHAP explanations in general, when comparing based on content. However, direct comparisons of explanations when controlling for format only revealed evidence for OT over SC explanations in most cases, suggesting that the dominance of occlusion-1 over SHAP explanations may be driven by a preference for text over charts as explanations. Finally, we found no evidence of a difference between the explanation types in terms of objective comprehension. Thus overall, the choice of the content and format of explanations needs careful attention, since in some contexts format, rather than content, may play the critical role in improving user experience.
+ oai:arXiv.org:2408.17401v1
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ new
+ http://creativecommons.org/licenses/by/4.0/
+ Antonio Rago, Bence Palfi, Purin Sukpanichnant, Hannibal Nabli, Kavyesh Vivek, Olga Kostopoulou, James Kinross, Francesca Toni
+
+ -
+ An Effective Information Theoretic Framework for Channel Pruning
+ https://arxiv.org/abs/2408.16772
+ arXiv:2408.16772v1 Announce Type: cross
+Abstract: Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms still leave two problems unsolved: how to assign layer-wise pruning ratios properly, and how to discard the least important channels with a convincing criterion. In this paper, we present a novel channel pruning approach via information theory and interpretability of neural networks. Specifically, we regard information entropy as the expected amount of information for convolutional layers. In addition, if we regard a matrix as a system of linear equations, a higher-rank matrix indicates that there exist more solutions to it, which implies more uncertainty. From the point of view of information theory, the rank can also describe the amount of information. In a neural network, considering the rank and entropy as two information indicators of convolutional layers, we propose a fusion function to reach a compromise between them, where the fusion results are defined as "information concentration". When pre-defining layer-wise pruning ratios, we employ the information concentration as a reference instead of heuristic and engineering tuning to provide a more interpretable solution. Moreover, we leverage Shapley values, which are a potent tool in the interpretability of neural networks, to evaluate the channel contributions and discard the least important channels for model compression while maintaining its performance. Extensive experiments demonstrate the effectiveness and promising performance of our method. For example, our method improves the accuracy by 0.21% while reducing FLOPs by 45.5% and removing 40.3% of parameters for ResNet-56 on CIFAR-10. Moreover, our method incurs Top-1/Top-5 accuracy losses of 0.43%/0.11% while reducing FLOPs by 41.6% and removing 35.0% of parameters for ResNet-50 on ImageNet.
+ oai:arXiv.org:2408.16772v1
+ cs.IT
+ cs.AI
+ cs.LG
+ math.IT
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ 10.1109/TNNLS.2024.3365194
+ Yihao Chen, Zefang Wang
+
+ -
+ Online Behavior Modification for Expressive User Control of RL-Trained Robots
+ https://arxiv.org/abs/2408.16776
+ arXiv:2408.16776v1 Announce Type: cross
+Abstract: Reinforcement Learning (RL) is an effective method for robots to learn tasks. However, in typical RL, end-users have little to no control over how the robot does the task after the robot has been deployed. To address this, we introduce the idea of online behavior modification, a paradigm in which users have control over behavior features of a robot in real time as it autonomously completes a task using an RL-trained policy. To show the value of this user-centered formulation for human-robot interaction, we present a behavior diversity based algorithm, Adjustable Control Of RL Dynamics (ACORD), and demonstrate its applicability to online behavior modification in simulation and a user study. In the study (n=23) users adjust the style of paintings as a robot traces a shape autonomously. We compare ACORD to RL and Shared Autonomy (SA), and show ACORD affords user-preferred levels of control and expression, comparable to SA, but with the potential for autonomous execution and robustness of RL.
+ oai:arXiv.org:2408.16776v1
+ cs.RO
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by-nc-nd/4.0/
+ 10.1145/3610977.3634947
+ Isaac Sheidlower, Mavis Murdock, Emma Bethel, Reuben M. Aronson, Elaine Schaertl Short
+
+ -
+ Inductive Learning of Logical Theories with LLMs: A Complexity-graded Analysis
+ https://arxiv.org/abs/2408.16779
+ arXiv:2408.16779v1 Announce Type: cross
+Abstract: This work presents a novel systematic methodology to analyse the capabilities and limitations of Large Language Models (LLMs) in logic theory induction, with feedback from a formal inference engine. The analysis is complexity-graded w.r.t. rule dependency structure, allowing the impact of specific inference challenges on LLM performance to be quantified. Integrating LLMs with formal methods is a promising frontier in the Natural Language Processing field, as an important avenue for improving model inference control and explainability. In particular, inductive learning over complex sets of facts and rules poses unique challenges for current autoregressive models, as they lack explicit symbolic grounding. While they can be complemented by formal systems, the properties delivered by LLMs regarding inductive learning are not well understood or quantified. Empirical results indicate that the largest LLMs can achieve competitive results against a SOTA Inductive Logic Programming (ILP) system baseline, but also that tracking long predicate relationship chains is a more difficult obstacle than theory complexity for the LLMs.
+ oai:arXiv.org:2408.16779v1
+ cs.CL
+ cs.AI
+ cs.LO
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by-nc-sa/4.0/
+ João Pedro Gandarela, Danilo S. Carvalho, André Freitas
+
+ -
+ $EvoAl^{2048}$
+ https://arxiv.org/abs/2408.16780
+ arXiv:2408.16780v1 Announce Type: cross
+Abstract: As AI solutions enter safety-critical products, the explainability and interpretability of solutions generated by AI products become increasingly important. In the long term, such explanations are the key to gaining users' acceptance of AI-based systems' decisions. We report on applying a model-driven-based optimisation to search for an interpretable and explainable policy that solves the game 2048. This paper describes a solution to the GECCO'24 Interpretable Control Competition using the open-source software EvoAl. We aimed to develop an approach for creating interpretable policies that are easy to adapt to new ideas.
+ oai:arXiv.org:2408.16780v1
+ cs.NE
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by-sa/4.0/
+ Bernhard J. Berger (University of Rostock, Software Engineering Chair Rostock, Germany, Hamburg University of Technology, Institute of Embedded Systems, Germany), Christina Plump (DFKI - Cyber-Physical Systems Bremen, Germany), Rolf Drechsler (University of Bremen, Departments of Mathematics and Computer Science, DFKI - Cyber-Physical Systems Bremen, Germany)
+
+ -
+ Uncertainty-aware segmentation for rainfall prediction post processing
+ https://arxiv.org/abs/2408.16792
+ arXiv:2408.16792v1 Announce Type: cross
+Abstract: Accurate precipitation forecasts are crucial for applications such as flood management, agricultural planning, water resource allocation, and weather warnings. Despite advances in numerical weather prediction (NWP) models, they still exhibit significant biases and uncertainties, especially at high spatial and temporal resolutions. To address these limitations, we explore uncertainty-aware deep learning models for post-processing daily cumulative quantitative precipitation forecasts to obtain forecast uncertainties that lead to a better trade-off between accuracy and reliability. Our study compares different state-of-the-art models, and we propose a variant of the well-known SDE-Net, called SDE U-Net, tailored to segmentation problems like ours. We evaluate its performance for both typical and intense precipitation events.
+ Our results show that all deep learning models significantly outperform the average baseline NWP solution, with our implementation of the SDE U-Net showing the best trade-off between accuracy and reliability. Integrating these models, which account for uncertainty, into operational forecasting systems can improve decision-making and preparedness for weather-related events.
+ oai:arXiv.org:2408.16792v1
+ physics.ao-ph
+ cs.AI
+ cs.LG
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by/4.0/
+ Simone Monaco, Luca Monaco, Daniele Apiletti
+
+ -
+ Generative AI in Ship Design
+ https://arxiv.org/abs/2408.16798
+ arXiv:2408.16798v1 Announce Type: cross
+Abstract: The process of ship design is intricate, heavily influenced by the hull form which accounts for approximately 70% of the total cost. Traditional methods rely on human-driven iterative processes based on naval architecture principles and engineering analysis. In contrast, generative AI presents a novel approach, utilizing computational algorithms rooted in machine learning and artificial intelligence to optimize ship hull design. This report outlines the systematic creation of a generative AI for this purpose, involving steps such as dataset collection, model architecture selection, training, and validation. Utilizing the "SHIP-D" dataset, consisting of 30,000 hull forms, the report adopts the Gaussian Mixture Model (GMM) as the generative model architecture. GMMs offer a statistical framework to analyze data distribution, crucial for generating innovative ship designs efficiently. Overall, this approach holds promise in revolutionizing ship design by exploring a broader design space and integrating multidisciplinary optimization objectives effectively.
+ oai:arXiv.org:2408.16798v1
+ cs.LG
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Sahil Thakur, Navneet V Saxena, Prof Sitikantha Roy
+
+ -
+ HLogformer: A Hierarchical Transformer for Representing Log Data
+ https://arxiv.org/abs/2408.16803
+ arXiv:2408.16803v1 Announce Type: cross
+Abstract: Transformers have gained widespread acclaim for their versatility in handling diverse data structures, yet their application to log data remains underexplored. Log data, characterized by its hierarchical, dictionary-like structure, poses unique challenges when processed using conventional transformer models. Traditional methods often rely on manually crafted templates for parsing logs, a process that is labor-intensive and lacks generalizability. Additionally, the linear treatment of log sequences by standard transformers neglects the rich, nested relationships within log entries, leading to suboptimal representations and excessive memory usage.
+ To address these issues, we introduce HLogformer, a novel hierarchical transformer framework specifically designed for log data. HLogformer leverages the hierarchical structure of log entries to significantly reduce memory costs and enhance representation learning. Unlike traditional models that treat log data as flat sequences, our framework processes log entries in a manner that respects their inherent hierarchical organization. This approach ensures comprehensive encoding of both fine-grained details and broader contextual relationships.
+ Our contributions are threefold: First, HLogformer is the first framework to design a dynamic hierarchical transformer tailored for dictionary-like log data. Second, it dramatically reduces memory costs associated with processing extensive log sequences. Third, comprehensive experiments demonstrate that HLogformer more effectively encodes hierarchical contextual information, proving to be highly effective for downstream tasks such as synthetic anomaly detection and product recommendation.
+ oai:arXiv.org:2408.16803v1
+ cs.LG
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Zhichao Hou, Mina Ghashami, Mikhail Kuznetsov, MohamadAli Torkamani
+
+ -
+ Physics-Informed Neural Networks and Extensions
+ https://arxiv.org/abs/2408.16806
+ arXiv:2408.16806v1 Announce Type: cross
+Abstract: In this paper, we review the new method of Physics-Informed Neural Networks (PINNs), which has become the main pillar of scientific machine learning; we present recent practical extensions and provide a specific example of data-driven discovery of governing differential equations.
+ oai:arXiv.org:2408.16806v1
+ cs.LG
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by/4.0/
+ Maziar Raissi, Paris Perdikaris, Nazanin Ahmadi, George Em Karniadakis
+
+ -
+ GSTAM: Efficient Graph Distillation with Structural Attention-Matching
+ https://arxiv.org/abs/2408.16871
+ arXiv:2408.16871v1 Announce Type: cross
+Abstract: Graph distillation has emerged as a solution for reducing large graph datasets to smaller, more manageable, and informative ones. Existing methods primarily target node classification, involve computationally intensive processes, and fail to capture the true distribution of the full graph dataset. To address these issues, we introduce Graph Distillation with Structural Attention Matching (GSTAM), a novel method for condensing graph classification datasets. GSTAM leverages the attention maps of GNNs to distill structural information from the original dataset into synthetic graphs. The structural attention-matching mechanism exploits the areas of the input graph that GNNs prioritize for classification, effectively distilling such information into the synthetic graphs and improving overall distillation performance. Comprehensive experiments demonstrate GSTAM's superiority over existing methods, achieving 0.45% to 6.5% better performance in extreme condensation ratios, highlighting its potential use in advancing distillation for graph classification tasks (Code available at https://github.com/arashrasti96/GSTAM).
+ oai:arXiv.org:2408.16871v1
+ cs.LG
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by-nc-sa/4.0/
+ Arash Rasti-Meymandi, Ahmad Sajedi, Zhaopan Xu, Konstantinos N. Plataniotis
+
+ -
+ Analyzing Inference Privacy Risks Through Gradients in Machine Learning
+ https://arxiv.org/abs/2408.16913
+ arXiv:2408.16913v1 Announce Type: cross
+Abstract: In distributed learning settings, models are iteratively updated with shared gradients computed from potentially sensitive user data. While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients. We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures. We investigate how different uncertainties of the adversary affect their inferential power via extensive experiments on five datasets across various data modalities. Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We further evaluate five types of defenses, namely, gradient pruning, signed gradient descent, adversarial perturbations, variational information bottleneck, and differential privacy, under both static and adaptive adversary settings. We provide an information-theoretic view for analyzing the effectiveness of these defenses against inference from gradients. Finally, we introduce a method for auditing attribute inference privacy, improving the empirical estimation of worst-case privacy through crafting adversarial canary records.
+ oai:arXiv.org:2408.16913v1
+ cs.LG
+ cs.AI
+ cs.CR
+ stat.ML
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Bradley Malin, Ye Wang
+
+ -
+ ACE-2005-PT: Corpus for Event Extraction in Portuguese
+ https://arxiv.org/abs/2408.16928
+ arXiv:2408.16928v1 Announce Type: cross
+Abstract: Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To achieve this, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by a linguist expert. This subset was then compared against our pipeline results which achieved exact and relaxed match scores of 70.55% and 87.55% respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.
+ oai:arXiv.org:2408.16928v1
+ cs.CL
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by/4.0/
+ 10.1145/3626772.3657872
+ SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2024)
+ Luís Filipe Cunha, Purificação Silvano, Ricardo Campos, Alípio Jorge
+
+ -
+ Event Extraction for Portuguese: A QA-driven Approach using ACE-2005
+ https://arxiv.org/abs/2408.16932
+ arXiv:2408.16932v1 Announce Type: cross
+Abstract: Event extraction is an Information Retrieval task that commonly consists of identifying the central word for the event (trigger) and the event's arguments. This task has been extensively studied for English but lags behind for Portuguese, partly due to the lack of task-specific annotated corpora. This paper proposes a framework in which two separate BERT-based models were fine-tuned to identify and classify events in Portuguese documents. We decompose this task into two sub-tasks. Firstly, we use a token classification model to detect event triggers. To extract event arguments, we train a Question Answering model that queries the triggers about their corresponding event argument roles. Given the lack of event-annotated corpora in Portuguese, we translated the original version of the ACE-2005 dataset (a reference in the field) into Portuguese, producing a new corpus for Portuguese event extraction. To accomplish this, we developed an automatic translation pipeline. Our framework obtains F1 scores of 64.4 for trigger classification and 46.7 for argument classification, setting a new state-of-the-art reference for these tasks in Portuguese.
+ oai:arXiv.org:2408.16932v1
+ cs.CL
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by/4.0/
+ 10.1007/9
+ Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science, vol 14115. Springer, Cham
+ Luís Filipe Cunha, Ricardo Campos, Alípio Jorge
+
+ -
+ A longitudinal sentiment analysis of Sinophobia during COVID-19 using large language models
+ https://arxiv.org/abs/2408.16942
+ arXiv:2408.16942v1 Announce Type: cross
+Abstract: The COVID-19 pandemic has exacerbated xenophobia, particularly Sinophobia, leading to widespread discrimination against individuals of Chinese descent. Large language models (LLMs) are pre-trained deep learning models used for natural language processing (NLP) tasks. The ability of LLMs to understand and generate human-like text makes them particularly useful for analysing social media data to detect and evaluate sentiments. We present a sentiment analysis framework utilising LLMs for longitudinal sentiment analysis of the Sinophobic sentiments expressed in X (Twitter) during the COVID-19 pandemic. The results show a significant correlation between the spikes in Sinophobic tweets, Sinophobic sentiments and surges in COVID-19 cases, revealing that the evolution of the pandemic influenced public sentiment and the prevalence of Sinophobic discourse. Furthermore, the sentiment analysis revealed a predominant presence of negative sentiments, such as annoyance and denial, which underscores the impact of political narratives and misinformation shaping public opinion. The lack of empathetic sentiment which was present in previous studies related to COVID-19 highlights the way the political narratives in media viewed the pandemic and how it blamed the Chinese community. Our study highlights the importance of transparent communication in mitigating xenophobic sentiments during global crises.
+ oai:arXiv.org:2408.16942v1
+ cs.CL
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by/4.0/
+ Chen Wang, Rohitash Chandra
+
+ -
+ Different Victims, Same Layout: Email Visual Similarity Detection for Enhanced Email Protection
+ https://arxiv.org/abs/2408.16945
+ arXiv:2408.16945v1 Announce Type: cross
+Abstract: In the pursuit of an effective spam detection system, the focus has often been on identifying known spam patterns either through rule-based detection systems or machine learning (ML) solutions. However, both systems are susceptible to evasion techniques and zero-day attacks that can be achieved at low cost. Therefore, an email that bypassed the defense system once can do it again in the following days, even though rules are updated or the ML models are retrained. The recurrence of failures to detect emails that exhibit layout similarities to previously undetected spam is concerning for customers and can erode their trust in a company. Our observations show that threat actors reuse email kits extensively and can bypass detection with little effort, for example, by making changes to the content of emails. In this work, we propose an email visual similarity detection approach, named Pisco, to improve the detection capabilities of an email threat defense system. We apply our proof of concept to some real-world samples received from different sources. Our results show that email kits are being reused extensively and visually similar emails are sent to our customers at various time intervals. Therefore, this method could be very helpful in situations where detection features that rely on contextual information and keywords are bypassed, an occurrence our observations show happens frequently.
+ oai:arXiv.org:2408.16945v1
+ cs.CR
+ cs.AI
+ cs.LG
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://creativecommons.org/licenses/by-nc-sa/4.0/
+ Sachin Shukla, Omid Mirzaei
+
+ -
+ Transient Fault Tolerant Semantic Segmentation for Autonomous Driving
+ https://arxiv.org/abs/2408.16952
+ arXiv:2408.16952v1 Announce Type: cross
+Abstract: Deep learning models are crucial for autonomous vehicle perception, but their reliability is challenged by algorithmic limitations and hardware faults. We address the latter by examining fault-tolerance in semantic segmentation models. Using established hardware fault models, we evaluate existing hardening techniques both in terms of accuracy and uncertainty and introduce ReLUMax, a novel simple activation function designed to enhance resilience against transient faults. ReLUMax integrates seamlessly into existing architectures without time overhead. Our experiments demonstrate that ReLUMax effectively improves robustness, preserving performance and boosting prediction confidence, thus contributing to the development of reliable autonomous driving systems.
+ oai:arXiv.org:2408.16952v1
+ cs.CV
+ cs.AI
+ Mon, 02 Sep 2024 00:00:00 -0400
+ cross
+ http://arxiv.org/licenses/nonexclusive-distrib/1.0/
+ Leonardo Iurada, Niccolò Cavagnero, Fernando Fernandes Dos Santos, Giuseppe Averta, Paolo Rech, Tatiana Tommasi
+
+
+
\ No newline at end of file
diff --git a/examples/data/paul-graham-essay.txt b/examples/data/paul-graham-essay.txt
new file mode 100644
index 0000000..e793a76
--- /dev/null
+++ b/examples/data/paul-graham-essay.txt
@@ -0,0 +1,110 @@
+**The Bus Ticket Theory of Genius**
+November 2019
+
+Everyone knows that to do great work you need both natural ability and determination. But there's a third ingredient that's not as well understood: an obsessive interest in a particular topic.
+
+To explain this point I need to burn my reputation with some group of people, and I'm going to choose bus ticket collectors. There are people who collect old bus tickets. Like many collectors, they have an obsessive interest in the minutiae of what they collect. They can keep track of distinctions between different types of bus tickets that would be hard for the rest of us to remember. Because we don't care enough. What's the point of spending so much time thinking about old bus tickets?
+
+Which leads us to the second feature of this kind of obsession: there is no point. A bus ticket collector's love is disinterested. They're not doing it to impress us or to make themselves rich, but for its own sake.
+
+When you look at the lives of people who've done great work, you see a consistent pattern. They often begin with a bus ticket collector's obsessive interest in something that would have seemed pointless to most of their contemporaries. One of the most striking features of Darwin's book about his voyage on the Beagle is the sheer depth of his interest in natural history. His curiosity seems infinite. Ditto for Ramanujan, sitting by the hour working out on his slate what happens to series.
+
+It's a mistake to think they were "laying the groundwork" for the discoveries they made later. There's too much intention in that metaphor. Like bus ticket collectors, they were doing it because they liked it.
+
+But there is a difference between Ramanujan and a bus ticket collector. Series matter, and bus tickets don't.
+
+If I had to put the recipe for genius into one sentence, that might be it: to have a disinterested obsession with something that matters.
+
+Aren't I forgetting about the other two ingredients? Less than you might think. An obsessive interest in a topic is both a proxy for ability and a substitute for determination. Unless you have sufficient mathematical aptitude, you won't find series interesting. And when you're obsessively interested in something, you don't need as much determination: you don't need to push yourself as hard when curiosity is pulling you.
+
+An obsessive interest will even bring you luck, to the extent anything can. Chance, as Pasteur said, favors the prepared mind, and if there's one thing an obsessed mind is, it's prepared.
+
+The disinterestedness of this kind of obsession is its most important feature. Not just because it's a filter for earnestness, but because it helps you discover new ideas.
+
+The paths that lead to new ideas tend to look unpromising. If they looked promising, other people would already have explored them. How do the people who do great work discover these paths that others overlook? The popular story is that they simply have better vision: because they're so talented, they see paths that others miss. But if you look at the way great discoveries are made, that's not what happens. Darwin didn't pay closer attention to individual species than other people because he saw that this would lead to great discoveries, and they didn't. He was just really, really interested in such things.
+
+Darwin couldn't turn it off. Neither could Ramanujan. They didn't discover the hidden paths that they did because they seemed promising, but because they couldn't help it. That's what allowed them to follow paths that someone who was merely ambitious would have ignored.
+
+What rational person would decide that the way to write great novels was to begin by spending several years creating an imaginary elvish language, like Tolkien, or visiting every household in southwestern Britain, like Trollope? No one, including Tolkien and Trollope.
+
+The bus ticket theory is similar to Carlyle's famous definition of genius as an infinite capacity for taking pains. But there are two differences. The bus ticket theory makes it clear that the source of this infinite capacity for taking pains is not infinite diligence, as Carlyle seems to have meant, but the sort of infinite interest that collectors have. It also adds an important qualification: an infinite capacity for taking pains about something that matters.
+
+So what matters? You can never be sure. It's precisely because no one can tell in advance which paths are promising that you can discover new ideas by working on what you're interested in.
+
+But there are some heuristics you can use to guess whether an obsession might be one that matters. For example, it's more promising if you're creating something, rather than just consuming something someone else creates. It's more promising if something you're interested in is difficult, especially if it's more difficult for other people than it is for you. And the obsessions of talented people are more likely to be promising. When talented people become interested in random things, they're not truly random.
+
+But you can never be sure. In fact, here's an interesting idea that's also rather alarming if it's true: it may be that to do great work, you also have to waste a lot of time.
+
+In many different areas, reward is proportionate to risk. If that rule holds here, then the way to find paths that lead to truly great work is to be willing to expend a lot of effort on things that turn out to be every bit as unpromising as they seem.
+
+I'm not sure if this is true. On one hand, it seems surprisingly difficult to waste your time so long as you're working hard on something interesting. So much of what you do ends up being useful. But on the other hand, the rule about the relationship between risk and reward is so powerful that it seems to hold wherever risk occurs. Newton's case, at least, suggests that the risk/reward rule holds here. He's famous for one particular obsession of his that turned out to be unprecedentedly fruitful: using math to describe the world. But he had two other obsessions, alchemy and theology, that seem to have been complete wastes of time. He ended up net ahead. His bet on what we now call physics paid off so well that it more than compensated for the other two. But were the other two necessary, in the sense that he had to take big risks to make such big discoveries? I don't know.
+
+Here's an even more alarming idea: might one make all bad bets? It probably happens quite often. But we don't know how often, because these people don't become famous.
+
+It's not merely that the returns from following a path are hard to predict. They change dramatically over time. 1830 was a really good time to be obsessively interested in natural history. If Darwin had been born in 1709 instead of 1809, we might never have heard of him.
+
+What can one do in the face of such uncertainty? One solution is to hedge your bets, which in this case means to follow the obviously promising paths instead of your own private obsessions. But as with any hedge, you're decreasing reward when you decrease risk. If you forgo working on what you like in order to follow some more conventionally ambitious path, you might miss something wonderful that you'd otherwise have discovered. That too must happen all the time, perhaps even more often than the genius whose bets all fail.
+
+The other solution is to let yourself be interested in lots of different things. You don't decrease your upside if you switch between equally genuine interests based on which seems to be working so far. But there is a danger here too: if you work on too many different projects, you might not get deeply enough into any of them.
+
+One interesting thing about the bus ticket theory is that it may help explain why different types of people excel at different kinds of work. Interest is much more unevenly distributed than ability. If natural ability is all you need to do great work, and natural ability is evenly distributed, you have to invent elaborate theories to explain the skewed distributions we see among those who actually do great work in various fields. But it may be that much of the skew has a simpler explanation: different people are interested in different things.
+
+The bus ticket theory also explains why people are less likely to do great work after they have children. Here interest has to compete not just with external obstacles, but with another interest, and one that for most people is extremely powerful. It's harder to find time for work after you have kids, but that's the easy part. The real change is that you don't want to.
+
+But the most exciting implication of the bus ticket theory is that it suggests ways to encourage great work. If the recipe for genius is simply natural ability plus hard work, all we can do is hope we have a lot of ability, and work as hard as we can. But if interest is a critical ingredient in genius, we may be able, by cultivating interest, to cultivate genius.
+
+For example, for the very ambitious, the bus ticket theory suggests that the way to do great work is to relax a little. Instead of gritting your teeth and diligently pursuing what all your peers agree is the most promising line of research, maybe you should try doing something just for fun. And if you're stuck, that may be the vector along which to break out.
+
+I've always liked Hamming's famous double-barrelled question: what are the most important problems in your field, and why aren't you working on one of them? It's a great way to shake yourself up. But it may be overfitting a bit. It might be at least as useful to ask yourself: if you could take a year off to work on something that probably wouldn't be important but would be really interesting, what would it be?
+
+The bus ticket theory also suggests a way to avoid slowing down as you get older. Perhaps the reason people have fewer new ideas as they get older is not simply that they're losing their edge. It may also be because once you become established, you can no longer mess about with irresponsible side projects the way you could when you were young and no one cared what you did.
+
+The solution to that is obvious: remain irresponsible. It will be hard, though, because the apparently random projects you take up to stave off decline will read to outsiders as evidence of it. And you yourself won't know for sure that they're wrong. But it will at least be more fun to work on what you want.
+
+It may even be that we can cultivate a habit of intellectual bus ticket collecting in kids. The usual plan in education is to start with a broad, shallow focus, then gradually become more specialized. But I've done the opposite with my kids. I know I can count on their school to handle the broad, shallow part, so I take them deep.
+
+When they get interested in something, however random, I encourage them to go preposterously, bus ticket collectorly, deep. I don't do this because of the bus ticket theory. I do it because I want them to feel the joy of learning, and they're never going to feel that about something I'm making them learn. It has to be something they're interested in. I'm just following the path of least resistance; depth is a byproduct. But if in trying to show them the joy of learning I also end up training them to go deep, so much the better.
+
+Will it have any effect? I have no idea. But that uncertainty may be the most interesting point of all. There is so much more to learn about how to do great work. As old as human civilization feels, it's really still very young if we haven't nailed something so basic. It's exciting to think there are still discoveries to make about discovery. If that's the sort of thing you're interested in.
+
+
+
+
+
+
+
+
+
+
+
+Notes
+
+[1] There are other types of collecting that illustrate this point better than bus tickets, but they're also more popular. It seemed just as well to use an inferior example rather than offend more people by telling them their hobby doesn't matter.
+
+[2] I worried a little about using the word "disinterested," since some people mistakenly believe it means not interested. But anyone who expects to be a genius will have to know the meaning of such a basic word, so I figure they may as well start now.
+
+[3] Think how often genius must have been nipped in the bud by people being told, or telling themselves, to stop messing about and be responsible. Ramanujan's mother was a huge enabler. Imagine if she hadn't been. Imagine if his parents had made him go out and get a job instead of sitting around at home doing math.
+
+On the other hand, anyone quoting the preceding paragraph to justify not getting a job is probably mistaken.
+
+[4] 1709 Darwin is to time what the Milanese Leonardo is to space.
+
+[5] "An infinite capacity for taking pains" is a paraphrase of what Carlyle wrote. What he wrote, in his History of Frederick the Great, was "... it is the fruit of 'genius' (which means transcendent capacity of taking trouble, first of all)...." Since the paraphrase seems the name of the idea at this point, I kept it.
+
+Carlyle's History was published in 1858. In 1785 Hérault de Séchelles quoted Buffon as saying "Le génie n'est qu'une plus grande aptitude à la patience." (Genius is only a greater aptitude for patience.)
+
+[6] Trollope was establishing the system of postal routes. He himself sensed the obsessiveness with which he pursued this goal.
+It is amusing to watch how a passion will grow upon a man. During those two years it was the ambition of my life to cover the country with rural letter-carriers.
+Even Newton occasionally sensed the degree of his obsessiveness. After computing pi to 15 digits, he wrote in a letter to a friend:
+I am ashamed to tell you to how many figures I carried these computations, having no other business at the time.
+Incidentally, Ramanujan was also a compulsive calculator. As Kanigel writes in his excellent biography:
+One Ramanujan scholar, B. M. Wilson, later told how Ramanujan's research into number theory was often "preceded by a table of numerical results, carried usually to a length from which most of us would shrink."
+[7] Working to understand the natural world counts as creating rather than consuming.
+
+Newton tripped over this distinction when he chose to work on theology. His beliefs did not allow him to see it, but chasing down paradoxes in nature is fruitful in a way that chasing down paradoxes in sacred texts is not.
+
+[8] How much of people's propensity to become interested in a topic is inborn? My experience so far suggests the answer is: most of it. Different kids get interested in different things, and it's hard to make a child interested in something they wouldn't otherwise be. Not in a way that sticks. The most you can do on behalf of a topic is to make sure it gets a fair showing — to make it clear to them, for example, that there's more to math than the dull drills they do in school. After that it's up to the child.
+
+
+
+Thanks to Marc Andreessen, Trevor Blackwell, Patrick Collison, Kevin Lacker, Jessica Livingston, Jackie McDonough, Robert Morris, Lisa Randall, Zak Stone, and my 7 year old for reading drafts of this.
\ No newline at end of file
diff --git a/examples/data/satya-nadella-build2024-keynote.txt b/examples/data/satya-nadella-build2024-keynote.txt
new file mode 100644
index 0000000..e7ee773
--- /dev/null
+++ b/examples/data/satya-nadella-build2024-keynote.txt
@@ -0,0 +1,350 @@
+05212024 Build Satya Nadella
+
+Microsoft Build 2024
+Satya Nadella, Chairman and CEO, Microsoft
+Tuesday, May 21, 2024
+ SATYA NADELLA: Good morning. It’s fantastic to be back here at Microsoft Build. Welcome to everyone here and joining us on the web. Developer conferences are always the most exciting and the most fun when there are these fundamental changes that you can sense in the air.
+ I’ve marked all my adult life by coming to PDCs and Builds for the last three decades. I still distinctly remember Win32, which was, I guess, ‘91, then .NET, then Azure, right? These are moments that I’ve marked my life with, and it just feels like, yet again, we are at a moment like that. It’s just that the scale and the scope are so much deeper and broader this time around. Every layer of the tech stack is changing. Everything from the power draw and the cooling layer of data centers to the NPUs at the edge is being shaped by these new workloads, right, these distributed, synchronous, data-parallel workloads, reshaping every layer of the tech stack.
+
+But if you think about it, going all the way back to the beginning of modern computing, let’s say 70 years ago, there have been two real dreams we’ve had. The first is: can computers understand us, instead of us having to understand computers? And second, in a world where we have this ever-increasing number of people, places and things, right, and as you digitize more artifacts about people, places and things and you have more information, can computers help us reason, plan and act more effectively on all that information?
+
+Those are the two dreams that we’ve had for the last 70-plus years, and here we are. I think that we’ve had real breakthroughs on both fronts. One of the questions I always ask myself is about the core underlying force: "OK, this is great, and this is maybe like the golden age of systems, but what’s really driving it?"
+
+I always come back to these scaling laws. Just like Moore’s Law helped to drive the information revolution, the scaling laws of DNNs, along with the model architecture and the interesting ways to use and generate data, are really driving this intelligence revolution. You could say Moore’s Law was probably more stable, in the sense that it was doubling every 15 or 18 months. We now have these things that are doubling every six months.
+
+Effectively, what we have, though, with these scaling laws is a new natural user interface that’s multi-modal. It supports text, speech, images and video, as both input and output. We have memory that retains important context and recalls both our personal knowledge and data across our apps and devices. We have new reasoning and planning capabilities that help us understand very complex context and complete complex tasks while reducing the cognitive load on us.
+
+But what stands out for me, as I look back at this past year, is how you all, as developers, have taken all of these capabilities and applied them, quite frankly, to change the world around us.
+I’ll always remember this moment in January 2023 when I met a rural Indian farmer who was able to reason over some government farm subsidies that he had heard about on television using GPT-3.5 and his voice. It was remarkable, right?
+
+For me, it just brought home the power of all of this because the frontier model developed in the West Coast of the United States just a few months earlier was used by a developer in India to directly improve the life of a rural Indian farmer. The rate of diffusion is unlike anything I’ve seen in my professional career, and it’s just increasing.
+
+In fact, earlier this month I was in Southeast Asia. I was in Thailand, where I was having a great roundtable, and I met a developer who was talking to me about how he’s using Phi-3 and GPT-4, using Phi-3 to optimize all of the things he was doing with RAG. I mean, this is crazy. This is unbelievable. Phi-3 had just launched a few weeks earlier, and there I was in Bangkok, Thailand, listening to a developer talk about this technology as a real expert on it.
+
+So it’s just great to see the democratization force that we love to talk about; to actually witness it has been something. And this, quite frankly, is the impact, and it’s why we are in this industry. I would say it’s what gives deep meaning to our work.
+
+I want to start, though, with a very big thank you to every one of you who has really gone about bringing this impact to the world. Thank you all so very much.
+
+When I think about the progress we have made since the last time we were here at Build, we’ve really built three platforms. The first is Microsoft Copilot, which is your everyday companion. It puts knowledge and expertise at your fingertips and helps you act on it. Second, we built the Copilot stack so that you can build your own AI applications, solutions and experiences. And just yesterday we introduced a new category of Copilot+ PCs, the fastest AI-first PCs ever built.
+
+All three of these things are exciting platforms, but I want to start with Copilot+ PCs. We are exposing AI as a first class namespace for Windows. This week we are introducing the Windows Copilot Runtime to make Windows the best platform for you to be able to build your AI applications.
+
+What Win32 was to the graphical user interface, we believe the Windows Copilot Runtime will be for AI. It starts with our Windows Copilot Library, a collection of ready-to-use local APIs that help you integrate all of the AI capabilities we shared yesterday into your new experiences.
+
+Now, this includes no-code integration for Studio Effects, with things like creative filters, teleprompter, voice focus and much more, but of course, if you want to access these models themselves, you can call them directly through APIs. We have 40-plus models available out of the box, including Phi Silica, the newest member of our Phi family of small language models, which we specifically designed to run locally on the NPUs in Copilot+ PCs, bringing lightning-fast local inference to the device.
+
+The other thing is that the Copilot Library also makes it easy for you to incorporate RAG over on-device data inside your applications. It gives you the right tools to build a vector store within your app. It enables you to do the semantic search you saw with Recall, but now in your own application, and to construct prompts using local data for RAG applications.
+
+Today, I’m so thrilled to announce, as well, that we will be natively supporting PyTorch and the new WebNN framework on Windows through DirectML. Native PyTorch support means thousands of OSS models will just work out of the box on Windows, making it easy for you to get started.
+
+In fact, with WebNN, web developers finally have a web native machine learning framework that gives them direct access to both GPUs and NPUs. In fact, last night I was playing with it, turning it on in Edge and seeing the WebNN sample code running. It’s just so cool to see it now using the NPUs. Both PyTorch and WebNN are available in developer preview today.
+
+Let’s take a look.
+
+(Video segment.)
+
+These are just some of the many announcements today. We are introducing more than 50 new products and partnerships to create new opportunities for you. We’ve always been a platform company, and our goal is to build the most complete end-to-end stack, from infrastructure, to data, to application extensibility, so that you can apply the power of this technology to build your own applications.
+
+And so today I want to highlight our top news for this event across every layer of this Copilot stack. So let’s dive right in with infrastructure. We have the most complete, scalable AI infrastructure to meet your needs in this AI era. We’re building Azure as the world’s computer. We have the most comprehensive global infrastructure, with more than 60 data center regions, more than any other cloud provider.
+
+Over the past year, we’ve expanded our data center regions and AI capacity from Japan to Mexico, from Spain to Wisconsin. We’re making best-in-class AI infrastructure available everywhere, and we’re doing this with a focus on delivering cloud services sustainably. In fact, we’re on track to meet our goal of having our data centers powered by 100% renewable energy by next year.
+
+We are optimizing power and efficiency across every layer of the stack, from the data center to the network. Our latest data center designs are purpose built for these AI workloads so that we can effectively and responsibly use every megawatt of power to drive down the cost of AI and the power draw. We are incorporating advanced data center cooling techniques to fit the thermal profile of the workloads and match it to the environment in the location where it operates.
+
+At the silicon layer, we are dynamically able to map workloads to the best accelerated AI hardware so that we have the best performance, and our custom IO hardware and server designs allow us to provide dramatically faster networking, remote storage and local storage throughput. This end-to-end approach is really helping us get to the unprecedented scale.
+In fact, last November, we announced the most powerful AI supercomputer in the cloud for training, using actually just a very small fraction of our cloud infrastructure. And over the past six months, we’ve added 30 times that supercomputing power to Azure. It’s crazy to see the scale.
+
+And of course, we’re not just scaling our training fleets. We’re scaling our inference fleet around the world, quadrupling the number of countries where Azure AI services are available today, and it’s great to see that.
+
+At the heart of our AI infrastructure are the world’s most advanced AI accelerators. We offer the most complete selection of AI accelerators, including from Nvidia and AMD, as well as our own Azure Maia, all dynamically optimized for the workloads. This means whether you’re using Microsoft Copilot or building your own Copilot apps, we ensure that you get the best accelerator performance at the best cost.
+
+For example, you will see this in what has happened with GPT-4. It’s 12x cheaper and 6x faster since its launch, and that’s the type of progress you will continue to see as we evolve the system architecture. It all starts with this very deep partnership with Nvidia, which spans the entirety of the Copilot stack, across all of their hardware innovation as well as their system software innovation. Together, we offer Azure Confidential Computing on GPUs to help you protect sensitive data around AI models, end to end.
+
+In fact, we are bringing the latest H200s to Azure later this year, and we’ll be among the first cloud providers to offer Nvidia’s Blackwell B100 GPUs as well as GB200 configurations. And we are continuing to work with them to train and optimize both large language models like GPT-4o and small language models like the Phi-3 family.
+
+Now, beyond the hardware, we are bringing Nvidia’s key enterprise platform offerings, like Omniverse Cloud and DGX Cloud, to Azure, with deep integration with the broader Microsoft Cloud.
+
+For example, Nvidia recently announced that their DGX Cloud integrates natively with Microsoft Fabric. That means you can train models using DGX Cloud with full access to Fabric data, and Omniverse APIs will be available first on Azure for developers to build their industrial AI solutions. We are also working with Nvidia to bring their NIM industry-specific developer services to Azure and make them fantastic there. So, a lot of exciting work with Nvidia.
+
+Now, coming to AMD, I am really excited to share that we are the first cloud to deliver general availability of ND MI300X VMs, built on AMD’s MI300X accelerator. It’s a big milestone for both AMD and Microsoft. We’ve been working at it for a while, and it’s great to see that today, as we speak, they offer the best price performance on GPT-4 inference. And we will continue to move forward with Azure Maia.
+
+In fact, our first clusters are live, and soon, if you are using Copilot or one of the Azure OpenAI services, some of your prompts will be served using Maia hardware.
+
+Now, beyond AI, our end-to-end systems optimization also makes cloud native apps, and the development of cloud native apps, better. Six months ago, we announced our first general-purpose, ARM-based compute processor, Microsoft Cobalt. And today, I am really excited to announce the public preview of Cobalt-based VMs.
+
+Cobalt is being used for video processing and permissions management in Microsoft 365, helping power billions of conversations on services like Microsoft Teams already. We are delivering that same ARM-based performance and efficiency to many customers, including Elastic, MongoDB, Siemens, Snowflake and Teradata.
+
+In our most recent benchmark data and tests, our Cobalt 100 VMs delivered up to 40% better performance than any other generally available ARM based VM. We are very, very excited about Cobalt getting into the market.
+
+Now, let’s move up the stack to the foundation models. With Azure AI, we offer the broadest selection of frontier and open-source models, including LLMs and SLMs so you can choose the model that makes the most sense for your unique needs and your application needs.
+
+In fact, more than 50,000 organizations use Azure AI today, and that is great momentum. It all starts, though, with our most strategic and most important partnership, with OpenAI. Just last week, OpenAI announced GPT-4o, their latest multimodal model, which was trained on Azure. It’s an absolute breakthrough.
+
+It has text, audio, image and video as input and output. It can respond and have a human-like conversation that’s fast and fluid. It can even be interrupted mid-sentence. GPT-4o is also the top-performing model on benchmarks across a variety of modalities, and it’s always going to get you to the best answer.
+
+It has state-of-the-art performance at understanding the information you actually provide in your prompt, because that’s really what matters. What OpenAI demoed last week was just pretty magical stuff. And yesterday we showed how Copilot will leverage GPT-4o.
+
+In the video I’m going to play, notice that you can share your screen or session as a prompt in Copilot and have Copilot help you with whatever it is that you are doing.
+
+Let’s roll the video.
+
+(Video segment begins.)
+
+RYAN: Hey, Copilot how’s it going?
+
+COPILOT: Hey, I’m doing great. And it looks like you’re about to dive into some Minecraft. Ready to build, explore and maybe dodge a few mobs?
+
+RYAN: Yeah, I’m trying to figure out how to make a sword. My son plays this game all the time and I have no idea what I’m doing.
+
+COPILOT: No worries. I’m here to help you impress your son. I see you’ve got a crafting table in front of you, which is a great start. To craft a sword, you’ll need some materials. Can you open your inventory? Just press E on your keyboard.
+
+RYAN: Yeah, do I have what I need?
+
+COPILOT: Let’s see. You’ve got some sticks. Perfect for the sword’s handle. But it looks like you’re missing the material for the blade. You can use wood, stone, iron, gold or diamond. Let’s start by gathering some wood or stone. Shall we head out and find some resources?
+
+RYAN: Yeah, that sounds great. Oh my God, what’s that?
+
+COPILOT: Oh no, it’s a zombie. Run! You need to get away from it fast. Either sprint away or quickly build a tower block straight up to get out of its reach. If you can find a shelter or dig into the side of a hill, do it now. Just make sure it can’t reach you.
+
+RYAN: All right, I think I might be safe now.
+
+COPILOT: Whew, that was a close one. Great job finding shelter.
+
+(End of demo experience.)
+
+SATYA NADELLA: It’s really cool to see that. The magic here is that Copilot is your personal assistant that’s helping you with any task, helping you stay, quite frankly, in the flow in the moment, whether it’s a game, a document or a line of code. As OpenAI innovates, our promise is that we will bring all that innovation to Azure too.
+
+In fact, the same day that OpenAI announced GPT-4o, we made the model available for testing on Azure OpenAI Service. And today, I’m excited to say that it’s generally available on Azure AI.
+
+What this means, though, is that all of you can now build groundbreaking apps using this capability. One of the coolest things is that now any app or any website can essentially be turned into a full, multi-modal, full-duplex conversational canvas.
+
+Let’s take a look.
+
+(Start video segment.)
+
+SETH: All right, let me ask the agent here.
+
+AGENT: Hey, Seth. How’s it going? What can I help you with today?
+
+SETH: I am fantastic. I’m sorry I am in such a hurry. I’m going to go on an overnight camping hike. Where we’re camping, it’s going to be dark and cold. Should I use these shoes? It’s the mountains, and it’s probably going to be cold.
+
+AGENT: Hey. No worries. Let me take a look. Ah. Those are the Treksta hiking sandals. While they’re fantastic for summer hikes and keeping your feet cool, they might not be the best choice for cold mountains –
+
+SETH: OK, I’m sorry to interrupt. Can you pick the best thing that I should use? And can you just add it to my cart? Just one product.
+
+AGENT: No worries. Let’s go with the Trek-ready hiking boots. They’re durable, provide great ankle support, and most importantly, they’ll keep your feet warm in the cold mountains. I’ll add them to your cart now. Ready for checkout?
+
+SETH: Oh, I see, that’s amazing, muchísimas gracias [thank you very much] (continuing in Spanish.)
+
+AGENT: De nada [you’re welcome], Seth (continuing in Spanish).
+.
+(End of video segment.)
+
+SATYA NADELLA: Seth was in a hurry, but there you go, GPT-4o kept up with him. It’s incredible, and so I really want to thank the OpenAI team for their partnership and their responsible approach to innovation, helping our industry move forward.
+
+In fact, Sam will be here, joining Kevin in a little bit to talk a lot more about what’s coming, because that’s the exciting stuff, how do you all sample what comes next? We are also bringing lots and lots of other models in as well, from Cohere, Databricks, Deci, Meta, Mistral and Snowflake, all to Azure AI.
+
+We want to support the broadest set of models from every country, in every language. I’m excited to announce, in fact, that we’re bringing models from Cohere, G42, NTT Data, Nixtla, as well as many more, as models as a service, because that’s an easy way to get to managed AI models. And we all love open source too.
+
+In fact, two years ago at Build, we were the first to partner with Hugging Face, making it simple for you to access the leading open-source library of state-of-the-art language models via Azure AI. And today, I’m really excited to announce that we’re expanding our partnership, bringing more models from Hugging Face, with Text Generation Inference and Text Embeddings Inference, directly into Azure AI Studio.
+
+And we are not stopping there. We are adding not just large language models, but we are also leading the small language model revolution. Our Phi-3 family of SLMs are the most capable and most cost effective. They outperform models of the same size or the next size up, even across a variety of language, reasoning, coding, as well as math benchmarks.
+
+If you think about it, by performance to parameter count ratio, it’s truly best in class. And today we are adding new models to the Phi-3 family to add even more flexibility across that quality cost curve. We’re introducing Phi-3 Vision, a 4.2 billion parameter multi-modal model with language and vision capabilities. It can be used to reason over real-world images or generate insights and answer questions about images, as you can see right here.
+
+And we’re also making a 7 billion parameter Phi-3 Small and a 14 billion parameter Phi-3 Medium model available. With Phi, you can build apps that span the web, Android, iOS, Windows and the edge. They can take advantage of local hardware when available and fall back on the cloud when not, really simplifying all of what you as developers have to do to support multiple platforms using one AI model.
+
+Now, it’s just awesome to see how many developers are already using Phi-3 to do incredible things: from Amity Solutions, the Thai company that I mentioned earlier, to ITC, which has built a Copilot for Indian farmers to ask questions about their crops, to Epic in healthcare, which is now using Phi to summarize complex patient histories more quickly and efficiently. And another very, very cool use case is in education.
+
+Today, I’m very thrilled to announce a new partnership with Khan Academy. We’ll be working together to use Phi-3 to make math tutoring more accessible. And I’m also excited to share that they’ll be making Khanmigo, their AI assistant, free to all U.S. teachers. Let’s roll the video here already.
+
+(Applause.)
+
+(Begin video segment.)
+
+TEACHER: I felt like I was in a place in my teaching career, where I felt like I was kind of losing my sparkle. And I would just feel really defeated when I looked out on the classroom and I would see students that just didn’t look engaged.
+
+SPEAKER: Teachers have an incredibly hard job, and what we think we can do is leverage technology to take some of the stuff off of their plate to really, actually humanize the classroom.
+
+TEACHER: By some miracle, we became a Khanmigo pilot school.
+
+SAL KHAN: With new advances in generative AI, we launched Khanmigo. The point is to be that personalized tutor for every student, and to be a teaching assistant for every teacher.
+
+TEACHER: I started to build these more robust lessons, and I started to see my students engage.
+
+SPEAKER: We’re working with Microsoft on these Phi models that are specifically tuned for math tutoring. If we can make a small language model like Phi work really well in that use case, then we would like to kind of shift the traffic to Phi in those particular scenarios. Using a small language model, the cost is a lot lower.
+
+SAL KHAN: We’re really excited that Khanmigo, and especially in the partnership with Microsoft, being able to give these teacher tools for free to U.S. teachers is going to make a dramatic impact in U.S. education.
+
+TEACHER: I think we’re going to make them the innovators, the questioners. Isn’t that really just why you wake up every morning, because that’s our future, our next generation? And to me, that’s everything.
+
+(End video segment.)
+
+(Applause.)
+
+SATYA NADELLA: I’m super excited to see the impact this all will have and what Khan Academy will do. And Sal is going to, in fact, join Kevin soon to share more. And I’m really thankful for teachers like Melissa and everything that they do. Thank you very much.
+
+Of course, it’s about more than just models. It’s about the tools you need to build these experiences. With Azure AI Studio, we provide an end-to-end tooling solution to develop and safeguard the Copilot apps you build. We also provide tooling and guidance to evaluate your AI models and applications for performance and quality, which is one of the most important tasks, as you can imagine, with all these models. And I’m excited to announce that Azure AI Studio now is generally available.
+
+(Applause.)
+
+It’s an end-to-end development environment to build, train and finetune AI models and do so responsibly. It includes built-in support for what is perhaps the most important feature in this age of AI, which is AI safety. Azure AI Studio includes state-of-the-art safety tooling for everything from detecting hallucinations in model outputs to risk and safety monitoring. It helps you understand which inputs and outputs are triggering content filters, and Prompt Shields, by the way, detect and block these prompt injection attacks.
+
+And so, today we are adding new capabilities, including custom categories, so that you can create these unique filters for prompts and completions with rapid deployment options, which I think is super important as you deploy these models into the real world, if an emerging threat appears.
+
+Beyond Azure AI Studio, we recognize that there are advanced applications, where you need much more customization of these models for very specific use cases. And today, I’m really excited to announce that Azure AI custom models will come, giving you the ability to train a custom model that’s unique to your domain, to your data, that’s perhaps proprietary.
+
+The same builders and data scientists who have been working with OpenAI and who brought all the Phi advances to you will work with all of you to build out these custom models. The output will be domain-specific. It’ll be multitask and multimodal, best in class as defined by benchmarks, including perhaps even the specific language proficiency that may be required.
+
+Now, let’s just go up the stack to data. Ultimately, in order to train, finetune, ground your models, you need your data to be in its best shape. And to do so, we are building out the full data estate, from operational stores to analytics in Azure. We’ve also added AI capabilities to all of our operational stores, whether it’s Cosmos DB or SQL or PostgreSQL.
+
+At the core, though, of the Intelligent Data platform is Microsoft Fabric. We now have over 11,000 customers, including leaders in every industry, who are using Fabric. It’s fantastic to see the progress.
+
+(Applause.)
+
+With Fabric, you get everything you need in a single, integrated SaaS platform. It’s deeply integrated at its most fundamental level with compute and storage being unified. Your experience is unified, governance is unified, and more importantly, the business model is unified.
+
+And what’s also great about Fabric is that it works with data anywhere, not just on Azure, but it can be on AWS or on GCP or even on your on-premise datacenter. And today, we are taking the next step. We’re introducing real-time intelligence in Fabric.
+
+(Applause.)
+
+Customers today have more and more of this real-time data coming from your IoT systems, your telemetry systems. In fact, cloud applications themselves are generating lots of data, but with Fabric, anyone can unlock actionable insights across all of your data estate. Let’s take a look.
+
+(Begin video segment.)
+
+VOICEOVER: Introducing real-time intelligence in Microsoft Fabric, an end-to-end solution empowering you to get instant, actionable insights on streaming data. At its heart lies a central place to discover, manage and consume event data across your entire organization with a rich, governed experience.
+
+Get started quickly by bringing in data from Microsoft sources and across clouds with a variety of out-of-the-box connectors. Route the relevant data to the right destination in Fabric using a simple drag-and-drop experience. Explore insights on petabytes of streaming data with just a few clicks.
+
+Elevate your analysis by harnessing the intelligence of Copilot in Microsoft Fabric, using simple, natural language. Make efficient business decisions in the moment with real time, actionable insights, and respond to changing landscapes proactively. Allow users to monitor the data they care about, detect changing patterns, and set alerts or actions that drive business value.
+
+All your data, all your teams, all in one place, this is Microsoft Fabric.
+
+(End video segment.)
+
+(Applause.)
+
+SATYA NADELLA: And we’re making it even easier to design, build and interoperate with Fabric with your own applications. In fact, we’re building out a new app platform with the Fabric Workload Development Kit. Partners like Esri, for example, have integrated their spatial analytics with Fabric, so that customers can generate insights from their own location data using Azure’s rich tools and libraries, right on Fabric. This is just exciting to see. It’s the first time, you know, that the analytics stack is really a first-class app platform as well.
+
+(Applause.)
+
+And beyond Fabric, we’re integrating the power of AI across the entirety of the data stack. There’s no question that RAG is core to any AI-powered application, especially in the enterprise today. And Azure AI Search makes it possible to run RAG at any scale, delivering highly accurate responses using state-of-the-art retrieval systems. In fact, ChatGPT’s support for GPTs and the Assistants API are all powered by Azure AI Search today.
+
+And with built-in OneLake integration, Azure AI Search will automatically index your unstructured data, too. And it’s also integrated into Azure AI Studio to support bringing your own embedding model, for example. And so, it’s pretty incredible to see Azure Search grow over the last year into that very core developer service.
+
+Now, let’s go up to developer tools. Nearly 50 years after our founding as a developer tools company, here we are, once again redefining software development. GitHub Copilot was the first, I would say, hit product of this generative AI age. And it’s the most widely adopted AI developer tool, with 1.8 million subscribers across 50,000 organizations using it.
+
+(Applause.)
+
+And with GitHub Copilot, we are empowering every developer on the planet to be able to access programming languages and programming knowledge in their own native language. Think about that. Any person can start programming, whether it’s in Hindi or Brazilian Portuguese, and bring back the joy of coding to their native language.
+
+And with Copilot Workspace, staying in your flow has never been easier. We are an order of magnitude closer to a world where any person can go from idea to code in an instant. You start with an issue. It creates a spec based on its deep understanding of your code base. It then creates a plan, which you can execute to generate the code across the full repo, that is, multiple files.
+
+At every point in this process, from the issue to spec to plan to code, you are in control. You can edit it. And that’s really what is fundamentally a new way of building software. And we are looking forward to making it much more broadly available in the coming months.
+
+And today, we are taking one more big leap forward. We are bridging the broader developer tools and services ecosystem with Copilot for the first time. We’re really thrilled to be announcing GitHub Copilot Extensions.
+
+(Applause.)
+
+Now, you can customize GitHub Copilot with capabilities from third-party services, whether it’s Docker, Sentry and many, many more. And, of course, we have a new extension for Azure, too, GitHub Copilot for Azure. You can instantly deploy to Azure and get information about your Azure resources, just using natural language. And what Copilot did for coding, we are now doing for infra and ops.
+
+To show you all this in action, here is Neha from our GitHub team. Neha, take it away.
+
+(Applause.)
+
+NEHA BATRA: Thanks, Satya.
+
+GitHub Copilot gives you suggestions in your favorite editor, like here, where I’m writing unit tests. Copilot is great at meeting you where you’re at, regardless of the language you’re most comfortable with.
+
+Let’s ask for something simple like how to write a prime number test in Java, but let’s converse in Spanish using my voice.
+
+(Global language.)
+
+Look at that. Gracias, Copilot.
+
+Copilot is great at turning natural language into code and back again, but what about beyond the code? With the new GitHub Copilot Extensions, you can now bring the context from your connected systems to you.
+
+Now, I can ask Azure where my app is deployed. I could ask what my available Azure resources are, or I could diagnose issues with my environment.
+
+And this isn’t just for Azure. As Satya announced, any developer can now create extensions for GitHub Copilot, and that includes any tool in your stack, including your in-house tools, keeping you in the flow across your entire day.
+
+Actually, 75% of a developer’s day is spent outside of coding, gathering requirements, writing specifications and creating plans. Let’s show how GitHub Copilot can help with that, live on stage for the first time.
+
+Typically, my day starts by looking at GitHub issues. Looks like we want to support a rich text input for our product description. Let’s open Workspace and get some help with that.
+
+Copilot interprets the intent of the issue to see what’s required, and it then looks across the entire codebase, and it proposes what changes should be made. This specification is fully editable and the whole process is iterative.
+
+But actually, this looks pretty good. Copilot can now help us build a plan on how to implement this change.
+
+All right, that’s a great start, but we must not forget about our documentation. Let’s edit the plan and have Copilot update our readme.
+
+And then we can even get Copilot’s help in starting to implement the code for us.
+
+Now, this was just a simple example, but in a large enterprise code base, there are tens of thousands of files and dozens of stakeholders involved. And that means meetings, so many meetings. Workspace helps you focus on what you need to change. And, by the way, as a developer, I’m always in control. I can see exactly what changes Copilot is proposing, and I can even get a live preview.
+
+All right, let’s test out the input.
+
+All right, this looks great. I can go back, and I can edit my code in VS Code, or I can submit these changes as a pull request to share with my team.
+
+GitHub Copilot, Copilot Extensions and Copilot Workspace help you stay focused on solving problems and keeping you in the flow.
+
+Back to you, Satya.
+
+(Applause.)
+
+SATYA NADELLA: Thank you so much, Neha. I mean, I’ll tell you, GitHub Copilot and everything that that ecosystem is doing is just bringing a lot of fun and a lot of joy back to coding. And really, staying in that flow is, I think, what we have all dreamt about, and it’s coming back.
+
+That brings us to the very top of the stack, Microsoft Copilot. We built Copilot so that you have the ability to tap into the world’s knowledge, as well as the knowledge inside of your organization and act on it.
+
+Now, Copilot has had a remarkable impact. It’s democratizing expertise across organizations. It’s having a real cascading effect. In fact, it reminds me of the very beginning of the PC era, where work, the work artifact and the workflow were all changing. And it’s just really having broad enterprise business process impact. I always say that it’s both lowering the floor and raising the ceiling at the same time for anything any one of us can do.
+
+Since no two business processes are the same, with Copilot Studio you can now extend Copilot and customize it for your business processes and workflows. Today, we are introducing Copilot Connectors in Copilot Studio, so you can ground Copilot with data from across the Graph, Power Platform, Fabric and Dataverse, and you now have all the third-party connectors for SaaS applications, from Adobe, Atlassian, ServiceNow, Snowflake and many, many more.
+
+This makes the process of grounding Copilot in first- and third-party line-of-business data just a wizard-like experience, enabling you to quickly incorporate your own organizational knowledge and data.
+
+We’re also extending Copilot beyond a personal assistant to become a team assistant. I’m thrilled today to announce Team Copilot.
+
+(Applause.)
+
+You’ll be able to invoke a Team Copilot wherever you collaborate in Teams. It can be in Teams, it can be in Loop, it can be in Planner and many, many other places. I mean, think about it. It can be your meeting facilitator when you’re in Teams, creating agendas, tracking time, taking notes for you, or a collaborator, writing chats, surfacing the most important information, tracking action items, addressing unresolved issues. And it can even be your project manager, ensuring that every project that you’re working on as a team is running smoothly.
+
+These capabilities will all come to you all and be available in preview later this year. And we’re not stopping there.
+
+With Copilot Studio, anyone can build Copilots that have agent capabilities, and work on your behalf, and independently and proactively orchestrate tasks for you. Now, simply provide your Copilot a job description, or choose from one of our pre-made templates and equip it with the necessary knowledge and actions, and Copilot will work in the background and act asynchronously for you. That’s, I think, one of the key things that’s going to really change in the next year, where you’re going to have Copilot plus agents with this async behavior.
+
+You can delegate authority to Copilots to automate long-running business processes. Copilot can even ask for help when it encounters situations that it does not know much about and it can’t handle. And to show you all of this, let’s roll the video.
+
+(Begin video segment.)
+
+VOICEOVER: Redefine business processes with Copilot Studio. Create Copilots that act as agents, working independently for you.
+
+Simply describe what you want your Copilot to do. Easily configure your Copilot with the details and needs, like instructions, triggers, knowledge and actions.
+
+Quickly test your Copilot before you deploy, and seamlessly publish across multiple channels.
+
+Watch it use memory for context, reason over user input and manage long-running tasks.
+
+Copilot can learn from feedback to improve.
+
+And you’re always in control.
+
+Put Copilot to work for you with Copilot Studio.
+
+(End video segment.)
+
+(Applause.)
+
+SATYA NADELLA: All around this stack is perhaps one of the most important things that we, at Microsoft, are doing, which is wrapping it with robust security. Security underlies our approach with Copilot, Copilot+ PCs and the Copilot Stack. We’re committed to our Secure Future Initiative. You’ll see us make rapid progress across each of the six pillars of SFI and its core design principles: secure by design, secure by default and secure operations. You’ll hear throughout this conference, and in fact a lot more in Scott’s keynote tomorrow, how it underlies everything that we build and everything that we do.
+
+Coming to the close, there are many announcements that you will hear about at Build, but I want to go back to what I think is the core of why we chose to be in this industry and why we come to work every day as developers, which is, ultimately, the mission of empowering every person and every organization. At the end of the day, it’s not about innovation that is only useful for a few. It’s about really being able to empower everyone. And it comes down to you all as developers and builders of this new world.
+
+For us, it’s never, never about celebrating tech for tech’s sake. It’s about celebrating what we can do with technology to create magical experiences that make a real difference in our countries, in our companies, in our communities. Already, this new generation of AI is having an incredible impact, thanks to all of you, the passion you bring and the hard work you put in. And I want to leave you with this one unbelievable example of how you’re all building a more accessible world, which means a lot to me, using our platform and tools.
+
+Thank you all so very much. Enjoy the rest of Build.
+
+(Applause.)
+
+END
+
+
+
diff --git a/examples/generate_object_example.py b/examples/generate_object_example.py
new file mode 100644
index 0000000..a1f5aaa
--- /dev/null
+++ b/examples/generate_object_example.py
@@ -0,0 +1,14 @@
+import asyncio
+from core.generate_object_agent import GenerateObjectAgent
+
+async def run_generate_object_example():
+    description = "A machine that can sort fruits."
+    goal = "Generate a high-level design of the machine."
+    agent = GenerateObjectAgent(object_description=description, goal=goal)
+    generated_object = await agent.generate_object()
+
+    print("Object description:", description)
+    print("Generated object:", generated_object)
+
+if __name__ == "__main__":
+    asyncio.run(run_generate_object_example())
\ No newline at end of file
diff --git a/examples/grounded_answer_example.py b/examples/grounded_answer_example.py
new file mode 100644
index 0000000..181c991
--- /dev/null
+++ b/examples/grounded_answer_example.py
@@ -0,0 +1,15 @@
+import asyncio
+from core.grounded_answer_agent import GroundedAnswerAgent
+
+async def run_grounded_answer_example():
+    question = "What is the capital of France?"
+    context = "France is a country in Western Europe. Paris is its capital and largest city."
+    instructions = "Ensure the answer is grounded only in the provided context."
+    agent = GroundedAnswerAgent(question=question, context=context, instructions=instructions)
+    result = await agent.answer()
+
+    print("Question:", question)
+    print("Result:", result)
+
+if __name__ == "__main__":
+    asyncio.run(run_grounded_answer_example())
\ No newline at end of file
diff --git a/examples/map_list_example.py b/examples/map_list_example.py
new file mode 100644
index 0000000..9dc3fe5
--- /dev/null
+++ b/examples/map_list_example.py
@@ -0,0 +1,14 @@
+import asyncio
+from core.map_list_agent import MapListAgent
+
+async def run_map_list_example():
+    items_to_map = ['Apple', 'Banana', 'Carrot']
+    transformation = 'Convert all items to uppercase'
+    agent = MapListAgent(list_to_map=items_to_map, transformation=transformation)
+    transformed_items = await agent.map_list()
+
+    print("Original list:", items_to_map)
+    print("Transformed list:", transformed_items)
+
+if __name__ == "__main__":
+    asyncio.run(run_map_list_example())
\ No newline at end of file
diff --git a/examples/project_list_example.py b/examples/project_list_example.py
new file mode 100644
index 0000000..27aab4d
--- /dev/null
+++ b/examples/project_list_example.py
@@ -0,0 +1,14 @@
+import asyncio
+from core.project_list_agent import ProjectListAgent
+
+async def run_project_list_example():
+    items_to_project = ['Apple', 'Banana', 'Carrot']
+    projection_rule = 'Project these items as their vitamin content'
+    agent = ProjectListAgent(list_to_project=items_to_project, projection_rule=projection_rule)
+    projected_items = await agent.project_list()
+
+    print("Original list:", items_to_project)
+    print("Projected results:", projected_items)
+
+if __name__ == "__main__":
+    asyncio.run(run_project_list_example())
\ No newline at end of file
diff --git a/examples/reduce_list_example.py b/examples/reduce_list_example.py
new file mode 100644
index 0000000..049f339
--- /dev/null
+++ b/examples/reduce_list_example.py
@@ -0,0 +1,14 @@
+import asyncio
+from core.reduce_list_agent import ReduceListAgent
+
+async def run_reduce_list_example():
+    items_to_reduce = ['Banana', 'Apple', 'Carrot']
+    reduction_goal = 'Reduce these items to a single word representing their nutritional value'
+    agent = ReduceListAgent(list_to_reduce=items_to_reduce, reduction_goal=reduction_goal)
+    reduced_items = await agent.reduce_list()
+
+    print("Original list:", items_to_reduce)
+    print("Reduced results:", reduced_items)
+
+if __name__ == "__main__":
+    asyncio.run(run_reduce_list_example())
\ No newline at end of file
diff --git a/examples/summarize_list_example.py b/examples/summarize_list_example.py
new file mode 100644
index 0000000..f96b16a
--- /dev/null
+++ b/examples/summarize_list_example.py
@@ -0,0 +1,13 @@
+import asyncio
+from core.summarize_list_agent import SummarizeListAgent
+
+async def run_summarize_list_example():
+    items_to_summarize = ['The quick brown fox jumps over the lazy dog.', 'Python is a popular programming language.']
+    agent = SummarizeListAgent(list_to_summarize=items_to_summarize)
+    summaries = await agent.summarize_list()
+
+    print("Original list:", items_to_summarize)
+    print("Summarized results:", summaries)
+
+if __name__ == "__main__":
+    asyncio.run(run_summarize_list_example())
\ No newline at end of file
diff --git a/src/core/binary_classify_list_agent.py b/src/core/binary_classify_list_agent.py
new file mode 100644
index 0000000..87bbc6b
--- /dev/null
+++ b/src/core/binary_classify_list_agent.py
@@ -0,0 +1,36 @@
+import asyncio
+from typing import List, Dict
+from .openai_api import OpenAIClient
+from .logging import Logger  # Project-local logging wrapper
+
+class BinaryClassifyListAgent:
+    def __init__(self, list_to_classify: List[str], criteria: str, max_tokens: int = 1000, temperature: float = 0.0):
+        self.list_to_classify = list_to_classify
+        self.criteria = criteria
+        self.max_tokens = max_tokens
+        self.temperature = temperature  # Note: not currently forwarded to complete_chat
+        self.openai_client = OpenAIClient()
+        self.logger = Logger()
+
+    async def classify_list(self) -> List[Dict]:
+        tasks = []
+        for item in self.list_to_classify:
+            user_prompt = f"Based on the following criteria '{self.criteria}', classify the item '{item}' as true or false."
+            tasks.append(self.classify_item(item, user_prompt))
+
+        results = await asyncio.gather(*tasks)
+        return results
+
+    async def classify_item(self, item: str, user_prompt: str) -> Dict:
+        system_prompt = "You are an assistant tasked with binary classification of items."
+
+        self.logger.info(f"Classifying item: {item}")  # Log the classification request
+
+        response = await self.openai_client.complete_chat([
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt}
+        ], max_tokens=self.max_tokens)
+
+        self.logger.info(f"Received response for item: {item} -> {response.strip()}")  # Log the response
+
+        return {"item": item, "classification": response.strip()}
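`BinaryClassifyListAgent` hands back the model's raw reply under `classification`, so callers still need to turn text such as "True." into a boolean. A minimal sketch, assuming the model begins its reply with a true/false (or yes/no) word; `parse_binary_label` and the sample results are hypothetical and not part of AgentM:

```python
from typing import Optional

# Hypothetical helper, not part of AgentM: maps the agent's free-text
# "classification" onto a bool.
TRUE_WORDS = {"true", "yes"}
FALSE_WORDS = {"false", "no"}


def parse_binary_label(response: str) -> Optional[bool]:
    """Map a reply such as 'True.' or 'false' to a bool.

    Returns None when the reply does not start with a recognizable label,
    so the caller can retry or flag the item instead of guessing.
    """
    words = response.strip().split()
    if not words:
        return None
    # Look only at the first word, ignoring case and surrounding punctuation.
    first = words[0].strip(".,:;!\"'*").lower()
    if first in TRUE_WORDS:
        return True
    if first in FALSE_WORDS:
        return False
    return None


# Made-up results in the shape the agent returns.
results = [
    {"item": "Apple", "classification": "True. Apples are a fruit."},
    {"item": "Chips", "classification": "false"},
    {"item": "Mystery box", "classification": "It depends."},
]
labels = {r["item"]: parse_binary_label(r["classification"]) for r in results}
print(labels)
```

Returning `None` rather than defaulting to `False` keeps ambiguous replies visible instead of silently counting them as negatives.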
diff --git a/src/core/classify_list_agent.py b/src/core/classify_list_agent.py
new file mode 100644
index 0000000..a46799c
--- /dev/null
+++ b/src/core/classify_list_agent.py
@@ -0,0 +1,28 @@
+import asyncio
+from typing import List, Dict
+from .openai_api import OpenAIClient
+
+class ClassifyListAgent:
+    def __init__(self, list_to_classify: List[str], classification_criteria: str, max_tokens: int = 1000):
+        self.list_to_classify = list_to_classify
+        self.classification_criteria = classification_criteria
+        self.max_tokens = max_tokens
+        self.openai_client = OpenAIClient()
+
+    async def classify_list(self) -> List[Dict]:
+        tasks = []
+        for item in self.list_to_classify:
+            user_prompt = f"Classify the item '{item}' according to the following criteria: {self.classification_criteria}."
+            tasks.append(self.classify_item(item, user_prompt))
+
+        results = await asyncio.gather(*tasks)
+        return results
+
+    async def classify_item(self, item: str, user_prompt: str) -> Dict:
+        system_prompt = "You are an assistant tasked with classifying items based on the given criteria."
+        response = await self.openai_client.complete_chat([
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt}
+        ], max_tokens=self.max_tokens)
+
+        return {"item": item, "classification": response.strip()}
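`ClassifyListAgent.classify_list` starts one request per item through `asyncio.gather`, so a long list sends every request at once. One common way to cap that is an `asyncio.Semaphore`. The sketch below is a hypothetical illustration, not AgentM code: `FakeClient` stands in for `OpenAIClient` and mirrors only the `complete_chat(messages, max_tokens=...)` call used above so it can run offline. `classify_bounded` and its `limit` parameter are invented names.

```python
import asyncio
from typing import Dict, List


class FakeClient:
    """Offline stand-in for OpenAIClient (hypothetical); records peak concurrency."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def complete_chat(self, messages: List[Dict[str, str]], max_tokens: int = 1000) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        self.in_flight -= 1
        return f"label for {messages[-1]['content']}"


async def classify_bounded(client: FakeClient, items: List[str], criteria: str,
                           limit: int = 5, max_tokens: int = 1000) -> List[Dict[str, str]]:
    # At most `limit` coroutines may be inside the `async with` block at once.
    semaphore = asyncio.Semaphore(limit)

    async def classify_one(item: str) -> Dict[str, str]:
        user_prompt = f"Classify the item '{item}' according to the following criteria: {criteria}."
        async with semaphore:
            response = await client.complete_chat([
                {"role": "system", "content": "You are an assistant tasked with classifying items based on the given criteria."},
                {"role": "user", "content": user_prompt},
            ], max_tokens=max_tokens)
        return {"item": item, "classification": response.strip()}

    # gather returns results in the same order as the input items.
    return list(await asyncio.gather(*(classify_one(i) for i in items)))


client = FakeClient()
items = [f"item-{n}" for n in range(20)]
results = asyncio.run(classify_bounded(client, items, "fruit or vegetable", limit=5))
print(len(results), client.peak)
```

A suitable `limit` depends on the API's rate limits, which this sketch does not model.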
diff --git a/src/core/generate_object_agent.py b/src/core/generate_object_agent.py
new file mode 100644
index 0000000..52c61df
--- /dev/null
+++ b/src/core/generate_object_agent.py
@@ -0,0 +1,21 @@
+import asyncio
+from typing import Dict
+from .openai_api import OpenAIClient
+
+class GenerateObjectAgent:
+    def __init__(self, object_description: str, goal: str, max_tokens: int = 1000):
+        self.object_description = object_description
+        self.goal = goal
+        self.max_tokens = max_tokens
+        self.openai_client = OpenAIClient()
+
+    async def generate_object(self) -> Dict:
+        system_prompt = f"You are an assistant tasked with generating objects based on a given description. The goal is: {self.goal}."
+        user_prompt = f"Generate an object based on the following description: {self.object_description}."
+
+        response = await self.openai_client.complete_chat([
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt}
+        ], max_tokens=self.max_tokens)
+
+        return {"object_description": self.object_description, "generated_object": response.strip()}
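`GenerateObjectAgent` returns the completion as plain text under `generated_object`. If the prompt asks the model for JSON, a caller might parse the reply as shown below. This is a sketch that relies on that assumption. `parse_generated_object` is a hypothetical helper, and the fallback keys `raw` and `value` are arbitrary choices.

```python
import json
from typing import Any, Dict

# Built at runtime so the literal never closes a Markdown fence.
FENCE = "`" * 3


def parse_generated_object(raw: str) -> Dict[str, Any]:
    """Interpret a model reply as a JSON object, falling back to raw text."""
    text = raw.strip()
    # Chat models sometimes wrap JSON in a fenced block; drop the fence lines.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # opening fence may carry a language tag
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: keep the original reply so nothing is lost.
        return {"raw": raw.strip()}
    # Wrap non-object JSON (lists, numbers) so callers always get a dict.
    return parsed if isinstance(parsed, dict) else {"value": parsed}


reply = FENCE + 'json\n{"parts": ["conveyor belt", "camera"]}\n' + FENCE
print(parse_generated_object(reply))
```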
diff --git a/src/core/grounded_answer_agent.py b/src/core/grounded_answer_agent.py
new file mode 100644
index 0000000..7763b52
--- /dev/null
+++ b/src/core/grounded_answer_agent.py
@@ -0,0 +1,47 @@
+import asyncio
+import json
+import jsonschema
+from typing import Dict
+from .openai_api import OpenAIClient
+
+class GroundedAnswerAgent:
+    def __init__(self, question: str, context: str, instructions: str = '', max_tokens: int = 1000):
+        self.question = question
+        self.context = context
+        self.instructions = instructions
+        self.max_tokens = max_tokens
+        self.openai_client = OpenAIClient()
+
+    # JSON schema for validation
+    schema = {
+        "type": "object",
+        "properties": {
+            "explanation": {"type": "string"},
+            "answer": {"type": "string"}
+        },
+        "required": ["explanation", "answer"],
+        "additionalProperties": False
+    }
+
+    async def answer(self) -> Dict:
+        return await self.grounded_answer()
+
+    async def grounded_answer(self) -> Dict:
+ system_prompt = f"\n{self.context}\n\n\nBase your answer only on the information provided in the above .\nReturn your answer using the JSON