Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

example use case for executable command + vcr #154

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions mob-sessions-retros/2023-12-17 Executable VCR.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# Todos

- [ ] example use case for vcr-like functionality, for instance **prompt an llm to rename variable**
- expected behavior is:
- input: prompt (`"suggest a better name for var X in the following code"`)
- please rename the name foo to smthg better

---

```
function foo(a,b) {
return a +b
}
```
- output:
- [ ] build it
- [ ]


Events:
Nitsan:
- Said hello
- Decided to find a work item
- wanted to release for the person who asked for it in our github issues and couldn't because we are not Llewellyn
- Dropped that
- Nitsan suggested ideas from his own private todo list
- Picked the VCR thing
- Made a TODO
- Quickly went to writing a pseudo-code test for the purpose of documenting the intention
- The pseudo-code became code.
- It wasn't perfect, but the pseudo-code became code
- Approvaltests got us so far without any modifications
Susan:
- Made a barebones todo list
- Went into the specifics on the first item in the todo list
- Wrote the test cases we wanted to test:
- That we sent the correct command and that we receive the expected output
- Skipped the second failing test so we can focus on the first test
- Built the first test
- Made the CallLLM class
- put in the command
- added the code to call the OpenAI API
- Modified the prompt so that the response was in a format we like

Joss:
- Started by making our goals for the session
- decided on trying to make a vcr test or something that would mimic the functionality of a vcr test
- Goal was to take the output of an LLM and parse it so that we could test only the useful bits of what it was giving us
- did coding
- Swapped 5 minutes to 3 minutes
- Good that we took the time to make sure we understood what we were doing and how to do it
- Did that instead of doing work on other people's turns
-
Jacqueline:
- 10m waiting for everybody, saw Nitsan's new cam
- picked VCR tests with executable commands (wasn't sure how we might build this - so thought of a use case - prompting chatGPT for suggesting renames in code)
- started a TODO list
- Felt that not everybody understands - asked Nitsan to help with explaining
- Nitsan started walking though pseudo-code, that later became code; I would have picked drawing
- There were interesting points: leaning on the ensemble to move forward, even when not knowing exactly what to do - e.g. "prompt here"
- Felling like things were slow - asked for faster rotations - 5m -> 3m
- We got through impl an ExecutableCommand - piecing pieces together from the team

Interesting:

- different interpretations: 1. Lc: just start writing code, move forward vs. 2. wait and explain for better understanding
- Jo: Doing **is** Learning
- Jo: Since we're not working on production code - this helps in focusing on hands-on learning
- Jc: Expect code to start off imperfect, that's what refactoring is for
- Agree on this: instead of talking, write code - bias to action
- Also - expect less from the code at first, doesn't need to be perfect
- Pseudo-code for shared understanding - lowers the expectation for perfection
- Having a variety of perspectives about the events, we're focusing on different aspects of the contributing process. E.g. Susan broke down the structural pieces of what we'e built - te specifics of what we did, and in contrast Joss said "did coding", focusing more about the "Why?" (not the "What?")

Strong Emotions

- I rarely understand at the beginning - but, we can figure it out as an ensemble as we go; when we get to the end, it makes better sense; can still make progress as a Talker by focusing on the intent of the previous Talker.
- Used to not knowing anything; This is a safe place

- Nitsan Enjoys sessions where we get on the same page. This will make the next session faster. Now we all understand the goal.

- Felt like the tooling got in the way when working on approvaltests in gitpod. We don't know who will see editors that pop up. Didn't have intellisense or copilot.
- We can make this better, but

- Having timer be our next role is helpful
- Having the typist start the timer is more transparent
- Less requirements on norms

- Worry - We've added an openai dependency that won't work on CI. Need an openai api key to work
- Can be fixed easily

-

Changes:

- Bias to Action: just start writing the code, BUT we don't expect this code to be good; as Arlo says "perfect is the enemy of good, but good is the enemy of slightly less terrible"; so let's aim for terrible, but it'll show us the direction to go forward
- Experiment: having the typist start the timer
53 changes: 53 additions & 0 deletions tests/executable_commands/test_llm_output.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
from approvaltests import verify, verify_executable_command
import pytest

from approval_utilities.approvaltests.core.executable_command import ExecutableCommand

from openai import OpenAI

client = OpenAI()


def call_llm(prompt: str) -> str:
pass


class CallLlm(ExecutableCommand):
def get_command(self) -> str:
return """
please rename the name foo to smthg better

---

```
function foo(a,b) {
return a +b
}
```

output format:
suggested name - <name here>
"""

def execute_command(self, command: str) -> str:
completion = client.chat.completions.create(
model="gpt-4-1106-preview",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": command},
],
)
return completion.choices[0].message.content


def test_total_output():
verify_executable_command(CallLlm())


def get_new_name(llm_output):
pass


@pytest.mark.skip()
def test_useful_output():
verify(get_new_name(llm_output=reuse_recorded_llm_output()))
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@

please rename the name foo to smthg better

---

```
function foo(a,b) {
return a +b
}
```

output format:
suggested name - <name here>

Loading