Commit

remove intution note
MadcowD committed Dec 16, 2024
1 parent 0e2759c commit 4f8fb94
Showing 1 changed file with 0 additions and 4 deletions.
4 changes: 0 additions & 4 deletions docs/src/core_concepts/evaluations.rst
@@ -14,10 +14,6 @@ Prompt engineering without evaluations is often characterized by subjective asse

Without evaluations, there is no systematic way to ensure that a revised prompt actually improves performance on the desired tasks. There is no guarantee that adjusting a single detail in the prompt to improve outputs on one example does not degrade outputs elsewhere. Over time, as prompt engineers read through too many model responses, they become either desensitized to quality issues or hypersensitive to minor flaws. This miscalibration saps productivity and leads to unprincipled prompt tuning. Subjective judgment cannot scale, fails to capture statistical performance trends, and offers no verifiable path to satisfy external stakeholders who demand reliability, accuracy, or compliance with given standards.

.. note::

   The intuitive, trial-and-error style of prompt engineering can be visually depicted. Imagine a simple diagram in ell Studio (ell’s local, version-controlled dashboard) that shows a single prompt evolving over time, each modification recorded and compared. Without evaluations, this “diff” of prompt versions tells us only that the code changed, not whether it changed for the better.


The Concept of Evals
--------------------