Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* more metrics rambling
* basic evaluation class
* todos
* fix bedrock
* more ramblings
* switch to logging
* more ramblings
* thoughts
* working evals
* eval update
* Delete docs/ramblings/test.py
* Delete docs/ramblings/typeddictpartial.py
* eval update
* cleaner specification of criteria
* more spec discussions around chat and yielding
* .
* .
* interesting human eval spec
* refactor types locations
* new back relationships for evaluation run summaries
* some more interesting thoughts on human feedback and data collection
* adding idea of an invocation group but not fully implemented
* working evals with road to summarization
* .
* example of a discriminator
* additional update
* solve the no dataset problem
* working writing to the db
* base models
* data model stubs
* working serverside api
* eval cards
* .
* working eval cards
* shrinking version badges?
* horizontal layout
* cleaner layout
* trendlines
* clean metrics
* latest run summary and invocation plots!
* basic eval page
* generalize version history
* idea for the metrics page.
* more goofing
* unified icon
* refactored computation graph a bit more
* better layout algorithm
* metric displays
* metrics
* beautiful
* .
* added error bars
* .
* data now comes in real time
* enable parallel writes
* fix writing to wrong graph
* .
* .
* .
* refactor evaluation location
* evaluations
* a bit of cleanup & refactor
* local evaluation utils
* local util refactor pt 2
* fix bug where invocation id interface didn't work when the store wasn't set.
* new tests
* more tests
* additional refactor todo
* refactor for serialization module
* a bit cleaner version of the evaluator now
* push
* streaming evaluations.
* in progress display
* hierarchical cluster sorting for parallel execution.
* partial labelers
* .
* convert to flat representation
* flat
* .
* new individual evaluation page
* label display
* is constant subtle bras
* fixing sidebar bug
* expandable groups
* fix bool bug
* histogram view
* adding histogram and better sorting
* working search and filters for the evals
* metrics show up now in the sidebar
* fix null renderer bug
* some clean asserts
* passing tests
* evals
* slightly improved tables
* all versions of evaluations
* evals folder
* dataset hash -> dataset id
* dataset storage
* dataset view ish
* dataset page
* migration for existing ell studio databases
* fixed migrations and module separation
* preferring main
* cleanup evals
* python 3.9 fix
* clean up eval list
* 3.9 lru cache fix