Timelines #8

Open
Gautam-Rajeev opened this issue May 1, 2024 · 2 comments

@Gautam-Rajeev

Jeet / Aryan:

29th April: Break the project into subparts and clean up the initial documents.
13th May: Fully stitched-together MVP that can ask the user questions based on the document type.
27th May: Evaluation framework fully set up for the overall system and for its individual parts; benchmark test set created; all models set up as APIs interacting with one another.
10th June: Improvements to the individual components; Hindi support for PDF breakdown into schema; language components added to the user interface.
24th June:
8th July:
22nd July:

@Grinzypino

Grinzypino commented May 2, 2024

Weekly Learnings & Updates (Jeet)

Week 1 & 2:

  • Documented all 6 tickets and drew new flowcharts for the few that were unclear.
  • Researched methods for building an information sufficiency check system for the copilot.
  • Discarded approaches such as decision trees and hard-coded questions; chose to use LLMs instead.
  • Started building a Flask app copilot that uses an LLM for the information sufficiency check.
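
A minimal sketch of what an LLM-based information sufficiency check could look like, kept separate from Flask so it runs on its own. Everything named here is an assumption for illustration: the field names in `REQUIRED_FIELDS`, the prompt wording, the JSON reply format, and the `ask_llm` callable (a placeholder for whatever model API the copilot ends up using). `fake_llm` is only a stand-in so the sketch runs offline.

```python
import json
from typing import Callable

# Hypothetical required fields; the real schema is not given in this thread.
REQUIRED_FIELDS = ["case_number", "petitioner", "respondent", "order_date"]


def build_prompt(known: dict, required: list[str]) -> str:
    """Ask the model which required fields are missing and what to ask next."""
    return (
        "You are helping draft a court order.\n"
        f"Required fields: {json.dumps(required)}\n"
        f"Information collected so far: {json.dumps(known, ensure_ascii=False)}\n"
        'Reply with JSON only: {"missing": [...], "next_question": "..."}'
    )


def check_sufficiency(known: dict, ask_llm: Callable[[str], str]) -> dict:
    """Return the model's verdict; fall back to a plain key check on bad output."""
    try:
        verdict = json.loads(ask_llm(build_prompt(known, REQUIRED_FIELDS)))
        # Keep only field names we actually know about.
        missing = [f for f in verdict["missing"] if f in REQUIRED_FIELDS]
        question = str(verdict.get("next_question", ""))
    except (json.JSONDecodeError, KeyError, TypeError):
        # The model reply was not usable JSON, so check the keys directly.
        missing = [f for f in REQUIRED_FIELDS if not known.get(f)]
        question = f"Please provide: {missing[0]}" if missing else ""
    return {"sufficient": not missing, "missing": missing, "next_question": question}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return json.dumps({"missing": ["order_date"], "next_question": "What is the order date?"})
```

Calling `check_sufficiency({"case_number": "12/2024"}, fake_llm)` returns a dict with `sufficient` set to `False` and the follow-up question supplied by the stub. A Flask route could wrap this function, but that wiring is left out here since the app's design is not described in the thread.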

@AryanPrakhar
Copy link
Contributor

Weekly Learnings & Updates (Aryan)


Week 1 & 2:

  • Documented the discussion around the project implementation strategy.
  • Took on tickets 1 and 3: 'Document Analysis and Section Building' and 'Closeness Evaluation'.
  • Researched the best methods for extracting semantics from the documents, especially for Hindi.
  • The initial idea for semantics extraction was to translate the orders to English and apply Named Entity Recognition, Dependency Parsing and Semantic Role Labelling.
  • While looking for solutions, I found that LLMs do really well on the information retrieval task, even for Hindi documents.
  • Read about storing retrieved information in a JSON schema. Experimented with this capability and confirmed that LLMs perform quite well at it.
  • Read the provided court documents and identified their common components. Created a JSON schema as a baseline format for all the information required to draft a court order.
  • Tried the same with an LLM; the JSON schema was retrieved accurately.
  • Given an example court order, checked how accurately the LLM could generate a court order draft. Got great results.
  • Researching suitable measures for the closeness of generated court orders. Read about text distance, but it won't be the best match for our case.
  • Instead, looking at ROUGE/BLEU, LLM-based evaluation, semantic similarity matching, etc., following the mentors' feedback.
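
To make the closeness idea concrete, here is a rough sketch of a ROUGE-1 style score (unigram overlap) in plain Python. It is a simplified stand-in and not the official ROUGE implementation: tokenisation is just lowercasing plus whitespace splitting, there is no stemming, and the function name `rouge1` and the sample sentences are made up for illustration.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram precision, recall and F1 between two whitespace-tokenised texts."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the appeal is dismissed with costs", "the appeal is allowed")
```

For this pair the precision is 0.75, the recall is 0.5 and the F1 is roughly 0.6, even though "dismissed" and "allowed" mean opposite things. That is one reason pure n-gram overlap may need to be combined with the LLM-based or semantic similarity options listed above.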
