OBQnA
OBQnA is a high-level, object-oriented Python package that aims to provide an easy and intuitive way to create an OpenBook Question ‘n’ Answer system.
The package parses PDF files with Apache Tika, splits the corpus into passages and computes a dense vector representation of each passage with a Transformer NLP model. For each question, the system performs Dense Passage Retrieval using an efficient similarity search library (Faiss, ScaNN or Annoy) and then extracts the answer from the retrieved passages.
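The retrieval half of this pipeline can be illustrated with a minimal, dependency-free sketch. It is not OBQnA's code: `split_into_passages`, `tokenize`, `ToyEncoder` and `top_k` are hypothetical names, the sliding-window splitter (window size and stride counted in words) is an assumed chunking strategy, and the term-frequency `ToyEncoder` only stands in for the Transformer encoder. What the sketch does show is the retrieval rule itself: once vectors are L2-normalised, ranking passages by inner product is the same as ranking them by cosine similarity, and `top_k` performs that ranking by brute force.

```python
import math
import re


def split_into_passages(text, size=100, stride=50):
    """Split text into passages of `size` words, starting a new one every `stride` words.

    Consecutive passages overlap by `size - stride` words; the last one may be shorter.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    words = text.split()
    if not words:
        return []
    passages, start = [], 0
    while True:
        passages.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end of the text
            break
        start += stride
    return passages


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


class ToyEncoder:
    """Hypothetical stand-in for a Transformer encoder.

    Maps text to an L2-normalised term-frequency vector over the corpus vocabulary.
    """

    def __init__(self, corpus):
        vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
        self.index = {tok: i for i, tok in enumerate(vocab)}

    def encode(self, text):
        vec = [0.0] * len(self.index)
        for tok in tokenize(text):
            if tok in self.index:  # out-of-vocabulary tokens are ignored
                vec[self.index[tok]] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


def top_k(query_vec, passage_vecs, k=3):
    """Exact (brute-force) inner-product search: return the k best (score, index) pairs."""
    scores = [
        (sum(q * p for q, p in zip(query_vec, vec)), i)
        for i, vec in enumerate(passage_vecs)
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores[:k]


text = (
    "Galadriel is the Lady of Lorien. "
    "Isildur was the son of Elendil. "
    "Boromir was the son of Lord Denethor."
)
passages = split_into_passages(text, size=8, stride=4)
encoder = ToyEncoder(passages)
passage_vecs = [encoder.encode(p) for p in passages]
hits = top_k(encoder.encode("Who was the father of Boromir?"), passage_vecs, k=1)
print(passages[hits[0][1]])  # → Boromir was the son of Lord Denethor.
```

A linear scan like `top_k` costs one dot product per passage for every question. Faiss can run the same exact search much faster (for example with its flat inner-product index), and Faiss, ScaNN and Annoy also provide approximate indexes that trade a little recall for speed on large corpora.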
To install, simply run: pip install -r requirements.txt
- Note: to use a GPU, install CUDA.
For a more detailed explanation, please read the Documentation.
```python
from obqna.process import PDFParser, Passages
from obqna.qa import QuestionAnswering

# Parse and clean the PDF books
parser = PDFParser("../books/")  # Path of PDFs
books = parser.parse()
books = parser.clean(books)

# Split the books into passages
passages = Passages()
corpus = passages.df2passages(books)

# Build the searcher over the passages
searcher_type = "scann"  # other choices: "faiss", "annoy"
qna = QuestionAnswering(searcher_type)
qna.prepare(corpus)

questions = [
    "Who is Galadriel?",
    "Who is Isildur?",
    "Who is Boromir's father?",
    "Who is Aragorn?",
    "Was the ring destroyed?",
    "What language is on the One Ring inscription?",
]

for question in questions:
    print(f"Question: {question}: ")
    results = qna.ask(question)
    print(f"Answer: {results['answer']}")
    print("-" * 10)
```
Output:

```text
Question: Who is Galadriel?:
Answer: The Lady of Lorien
----------
Question: Who is Isildur?:
Answer: Elendils son
----------
Question: Who is Boromir's father?:
Answer: Lord Denethor
----------
Question: Who is Aragorn?:
Answer: Heir of Isildur
----------
Question: Was the ring destroyed?:
Answer: it perished from the world in the ruin of his first realm
----------
Question: What language is on the One Ring inscription?:
Answer: Black Speech
----------
```
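The answers above read like spans lifted from the books rather than freshly generated sentences, which fits the answer-extraction step described at the top. A common decoding rule for BERT-style extractive readers is to score every token as a possible answer start and end, then keep the pair with the highest combined score whose start is no later than its end. The following is a minimal sketch of that rule only: `best_span` is a hypothetical helper, the logits are made-up numbers rather than real model outputs, and whether OBQnA's reader decodes answers exactly this way is an assumption.

```python
def best_span(tokens, start_logits, end_logits, max_answer_len=15):
    """Pick the span (i, j) with i <= j < i + max_answer_len maximising start[i] + end[j].

    Returns (score, answer_text); answer_text joins tokens[i..j] with spaces.
    """
    best_score, best_i, best_j = float("-inf"), 0, -1
    for i, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_score + end_logits[j]
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    return best_score, " ".join(tokens[best_i:best_j + 1])


# Hypothetical reader scores for one retrieved passage (not real model output).
tokens = ["Boromir", "was", "the", "son", "of", "Lord", "Denethor"]
start_logits = [0.1, -1.0, -0.5, 0.2, -0.3, 2.5, 1.0]
end_logits = [-0.2, -1.0, -0.4, 0.3, -0.5, 0.4, 3.0]

score, answer = best_span(tokens, start_logits, end_logits)
print(answer, score)  # → Lord Denethor 5.5
```

Real readers typically also skip question tokens and special tokens and may convert scores to probabilities; those details are left out here to keep the span-selection step visible.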