@davidsbatista This sounds great! One idea I had is a way to indicate that we'd like to use something like NLTK for sentence splitting. Normally, I think the list of separators would look like ["\n\n", ".", " "] to split by paragraph, then by sentence, and then by word. I was wondering if we could replace "." with something like "nltk", or some other tag, to indicate that a separate algorithm should handle the splitting at that level.
Also, I wanted to ask: will the splitting by separators (e.g. ["\n\n", ".", " "]) be handled by a regex splitter? I think supporting regex would be great, since we could then provide more complicated separators to better handle complex documents and do things like header detection.
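To make both suggestions concrete, here is a minimal sketch of how they could fit together. It is not the proposed implementation: `SPLITTERS`, `naive_sentence_split`, `split_once`, `recursive_split`, the `"sentence"` tag, and the `max_len` parameter are all hypothetical names made up for illustration. Every separator is treated as a regex unless it matches a registered tag, in which case a dedicated function does the splitting. The sentence splitter here is a simple regex stand-in so the example needs only the standard library; `nltk.sent_tokenize` could be registered in its place.

```python
import re
from typing import Callable, Dict, List


def naive_sentence_split(text: str) -> List[str]:
    # Stand-in for a real sentence tokenizer such as nltk.sent_tokenize:
    # split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


# Hypothetical registry mapping a tag to a custom splitting function.
SPLITTERS: Dict[str, Callable[[str], List[str]]] = {
    "sentence": naive_sentence_split,
}


def split_once(text: str, separator: str) -> List[str]:
    # A registered tag delegates to its function; anything else is a regex.
    if separator in SPLITTERS:
        return SPLITTERS[separator](text)
    return [piece for piece in re.split(separator, text) if piece]


def recursive_split(text: str, separators: List[str], max_len: int) -> List[str]:
    # Stop when the text is short enough or no separators remain
    # (in the latter case the text may still exceed max_len).
    if len(text) <= max_len or not separators:
        return [text]
    head, *rest = separators
    chunks: List[str] = []
    for piece in split_once(text, head):
        chunks.extend(recursive_split(piece, rest, max_len))
    return chunks


text = "First sentence. Second one!\n\nNew paragraph here."
chunks = recursive_split(text, [r"\n\n", "sentence", r"\s+"], max_len=20)
# chunks == ["First sentence.", "Second one!", "New paragraph here."]
```

Two caveats with this sketch: once separators are regexes, a literal "." has to be written as `r"\."`, because an unescaped "." matches any character; and the sketch drops the separators and never merges small pieces back into larger chunks, which a real splitter would probably need to do.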
Use a set of predefined separators to split text recursively. The process follows these steps: