Advice needed: text summarization via a pre-trained model on a local computer #587
Hello everyone, I intend to do text summarization using some (any) pretrained model, which I would load from my local computer.
If something like this is possible, I am interested in:
Any help/online resource/... is welcome.

Comments
There aren't many pretrained TF models for summarization on Hugging Face, and I think those models tend to be saved in Keras format anyway, so they will need converting into TF SavedModel format before they can be used with TF Java. I'd probably start with something like this - https://huggingface.co/google/flan-t5-large, but you could also use a decoder-only model (i.e. an LLM), though there aren't many of those in TF or Keras h5 format either (maybe Google's Gemma model?). You'll need to load in the tokenizer (which for T5 is sentencepiece) and then tokenize the inputs, before passing them into TF-Java to get the predicted output token, then loop it back around and keep predicting until you hit a termination condition (like the end-of-sequence token).

To be honest, it might be simpler to use jlama or llama3.java with a pre-trained Llama 3 checkpoint. Those models are fairly good at summarization, and those libraries already have the tokenization and the token generation loop sorted, which you'd otherwise need to implement on top of TF-Java. I don't think either of them supports GPUs, so it depends how big your workload is in terms of batch size and latency.
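For reference, once a checkpoint has been converted to SavedModel format, loading it in TF Java looks roughly like the minimal sketch below; the path is a placeholder, and printing the signatures is a handy way to discover the export's actual input/output tensor names:

```java
import org.tensorflow.SavedModelBundle;

public class LoadModel {
    public static void main(String[] args) {
        // "serve" is the standard tag for models exported for inference;
        // the path is a placeholder for wherever the SavedModel was written.
        try (SavedModelBundle model = SavedModelBundle.load("/path/to/saved_model", "serve")) {
            // List the serving signatures to find the exact feed/fetch names,
            // which differ between exports.
            model.signatures().forEach(sig -> System.out.println(sig));
        }
    }
}
```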
@Craigacp, thanks for the advice! If I understand correctly, the tokenizer I would have to use is SentencePiece in order to tokenize the inputs (I would try with the pretrained pegasus-xsum, but it uses the same tokenizer, so it doesn't matter). At this point I have generated the SavedModel format from the pegasus-xsum model (i.e. I got the .pb file, the variables directory, etc.). Now I wonder, does the TensorFlow Java API support SentencePiece somehow or not?
...and, for example, Deep Java Library (DJL) has a SentencePiece implementation.
You can put the sentencepiece op from tensorflow-text into a TF graph, but that is a bit of a pain to do from Java and will require you to understand how TF works on a deeper level. Otherwise DJL's wrapper should be fine, and there are others available. You'll still need to write the generative loop yourself, passing in the input tokens, the previously generated tokens and the key-value cache, then building a sampling mechanism.
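For illustration, a tokenization round-trip with DJL's sentencepiece extension might look like the sketch below, assuming the ai.djl.sentencepiece artifact is on the classpath; the model path is a placeholder, and spiece.model is the vocabulary file that ships with T5/Pegasus checkpoints:

```java
import ai.djl.sentencepiece.SpProcessor;
import ai.djl.sentencepiece.SpTokenizer;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.Arrays;

public class TokenizeExample {
    public static void main(String[] args) throws IOException {
        // Load the sentencepiece model file distributed with the checkpoint.
        try (SpTokenizer tokenizer = new SpTokenizer(Paths.get("/path/to/spiece.model"))) {
            SpProcessor processor = tokenizer.getProcessor();
            // Encode text into the token ids the summarization model expects.
            int[] ids = processor.encode("The quick brown fox jumps over the lazy dog.");
            System.out.println(Arrays.toString(ids));
            // Decode ids back into text, e.g. for a generated summary.
            System.out.println(processor.decode(ids));
        }
    }
}
```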
Thanks @Craigacp, I made some progress by using the DJL SentencePiece implementation together with the TF SavedModel I made, and I successfully tokenized the input text. But I got stuck with what you said:
The output of the model will be a probability distribution over tokens. The simplest thing is to do "greedy decoding", where you pick the most likely token (i.e. the one with the highest probability), then you append that token id to your input tokens and run inference again.
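A rough sketch of that loop against a Pegasus/T5-style SavedModel in TF Java might look like the following; the feed/fetch names, the decoder start id, and the EOS id are all assumptions that need checking against the actual export (e.g. via model.signatures()):

```java
import org.tensorflow.SavedModelBundle;
import org.tensorflow.ndarray.StdArrays;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.TInt32;

import java.util.ArrayList;
import java.util.List;

public class GreedyDecoder {

    // Greedy decoding sketch for an encoder-decoder model like Pegasus/T5.
    // The tensor names and special token ids are assumptions; inspect the
    // model's signatures to find the real ones for your export.
    static List<Integer> greedyDecode(SavedModelBundle model, int[] inputIds,
                                      int startId, int eosId, int maxLen) {
        List<Integer> output = new ArrayList<>();
        output.add(startId); // decoder start token
        for (int step = 0; step < maxLen; step++) {
            int[] decoder = output.stream().mapToInt(Integer::intValue).toArray();
            try (TInt32 enc = TInt32.tensorOf(StdArrays.ndCopyOf(new int[][] {inputIds}));
                 TInt32 dec = TInt32.tensorOf(StdArrays.ndCopyOf(new int[][] {decoder}));
                 TFloat32 logits = (TFloat32) model.session().runner()
                         .feed("serving_default_input_ids:0", enc)
                         .feed("serving_default_decoder_input_ids:0", dec)
                         .fetch("StatefulPartitionedCall:0")
                         .run()
                         .get(0)) {
                // logits has shape [1, decoderLength, vocabSize]; take the
                // distribution at the last decoder position and pick the
                // highest-scoring token (greedy decoding).
                long last = logits.shape().size(1) - 1;
                long vocab = logits.shape().size(2);
                int best = 0;
                float bestScore = Float.NEGATIVE_INFINITY;
                for (long v = 0; v < vocab; v++) {
                    float score = logits.getFloat(0, last, v);
                    if (score > bestScore) {
                        bestScore = score;
                        best = (int) v;
                    }
                }
                if (best == eosId) {
                    break;           // termination condition
                }
                output.add(best);    // append and run inference again
            }
        }
        return output;
    }
}
```

Swapping the argmax for a sampling scheme (top-k, temperature) and threading through a key-value cache, as mentioned earlier, are refinements of the same loop; greedy argmax is just the simplest version that produces a summary.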
OK, here is what I am doing:
- Question 1: is this a good way, or is something missing?
I am getting: