Is there a way to chunk the acquired content? #6

Open
ki-suzuki opened this issue Sep 8, 2023 · 3 comments

ki-suzuki commented Sep 8, 2023

Is there a method to chunk the acquired content when committing it to Azure Search? If so, I would like to learn about it.


ohtwadi commented Sep 8, 2023

If you are talking about batching multiple documents, then yes. In fact, this is done by default. Please take a look at the documentation.

<queue
      class="com.norconex.committer.core3.batch.queue.impl.FSQueue">
    <batchSize>
      (Optional number of documents queued after which we process a batch.
       Default is 20.)
    </batchSize>
...

@ki-suzuki
Author

@ohtwadi
Thank you for your prompt response.

I don't think this is quite what I am looking for.

Here are the details.
For instance, after retrieving the body of an HTML page, if the body content is very large, I want to divide it into several smaller chunks and submit them to the Azure Cognitive Search index as separate records.

Is there a way to achieve this?

Thank you.


ohtwadi commented Sep 12, 2023

Does the DOMSplitter fit the bill for you?
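
For reference, a minimal sketch of what that could look like, assuming the Importer v3 DOMSplitter handler is registered as a pre-parse handler. The `section` selector is purely illustrative and assumes the page wraps its content in `<section>` elements; it would need to match the site's actual markup. Each element matched by the selector should become its own child document, which would then be committed as a separate record. Note that this splits by DOM structure rather than by content size.

```xml
<importer>
  <preParseHandlers>
    <!-- Assumption: content blocks are marked up as <section> elements.
         Replace the selector with one that matches the crawled pages. -->
    <handler class="com.norconex.importer.handler.splitter.impl.DOMSplitter"
        selector="section" />
  </preParseHandlers>
</importer>
```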
