ERROR - Could not commit batched operations #4
Collector Config file:

```xml
<httpcollector id="Collector1">

  #set($http = "com.norconex.collector.http")
  #set($core = "com.norconex.collector.core")
  #set($urlNormalizer = "${http}.url.impl.GenericURLNormalizer")
  #set($filterExtension = "${core}.filter.impl.ExtensionReferenceFilter")
  #set($filterRegexRef = "${core}.filter.impl.RegexReferenceFilter")
  #set($urlFilter = "com.norconex.collector.http.filter.impl.RegexURLFilter")

  <crawlerDefaults>
    <urlNormalizer class="$urlNormalizer" />
    <numThreads>4</numThreads>
    <maxDepth>1</maxDepth>
    <maxDocuments>-1</maxDocuments>
    <workDir>./norconexcollector</workDir>
    <orphansStrategy>DELETE</orphansStrategy>
    <delay default="0" />
    <sitemapResolverFactory ignore="false" />
    <robotsTxt ignore="true" />
    <referenceFilters>
      <filter class="$filterExtension" onMatch="exclude">jpg,jpeg,svg,gif,png,ico,css,js,xlsx,pdf,zip,xml</filter>
    </referenceFilters>
  </crawlerDefaults>

  <crawlers>
    <crawler id="CrawlerID">
      <startURLs stayOnDomain="true" stayOnPort="false" stayOnProtocol="false">
        <sitemap>https://*******.com/sitemap.xml</sitemap>
      </startURLs>
      <importer>
        <postParseHandlers>
          <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO" />
          <tagger class="com.norconex.importer.handler.tagger.impl.KeepOnlyTagger">
            <fields>document.reference,title,description,content</fields>
          </tagger>
          <tagger class="com.norconex.importer.handler.tagger.impl.RenameTagger">
            <rename fromField="document.reference" toField="reference"/>
          </tagger>
          <transformer class="com.norconex.importer.handler.transformer.impl.ReduceConsecutivesTransformer">
            <!-- carriage return -->
            <reduce>\r</reduce>
            <!-- new line -->
            <reduce>\n</reduce>
            <!-- tab -->
            <reduce>\t</reduce>
            <!-- whitespaces -->
            <reduce>\s</reduce>
          </transformer>
          <transformer class="com.norconex.importer.handler.transformer.impl.ReplaceTransformer">
            <replace>
              <fromValue>\n</fromValue>
              <toValue></toValue>
            </replace>
            <replace>
              <fromValue>\t</fromValue>
              <toValue></toValue>
            </replace>
          </transformer>
        </postParseHandlers>
      </importer>
      <!-- Azure committer settings -->
      <committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
        <endpoint>********</endpoint>
        <apiKey>***********</apiKey>
        <indexName>**********</indexName>
        <maxRetries>3</maxRetries>
        <targetContentField>content</targetContentField>
        <queueDir>./queuedir</queueDir>
        <queueSize>6000</queueSize>
      </committer>
    </crawler>
  </crawlers>
</httpcollector>
```
This error is coming from Azure. Is it possible you have large documents? Online research suggests this error occurs when an upload is too big. I would suggest you try lowering your committer batch size to 10 (or lower, from the default of 100) to see if it makes a difference. Many Azure/IIS users report this problem, and the upload limit appears to be configurable. For instance, this Microsoft thread gives you a few options: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/d729a842-8ed9-466e-9ba8-4256ea294548/http11-413-request-entity-too-large?forum=biztalkgeneral An excerpt:
Hopefully this gives you a few pointers; otherwise, you will have to ask Azure support how to increase the limit.
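For illustration, the batch-size suggestion above could look something like this in the committer block. This is a sketch assuming the `commitBatchSize` option from Norconex Committer Core (which defaults to 100); verify the exact tag name against the documentation for your committer version:

```xml
<committer class="com.norconex.committer.azuresearch.AzureSearchCommitter">
  <endpoint>********</endpoint>
  <apiKey>***********</apiKey>
  <indexName>**********</indexName>
  <!-- Send smaller batches so each request stays under Azure's payload limit -->
  <commitBatchSize>10</commitBatchSize>
  <maxRetries>3</maxRetries>
  <targetContentField>content</targetContentField>
  <queueDir>./queuedir</queueDir>
  <queueSize>6000</queueSize>
</committer>
```

Smaller batches mean more HTTP requests, so committing will be slower, but each request body stays well under the server's size limit.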
Hello @essiembre, thanks for the quick update. I will check the setting and will ask Azure support if that does not solve it. Br,
Hi,
I am trying to crawl a sitemap XML file that contains a large number of URLs. The crawl completes (563 processed/563 total), but I get an error when committing to Azure.
I have tried running the Norconex collector many times.
Command being used: collector-http.bat -a start -c collectorconfig.xml
PFB error details from logs.
Can you please advise what needs to be done?
Br,
Akash