My crawler performs language detection on crawled documents and then assigns data such as "content", "title" and "description" to different fields depending on the detected language. I use ScriptTagger for this.
So my Solr schema doesn't actually have a "content" field, but the Norconex Solr Committer still sends a field called "content". This results in an error:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://myserver.fi/solr/mycollection: ERROR: [doc=https://www.example.com/] unknown field 'content'
The documentation says I can use <sourceContentField> and <targetContentField> to rename the content field. But is there a way to remove it completely? After all, I have in a sense already renamed it with the ScriptTagger in the Importer phase.
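For reference, a rename with these two elements might look like the sketch below. It assumes the Norconex Solr Committer 2.x configuration style (the `com.norconex.committer.solr.SolrCommitter` class name and the `<solrURL>` element are assumptions tied to that version), and the target field name `text` is purely hypothetical. Note that this only renames the field; it does not stop the content from being sent.

```xml
<!-- Sketch only: class name and <solrURL> assume Norconex Solr Committer 2.x -->
<committer class="com.norconex.committer.solr.SolrCommitter">
  <solrURL>https://myserver.fi/solr/mycollection</solrURL>
  <!-- Hypothetical: send the document body to a Solr field named "text" instead of "content" -->
  <targetContentField>text</targetContentField>
</committer>
```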
Right now, I'm not sure you can simply ignore it in the Committer. You can do it in Solr, though, if you are using a managed schema:
<dynamicField name="*" type="ignored" multiValued="true" />
<!-- If "ignored" is not defined as a field type in your Solr version: -->
<fieldType name="ignored" indexed="false" stored="false" class="solr.StrField" />
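The catch-all `*` dynamic field silently discards every field the schema does not otherwise declare, so a typo in a field name will no longer produce an error either. If you only want to drop "content", a narrower variant (a sketch, reusing the same "ignored" field type) is to declare just that one field as ignored:

```xml
<!-- Ignore only "content"; any other unknown field still triggers an "unknown field" error -->
<field name="content" type="ignored" multiValued="true" />
```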
Another idea is to take one of your existing fields and define it as both the source and the target content field. For example, if you tell the Committer that your content source is your "title" field and that the target field for it is also called "title", then that is a way to fool it.
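A sketch of that workaround is shown below. As above, the class name and `<solrURL>` element assume the Norconex Solr Committer 2.x configuration style; "title" must be a field that actually exists in your Solr schema, so in a language-specific setup you would substitute whichever title field you define.

```xml
<!-- Sketch only: class name and <solrURL> assume Norconex Solr Committer 2.x -->
<committer class="com.norconex.committer.solr.SolrCommitter">
  <solrURL>https://myserver.fi/solr/mycollection</solrURL>
  <!-- Read the "content" value from the existing "title" field... -->
  <sourceContentField>title</sourceContentField>
  <!-- ...and write it back to "title", so no separate "content" field is sent -->
  <targetContentField>title</targetContentField>
</committer>
```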
Let me know if you would like to turn this into a feature request for a flag that skips submitting the content.
Thank you, the proposed solution of ignoring the content in Solr with a dynamic field works well. This is sufficient for my use case, but I suppose it wouldn't hurt to make a feature request, because it may be useful for someone else.