
Possible to omit document content altogether? #16

Open
ronjakoi opened this issue Jan 24, 2019 · 3 comments

Comments

@ronjakoi

My crawler does a language detection on crawled documents and then assigns data such as "content", "title" and "description" to different fields based on the language detected. I use ScriptTagger for this.

So my Solr schema doesn't actually have a "content" field, but the Norconex Solr Committer still sends a field called "content". This results in an error:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://myserver.fi/solr/mycollection: ERROR: [doc=https://www.example.com/] unknown field 'content'

The documentation says I can use <sourceContentField> and <targetContentField> to rename the content field. But is there a way to remove it completely? After all, I have in a way already renamed it with the ScriptTagger in the Importer phase.

@essiembre
Contributor

Right now, I am not sure you can simply ignore it in the Committer. You can do it in Solr, though, if you are using a managed schema:

<dynamicField name="*" type="ignored" multiValued="true" />
<!-- If "ignored" is not defined as a field type in your Solr version: -->
<fieldType name="ignored" indexed="false" stored="false" class="solr.StrField" />

Another idea is to take one of your existing fields and define it as both the source and target content field. For example, if you tell the Committer that your content source is "title" and that the target field for it is also called "title", no separate "content" field is sent. That is a way to fool it.
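
A minimal sketch of that second workaround in the committer configuration might look like the following. The class name and the <solrURL> element are assumed from the Solr Committer 2.x documentation, and "title" is only an example of an existing field, so check all of these against the version you run:

```xml
<!-- Sketch only: element names are assumed from the Solr Committer 2.x docs. -->
<committer class="com.norconex.committer.solr.SolrCommitter">
  <solrURL>https://myserver.fi/solr/mycollection</solrURL>
  <!-- Take the "content" from an existing field instead of the document body. -->
  <sourceContentField>title</sourceContentField>
  <!-- Write it back under the same name, so no "content" field is submitted. -->
  <targetContentField>title</targetContentField>
</committer>
```

Whether the chosen field keeps its original value unchanged may depend on the committer version, so it is worth inspecting one committed document in Solr before relying on this.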

Let me know if you would like to make this a feature request to be able to skip submitting the content with a flag.

@ronjakoi
Author

Thank you, the proposed solution of ignoring the content in Solr with a dynamic field works well. This is sufficient for my use case, but I suppose it wouldn't hurt to make a feature request, because it may be useful for someone else.

@essiembre
Contributor

Thanks for confirming. Feature request it is. :-)
