Not able to commit the processed items to Elastic search using norconex file system collector #35

Open
sanjeevarayuduuppara opened this issue Feb 13, 2019 · 1 comment

sanjeevarayuduuppara commented Feb 13, 2019

Hi,
I am using the Norconex Filesystem Collector to crawl files from a shared path. I am trying to commit the processed items to both Elasticsearch and a file committer. The items are not committed to Elasticsearch/Solr, but they are saved to the file system.
Please find the config file below. Please help me resolve the issue.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- 
   Copyright 2010-2017 Norconex Inc.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<fscollector id="Text Files">

## Either uncomment or set the following variables or create yourself a 
## sample-config.variables (or properties) with the same variables set.

#set($path = "valid path")
#set($workdir = "E:\filesystem\norconex-collector-filesystem-2.8.0\norconex-collector-filesystem-2.8.0\examples")

#set($tagger = "com.norconex.importer.handler.tagger.impl")
#set($transformer = "com.norconex.importer.handler.transformer.impl")

  <logsDir>${workdir}/logs</logsDir>
  <progressDir>${workdir}/progress</progressDir>


  <crawlers>
    <crawler id="Sample Crawler">

      <workDir>${workdir}</workDir>

      <startPaths>
        <path>${path}</path>
      </startPaths>
      
      <numThreads>2</numThreads>

      <keepDownloads>false</keepDownloads>

      <importer>
        <postParseHandlers>
          <tagger class="${tagger}.ReplaceTagger">
            <replace fromField="samplefield" regex="true">
              <fromValue>ping</fromValue><toValue>pong</toValue>
            </replace>
            <replace fromField="Subject" regex="true">
              <fromValue>Sample to crawl</fromValue><toValue>Sample crawled</toValue>
            </replace>
          </tagger>
        </postParseHandlers>
      </importer>
      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <nodes>http://localhost:9200</nodes>
        <indexName>filetest</indexName>
        <typeName>filetest1</typeName>
      </committer>
      <committer class="com.norconex.committer.core.impl.JSONFileCommitter">
        <directory>${workdir}/jsoncrawledFiles</directory>
        <pretty>true</pretty>
        <!-- <docsPerFile>(max number of docs per JSON file)</docsPerFile> -->
        <!-- <compress>[false|true]</compress> -->
        <splitAddDelete>true</splitAddDelete>
        <fileNamePrefix>test</fileNamePrefix>
        <fileNameSuffix>json</fileNameSuffix>
      </committer>
      <committer class="com.norconex.committer.core.impl.FileSystemCommitter">
        <directory>${workdir}/crawledFiles</directory>
      </committer>
	
    </crawler>
  </crawlers>

</fscollector>
@essiembre
Contributor

You cannot define multiple committers side by side in a crawler the way you are doing: the extra ones are simply ignored. Either use just one or, if you need several, wrap them all in a MultiCommitter.
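
For illustration, here is a minimal sketch of how the three committers from the config above might be wrapped, assuming the `com.norconex.committer.core.impl.MultiCommitter` class from Committer Core 2.x and its convention of nesting `<committer>` elements inside the outer one. The inner settings are copied unchanged from the original post. Check the nesting syntax against the MultiCommitter documentation for the installed version.

```xml
<!-- Sketch only: replaces the three sibling <committer> elements inside <crawler>. -->
<!-- Assumes MultiCommitter accepts nested <committer> elements (Committer Core 2.x). -->
<committer class="com.norconex.committer.core.impl.MultiCommitter">
  <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
    <nodes>http://localhost:9200</nodes>
    <indexName>filetest</indexName>
    <typeName>filetest1</typeName>
  </committer>
  <committer class="com.norconex.committer.core.impl.JSONFileCommitter">
    <directory>${workdir}/jsoncrawledFiles</directory>
    <pretty>true</pretty>
    <splitAddDelete>true</splitAddDelete>
    <fileNamePrefix>test</fileNamePrefix>
    <fileNameSuffix>json</fileNameSuffix>
  </committer>
  <committer class="com.norconex.committer.core.impl.FileSystemCommitter">
    <directory>${workdir}/crawledFiles</directory>
  </committer>
</committer>
```

With this layout the crawler sees a single top-level committer, which in turn should forward each document to every nested one.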