
Log4j.properties and log file generated by NORCONEX COLLECTOR #435

Closed
mauromi76 opened this issue Dec 5, 2017 · 6 comments




@mauromi76

mauromi76 commented Dec 5, 2017

Hi
I hope you can help us with this.
We downloaded the Norconex HTTP Collector from https://www.norconex.com/collectors/collector-http/ and we are evaluating the product.

Installation and the first run were successful, but we are trying to reduce the size of the generated log file by removing the INFO lines.

Here are the steps we followed.

We changed the settings in the file "log4j.properties" like this:

# Set level of information printed in log file/console
# (DEBUG > INFO > WARN > ERROR > FATAL)
# By default, use INFO
log4j.rootLogger=ERROR, CONSOLE

Then we start the crawler with this command:

/opt/norconex-collector/collector-http.sh -a start -c config.xml > /opt/norconex-collector/workdir/nohup.log &

As a result, the "nohup.log" file contains the log as we want it, with only ERROR and higher levels, but the crawler still generates a log file in the "logs/latest/logs" folder with the full report (which includes INFO, ERROR, and so on).

What do we have to do to remove INFO-level entries from the log generated in the "logs/latest/logs" folder?

Is this setting hard-coded in the compiled source code, or can we override it somehow from the log4j.properties file?

Thanks in advance,
Mauro

@essiembre
Contributor

Can you try changing the first line to:

log4j.rootLogger=INFO, FILE_ONLY

That should remove logging from your nohup.log, and hopefully logs/latest/logs will be OK.

If not, please share your log4j.properties config.

@mauromi76
Author

mauromi76 commented Dec 6, 2017

Hi Pascal
Thanks for your reply.

Maybe there was a misunderstanding :)

Actually, the log generated in my "nohup.log" file is exactly how I want it: "nohup.log" contains ONLY the lines with ERROR level (and this is EXACTLY what we want, to reduce the size of the log file).

My problem is the log generated in "logs/latest/logs". That log seems to be independent of every setting I change in log4j.properties.

So my goal is (if possible) to make the crawler NOT generate the log in the "logs/latest/logs" folder or, if it must be generated, to make it the same as my "nohup.log".

Would that be possible?

Thanks,
Mauro

@essiembre
Contributor

Unfortunately, without a code change, log files will always be generated under /latest/. The reason is better integration with tools like JEF Monitor, which knows where to find the logs in order to report on them. This is a nice capability to have (log predictability), but it limits the control you have over logs. For that reason, the next major release will revisit logging to make it more flexible. Related tickets: Norconex/jef#6 and Norconex/jef#7.

Until that happens, there is a little hack you can use to make sure the log file under /latest/ is always empty (created, but zero length). You can effectively disable the root logger by setting it to FATAL in the log4j.properties file, and then re-enable CONSOLE logging for individual libraries. Make the beginning of your log4j.properties file look like this:

log4j.rootLogger=FATAL, CONSOLE

log4j.logger.CrawlerEvent=INFO, CONSOLE
log4j.logger.com=INFO, CONSOLE
log4j.logger.org=INFO, CONSOLE
log4j.logger.net=INFO, CONSOLE
log4j.logger.edu=INFO, CONSOLE
log4j.logger.ucar=INFO, CONSOLE
log4j.logger.de=INFO, CONSOLE
log4j.logger.opennlp=INFO, CONSOLE
log4j.logger.javax=INFO, CONSOLE
log4j.logger.au=INFO, CONSOLE
log4j.additivity.com=false
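To check whether a change like this actually leaves the file under /latest/ at zero length, a small shell sketch can list any non-empty files in that folder. The function name `list_nonempty_logs` is hypothetical, and the folder you pass to it (for example your own "logs/latest/logs") is an assumption about your installation.

```shell
#!/bin/sh
# Sketch: report log files under the Collector's "latest" logs folder that
# are not empty. The folder is passed as $1; its location is an assumption.

# list_nonempty_logs DIR
# Prints every regular file under DIR whose size is above zero bytes.
# Returns 1 if at least one such file exists, 0 otherwise (or if DIR is missing).
list_nonempty_logs() {
  dir="$1"
  [ -d "$dir" ] || return 0
  # -size +0 matches files occupying more than zero 512-byte blocks,
  # i.e. any file containing at least one byte.
  found=$(find "$dir" -type f -size +0)
  [ -z "$found" ] && return 0
  printf '%s\n' "$found"
  return 1
}
```

For example, `list_nonempty_logs /opt/norconex-collector/logs/latest/logs || echo "latest logs are not empty"` after a crawl would tell you whether the hack took effect.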

Let me know if that works for you.

@mauromi76
Author

Hi
I tried the changes you mentioned, but they seem to have no effect on the log file generated in the "latest/" folder.
All the changes made in the "log4j.properties" file affect only the CONSOLE output.

My suspicion is that the logger (and its properties) are set in the source code, and the only way to change them would be to override some class (but I have no idea which one). Plus... I'm not a Java developer, and that makes everything even harder :/

I can post the whole content of my "log4j.properties" file if that helps in finding a solution; otherwise, I guess I have to wait for the next releases :)

Thanks,
Mauro

@essiembre
Contributor

essiembre commented Dec 12, 2017

I am marking this as a feature request, which will be dependent on Norconex/jef#6.

In the meantime, you can have a scheduled task/cron job delete the logs you do not want, or modify the launch script to delete them when the crawl is done.
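A minimal shell sketch of that workaround follows. The function name `prune_logs`, the `COLLECTOR_LOG_DIR` variable, and its default path are hypothetical; point the variable at wherever your "logs/latest/logs" folder actually lives.

```shell
#!/bin/sh
# Sketch of the workaround: delete log files the Collector leaves under its
# "latest" logs folder. The default path below is an assumption; adjust it.
COLLECTOR_LOG_DIR="${COLLECTOR_LOG_DIR:-/opt/norconex-collector/logs/latest/logs}"

# prune_logs DIR [DAYS]
# Deletes regular files under DIR. With DAYS=0 (the default) every file is
# deleted; otherwise only files whose age in whole days exceeds DAYS
# (find's -mtime +DAYS).
prune_logs() {
  dir="$1"
  days="${2:-0}"
  # Nothing to do if the folder does not exist (e.g. before the first run).
  [ -d "$dir" ] || return 0
  if [ "$days" -eq 0 ]; then
    find "$dir" -type f -exec rm -f {} +
  else
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
  fi
}
```

To delete the logs when the crawl is done, source this file (saved, hypothetically, as prune-logs.sh) from your launch script, run collector-http.sh in the foreground (without the trailing `&`, otherwise the cleanup would run while the crawler is still writing), then call `prune_logs "$COLLECTOR_LOG_DIR"`. For a cron job, a line such as `0 3 * * * . /opt/norconex-collector/prune-logs.sh && prune_logs "$COLLECTOR_LOG_DIR" 7` would remove files older than a week every night at 3:00.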

@essiembre
Contributor

You can now rely 100% on your own log4j configuration with the latest snapshot release and prevent the Collector from creating its own logs. Have a look at #593 (comment) for details.
