JEF Monitor causes crawl failures on Windows #519

Closed
rustyx opened this issue Sep 5, 2018 · 4 comments


@rustyx

rustyx commented Sep 5, 2018

Due to file locking, a running JEF Monitor causes intermittent crawler start failures with an exception like this:

java.io.IOException: Could not move "D:\work\logs\latest\logs\test1.log" to "D:\work\logs\backup\2018\09\05\logs\201809052154420193__test1.log".
        at com.norconex.commons.lang.file.FileUtil.moveFile(FileUtil.java:179)
        at com.norconex.jef4.log.FileLogManager.backup(FileLogManager.java:186)
        at com.norconex.jef4.suite.JobSuite.backupSuite(JobSuite.java:543)
        at com.norconex.jef4.suite.JobSuite.initialize(JobSuite.java:473)
        at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:277)
        at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
        at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:131)
        at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
        at com.norconex.collector.http.HttpCollector.main(HttpCollector.java:74)

The Windows error message is: "The process cannot access the file because it is being used by another process."

@rustyx rustyx changed the title JEF causes crawl failures on Windows JEF Monitor causes crawl failures on Windows Sep 5, 2018
@essiembre
Contributor

A few things to check:

  • Which version are you using? Some file locking issues should have been resolved in the most recent releases of the HTTP Collector.
  • Are you running the collector from the command line, or wrapped in a long-running Java process? There is a known bug with Java on Windows where file locks are not always released until the JVM exits (more info here). Running it from the command line (i.e., in its own dedicated JVM) should fix this. It may also be worth upgrading to Java 10, since it includes some garbage collection improvements that may help.
  • Do you have colliding paths in your collector configuration file? This can happen if more than one collector shares the same configured paths. It can also happen if you have multiple crawlers defined in your configuration (e.g., make sure queueDir is different for each committer you have).
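The third check can be partly automated. Below is a minimal sketch, assuming you have already pulled the configured directories (such as each committer's queueDir) out of your configuration into a map keyed by a label of your choosing; findCollisions and the labels are hypothetical and not part of the Norconex API. It normalizes each path so that different spellings of the same directory compare equal, then keeps only directories claimed by more than one owner.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathCollisionCheck {

    // Hypothetical helper: groups owner labels (e.g. "crawlerA.queueDir")
    // by the absolute, normalized path they point to, then drops every
    // path that only one owner uses. What remains are the collisions.
    static Map<Path, List<String>> findCollisions(Map<String, String> configuredDirs) {
        Map<Path, List<String>> byPath = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : configuredDirs.entrySet()) {
            Path normalized = Paths.get(e.getValue()).toAbsolutePath().normalize();
            byPath.computeIfAbsent(normalized, k -> new ArrayList<>()).add(e.getKey());
        }
        byPath.values().removeIf(owners -> owners.size() < 2);
        return byPath;
    }

    public static void main(String[] args) {
        Map<String, String> dirs = new LinkedHashMap<>();
        dirs.put("crawlerA.queueDir", "./committer-queue");
        // Spelled differently, but resolves to the same directory as crawlerA
        dirs.put("crawlerB.queueDir", "committer-queue/../committer-queue");
        dirs.put("crawlerC.queueDir", "./committer-queue-c");
        findCollisions(dirs).forEach((path, owners) ->
                System.out.println("Shared by " + owners + ": " + path));
    }
}
```

Here crawlerA and crawlerB are reported as sharing one directory, while crawlerC is not. Symbolic links are not resolved by normalize(); for directories that already exist, Path.toRealPath() would also catch those.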

@rustyx
Author

rustyx commented Sep 7, 2018

I'm using the latest versions, collector-http 2.9.0 and jef-monitor 4.0.5, on Java 8.
I'm running a single collector with a single job, from the command line.

JEF Monitor periodically opens "<work-dir>/latest/logs/<collector-id>.log" and reads from it, while the collector moves that same file every time it starts. On Windows, new FileInputStream(File) opens the file with read and write sharing but without delete sharing, and a file cannot be moved or renamed while such a handle is open. Hence the race condition: if the collector starts while JEF Monitor has the log open, the move fails.

JEF Monitor should instead open the log in a mode that also shares delete access, so that moveFile can proceed even while the file is being read (which happens to be the default behavior on *NIX). Note that retrying the move a few times is a good idea, but gives no guarantee of success.

Here's how it can be done with Java NIO:

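import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

public class RenameWhileReading {
	public static void main(String[] args) throws Exception {
		File file = new File("test.txt");
		File file2 = new File("test2.txt");
		file.delete();
		file2.delete();
		try (FileOutputStream out = new FileOutputStream(file)) {
			out.write("Test data".getBytes(StandardCharsets.UTF_8));
		}
		// java.io stream: on Windows the open handle blocks the rename
		try (FileInputStream in = new FileInputStream(file)) {
			System.out.println("Rename while reading: " + file.renameTo(file2));
		}
		System.out.println("Rename after reading: " + file.renameTo(file2));
		file2.renameTo(file); // rename back
		// NIO stream: the open handle does not block the rename
		try (InputStream in = Files.newInputStream(file.toPath(), StandardOpenOption.READ)) {
			System.out.println("Rename while reading with StandardOpenOption.READ: " + file.renameTo(file2));
		}
	}
}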

Output (on Windows):

Rename while reading: false
Rename after reading: true
Rename while reading with StandardOpenOption.READ: true
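As noted above, retrying the move only narrows the window in which a reader holds the file open; it cannot guarantee success. Below is a minimal sketch of such a retry built on java.nio.file.Files.move. The helper name moveWithRetry, the attempt count and the delay are hypothetical choices for illustration, not what Norconex's FileUtil.moveFile actually does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RetryingMove {

    // Hypothetical helper: tries Files.move up to maxAttempts times, sleeping
    // delayMillis between attempts, and rethrows the last failure if every
    // attempt fails. A reader that keeps the file open longer than the total
    // retry window will still make the move fail.
    static Path moveWithRetry(Path source, Path target, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("jef-move");
        Path log = Files.write(dir.resolve("test1.log"),
                "log line\n".getBytes(StandardCharsets.UTF_8));
        Path backup = dir.resolve("backup").resolve("test1.log");
        Files.createDirectories(backup.getParent());
        System.out.println("Moved to: " + moveWithRetry(log, backup, 5, 200));
    }
}
```

Catching IOException broadly keeps the sketch short; a stricter version might retry only on java.nio.file.FileSystemException and fail fast on errors such as a missing source file.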

Kamino closed and cloned this issue to Norconex/jef-monitor

@essiembre
Contributor

Ha... JEF Monitor. I can understand where the problem is coming from then. I have moved this to JEF Monitor.
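On the JEF Monitor side, the suggestion above amounts to reading the log through NIO instead of new FileInputStream(File). Below is a minimal sketch, assuming the monitor only needs the last few lines of the log and that the log is UTF-8 encoded; tail is a hypothetical helper, not JEF Monitor's actual code. Files.newBufferedReader is built on Files.newInputStream, which in OpenJDK on Windows opens the file with delete sharing, so this reader should not block the collector's move.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class LogTailReader {

    // Hypothetical helper: returns the last maxLines lines of the log.
    // The file is opened through NIO rather than java.io.FileInputStream,
    // and try-with-resources releases the handle as soon as reading ends.
    static List<String> tail(Path log, int maxLines) throws IOException {
        Deque<String> lastLines = new ArrayDeque<>();
        try (BufferedReader reader = Files.newBufferedReader(log, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lastLines.addLast(line);
                if (lastLines.size() > maxLines) {
                    lastLines.removeFirst(); // keep only the most recent lines
                }
            }
        }
        return new ArrayList<>(lastLines);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("test1", ".log");
        Files.write(log, Arrays.asList("line 1", "line 2", "line 3"), StandardCharsets.UTF_8);
        System.out.println(tail(log, 2));
    }
}
```

Running main prints [line 2, line 3]. Reading the whole file on every poll is simple but wasteful for large logs; remembering the last read position would be a natural refinement.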
