-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix loki source file windows #2282
Conversation
💻 Deploy preview deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -143,6 +144,24 @@ func (c *Component) Run(ctx context.Context) error { | |||
c.mut.RUnlock() | |||
}() | |||
|
|||
tickerChan := make(chan struct{}) | |||
if c.args.RetryInterval > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesnt handle updating the RetryInterval after running, can we add a test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to handle updating
|
||
The `encoding` argument must be a valid [IANA encoding][] name. If not set, it | ||
defaults to UTF-8. | ||
|
||
You can use the `tail_from_end` argument when you want to tail a large file without reading its entire content. | ||
When set to true, only new logs will be read, ignoring the existing ones. | ||
|
||
`retry_interval` is deactivated by default (`"0s"`). This should be set on Windows-based systems when rotating files with the same names because the | ||
component will no try to re-open the files and the targets won't be updated because of the cache. | ||
This is not needed on Unix-like systems because the component will always try to re-open the deleted files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drafting up a new suggestion.
We can state the specific OSes we support.
This sentence (and the overall description) isn't really clear to me. Why will loki.source_file
try to reopen deleted files? The description in the table says the arg will tell the component to try to reopen closed files. Which is it? I assume it's closed files since trying to reopen deleted files doesn't really make sense. Why won't the component try to reopen closed files on Windows? It's also not really clear to me. Windows caches the files? And the other OSes do not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a case where a user is rotating a "app.log" file everytime it reaches 5MB by deleting it and creating a new app.log file.
On Windows you must first close the file before deleting it. Closing the file will stop the tailer in the loki.source.file component. If the new file would have a different name, the local.file_match would send it to the loki.source.file component and it would work fine. But for the local.file_match component nothing changed because it only checks at a regular interval and it will just see that there is still an "app.log" file. Because of this, it won't notify the loki.source.file component to start a new tailer and the file won't be tailed (it only notifies the next component when the set of files changed).
On Unix-like systems, you can delete the file without closing it. Because the file is not closed, the tailer is not stopped, it continuously tries to read the file until the local.file_match sends an update. In the case of the user, it will restart the tailer once the new "app.log" file is created because of the names it thinks that it's the same file.
I made it updatable but the tests are flaky and I'm not sure why... Also I'm not super happy with the solution, adding an argument for an edge case on Windows does not feel great. I'm tempted to just put a Note in the doc for now and maybe we could plan a refactor of this component to use a pool of worker as you suggested and at the same time see whether we can tackle this edge case in a different way @mattdurham ? |
Not in love with it either, but it does fix an underlying problem. Would also be pro pulling back a bit and trying to figure out if there is a cleaner way. |
Could we solve this better by updating the file_match component to add a created_at timestamp to the discovered target map? This would allow consumers of the target like loki source file to identify new files. Looks like it's possible to get, though we'll need to add os specific implementations to the file_match component (https://stackoverflow.com/questions/59413768/how-to-find-file-creation-date-in-windows-using-go) |
That's an interesting alternative but this makes the component dependent on this label. It would not work if you hardcode the target in the config directly. I think that the problem should be solved in the loki.source.file component directly Because this problem is not urgent (the easy workaround is to change the name of the file you rotate), I would rather consider planning a refactor of the component to address scaling issues and set up a retry mechanism |
PR Description
On Windows, when a file that
loki.source.file
was tailing is deleted, the tailer will be stopped. On Unix-like systems, the tailer will stay alive and keep retrying to open the file.This is ok in most cases, but in the specific scenario where you have a loki.source.file receiving targets from a local.file_match component and that you rotate the file by deleting/re-creating it with the same name and within the same the sync_period, the local.file_match won't send an update because the targets are still the same. As a result, on Windows the loki.source.file will not tail the file anymore. (On Unix-like system, it would recover because it would manage to re-open the file).
This PR adds an optional argument
retry_interval
(default 0) to restart the readers on a tick when they stop. By default, there is no tick so that existing implementations are not affected.Which issue(s) this PR fixes
Notes to the Reviewer
PR Checklist