codec unresponsive when working on a large file #21

Open
nikhilo opened this issue May 29, 2018 · 0 comments
nikhilo commented May 29, 2018

I'm seeing a problem with logstash-codec-cloudtrail: whenever the codec encounters a large file, processing simply hangs, with no error or debug logs.

I tried enabling debug logging for the codec, but nothing was printed:

curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.codecs.cloudtrail" : "DEBUG"}'
  • Logstash version 5.5.1
  • Codec Version: 3.0.4
  • Operating System: Ubuntu 14.04
  • Config File
    s3 {
      region => 'us-east-1'
      bucket => '<my-org>-logs'
      backup_to_bucket => '<my-org>-logs'
      backup_add_prefix => 'processed/'
      delete => true
      interval => 300
      tags => ['aws-input', 'cloudtrail']
      type => 'cloudtrail'
      codec => 'cloudtrail'
      prefix => 'cloudtrail/'
      sincedb_path => '/opt/logstash/server/sincedb/cloudtrail'
    }

Sample Data:

Here's the list of files in the S3 bucket (columns: last-modified timestamp, size in bytes, filename):

2018-05-21 05:32:14      21408 20180521T0000Z_oueDeCc9ryuFaNE2.json.gz
2018-05-21 07:07:23      10581 20180521T0130Z_2C9gPDzKtmwp1sO3.json.gz
2018-05-21 07:04:22    5264114 20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz
2018-05-21 07:12:09      13128 20180521T0135Z_b9h4v5QqEkumMZNu.json.gz
2018-05-21 07:08:06      29622 20180521T0135Z_gY3u2wcdDT3DjPY9.json.gz
2018-05-21 07:08:05      42110 20180521T0135Z_uOFgvOohWqh7pCKm.json.gz
2018-05-21 07:07:13      42502 20180521T0140Z_2TX8v5UumEV24fgg.json.gz
2018-05-21 07:17:28      10593 20180521T0140Z_UQVPTdRJ7OGIpeQu.json.gz
2018-05-21 07:09:28    4841248 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz
2018-05-21 07:12:32      58228 20180521T0140Z_j8gNtuBoG91ftY6J.json.gz
2018-05-21 07:13:29      33323 20180521T0140Z_jBjTddHPURNw0wDp.json.gz
2018-05-21 07:17:43      45539 20180521T0145Z_28lYKm6deu5M9fPf.json.gz
2018-05-21 07:17:21      37363 20180521T0145Z_MuvtNRJAgTgjsIjq.json.gz
2018-05-21 07:12:22    5245924 20180521T0145Z_kCpHWvq3Hlua803U.json.gz
2018-05-21 07:22:40      12516 20180521T0145Z_kkJAyDaUNgv2LFLK.json.gz
2018-05-21 07:12:23     109264 20180521T0145Z_zrOp34x50ibxvQNT.json.gz
2018-05-21 07:16:04    5257312 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz
2018-05-21 07:17:25     252268 20180521T0150Z_CIrZORIB3WFCVN9s.json.gz
2018-05-21 07:21:08    3119643 20180521T0150Z_ERpgl6PvHjkY90QB.json.gz

At first, the sincedb was stuck at 01:34, and this file was sitting in /tmp/logstash:
20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz, which is about 5 MB.

There was no processing and no logs beyond that timestamp for over 6 hours,
so I stopped Logstash and set the sincedb to 01:37 to skip that file.

After doing that, Logstash got stuck on 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz, which is about 4 MB.

This kept happening until I had also skipped 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz from the list above, which is about 5 MB.
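For reference, the skip-ahead workaround amounts to overwriting the sincedb with a later timestamp while Logstash is stopped. A rough sketch is below; the exact on-disk format should be checked against your own sincedb file before editing it, and a temp file is used here in place of the real path:

```ruby
require 'time'
require 'tempfile'

# Timestamp to skip ahead to (01:37 UTC, just past the stuck 0135Z file).
skip_to = Time.utc(2018, 5, 21, 1, 37)

# A temp file stands in for the real sincedb; the actual path would be the
# sincedb_path from the config: /opt/logstash/server/sincedb/cloudtrail
sincedb = Tempfile.new('cloudtrail-sincedb')
sincedb.write(skip_to.to_s)
sincedb.flush

puts File.read(sincedb.path)
```

On restart, the s3 input should then ignore any object whose last-modified time is at or before the written timestamp.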

Steps to Reproduce:

  • Have the codec parse a file larger than ~2 MB
  • The codec hangs
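To reproduce without real trail data, something like the following sketch can generate a CloudTrail-shaped gzip file above the ~2 MB (uncompressed) mark. The field values and the 20,000-record count are made up for illustration, and identical records will gzip far smaller than a real trail would:

```ruby
require 'json'
require 'zlib'
require 'stringio'

# Build a CloudTrail-style payload: a single JSON object whose "Records"
# array holds many events. Fields below are illustrative placeholders.
def build_large_trail(record_count)
  record = {
    "eventVersion" => "1.05",
    "eventSource"  => "s3.amazonaws.com",
    "eventName"    => "GetObject",
    "awsRegion"    => "us-east-1",
    "requestParameters" => { "bucketName" => "example-bucket" }
  }
  JSON.generate("Records" => Array.new(record_count) { record })
end

json = build_large_trail(20_000)

# Gzip it in memory, mirroring the *.json.gz objects in the bucket listing.
gz = StringIO.new
Zlib::GzipWriter.wrap(gz) { |w| w.write(json) }

puts "uncompressed: #{json.bytesize} bytes, gzipped: #{gz.string.bytesize} bytes"
```

Uploading the resulting file to the monitored prefix should be enough to trigger the hang described above.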

Please note:

  • Other S3 inputs (ELB and CloudFront logs) in the same Logstash instance work fine.
  • Filenames in the above example have been simplified to emphasize timestamps and file sizes.