
One logstash input for Azure blob is slower than other #214

Open
arunp-motorq opened this issue Jan 29, 2020 · 1 comment

@arunp-motorq

arunp-motorq commented Jan 29, 2020

Issue with Logstash input for Azure blob

I have one instance of Logstash reading data from blob storage. Although the logs are in the same container, there are two major folder structures for logs from two different processes. The blob structure is something like this:
Blob

  • Container
    • Process1/Year/Month/Day/Hour/LogFile
    • Process2/Year/Month/Day/Hour/LogFile

My Logstash blob config looks like this:

```
azureblob
{
  storage_account_name => 'folder1'
  storage_access_key => ''
  container => 'logs'
  id => 'jobs1'
  blob_list_page_size => 150
  file_chunk_size_bytes => 8088608
  registry_create_policy => 'resume'
  path_filters => 'folder1/2020 /**/*.csv'
}

azureblob
{
  storage_account_name => 'folder2'
  storage_access_key => ''
  container => 'logs'
  id => 'jobs1'
  blob_list_page_size => 150
  file_chunk_size_bytes => 8088608
  registry_create_policy => 'resume'
  path_filters => 'folder2/2020 /**/*.csv'
}
```

The heap is around 3 GB and CPU usage is at 70-80%.

I run only one instance of Logstash. The issue is that logs from folder2 are processed much faster than logs from folder1; folder2 is days ahead of folder1. (This is a catch-up scenario: I am reading logs from the start of this month.) How do I debug this?
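
One way to quantify the backlog under each prefix is to count the blobs directly. Below is a minimal sketch using the azure-storage-blob (v12) Python SDK; the connection-string environment variable is an assumption, and the container name and prefixes are taken from the config above and may need adjusting:

```python
# Minimal sketch: count the blobs and bytes sitting under each prefix.
# Assumes the azure-storage-blob (v12) package and a connection string in the
# AZURE_STORAGE_CONNECTION_STRING environment variable (an assumption; adjust
# to your setup).
import os
from azure.storage.blob import ContainerClient

conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
container = ContainerClient.from_connection_string(conn_str, container_name="logs")

for prefix in ("folder1/2020", "folder2/2020"):
    blobs = list(container.list_blobs(name_starts_with=prefix))
    total_bytes = sum(b.size for b in blobs)
    newest = max((b.last_modified for b in blobs), default=None)
    print(f"{prefix}: {len(blobs)} blobs, {total_bytes} bytes, newest: {newest}")
```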

@pinochioze

Hi Arun, I think your concern is due to the number of blobs in each folder (you can get this number using the CLI or Microsoft Azure Storage Explorer). The procedure of this plugin is:

  1. Get the list of all the blobs in the container.
  2. Compare the list with the patterns in "path_filters" and keep the blobs that match.
  3. Pick one blob from the list of matched blobs, based on the generation algorithm and the blob's offset. This means many blobs in the matched list have to wait for the next loop of the process (see the sketch below).
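
A simplified sketch of that loop may make the effect clearer (illustration only, not the plugin's actual Ruby code; `list_container_blobs`, `registry`, and `process_blob` are hypothetical stand-ins for the plugin's internals):

```python
# Illustration of the per-pass behaviour described above; NOT the plugin's
# actual implementation. list_container_blobs, registry and process_blob are
# hypothetical stand-ins.
import fnmatch
import time

def run_azureblob_input(list_container_blobs, path_filter, registry, process_blob,
                        interval=30):
    while True:
        # 1. Get the list of all the blobs in the container.
        all_blobs = list_container_blobs()
        # 2. Keep only the blobs whose names match path_filters.
        matched = [b for b in all_blobs if fnmatch.fnmatch(b.name, path_filter)]
        # 3. Pick a single matched blob (using the registry's generation/offset
        #    bookkeeping) and read it; every other matched blob waits for the
        #    next pass.
        candidate = next((b for b in matched if registry.has_unread_data(b)), None)
        if candidate is not None:
            process_blob(candidate, offset=registry.offset_for(candidate))
        time.sleep(interval)
```

Under this model, an input whose filter matches far more (or larger) blobs will naturally fall behind the other input, which would explain folder1 lagging folder2.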
