Improve performance #19

peetersn · 2014-02-25T14:43:46Z

Hi,

Quick question regarding this awesome plugin.
I'm using the following approach:

new WebXlsxExporter().with {
    setResponseHeaders(response)
    fillHeader(headers)
    fillRow(["aaa", "bbb", 13, new Date()], 1)
    save(response.outputStream)
}

... but the problem is that the number of rows is large (20,000), so generating the file takes a while (20 seconds or more) and the generated file is 2 MB+.
Since we're running on Heroku, the Heroku router resets the connection after 30 seconds. I was wondering what I could do to improve performance:

1. Would using a template help at all?
2. Is there any way I can parallelize the implementation?
3. Can I send a chunked or streaming response?
4. Can I zip the file?
5. Would reverting to Apache POI bring any performance gain?

The alternative is to generate the file in the background using some Groovy async support, but I'm not quite sure what to use on the front end to poll the status.
Any recommendation on what I can do with this approach is most welcome.


jakubnabrdalik · 2014-03-02T11:30:36Z

As for the idea in question 5: for Excel files this big, you could go with SXSSF. Pure Apache POI has 3 modes of operation, and the one I'm using in this plugin is not the most efficient in terms of speed or memory: http://poi.apache.org/spreadsheet/index.html

"SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk."

You could also add support for SXSSF to this plugin and I'd be happy to merge. It doesn't look difficult.
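
To make the sliding-window idea from the quote concrete, here is a toy model in plain Java (callable from Groovy). It is not Apache POI code and does not produce an xlsx file; the class `SlidingWindowSketch` and its methods are made up for illustration. It only shows the mechanism: rows that fall out of the window are written to a sink and are no longer held in memory. In POI itself the corresponding class is `SXSSFWorkbook`, whose constructor takes the window size.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a row sliding window. Hypothetical class, NOT part of Apache POI.
public class SlidingWindowSketch {
    private final int windowSize;                            // max rows kept in memory
    private final Writer sink;                               // stands in for the on-disk temp file
    private final Deque<String> window = new ArrayDeque<>();
    private int flushed = 0;

    public SlidingWindowSketch(int windowSize, Writer sink) {
        this.windowSize = windowSize;
        this.sink = sink;
    }

    // Adds a row; once the window is full, the oldest row is written out and forgotten.
    public void addRow(String row) throws IOException {
        window.addLast(row);
        if (window.size() > windowSize) {
            writeOldest();
        }
    }

    // Writes out whatever is still held in the window.
    public void finish() throws IOException {
        while (!window.isEmpty()) {
            writeOldest();
        }
        sink.flush();
    }

    public int rowsInMemory() { return window.size(); }

    public int rowsFlushed() { return flushed; }

    private void writeOldest() throws IOException {
        sink.write(window.removeFirst());
        sink.write('\n');
        flushed++;
    }

    public static void main(String[] args) throws IOException {
        SlidingWindowSketch sheet = new SlidingWindowSketch(100, new StringWriter());
        for (int i = 0; i < 20000; i++) {
            sheet.addRow("aaa,bbb,13," + i);
        }
        // Only the last 100 rows are still in memory at this point.
        System.out.println(sheet.rowsInMemory() + " in memory, " + sheet.rowsFlushed() + " flushed");
        sheet.finish();
    }
}
```

Memory use is bounded by the window size rather than by the total row count, which is the property that matters for a 20,000-row export.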

As for the rest of questions:

1. Not much.
2. Sure, you can fire it as a background job (Quartz, Spring @Async, or GPars, to keep it simple and not deal with threads directly), save the file to disk, and return it once it has been created. Just ask for the file from the HTML page once in a while (AJAX) to bypass the Heroku timeout. I don't know what JS libraries you use, but it's simple.
3. Theoretically yes, practically not really. I'd go with other options.
4. Yes, it's actually already a zip, but it's not being sent as a zip AFAIK. Take a look at FileManipulationAbility.save(OutputStream outputStream). The last thing I do is close the zip package (XlsxExporter.closeZipPackageIfPossible). You can experiment with moving workbook.write(outputStream) after closeZipPackageIfPossible. Not sure what happens then, though. Let me know what you find.
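
The background-job idea in answer 2 could look roughly like the sketch below. It uses only the JDK's `ExecutorService` instead of Quartz, Spring or GPars, and the class `ExportJobs` with its `start`, `status` and `result` methods is hypothetical, not part of this plugin. One controller action would call `start` and return the job id, a second would return `status(id)` to the polling AJAX call, and a third would stream the file once the status is `DONE`.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs exports in the background and reports their status.
public class ExportJobs {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<Path>> jobs = new ConcurrentHashMap<>();

    // Starts the export and returns a job id immediately, so the request finishes
    // well within the router timeout. The callable is expected to write the file
    // to disk and return its path.
    public String start(Callable<Path> export) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, pool.submit(export));
        return id;
    }

    // What a polling endpoint would return: UNKNOWN, RUNNING, DONE or FAILED.
    public String status(String id) {
        Future<Path> job = jobs.get(id);
        if (job == null) {
            return "UNKNOWN";
        }
        if (!job.isDone()) {
            return "RUNNING";
        }
        try {
            job.get();              // returns at once because the job is done
            return "DONE";
        } catch (Exception e) {     // the export threw, or the job was cancelled
            return "FAILED";
        }
    }

    // Path of the generated file; blocks until the job has finished.
    public Path result(String id) throws Exception {
        Future<Path> job = jobs.get(id);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job id: " + id);
        }
        return job.get();
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

This keeps job state in memory, so it assumes a single dyno and no restart between starting and downloading; with several instances the status and the file would need to live in shared storage.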

peetersn · 2014-03-02T11:44:48Z

Thanks for the pointers and the detailed answers. I'll definitely look into SXSSF. If I get any significant improvement, I'll see if I can port some of the code to your plugin; that could be handy.

I have done some work on 2. I'll push it to a GitHub repo soon and will let you know.
I'll check what happens with 4.

jakubnabrdalik added the question label Mar 2, 2014
