Add support for reading / writing compressed files #4

Open
mithro opened this issue Jul 2, 2019 · 5 comments

Comments

mithro commented Jul 2, 2019

See https://github.com/mithro/duck2-gsoc/issues/16

duck2 commented Jul 22, 2019

I did an experiment with zstr streams:

#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>

#include "pugixml.hpp"
#include "zstr.hpp"

void get_root_elements(const char *filename){
	pugi::xml_document doc;
	pugi::xml_parse_result result;

	std::string x(filename);
	if(x.rfind('.') != std::string::npos && x.substr(x.rfind('.')+1) == "gz"){
		/* Open in binary mode: gzip data must not go through
		 * newline translation. */
		std::ifstream F(x, std::ios::binary);
		if(!F)
			throw std::runtime_error("Could not open file " + x + ".");
		zstr::istream Z(F);  /* decompressing wrapper stream */
		result = doc.load(Z);
	} else {
		result = doc.load_file(filename);
	}

	if(!result)
		throw std::runtime_error("Could not load XML file " + std::string(filename) + ".");
	for(pugi::xml_node node = doc.first_child(); node; node = node.next_sibling()){
		if(std::strcmp(node.name(), "rr_graph") == 0){
			count_rr_graph(node);
			alloc_arenas();
			load_rr_graph(node, &rr_graph);
		}
		else throw std::runtime_error("Invalid root-level element " + std::string(node.name()));
	}
}
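The extension test in the snippet above can be factored into a small helper; a minimal sketch (the `has_gz_suffix` name is mine, not from the code above):

```cpp
#include <string>

// True when the filename's last extension is ".gz", mirroring the
// rfind('.') check in the snippet above.
bool has_gz_suffix(const std::string &path) {
    std::string::size_type dot = path.rfind('.');
    return dot != std::string::npos && path.substr(dot + 1) == "gz";
}
```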

Artix 7 rr_graph run with the uncompressed file (922 MB), without errno checking after the strtol calls (wall time in seconds):
7.645 8.097 7.600 7.636 7.677

With the gzip-compressed file:
11.34 11.15 11.10 11.29 11.13
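Wall times like the ones above can be collected with a small steady-clock helper; a sketch (the `seconds_elapsed` name is mine):

```cpp
#include <chrono>

// Measure the wall-clock duration of a callable, in seconds.
// steady_clock is monotonic, so the measurement is immune to
// system clock adjustments mid-run.
template <typename F>
double seconds_elapsed(F &&work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```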

mithro commented Jul 22, 2019

Is that with or without a hot disk cache? Can you try flushing that?
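On Linux, a cold-cache run is usually arranged by dropping the page cache between runs; a sketch (Linux-specific, needs root to actually drop the caches):

```shell
# Flush dirty pages to disk, then drop the page cache, dentries, and
# inodes so the next read really comes from disk rather than memory.
sync
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "not root: skipping drop_caches"
fi
```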

duck2 commented Aug 7, 2019

It's with a hot disk cache. Without the file in the cache, the reading time can jump to 11 seconds or so.

mithro commented Aug 11, 2019

@duck2 - How do the times with and without gzip compare when the file is not in the disk cache?

litghost (Contributor) commented:
Once SAX parsing support is complete (#3), a compressed one-pass SAX parser may be a good compromise between CPU, disk, and memory usage. It's unclear whether a two-pass SAX + compression approach would have good numbers.
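The one-pass idea can be illustrated with a toy event-style scanner that consumes decompressed chunks as they arrive and never buffers more than the unfinished tail of a tag. This shows only the streaming shape, not a real SAX parser; all names are mine:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Toy one-pass scanner: feed it decompressed chunks; it fires a
// callback with each start-element name. Only the incomplete tail of
// the last chunk is retained between feed() calls.
class TagScanner {
public:
    explicit TagScanner(std::function<void(const std::string &)> on_start)
        : on_start_(std::move(on_start)) {}

    void feed(const std::string &chunk) {
        buf_ += chunk;
        std::size_t pos = 0;
        while (true) {
            std::size_t lt = buf_.find('<', pos);
            if (lt == std::string::npos) { buf_.clear(); return; }
            std::size_t gt = buf_.find('>', lt);
            if (gt == std::string::npos) {   // tag split across chunks:
                buf_.erase(0, lt);           // keep the partial tag only
                return;
            }
            std::string inside = buf_.substr(lt + 1, gt - lt - 1);
            // Skip end tags, processing instructions, and declarations.
            if (!inside.empty() && inside[0] != '/' &&
                inside[0] != '?' && inside[0] != '!') {
                std::size_t sp = inside.find_first_of(" \t\n/");
                on_start_(inside.substr(0, sp));  // element name only
            }
            pos = gt + 1;
        }
    }

private:
    std::string buf_;
    std::function<void(const std::string &)> on_start_;
};
```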
