Refactoring and resolution of performance issues #9

mproffitt · 2016-11-22T12:01:36Z

This pull request implements a new API and resolves ISSUE-7 - Severe Performance Degradation when working with large data-sets.

A full breakdown of the changes provided is available at https://github.com/mproffitt/py-upset/blob/feature/ISSUE-7-Severe-Performance-Degradation/docs/WhatChanged-Version2.md with a discussion on performance towards the bottom.

Synopsis of changes

New resources module
New methods module
pyupset.__init__ now exposes only the plot() function, the visualisation.UpsetPlot, resources.FilterConfig and resources.DataExtractor classes, and the resources.SortMethods Enum
New API structure
New FilterConfig, GraphStore, Colours, GridSpecStore and ExtractedData classes extending an Immutable type (once set, attributes cannot be changed)
ExtractedData class is comparable
DataExtractor class moved to resources
DataExtractor now works on a merge table rather than generated indexes
Improved API Documentation
Added Tests for core functionality
Improved lint checks
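
As a rough illustration of the "once set, cannot be changed" behaviour described for the Immutable type, a minimal sketch might look like the following. This is not the code from the pull request: the class body, the `FilterConfigSketch` name and its `sort_by` and `size_bounds` fields are all assumptions made for the example.

```python
class Immutable:
    """Hypothetical base class: an attribute may be assigned once, never changed."""

    def __setattr__(self, name, value):
        # Reject a second assignment to an attribute that already exists.
        if name in self.__dict__:
            raise AttributeError(f"{name!r} is already set and cannot be changed")
        super().__setattr__(name, value)

    def __delattr__(self, name):
        # Deleting would allow re-assignment, so it is blocked as well.
        raise AttributeError(f"{name!r} cannot be deleted")


class FilterConfigSketch(Immutable):
    """Illustrative subclass; the field names are assumptions, not the PR's API."""

    def __init__(self, sort_by, size_bounds):
        self.sort_by = sort_by
        self.size_bounds = size_bounds


config = FilterConfigSketch(sort_by="size", size_bounds=(0, 100))

try:
    config.sort_by = "degree"  # second assignment is rejected
    reassigned = True
except AttributeError:
    reassigned = False
```

In this sketch the first assignment in `__init__` succeeds and any later assignment raises `AttributeError`, which is one common way to get write-once semantics in plain Python.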

* New resources module
* New methods module
* New API structure
* New FilterConfig, GraphStore, Colours, GridSpecStore and ExtractedData classes extending an Immutable type (once set, cannot be changed)
* ExtractedData class is comparable
* DataExtractor class moved to resources
* DataExtractor now works on a merge table rather than generated indexes
* Improved API Documentation
* Added Tests for core functionality
* Improved lint checks

Full write-up of changes can be found in docs/WhatChanged-Version2.md

Differences in output.

* The histogram plot shows slightly different values to the original library. This could be for one of two reasons:
  1. An issue with selecting the results into ExtractedData objects.
  2. The original library plotted incorrect results, potentially including NaN as a value. This would have produced larger datasets than the re-work, which explicitly deletes NaN values.
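
To make the second possible reason concrete, the sketch below shows how counting rows that still contain NaN gives a larger total than counting after NaN values are removed. The data is invented for illustration and the code uses plain Python rather than the library's own extraction logic, which is not reproduced here.

```python
import math

# Hypothetical column from a merged table: NaN marks rows where the
# item is absent from this particular set.
merged_column = [1.0, float("nan"), 3.0, float("nan"), 5.0]

# Counting every row treats NaN as a value and inflates the total.
count_with_nan = len(merged_column)

# Dropping NaN first counts only rows that are genuinely present.
count_without_nan = sum(1 for value in merged_column if not math.isnan(value))
```

With this made-up column, the first count includes the two NaN rows and the second does not, which is the kind of gap that could account for the differing histogram values.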


mproffitt force-pushed the feature/ISSUE-7-Severe-Performance-Degradation branch from c01fa0e to 97aefda on November 22, 2016 13:32

Issues with small dataframes and test data:

9a88e85

* Added reset method to change index on small dataframes
* Frames are now merged in on a copy with the column names on the original frame reset post merge
