MultiVis

The MultiVis package contains the necessary tools for visualisation of multivariate data.

Installation

Dependencies

multivis requires:

  • Python (==3.11.4)
  • NumPy (==1.25.2)
  • OpenPyXL (==2.6.1)
  • Pandas (==2.1.0)
  • Matplotlib (==3.8.0)
  • Seaborn (==0.12.2)
  • Networkx (==3.1.0)
  • statsmodels (==0.14.0)
  • scikits-bootstrap (==1.1.0)
  • SciPy (==1.11.2)
  • Scikit-learn (==1.3.1)
  • tqdm (==4.66.1)
  • xlrd (==2.0.1)

User installation

The recommended way to install multivis and its dependencies is with conda:

conda install -c brett.chapman multivis

or pip:

pip install multivis

Alternatively, to install directly from GitHub:

pip install https://github.com/brettChapman/multivis/archive/master.zip

API

For further detail on usage, refer to the docstrings.

multivis

  • Edge: Builds nodes and edges and is the base class for the Network class.

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe matrix containing scores.
      • [pvalues] : Pandas dataframe matrix containing score/similarity pvalues (if available, otherwise set to None).
    • methods
      • [set_params] : Set parameters

        • [filter_type] : The value type to filter the data on (default: 'pvalue')
        • [hard_threshold] : Value to filter the data on (default: 0.005)
        • [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
        • [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
      • [help] : Print this help text

      • [build] : Builds the nodes and edges.

      • [getNodes] : Returns a Pandas dataframe of all nodes.

      • [getEdges] : Returns a Pandas dataframe of all edges.
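The filtering controlled by filter_type, hard_threshold and sign can be pictured with a small stand-alone function. This is a sketch in plain Python, not the package's implementation: `filter_edges` is a hypothetical name, and it assumes that a 'pvalue' filter keeps pairs whose p-value is below the threshold, that a 'score' filter keeps pairs whose absolute score is above it, and that 'score' is an accepted filter_type at all.

```python
def filter_edges(names, scores, pvalues=None, filter_type="pvalue",
                 hard_threshold=0.005, sign="both"):
    """Return one edge dict per retained pair of a square score matrix."""
    if filter_type not in ("pvalue", "score"):
        raise ValueError("filter_type must be 'pvalue' or 'score'")
    if filter_type == "pvalue" and pvalues is None:
        raise ValueError("filter_type='pvalue' needs a matrix of p-values")
    if sign not in ("pos", "neg", "both"):
        raise ValueError("sign must be 'pos', 'neg' or 'both'")

    edges = []
    n = len(names)
    for i in range(n):
        # Upper triangle only: skips self-pairs and mirrored duplicates.
        for j in range(i + 1, n):
            score = scores[i][j]
            if sign == "pos" and score <= 0:
                continue
            if sign == "neg" and score >= 0:
                continue
            if filter_type == "pvalue":
                keep = pvalues[i][j] < hard_threshold  # assumption: keep significant pairs
            else:
                keep = abs(score) > hard_threshold     # assumption: keep strong scores
            if keep:
                edges.append({
                    "source": names[i],
                    "target": names[j],
                    "score": score,
                    "pvalue": None if pvalues is None else pvalues[i][j],
                })
    return edges


names = ["A", "B", "C"]
scores = [[1.0, 0.9, -0.8], [0.9, 1.0, 0.1], [-0.8, 0.1, 1.0]]
pvals = [[0.0, 0.001, 0.002], [0.001, 0.0, 0.5], [0.002, 0.5, 0.0]]
print(filter_edges(names, scores, pvals, sign="pos"))
```

In the package itself the same choices are made through set_params before calling build; the node and edge tables are then read back with getNodes and getEdges.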

  • Network: Builds nodes and edges, with added NetworkX functionality. Inherits from Edge.

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe matrix containing scores.
      • [pvalues] : Pandas dataframe matrix containing score/similarity pvalues.
    • methods
      • [set_params] : Set parameters

        • [filter_type] : The value type to filter the data on (default: 'pvalue')
        • [hard_threshold] : Value to filter the data on (default: 0.005)
        • [link_type] : The value type to represent links in the network (default: 'score')
        • [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
        • [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
      • [help] : Print this help text

      • [build] : Builds nodes, edges and NetworkX graph.

      • [getNetworkx] : Returns a NetworkX graph.

      • [getLinkType] : Returns the link type parameter used in building the network.

  • edgeBundle: Produces an interactive hierarchical edge bundle in D3.js, from nodes and edges.

    • init_parameters
      • [nodes] : Pandas dataframe containing nodes generated from Edge.
      • [edges] : Pandas dataframe containing edges generated from Edge.
    • methods
      • [set_params] : Set parameters

        • [html_file] : Name to save the HTML file as (default: 'hEdgeBundle.html')
        • [innerRadiusOffset] : Sets the inner radius based on the offset value from the canvas width/diameter (default: 120)
        • [blockSeparation] : Value to set the distance between different segmented blocks (default: 1)
        • [linkFadeOpacity] : The link fade opacity when hovering over/clicking nodes (default: 0.05)
        • [mouseOver] : Setting to 'True' swaps from clicking to hovering over nodes to select them (default: True)
        • [fontSize] : The font size in pixels set for each node (default: 10)
        • [backgroundColor] : Set the background colour of the plot (default: 'white')
        • [foregroundColor] : Set the foreground colour of the plot (default: 'black')
        • [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
        • [nodeColorScale] : The scale to use for colouring the nodes ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
        • [node_cmap] : Set the CMAP colour palette to use for colouring the nodes (default: 'brg')
        • [edgeColorScale] : The scale to use for colouring the edges, if edge_color_value is 'pvalue' ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [edge_color_value] : Set the values to colour the edges by. Either 'sign', 'score' or 'pvalue' (default: 'score')
        • [edge_cmap] : Set the CMAP colour palette to use for colouring the edges (default: 'brg')
        • [addArcs] : Setting to 'True' adds arcs around the edge bundle for each block (default: False)
        • [arcRadiusOffset] : Sets the arc radius offset from the inner radius (default: 20)
        • [extendArcAngle] : Sets the angle value to add to each end of the arc (default: 2)
        • [arc_cmap] : Set the CMAP colour palette to use for colouring the arcs (default: 'Set1')
      • [help] : Print this help text

      • [build] : Generates the JavaScript embedded HTML code, writes to a HTML file and opens it in a browser.

      • [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.

  • plotNetwork: Produces a static spring-embedded network from a NetworkX graph.

    • init_parameters
      • [g] : NetworkX graph.
    • methods
      • [set_params] : Set parameters

        • [imageFileName] : The image file name to save to (default: 'networkPlot.jpg')
        • [edgeLabels] : Setting to 'True' labels all edges with the score/similarity value (default: True)
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [layout] : Set the NetworkX layout type ('circular', 'kamada_kawai', 'random', 'spring', 'spectral') (default: 'spring')
        • [transparent] : Setting to 'True' will make the background transparent (default: False)
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
        • [figSize] : The figure size as a tuple (width,height) (default: (30,20))
        • [node_cmap] : The CMAP colour palette to use for nodes (default: 'brg')
        • [colorScale] : The node colour scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
        • [sizeScale] : The node size scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'reverse_linear')
        • [size_range] : The node size scale range to apply. Tuple of length 2. Minimum size to maximum size (default: (150,2000))
        • [sizing_column] : The node sizing column to use (default: sizes all nodes to 1)
        • [alpha] : Node opacity value (default: 0.5)
        • [nodeLabels] : Setting to 'True' will label the nodes (default: True)
        • [fontSize] : The font size set for each node (default: 15)
        • [keepSingletons] : Setting to 'True' will keep any single nodes not connected by edges in the NetworkX graph (default: True)
        • [column] : Column from Peak Table to filter on (default: no filtering)
        • [threshold] : Value to filter on (default: no filtering)
        • [operator] : The comparison operator to use when filtering (default: '>')
        • [sign] : The sign of the score to filter on ('pos', 'neg' or 'both') (default: 'pos')
      • [help] : Print this help text

      • [build] : Generates and displays the NetworkX graph.

  • springNetwork: Interactive spring-embedded network which inherits data from the NetworkX graph.

    • init_parameters
      • [g] : NetworkX graph.
    • methods
      • [set_params] : Set parameters

        • [node_size_scale] : dictionary(Peak Table column name as index: dictionary('scale': ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") 'range': a number array of length 2 - minimum size to maximum size)) (default: sizes all nodes to 10 with no dropdown menu)
        • [node_color_scale] : dictionary(Peak Table column name as index: dictionary('scale': ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal"))) (default: colours all nodes to 'black')
        • [html_file] : Name to save the HTML file as (default: 'springNetwork.html')
        • [backgroundColor] : Set the background colour of the plot (default: 'white')
        • [foregroundColor] : Set the foreground colour of the plot (default: 'black')
        • [chargeStrength] : The charge strength of the spring-embedded network (force between nodes) (default: -120)
        • [groupByBlock] : Setting to 'True' will group nodes by 'Block' if present in the data (default: False)
        • [groupFociStrength] : Set the strength of foci for each group (default: 0.2)
        • [intraGroupStrength] : Set the strength between each group (default: 0.01)
        • [groupLayoutTemplate] : Set the layout template to use for grouping (default: 'treemap')
        • [node_text_size] : The text size for each node (default: 15)
        • [fix_nodes] : Setting to 'True' will fix nodes in place when manually moved (default: False)
        • [displayLabel] : Setting to 'True' will set the node labels to the 'Label' column, otherwise it will set the labels to the 'Name' column from the Peak Table (default: False)
        • [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
        • [link_type] : The link type used in building the network (default: 'score')
        • [link_width] : The width of the links (default: 0.5)
        • [pos_score_color] : Colour value for positive scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'red')
        • [neg_score_color] : Colour value for negative scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'black')
      • [help] : Print this help text

      • [build] : Generates the JavaScript embedded HTML code and writes to a HTML file and opens it in a browser.

      • [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.

  • clustermap: Produces a Hierarchical Clustered Heatmap.

    • init_parameters
      • [scores] : Pandas dataframe scores.
      • [row_linkage] : Precomputed linkage matrix for the rows from a linkage clustered distance/similarities matrix
      • [col_linkage] : Precomputed linkage matrix for the columns from a linkage clustered distance/similarities matrix
    • methods
      • [set_params] : Set parameters

        • [xLabels] : A Pandas Series for labelling the X axis
        • [yLabels] : A Pandas Series for labelling the Y axis
        • [imageFileName] : The image file name to save to (default: 'clusterMap.png')
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
        • [figSize] : The figure size as a tuple (width,height) (default: (80,70))
        • [dendrogram_ratio_shift] : The ratio to shift the position of the dendrogram in relation to the heatmap (default: 0.0)
        • [dendrogram_line_width] : The line width of the dendrograms (default: 1.5)
        • [background_colour] : Set the background colour (default: 'white')
        • [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
        • [fontSize] : The font size for all text (default: 30)
        • [heatmap_annotation] : Annotate the heatmap with values (default: False)
        • [heatmap_cmap] : The CMAP colour palette to use for the heatmap (default: 'RdYlGn')
        • [cluster_cmap] : The CMAP colour palette to use for the branch separation of clusters in the dendrogram (default: 'Set1')
        • [rowColorCluster] : Setting to 'True' will display a colour bar for the clustered rows (default: False)
        • [colColorCluster] : Setting to 'True' will display a colour bar for the clustered columns (default: False)
        • [row_color_threshold] : The colouring threshold for the row dendrogram (default: 1)
        • [col_color_threshold] : The colouring threshold for the column dendrogram (default: 1)
      • [help] : Print this help text

      • [build] : Generates and displays the Hierarchical Clustered Heatmap (HCH).

  • plotFeatures: Produces different types of feature plots

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe containing matrix of values to plot (N samples x N features). Columns/features must be same as 'Name' from Peak Table.
    • methods
      • [set_params] : Set parameters

        • [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
        • [column_numbers] : The number of columns to display in the plots (default: 4)
        • [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
        • [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
        • [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
        • [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
        • [transparent] : Setting to 'True' will make the background transparent (default: False)
        • [figSize] : The figure size as a tuple (width,height) (default: (15,10))
        • [fontSize] : The font size for all text (default: 12)
        • [colour_palette] : The colour palette to use for the plot (default: None)
        • [y_axis_label] : The label to customise the y axis (default: None)
        • [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
        • [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
        • [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
        • [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
        • [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
        • [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
        • [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [imageFileName] : The image file name to save to (default: '[plot_type]_features.png')
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [help] : Print this help text

      • [plot] : Generates feature plots.

  • polarDendrogram: Polar dendrogram

    • init_parameters
      • [dn] : Dendrogram dictionary labelled by Peak Table index
    • methods
      • [set_params] : Set parameters

        • [imageFileName] : The image file name to save to (default: 'polarDendrogram.png')
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [branch_scale] : The branch distance scale to apply ('linear', 'log', 'square') (default: 'linear')
        • [gap] : The gap size within the polar dendrogram (default: 0.1)
        • [grid] : Setting to 'True' will overlay a grid (default: False)
        • [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
        • [transparent] : Setting to 'True' will make the background of all plots transparent (default: False)
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
        • [figSize] : The figure size as a tuple (width,height) (default: (10,10))
        • [fontSize] : The font size for all text (default: 15)
        • [PeakTable] : The Peak Table Pandas dataframe (default: empty dataframe)
        • [DataTable] : The Data Table Pandas dataframe (default: empty dataframe)
        • [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
        • [textColorScale] : The scale to use for colouring the text ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [text_color_column] : The colour column to use from Peak Table (Can be colour or numerical values such as 'pvalue') (default: 'black')
        • [label_column] : The label column to use from Peak Table (default: use original Peak Table index from cartesian dendrogram)
        • [text_cmap] : The CMAP colour palette to use (default: 'brg')
      • [plotClusters] : Aggregates peaks from each cluster of the polar dendrogram and generates different feature plots across the group/class variables.

        • [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
        • [column_numbers] : The number of columns to display in the plots (default: 4)
        • [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
        • [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centres to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
        • [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
        • [figSize] : The figure size as a tuple (width,height) (default: (15,10))
        • [fontSize] : The font size for all text (default: 12)
        • [colour_palette] : The colour palette to use for the plot (default: None)
        • [y_axis_label] : The label to customise the y axis (default: None)
        • [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
        • [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
        • [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
        • [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
        • [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
        • [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [imageFileName] : The image file name to save to (default: '[plot_type]_clusterPlots.png')
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [help] : Print this help text

      • [build] : Generates and displays the Polar dendrogram.

  • pca: Creates a Principal Component Analysis (PCA) scores and loadings biplot.

    • parameters
      • [data] : array-like matrix, shape (n_samples, n_features)
      • [imageFileName] : The image file name to save to (default: 'PCA.png')
      • [saveImage] : Setting to 'True' will save the image to file (default: True)
      • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [pcx] : The first component (default: 1)
      • [pcy] : The second component (default: 2)
      • [group_label] : Labels to assign to each group/class in the PCA plot (default: None)
      • [sample_label] : Labels to assign to each sample in the PCA plot (default: None)
      • [peak_label] : Labels to assign to each peak in the loadings biplot (default: None)
      • [markerSize] : The size of each marker (default: 100)
      • [fontSize] : The font size for all text (default: 12)
      • [figSize] : The figure size as a tuple (width,height) (default: (20,10))
      • [background_colour] : Set the background colour (default: 'white')
      • [grid] : Setting to 'True' will overlay a grid (default: True)
      • [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
      • [cmap] : The CMAP colour palette to use (default: 'Set1')
  • pcaLoadings: Creates a lollipop plot of PCA components with bootstrapped confidence intervals.

    • parameters
      • [data] : array-like, shape (n_samples, n_features)
      • [peak_label] : A list of peaks to plot
      • [imageFileName] : The image file name to save to (default: 'PCA_loadings.png')
      • [saveImage] : Setting to 'True' will save the image to file (default: True)
      • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [pc_num] : The principal component to plot (default: 1)
      • [boot_num] : The number of bootstrap samples to use to calculate confidence intervals (default: 500)
      • [alpha] : The alpha value for the bootstrapped confidence intervals (default: 0.05)
      • [fontSize] : The font size for all text (default: 30)
      • [markerSize] : The size of each marker (default: 100)
      • [figSize] : The figure size as a tuple (width,height) (default: (40,40))
      • [transparent] : Setting to 'True' will make the background transparent (default: False)
  • pcoa: Creates a Principal Coordinate Analysis (PCoA) plot.

    • parameters
      • [similarities] : array-like matrix, shape (n_samples, n_features)
      • [imageFileName] : The image file name to save to (default: 'PCOA.png')
      • [saveImage] : Setting to 'True' will save the image to file (default: True)
      • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [n_components] : Number of components (default: 2)
      • [max_iter] : Maximum number of iterations of the SMACOF algorithm (default: 300)
      • [eps] : Relative tolerance with respect to stress at which to declare convergence (default: 1e-3)
      • [seed] : Seed number used by the random number generator for the RandomState instance (default: 3)
      • [group_label] : Labels to assign to each group/class (default: None)
      • [peak_label] : Labels to assign to each peak (default: None)
      • [markerSize] : The size of each marker (default: 100)
      • [fontSize] : The font size for all text (default: 12)
      • [figSize] : The figure size as a tuple (width,height) (default: (20,10))
      • [background_colour] : Set the background colour (default: 'white')
      • [grid] : Setting to 'True' will overlay a grid (default: True)
      • [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
      • [cmap] : The CMAP colour palette to use (default: 'Set1')

multivis.utils

  • loadData: Loads and validates the Data and Peak sheet from an excel file.

    • parameters
      • [filename] : The name of the excel file (.xlsx file) e.g. 'Data.xlsx'.
      • [DataSheet] : The name of the data sheet in the file e.g. 'Data'. The data sheet must contain an 'Idx', 'SampleID', and 'Class' column.
      • [PeakSheet] : The name of the peak sheet in the file e.g. 'Peak'. The peak sheet must contain an 'Idx', 'Name', and 'Label' column.
    • Returns
      • DataTable: Pandas dataFrame
      • PeakTable: Pandas dataFrame
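The column requirements above can be checked before any analysis starts. The sketch below works on plain lists of column names so that it runs without pandas; `validate_sheets` and `missing_columns` are hypothetical names, and the final check (every peak 'Name' has a matching column in the data sheet) is an assumption based on the plotFeatures description, not a documented step of loadData.

```python
REQUIRED_DATA_COLUMNS = ("Idx", "SampleID", "Class")
REQUIRED_PEAK_COLUMNS = ("Idx", "Name", "Label")


def missing_columns(columns, required):
    """Return the required column names that are absent, in the required order."""
    present = set(columns)
    return [name for name in required if name not in present]


def validate_sheets(data_columns, peak_columns, peak_names):
    """Collect human-readable problems; an empty list means the sheets look usable."""
    problems = []
    for sheet, columns, required in (
        ("Data", data_columns, REQUIRED_DATA_COLUMNS),
        ("Peak", peak_columns, REQUIRED_PEAK_COLUMNS),
    ):
        for name in missing_columns(columns, required):
            problems.append(f"{sheet} sheet is missing the '{name}' column")

    # Assumption: each peak 'Name' should also be a column of the data sheet.
    data_set = set(data_columns)
    for peak in peak_names:
        if peak not in data_set:
            problems.append(f"Peak '{peak}' has no matching column in the Data sheet")
    return problems


print(validate_sheets(["Idx", "SampleID", "M1"], ["Idx", "Name"], ["M1", "M2"]))
```

With pandas dataframes the same function could be called as `validate_sheets(DataTable.columns, PeakTable.columns, PeakTable["Name"])`.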
  • groups2blocks: Slices the data by group/class name into blocks for later identification of multi-block associations and places the data into a dictionary indexed by group/class name.

    • parameters
      • [PeakTable] : Pandas dataframe containing the feature/peak data. Must contain 'Name' and 'Label'.
      • [DataTable] : Pandas dataframe matrix containing values. The data must contain a column separating out the different groups in the data (e.g. Class)
      • [group_column_name] : The group column name used in the datatable (e.g. Class)
    • Returns
      • [DataBlocks] : A dictionary containing DataTables indexed by group names
      • [PeakBlocks] : A dictionary containing PeakTables indexed by group names
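The slicing step can be illustrated with rows held as plain dictionaries. This is only a sketch of the idea: `groups_to_blocks` is a hypothetical name, and giving every block its own full copy of the peak table is an assumption about what PeakBlocks holds.

```python
def groups_to_blocks(data_rows, peak_rows, group_column_name="Class"):
    """Split data rows by group and pair each group with a copy of the peak table."""
    data_blocks = {}
    for row in data_rows:
        data_blocks.setdefault(row[group_column_name], []).append(row)

    # Assumption: each block carries an independent copy of the peak rows.
    peak_blocks = {
        group: [dict(peak) for peak in peak_rows] for group in data_blocks
    }
    return data_blocks, peak_blocks


data = [
    {"SampleID": "s1", "Class": "control", "M1": 1.2},
    {"SampleID": "s2", "Class": "treated", "M1": 3.4},
    {"SampleID": "s3", "Class": "control", "M1": 1.0},
]
peaks = [{"Name": "M1", "Label": "Metabolite 1"}]
data_blocks, peak_blocks = groups_to_blocks(data, peaks)
print(sorted(data_blocks), len(data_blocks["control"]))
```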
  • mergeBlocks: Merges multiple different Data Tables and Peak Tables from dictionaries into a single Peak Table and Data Table (used for multi-block/multi-omics data preparation). The 'Name' column needs to be unique across all blocks. Automatically annotates the merged Peak Table with a 'Block' column and consolidates any statistical results generated from the multivis.utils.statistics package in relation to each block.

    • parameters
      • [peak_blocks] : A dictionary of Pandas Peak Table dataframes from different datasets indexed by dataset type.
      • [data_blocks] : A dictionary of Pandas Data Table dataframes from different datasets indexed by dataset type.
      • [mergeType] : The type of merging to perform. Either by 'SampleID' or 'Index'.
    • Returns
      • [DataTable] : Merged Pandas dataFrame
      • [PeakTable] : Merged Pandas dataFrame (with any statistical results generated by multivis.utils.statistics consolidated into each block)
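The two rules stated for the merged Peak Table (unique 'Name' values across blocks, and a 'Block' column recording where each peak came from) can be sketched as follows. `merge_peak_blocks` is a hypothetical name; the sketch covers only the peak tables, not the data tables or the consolidation of statistics.

```python
def merge_peak_blocks(peak_blocks):
    """Concatenate peak rows from every block, tagging each with its block name."""
    merged = []
    seen = {}  # peak name -> block where it was first seen
    for block_name, peaks in peak_blocks.items():
        for peak in peaks:
            name = peak["Name"]
            if name in seen:
                raise ValueError(
                    f"'Name' {name!r} appears in both {seen[name]!r} and {block_name!r}"
                )
            seen[name] = block_name
            merged.append({**peak, "Block": block_name})
    return merged


blocks = {
    "metabolomics": [{"Name": "M1", "Label": "Glucose"}],
    "proteomics": [{"Name": "P1", "Label": "Albumin"}],
}
print(merge_peak_blocks(blocks))
```

Raising on the first duplicate makes a naming clash visible before any plots are built from the merged table.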
  • transform: Scales and transforms data in forward or reverse order based on different transform options.

    • parameters
      • [data] : A 1D numpy array of values
      • [transform_type] : The transform type to apply to the data ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal")
      • [min] : The minimum value for scaling
      • [max] : The maximum value for scaling
    • Returns
      • [transformed_data] : A scaled and transformed 1D numpy array
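A rough picture of what such a transform does: reshape the values, rescale them to the requested range, and flip the result for the "reverse_" variants. The sketch below is an illustration under assumptions, not the package's code. It covers only "linear", "log" and "square" (with their reverse forms), because the README does not define "area", "volume" or "ordinal", and it names the range arguments `new_min` and `new_max` to avoid shadowing Python's built-ins.

```python
import math


def transform_sketch(values, transform_type="linear", new_min=0.0, new_max=1.0):
    """Reshape values, then rescale them linearly into [new_min, new_max]."""
    reverse = transform_type.startswith("reverse_")
    kind = transform_type[len("reverse_"):] if reverse else transform_type

    shapes = {
        "linear": lambda v: v,
        "log": lambda v: math.log(v),  # requires strictly positive values
        "square": lambda v: v * v,
    }
    if kind not in shapes:
        raise ValueError(f"unsupported transform_type in this sketch: {transform_type!r}")

    shaped = [shapes[kind](v) for v in values]
    low, high = min(shaped), max(shaped)
    span = high - low

    result = []
    for s in shaped:
        fraction = 0.0 if span == 0 else (s - low) / span
        if reverse:
            fraction = 1.0 - fraction  # assumption: largest input maps to new_min
        result.append(new_min + fraction * (new_max - new_min))
    return result


print(transform_sketch([1, 2, 3], "linear", 0, 10))
print(transform_sketch([1, 2, 3], "reverse_linear", 0, 10))
```

The same scale names appear in the colour and size options of the plotting classes, which is why a range such as (150,2000) is given alongside a scale for node sizes.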
  • scaler: Scales a series of values in a 1D numpy array or pandas dataframe matrix based on different scaling functions

    • parameters

      • [data] : A pandas dataframe matrix or 1D numpy array of numerical values
      • [type] : The scaler type to apply based on sklearn preprocessing functions (default: "standard")
      • [stdScaler_with_mean] : Using "standard" scaler, center the data to the mean before scaling (default: True)
      • [stdScaler_with_std] : Using "standard" scaler, scale the data to unit variance (default: True)
      • [robust_with_centering] : Using "robust" scaler, center the data to the median before scaling (default: True)
      • [robust_with_scaling] : Using "robust" scaler, scale the data to within the quantile range (default: True)
      • [robust_unit_variance] : Using "robust" scaler, scale the data so that normally distributed features have a variance of 1 (default: False)
      • [minimum] : Using "minmax" scaler, set the minimum value for scaling (default: 0)
      • [maximum] : Using "minmax" scaler, set the maximum value for scaling (default: 1)
      • [lower_iqr] : Using "robust" scaler, set the lower quantile range (default: 25.0)
      • [upper_iqr] : Using "robust" scaler, set the upper quantile range (default: 75.0)
    • Returns

      • [scaled_data] : A scaled pandas dataframe matrix or 1D numpy array of numerical values
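The four scaler types differ only in what they subtract and what they divide by. The following stand-alone sketch shows the usual definitions on a single list of numbers, using only the standard library. It is an illustration rather than the package's code: `scale_sketch` is a hypothetical name, the optional switches listed above (with_mean, with_std and so on) are left out, and constant input would cause a division by zero.

```python
import statistics


def scale_sketch(values, kind="standard", minimum=0.0, maximum=1.0):
    """Scale a 1D sequence of numbers with one of four common schemes."""
    if kind == "standard":
        # Centre on the mean, divide by the population standard deviation.
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        return [(v - mean) / std for v in values]
    if kind == "minmax":
        # Map the observed range onto [minimum, maximum].
        low, high = min(values), max(values)
        return [minimum + (v - low) / (high - low) * (maximum - minimum) for v in values]
    if kind == "maxabs":
        # Divide by the largest absolute value; signs are preserved.
        peak = max(abs(v) for v in values)
        return [v / peak for v in values]
    if kind == "robust":
        # Centre on the median, divide by the 25th-75th percentile range.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return [(v - median) / (q3 - q1) for v in values]
    raise ValueError(f"unknown scaler type: {kind!r}")


print(scale_sketch([1, 2, 3, 4, 5], "minmax"))
print(scale_sketch([1, 2, 3, 4, 5], "robust"))
```

The robust variant is the least sensitive to outliers, because a single extreme value shifts neither the median nor the quartiles by much.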
  • imputeData: Imputes data given a pandas dataframe of values

    • parameters
      • [data] : A pandas dataframe of values
      • [k] : The number of nearest neighbours
    • Returns
      • [data_filled] : Imputed data
  • statistics: Generate a table of parametric or non-parametric statistics and merges them with the Peak Table (node table).

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe matrix containing values for statistical analysis
    • methods
      • [set_params] : Set parameters

        • [parametric] : Perform parametric statistical analysis, assuming the data is normally distributed (default: True)
        • [log_data] : Perform a log ('natural', base 2 or base 10) on all data prior to statistical analysis (default: (False, 2))
        • [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'standard'))
        • [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (False, 3))
        • [group_column_name] : The group column name used in the datatable (default: None)
        • [control_group_name] : The control group name in the datatable, if available (default: None)
        • [group_alpha_CI] : The alpha value for group confidence intervals (default: 0.05)
        • [fold_change_alpha_CI] : The alpha value for mean/median fold change confidence intervals (default: 0.05)
        • [pca_alpha_CI] : The alpha value for the PCA confidence intervals (default: 0.05)
        • [total_missing] : Calculate the total missing values per feature (default: False)
        • [group_missing] : Calculate the missing values per feature per group (if group_column_name not None) (default: False)
        • [pca_loadings] : Calculate PC1 and PC2 loadings for each feature (default: True)
        • [normality_test] : Determine normal distribution across whole dataset using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
        • [group_normality_test] : Determine normal distribution across each group (if group_column_name not None) using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
        • [group_mean_CI] : Determine the mean with bootstrapped CI across each group (if parametric = True and group_column_name not None) (default: True)
        • [group_median_CI] : Determine the median with bootstrapped CI across each group (if parametric = False and group_column_name not None) (default: True)
        • [mean_fold_change] : Calculate the mean fold change with bootstrapped confidence intervals (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
        • [median_fold_change] : Calculate the median fold change with bootstrapped confidence intervals (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
        • [levene_twoGroup] : Test null hypothesis that control group and each of the other groups come from populations with equal variances (if group_column_name not None and control_group_name not None) (default: False)
        • [levene_allGroup] : Test null hypothesis that all groups come from populations with equal variances (if group_column_name not None) (default: False)
        • [oneway_Anova_test] : Test null hypothesis that all groups have the same population mean, with included Benjamini-Hochberg FDR (if parametric = True and group_column_name not None) (default: False)
        • [kruskal_wallis_test] : Test null hypothesis that population median of all groups are equal, with included Benjamini-Hochberg FDR (if parametric = False and group_column_name not None) (default: False)
        • [ttest_oneGroup] : Calculate the T-test for the mean across all the data (one group), with included Benjamini-Hochberg FDR (if parametric = True, group_column_name is None or there is only 1 group in the data) (default: False)
        • [ttest_twoGroup] : Calculate the T-test for the mean of two groups, with one group being the control group, with included Benjamini-Hochberg FDR (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
        • [mann_whitney_u_test] : Compute the Mann-Whitney rank test on two groups, with one being the control group, with included Benjamini-Hochberg FDR (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
      • [help] : Print this help text

      • [calculate] : Performs the statistical calculations and outputs the Peak Table (node table) with the results appended.
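Several of the tests above report a Benjamini-Hochberg FDR alongside the raw p-values. For readers unfamiliar with the procedure, here is a short stand-alone sketch of the standard adjustment: each p-value is multiplied by the number of tests divided by its rank, and the results are made monotone from the largest p-value downwards. `benjamini_hochberg` is a hypothetical name; how the package computes its FDR column internally is not stated in this README.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p={p:.3f}  adjusted={q:.3f}")
```

A feature is then called significant at a chosen FDR level when its adjusted value falls below that level.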

  • corrAnalysis: Correlation analysis on a matrix of values with Pearson, Spearman or Kendall's Tau.

    • parameters
      • [df_data] : A Pandas dataframe matrix of values
      • [correlationType] : The correlation type to apply. Either 'Pearson', 'Spearman' or 'KendallTau'
    • Returns
      • [df_corr] : Pandas dataframe matrix of all correlation coefficients
      • [df_pval] : Pandas dataframe matrix of all correlation pvalues
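As a reminder of how two of the correlation types relate, the sketch below computes Pearson's coefficient directly and obtains Spearman's by applying the same formula to ranks (ties share their average rank). It is plain Python for illustration only: the function names are hypothetical, Kendall's Tau is not covered, and p-values are omitted because they need a t-distribution that the standard library does not provide.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(columns, method=pearson):
    """Nested dict of coefficients for every pair of named columns."""
    names = list(columns)
    return {a: {b: method(columns[a], columns[b]) for b in names} for a in names}


table = {"M1": [1, 2, 3, 4], "M2": [1, 4, 9, 16], "M3": [4, 3, 2, 1]}
print(correlation_matrix(table, method=spearman)["M1"])
```

Because Spearman works on ranks, M1 and M2 are perfectly correlated under it even though their relationship is not linear.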
  • cluster: Clusters data using a linkage cluster method. If the data is correlated the correlations are first preprocessed, then clustered, otherwise a distance metric is applied to non-correlated data before clustering.

    • parameters
      • [matrix] : A Pandas dataframe matrix of scores
      • [transpose_non_correlated] : Setting to 'True' will transpose the matrix if it is not correlated data
      • [is_correlated] : Setting to 'True' will treat the matrix as if it contains correlation coefficients
      • [distance_metric] : Set the distance metric. Used if the matrix does not contain correlation coefficients.
      • [linkage_method] : Set the linkage method for the clustering.
    • Returns
      • [matrix] : The original matrix, transposed if transpose_non_correlated is 'True' and is_correlated is 'False'.
      • [row_linkage] : linkage matrix for the rows from a linkage clustered distance/similarities matrix
      • [col_linkage] : linkage matrix for the columns from a linkage clustered distance/similarities matrix

License

Multivis is licensed under the MIT license.

Authors

Correspondence

Dr. Brett Chapman, Post-doctoral Research Fellow at the Western Crop Genetics Alliance, Murdoch University. E-mail: [email protected], [email protected]

Citation

If you would like to cite MultiVis in a scientific publication, please cite this GitHub page until a citation to a publication becomes available.