Merging (v2023.7.20) (#6)
* Added pairwise_mcc functionality

Added "mcc" to the `acceptable_metrics` dictionary in the EnsembleAssociationNetwork class.
Added the `pairwise_mcc` function to the fit function of the `EnsembleAssociationNetwork` class.
Still need to confirm whether it returns True when it is passed to the hasattr function in line 1662 (in the main branch).
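
The open question above can be checked in isolation. The sketch below assumes the `hasattr` call tests for `__call__` (the commit message does not say which attribute is checked), and `pairwise_mcc_stub`, `resolve_metric`, and the one-entry `acceptable_metrics` dictionary are hypothetical stand-ins, not the package's code:

```python
def pairwise_mcc_stub(X):
    """Placeholder standing in for the real `pairwise_mcc` (hypothetical)."""
    return X

# Hypothetical registry mirroring the idea of `acceptable_metrics`
acceptable_metrics = {"mcc": pairwise_mcc_stub}

def resolve_metric(metric):
    """Map a metric name to its function, then require a callable."""
    if isinstance(metric, str):
        if metric not in acceptable_metrics:
            raise ValueError(f"Unknown metric: {metric}")
        metric = acceptable_metrics[metric]
    # Assumed form of the check: plain functions expose `__call__`
    if not hasattr(metric, "__call__"):
        raise TypeError("metric must be a known name or a callable")
    return metric

resolved = resolve_metric("mcc")
is_callable = hasattr(pairwise_mcc_stub, "__call__")
print(resolved.__name__, is_callable)
```

Any ordinary Python function has a `__call__` attribute, so under this assumption a function registered in the dictionary passes the check.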

* Moved pairwise_mcc and modified __init__ file

Moved `pairwise_mcc` underneath the `pairwise_biweight_midcorrelation` function. Added `pairwise_mcc` to the list of functions in the __init__ file.

* v2023.7.20

2023.7.20 - Added `pairwise_mcc` with the Matthews correlation coefficient (MCC) for binary correlations. Functionality also available in `EnsembleAssociationNetwork` ([@411an13](https://github.com/411an13))

---------

Co-authored-by: Allan Phillips <[email protected]>
jolespin and 411an13 authored Jul 20, 2023
1 parent 39a6001 commit c583166
Showing 7 changed files with 3,045 additions and 56 deletions.
19 changes: 10 additions & 9 deletions CHANGELOG.md
@@ -1,15 +1,16 @@


#### Completed:
-* 2023.07.18 - Fixed issue with `SampleSpecificPerturbationNetwork` not being able to handle `X.index` with a `.name` that was not `NoneType`. Created a hack to allow `pd.MultiIndex` support (converts to strings and warns). Made `include_reference_for_samplespecific=True` the new default which creates a clone of the reference and uses that as the background network. Added `is_square` to `Symmetric` object.
-* 2022.02.09 - Added support for iGraph and non-fully connected networks. Also added UMAP `fuzzy_simplical_set` graph
-* 2021.06.24 - Added `get_weights_from_graph` function
-* 2021.06.09 - Fixed `condensed_to_dense` ability to handle self interactions
-* 2021.04.21 - Fixed `idx_nodes = pd.Index(sorted(set(groups[lambda x: x == group].index) & set(df_dense.index)))` in `connectivity` function to prepare for pandas deprecation.
-* 2021.04.12 - Added `community_detection` wrapper for `python-louvain` and `leidenalg`. Changed `cluster_modularity` function to `cluster_homogeneity` to not be confused with `modularity` metric used for louvain algorithm.
-* 2021.03.09 - Large changes took place in this version. Removed dependency of HiveNetworkX and moved many non-Hive plot functions/classes to EnsembleNetworkX. Now HiveNetworkX depends on EnsembleNetworkX which will be the more generalizable extension to NetworkX in the Soothsayer ecosystem while maintaining HiveNetworkX's core object on Hive plots. This version has also incorporated a feature engineering class called `CategoricalEngineeredFeature` that is a generalizable replacement to Soothsayer's PhylogenomicFunctionalComponent (which is being deprecated).
-* 2020.07.24 - Added `DifferentialEnsembleAssociationNetwork`
-* 2020.07.21 - `SampleSpecificPerturbationNetwork` fit method returns self
+* 2023.7.20 - Added `pairwise_mcc` with the Matthews correlation coefficient (MCC) for binary correlations. Functionality also available in `EnsembleAssociationNetwork` ([@411an13](https://github.com/411an13))
+* 2023.7.18 - Fixed issue with `SampleSpecificPerturbationNetwork` not being able to handle `X.index` with a `.name` that was not `NoneType`. Created a hack to allow `pd.MultiIndex` support (converts to strings and warns). Made `include_reference_for_samplespecific=True` the new default which creates a clone of the reference and uses that as the background network. Added `is_square` to `Symmetric` object.
+* 2022.2.9 - Added support for iGraph and non-fully connected networks. Also added UMAP `fuzzy_simplical_set` graph
+* 2021.6.24 - Added `get_weights_from_graph` function
+* 2021.6.9 - Fixed `condensed_to_dense` ability to handle self interactions
+* 2021.4.21 - Fixed `idx_nodes = pd.Index(sorted(set(groups[lambda x: x == group].index) & set(df_dense.index)))` in `connectivity` function to prepare for pandas deprecation.
+* 2021.4.12 - Added `community_detection` wrapper for `python-louvain` and `leidenalg`. Changed `cluster_modularity` function to `cluster_homogeneity` to not be confused with `modularity` metric used for louvain algorithm.
+* 2021.3.9 - Large changes took place in this version. Removed dependency of HiveNetworkX and moved many non-Hive plot functions/classes to EnsembleNetworkX. Now HiveNetworkX depends on EnsembleNetworkX which will be the more generalizable extension to NetworkX in the Soothsayer ecosystem while maintaining HiveNetworkX's core object on Hive plots. This version has also incorporated a feature engineering class called `CategoricalEngineeredFeature` that is a generalizable replacement to Soothsayer's PhylogenomicFunctionalComponent (which is being deprecated).
+* 2020.7.24 - Added `DifferentialEnsembleAssociationNetwork`
+* 2020.7.21 - `SampleSpecificPerturbationNetwork` fit method returns self


#### Pending:
53 changes: 46 additions & 7 deletions README.md
@@ -5,13 +5,13 @@ High-level [Ensemble](https://en.wikipedia.org/wiki/Ensemble_averaging_(machine_
#### Dependencies:
Compatible for Python 3.

-pandas >= 1
+pandas
numpy
-scipy >= 1
-networkx >= 2
-matplotlib >= 3
-soothsayer_utils >= 2021.03.08
-compositional >= 2020.05.19
+scipy
+networkx
+matplotlib
+soothsayer_utils
+compositional

#### Citations (Debut):

@@ -96,7 +96,46 @@ print(ens.stats_.head())

```

##### Simple case of creating sample-specific perturbation networks
##### Simple case of an ensemble network for binary data using the Matthews correlation coefficient (MCC)

```python
import numpy as np
import pandas as pd
import ensemble_networkx as enx

# Create ensemble network using MCC for binary data
n, m = 1000, 100  # n samples (rows), m features (columns)
X_binary = pd.DataFrame(
    data=np.random.RandomState(0).choice([0, 1], size=(n, m)),
    index=[f"sample_{i}" for i in range(n)],
    columns=[f"feature_{j}" for j in range(m)],
)
ens_binary = enx.EnsembleAssociationNetwork(name="Binary", edge_type="association")
ens_binary.fit(X=X_binary, metric="mcc", n_iter=100, stats_summary=[np.mean, np.var], copy_ensemble=True)
print(ens_binary)
# ====================================================
# EnsembleAssociationNetwork(Name:Binary, Metric: mcc)
# ====================================================
# * Number of nodes (None): 100
# * Number of edges (association): 4950
# * Observation type: None
# ------------------------------------------------
# | Parameters
# ------------------------------------------------
# * n_iter: 100
# * sampling_size: 618
# * random_state: 0
# * with_replacement: False
# * transformation: None
# * memory: 4.894 MB
# ------------------------------------------------
# | Data
# ------------------------------------------------
# * Features (n=1000, m=100, memory=821.352 KB)
# * Ensemble (memory=3.777 MB)
# * Statistics (['mean', 'var', 'normaltest|stat', 'normaltest|p_value'], memory=322.398 KB)
```
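
For reference, the Matthews correlation coefficient between two binary vectors is the phi coefficient, which equals the Pearson correlation of the 0/1 values. The sketch below is not the package's `pairwise_mcc` implementation; `mcc_from_counts` and `pairwise_mcc_sketch` are hypothetical names used only to illustrate the quantity being computed, assuming a binary `pd.DataFrame` with samples as rows and features as columns.

```python
import numpy as np
import pandas as pd

def mcc_from_counts(x, y):
    """MCC of two binary vectors from the 2x2 contingency counts."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    tp = int(np.sum(x & y))    # both 1
    tn = int(np.sum(~x & ~y))  # both 0
    fp = int(np.sum(x & ~y))   # x is 1, y is 0
    fn = int(np.sum(~x & y))   # x is 0, y is 1
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when either vector is constant; return NaN in that case
    return (tp * tn - fp * fn) / denominator if denominator > 0 else np.nan

def pairwise_mcc_sketch(X):
    """Feature-by-feature MCC matrix; for 0/1 data this is Pearson correlation."""
    return X.astype(float).corr(method="pearson")

# Small binary demo matrix (hypothetical data)
X_demo = pd.DataFrame(
    np.random.RandomState(0).choice([0, 1], size=(200, 5)),
    columns=[f"feature_{j}" for j in range(5)],
)
mcc_matrix = pairwise_mcc_sketch(X_demo)
print(mcc_matrix.round(3))
```

The count-based and correlation-based routes agree for every pair of non-constant columns, so the matrix form is a convenient cross-check for any single pair.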

##### Simple case of creating sample-specific perturbation networks for compositional data using [Rho Proportionality](https://pubmed.ncbi.nlm.nih.gov/26762323/)

The Iris data are not compositional, but all of the values are positive, so they serve here for demonstration.


```python
77 changes: 77 additions & 0 deletions ensemble_networkx/.ipynb_checkpoints/__init__-checkpoint.py
@@ -0,0 +1,77 @@
# ==============
# Ensemble NetworkX
# ==============
# Ensemble networks in Python
# ------------------------------------
# GitHub: https://github.com/jolespin/ensemble_networkx
# PyPI: https://pypi.org/project/ensemble_networkx/
# ------------------------------------
# =======
# Contact
# =======
# Producer: Josh L. Espinoza
# Contact: [email protected], [email protected]
# Google Scholar: https://scholar.google.com/citations?user=r9y1tTQAAAAJ&hl
# =======
# License BSD-3
# =======
# https://opensource.org/licenses/BSD-3-Clause
#
# Copyright 2020 Josh L. Espinoza
#
# Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#
# =======
# Version
# =======
__version__= "2023.7.18"
__author__ = "Josh L. Espinoza"
__email__ = "[email protected], [email protected]"
__url__ = "https://github.com/jolespin/ensemble_networkx"
__license__ = "BSD-3"
__developmental__ = True

# =======
# Direct Exports
# =======
__functions__ = [
"pairwise_biweight_midcorrelation",
"umap_fuzzy_simplical_set_graph",
"pairwise_mcc"
] + [
"signed",
"get_symmetric_category",
"dense_to_condensed",
"condensed_to_dense",
"convert_network",
] + [
"connectivity",
"density",
"centralization",
"heterogeneity",
"topological_overlap_measure",
"community_detection",
"cluster_homogeneity",
]
__classes__ = [
'EnsembleAssociationNetwork',
'SampleSpecificPerturbationNetwork',
'DifferentialEnsembleAssociationNetwork',
'CategoricalEngineeredFeature',
'Symmetric',
]

__all__ = sorted(__functions__ + __classes__)

from .ensemble_networkx import *
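
The effect of listing a name in `__all__` can be seen with a throwaway module. This is a minimal sketch, assuming standard Python star-import semantics; `demo_pkg` and `internal_helper` are hypothetical names and nothing here touches the real package:

```python
import sys
import types

# Build a throwaway module standing in for a package's __init__ (hypothetical)
demo = types.ModuleType("demo_pkg")
source = '''
def pairwise_biweight_midcorrelation(): return "bicor"
def pairwise_mcc(): return "mcc"
def internal_helper(): return "not exported"

__functions__ = ["pairwise_biweight_midcorrelation", "pairwise_mcc"]
__classes__ = []
__all__ = sorted(__functions__ + __classes__)
'''
exec(source, demo.__dict__)
sys.modules["demo_pkg"] = demo

# Star-import into an isolated namespace and list what arrived
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
# ['pairwise_biweight_midcorrelation', 'pairwise_mcc']
```

Only the names listed in `__all__` reach the importing namespace, which is why a new function has to be added to the list to be picked up by `from package import *`.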

