Impression Network Analysis

This project is a comprehensive analysis of a network of impressions derived from a dataset. It includes various algorithms for shortest path analysis, missing link prediction, and ranking nodes using random walk-based PageRank. The implementation is written in Python and utilizes the NetworkX library for graph-related operations.

Features

Random Walk-Based PageRank
- Implements a random walk algorithm to rank nodes based on their importance.
- Identifies the top leaders in the network based on normalized ratings.
Missing Link Prediction
- Uses a matrix-based method to predict missing edges in the network.
- Adds predicted missing links to the graph and recalculates PageRank scores for updated rankings.
Shortest Path Analysis
- Computes the shortest distances between all pairs of nodes using Dijkstra's algorithm.
- Provides metrics such as the maximum distance, unreachable node pairs, and node absence rates.

Files

requirements.txt: Specifies the required libraries for the project.
shortest_path_analysis.py: Contains the implementation for shortest path calculations and related statistics.
missing_link_prediction.py: Implements the missing link prediction algorithm and recalculates PageRank for the updated network.
random_walk_pagerank.py: Performs node ranking using a random walk algorithm.
dataset.csv: The dataset file representing the impression network.
Report.pdf: The report of the final project, containing detailed statistics and results of the project.

Prerequisites

Python 3.7 or higher
Install dependencies using the following command:
```
pip install -r requirements.txt
```

Usage

Random Walk-Based PageRank
- Execute random_walk_pagerank.py to compute the PageRank scores using a random walk method.
- Example:
```
python random_walk_pagerank.py
```
Missing Link Prediction
- Run missing_link_prediction.py to identify and add missing links, followed by recalculating rankings.
- Example:
```
python missing_link_prediction.py
```
Shortest Path Analysis
- Run shortest_path_analysis.py to compute shortest distances between nodes and related metrics.
- Example:
```
python shortest_path_analysis.py
```

Dataset

The dataset should be in CSV format. The columns are interpreted as follows:

Parent Node: The second column specifies the source node.
Child Nodes: Subsequent columns represent target nodes connected to the source node.

Output

Random Walk-Based PageRank
- Top 10 nodes with the highest ratings
Missing Link Prediction
- List of predicted links
- Updated PageRank rankings
Shortest Path Analysis
- Distance matrix
- Maximum distance
- Unreachable node statistics

Libraries Used

NetworkX: For graph creation and manipulation.
NumPy: For numerical computations.
CSV: For handling the dataset file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Impression Network Analysis

Features

Files

Prerequisites

Usage

Dataset

Output

Libraries Used

Author

Prepared by Shubham Aggarwal.

Files

README.md

Latest commit

History

README.md

File metadata and controls

Impression Network Analysis

Features

Files

Prerequisites

Usage

Dataset

Output

Libraries Used

Author

Prepared by Shubham Aggarwal.