add impresso-team as author
danieleguido committed Oct 11, 2024
1 parent 54076d2 commit ea26806
Showing 3 changed files with 119 additions and 7 deletions.
19 changes: 17 additions & 2 deletions src/content/notebooks/impresso-py-maps.mdx
@@ -2,41 +2,48 @@
title: Exploring impresso with maps
githubUrl: https://github.com/impresso/impresso-datalab-notebooks/blob/main/4-impresso-py/maps_explore.ipynb
authors:
- impresso-team
- RomanKalyakin
sha: 168c669246385a2ec6c3e088b0081364f129d11c
date: 2024-09-27T12:54:12Z
googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/4-impresso-py/maps_explore.ipynb
---

{/* cell:0 cell_type:markdown */}

## Install dependencies

We need the following packages:

* [impresso-py](https://impresso-project.ch/)
* [ipyleaflet](https://ipyleaflet.readthedocs.io/en/latest/index.html)
- [impresso-py](https://impresso-project.ch/)
- [ipyleaflet](https://ipyleaflet.readthedocs.io/en/latest/index.html)

{/* cell:1 cell_type:code */}

```python
%pip install git+https://github.com/impresso/impresso-py.git ipyleaflet
```

{/* cell:2 cell_type:markdown */}

## Connect to Impresso

{/* cell:3 cell_type:code */}

```python
from impresso import connect, OR, DateRange

impresso = connect(public_api_url="https://dev.impresso-project.ch/public-api")
```

{/* cell:4 cell_type:markdown */}

## Search and collect entities

Find the top 100 location entities mentioned in articles about nuclear power plants in the first three decades after the Second World War.

{/* cell:5 cell_type:code */}

```python
locations = impresso.search.facet(
"location",
@@ -52,6 +59,7 @@ locations
Get entity details, including Wikidata details.

{/* cell:7 cell_type:code */}

```python
entities_ids = locations.df.index.tolist()
entities = impresso.entities.find(entity_id=OR(*entities_ids), load_wikidata=True, limit=len(entities_ids))
@@ -62,6 +70,7 @@ entities
Filter out entities that have no coordinates and add a country tag.

{/* cell:9 cell_type:code */}

```python
df = entities.df
entities_with_coordinates = df[df['wikidata.coordinates.latitude'].notna() & df['wikidata.coordinates.longitude'].notna()]
@@ -74,6 +83,7 @@ entities_with_coordinates
Add counts of mentions to the entities dataframe.

{/* cell:11 cell_type:code */}

```python
entities_with_coordinates['mentions_count'] = entities_with_coordinates.index.map(locations.df['count'])
```
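
The filtering and counting steps above can be tried on a toy table. This is a minimal sketch, assuming pandas is installed; the entity IDs, coordinates, and counts are made up, and only the column names `wikidata.coordinates.latitude`, `wikidata.coordinates.longitude`, and `count` come from the notebook. The `.copy()` call is an addition of this sketch, intended to avoid pandas' warning about assigning to a slice.

```python
import pandas as pd

# Hypothetical entity table indexed by entity ID (all values are made up).
entities_df = pd.DataFrame(
    {
        "wikidata.coordinates.latitude": [46.2, None, 48.9],
        "wikidata.coordinates.longitude": [6.1, 7.4, None],
    },
    index=["id-a", "id-b", "id-c"],
)

# Hypothetical facet counts, indexed by the same entity IDs.
counts = pd.Series({"id-a": 12, "id-b": 5, "id-c": 3}, name="count")

# Keep only rows where both coordinates are present.
has_coords = (
    entities_df["wikidata.coordinates.latitude"].notna()
    & entities_df["wikidata.coordinates.longitude"].notna()
)
with_coords = entities_df[has_coords].copy()

# Index.map with a Series looks up each index label in that Series.
with_coords["mentions_count"] = with_coords.index.map(counts)
print(with_coords)
```

Only `id-a` survives the filter here, because the other two rows each lack one coordinate.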
@@ -82,6 +92,7 @@ entities_with_coordinates['mentions_count'] = entities_with_coordinates.index.ma
Plot entities on the map.

{/* cell:13 cell_type:markdown */}

### Utility methods

Functions used to calculate extra details needed to plot data on a map.
@@ -90,6 +101,7 @@ Functions used to calculate extra details needed to plot data on a map.
Find geo bounds of a group of items.
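
One simple way to compute such bounds is to take the minimum and maximum latitude and longitude. The sketch below is an illustration only, not the notebook's `find_bounds` (whose body is not shown in this diff); the name `bounding_box` and the `(latitude, longitude)` tuple layout are assumptions.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # assumed layout: (latitude, longitude)


def bounding_box(points: Iterable[Point]) -> Tuple[Point, Point]:
    """Return ((south, west), (north, east)) for a group of points.

    Naive min/max approach: it does not handle groups that straddle
    the antimeridian (longitude +/-180).
    """
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    return (min(lats), min(lons)), (max(lats), max(lons))


# Made-up coordinates, for illustration only.
sample = [(46.2, 6.1), (48.9, 2.3), (47.4, 8.5)]
print(bounding_box(sample))
```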

{/* cell:15 cell_type:code */}

```python
def find_bounds(coordinates):
"""
@@ -124,6 +136,7 @@ def find_bounds(coordinates):
Create the HTML used to render the hover pop-up.

{/* cell:17 cell_type:code */}

```python
from ipywidgets import HTML
from ipyleaflet import Popup
@@ -150,9 +163,11 @@ def build_hover_popup(title: str, subtitle: str, mentions: int) -> Popup:
```

{/* cell:18 cell_type:markdown */}

### Plot

{/* cell:19 cell_type:code */}

```python
from ipyleaflet import Map, Marker, AwesomeIcon, CircleMarker

21 changes: 21 additions & 0 deletions src/content/notebooks/impresso-py-network.mdx
@@ -2,41 +2,49 @@
title: Network graph with Impresso Py
githubUrl: https://github.com/impresso/impresso-datalab-notebooks/blob/main/4-impresso-py/network_graph.ipynb
authors:
- impresso-team
- RomanKalyakin
sha: 168c669246385a2ec6c3e088b0081364f129d11c
date: 2024-09-27T12:54:12Z
googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/4-impresso-py/network_graph.ipynb
---

{/* cell:0 cell_type:markdown */}

## Install dependencies

{/* cell:1 cell_type:code */}

```python
%pip install git+https://github.com/impresso/impresso-py.git ipysigma
```

{/* cell:2 cell_type:markdown */}

## Connect to Impresso

{/* cell:3 cell_type:code */}

```python
from impresso import connect, OR, AND

impresso = connect(public_api_url="https://dev.impresso-project.ch/public-api")
```

{/* cell:4 cell_type:markdown */}

## Part 1: Get entities and their co-occurrences

Find all persons mentioned in all articles that talk about the [Prague Spring](https://en.wikipedia.org/wiki/Prague_Spring).

{/* cell:5 cell_type:code */}

```python
query = OR("Prague Spring", "Prager Frühling", "Printemps de Prague")
```

{/* cell:6 cell_type:code */}

```python
persons = impresso.search.facet(
facet="person",
@@ -51,6 +59,7 @@ persons
Get all combinations of all entities with a mention count higher than `N`.

{/* cell:8 cell_type:code */}

```python
import itertools

@@ -66,6 +75,7 @@ print(f"Total combinations: {len(persons_ids_combinations)}")
```
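
As a small illustration of the pairing step, the sketch below filters by a mention threshold and builds all unordered pairs with `itertools.combinations`. The IDs, counts, and the variable names `mention_counts` and `min_mentions` are hypothetical and do not come from the notebook.

```python
import itertools

# Hypothetical mention counts per person ID (made-up values).
mention_counts = {"person-a": 40, "person-b": 25, "person-c": 8, "person-d": 31}
min_mentions = 10  # the threshold N

# Keep only IDs mentioned more than N times.
selected_ids = [pid for pid, count in mention_counts.items() if count > min_mentions]

# All unordered pairs of the selected IDs: n * (n - 1) / 2 of them.
pairs = list(itertools.combinations(selected_ids, 2))
print(f"Total combinations: {len(pairs)}")
```

Because the number of pairs grows quadratically with the number of selected IDs, raising `min_mentions` is the simplest way to keep the total small.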

{/* cell:9 cell_type:code */}

```python
if len(persons_ids_combinations) > 500:
msg = (
@@ -81,6 +91,7 @@ if len(persons_ids_combinations) > 500:
Get the timestamps and counts of all articles in which each pair of persons appears.

{/* cell:11 cell_type:code */}

```python
from impresso.util.error import ImpressoError
from time import sleep
@@ -115,6 +126,7 @@ for idx, combo in enumerate(persons_ids_combinations):
Put them all into a dataframe.

{/* cell:13 cell_type:code */}

```python
import pandas as pd

@@ -132,14 +144,17 @@ connections_df
```

{/* cell:14 cell_type:code */}

```python
connections_df.to_csv("connections.csv")
```

{/* cell:15 cell_type:markdown */}

## Part 2: Visualise

{/* cell:16 cell_type:code */}

```python
import pandas as pd

@@ -148,6 +163,7 @@ connections_df
```

{/* cell:17 cell_type:code */}

```python
grouped_connections_df = connections_df.groupby(['node_a', 'node_b']) \
.agg({'timestamp': lambda x: ', '.join(list(x)), 'count': 'sum', 'url': lambda x: list(set(x))[0]}) \
@@ -156,6 +172,7 @@ grouped_connections_df
```
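
To see what this kind of aggregation produces, here is a minimal sketch on made-up rows, assuming pandas is installed. The column names follow the notebook; the data is invented, and `"first"` is used for `url` as a deterministic stand-in for the notebook's `list(set(x))[0]`.

```python
import pandas as pd

# Made-up connection rows: one row per (pair, timestamp).
connections = pd.DataFrame(
    {
        "node_a": ["A", "A", "B"],
        "node_b": ["B", "B", "C"],
        "timestamp": ["1968-08-21", "1968-08-22", "1968-09-01"],
        "count": [2, 3, 1],
        "url": [
            "https://example.org/ab",
            "https://example.org/ab",
            "https://example.org/bc",
        ],
    }
)

# Collapse to one row per pair: join timestamps, sum counts, keep one URL.
grouped = (
    connections.groupby(["node_a", "node_b"])
    .agg({"timestamp": lambda x: ", ".join(x), "count": "sum", "url": "first"})
    .reset_index()
)
print(grouped)
```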

{/* cell:18 cell_type:code */}

```python
import networkx as nx

@@ -172,12 +189,14 @@ G.nodes
```
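
The graph construction itself is not visible in this diff. As a rough sketch of how weighted co-occurrence edges can be loaded into NetworkX (assuming `networkx` is installed; the edge list and the `weight` attribute name are made up for illustration):

```python
import networkx as nx

# Hypothetical (node_a, node_b, total co-occurrence count) triples.
edges = [("A", "B", 5), ("B", "C", 1)]

G_example = nx.Graph()
for node_a, node_b, count in edges:
    # Store the co-occurrence count as an edge attribute.
    G_example.add_edge(node_a, node_b, weight=count)

print(G_example.number_of_nodes(), G_example.number_of_edges())
```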

{/* cell:19 cell_type:code */}

```python
filename = input("Enter the filename: ")
filename = f"{filename.replace(' ', '_')}.gexf"
```

{/* cell:20 cell_type:code */}

```python
nx.write_gexf(G, filename)
```
@@ -186,6 +205,7 @@ nx.write_gexf(G, filename)
If running in Colab, activate custom widgets to allow Sigma to render the graph.

{/* cell:22 cell_type:code */}

```python
try:
from google.colab import output
@@ -198,6 +218,7 @@ except:
Render the graph.

{/* cell:24 cell_type:code */}

```python
import networkx as nx
from ipysigma import Sigma