resolved conflict

prio-data · Sep 25, 2024 · e683f75
2 parents bb4dd38 + 7504d17
commit e683f75
Showing 4 changed files with 316 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -190,7 +190,8 @@ cython_debug/
 *.csv
 *.npy
 *.parquet
-*.pt 
+*.pt
+*.pkl 
 
 # folders
 **/wandb/

diff --git a/common_querysets/get_queryset_purple_alien.py b/common_querysets/get_queryset_purple_alien.py
@@ -0,0 +1,25 @@
+from viewser import Queryset, Column
+
+def get_queryset_purple_alien():
+
+    """
+    Contains the configuration for the input data in the form of a viewser queryset. That is the data from viewser that is used to train the model.
+    This configuration is "behavioral" so modifying it will affect the model's runtime behavior and integration into the deployment system.
+    There is no guarantee that the model will work if the input data configuration is changed here without changing the model settings and architecture accordingly.
+
+    Returns:
+    queryset_base (Queryset): A queryset containing the base data for the model training.
+    """
+
+    # VIEWSER 6
+    queryset_base = (Queryset("purple_alien", "priogrid_month")
+        .with_column(Column("ln_sb_best", from_loa = "priogrid_month", from_column = "ged_sb_best_count_nokgi").transform.ops.ln().transform.missing.replace_na())
+        .with_column(Column("ln_ns_best", from_loa = "priogrid_month", from_column = "ged_ns_best_count_nokgi").transform.ops.ln().transform.missing.replace_na())
+        .with_column(Column("ln_os_best", from_loa = "priogrid_month", from_column = "ged_os_best_count_nokgi").transform.ops.ln().transform.missing.replace_na())
+        .with_column(Column("month", from_loa = "month", from_column = "month"))
+        .with_column(Column("year_id", from_loa = "country_year", from_column = "year_id"))
+        .with_column(Column("c_id", from_loa = "country_year", from_column = "country_id"))
+        .with_column(Column("col", from_loa = "priogrid", from_column = "col"))
+        .with_column(Column("row", from_loa = "priogrid", from_column = "row")))
+
+    return queryset_base
diff --git a/documentation/ADRs/011_Common_Querysets_for_Model_Pipelines.md b/documentation/ADRs/011_Common_Querysets_for_Model_Pipelines.md
@@ -0,0 +1,56 @@
+# ADR 011 - Common Querysets for Model Pipelines
+
+
+| ADR Info            | Details                           |
+|---------------------|-----------------------------------|
+| Subject             | Common Querysets for Model Pipelines |
+| ADR Number          | 011                               |
+| Status              | Accepted                          |
+| Author              | Simon, Jim, Borbála               |
+| Date                | 16.09.2024                        |
+
+## Context
+Currently, querysets used by different models are stored under the path `views_pipeline/models/[model]/configs/config_input_data.py`. Each queryset is model-specific. We want to keep the option to have model-specific querysets, but we also want the option to share querysets across multiple models. Additionally, we would like some locality and ease of access to all querysets for the sake of documentation and ease of overview.
+
+To address these points, we propose to modify the directory structure slightly to allow for easier access to querysets and promote reusability. By moving the querysets into a centralized `views_pipeline/common_querysets` directory, querysets can be shared across models without compromising modularity. This change also enhances clarity in the organization of querysets and their associations with models.
+
+## Decision
+- All querysets will now be stored as individual scripts under `views_pipeline/common_querysets`. 
+- Each queryset will have its own script, and both the script and the function defining the queryset will follow a consistent naming convention based on the first model using it.
+- For instance, a queryset for the model "purple_alien" will be named `get_queryset_purple_alien.py` with the function `get_queryset_purple_alien()`. If another queryset merges existing querysets (e.g., `purple_alien` and `orange_pasta`), the merged queryset will also follow the naming convention of the first model using it (e.g., `get_queryset_big_boss.py`).
+- The internal name of the queryset (the name sent to viewser) should also follow this convention. E.g., `purple_alien` or `orange_pasta` would be the name of the queryset sent to viewser.
+- This naming convention ensures traceability and prevents confusion when querysets are shared across multiple models.
+- A gradual migration will be employed, moving querysets from model-specific directories to `views_pipeline/common_querysets` as necessary.
+- All dataloaders etc. will need updating to reflect this change.
+- The `set_path` function has been edited to include the new `common_querysets` directory.
+- Queryset names will remain static once assigned to ensure consistency.
+
+## Consequences
+**Positive Effects:**
+- **Modularity**: Querysets will be easier to locate, understand, and reuse across models, promoting consistency and reducing redundancy.
+- **Maintainability**: Shared querysets will ensure that changes to one queryset are reflected in all models using it, simplifying future updates.
+- **Clarity**: Maintaining a clear separation of querysets from models will ensure a modular organization of the pipeline.
+
+**Negative Effects:**
+- **Migration Overhead**: The migration of querysets to the new structure will be gradual, requiring coordination between model code, dataloaders, and documentation updates.
+- **Shared Dependencies**: Merged and shared querysets will inherit changes made to individual querysets, which could introduce unintentional side effects such as unexpected feature transformations or data discrepancies in models using the merged queryset. This is inherently a feature, but one that could lead to bugs. As such, developers should be cautious and mindful of this functionality.
+
+## Rationale
+The decision to move querysets to a centralized directory aims to make them more accessible and reusable, particularly as models evolve and the pipeline grows. The goal is to improve organization and reduce redundancy without adding unnecessary complexity.
+
+By naming querysets after the first model using them, we ensure traceability and prevent potential conflicts. Naming will remain static to avoid confusion in shared usage across models.
+
+Merged querysets offer flexibility, but it is crucial to understand the implications of changing individual querysets when they are part of a merged set.
+
+## Considerations
+- The decision to merge querysets should be carefully evaluated. Merging ensures that changes to individual querysets automatically affect the merged queryset, whereas creating a new queryset from scratch will keep it independent.
+- Creating new querysets is most appropriate when models diverge substantially in their feature requirements or when modifications to existing querysets may disrupt dependent models.
+- Documentation should point back to the actual queryset being used, avoiding outdated references. Model READMEs and catalogs should clearly state which queryset is associated with each model.
+- Developers are encouraged to follow existing docstring examples to document their querysets.
+
+## Additional Notes
+- Unit tests for querysets are not required at this time, but should be later on.
+- viewser has specific tools for combining querysets, and this functionality is already documented.
+
+## Feedback and Suggestions
+Please provide feedback or suggestions for improvement through the repository’s issue tracker or during regular team meetings.
diff --git a/reports/slides/new_york_september.md b/reports/slides/new_york_september.md
@@ -0,0 +1,233 @@
+---
+marp: true
+title: Spatiotemporal Learning in Action
+theme: default #gaia #uncover
+class: #invert
+math: mathjax
+---
+
+# VIEWS
+
+**Violence & Impacts Early-Warning System**
+
+&nbsp;
+Kluz Special Distinction
+&nbsp;
+
+![w:10cm](image_files/prio_VIEWS.png)
+![bg 100% right:50%](image_files/zstack.png)
+
+---
+
+### The Unreasonable Effectiveness of Being Prepared
+
+- **Checking the weather forecast**: So you don’t leave the house without an umbrella on a rainy day.
+
+- **Checking your calendar**: To make sure you don’t double-book yourself or miss that very important thing.
+
+- **Checking traffic before your commute**: So you avoid getting stuck in rush hour and get where you need to be on time.
+
+![bg 330% right:33%](image_files/rainy_umbrella.jpg)
+
+
+---
+
+### Why Early Conflict Warning Matters
+
+- **Early Warning Systems** (EWS) provide humanitarian actors with critical information and time for **Early Action**.
+
+- Enable early **resource allocation**, **personnel deployment**, and **evacuation of civilians**.
+
+- Give stakeholders the tools to **anticipate conflict**, **prepare**, and **reduce human suffering**.
+
+![bg 140% right:33%](image_files/aid.png)
+
+---
+
+
+# VIEWS: Violence & Impacts Early-Warning System
+
+---
+### VIEWS
+
+An advanced **machine learning based** system that delivers **early warnings** of violent conflict by **forecasting** the expected number of future **conflict fatalities** across the globe.
+
+
+![bg 100% right:60%](image_files/pipeline_diagram001.png)
+
+
+---
+
+
+**Global and Local Coverage:** It offers forecasts at both the country level and a detailed grid level (Africa and Middle East for now).
+
+**36-Month Forecasts:** Conflict forecasts are updated every month, projecting expected fatalities for each month up to 36 months ahead.
+
+**Humanitarian Focus:** The aim is to empower humanitarian organizations and stakeholders to take early action, minimizing human suffering.
+
+![bg 100% right:39%](image_files/zstack.png)
+
+<!---
+---
+
+**Minimizes Uncertainty**: Early warnings reduce the unpredictability of conflicts, enabling proactive planning instead of reactive crisis management.
+
+**Accelerates Response**: Being prepared ensures quicker, more coordinated responses, minimizing the delays that cost lives and resources.
+
+**Enhances Agility**: Early alerts give stakeholders the flexibility to adjust their strategies as the situation evolves, ensuring that resources and personnel are deployed effectively.
+
+![bg 120% right:33%](image_files/map_and_compas.png)
+--->
+
+---
+
+# VIEWS as a Complement to Traditional Risk Analysis
+
+---
+
+**Systematic Approach:** Reduces cognitive bias by using data-driven analysis to keep critical conflicts on the radar.
+
+**Focusing on Protracted Conflicts:** Ensures long-running conflicts don’t fade from attention.
+
+**Spotlighting Low-Risk but Critical Conflicts:** Highlights conflicts that may be at low risk but carry the potential for devastating outcomes.
+
+**Recognizing Compound Risks:** Identifies hidden conflict patterns exacerbated by interconnected, multi-layered risks.
+
+![bg 102% right:30%](image_files/two.png)
+
+---
+
+
+# The Technology
+
+---
+
+**Conventional ML Models:** Algorithms like XGBoost and LightGBM combine insights from large datasets and capture complex patterns for conflict prediction.
+
+**Bespoke Models:** In-house models like HydraNet, a specialized deep learning system, forecast multiple conflict outcomes from complex temporospatial data.
+
+![bg 100% right:50%](image_files/hydranet_PNAS_simple.png)
+
+
+---
+
+**Diverse Data Sources:** The system integrates datasets on conflict events, socioeconomic factors, geography, and governance.
+
+**Ensemble Modeling:** Combines top-performing models into a unified framework, enhancing the robustness and accuracy of forecasts.
+
+**Open Source and Transparent:** Both the data and the code used in VIEWS are publicly available, ensuring transparency and enabling collaboration with the broader research and humanitarian community.
+
+![bg 100% right:33%](image_files/github.png)
+
+
+---
+
+![bg 100%](image_files/dashboard01.png)
+
+<!---
+---
+
+![bg 100%](image_files/dashboard02.png)
+--->
+
+---
+
+# Tangible Impact: VIEWS in the Real World
+
+---
+
+### Engaging with Policymakers and Practitioners
+
+We **actively collaborate** with policymakers and practitioners to unlock the potential of **AI-driven conflict forecasting** in real-world operations.
+
+Through commissioned research and proof-of-concepts, we leverage our experience and tools to improve their systems for **decision-making and crisis management**.
+
+![bg 110% right:33%](image_files/GFFO.png)
+
+---
+
+### Supporting Strategic Planning and Risk Modeling
+
+Our forecasts support organizations like **UNHCR, UNESCWA, UNDP, FAO, the German FFO, and the UK FCDO** in strategic planning and risk modeling.
+
+These organizations rely on our forecasts to better anticipate conflict risks and **respond more effectively to emerging crises**.
+
+![bg 110% right:33%](image_files/alexa.png)
+
+---
+
+### Partnership with Complex Risk Analytics Fund (CRAF’d)
+
+As a key partner of **CRAF’d**, we contribute to a UN-led multilateral ecosystem that leverages interconnected data to save lives.
+
+CRAF’d prevents duplication of efforts by **fostering collaboration and maximizing the value of technological advancements**.
+
+![bg 110% right:33%](image_files/crafd_thin.png)
+
+---
+
+### Achievements Since 2018
+
+**70+** conflict prediction datasets and **100+** papers/reports advancing conflict forecasting.
+
+Hosted **2 global prediction challenges**, engaging research teams worldwide.
+
+Published a **multilateral flagship report with UNHCR**, demonstrating the transformative potential of leveraging early warning for early action in the Sahel.
+
+Written hundreds of thousands of (mostly well-documented) **open-source lines of code**.
+
+![bg 100% right:33%](image_files/papers.png)
+
+---
+
+# The Future of VIEWS: Scaling Our Impact
+
+---
+
+**Expanding Geographic Coverage:** Expand forecasts beyond Africa and the Middle East to cover more conflict-prone regions worldwide, increasing the system’s global applicability.
+
+**Leveraging Newswire Text:** Better integration of newswire data to detect early signals of conflict and provide more timely forecasts of dynamic developments.
+
+**Integrating GIS and Satellite Imagery:** Incorporate GIS data and satellite imagery to enhance geographic precision and track timely changes in conflict zones.
+
+![bg 240% right:40%](image_files/rocket.jpg)
+
+---
+
+**Actor-Based Forecasts:** Introduce actor-specific forecasts to capture how different groups interact and contribute to conflict escalation.
+
+**Dynamic Escalation and De-Escalation Patterns:** Enhance the system’s ability to track how conflicts escalate and de-escalate over time, providing more nuanced insights into conflict dynamics.
+
+**Forecasting Broader Impacts:** Expand forecasting to include related humanitarian crises, such as food insecurity, migration, and public health risks.
+
+<sub>Original image: Johan Spanner</sub>
+
+![bg 240% right:40%](image_files/bodies01.png)
+
+---
+
+**Explicit Modeling of Uncertainty:** Improve the explicit modeling of uncertainty for both input data and forecasts, ensuring more reliable, actionable, and transparent predictions.
+
+**New Decision-Support Algorithms:** Develop algorithms to help organizations allocate resources more effectively, based on evolving conflict risk assessments.
+
+**Developing Scenario-Based Planning Tools:** Offer tools that allow stakeholders to simulate different conflict scenarios and plan responses, improving preparedness.
+
+![bg 150% right:40%](image_files/chaos.png)
+
+---
+
+![bg 100% ](image_files/funders_logos_May2024.png)
+
+
+---
+
+
+![bg 240% left:42%](image_files/team.png)
+
+Thanks for listening!
+
+:bust_in_silhouette: Simon Polichinel von der Maase
+:world_map: PRIO, Oslo, Norway
+:mailbox: [email protected]
+:octopus: https://github.com/prio-data/views_pipeline
+:globe_with_meridians: https://viewsforecasting.org/