Merge pull request #25 from DARPA-ASKEM/edit-model
Update Edit model topic
mecrouch authored Dec 20, 2024
2 parents c57d9d8 + b460d31 commit 6a11184
Showing 7 changed files with 225 additions and 124 deletions.
191 changes: 104 additions & 87 deletions docs/datasets/transform-dataset.md
@@ -88,7 +88,7 @@ Inside the Transform dataset operator is a code notebook. In the notebook, you c

![Transform dataset code notebook in which the AI assistant creates a new comparison](../img/data/transform-notebook.png)

Prompts and responses are written to cells where you can preview, edit, and run code or data. Each cell builds on the previous ones, letting you gradually make complex changes and save the history of your work. You can insert prompts or cells at any point in the chain of transformations.
Prompts and responses are written to cells where you can preview, edit, and run code. Each cell builds on the previous ones, letting you gradually make complex changes and save the history of your work. You can insert prompts or cells at any point in the chain of transformations.

???+ tip

@@ -332,13 +332,21 @@ At any time, you can edit the code generated by the AI assistant or enter your o

## Save transformed data

Saved datasets appear in your Resources panel and the output of the Transform dataset operator.
At times in your transformation or whenever specifically prompted, the AI assistant creates new transformed datasets as the output for the Transform dataset operator. This lets you return to previous versions of your dataset or choose the best one to save and use in your workflow.

??? list "Save transformed data as a new dataset in Terarium"
When you're done making changes, you can connect the chosen output to any operators in the same workflow that take datasets as an input.

To use a transformed dataset in other workflows, save it as a project resource.

??? list "Choose a different output for the Transform dataset operator"

- Use the **Select a dataframe** dropdown.

??? list "Save a transformed dataset to your project resources"

You can save your transformations as a new dataset at any time.

1. (Optional) If connected multiple datasets to your Transform dataset operator or if you created intermediary dataframes in the process, **Select a dataframe** to save.
1. (Optional) If you created multiple outputs during your transformations, **Select a dataframe** to save.
2. Click **Save for reuse**, enter a unique name in the text box, and then click **Save**.

??? list "Preview a transformation on the Transform dataset operator in the workflow graph"
@@ -347,7 +355,7 @@ Saved datasets appear in your Resources panel and the output of the Transform da

??? list "Download a transformed dataset"

1. Save the transformation as a new dataset.
1. Save the transformation output as a new dataset.
2. Close the Transform dataset code notebook.
3. In the Resources panel, click the name of the new dataset.
4. Click <span class="sr-only" id="menu-icon-label">Menu</span> :fontawesome-solid-ellipsis-vertical:{ title="Menu" aria-labelledby="menu-icon-label" } > :octicons-download-24:{ aria-hidden="true" } **Download**.
@@ -367,68 +375,76 @@ The following sections show examples of how to prompt the Transform dataset AI a
* `Plot the data`
* `Rename column 'cases' to 'I', column 'hospitalizations' to 'H', and 'deaths' to 'E'`

### Clean a dataset

You can use the AI assistant to clean your dataset by specifying column types, reformatting dates, and performing other common data preparation tasks.

??? example "Specify the type of data in a column"
??? list "Clean a dataset"

You can use the AI assistant to clean your dataset by specifying column types, reformatting dates, and performing other common data preparation tasks.

**Specify the type of data in a column**

Reformat a column of numeric IDs to, for example, add back leading zeroes that were stripped off:

> `Set the data type of the column "fips" to "string". Add leading zeros to the "fips" column to a length of 5 characters.`

??? example "Reformat dates"


```{ .text .wrap }
Set the data type of the column "fips" to "string". Add leading zeros to the "fips" column to a length of 5 characters.
```
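As a rough illustration of what this prompt asks for, the following pandas sketch converts a numeric `fips` column to strings and restores the leading zeros. The sample data is hypothetical, and the code the AI assistant actually generates may differ.

```python
import pandas as pd

# Hypothetical sample data: FIPS codes read as integers lose their leading zeros.
df = pd.DataFrame({"fips": [1001, 6037, 36061], "cases": [12, 340, 95]})

# Convert the column to strings, then left-pad each value with zeros to 5 characters.
df["fips"] = df["fips"].astype(str).str.zfill(5)

print(df["fips"].tolist())  # ['01001', '06037', '36061']
```

Padding happens after the string conversion because `str.zfill` works only on string values.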

**Reformat dates**

Datasets with inconsistent date formats can interfere with accurate interpretation and integration into model parameters:

```{ .text .wrap }
Set the data type of the column "t0" to datetime with format like YYYY-MM-DD hh:mm:ss UTC
```
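A minimal pandas sketch of this kind of conversion is shown here. It assumes the `t0` values are already strings that end in a literal `UTC` suffix; the sample values are hypothetical, and the assistant may generate different code for your data.

```python
import pandas as pd

# Hypothetical sample data: timestamps stored as plain strings.
df = pd.DataFrame({"t0": ["2020-03-01 00:00:00 UTC", "2020-03-02 12:30:00 UTC"]})

# Parse the strings with an explicit format. The trailing " UTC" is matched
# as literal text, and utc=True marks the result as timezone-aware UTC.
df["t0"] = pd.to_datetime(df["t0"], format="%Y-%m-%d %H:%M:%S UTC", utc=True)

print(df["t0"].dtype)
```

An explicit `format` makes parsing fail loudly on rows that do not match, which helps surface inconsistent date formats early.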

> `Set the data type of the column "t0" to datetime with format like YYYY-MM-DD hh:mm:ss UTC`

### Combine datasets

Before you combine datasets, make sure they share at least one common column like name, ID, date, or location. You can ask the AI assistant to link them by matching records based on the common data so that information aligns correctly.

??? example "Combine multiple datasets"
??? list "Combine datasets"

Before you combine datasets, make sure they share at least one common column like name, ID, date, or location. You can ask the AI assistant to link them by matching records based on the common data so that information aligns correctly.

1. Connect the outputs of each dataset to the input of a Transform dataset operator and then click **Edit**.
2. Ask the assistant to `Join d1 and d2 where date, county, and state match. Save the result as a new dataset and show me the first 10 rows.`
2. Ask the assistant to:

???+ tip
```{ .text .wrap }
Join d1 and d2 where date, county, and state match. Save the result as a new dataset and show me the first 10 rows.
```

???+ tip

You can also specify what type of join (such as inner join, left join, right join, or full outer join) you want the assistant to perform.

3. To save the dataset as a new resource in your project, change the dataframe and click **Save for reuse**.
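The join described in these steps could look something like the following pandas sketch. The dataframes `d1` and `d2` and their contents are hypothetical stand-ins for the connected datasets, and an inner join is assumed because the prompt does not name a join type.

```python
import pandas as pd

# Hypothetical stand-ins for the two connected datasets.
d1 = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "county": ["Kings", "Queens", "Kings"],
    "state": ["NY", "NY", "NY"],
    "cases": [10, 7, 12],
})
d2 = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-02"],
    "county": ["Kings", "Kings", "Queens"],
    "state": ["NY", "NY", "NY"],
    "hospitalizations": [2, 3, 1],
})

# Keep only rows whose date, county, and state appear in both datasets.
joined = d1.merge(d2, on=["date", "county", "state"], how="inner")

# Show the first 10 rows of the result.
print(joined.head(10))
```

Changing `how` to `"left"`, `"right"`, or `"outer"` keeps unmatched rows from one or both datasets, with missing values filled in as `NaN`.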

### Plot a dataset

You can visualize your data to explore patterns, compare quantities, identify relationships, analyze distributions, and capture insights tailored your analysis. Supported visualizations include:

<div class="col-container" markdown>
<div class="text-col" markdown>
- Line plots
- Bar charts
- Scatter plots
- Box plots
- Histograms
- Pie charts
- Heatmaps
- Violin plots
- Bubble charts
- Area charts
</div>
<div class="image-col" markdown>
![](../img/data/example-visualize-data.png)
</div>
</div>

???+ tip

To refine your visualizations, edit your prompt to add more details about what you want to see (for example, add `Insert a legend` to a prompt that initially only requests a plot).

??? example "Visualize a dataset"
??? list "Plot a dataset"

You can visualize your data to explore patterns, compare quantities, identify relationships, analyze distributions, and capture insights tailored to your analysis. Supported visualizations include:

<div class="col-container" markdown>
<div class="text-col" markdown>

- Line plots
- Bar charts
- Scatter plots
- Box plots
- Histograms
- Pie charts
- Heatmaps
- Violin plots
- Bubble charts
- Area charts

</div>
<div class="image-col" markdown>
![](../img/data/example-visualize-data.png)
</div>
</div>

???+ tip

To refine your visualizations, edit your prompt to add more details about what you want to see (for example, add `Insert a legend` to a prompt that initially only requests a plot).

1. Ask the assistant to plot your data. For the best results, be as specific as possible about what you want to see:

> `plot the number of hospitalizations over the 150 days for the baseline, masking, and vaccination interventions.`
```{ .text .wrap }
plot the number of hospitalizations over the 150 days for the baseline, masking, and vaccination interventions.
```

2. To refine the visualization, perform one of the following actions:

@@ -440,53 +456,54 @@ You can visualize your data to explore patterns, compare quantities, identify re
- Select **Display on node thumbnail** to use the image as the thumbnail on the Transform dataset operator in the workflow.
- Right-click the image, select **Copy image**, and then paste it into your project overview.

### Create a map-based visualization

The AI assistant can connect to third-party code repositories and data visualization libraries to incorporate geolocation data and then create map plots.

![Blue and yellow choropleth map of reproduction numbers in parts of the U.S.](../img/data/chloropleth-map.png)

??? example "Create a choropleth map of reproduction numbers"
??? list "Create a map-based visualization"

The AI assistant can connect to third-party code repositories and data visualization libraries to incorporate geolocation data and then create map plots.

![Blue and yellow choropleth map of reproduction numbers in parts of the U.S.](../img/data/chloropleth-map.png)
These prompts ask the assistant to get U.S. county-level data from plotly and then use matplotlib and geopandas to handle and visualize geographic data structures.

```{ .text .wrap }
Write me code that downloads the US counties geojson from plotly GitHub using urlopen
```

1. `Write me code that downloads the US counties geojson from plotly GitHub using urlopen`
2. `Use matplotlib to make a figure. Create a choropleth map from the column "Rl" in the geopandas dataframe "new_df_all". Use the "cividis" colormap. Add a legend.`

### Compare datasets

You can use the AI assistant to compare multiple datasets or simulation results.
```{ .text .wrap }
Use matplotlib to make a figure. Create a choropleth map from the column "Rl" in the geopandas dataframe "new_df_all". Use the "cividis" colormap. Add a legend.
```

??? example "Compare optimized intervention policies"
??? list "Compare datasets"

See the [Working with Data](https://app.terarium.ai/projects/a9462f60-14bc-4ca3-869e-8a7e5a8600e2/workflow/33433e99-6c34-446b-83fa-41bad1440dd8){ target="_blank" } workflow in the [Terarium Sample Project](https://app.terarium.ai/projects/a9462f60-14bc-4ca3-869e-8a7e5a8600e2/overview){ target="_blank" }. It takes three datasets generated by optimizing intervention policies and then:
You can use the AI assistant to compare multiple datasets or simulation results.

See the [Working with Data](https://app.terarium.ai/projects/a9462f60-14bc-4ca3-869e-8a7e5a8600e2/workflow/33433e99-6c34-446b-83fa-41bad1440dd8){ target="_blank" } workflow in the [Terarium Sample Project](https://app.terarium.ai/projects/a9462f60-14bc-4ca3-869e-8a7e5a8600e2/overview){ target="_blank" }. It takes three datasets generated by optimizing intervention policies and then:
- Combines them into a new scenario comparison dataset.
- Calculates summary statistics for hospitalizations in each dataset.
- Identifies the timepoint at which the maximum number of hospitalizations occur in each dataset.
- In a separate data transformation, plots hospitalizations for each intervention over time.
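The first three steps in this list can be sketched in pandas roughly as follows. The scenario names, column names, and values are hypothetical, and this is not the code used in the sample workflow.

```python
import pandas as pd

# Hypothetical hospitalization results for three intervention scenarios.
timepoints = [0, 1, 2, 3]
scenarios = {
    "baseline": [5, 20, 35, 15],
    "masking": [5, 12, 18, 22],
    "vaccination": [5, 10, 14, 9],
}

# Combine the scenarios into one long-format comparison dataset.
combined = pd.concat(
    [
        pd.DataFrame({"timepoint": timepoints, "hospitalizations": values, "scenario": name})
        for name, values in scenarios.items()
    ],
    ignore_index=True,
)

# Summary statistics for hospitalizations in each scenario.
summary = combined.groupby("scenario")["hospitalizations"].agg(["mean", "max"])

# Timepoint at which hospitalizations peak in each scenario.
peak_rows = combined.loc[combined.groupby("scenario")["hospitalizations"].idxmax()]
peak_timepoints = peak_rows.set_index("scenario")["timepoint"]

print(summary)
print(peak_timepoints)
```

Keeping the scenarios in one long-format table with a `scenario` column means the same `groupby` call serves both the summary statistics and the peak lookup.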

### Convert incidence data to prevalence data

If you have an epidemiological dataset that contains incidence data (such as new cases per day), you can prompt the AI assistant to convert it to prevalence data (such as total cases at any given time). You will need to specify:

- How long it takes people to recover.
- The susceptible population.

??? example "Convert incidence data to prevalence data"
??? list "Convert incidence data to prevalence data"

If you have an epidemiological dataset that contains incidence data (such as new cases per day), you can prompt the AI assistant to convert it to prevalence data (such as total cases at any given time). You will need to specify:

- How long it takes people to recover.
- The susceptible population.

This prompt converts daily case counts into prevalence data. It uses user-supplied recovery and population data to calculate total cases:

```{ .text .wrap }
let's assume avg time to recover is 14 days and time to exit hosp is 10 days. Can you convert this data into **prevalence** data? please map it to SIRHD. Assume a population of 150 million.
```

For more information on the logic of how the AI assistant converts from incidence to prevalence data, see the [instructions](https://github.com/DARPA-ASKEM/askem-beaker/blob/4c17fc262e01badb4427a5f3e529940c17510677/src/askem_beaker/contexts/dataset/incidence_to_prevalence.md){ target="_blank" } the assistant follows in these cases.
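To give a sense of the idea, here is a deliberately simplified pandas sketch. It assumes that everyone infected recovers after exactly 14 days and it ignores hospitalizations and deaths, so it covers only the S, I, and R compartments rather than the full SIRHD mapping. The column names and case counts are hypothetical, and the linked instructions remain the authoritative description of what the assistant does.

```python
import pandas as pd

RECOVERY_DAYS = 14          # assumed average time to recover
POPULATION = 150_000_000    # assumed total population

# Hypothetical incidence data: 100 new cases per day for 20 days.
df = pd.DataFrame({"new_cases": [100] * 20})

# Everyone ever infected up to each day.
cumulative = df["new_cases"].cumsum()

# Prevalence: cases from the last RECOVERY_DAYS days are still active.
df["I"] = df["new_cases"].rolling(RECOVERY_DAYS, min_periods=1).sum()

# Recovered: everyone infected so far who is no longer active.
df["R"] = cumulative - df["I"]

# Susceptible: everyone not yet infected.
df["S"] = POPULATION - cumulative

print(df.tail())
```

With a constant 100 cases per day, active infections level off at 1,400 once the first 14-day window has passed.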

> `let's assume avg time to recover is 14 days and time to exit hosp is 10 days. Can you convert this data into **prevalence** data? please map it to SIRHD. Assume a population of 150 million.`

For more information on the logic of how the AI assistant converts from incidence to prevalence data, see the [instructions](https://github.com/DARPA-ASKEM/askem-beaker/blob/4c17fc262e01badb4427a5f3e529940c17510677/src/askem_beaker/contexts/dataset/incidence_to_prevalence.md){ target="_blank" } the assistant follows in these cases.

### Calculate peak times

Calculating peak times can help you identify critical periods of disease spread, enabling targeted interventions.

??? example "Calculate peak times"
??? list "Calculate peak times"

Calculating peak times can help you identify critical periods of disease spread, enabling targeted interventions.

This prompt takes a collection of daily infection rates for various FIPS codes and identifies the peak time for each one:

> `Create a column named "peak_time". The first column is "fips". The second column is "peak_time", and its values are the values of the "timepoint" column for which the values of the FIPS columns are at a maximum.`

```{ .text .wrap }
Create a column named "peak_time". The first column is "fips". The second column is "peak_time", and its values are the values of the "timepoint" column for which the values of the FIPS columns are at a maximum.
```
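One way to express this in pandas is sketched here. It assumes a wide layout with one `timepoint` column and one column of infection rates per FIPS code; the sample values are hypothetical, and the assistant's generated code may differ.

```python
import pandas as pd

# Hypothetical wide-format data: one column of infection rates per FIPS code.
df = pd.DataFrame({
    "timepoint": [0, 1, 2, 3],
    "01001": [0.1, 0.4, 0.3, 0.2],
    "06037": [0.2, 0.3, 0.5, 0.6],
})

# Index by timepoint so idxmax returns the timepoint of each column's maximum.
rates = df.set_index("timepoint")

# Build a two-column table: "fips" and "peak_time".
peaks = rates.idxmax().rename_axis("fips").reset_index(name="peak_time")

print(peaks)
```

If several timepoints tie for the maximum, `idxmax` returns the first one.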
Binary file added docs/img/models/model-edit-notebook.png
10 changes: 5 additions & 5 deletions docs/modeling/create-model-from-equations.md
@@ -117,7 +117,7 @@ The Create model from equations operator works with LaTeX equations. When adding

<div class="grid cards" markdown>

- __Derivatives__
- #### Derivatives

---

@@ -135,7 +135,7 @@ The Create model from equations operator works with LaTeX equations. When adding

- Place **first-order derivatives** to the left of the equal sign.

- __Mathematical notations__
- #### Mathematical notations

---

@@ -144,16 +144,16 @@ The Create model from equations operator works with LaTeX equations. When adding
- Capital sigma (`Σ`) and pi (`Π`) notations for summation and product.
- Non-ASCII characters.
- Homoglyphs (characters that look similar but have different meanings).
- To indicate multiplication, use `*` between scalar quantities.
- To indicate multiplication, use ` * ` between scalar quantities.

- __Superscripts and subscripts__
- #### Superscripts and subscripts

---

- To denote **indices**, use LaTeX subscripts `_` instead of superscripts and LaTeX superscripts `^`.
- Use **LaTeX subscripts** `_` instead of Unicode subscripts. Wrap all characters in the subscript in curly brackets `{...}`.

- __Variable and symbol usage__
- #### Variable and symbol usage

---
