Add join catalogs notebook (#481)
* Add joining catalogs notebook

* Remove kernel metadata

* Change differences to ratios
camposandro authored Nov 5, 2024
1 parent 62cefb8 commit 0bf36bf
Showing 2 changed files with 138 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/tutorials.rst
@@ -14,6 +14,7 @@ An introduction to LSDB's core features and functionality

Getting data into LSDB <tutorials/getting_data>
Filtering large catalogs <tutorials/filtering_large_catalogs>
Joining catalogs <tutorials/join_catalogs>
Exporting results <tutorials/exporting_results>

Advanced Topics
137 changes: 137 additions & 0 deletions docs/tutorials/join_catalogs.ipynb
@@ -0,0 +1,137 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Joining catalogs\n",
"\n",
"In this tutorial we join a small cone region of Gaia with the Gaia Early Data Release 3 (EDR3) distances catalog, then compute the ratio between the distance derived from Gaia's `parallax` column and the distance given by the EDR3 `r_med_geo` column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import lsdb\n",
"from lsdb.core.search import ConeSearch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we load Gaia, keeping only each object's `source_id`, position (`ra`, `dec`) and `parallax` columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gaia = lsdb.read_hats(\n",
" \"https://data.lsdb.io/hats/gaia_dr3/gaia\",\n",
" margin_cache=\"https://data.lsdb.io/hats/gaia_dr3/gaia_10arcs\",\n",
" columns=[\"source_id\", \"ra\", \"dec\", \"parallax\"],\n",
" search_filter=ConeSearch(ra=0, dec=0, radius_arcsec=10 * 3600),\n",
")\n",
"gaia"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do the same for Gaia EDR3, where the distance column is `r_med_geo`, the median of the geometric distance estimate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gaia_edr3 = lsdb.read_hats(\n",
" \"https://data.lsdb.io/hats/gaia_dr3/gaia_edr3_distances\",\n",
" margin_cache=\"https://data.lsdb.io/hats/gaia_dr3/gaia_edr3_distances_10arcs\",\n",
" columns=[\"source_id\", \"ra\", \"dec\", \"r_med_geo\"],\n",
" search_filter=ConeSearch(ra=0, dec=0, radius_arcsec=10 * 3600),\n",
")\n",
"gaia_edr3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now join the two catalogs on their `source_id` columns:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"joined = gaia.join(gaia_edr3, left_on=\"source_id\", right_on=\"source_id\")\n",
"joined"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
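"Let's compute the ratio between the two catalog distances and plot it as a histogram. Gaia parallaxes are in milliarcseconds, so `1e3 / parallax` gives a distance in parsecs, the same unit as `r_med_geo`."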
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results = (1e3 / joined[\"parallax_gaia\"]) / joined[\"r_med_geo_gaia_edr3_distances\"]\n",
"ratios = results.compute().to_numpy()\n",
"ratios"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.hist(ratios, bins=np.linspace(0.8, 1.2, 100))\n",
"plt.title(\"Histogram of Gaia distance / Gaia EDR3 distance\")\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "demo",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
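
The distance-ratio step in the notebook can be sketched without LSDB or network access, as a plain NumPy function. This is only an illustration under stated assumptions: `distance_ratio` is a hypothetical helper that is not part of LSDB or of this commit, the input values are made up, and the masking of non-positive or missing parallaxes is an extra precaution that the notebook itself does not apply.

```python
import numpy as np


def distance_ratio(parallax_mas, r_med_geo_pc):
    """Ratio of the inverse-parallax distance to a catalog distance.

    Hypothetical helper for illustration only (not an LSDB function).
    Parallaxes are in milliarcseconds, distances in parsecs.
    """
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    r_med_geo_pc = np.asarray(r_med_geo_pc, dtype=float)

    # Inverting a parallax is only meaningful when it is finite and positive.
    valid = (
        np.isfinite(parallax_mas)
        & (parallax_mas > 0)
        & np.isfinite(r_med_geo_pc)
        & (r_med_geo_pc > 0)
    )

    # Rows that fail the check are left as NaN instead of being dropped,
    # so the output stays aligned with the input rows.
    ratios = np.full(parallax_mas.shape, np.nan)
    ratios[valid] = (1e3 / parallax_mas[valid]) / r_med_geo_pc[valid]
    return ratios


# Made-up values, not taken from any catalog.
parallax = np.array([10.0, 2.0, -0.5, np.nan])
r_med_geo = np.array([100.0, 520.0, 800.0, 300.0])
ratios = distance_ratio(parallax, r_med_geo)
print(ratios)
```

Keeping the masked rows as NaN rather than removing them is a design choice for this sketch: it preserves the one-to-one correspondence with the joined rows, and the NaN entries can be filtered out with `ratios[np.isfinite(ratios)]` before plotting.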