diff --git a/docs/notebooks/public-specz-compilation.ipynb b/docs/notebooks/public-specz-compilation.ipynb
new file mode 100644
index 0000000..0b8aa46
--- /dev/null
+++ b/docs/notebooks/public-specz-compilation.ipynb
@@ -0,0 +1,687 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "9d42adbb-55a5-4edd-aaae-2a5c2279e279",
+ "metadata": {},
+ "source": [
+ " \n",
+ "\n",
+ "## Spectroscopic Redshifts Compilation\n",
+ "\n",
+ "Public collection of redshift measurements made available by spectroscopic surveys prior to DES DR2.\n",
+ "\n",
+ "\n",
+ "Contact: Julia Gschwend ([julia@linea.org.br](mailto:julia@linea.org.br))\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6c6d0276-13b5-4a2e-974d-0b3c664914f7",
+ "metadata": {},
+ "source": [
+ "#### Acknowledgments\n",
+ "If you use this dataset to generate scientific results, please add a reference to [Gschwend et al., 2018](https://ui.adsabs.harvard.edu/abs/2018A%26C....25...58G/abstract) and acknowledge LIneA in the acknowledgments section of your publication. For instance:\n",
+ "\n",
+ "'_This research used computational resources from the Associação Laboratório Interinstitucional de e-Astronomia (LIneA) with the financial support of INCT do e-Universo (Process no. 465376/2014-2)._'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dbbe57a5-fb05-4166-a2c3-dfd2b719f8ec",
+ "metadata": {},
+ "source": [
+ "#### Notes about the curation of spectroscopic _redshifts_ catalogs\n",
+ "\n",
+ "This notebook contains a brief characterization of a collection of spectroscopic redshift (spec-z) catalogs that were publicly released and described in detail in the scientific literature by their original projects. These catalogs were collected over the years of operation of the Dark Energy Survey (DES) and systematically grouped by the LIneA team (initially by Aurelio Carnero, then by Julia Gschwend) using the Spectroscopic Sample _pipeline_ of the DES Science Portal, to form the basis of a training set for machine-learning-based photometric redshift (photo-z) algorithms. \n",
+ "\n",
+ "The latest version of this notebook "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7748de65-846c-43f4-8343-c80beda8fed6",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Spectroscopic surveys \n",
+ "\n",
+ "The table below lists the surveys and the bibliographic references for each of the redshift catalogs that make up the collection.\n",
+ "\n",
+ "|seq.|Survey name (link to the website)| Number of redshifts in the original sample | Reference (link to the paper) |\n",
+ "|---|---|:-:|---|\n",
+ "|1| [2dF](http://www.2dfgrs.net/) |245,591 | [Colless et al. 2001](https://academic.oup.com/mnras/article/328/4/1039/1082731)|\n",
+ "|2|[2dFLenS](http://2dflens.swin.edu.au/) |70,079| [Blake et al. 2016](https://ui.adsabs.harvard.edu/abs/2016MNRAS.462.4240B/abstract)|\n",
+ "|3|[3DHST](http://3dhst.research.yale.edu/Data.php)|207,967|[Momcheva et al. 2016](https://ui.adsabs.harvard.edu/abs/2016ApJS..225...27M/abstract)|\n",
+ "|4|[6dF (DR3)](http://www.6dfgs.net/)|109,831|[Jones et al. 2009](https://ui.adsabs.harvard.edu/abs/2009MNRAS.399..683J/abstract)|\n",
+ "|5|[ACES](http://mur.ps.uci.edu/cooper/ACES/zcatalog.html)|13,963|[Cooper et al. 2012](https://ui.adsabs.harvard.edu/abs/2012MNRAS.425.2116C/abstract)|\n",
+ "|6|[ATLAS (DR2)](http://astro.dur.ac.uk/Cosmology/vstatlas/index.php?go=dr2)|1,074 |[Mao et al. 2012](https://ui.adsabs.harvard.edu/abs/2012MNRAS.426.3334M/abstract)|\n",
+ "|7|[C3R2 (DR2)](https://sites.google.com/view/c3r2-survey/) |4,525 |[Masters et al. 2019](https://ui.adsabs.harvard.edu/abs/2019ApJ...877...81M/abstract) |\n",
+ "|8|[CDB](http://vizier.u-strasbg.fr/viz-bin/VizieR?-source=J/MNRAS/406/782) |541 |[Sullivan et al. 2011](https://ui.adsabs.harvard.edu/abs/2011yCat..74060782S/abstract)|\n",
+ "|9|[CLASH-VLT](https://sites.google.com/site/vltclashpublic/data-release) |10,183|[Biviano et al. 2013](https://ui.adsabs.harvard.edu/abs/2013A%26A...558A...1B/abstract), [Annunziatella et al. 2016](https://ui.adsabs.harvard.edu/abs/2016A%26A...585A.160A/abstract), [Balestra et al. 2016](https://ui.adsabs.harvard.edu/abs/2016yCat..22240033B/abstract), [Grillo et al. 2016](https://ui.adsabs.harvard.edu/abs/2016ApJ...822...78G/abstract), [Caminha et al. 2017](https://ui.adsabs.harvard.edu/abs/2017yCat..36000090C/abstract), [Karman et al. 2017](https://ui.adsabs.harvard.edu/abs/2017A%26A...599A..28K/abstract), [Monna et al. 2017](https://ui.adsabs.harvard.edu/abs/2017MNRAS.466.4094M/abstract)|\n",
+ "|10|[DEEP2 (DR4)](http://deep.ps.uci.edu/DR4/home.html)|50,319|[Newman et al. 2013](https://ui.adsabs.harvard.edu/abs/2013ApJS..208....5N/abstract)|\n",
+ "|11|[DEIMOS 10K](http://cosmos.astro.caltech.edu/)|10,770|[Hasinger et al. 2018](https://ui.adsabs.harvard.edu/abs/2018ApJ...858...77H/abstract)| \n",
+ "|12|[FMOS-COSMOS](http://member.ipmu.jp/fmos-cosmos/FC%5C_catalogs.html) |1,153|[Silverman et al. 2015](https://ui.adsabs.harvard.edu/abs/2015yCat..22200012S/abstract)|\n",
+ "|13|[GAMA (DR3) ](http://www.gama-survey.org/dr3/schema/table.php?id=24) |166,332 |[Baldry et al. 2018](https://ui.adsabs.harvard.edu/abs/2018MNRAS.474.3875B/abstract) |\n",
+ "|14|[GLASS (DR2)](https://archive.stsci.edu/prepds/glass/#dataformat) |3,289 |[Abramson et al. 2020](https://ui.adsabs.harvard.edu/abs/2020MNRAS.tmp..279A/abstract)|\n",
+ "|15|[MOSFIRE ](http://mosdef.astro.berkeley.edu) |267 |[McLean et al. 2012](https://ui.adsabs.harvard.edu/abs/2012SPIE.8446E..0JM/abstract) |\n",
+ "|16|[MUSE ](https://musewide.aip.de/query/) |1,602 |[Urrutia et al. 2019](https://ui.adsabs.harvard.edu/abs/2019A%26A...624A.141U/abstract) |\n",
+ "|17|[SAGA ](http://sagasurvey.org/)|68,644 |[Geha et al. 2017 ](https://ui.adsabs.harvard.edu/abs/2017ApJ...847....4G/abstract)|\n",
+ "|18|[SDSS (DR16)](http://www.sdss.org/dr16/)|4,613,773| [Ahumada et al. 2020 ](https://ui.adsabs.harvard.edu/abs/2020ApJS..249....3A/abstract)| \n",
+ "|19|[SpARCS ](http://simbad.u-strasbg.fr/simbad/sim-ref?querymethod=bib&simbo=on&submit=submit+bibcode&bibcode=2012ApJ...746..188M)|410 |[Muzzin et al. 2012 ](https://ui.adsabs.harvard.edu/abs/2012ApJ...746..188M/abstract)|\n",
+ "|20|[SPT-GMOS ](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OR13NN/)|2,243 |[Bayliss et al. 2016 ](https://ui.adsabs.harvard.edu/abs/2016yCat..22270003B/abstract)|\n",
+ "|21|[UDS ](http://www.nottingham.ac.uk/astronomy/UDS/UDSz/)|1,511 |[Galametz et al. 2013 ](https://ui.adsabs.harvard.edu/abs/2013ApJS..206...10G/abstract)|\n",
+ "|22|[VANDELS ](https://eso.org/rm/api/v1/public/releaseDescriptions/120)|1,362 |[Pentericci et al. 2018 ](https://ui.adsabs.harvard.edu/abs/2018A%26A...616A.174P/abstract)|\n",
+ "|23|[VIPERS ](http://vipers.inaf.it/rel-pdr1.html)|91,507 |[Garilli et al. 2014 ](https://ui.adsabs.harvard.edu/abs/2014A%26A...562A..23G/abstract)|\n",
+ "|24|[VUDS ](http://cesam.lam.fr/vuds/DR1/) |698 |[Tasca et al. 2017 ](https://ui.adsabs.harvard.edu/abs/2017A%26A...600A.110T/abstract)|\n",
+ "|25|[VVDS ](https://cesam.lam.fr/cesamdata/project_desc/vvds_index.html) |40,927 |[Le Fèvre et al. 2004 ](https://ui.adsabs.harvard.edu/abs/2004A%26A...428.1043L/abstract), [Garilli et al. 2008](https://ui.adsabs.harvard.edu/abs/2008A%26A...486..683G/abstract)| \n",
+ "|26|[WiggleZ ](http://wigglez.swin.edu.au/site/) |81,362 |[Parkinson et al. 2012 ](https://ui.adsabs.harvard.edu/abs/2012PhRvD..86j3518P/abstract)|\n",
+ "|27|[zCOSMOS ](https://cesam.lam.fr/zCosmos/)|20,689 |[Lilly et al. 2009 ](https://ui.adsabs.harvard.edu/abs/2009ApJS..184..218L/abstract), [Knobel et al. 2012](https://ui.adsabs.harvard.edu/abs/2012ApJ...753..121K/abstract), [Lilly 2016 (DR description)](https://www.eso.org/sci/observing/phase3/data_releases/zcosmos_dr3_b2.pdf)|\n",
+ "|28|[ZFIRE ](http://zfire.swinburne.edu.au/data.html)|216|[Nanayakkara et al. 2016](https://ui.adsabs.harvard.edu/abs/2016ApJ...828...21N/abstract)|"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "466d351b-be2d-44e4-a722-336589d6276d",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### Unified flags system\n",
+ "\n",
+ "In addition to combining all catalogs into a single table, the Spectroscopic Sample _pipeline_ also homogenizes the quality flags of the original catalogs into a single system (`flag_des`) based on the scheme used in the OzDES survey ([Yuan et al., 2015](https://ui.adsabs.harvard.edu/abs/2015MNRAS.452.3047Y/abstract)). In summary, the flags mean:\n",
+ "\n",
+ "|flag_des| Meaning |\n",
+ "|--- |---|\n",
+ "|1 | redshift unknown |\n",
+ "|2 | unreliable guess |\n",
+ "|3 | 95% confidence |\n",
+ "|4 | 99% confidence |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "74a577b8-abc2-41fd-9ed8-736f7f187d79",
+ "metadata": {},
+ "source": [
+ "The correspondence between the original _flags_ and the unified system is as follows: \n",
+ "\n",
+ "|Survey | flag_des = 1 | flag_des = 2 | flag_des = 3 | flag_des = 4 |\n",
+ "|:--|:--|:--|:--|:--|\n",
+ "|2DF | 1 | 2 | 3 | 4,5 |\n",
+ "|2dFLenS | 1 | 2 | 3 | 4,5 |\n",
+ "|3DHST | -1,1 | 2 | 0 | -|\n",
+ "|6DF | 1 | 2 | 3 | 4,6|\n",
+ "|ACES | -2,0,1 | 2 | 3 | 4,-1|\n",
+ "|ATLAS | - | 4 | - | -|\n",
+ "|C3R2 | - | - | 3,3.5 | 4|\n",
+ "|CDB | - | 4 | - | -|\n",
+ "|CLASH-VLT | - | 2 | 4,5,6,9 | 3|\n",
+ "|DEEP2 | -2,0,1 | 2 | 3 | 4,-1|\n",
+ "|DEIMOS_10K | 0.0 | 1.0 | 1.5 | 2.0|\n",
+ "|FMOS_COSMOS| 0,1 | 2 | 3 | 4|\n",
+ "|GAMA | <0.0,0.68> | <0.68,0.95> | <0.95,0.99> | <0.99,1.0>|\n",
+ "|GLASS | 0,0.5,1,1.5 | 2,2.5 | 3,3.5 | 4|\n",
+ "|MOSFIRE | 1 | - | 3 | -|\n",
+ "|MUSE | - | 1 | 2 | 3|\n",
+ "|SAGA | - | - | 3 | 4|\n",
+ "|SDSS_DR16 | - | - | - | 0|\n",
+ "|SPARCS | 4 | 3 | 2 | 1|\n",
+ "|SPT_GMOS | 0,1 | 2 | 3 | 4|\n",
+ "|UDS | - | - | 3,B,B* | 4,A|\n",
+ "|VANDELS | 1 | 2,9 | 3 | 4|\n",
+ "|VIPERS | 0,1,11,211 | 2,9,12,19,212,213 | 3,13 | 4,14,24|\n",
+ "|VUDS | - | - | 3,B,B* | 4,A|\n",
+ "|VVDS | 0,1 | 2,9 | 3 | 4|\n",
+ "|WIGGLEZ | 1 | 2 | 3 | 4,5|\n",
+ "|ZCOSMOS | 0,1,11,20,21,211 | 2,12,22 | 9,19,29,18 | 3,4,13,14,23,24|\n",
+ "|ZFIRE | 0,1 | 2 | 3 | 4|\n"
+ ]
+ },
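+ {
+ "cell_type": "markdown",
+ "id": "sketch-flag-mapping",
+ "metadata": {},
+ "source": [
+ "A correspondence table like the one above can be applied with a simple per-survey lookup. The snippet below is only a minimal sketch, not the code of the Spectroscopic Sample _pipeline_: the names `FLAG_MAP` and `to_flag_des` and the toy rows are assumptions made for illustration, and only the VANDELS and SPARCS rows of the table are encoded.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical lookup built from two rows of the table above:\n",
+ "# {survey: {original flag: flag_des}}\n",
+ "FLAG_MAP = {\n",
+ "    'VANDELS': {1: 1, 2: 2, 9: 2, 3: 3, 4: 4},\n",
+ "    'SPARCS': {4: 1, 3: 2, 2: 3, 1: 4},\n",
+ "}\n",
+ "\n",
+ "def to_flag_des(survey, flag_survey):\n",
+ "    # Return the unified flag, or None when the pair is not in the lookup\n",
+ "    return FLAG_MAP.get(survey, {}).get(flag_survey)\n",
+ "\n",
+ "# Toy rows (made up for illustration)\n",
+ "toy = pd.DataFrame({'survey': ['VANDELS', 'VANDELS', 'SPARCS'],\n",
+ "                    'flag_survey': [9, 4, 1]})\n",
+ "toy['flag_des'] = [to_flag_des(s, f) for s, f in zip(toy.survey, toy.flag_survey)]\n",
+ "```\n",
+ "\n",
+ "Surveys whose flags are strings (e.g. UDS) or confidence intervals (e.g. GAMA) would need their own handling, which this sketch does not cover."
+ ]
+ },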
+ {
+ "cell_type": "markdown",
+ "id": "bd62e9e6-3808-4f79-a69c-e02568606dfe",
+ "metadata": {},
+ "source": [
+ "### In case of multiple measurements\n",
+ "\n",
+ "Many of the surveys listed above observed overlapping regions of the sky, so after grouping all measurements into a single catalog there are often multiple spec-z measurements for the same galaxy. To identify these cases, the Spectroscopic Sample _pipeline_ performs an all-against-all spatial match on the equatorial coordinates, with a search radius of 1.0 _arcsec_ around each object. It then keeps only one measurement for each extragalactic object in the sample, applying the criteria below in order, each one acting as a tiebreaker for the previous:\n",
+ "\n",
+ "1. measurement with the highest quality flag (`flag_des`)\n",
+ "2. measurement with the lowest redshift error (`err_z`)\n",
+ "3. measurement taken by the most recent survey"
+ ]
+ },
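+ {
+ "cell_type": "markdown",
+ "id": "sketch-duplicate-selection",
+ "metadata": {},
+ "source": [
+ "The selection criteria above can be expressed as a sort followed by a de-duplication. This is a minimal sketch under stated assumptions, not the actual _pipeline_ code: the column `group_id` (a label shared by all measurements matched within 1.0 _arcsec_) and the column `survey_year` (a proxy for how recent a survey is) are hypothetical, and the rows are made up.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Toy input: rows 0-2 are three measurements of one galaxy, row 3 is another galaxy.\n",
+ "# 'group_id' and 'survey_year' are hypothetical columns used only in this sketch.\n",
+ "toy = pd.DataFrame({\n",
+ "    'group_id': [0, 0, 0, 1],\n",
+ "    'z': [0.51, 0.50, 0.52, 1.20],\n",
+ "    'err_z': [0.0010, 0.0005, 0.0005, 99.0],\n",
+ "    'flag_des': [3, 4, 4, 4],\n",
+ "    'survey_year': [2009, 2012, 2016, 2020],\n",
+ "})\n",
+ "\n",
+ "# Sort by priority: highest flag_des, then lowest err_z, then most recent survey\n",
+ "ranked = toy.sort_values(['flag_des', 'err_z', 'survey_year'],\n",
+ "                         ascending=[False, True, False])\n",
+ "\n",
+ "# The first row of each group is the preferred measurement\n",
+ "best = ranked.drop_duplicates('group_id').sort_index()\n",
+ "```\n",
+ "\n",
+ "In this toy case rows 1 and 2 tie on `flag_des` and `err_z`, so the more recent measurement (row 2) is kept for the first galaxy."
+ ]
+ },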
+ {
+ "cell_type": "markdown",
+ "id": "4e7ba2d5-4d08-460e-a1fd-c94cab0b1c50",
+ "metadata": {},
+ "source": [
+ "\n",
+ "--- \n",
+ "\n",
+ "## Sample characterization\n",
+ "\n",
+ "Below is a brief characterization of the data contained in the compiled collection of spectroscopic catalogs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f7eeca28-7bfc-4449-9afd-b88b648d0418",
+ "metadata": {},
+ "source": [
+ "Requirements for this notebook:\n",
+ "\n",
+ "* **Auxiliary file**: [des-round19-poly.txt](https://github.com/kadrlica/skymap/blob/master/skymap/data/des-round19-poly.txt) (contours of the area covered by the survey, i.e., DES _footprint_, 2019 version).\n",
+ "* **Visualization libraries**: seaborn, bokeh, holoviews\n",
+ "\n",
+ "_Download_ the file `des-round19-poly.txt` from the repository [kadrlica/skymap](https://github.com/kadrlica/skymap) on GitHub:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b8c1f12b-4212-42cd-8271-fcbe881c9d7a",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! wget https://raw.githubusercontent.com/kadrlica/skymap/master/skymap/data/des-round19-poly.txt "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5608ae7e-ca24-48dd-9bd3-3aa4b7d14089",
+ "metadata": {},
+ "source": [
+ "Imports and configs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "16cf1c89-b929-4070-8a3e-e6966fd9589d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# General\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "import tables_io\n",
+ "import psutil\n",
+ "import sys\n",
+ "\n",
+ "# Astropy\n",
+ "from astropy import units as u\n",
+ "from astropy.coordinates import SkyCoord\n",
+ "from astropy.units.quantity import Quantity\n",
+ "\n",
+ "# Bokeh\n",
+ "import bokeh\n",
+ "from bokeh.io import output_notebook, show, output_file, reset_output\n",
+ "from bokeh.models import ColumnDataSource, Range1d, HoverTool\n",
+ "from bokeh.models import CDSView, GroupFilter\n",
+ "from bokeh.plotting import figure, show, gridplot, output_notebook\n",
+ "from bokeh.models import Range1d, LinearColorMapper, ColorBar\n",
+ "from bokeh.transform import factor_cmap\n",
+ "\n",
+ "# HoloViews\n",
+ "import holoviews as hv\n",
+ "from holoviews import streams, opts\n",
+ "from holoviews.operation.datashader import datashade, dynspread\n",
+ "from holoviews.plotting.util import process_cmap\n",
+ "\n",
+ "\n",
+ "# Config\n",
+ "import warnings\n",
+ "warnings.filterwarnings('ignore')\n",
+ "%reload_ext autoreload \n",
+ "%autoreload 2 \n",
+ "%matplotlib inline \n",
+ "sns.set(color_codes=True, font_scale=1.5) \n",
+ "sns.set_style('whitegrid')\n",
+ "plt.rcParams.update({'figure.max_open_warning': 0})\n",
+ "hv.extension('bokeh')\n",
+ "output_notebook()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a38f3277-6e78-4a01-86a1-4e5545851ca9",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "print('Python version: ' + sys.version)\n",
+ "print('Numpy version: ' + np.__version__)\n",
+ "print('Bokeh version: ' + bokeh.__version__)\n",
+ "print('HoloViews version: ' + hv.__version__)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "69883b2d-b313-4fe7-8dfd-ab511dc3082a",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "def fmt(x):\n",
+ "    # Format a percentage with one decimal place (used as autopct in the pie chart)\n",
+ "    return '{:.1f}%'.format(x)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cc778e08-57f9-4cff-966b-f2717c66ad6e",
+ "metadata": {},
+ "source": [
+ "Read DES footprint file `des-round19-poly.txt`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2c9c00a2-2b6c-4dd9-b021-df831bb20bd7",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "foot_ra, foot_dec = np.loadtxt('des-round19-poly.txt', unpack=True)\n",
+ "foot_coords = SkyCoord(ra=-foot_ra*u.degree, dec=foot_dec*u.degree, frame='icrs')\n",
+ "foot_df = pd.DataFrame({'foot_ra': np.array(foot_coords.ra.wrap_at(180*u.degree)), \n",
+ " 'foot_dec': np.array(foot_coords.dec)})"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0de5e4d4-eed4-4d3c-8b06-f2175a5d4209",
+ "metadata": {},
+ "source": [
+ "Read spec-z catalog file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "107a8ac3-0e34-48a5-9ac5-744ab6275e88",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "specz_catalog = tables_io.read('public_specz_compilation.pq')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9b1472ed-265a-4ed2-8f4d-06fd71b57d66",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "type(specz_catalog)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c6ba7104-d84a-4a55-a490-3cd84f4d477e",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "assert len(specz_catalog) == 3661690"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b89c02c-c0eb-4bff-b7f4-169a8d669240",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "specz_catalog.info(memory_usage=\"deep\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9c3c59e6-f135-47be-b6f4-7ff4704e1fc5",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "specz_catalog.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "696930b8-3a83-4223-bfa5-dec23c4a22b9",
+ "metadata": {},
+ "source": [
+ "Meaning of columns:\n",
+ "\n",
+ "| Column name | Meaning |\n",
+ "|--:|:--|\n",
+ "| **ra** | Right Ascension (degrees) |\n",
+ "| **dec** | Declination (degrees) |\n",
+ "| **z** | Redshift |\n",
+ "| **err_z** | Redshift error. When unavailable, replaced by 99.0 |\n",
+ "| **flag_des**| Standardized quality marker (details [above](#flags))|\n",
+ "| **survey** | Name of the project or survey of origin. |\n",
+ "| **flag_survey** | Original quality flag given by the origin survey. |\n",
+ "| **id_spec** | Original unique identifier given by the survey. When unavailable, replaced by 9999. |\n"
+ ]
+ },
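+ {
+ "cell_type": "markdown",
+ "id": "sketch-sentinel-values",
+ "metadata": {},
+ "source": [
+ "Because missing redshift errors are stored as 99.0, they should be masked before computing statistics on `err_z`. The snippet below is a minimal sketch on made-up rows (the `toy` table is an assumption, not part of the catalog):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Toy rows; 99.0 marks a missing redshift error, as described in the table above\n",
+ "toy = pd.DataFrame({'z': [0.31, 0.72, 1.05],\n",
+ "                    'err_z': [0.0004, 99.0, 0.0010]})\n",
+ "\n",
+ "# Replace the sentinel with NaN so it does not bias summary statistics\n",
+ "err_z = toy['err_z'].replace(99.0, np.nan)\n",
+ "mean_err = err_z.mean()  # NaN entries are skipped by default\n",
+ "n_missing = int(err_z.isna().sum())\n",
+ "```\n",
+ "\n",
+ "Without the mask, the mean of `err_z` in this toy table would be dominated by the single 99.0 entry."
+ ]
+ },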
+ {
+ "cell_type": "markdown",
+ "id": "8319a10c-07e2-4b7d-bde5-5bc72a29948c",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "Basic statistics"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "870c6e04-7014-4749-8535-a3a4ed4036b3",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "specz_catalog.describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b7af5a5-219f-4177-94ed-68f51013905c",
+ "metadata": {},
+ "source": [
+ "Note from the minimum and maximum values of the **flag_des** column that a quality cut was applied: only objects with **flag_des** $\\geqslant$ 3 were included in the sample.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "97294cbd-21a6-445d-83e1-5a374693799e",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "frac = 0.02\n",
+ "spec_sample_for_plots = specz_catalog.sample(frac=frac, axis='index')\n",
+ "assert len(spec_sample_for_plots) == round(frac * len(specz_catalog))\n",
+ "print(len(spec_sample_for_plots))\n",
+ "#spec_sample_for_plots = specz_catalog # uncomment this line to plot the full catalog instead of a fraction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f7edf41-c446-4467-9dbc-6f197d1cd9e0",
+ "metadata": {},
+ "source": [
+ "--- \n",
+ "\n",
+ "#### Spatial distribution\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "090f2e91-cc90-475c-9fea-25b4a47d0ec5",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "coords = SkyCoord(ra=-np.array(spec_sample_for_plots.ra)*u.degree, \n",
+ " dec=np.array(spec_sample_for_plots.dec)*u.degree, frame='icrs')\n",
+ "spec_sample_for_plots.ra = np.array(coords.ra.wrap_at(180*u.degree))\n",
+ "spec_sample_for_plots.dec = np.array(coords.dec)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3ee9976b-e9cc-4009-92f6-c8735cee5921",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "fig = plt.figure(figsize=[14,6])\n",
+ "ax = fig.add_subplot(111, projection='mollweide') \n",
+ "ra_rad = coords.ra.wrap_at(180 * u.deg).radian\n",
+ "dec_rad = coords.dec.radian\n",
+ "plt.plot(ra_rad, dec_rad, '.', alpha=0.3)\n",
+ "plt.plot(-np.radians(foot_ra), np.radians(foot_dec), '-', color='darkorange')\n",
+ "org=0.0\n",
+ "tick_labels = np.array([150, 120, 90, 60, 30, 0, 330, 300, 270, 240, 210])\n",
+ "tick_labels = np.remainder(tick_labels+360+org,360)\n",
+ "ax.set_xticklabels(tick_labels) # we add the scale on the x axis\n",
+ "ax.set_xlabel('R.A.')\n",
+ "ax.xaxis.label.set_fontsize(14)\n",
+ "ax.set_ylabel('Dec.')\n",
+ "ax.yaxis.label.set_fontsize(14)\n",
+ "ax.grid(True)\n",
+ "plt.tight_layout()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c586d263-f7ab-445b-a439-79cee7a9d306",
+ "metadata": {},
+ "source": [
+ "Redshift distribution"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c1134f91-42e9-4fd8-99ec-8631c2159ee6",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "redshift = hv.Dimension('z', label='spec-z')#, range=(0.0, 2.0))\n",
+ "(count, z_bin) = np.histogram(spec_sample_for_plots.z, bins='fd')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7f41242a-1145-4f6e-a70f-81edb9d394fb",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "z_distribution = hv.Histogram((count, z_bin), kdims=redshift).opts(\n",
+ " title='Redshift distribution', xlabel='spec-z', height=400, width=800, xlim=(0.,2.)) \n",
+ "z_distribution"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dc2adf0b-f1f0-4b5c-8aff-50733013ce40",
+ "metadata": {},
+ "source": [
+ "#### Quality Flags"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "05e49101-b40c-413e-bdb5-ef246923f518",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "specz_catalog.flag_des.value_counts() "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9a7e4a64-2275-43d8-9e2f-cbd3c3778456",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "counts = pd.DataFrame(data={'flag_des':[len(specz_catalog.query('flag_des ==3')), \n",
+ " len(specz_catalog.query('flag_des ==4'))]}, index= [3, 4])\n",
+ "counts.plot.pie(y='flag_des', labels=None, autopct=fmt, colors=['darkorange', 'steelblue']) \n",
+ "counts"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1d0c1a08-eb5b-4d0a-b9e4-3095de31773a",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "Redshift distributions depending on the quality flag"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "055b8ed5-d391-442a-8d7a-d23a7a85ec4f",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "(count4, z_bin4) = np.histogram(spec_sample_for_plots.query('flag_des == 4').z, bins='fd')\n",
+ "z_distribution4 = hv.Histogram((count4, z_bin4), kdims=redshift).opts(\n",
+ " title='flag_des = 4', xlabel='spec-z', height=400, width=400, xlim=(0.,2.5))\n",
+ "(count3, z_bin3) = np.histogram(spec_sample_for_plots.query('flag_des == 3').z, bins='fd')\n",
+ "z_distribution3 = hv.Histogram((count3, z_bin3), kdims=redshift).opts(\n",
+ " title='flag_des = 3', color='darkorange', xlabel='spec-z', height=400, width=400, xlim=(0.,2.5))\n",
+ "z_dist_by_flag = z_distribution4.options(height=350, width=450) + z_distribution3.options(height=350, width=450) \n",
+ "z_dist_by_flag"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "392c909c-8523-4a47-bb3e-bd3db457780a",
+ "metadata": {},
+ "source": [
+ "#### Characterization of subsamples (by survey) "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6cb4d550-3345-4b70-9734-f8379bc1eb5a",
+ "metadata": {},
+ "source": [
+ "Redshift counts per survey actually included in the sample (after quality cuts and selection in case of multiple measurements)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "202f57f2-9514-40b4-b54d-1e1448de845f",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
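+ "# value_counts() already sorts in descending order; rename_axis keeps the 'survey' column name consistent across pandas versions\n",
+ "counts_table = specz_catalog.survey.value_counts().rename_axis('survey').reset_index(name='count')\n",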
+ "counts_table"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "24231816-a17b-4c93-9170-d8cad3d8f678",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "coords_all = SkyCoord(ra=-np.array(specz_catalog.ra)*u.degree, \n",
+ " dec=np.array(specz_catalog.dec)*u.degree, frame='icrs')\n",
+ "specz_catalog.ra = np.array(coords_all.ra.wrap_at(180*u.degree))\n",
+ "specz_catalog.dec = np.array(coords_all.dec)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d55d1ccc-0372-48c1-86c1-4122ec564de3",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "for index, row in counts_table.iterrows():\n",
+ "    # Select the objects that belong to the current survey\n",
+ "    survey = row['survey']\n",
+ "    query = f'survey == \"{survey}\" '\n",
+ "    data = specz_catalog.query(query)\n",
+ "    plt.figure(figsize=[15,5])\n",
+ "    # Left panel: sky positions over the DES footprint\n",
+ "    plt.subplot(121)\n",
+ "    plt.plot(data.ra, data.dec, '.')\n",
+ "    plt.plot(foot_df.foot_ra, foot_df.foot_dec, '-', color='darkorange')\n",
+ "    plt.xlabel('R.A. (deg)')\n",
+ "    plt.ylabel('Dec. (deg)')\n",
+ "    plt.xlim(-180, 180)\n",
+ "    # Right panel: redshift histogram\n",
+ "    plt.subplot(122) \n",
+ "    sns.histplot(data.z, bins=50, stat='count', label=f'{row[\"survey\"]}: {row[\"count\"]} objects')\n",
+ "    plt.xlabel('spec-$z$')#, fontsize=13)\n",
+ "    plt.xlim(0,)\n",
+ "    plt.legend()\n",
+ "    plt.tight_layout()"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "specz",
+ "language": "python",
+ "name": "specz"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/notebooks/public-training-set-des-dr2.ipynb b/docs/notebooks/public-training-set-des-dr2.ipynb
new file mode 100644
index 0000000..a670af2
--- /dev/null
+++ b/docs/notebooks/public-training-set-des-dr2.ipynb
@@ -0,0 +1,715 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "9d42adbb-55a5-4edd-aaae-2a5c2279e279",
+ "metadata": {},
+ "source": [
+ " \n",
+ "\n",
+ "## Photo-z Training Set\n",
+ "\n",
+ "Combination of the public collection of redshifts made available by spectroscopic surveys and photometric data from DES DR2.\n",
+ "\n",
+ "\n",
+ "Contact: Julia Gschwend ([julia@linea.org.br](mailto:julia@linea.org.br))\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6c6d0276-13b5-4a2e-974d-0b3c664914f7",
+ "metadata": {},
+ "source": [
+ "#### Acknowledgments\n",
+ "If you use this dataset to generate scientific results, please add a reference to [Gschwend et al., 2018](https://ui.adsabs.harvard.edu/abs/2018A%26C....25...58G/abstract) and acknowledge LIneA in the acknowledgments section of your publication. For instance:\n",
+ "\n",
+ "'_This research used computational resources from the Associação Laboratório Interinstitucional de e-Astronomia (LIneA) with the financial support of INCT do e-Universo (Process no. 465376/2014-2)._'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dbbe57a5-fb05-4166-a2c3-dfd2b719f8ec",
+ "metadata": {},
+ "source": [
+ "#### Notes \n",
+ "\n",
+ "The characterization of the spectroscopic redshift catalog is available in a separate notebook. If you have questions, feel free to contact me ([julia@linea.org.br](mailto:julia@linea.org.br)). \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ce385d91-cffd-419d-b699-65e218e86d7b",
+ "metadata": {
+ "jp-MarkdownHeadingCollapsed": true,
+ "tags": []
+ },
+ "source": [
+ "The training set was created by spatially matching the objects in the redshift catalog mentioned above to the object table (_coadd_objects_) of DES DR2, with a search radius of 1.0 _arcsec_, in order to add the photometric columns that are useful for calculating photo-z (apparent magnitudes and their respective errors). "
+ ]
+ },
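+ {
+ "cell_type": "markdown",
+ "id": "sketch-positional-match",
+ "metadata": {},
+ "source": [
+ "A positional match with a 1.0 _arcsec_ radius can be sketched with `astropy` as shown below. This is an illustration under stated assumptions, not the procedure actually used to build the training set: the coordinates are made up, and the nearest-neighbour search of `SkyCoord.match_to_catalog_sky` (which requires SciPy) is one possible choice of matching method.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from astropy import units as u\n",
+ "from astropy.coordinates import SkyCoord\n",
+ "\n",
+ "# Toy coordinates in degrees; both catalogs are made up for illustration\n",
+ "spec = SkyCoord(ra=[10.0, 20.0] * u.deg, dec=[-30.0, -40.0] * u.deg)\n",
+ "phot = SkyCoord(ra=[10.0001, 25.0, 20.01] * u.deg, dec=[-30.0, -40.0, -40.0] * u.deg)\n",
+ "\n",
+ "# Nearest photometric neighbour of each spec-z object\n",
+ "idx, sep2d, _ = spec.match_to_catalog_sky(phot)\n",
+ "\n",
+ "# Keep only pairs closer than the 1.0 arcsec search radius\n",
+ "matched = sep2d < 1.0 * u.arcsec\n",
+ "pairs = list(zip(np.flatnonzero(matched).tolist(), idx[matched].tolist()))\n",
+ "```\n",
+ "\n",
+ "Each entry of `pairs` is a (spec-z row, photometric row) index pair; the second toy spec-z object has no photometric counterpart within the radius, so it is dropped."
+ ]
+ },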
+ {
+ "cell_type": "markdown",
+ "id": "4e7ba2d5-4d08-460e-a1fd-c94cab0b1c50",
+ "metadata": {},
+ "source": [
+ "\n",
+ "--- \n",
+ "\n",
+ "## Sample characterization\n",
+ "\n",
+ "Below is a brief characterization of the data contained in the compiled collection of spectroscopic catalogs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f7eeca28-7bfc-4449-9afd-b88b648d0418",
+ "metadata": {},
+ "source": [
+ "Requirements for this notebook:\n",
+ "\n",
+ "* **Auxiliary file**: [des-round19-poly.txt](https://github.com/kadrlica/skymap/blob/master/skymap/data/des-round19-poly.txt) (contours of the area covered by the survey, i.e., DES _footprint_, 2019 version).\n",
+ "* **Visualization libraries**: seaborn, bokeh, holoviews\n",
+ "\n",
+ "_Download_ the file `des-round19-poly.txt` from the repository [kadrlica/skymap](https://github.com/kadrlica/skymap) on GitHub:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b8c1f12b-4212-42cd-8271-fcbe881c9d7a",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! wget https://raw.githubusercontent.com/kadrlica/skymap/master/skymap/data/des-round19-poly.txt "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5608ae7e-ca24-48dd-9bd3-3aa4b7d14089",
+ "metadata": {},
+ "source": [
+ "Imports and configs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "16cf1c89-b929-4070-8a3e-e6966fd9589d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# General\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "import tables_io\n",
+ "import psutil\n",
+ "import sys\n",
+ "\n",
+ "# Astropy\n",
+ "from astropy import units as u\n",
+ "from astropy.coordinates import SkyCoord\n",
+ "from astropy.units.quantity import Quantity\n",
+ "\n",
+ "# Bokeh\n",
+ "import bokeh\n",
+ "from bokeh.io import output_notebook, show, output_file, reset_output\n",
+ "from bokeh.models import ColumnDataSource, Range1d, HoverTool\n",
+ "from bokeh.models import CDSView, GroupFilter\n",
+ "from bokeh.plotting import figure, show, gridplot, output_notebook\n",
+ "from bokeh.models import Range1d, LinearColorMapper, ColorBar\n",
+ "from bokeh.transform import factor_cmap\n",
+ "\n",
+ "# HoloViews\n",
+ "import holoviews as hv\n",
+ "from holoviews import streams, opts\n",
+ "from holoviews.operation.datashader import datashade, dynspread\n",
+ "from holoviews.plotting.util import process_cmap\n",
+ "\n",
+ "\n",
+ "# Config\n",
+ "import warnings\n",
+ "warnings.filterwarnings('ignore')\n",
+ "%reload_ext autoreload \n",
+ "%autoreload 2 \n",
+ "%matplotlib inline \n",
+ "sns.set(color_codes=True, font_scale=1.5) \n",
+ "sns.set_style('whitegrid')\n",
+ "plt.rcParams.update({'figure.max_open_warning': 0})\n",
+ "hv.extension('bokeh')\n",
+ "output_notebook()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a38f3277-6e78-4a01-86a1-4e5545851ca9",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "print('Python version: ' + sys.version)\n",
+ "print('Numpy version: ' + np.__version__)\n",
+ "print('Bokeh version: ' + bokeh.__version__)\n",
+ "print('HoloViews version: ' + hv.__version__)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "69883b2d-b313-4fe7-8dfd-ab511dc3082a",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "def fmt(x):\n",
+ "    # Format a percentage with one decimal place (used as autopct in pie charts)\n",
+ "    return '{:.1f}%'.format(x)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cc778e08-57f9-4cff-966b-f2717c66ad6e",
+ "metadata": {},
+ "source": [
+ "Read DES footprint file `des-round19-poly.txt`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2c9c00a2-2b6c-4dd9-b021-df831bb20bd7",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "foot_ra, foot_dec = np.loadtxt('des-round19-poly.txt', unpack=True)\n",
+ "foot_coords = SkyCoord(ra=-foot_ra*u.degree, dec=foot_dec*u.degree, frame='icrs')\n",
+ "foot_df = pd.DataFrame({'foot_ra': np.array(foot_coords.ra.wrap_at(180*u.degree)), \n",
+ " 'foot_dec': np.array(foot_coords.dec)})"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0de5e4d4-eed4-4d3c-8b06-f2175a5d4209",
+ "metadata": {},
+ "source": [
+ "Read training set file "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "107a8ac3-0e34-48a5-9ac5-744ab6275e88",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "training_set = tables_io.read('public_pz_training_set.pq')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9b1472ed-265a-4ed2-8f4d-06fd71b57d66",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "type(training_set)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ce2a47e1-7000-4d73-98ce-41a8807c6bb2",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "assert len(training_set) == 592493"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2a9de0db-56e2-494b-9d29-e77be17335a8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "training_set.info(memory_usage=\"deep\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e2b58a35-3f36-4d6a-8b37-8cdbd57fb208",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "training_set.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "13763228-ad08-4f3c-8dda-3f1ab77ee20d",
+ "metadata": {},
+ "source": [
+ "Meaning of columns:\n",
+ "\n",
+ "| Column name | Meaning |\n",
+ "|--:|:--|\n",
+ "| **coadd_object_id**| Unique object identifier in the DES DR2 photometric catalog (_coadd_objects_ table). |\n",
+ "| **ra** | Right Ascension (degrees) |\n",
+ "| **dec** | Declination (degrees) |\n",
+ "| **z** | Redshift |\n",
+ "| **err_z** | Redshift error. When unavailable, replaced by 99.0 |\n",
+ "| **flag_des**| Standardized quality marker (details [above](#flags))|\n",
+ "| **survey** | Name of the project or survey of origin. |\n",
+ "| **flag_survey** | Original quality flag given by the origin survey. |\n",
+ "| **mag\\_auto\\_[g,r,i,z,y]\\_dered** | Apparent magnitude in bands [g, r, i, z, y], corrected for reddening |\n",
+ "| **magerr\\_auto\\_[g,r,i,z,y]** | Apparent magnitude error in bands [g, r, i, z, y] |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7cefbe53-79bc-40f8-ac9e-e26463974a3c",
+ "metadata": {},
+ "source": [
+ "Compute colors $(g-r)$ and $(r-i)$:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "321187e3-b377-46fb-9a04-e112dedffaf4",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "training_set['gmr'] = training_set['mag_auto_g_dered'] - training_set['mag_auto_r_dered']\n",
+ "training_set['rmi'] = training_set['mag_auto_r_dered'] - training_set['mag_auto_i_dered']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d96fd0c3-fc12-4577-be0b-f7f9a11019ad",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "Basic statistics "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e95af7c3-3724-4460-aa24-cf069ba567b3",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "training_set.describe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "63bde5d3-e98c-4d10-8614-a63ce2930947",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "frac = 0.06\n",
+ "train_sample_for_plots = training_set.sample(frac=frac, axis='index')\n",
+ "assert len(train_sample_for_plots) == round(frac * len(training_set))\n",
+ "print(len(train_sample_for_plots))\n",
+ "train_sample_for_plots = training_set.copy()  # full sample, copied so later in-place edits leave training_set intact; comment out this line to plot only the random fraction"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "33b6d153-7431-4065-b997-79e7d9ffedc5",
+ "metadata": {},
+ "source": [
+ "--- \n",
+ "\n",
+ "#### Spatial Distribution \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0ffef202-3cb6-4817-bc0f-0e07cfb32d42",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
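+ "# Mirror R.A. (negate, then wrap into [-180, 180) deg) so that it increases to the left in the sky plot.\n",
+ "coords = SkyCoord(ra=-np.array(train_sample_for_plots.ra)*u.degree, \n",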
+ " dec=np.array(train_sample_for_plots.dec)*u.degree, frame='icrs')\n",
+ "train_sample_for_plots.ra = np.array(coords.ra.wrap_at(180*u.degree))\n",
+ "train_sample_for_plots.dec = np.array(coords.dec)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9c06d71c-6476-47fd-a3db-a879efe2547f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "fig = plt.figure(figsize=[14,6])\n",
+ "ax = fig.add_subplot(111, projection='mollweide') \n",
+ "ra_rad = coords.ra.wrap_at(180 * u.deg).radian\n",
+ "dec_rad = coords.dec.radian\n",
+ "plt.plot(ra_rad, dec_rad, '.', alpha=0.1)\n",
+ "plt.plot(-np.radians(foot_ra), np.radians(foot_dec), '-', color='darkorange')\n",
+ "org=0.0\n",
+ "tick_labels = np.array([150, 120, 90, 60, 30, 0, 330, 300, 270, 240, 210])\n",
+ "tick_labels = np.remainder(tick_labels+360+org,360)\n",
+ "ax.set_xticklabels(tick_labels) # we add the scale on the x axis\n",
+ "ax.set_xlabel('R.A.')\n",
+ "ax.xaxis.label.set_fontsize(14)\n",
+ "ax.set_ylabel('Dec.')\n",
+ "ax.yaxis.label.set_fontsize(14)\n",
+ "ax.grid(True)\n",
+ "plt.tight_layout()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0003d9dd-8758-49e5-8aa1-c2ef4284366d",
+ "metadata": {},
+ "source": [
+ "#### Redshift Distribution"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "07201b66-43f4-46bc-bfb0-15cbd24ce423",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "redshift = hv.Dimension('z', label='spec-z', range=(0.0, 2.0))\n",
+ "(count, z_bin) = np.histogram(train_sample_for_plots.z, bins='fd')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7242c54f-9501-40ad-aef8-478687289b7b",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "z_distribution = hv.Histogram((count, z_bin), kdims=redshift).opts(\n",
+ " title='Redshift distribution', xlabel='spec-z', height=400, width=800) \n",
+ "z_distribution"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3405231c-0ced-4cae-b4a6-aacc4905a947",
+ "metadata": {},
+ "source": [
+ "#### Quality Flags"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e4330de-11cc-4a9f-9e57-76b7968234aa",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "training_set.flag_des.value_counts() "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "53f41d74-ac08-4dd4-a50d-175285fe6818",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "def fmt(x):\n",
+ " return '{:.1f}%'.format(x)\n",
+ "counts = pd.DataFrame(data={'flag_des':[len(training_set.query('flag_des ==3')), \n",
+ " len(training_set.query('flag_des ==4'))]}, index= [3, 4])\n",
+ "counts.plot.pie(y='flag_des', labels=None, autopct=fmt, colors=['darkorange', 'steelblue']) \n",
+ "counts"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7ed9d86c-0c8b-411a-aba7-0e459f74985c",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "Redshift distributions depending on the quality flag"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb59083e-3ae3-416c-a21c-97c638fb0ec8",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "(count4, z_bin4) = np.histogram(train_sample_for_plots.query('flag_des == 4').z, bins='fd')\n",
+ "z_distribution4 = hv.Histogram((count4, z_bin4), kdims=redshift).opts(\n",
+ " title='flag_des = 4', xlabel='spec-z', height=400, width=400, xlim=(0., 2.))\n",
+ "(count3, z_bin3) = np.histogram(train_sample_for_plots.query('flag_des == 3').z, bins='fd')\n",
+ "z_distribution3 = hv.Histogram((count3, z_bin3), kdims=redshift).opts(\n",
+ " title='flag_des = 3', color='darkorange', xlabel='spec-z', height=400, width=400, xlim=(0., 2.))\n",
+ "z_dist_by_flag = z_distribution4.options(height=350, width=450) + z_distribution3.options(height=350, width=450) \n",
+ "z_dist_by_flag"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "76c63ab9-ef6f-4e54-9d7d-3b769360ba09",
+ "metadata": {},
+ "source": [
+ "#### Characteristics of the photometric sample"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "96d9810a-8d38-4cdd-a5f6-a082fd3ed69f",
+ "metadata": {},
+ "source": [
+ "##### Magnitude distributions and their respective errors"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "284c51ed-a53a-4578-ba8d-76ef4bee56d8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bands = ['g', 'r', 'i', 'z', 'y']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7db8e426-6aef-4d2b-bca0-13a26678dfe6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(figsize=[12,4])\n",
+ "plt.subplot(1,2,1)\n",
+ "for band in bands:\n",
+ " plt.hist(train_sample_for_plots.query(f'mag_auto_{band}_dered != 99.')[f'mag_auto_{band}_dered'], \n",
+ " bins=30, histtype='step', lw=2, log=True)\n",
+ "plt.xlabel('magnitude')\n",
+ "plt.ylabel('counts')\n",
+ "plt.xlim(12,28)\n",
+ "plt.ylim(10,)\n",
+ "plt.subplot(1,2,2)\n",
+ "for band in bands:\n",
+ " plt.hist(train_sample_for_plots.query(f'mag_auto_{band}_dered != 99. & magerr_auto_{band} < 1.')[f'magerr_auto_{band}'], \n",
+ " bins=30, label=band, histtype='step', lw=2, log=True)\n",
+ "plt.xlabel('magnitude error')\n",
+ "plt.ylabel('counts')\n",
+ "plt.xlim(0,1)\n",
+ "plt.ylim(10,)\n",
+ "plt.legend(loc='upper right')\n",
+ "plt.tight_layout()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "822b9633-bb65-4dba-84d7-d55d9475561e",
+ "metadata": {},
+ "source": [
+ "##### Magnitude errors"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4bad2a30-0036-4de9-9acc-34352bfb0fe7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=[18,4])\n",
+ "for i, band in enumerate(bands):\n",
+ "    plt.subplot(1, len(bands), i+1)\n",
+ "    # Exclude magnitudes equal to 99. and errors of 2 mag or more\n",
+ "    query = f'mag_auto_{band}_dered != 99. & magerr_auto_{band} < 2.'\n",
+ "    selected = train_sample_for_plots.query(query)\n",
+ "    plt.plot(selected[f'mag_auto_{band}_dered'], selected[f'magerr_auto_{band}'],\n",
+ "             '.', alpha=0.3, color='steelblue')\n",
+ "    plt.xlabel(f'mag {band}')\n",
+ "    if i == 0:\n",
+ "        plt.ylabel('error')\n",
+ "    plt.xlim(16, 28)\n",
+ "    plt.ylim(0, 2)\n",
+ "plt.tight_layout()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ad56274d-2781-4694-bf41-b8a12ff57781",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "##### Magnitude vs. redshift"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e58f6ccb-30ce-485d-81fa-ec431a510c00",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "clean = 'magerr_auto_i < 0.1 & mag_auto_g_dered != 99. & mag_auto_r_dered != 99. & mag_auto_i_dered != 99.'\n",
+ "train_sample_for_plots.query(clean, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "323c0c49-eabe-4b21-b426-533ad3e461da",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mag_vs_z = hv.Scatter(train_sample_for_plots[['z', 'mag_auto_i_dered']]).opts(\n",
+ " toolbar='above', tools=['hover'], height=400, width=800, alpha=0.5, \n",
+ " size=2, xlim=(0,2), ylim=(14,24), xlabel='spec-z', ylabel='mag i')\n",
+ "mag_vs_z"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "09773eb5-cfd7-4803-9388-61d9fb8378eb",
+ "metadata": {},
+ "source": [
+ "##### Color-magnitude diagram (CMD) and color-color plots"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "04a8567d-9c67-4d1b-a0f8-914c3f3b8981",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plot_style_bkh = dict(alpha=0.2,# color='steelblue',\n",
+ " marker='triangle', size=3,\n",
+ " xticks=5, yticks=5,\n",
+ " height=400, width=400,\n",
+ " toolbar='above')\n",
+ "plot_style = plot_style_bkh"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4f4e1c33-c530-41c3-9c18-7ded6d2ee582",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "points = train_sample_for_plots"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e300a251-3019-4db1-84aa-24af9c10d122",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "imag = hv.Dimension('mag_auto_i_dered', label='mag i', range=(12, 24))\n",
+ "gmr = hv.Dimension('gmr', label='(g-r)', range=(-0.8, 3.0))\n",
+ "col_mag = hv.Scatter(points, kdims=imag, vdims=gmr).opts(**plot_style)\n",
+ "col_mag = col_mag.hist(dimension=[imag, gmr], num_bins=100, adjoin=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d65492be-b04a-4e5f-8434-bdc295800fa3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "rmi = hv.Dimension('rmi', label='(r-i)', range=(-0.8, 2.5))\n",
+ "gmr = hv.Dimension('gmr', label='(g-r)', range=(-0.8, 3.5))\n",
+ "col_col = hv.Scatter(points, kdims=rmi, vdims=gmr).opts(**plot_style)\n",
+ "col_col = col_col.hist(dimension=[rmi, gmr], num_bins=100, adjoin=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cf8ff47c-5381-455d-93d8-782e5370e75b",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "col_mag + col_col"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "specz",
+ "language": "python",
+ "name": "specz"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}