Commit

added data tables
closes #32
vrooje committed Apr 15, 2016
1 parent 4beed66 commit 74c33a2
Showing 5 changed files with 276 additions and 7 deletions.
Binary file modified gzcandels-datapaper.pdf
Binary file not shown.
17 changes: 10 additions & 7 deletions gzcandels-datapaper.tex
@@ -1,6 +1,6 @@
\documentclass[useAMS,usenatbib]{mn2e}
%\documentclass[twocolumn]{emulateapj}
\usepackage{graphicx,natbib,color,multirow,amsmath}
\usepackage{graphicx,natbib,color,multirow,amsmath,deluxetable}
\voffset=-0.8in

\definecolor{titlecol}{rgb}{0,0,1}
@@ -66,7 +66,7 @@
\def\gtrsim{\mathrel{\hbox{\rlap{\hbox{\lower3pt\hbox{$\sim$}}}\hbox{\raise2pt\hbox{$>$}}}}}


\newcommand\nodata{ ~$\cdots$~ }
%\newcommand\nodata{ ~$\cdots$~ }

\makeatletter
\def\@xfootnote[#1]{%
@@ -492,12 +492,12 @@ \subsubsection{Consensus-based classifier weighting}

Averaging over a classifier's individual consistency values for all classifications effectively downweights those contributions from classifiers whose classifications regularly diverge from the consensus whilst preserving the diversity of classifications from classifiers who are \emph{on average} consistent with each other. It also allows for the classifications of skilled classifiers to remain highly weighted even on difficult subjects where the individual consensus is skewed (e.g., if an image is very noisy or if a nearby artifact is distracting to less experienced classifiers).

{\notebsm The classifier weight is then calculated as
The classifier weight is then calculated as
\begin{equation}
w = \min \left(1.0,(\mkappamean / 0.6)^{8.5} \right) ,
\label{eqn-weight}
\end{equation}
a formulation that preserves a uniform weighting for any classifier with $\mkappamean \geq 0.6$ and downweights those with a lower consistency rating.}
a formulation that preserves a uniform weighting for any classifier with $\mkappamean \geq 0.6$ and downweights those with a lower consistency rating.

The weighted consensus classifications are then calculated for each subject by summing the weighted votes for each task and response between task T01 and T16, and reporting the vote fractions $f$ for each. (Although the classifications for task T00 are included in the computation of the consensus-based weights, the vote fractions for task T00 are not re-computed using the consensus-based weights.) As the classifier weights are calculated via comparison with the consensus, which leads to a new consensus, this method can be iterated until the classifier weights converge to a stable value.
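
A minimal Python sketch of the weighting in Equation \ref{eqn-weight} and of the weighted vote fractions described above. The function names, the `(response, weight)` vote layout, and the clamping of non-positive consistency values to zero weight are assumptions made for illustration; they are not taken from the Galaxy Zoo pipeline, and the iteration to convergence is not shown.

```python
from collections import defaultdict


def classifier_weight(kappa_mean, threshold=0.6, power=8.5):
    """Return w = min(1.0, (kappa_mean / threshold) ** power).

    Assumption: a non-positive mean consistency gets zero weight. The
    source text does not say how such values are handled, and a negative
    base with a fractional exponent is not a real number.
    """
    if kappa_mean <= 0.0:
        return 0.0
    return min(1.0, (kappa_mean / threshold) ** power)


def weighted_vote_fractions(votes):
    """Sum classifier weights per response and normalise to fractions.

    `votes` is a hypothetical list of (response, classifier_weight) pairs
    for a single task on a single subject.
    """
    totals = defaultdict(float)
    for response, weight in votes:
        totals[response] += weight
    grand_total = sum(totals.values())
    if grand_total == 0.0:
        # No weighted votes at all: report zero for every response seen.
        return {response: 0.0 for response in totals}
    return {response: total / grand_total for response, total in totals.items()}


if __name__ == "__main__":
    for kappa in (0.3, 0.5, 0.6, 0.9):
        print(kappa, classifier_weight(kappa))
    print(weighted_vote_fractions([("smooth", 1.0), ("features", 1.0), ("smooth", 0.5)]))
```

Because of the steep exponent, the weight falls quickly below the threshold: a classifier with a mean consistency of 0.5 keeps only about a fifth of a full vote.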

@@ -564,7 +564,7 @@ \subsection{Use of Classifications in Practice}\label{sec:usage}

\subsection{Data release and ``clean'' samples}\label{sec:release}

This paper includes the release of the raw and weighted classifications for each of the $49,555$ subjects in the Galaxy Zoo CANDELS sample. In addition to each raw and weighted vote fraction for each task, we include the raw and weighted number of answers to each task, as well as the total raw and weighted classifier count for each subject. This combines for a total of 136 quantities for each subject, not including the subject ID or any other metadata. {\notebsm This is too much information to present a meaningful sample table in print here.} The structure of the data for each task number $NN$ with $i = {0 {\rm\ to\ } n-1}$ responses is as follows:
This paper includes the release of the raw and weighted classifications for each of the $49,555$ subjects in the Galaxy Zoo CANDELS sample. In addition to each raw and weighted vote fraction for each task, we include the raw and weighted number of answers to each task, as well as the total raw and weighted classifier count for each subject. This combines for a total of 136 quantities for each subject, not including the subject ID or any other metadata. The structure of the data for each task number $NN$ with $i = {0 {\rm\ to\ } n-1}$ responses is as follows:
%oh god it's really late and I just said "thusly"

% THUSLY REMOVED - KWW
@@ -601,12 +601,14 @@ \subsection{Data release and ``clean'' samples}\label{sec:release}

\end{itemize}

The sum of raw {\tt \_frac} fractions adds to 1.0, as does the sum of {\tt \_weighted\_frac} fractions. Multiplying the {\tt \_frac} values (raw fractions) by the {\tt \_count} (raw classifier counts) will retrieve the number of people who gave a specific answer; likewise with weighted answer counts from {\tt \_weighted\_frac} and {\tt \_weight}. As the consensus-based classifier weighting described in Section \ref{sec:weighting} assigns a weight of $w \leq 1$ to each classifier, the weighted vote count for tasks T01--T16 must be less than or equal to the raw vote count for those tasks. While the raw vote counts and fractions are provided for completeness, we recommend that users of this data set use the weighted fractions and counts.
The sum of raw {\tt \_frac} fractions adds to 1.0, as does the sum of {\tt \_weighted\_frac} fractions. Multiplying the {\tt \_frac} values (raw fractions) by the {\tt \_count} (raw classifier counts) will retrieve the number of people who gave a specific answer; likewise with weighted answer counts from {\tt \_weighted\_frac} and {\tt \_weight}. As the consensus-based classifier weighting described in Section \ref{sec:weighting} assigns a weight of $w \leq 1$ to each classifier, the weighted vote count for tasks T01--T16 must be less than or equal to the raw vote count for those tasks. While the raw vote counts and fractions are provided for completeness, we recommend that users of this data set use the weighted fractions and counts. The raw and weighted classifications are presented in Table \ref{table:data-main}.

In addition to the vote fractions for each subject, we provide a set of flags for each subject that indicates its member or non-member status in a ``clean'' sample of galaxies of a specific type. We select separate clean samples of smooth, featured, clumpy, edge-on, and spiral galaxies. These samples contain exemplars of each galaxy type with minimal contamination of the sample --- as a result, samples selected with the flags will be highly incomplete, but also highly pure. They are selected according to vote fraction and vote count thresholds given in Table \ref{table:clean}.

We provide these flags for the convenience of the end user, but we also encourage those wishing to use Galaxy Zoo classifications to investigate whether a different set of thresholds would be better suited to their own science case.

%\input{table-data-main.tex}
\input{table_data_main_cols_all.tex}

\begin{table}
\begin{tabular}{@{}lllr}
@@ -664,8 +666,9 @@ \subsection{Depth Corrections}\label{sec:depth}

In addition to the classification data described in Section \ref{sec:release}, we present these ``corrected'', weighted classifications for each of the 8,130 subjects with deep exposures for which we do not also have separate wide-field depth classifications. We also include the measured wide-field depth classifications for the measured-correction subset, giving a total of 10,648 morphological classifications of deep-field subjects corrected to the wide-field average depth.

For these subjects, the wide-field depth classifications are given in a separate table in the data release and labelled in the data catalogs as described in Section \ref{sec:release}, except with an additional {\small \tt \_deepcorr} added to each relevant weighted-classification column. For example, the wide-field vote fraction for classifiers indicating an answer of `Smooth' to Task T00 is labelled {\small \tt t00\_smooth\_or\_featured\_a0\_smooth\_weighted\_frac\_deepcorr}, which is depth-corrected from the deep-exposure classification indicated in the {\small \tt t00\_smooth\_or\_featured\_a0\_smooth\_weighted\_frac} column. For those investigating science questions where it is advantageous to consider classifications from images of comparable depth across an entire sample, we recommend using the {\small \tt \_deepcorr} classifications for subjects in the ``deep'' fields.
For these subjects, the wide-field depth classifications are given in Table \ref{table:data-depthcorr} and labelled in the data catalogs as described in Section \ref{sec:release}, except with an additional {\small \tt \_deepcorr} added to each relevant weighted-classification column. For example, the wide-field vote fraction for classifiers indicating an answer of `Smooth' to Task T00 is labelled {\small \tt t00\_smooth\_or\_featured\_a0\_smooth\_weighted\_frac\_deepcorr}, which is depth-corrected from the deep-exposure classification indicated in the {\small \tt t00\_smooth\_or\_featured\_a0\_smooth\_weighted\_frac} column. For those investigating science questions where it is advantageous to consider classifications from images of comparable depth across an entire sample, we recommend using the {\small \tt \_deepcorr} classifications for subjects in the ``deep'' fields.

\input{table_data_deepcorr_cols_all.tex}



24 changes: 24 additions & 0 deletions refs.bib
@@ -9386,6 +9386,30 @@ @ARTICLE{ravindranath04
}






___________________________________________________________________________
Euclid main reference, at least for now until it's doing science
@ARTICLE{refregier10,
author = {{Refregier}, A. and {Amara}, A. and {Kitching}, T.~D. and {Rassat}, A. and
{Scaramella}, R. and {Weller}, J. and {Euclid Imaging Consortium}, f.~t.
},
title = "{Euclid Imaging Consortium Science Book}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1001.0061},
primaryClass = "astro-ph.IM",
keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Extragalactic Astrophysics},
year = 2010,
month = jan,
adsurl = {http://adsabs.harvard.edu/abs/2010arXiv1001.0061R},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

___________________________________________________________________________
SDSS QSO template and bolometric corrections
69 changes: 69 additions & 0 deletions table_data_deepcorr_cols_all.tex
@@ -0,0 +1,69 @@
\begin{table*}
\resizebox{\textwidth}{!}{%
\begin{tabular}{@{}lccccccc}
\hline
\multicolumn{1}{l}{Data Column Name} &
\multicolumn{6}{c}{Subjects} &
\multicolumn{1}{c}{}
\\
\hline
\hline
ID & GDS\_12132 & GDS\_15834 & GDS\_17388 & GDS\_17613 & GDS\_20321 & GDS\_8970 & \\
RA & 53.099782 & 53.075878 & 53.126442 & 53.192374 & 53.159091 & 53.142340 & \\
Dec & -27.799512 & -27.768666 & -27.756544 & -27.752376 & -27.728037 & -27.827641 & \\
t00\_smooth\_or\_featured\_a0\_smooth\_weighted\_frac\_deepcorr & 0.67 & 0.55 & 0.37 & 0.70 & 0.36 & 0.64 & \\
t00\_smooth\_or\_featured\_a1\_features\_weighted\_frac\_deepcorr & 0.05 & 0.16 & 0.47 & 0.07 & 0.18 & 0.18 & \\
t00\_smooth\_or\_featured\_a2\_artifact\_weighted\_frac\_deepcorr & 0.28 & 0.28 & 0.16 & 0.23 & 0.45 & 0.18 & \\
t01\_how\_rounded\_a0\_completely\_weighted\_frac\_deepcorr & 0.84 & 0.32 & 0.87 & 0.59 & 0.26 & 0.43 & \\
t01\_how\_rounded\_a1\_inbetween\_weighted\_frac\_deepcorr & 0.15 & 0.67 & 0.13 & 0.36 & 0.72 & 0.51 & \\
t01\_how\_rounded\_a2\_cigarshaped\_weighted\_frac\_deepcorr & 0.01 & 0.01 & 0.00 & 0.05 & 0.02 & 0.06 & \\
t02\_clumpy\_appearance\_a0\_yes\_weighted\_frac\_deepcorr & 1.00 & 0.48 & 0.41 & 0.74 & 0.58 & 0.48 & \\
t02\_clumpy\_appearance\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 0.52 & 0.59 & 0.26 & 0.42 & 0.52 & \\
t03\_how\_many\_clumps\_a0\_1\_weighted\_frac\_deepcorr & 1.00 & 0.06 & 0.55 & 0.28 & 0.03 & 0.25 & \\
t03\_how\_many\_clumps\_a1\_2\_weighted\_frac\_deepcorr & 0.00 & 0.50 & 0.00 & 0.09 & 0.33 & 0.03 & \\
t03\_how\_many\_clumps\_a2\_3\_weighted\_frac\_deepcorr & 0.00 & 0.14 & 0.03 & 0.25 & 0.26 & 0.03 & \\
t03\_how\_many\_clumps\_a3\_4\_weighted\_frac\_deepcorr & 0.00 & 0.02 & 0.07 & 0.02 & 0.02 & 0.22 & \\
t03\_how\_many\_clumps\_a4\_5\_plus\_weighted\_frac\_deepcorr & 0.00 & 0.07 & 0.35 & 0.20 & 0.04 & 0.04 & \\
t03\_how\_many\_clumps\_a5\_cant\_tell\_weighted\_frac\_deepcorr & 0.00 & 0.21 & 0.00 & 0.16 & 0.32 & 0.43 & \\
t04\_clump\_configuration\_a0\_straight\_line\_weighted\_frac\_deepcorr & 0.00 & 0.02 & 0.00 & 0.02 & 0.00 & 0.00 & \\
t04\_clump\_configuration\_a1\_chain\_weighted\_frac\_deepcorr & 0.00 & 0.11 & 0.00 & 0.34 & 0.00 & 0.07 & \\
t04\_clump\_configuration\_a2\_cluster\_or\_irregular\_weighted\_frac\_deepcorr & 0.00 & 0.87 & 0.60 & 0.63 & 1.00 & 0.62 & \\
t04\_clump\_configuration\_a3\_spiral\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.40 & 0.01 & 0.00 & 0.31 & \\
t05\_is\_one\_clump\_brightest\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.25 & 0.40 & 0.42 & 0.49 & 0.38 & \\
t05\_is\_one\_clump\_brightest\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 0.75 & 0.60 & 0.58 & 0.51 & 0.62 & \\
t06\_brightest\_clump\_central\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.28 & 1.00 & 1.00 & 0.51 & 0.18 & \\
t06\_brightest\_clump\_central\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 0.72 & 0.00 & 0.00 & 0.49 & 0.82 & \\
t07\_galaxy\_symmetrical\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.04 & 0.66 & 0.05 & 0.00 & 0.34 & \\
t07\_galaxy\_symmetrical\_a1\_no\_weighted\_frac\_deepcorr & 1.00 & 0.96 & 0.34 & 0.95 & 1.00 & 0.66 & \\
t08\_clumps\_embedded\_larger\_object\_a0\_yes\_weighted\_frac\_deepcorr & 0.61 & 0.36 & 0.66 & 0.48 & 0.35 & 0.17 & \\
t08\_clumps\_embedded\_larger\_object\_a1\_no\_weighted\_frac\_deepcorr & 0.39 & 0.64 & 0.34 & 0.52 & 0.65 & 0.83 & \\
t09\_disk\_edge\_on\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.06 & 0.05 & 0.00 & 0.72 & 0.40 & \\
t09\_disk\_edge\_on\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 0.94 & 0.95 & 0.00 & 0.28 & 0.60 & \\
t10\_edge\_on\_bulge\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 1.00 & 0.00 & 1.00 & 0.50 & \\
t10\_edge\_on\_bulge\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & \\
t11\_bar\_feature\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.01 & 0.05 & 0.00 & 0.00 & 0.00 & \\
t11\_bar\_feature\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 0.99 & 0.95 & 0.00 & 1.00 & 1.00 & \\
t12\_spiral\_pattern\_a0\_yes\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.29 & 0.00 & 1.00 & 0.00 & \\
t12\_spiral\_pattern\_a1\_no\_weighted\_frac\_deepcorr & 0.00 & 1.00 & 0.71 & 0.00 & 0.00 & 1.00 & \\
t13\_spiral\_arm\_winding\_a0\_tight\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.83 & 0.00 & 1.00 & 0.00 & \\
t13\_spiral\_arm\_winding\_a1\_medium\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.17 & 0.00 & 0.00 & 0.00 & \\
t13\_spiral\_arm\_winding\_a2\_loose\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & \\
t14\_spiral\_arm\_count\_a0\_1\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.17 & 0.00 & 0.00 & 0.00 & \\
t14\_spiral\_arm\_count\_a1\_2\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.68 & 0.00 & 0.00 & 0.00 & \\
t14\_spiral\_arm\_count\_a2\_3\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & \\
t14\_spiral\_arm\_count\_a3\_4\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.17 & 0.00 & 0.00 & 0.00 & \\
t14\_spiral\_arm\_count\_a4\_5\_plus\_weighted\_frac\_deepcorr & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & \\
t14\_spiral\_arm\_count\_a5\_cant\_tell\_weighted\_frac\_deepcorr & 0.00 & 0.00 & -0.01 & 0.00 & 1.00 & 0.00 & \\
t15\_bulge\_prominence\_a0\_no\_bulge\_weighted\_frac\_deepcorr & 0.00 & 0.75 & 0.00 & 0.00 & 0.00 & 1.00 & \\
t15\_bulge\_prominence\_a1\_obvious\_weighted\_frac\_deepcorr & 0.00 & 0.20 & 0.62 & 0.00 & 1.00 & 0.00 & \\
t15\_bulge\_prominence\_a2\_dominant\_weighted\_frac\_deepcorr & 0.00 & 0.05 & 0.38 & 0.00 & 0.00 & 0.00 & \\
t16\_merging\_tidal\_debris\_a0\_merging\_weighted\_frac\_deepcorr & 0.01 & 0.10 & 0.35 & 0.09 & 0.15 & 0.54 & \\
t16\_merging\_tidal\_debris\_a1\_tidal\_debris\_weighted\_frac\_deepcorr & 0.02 & 0.02 & 0.01 & 0.07 & 0.10 & 0.04 & \\
t16\_merging\_tidal\_debris\_a2\_both\_weighted\_frac\_deepcorr & 0.01 & 0.01 & 0.04 & 0.02 & 0.05 & 0.03 & \\
t16\_merging\_tidal\_debris\_a3\_neither\_weighted\_frac\_deepcorr & 0.96 & 0.87 & 0.60 & 0.82 & 0.70 & 0.39 & \\
$\ldots$ \\
\hline
\end{tabular}}
\caption{Depth-corrected classifications for the ``measured-correction'' sample defined in Section \ref{sec:depth}. The complete version of this table is available in electronic form and at http://data.galaxyzoo.org. The printed table shows a transposed subset of the full table to illustrate its format and content.}
\label{table:data-depthcorr}
\end{table*}
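
The column names in the table above follow a regular pattern: task number, task name, answer index, answer name, then a suffix. The Python sketch below splits such a name into its parts. The pattern is inferred only from the sample columns shown here; `parse_column` and the keys of the returned dictionary are hypothetical names, and only the `_frac` and `_weighted_frac` suffixes (with an optional `_deepcorr`) are handled.

```python
import re

# Inferred layout: tNN_<task name>_aI_<answer name>_<frac|weighted_frac>[_deepcorr]
COLUMN_PATTERN = re.compile(
    r"^t(?P<task>\d{2})_(?P<task_name>.+?)"
    r"_a(?P<answer>\d+)_(?P<answer_name>.+?)"
    r"_(?P<kind>weighted_frac|frac)(?P<deepcorr>_deepcorr)?$"
)


def parse_column(name):
    """Split a vote-fraction column name into its components.

    Returns None for columns that do not match the inferred pattern,
    such as ID, RA, and Dec.
    """
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None
    return {
        "task": int(match.group("task")),
        "task_name": match.group("task_name"),
        "answer": int(match.group("answer")),
        "answer_name": match.group("answer_name"),
        "weighted": match.group("kind") == "weighted_frac",
        "depth_corrected": match.group("deepcorr") is not None,
    }


if __name__ == "__main__":
    print(parse_column("t00_smooth_or_featured_a0_smooth_weighted_frac_deepcorr"))
    print(parse_column("t03_how_many_clumps_a4_5_plus_weighted_frac_deepcorr"))
    print(parse_column("RA"))
```

Grouping the parsed columns by their `task` value gives the set of responses for each task, which is a convenient starting point for checking that the fractions within a task behave as expected for a given subject.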