Merge pull request #89 from UBC-MDS/fix-report

Fixed report
UBC-MDS · Dec 8, 2023 · eb4fbd3
2 parents d463258 + 89e5330
commit eb4fbd3

Showing 4 changed files with 51 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -87,13 +87,13 @@ See the [renv.lock file](https://github.com/UBC-MDS/speed_dating_analysis/blob/m
 
 The Speed Dating Analysis project is licensed under the Creative Common License [CC BY-NC-SA 4.0 Deed](https://creativecommons.org/licenses/by-nc-sa/4.0/). Please acknowledge and link to this webpage if you plan on using or adapting any part of this project. The software portion of this project is licensed under the MIT license. For a full description of the licenses used, please refer to [the license document in our project](https://github.com/wenyunie/speed_dating_analysis/blob/main/LICENSE).
 
-and assocated materials are licensed under the MIT license. Please acknowledge and link to this webpage if you plan on using or re-mixing any part of this project
+Please acknowledge and link to this webpage if you plan on using, re-mixing, or adapting any part of this project.
 
 ## Developer Notes
 
-**Note1:** If you would like to run only a portion or subset of the analyses, please open `analysis_script_and_output_bash_file.sh` and selectively run the commands in the root project folder in your terminal (if you are running the file locally) or the Rstudio Server terminal (if you are using a container).
+**Note 1:** If you would like to run only a portion or subset of the analyses, please open the `Makefile` and selectively run the script commands in the root project folder in your terminal (if you are running the file locally) or the RStudio Server terminal (if you are using a container).
 
-**Note2:** If the you plan to use the containerized solution after using the renv file, please remember to deactivate the .Rproj fist. Otherwise the activated .Rproj environment will be detected inside the container because of the .Rprofile file. This would overlay the self-contained container environment.
+**Note 2:** If you plan to use the containerized solution after using the renv file, please either deactivate renv first by entering `renv::deactivate()` in the console or remove the `.Rprofile` file. Otherwise, the activated .Rproj environment will be detected inside the container and overwrite the self-contained container environment.
 
 ## References
 

diff --git a/analysis/02-methods.Rmd b/analysis/02-methods.Rmd
@@ -14,7 +14,13 @@ An observation or row in the data represents one set of ratings per ratee. For i
 
 ## Analysis
 
-We conducted a one-sample t-test to answer our research question (whether self-perception of attractiveness is accurate). Since observations in the original raw dataset violated assumptions of independence (i.e., each rater made multiple ratings and each ratee received multiple ratings), we decided to first process the data by averaging how each rater rated the attractiveness of each ratee. We then calculated a difference score by subtracting how others perceived an individual’s attractiveness from an individual’s rating of their own perceived attractiveness by others (i.e., own perception – other perception). As such, a positive score indicates that an individual was overly confident in their own perceived attractiveness.
+We conducted a one-sample t-test to examine our core research question (i.e., whether self-perception of attractiveness is accurate). That is, we take the difference in self- vs other-rated attractiveness and compare it against zero; if the difference score is statistically different from zero, we have sufficient evidence to reject the null hypothesis (i.e., that there is no difference between self- vs other-rated attractiveness).
+
+### Data Pre-Processing
+
+Since observations in the original raw dataset violated assumptions of independence (i.e., each rater made multiple ratings and each ratee received multiple ratings), we decided to first process the data by averaging how each rater rated the attractiveness of each ratee. 
+
+We then calculated a difference score by subtracting how others perceived an individual’s attractiveness from an individual’s rating of their own perceived attractiveness by others (i.e., own perception – other perception). As such, a positive score indicates that an individual was overly confident in their own perceived attractiveness.
 
 Analysis and graphs were produced using the R programming language [@r_core_team_r_2019] as well as the following R packages: tidyverse [@wickham_welcome_2019], knitr [@xie_knitr_2023]. For the full code used to produce this report and analyses, please see: https://github.com/wenyunie/speed_dating_analysis.
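
The pre-processing and test described in this section can be illustrated with a minimal base-R sketch. It is not the project's actual code: the toy data frame and its column names (`ratee_id`, `other_rating`, `self_rating`) are hypothetical, and averaging per ratee is one reading of the averaging step described above. The paired t-test at the end is included only as a cross-check, since a paired test on the per-ratee values should agree with a one-sample test on their differences.

```r
# Hypothetical toy data: each row is one rating a ratee received from a rater.
# Column names and values are illustrative assumptions, not the project's data.
ratings <- data.frame(
  ratee_id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  other_rating = c(6, 7, 5, 4, 5, 6, 8, 7, 9, 5, 6, 4),
  self_rating  = c(8, 8, 8, 6, 6, 6, 7, 7, 7, 7, 7, 7)
)

# Average the ratings per ratee so that each ratee contributes a single row,
# which removes the repeated-measures dependence in the raw rows.
per_ratee <- aggregate(cbind(other_rating, self_rating) ~ ratee_id,
                       data = ratings, FUN = mean)

# Difference score: own perception minus others' perception.
# A positive value means the ratee rated themselves higher than others did.
per_ratee$diff <- per_ratee$self_rating - per_ratee$other_rating

# One-sample t-test of the difference scores against zero.
one_sample <- t.test(per_ratee$diff, mu = 0)

# Paired t-test on the same per-ratee values, as a cross-check.
paired <- t.test(per_ratee$self_rating, per_ratee$other_rating, paired = TRUE)

print(one_sample)
print(paired)
```

With this sign convention (self minus other), a positive mean difference and a positive t-value would correspond to overestimation; reversing the subtraction flips the sign of both.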
 

diff --git a/analysis/03-analysis-plot.Rmd b/analysis/03-analysis-plot.Rmd
@@ -3,8 +3,6 @@ title: "Analysis and Result"
 output: bookdown::html_document2
 ---
 
-## Analysis
-
 ```{r message=FALSE, warning=FALSE, include=FALSE}
 library(here)
 library(tidyverse)
@@ -31,29 +29,42 @@ raw <- read.csv(paste0(here(),'/data/Speed_Dating_Data.csv'))
 data <- clean_speed_dating_dat(raw)
 ```
 
-The one-sample t-test on the difference scores revealed a significant difference between self-perceived attractiveness and the average perception by others (see Table \@ref(tab:paired)).
+### Exploratory Data Analysis
+
 
 ```{r violin, message=FALSE, warning=FALSE,echo=FALSE, fig.cap="Rating of attractiveness by other raters versus self. Error bars denote 95% bootstrap confidence interval."}
 violin_plot
 ```
 
 <br>
 
-The negative t-value (-8.5025) indicates that, on average, individuals tend to overestimate their own attractiveness compared to how others perceive them. The extremely low p-value (7.725e-16) supports the rejection of the null hypothesis, further emphasizing the substantial difference in self-perceived attractiveness.
+As shown in Figure \@ref(fig:violin), our dependent variable of interest (attractiveness rating) is roughly bell-shaped across both levels of our independent variable (self- vs other-rated). There is also no obvious skew or outliers in our data.
+
+The non-overlapping 95% bootstrap confidence intervals suggest that there may be a difference in self- vs other-rated attractiveness. We further examine this relationship below.
+
+
+### Statistical Analysis
 
-The 95 percent confidence interval for the true mean difference is (-0.9678015, -0.6040559) (see Table \@ref(tab:difference)). This interval does not include zero, reinforcing the conclusion that individuals exhibit a systematic bias by viewing themselves as more attractive than how others perceive them. This is bias is visually apparent in the distributions shown in Figure \@ref(fig:violin).
+The one-sample t-test on the difference scores revealed a significant difference between self-perceived attractiveness and the average perception by others (see Table \@ref(tab:paired)). 
+
+The negative t-value (`r round(result_paired_test$statistic[[1]], 3)`) indicates that, on average, individuals tend to overestimate their own attractiveness compared to how others perceive them. The extremely low p-value (`r signif(result_paired_test$p.value, 3)`) supports the rejection of the null hypothesis, further emphasizing the substantial difference in self-perceived attractiveness.
+
+The 95 percent confidence interval for the true mean difference is (`r round(result_paired_test$conf.int[1], 3)`, `r round(result_paired_test$conf.int[2], 3)`) (see Table \@ref(tab:difference)). This interval does not include zero, reinforcing the conclusion that individuals exhibit a systematic bias by viewing themselves as more attractive than how others perceive them. This bias is visually apparent in the distributions shown in Figure \@ref(fig:violin).
 
 ```{r paired, message=FALSE, warning=FALSE,echo=FALSE}
 kable(tidy(result_paired_test), caption = "Paired Test Results")
 ```
 
+Below, we also provide the paired-sample t-test results. Note that paired t-tests are beyond the scope of our current learning, but mathematically, they should be equivalent to a one-sample t-test that examines whether the difference in ratings differs from zero. Indeed, our one-sample t-test and paired-sample t-test results converge.
+
+
 ```{r difference, message=FALSE, warning=FALSE,echo=FALSE}
 kable(tidy(result_diff_test), caption = "Difference Test Results")
 ```
 
 <br>
 
-An additional test was performed to check if the difference is a systematic and constant overrating of one's own attributes, or if the self-rating is also not accurate in the sense that it is not correlated to others' perception at all (see Table \@ref(tab:pearson)) and the correlation was visualized in Figure \@ref(fig:contour).
+An additional test was performed to examine whether the difference reflects a systematic and constant overrating of one's own attributes, or whether the self-rating is also inaccurate in the sense that it is not correlated with others' perception at all (see Table \@ref(tab:pearson)). The correlation is visualized in Figure \@ref(fig:contour).
 
 ```{r contour, message=FALSE, warning=FALSE, echo=FALSE, fig.cap="Contour plot of the relationship between self vs others' rating of attractiveness. The red dashed line (r = 0.2745) indicates a weak significant correlation between these two variables."}
 
@@ -62,7 +73,7 @@ contour_plot
 
 <br>
 
-The observed correlation is 0.2745, with a 95 percent confidence interval between 0.1863 and 1.000. The p-value of 3.892e-07 is smaller than the commonly used significance level of 0.05. Therefore, we have sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. The null hypothesis in this case is that the true correlation between self rating and others' perception is equal to 0. The 95 percent confidence interval does not include 0, further supporting the hypothesis of a significant correlation.
+The observed correlation is `r round(result_pearson_test$estimate[[1]], 3)`, with a 95 percent confidence interval between `r round(result_pearson_test$conf.int[1], 3)` and `r round(result_pearson_test$conf.int[2], 3)`. The p-value of `r signif(result_pearson_test$p.value, 3)` is smaller than the commonly used significance level of 0.05. Therefore, we have sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. The null hypothesis in this case is that the true correlation between self rating and others' perception is equal to 0. The 95 percent confidence interval does not include 0, further supporting the hypothesis of a significant correlation.
 
 ```{r pearson, message=FALSE, warning=FALSE,echo=FALSE}
 kable(tidy(result_pearson_test), caption = "Pearson Correlation Test Results")

diff --git a/analysis/analysis_report.html b/analysis/analysis_report.html