UofTCoders · SaraMati · Mar 11, 2019 · Mar 6, 2019
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
diff --git a/grad-course.Rproj b/grad-course.Rproj
@@ -0,0 +1,16 @@
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 4
+Encoding: UTF-8
+
+RnwWeave: knitr
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
diff --git a/schedule.csv b/schedule.csv
@@ -0,0 +1,29 @@
+Week Number,Class number,Topic,Description
+1,1,"Introduction to course, to reproducibility, and to open scientific practices",
+1,2,Data analysis project setup and management,
+2,3,"Data management, storage, and structure (tidy data)","What tidy data is, how to save it (csv), don't edit raw data"
+2,4,Version control and collaboration with Git/GitHub,
+3,5,Introduction to Python,
+3,6,Best practices in programming in Python,
+4,7,Data wrangling in Python,Pandas
+4,8,Data visualization in Python and best practices,Seaborn
+5,9,Basic programming in Python,"Functions, DRY, conditionals"
+5,10,Exploratory data analysis,
+6,11,Basic statistics in Python,
+6,12,"Multivariate statistical techniques, high-dimensional data","PCA, mixed models"
+7,13,Statistical learning in Python,scikit-learn
+7,14,Creating reproducible documents with Jupyter Notebooks,
+8,15,Creating a pipeline from data wrangling to publication,"Integrate ideas behind project management, script dependencies, and reproducible documents"
+8,16,Publishing in the era of reproducibility and open science,"Git tags, Zenodo, preprint archives, open access"
+9,17,,
+9,18,,
+10,19,,
+10,20,,
+11,21,,
+11,22,,
+12,23,Project work,
+12,24,Project work,
+13,25,Project work,
+13,26,Project work,
+14,27,Project work,
+14,28,Project work,
diff --git a/sustainability.md b/sustainability.md
@@ -0,0 +1,31 @@
+---
+title: "Sustainability for course: Data Science in Biomedical Research"
+---
+
+There are several major aspects of this project and course that ensure its
+sustainability into the future:
+
+- This course will be openly licensed (CC-BY 4.0) and publicly accessible. All
+material can be freely copied, modified, and reused by anyone, including future
+instructors of this course.
+- Due to the format of the course (e.g. "participatory live-coding"), the learning
+material is fairly well structured, organized, and developed. This ensures that
+anyone taking over the course will have a fairly easy time instructing it.
+- Several of the interested instructors are collaborating on projects aimed at
+developing openly licensed teaching material that overlaps with this course's
+aims (Data Science for scientists). Therefore this course will continue to be
+actively developed and maintained in the foreseeable future, regardless of the
+instructors' graduation or employment. See the project organization's [GitHub]
+and [GitLab] repositories.
+
+It has been our experience with the [EEB RQM course] that once we developed the
+material, it alone was more than enough for new instructors to use and teach
+without much preparation. In the second year of the [EEB RQM course], two of
+the four instructors were new and had little difficulty teaching the material.
+
+[EEB RQM course]: https://uoftcoders.github.io/rcourse/
+[GitHub]: https://github.com/rostools
+[GitLab]: https://gitlab.com/rostools
+
+
+
diff --git a/syllabus.md b/syllabus.md
@@ -0,0 +1,67 @@
+---
+title: "Syllabus for 'Data science in Biomedical research: Reproducible quantitative analytics and pipelines'"
+---
+
+## Description
+
+Biomedical data includes a diverse range of data types from patient data from
+national registries, to biomarker data, to genetic and sequence data, to
+electrophysiological recordings from human or animal models, to simulations,
+often covering multiple spatio-temporal scales. These data are usually high
+dimensional, noisy, and fragmented. Handling such data is challenging and
+understanding it is an important first step in a data-driven approach to build
+predictive or inferential models. This course introduces techniques to analyze
+biomedical data using the Python programming language. The focus of the course
+is to integrate reproducible and open scientific research principles such as
+the use of code to generate the results, having well-documented modular coding,
+correctness of procedure and chronology of execution, being transparent and open
+throughout the entire research process, and correct and cautious interpretation
+and presentation of the results.
+
+## Learning Outcomes
+
+- To develop a proficiency in coding and doing data analysis in Python.
+- To recognize the importance of and to apply "tidy data" principles.
+- To write reproducible, well-documented, and modular Python code.
+- To learn how to munge, wrangle, and manage data reproducibly in Python.
+- To apply common statistical techniques to biological data, particularly high
+dimensional data.
+- To recognize the importance of and to ensure reproducibility in documents such
+as manuscripts and theses.
+- To generate publication-quality outputs such as figures and documents that
+effectively communicate technical content.
+- To work in a productive and collaborative environment in a team-based project.
+
+## Target population
+
+Graduate students who come from different backgrounds and have entered either
+neuromodulation, neuroscience, or biomedical engineering programs.
+
+## Prerequisites
+
+- No programming experience is necessary.
+
+## Assessments
+
+Multiple, small assignments will be given out to test and reinforce learning of
+the material. A final team-based project will assess and reinforce students'
+grasp of the material that was taught throughout the course.
+
+The final project will be to obtain an open dataset, formulate a research
+hypothesis, analyze the data, and write up a document, all while adhering to
+reproducible and open scientific guidelines. The project will include creating a
+project plan that includes scope, rationale, deliverables, and milestones. For a
+project to be successfully completed, we expect that:
+
+1. Team members have discussed and decided as a team on the dataset and on the
+roles of the members.
+1. A project repository on GitHub has been set up and is used regularly for the
+code and document.
+1. Python code is properly documented, structured, and written.
+1. Data has been properly wrangled and converted into a tidy format.
+1. The report is written in a Jupyter Notebook and the output and results
+are completely reproducible.
+
+The progress of the project will be checked and assistance will be provided
+where needed.
+