Proposal additions, syllabus, sustainability, and schedule #11

Merged: 4 commits, Mar 11, 2019
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
16 changes: 16 additions & 0 deletions grad-course.Rproj
@@ -0,0 +1,16 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 4
Encoding: UTF-8

RnwWeave: knitr
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
29 changes: 29 additions & 0 deletions schedule.csv
@@ -0,0 +1,29 @@
Week Number,Class number,Topic,Description
1,1,"Introduction to course, to reproducibility, and to open scientific practices",
1,2,Data analysis project setup and management,
2,3,"Data management, storage, and structure (tidy data)","What tidy data is, how to save it (CSV), and why not to edit raw data"
2,4,Version control and collaboration with Git/GitHub,
3,5,Introduction to Python,
3,6,Best practices in programming in Python,
4,7,Data wrangling in Python,pandas
4,8,Data visualization in Python and best practices,Seaborn
5,9,Basic programming in Python,"Functions, DRY, conditionals"
5,10,Exploratory data analysis,
6,11,Basic statistics in Python,
6,12,"Multivariate statistical techniques, high-dimensional data","PCA, mixed models"
7,13,Statistical learning in Python,scikit-learn
7,14,Creating reproducible documents with Jupyter Notebooks,
8,15,Creating a pipeline from data wrangling to publication,"Integrate ideas behind project management, script dependencies, and reproducible documents"
8,16,Publishing in the era of reproducibility and open science,"Git tags, Zenodo, preprint archives, open access"
9,17,,
9,18,,
10,19,,
10,20,,
11,21,,
11,22,,
12,23,Project work,
12,24,Project work,
13,25,Project work,
13,26,Project work,
14,27,Project work,
14,28,Project work,
31 changes: 31 additions & 0 deletions sustainability.md
@@ -0,0 +1,31 @@
---
title: "Sustainability for course: Data Science in Biomedical Research"
---

There are several major aspects of this project and course that ensure its
sustainability into the future:

- This course will be openly licensed (CC-BY 4.0) and publicly accessible. All
material can be freely copied, modified, and reused by anyone, including future
instructors of this course.
- Because of the course format (participatory live-coding), the learning
material is well structured, organized, and developed, so anyone taking over
the course should find it relatively easy to teach.
- Several of the interested instructors are collaborating on projects aimed at
developing openly licensed teaching material that overlaps with this course's
aims (data science for scientists). This course will therefore continue to be
actively developed and maintained for the foreseeable future, regardless of the
instructors' graduation or employment. See the project organization's [GitHub]
and [GitLab] repositories.

In our experience with the [EEB RQM course], once the material was developed,
it was more than enough on its own for new instructors to teach without much
preparation. In the second year of the [EEB RQM course], two of the four
instructors were new and had little difficulty teaching the material.

[EEB RQM course]: https://uoftcoders.github.io/rcourse/
[GitHub]: https://github.com/rostools
[GitLab]: https://gitlab.com/rostools



67 changes: 67 additions & 0 deletions syllabus.md
@@ -0,0 +1,67 @@
---
title: "Syllabus for 'Data science in Biomedical research: Reproducible quantitative analytics and pipelines'"
---

## Description

Biomedical data includes a diverse range of data types, from patient data in
national registries, to biomarker data, to genetic and sequence data, to
electrophysiological recordings from human or animal models, to simulations,
often covering multiple spatio-temporal scales. These data are usually
high-dimensional, noisy, and fragmented. Handling such data is challenging, and
understanding it is an important first step in a data-driven approach to
building predictive or inferential models. This course introduces techniques to
analyze biomedical data using the Python programming language. The focus of the
course is to integrate reproducible and open scientific research principles,
such as using code to generate results, writing well-documented and modular
code, ensuring correct procedures and order of execution, being transparent and
open throughout the entire research process, and interpreting and presenting
results correctly and cautiously.

## Learning Outcomes

- To develop proficiency in coding and data analysis in Python.
- To recognize the importance of, and apply, "tidy data" principles.
- To write reproducible, well-documented, and modular Python code.
- To learn how to munge, wrangle, and manage data reproducibly in Python.
- To apply common statistical techniques to biological data, particularly
high-dimensional data.
- To recognize the importance of, and ensure, reproducibility in documents such
as manuscripts and theses.
- To generate publication-quality outputs such as figures and documents that
effectively communicate technical content.
- To work productively and collaboratively in a team-based project.
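
As an illustration of the "tidy data" and "well-documented, modular code"
outcomes, here is a minimal sketch using only the Python standard library. The
measurement table, its columns (`patient_id`, `visit_1`, `visit_2`), and the
function `wide_to_tidy` are hypothetical examples, not course material; in
practice this reshaping would more likely be done with pandas (e.g.
`pandas.melt`).

```python
"""Reshape a hypothetical 'wide' table into tidy (long) format."""

from typing import Dict, List


def wide_to_tidy(
    rows: List[Dict[str, str]],
    id_column: str,
    var_name: str = "variable",
    value_name: str = "value",
) -> List[Dict[str, str]]:
    """Convert wide-format rows into tidy rows.

    Each input row holds one identifier column plus several measurement
    columns. Each output row holds exactly one observation: the identifier,
    the name of the original measurement column, and its value.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated, not melted
            tidy.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return tidy


# Hypothetical example data: one row per patient, one column per visit.
wide = [
    {"patient_id": "p01", "visit_1": "5.1", "visit_2": "5.4"},
    {"patient_id": "p02", "visit_1": "4.8", "visit_2": "4.9"},
]

for record in wide_to_tidy(wide, "patient_id", var_name="visit", value_name="measurement"):
    print(record)
```

Each printed record is one observation (patient, visit, measurement), which is
the shape tidy-data principles ask for.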

## Target population

Graduate students from diverse backgrounds who are enrolled in neuromodulation,
neuroscience, or biomedical engineering programs.

## Prerequisites

- No programming experience is necessary.

## Assessments

Multiple small assignments will be given to test and reinforce learning of the
material. A final team-based project will assess and reinforce students' grasp
of the material taught throughout the course.

The final project will be to obtain an open dataset, formulate a research
hypothesis, analyze the data, and write up a document, all while adhering to
reproducible and open scientific guidelines. The project will include a
project plan covering scope, rationale, deliverables, and milestones. For a
project to be successfully completed, we expect that:

1. Team members have discussed and decided as a team on the dataset and on
their roles.
1. The team has set up, and regularly uses, a project repository on GitHub for
the code and document.
1. Python code is properly documented, structured, and written.
1. Data has been properly wrangled and converted into a tidy format.
1. The report is written in a Jupyter Notebook, and its output and results are
completely reproducible.
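
One way to make the "completely reproducible" requirement concrete is to never
edit raw data by hand and instead regenerate every derived file from it with
code. The sketch below is a minimal illustration using the standard `csv`
module; the file layout (`data/raw/`, `data/processed/`), the function
`clean_raw_data`, and its rule of dropping incomplete rows are hypothetical
choices, not requirements of the course.

```python
"""Regenerate a processed dataset from raw data without editing the raw file."""

import csv
import tempfile
from pathlib import Path


def clean_raw_data(raw_path: Path, processed_path: Path) -> int:
    """Copy complete rows from raw_path into processed_path.

    The raw file is only opened for reading, so rerunning this function
    always rebuilds the same processed file from the same raw input.
    Rows with an empty or missing value are dropped (a hypothetical
    cleaning rule chosen for illustration). Returns the number of rows written.
    """
    with raw_path.open(newline="") as raw_file:
        reader = csv.DictReader(raw_file)
        fieldnames = reader.fieldnames
        rows = [
            row
            for row in reader
            # Missing fields come back as None, so check the type before stripping.
            if all(isinstance(value, str) and value.strip() for value in row.values())
        ]

    # Derived data goes to a separate location; the raw file is never overwritten.
    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with processed_path.open("w", newline="") as processed_file:
        writer = csv.DictWriter(processed_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Usage in a throwaway directory with hypothetical data.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "data" / "raw" / "measurements.csv"
    processed_path = Path(tmp) / "data" / "processed" / "measurements.csv"
    raw_path.parent.mkdir(parents=True)
    raw_path.write_text("patient_id,value\np01,5.1\np02,\np03,4.8\n")

    n_written = clean_raw_data(raw_path, processed_path)
    raw_after = raw_path.read_text()
    processed_text = processed_path.read_text()

print(n_written)  # prints 2
```

Because the processed file is always rebuilt from the untouched raw file,
deleting `data/processed/` and rerunning the code should reproduce it exactly.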

Project progress will be checked, and assistance will be provided where
needed.