General Practices for Reproducibility

Myeong Lee ([email protected])

This seminar focuses on popular scripting languages and tools for data analysis (Python, R, and web development platforms) and on managing them with GitHub. It covers only part of the topic of reproducibility.

Basic premises

General Considerations for Reproducibility

  • Folder structure matters.
  • Markdown documentation
    • Useful tools: MacDown, MarkdownPad
    • Rmd: R Markdown
    • Jupyter: Supporting Markdown for Python and R
    • Github: Markdown is the default format for README files.
    • Using Markdown pages as an index to other resources (e.g., a link in an MD file -> a Google Drive folder)
  • A project introduction webpage using Github
  • Docstring
  • Testing
  • Web-based presentations of the project
  • Development environment

General Folder Structure

+src
  -R
  -python
  -jupyter
+doc
  -...Markdown documents
+data
  -input
  -results (empty)
+html
+vm (Vagrant, Docker, or other environment configuration files)

.gitignore (including confidential files and script results)
README.md (providing entry point to other resources and general descriptions)
LICENSE.md
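
The layout above can be created with a short script. This is only a minimal sketch: the function name `scaffold` is hypothetical, `vm` is assumed to be a top-level folder, and the `.gitkeep` placeholder is a common convention (not a Git feature) for keeping the empty `results` folder in the repository.

```python
from pathlib import Path

# Folder names follow the layout above; "vm" is assumed to be top-level.
FOLDERS = [
    "src/R",
    "src/python",
    "src/jupyter",
    "doc",
    "data/input",
    "data/results",
    "html",
    "vm",
]
TOP_LEVEL_FILES = [".gitignore", "README.md", "LICENSE.md"]


def scaffold(root):
    """Create the project skeleton under `root` and return it as a Path."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Git does not track empty directories, so a placeholder file
    # (a convention, not a Git feature) keeps data/results in the repo.
    (root / "data" / "results" / ".gitkeep").touch()
    # touch() creates missing files and leaves existing content untouched.
    for name in TOP_LEVEL_FILES:
        (root / name).touch()
    return root
```

Calling `scaffold("my-project")` a second time is harmless, since existing folders and files are left as they are.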

R

  • .Rmd rather than .R
    • Good documentation of each code block.
    • Can export the overall work as an HTML file.
    • When running scripts on cloud servers, plain .R scripts might work better.
  • Make functions if possible
  • R Docstring
  • In-line comments
  • Specify the R version correctly

Python

  • Jupyter for development along with Markdown comments (Anaconda)
  • Virtual environments for switching between different versions of Python (e.g., Anaconda Tutorial)
  • Once a set of functions is complete and ready for distribution, convert it to .py files with docstrings, and save them in a separate location (e.g., src/python/) so they can be used by simply importing the package
  • Python Docstring
  • Automatic testing
  • In-line comments
  • Specify the Python version correctly (2.x and 3.x are a LOT different).
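
Several of the points above (docstrings, automatic testing, in-line comments, and an explicit Python version) can be combined in one small module. The following is a sketch under assumptions, not a prescribed layout: the names `normalize`, `check_python_version`, and `REQUIRED_PYTHON` are hypothetical, the minimum version is a placeholder, and the standard-library `doctest` module is just one of several ways to automate tests.

```python
import doctest
import sys

# Placeholder minimum version for illustration; replace it with the
# version the project was actually developed and tested with.
REQUIRED_PYTHON = (3, 8)


def check_python_version(required=REQUIRED_PYTHON):
    """Raise RuntimeError if the interpreter is older than `required`."""
    if sys.version_info[:2] < tuple(required):
        raise RuntimeError(
            "Python %d.%d or newer is required" % tuple(required)
        )


def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    Parameters
    ----------
    values : list of float
        Numbers with a positive sum.

    Returns
    -------
    list of float
        The input values divided by their sum.

    Examples
    --------
    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    # True division: on Python 2 this would be integer division for ints.
    return [v / total for v in values]


if __name__ == "__main__":
    check_python_version()
    # Runs every example found in this module's docstrings as a test.
    print(doctest.testmod())
```

The docstring example doubles as a test, and it also shows why the version matters: under Python 2, `normalize([1, 1, 2])` would return `[0, 0, 0]` because `/` on integers truncates.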

Web Applications

  • Specify PHP, Apache, database, and JavaScript versions correctly
  • Provide step-by-step instructions to set up the development environment.
  • A better way to keep system configurations consistent: virtual machines plus automated configuration
  • Ideally, provide software architecture docs
    • UML
    • Function documentation
  • In-line comments

Contributions are Welcome

This seminar covers only part of the topic of reproducibility, so further practices and concerns submitted through pull requests are welcome.