💻 Setting up your development environment

There are many guides to setting up your computer for development. Each has its own merits. At DataMade, we both maintain legacy projects and develop new ones, often in parallel. Our toolkit is therefore optimized for managing many isolated versions of packages and getting up and running on new projects quickly.

When possible, we prefer installing binaries from a package manager to building packages from source. For Mac users, most of these tools are available as Homebrew packages. Homebrew is by far the easiest way to manage packages on macOS.

New Computer

If you are replacing your computer, you should install a fresh copy of macOS, and then use the Migration Assistant to transfer all your data and settings to your new computer in one go. Make sure you take note of your DataMade Gmail account password to sign into Chrome if that's your browser of choice, and your LastPass master password to access your accounts.

Once you do that and you have checked that your new computer is good to go, you can safely wipe your old computer clean.

Version control

We use GitHub and Git to keep our work under version control. Note that we prefer the Git CLI to the GitHub Desktop GUI. The Git CLI is built in on macOS and most Linux distributions, usually with an acceptably recent version; to check whether you already have Git installed, run which git.
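
The check described above can be sketched as follows. This is a minimal example, not a required step; the `git_status` variable is only an illustrative name.

```shell
# Report whether Git is on the PATH and, if so, which version it is.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
else
    git_status="not installed"
fi
echo "$git_status"
```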

Packages

Text editor

DataMade developers use a variety of different text editors, and we leave it up to you to decide which one you prefer to use. However, there are two configurations that we encourage you to make no matter which editor you use:

  1. Automatically trim trailing whitespace
  2. Set default tab size for Python files to 4, and set default HTML/JavaScript tab size to 2

Historically, Sublime Text has been the most widely used editor at DataMade, so the following instructions demonstrate how to configure these settings in Sublime Text.

In the menu bar, under "Sublime Text," you will find a "Preferences" flyout menu, where you will see "Settings - User." Here, you can override the default settings (i.e., do not make changes to the "Settings - Default" file). In the User file, add two things:

    "trim_trailing_white_space_on_save": true,
    "tab_size": 4,

Note: You are welcome to explore other text editor options, e.g., Atom, Vim, etc.

Packages

Python

At DataMade, you'll run most Python processes in containers. However, it's still useful to have a fresh install of Python on your machine to keep your system Python isolated (it's important!) and to use a later version of Python (if you're on a Mac, your system Python is probably version 2.7).

When you aren't using containers, DataMade recommends you conduct Python work in virtual environments (virtualenvs). Virtualenvs help enforce dependency separation between your projects and make it a lot easier for other users to replicate your work on their computers. The Python ecosystem contains a lot of options for managing your environments, from the built-in venv module to bundled package and environment management with conda. We like virtualenvwrapper, which provides a few convenience functions you can use from your terminal to create, activate, deactivate, and remove virtual environments.

Finally, to install packages in your environment, you'll need pip, the Python package installer.

The simplest way to manage Python with minimal headaches is to do a clean install of Python via Homebrew or apt, then get pip and virtualenvwrapper running on your fresh version of Python.

Optionally, create a global virtual environment for general utility packages:

mkvirtualenv gus  # that is, generally useful stuff

Run workon gus whenever you want to use these "global" packages. More often, you'll want to create project-specific virtual environments, and workon those environments during development.
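
For a project-specific environment, the create, activate, deactivate, and remove cycle might look like the sketch below. It assumes virtualenvwrapper is already loaded in your shell; `my-project` is a hypothetical environment name, and `wrapper_loaded` is only there to make the guard explicit.

```shell
# Assumes virtualenvwrapper is loaded; "my-project" is a hypothetical name.
if type mkvirtualenv >/dev/null 2>&1; then
    mkvirtualenv my-project   # create the environment and activate it
    deactivate                # leave the environment
    workon my-project         # re-activate it later
    deactivate                # an environment must be inactive before removal
    rmvirtualenv my-project   # delete the environment when you're done with it
    wrapper_loaded=yes
else
    echo "virtualenvwrapper is not loaded; check your shell startup file"
    wrapper_loaded=no
fi
```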

Packages

  • Python 3
  • pip
  • virtualenv
  • virtualenvwrapper
    • virtualenvwrapper contains useful shortcut commands for all stages of virtualenv use. Install it, making sure to follow the instructions for editing your shell startup file.
    • If you run into ownership errors and you installed Python via Homebrew, add VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3 to your shell startup file prior to the lines you added for virtualenvwrapper, and try again.
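
As a rough sketch, the shell startup file lines mentioned above might look like this. The paths are assumptions based on a Homebrew install under /usr/local; run `which virtualenvwrapper.sh` and `which python3` to find the right values for your machine. The `wrapper_script` variable is an illustrative name.

```shell
# Candidate lines for ~/.bashrc or ~/.zshrc. All paths are assumptions.
export WORKON_HOME="$HOME/.virtualenvs"                  # where environments are stored
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3   # must come before sourcing the script
wrapper_script=/usr/local/bin/virtualenvwrapper.sh
if [ -f "$wrapper_script" ]; then
    . "$wrapper_script"   # loads mkvirtualenv, workon, rmvirtualenv, etc.
fi
```

Open a new terminal (or source the startup file) after editing so the changes take effect.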

Docker

Containers are a popular, modern approach to packaging and running software. We use the Docker engine to create, run, and destroy containers for our applications during local development. This makes it much easier to manage dependencies across 5+ years of web applications.

Packages

Security

Cryptographic security is essential for developers. You don't have to know the guts of how these tools work, but you should have them installed and get comfortable using them.

To enable hard drive encryption on a MacBook, go to System Preferences > Security & Privacy > FileVault. Turn on FileVault and save a recovery key in a safe place, such as LastPass. Encryption takes some time, possibly about a day, depending on how you're using the computer and which model you have. It can run in the background as long as your computer is awake and connected to power.

Packages

  • SSH
    • Secure Shell, a protocol for communicating securely over unsecured networks. We use it to push and pull from Git remotes and to access our servers. It comes pre-installed as a command line tool on macOS and most Linux distributions; generate an SSH key if you haven't already and follow the instructions for adding it to your GitHub account.
  • GPG
    • GNU Privacy Guard, a command line tool for encrypting and decrypting files. Mac users can brew install gnupg2. Then, configure your key by hand.
  • Blackbox
    • StackExchange's open-source CLI for keeping secrets secure under public version control. Follow our excellent guide (internal link) to use it.
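
Before generating an SSH key, it can help to check whether you already have one. The sketch below assumes keys live in the default `~/.ssh` directory; the Ed25519 key type and the email address in the suggested command are illustrative choices, not DataMade requirements.

```shell
# Look for existing public keys in the default location before creating a new one.
if ls "$HOME"/.ssh/*.pub >/dev/null 2>&1; then
    key_status="found"
    ls "$HOME"/.ssh/*.pub
else
    key_status="missing"
    # Suggested command only; it prompts for a file location and passphrase.
    echo 'No public key found; generate one with: ssh-keygen -t ed25519 -C "you@example.com"'
fi
```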

Data

We try to maintain a consistent and standardized toolkit for all of our data work. We know that there are many good options for working with data, and we are always open to hearing arguments for new additions to this toolkit. But these tools have stood the test of time, and you'll see them crop up over and over in DataMade's work.

🚨 Note: If you're working on a new DataMade application, most of these dependencies (e.g., Postgres) should be containerized, i.e., you don't need to install them directly on your computer, and you can skip this section. If you're working on a legacy application that does not include containerization artifacts, read on for our installation tips.

Packages

  • Bash and basic Unix tools
    • Come installed with macOS (formerly OS X) and most Linux distributions. On Windows 10, they are available through the Windows Subsystem for Linux.
  • PostgreSQL
    • A powerful open-source database engine (also known as Postgres). There are a million ways to download and manage Postgres. If you're writing a new application, this dependency should be containerized, i.e., you don't need to install it directly on your machine. Otherwise, you'll be best off installing it with your package manager and following their Getting Started guide to configure it.
      • Many of our database configurations assume your installation of Postgres has a postgres database owned by a postgres user. After you've installed Postgres via your favorite package manager (probably brew install postgresql), run the following commands from your terminal:

        createuser -s postgres # create postgres superuser
        createdb -O postgres postgres # create postgres database owned by postgres user

        If you get a "database already exists" error:

        psql postgres -c "ALTER DATABASE postgres OWNER TO postgres;" # connect to the postgres database and make the postgres user its owner
  • csvkit
    • Command line tools for working with CSVs, the most common (and arguably the best) file format for spreadsheets. It's built on Python, so you can install it by running pip install csvkit in your gus virtualenv.
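
To get a feel for csvkit, here is a small sketch that selects one column from a CSV and prints it as a table. The sample data and the `sample_dir` and `sample_csv` names are hypothetical; the example skips the csvkit step if the tools are not installed.

```shell
# Build a tiny sample CSV (hypothetical data) in a temporary directory.
sample_dir="$(mktemp -d)"
sample_csv="$sample_dir/people.csv"
printf 'name,city\nAda,Chicago\nGrace,Evanston\n' > "$sample_csv"

# Requires csvkit; run this inside the virtualenv where you installed it.
if command -v csvcut >/dev/null 2>&1; then
    # Keep only the "name" column, then render it as a readable table.
    csvcut -c name "$sample_csv" | csvlook
else
    echo "csvkit not found; install it with: pip install csvkit"
fi
```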

Geospatial data

🚨 Note: If you're working on a new DataMade application, most of these dependencies (e.g., PostGIS) should be containerized, i.e., you don't need to install them directly on your computer, and you can skip this section. If you're working on a legacy application that does not include containerization artifacts, read on for our installation tips.

Packages

  • PostGIS
    • A geospatial plugin for Postgres. We do lots of geographic work, so it's worth installing this as soon as you have Postgres up and running. Ignore the installers and install with your favorite package manager; make sure to install the version that corresponds to your version of Postgres, and remember that PostGIS must be activated in any database that needs to use it by running the SQL command CREATE EXTENSION postgis.
  • GDAL
    • A set of command line tools for modifying and converting geospatial data. If you're on a Mac, make your life easier and install it with Homebrew. (If you installed PostGIS via Homebrew, you got GDAL, too.)
  • Optional: QGIS
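
Because PostGIS has to be activated per database, a small helper like the one below can save some typing. This is a sketch, not part of DataMade's tooling: `enable_postgis` is a hypothetical function name, and it assumes `psql` is on your PATH and that you have permission to create extensions in the target database.

```shell
# Hypothetical helper: activate PostGIS in the database named by the first argument.
enable_postgis() {
    # IF NOT EXISTS makes the command safe to run more than once.
    psql -d "$1" -c "CREATE EXTENSION IF NOT EXISTS postgis;"
}
```

You would call it as `enable_postgis my_database`, where `my_database` stands in for the name of your database.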

Static sites

Most of our sites are dynamic and built on Django, but sometimes we deploy small static sites (like datamade.us) using Jekyll, a site generator built on Ruby, or more recently, GatsbyJS.

🚨 Note: If you're working on a new DataMade application, all of these dependencies should be containerized, i.e., you don't need to install them directly on your computer, and you can skip this section. If you're working on a legacy application that does not include containerization artifacts, read on for our installation tips.

Packages

  • Node.js
    • An environment to run JavaScript outside the browser, bundled with its very own package manager, npm. For Mac users, installation is as simple as brew install node.
  • GatsbyJS
  • Ruby
  • Bundler (gem install bundler)
    • Bundler is Ruby's package manager, and it works a lot like pip.
  • Jekyll
    • A static site generator built on Ruby. Always check the Gemfile of the project you're working on to see which version of Jekyll you need to run. If you have multiple versions of Jekyll installed, you may have to prepend Jekyll commands with bundle exec (e.g. jekyll serve becomes bundle exec jekyll serve).
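
For a Jekyll project that pins its version in a Gemfile, the usual workflow can be sketched as a small helper. `serve_jekyll_site` is a hypothetical function name; it assumes Ruby and Bundler are installed and that you run it from the project root.

```shell
# Hypothetical helper: install the gems pinned in the Gemfile, then serve the site
# with the project's own Jekyll version rather than a globally installed one.
serve_jekyll_site() {
    bundle install && bundle exec jekyll serve
}
```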