There are many guides to setting up your computer for development. Each has its own merits. At DataMade, we perform a mix of maintaining legacy projects and developing new projects, often in parallel. Our toolkit is therefore optimized for managing many, isolated versions of packages and getting up and running on new projects quickly.
- Transferring data and settings to a new computer
- Version control
- Text editor
- Python
- Docker
- Security
- Data
- Geospatial data
- Static sites
When possible, we prefer installing binaries from a package manager to building packages from source. For Mac users, most of these tools are available as Homebrew packages. Homebrew is by far the easiest way to manage packages on macOS.
If you are replacing your computer, you should install a fresh copy of macOS, and then use the Migration Assistant to transfer all your data and settings to your new computer in one go. Make sure you take note of your DataMade Gmail account password to sign into Chrome if that's your browser of choice, and your LastPass master password to access your accounts.
Once you do that and you have checked that your new computer is good to go, you can safely wipe your old computer clean.
We use GitHub and Git to keep our work under version control. Note that we prefer the `git` CLI to the GitHub desktop GUI. The Git CLI is built in on macOS and most Linux distributions, usually with an acceptably recent version; to check whether you already have Git installed, run `which git`.
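As a quick check, a small shell sketch along these lines reports whether Git is on your PATH and which version you have. The `check_tool` helper is a hypothetical name introduced here for illustration; it is not part of Git or of any DataMade tooling.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on
# your PATH and returns a non-zero status when it is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 is not installed"
    return 1
  fi
}

# Print the installed Git version only when Git is actually available.
if check_tool git; then
  git --version
fi
```

The same helper works for any other tool in this guide, e.g. `check_tool docker`.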
DataMade developers use a variety of different text editors, and we leave it up to you to decide which one you prefer to use. However, there are two configurations that we encourage you to make no matter which editor you use:
- Automatically trim trailing whitespace
- Set default tab size for Python files to 4, and set default HTML/JavaScript tab size to 2
Historically, Sublime Text has been the most widely-used editor at DataMade, so the following instructions will demonstrate how to set these settings for Sublime Text.
In the menu bar, under "Sublime Text," open the "Preferences" menu and select "Settings - User." Here, you can override the default settings (i.e., do not make changes to the "Settings - Default" file). In the User file, add two things:
"trim_trailing_white_space_on_save": true,
"tab_size": 4,
‼ Note: You are welcome to explore other text editor options, e.g., Atom, Vim, etc.
At DataMade, you'll run most Python processes in containers. However, it's still useful to have a fresh install of Python on your machine to keep your system Python isolated (it's important!) and to use a later version of Python (if you're on a Mac, your system Python is probably version 2.7).
When you aren't using containers, DataMade recommends you conduct Python work in virtual environments (virtualenvs). Virtualenvs help enforce dependency separation between your projects and make it a lot easier for other users to replicate your work on their computers. The Python ecosystem contains a lot of options for managing your environments, from the built-in `virtualenv` package to bundled package and environment management with `conda`. We like `virtualenvwrapper`, which provides a few convenience functions you can use from your terminal to create, activate, deactivate, and remove virtual environments.
Finally, to install packages in your environment, you'll need `pip`, the Python package installer.
The simplest way to manage Python with minimal headaches is to do a clean install of Python via Homebrew or apt, then get `pip` and `virtualenvwrapper` running on your fresh version of Python.
Optionally, create a global virtual environment for general utility packages:
mkvirtualenv gus # that is, generally useful stuff
Then run `workon gus` whenever you want to use these "global" packages. More often, you'll want to create project-specific virtual environments, and `workon` those environments during development.
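A project-specific workflow with virtualenvwrapper might look like the following sketch. The environment name `my-project` is a placeholder, and the commands only run if virtualenvwrapper's shell functions are loaded in the current shell.

```shell
# Hypothetical environment name, used only for illustration.
PROJECT_ENV="my-project"

if command -v mkvirtualenv >/dev/null 2>&1; then
  mkvirtualenv "$PROJECT_ENV"   # create the environment and activate it
  pip --version                 # pip now points inside the new environment
  deactivate                    # leave the environment
  workon "$PROJECT_ENV"         # re-activate it later, from any directory
  deactivate
  rmvirtualenv "$PROJECT_ENV"   # delete it once you no longer need it
else
  echo "virtualenvwrapper is not loaded in this shell"
fi
```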
- Python 3
- pip
- Make sure you're using the latest version of pip. (If you used homebrew to install Python, you got pip for free!)
- virtualenv
- virtualenvwrapper
- virtualenvwrapper contains useful shortcut commands for all stages of virtualenv use. Install it, making sure to follow the instructions for editing your shell startup file.
- If you run into ownership errors and you installed Python via Homebrew, add `VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3` to your shell startup file prior to the lines you added for virtualenvwrapper, and try again.
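For reference, the virtualenvwrapper lines in a shell startup file (for example `~/.bashrc` or `~/.zshrc`) often look something like this sketch. The paths are assumptions based on a Homebrew install under `/usr/local`; check where `virtualenvwrapper.sh` actually lives on your machine, for instance with `which virtualenvwrapper.sh`.

```shell
# Assumed locations for a Homebrew-based setup; adjust them for your machine.
export WORKON_HOME="$HOME/.virtualenvs"                  # where environments are stored
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3   # must come before the line that loads the script

# Load the virtualenvwrapper shell functions if the script exists at this path.
if [ -f /usr/local/bin/virtualenvwrapper.sh ]; then
  . /usr/local/bin/virtualenvwrapper.sh
fi
```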
Containers are a popular, modern approach to packaging and running software. We use the Docker engine to create, run, and destroy containers for our applications during local development. This makes it much easier to manage dependencies across 5+ years of web applications.
- Docker Community Edition
- A popular and feature-rich container engine. Mac users, look no further for installation instructions. Linux users, open the Mac instructions and find your distribution in the sidebar.
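To give a feel for the create, run, and destroy cycle, here is a minimal sketch that runs a Python command in a throwaway container. The `py_in_container` function name is hypothetical, and `python:3` refers to the official Python image on Docker Hub; real DataMade projects define their own images and services, which this sketch does not attempt to reproduce.

```shell
# py_in_container is a hypothetical helper, not part of Docker or any project.
# It starts a container from the official python:3 image, mounts the current
# directory at /app, runs Python with the given arguments, and removes the
# container when the command exits (--rm).
py_in_container() {
  docker run --rm \
    --volume "$PWD":/app \
    --workdir /app \
    python:3 python "$@"
}
```

For example, `py_in_container --version` prints the Python version inside the container, and `py_in_container script.py` (with a script of your own in the current directory) runs it without installing anything on your machine beyond Docker.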
Cryptographic security is essential for developers. You don't have to know the guts of how these tools work, but you should have them installed and get comfortable using them.
To enable hard drive encryption on a MacBook, go to System Preferences > Security & Privacy > FileVault. Turn on FileVault and save a recovery key in a safe place, such as LastPass. The process will take some time, probably about a day depending on what you're doing and what model computer you have. It can run in the background as long as your computer is awake and connected to power.
- SSH
- Secure Shell, a protocol for communicating securely over unsecured networks. We use it to push and pull from Git remotes and to access our servers. It comes pre-installed as a command line tool on all Mac and Linux distributions; generate an SSH key if you haven't already and follow the instructions for adding it to your GitHub account.
- GPG
- GNU Privacy Guard, a command line tool for encrypting and decrypting files. Mac users can `brew install gnupg2`. Then, configure your key by hand.
- Blackbox
- StackExchange's open-source CLI for keeping secrets secure under public version control. Follow our excellent guide (internal link) to use it.
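Generating an SSH key usually comes down to a single `ssh-keygen` call. The sketch below writes a key pair to a temporary directory with an empty passphrase so that it can run unattended; for a real key you would drop the `-f` and `-N` options, accept the default location under `~/.ssh`, and choose a passphrase. The email address is a placeholder.

```shell
# Write the demo key pair to a throwaway directory, not ~/.ssh.
KEY_DIR="$(mktemp -d)"

if command -v ssh-keygen >/dev/null 2>&1; then
  # -t: key type, -C: comment (placeholder email), -f: output file,
  # -N "": empty passphrase (for this demo only), -q: quiet.
  ssh-keygen -t ed25519 -C "you@example.com" -f "$KEY_DIR/id_ed25519" -N "" -q

  # The .pub file is the half you paste into your GitHub account settings.
  cat "$KEY_DIR/id_ed25519.pub"
else
  echo "ssh-keygen is not installed"
fi
```

Once the public key is added to GitHub, `ssh -T git@github.com` is a common way to confirm that authentication works.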
We try to maintain a consistent and standardized toolkit for all of our data work. We know that there are many good options for working with data, and we are always open to hearing arguments for new additions to this toolkit. But these tools have stood the test of time, and you'll see them crop up over and over in DataMade's work.
🚨 Note: If you're working on a new DataMade application, most of these dependencies (e.g., Postgres) should be containerized, i.e., you don't need to install them directly on your computer, and you can skip this section. If you're working on a legacy application that does not include containerization artifacts, read on for our installation tips.
- Bash and basic Unix tools
- Comes installed with OS X, macOS, Windows 10, and all Linux distributions.
- PostgreSQL
- A powerful open-source database engine (also known as Postgres). There are a million ways to download and manage Postgres. If you're writing a new application, this dependency should be containerized, i.e., you don't need to install it directly on your machine. Otherwise, you'll be best off installing it with your package manager and following their Getting Started guide to configure it.
- Many of our database configurations assume your installation of Postgres has a `postgres` database owned by a `postgres` user. After you've installed Postgres via your favorite package manager (probably `brew install postgresql`), run the following commands from your terminal:

      createuser -s postgres          # create postgres superuser
      createdb -O postgres postgres   # create postgres database owned by postgres user

  If you get a "database already exists" error:

      psql                                         # log in to postgres
      alter database postgres owner to postgres;   -- make postgres the owner of the postgres database
- csvkit
- Command line tools for working with CSVs, the most common (and arguably the best) file format for spreadsheets. It's built on Python, so you can install it by running `pip install csvkit` in your `gus` virtualenv.
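To get a feel for csvkit, the following sketch builds a tiny CSV of made-up rows in a temporary directory and runs a few of its commands against it. The file name and data are invented for illustration, and the csvkit commands only run if csvkit is installed.

```shell
# Build a tiny sample CSV in a temporary directory (made-up data).
CSV_DIR="$(mktemp -d)"
cat > "$CSV_DIR/offices.csv" <<'EOF'
name,city,staff
Alpha,Chicago,10
Beta,Springfield,4
EOF

if command -v csvcut >/dev/null 2>&1; then
  csvcut -n "$CSV_DIR/offices.csv"                      # list the column names
  csvcut -c name,city "$CSV_DIR/offices.csv" | csvlook  # keep two columns, pretty-print them
  csvgrep -c city -m Chicago "$CSV_DIR/offices.csv"     # rows whose city contains "Chicago"
else
  echo "csvkit is not installed; try: pip install csvkit"
fi
```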
🚨 Note: If you're working on a new DataMade application, most of these dependencies (e.g., PostGIS) should be containerized, i.e., you don't need to install them directly on your computer, and you can skip this section. If you're working on a legacy application that does not include containerization artifacts, read on for our installation tips.
- PostGIS
- A geospatial plugin for Postgres. We do lots of geographic work, so it's worth installing this as soon as you have Postgres up and running. Ignore the installers and install with your favorite package manager; make sure to install the version that corresponds to your version of Postgres, and remember that PostGIS must be activated in any database that needs to use it by running the SQL command `CREATE EXTENSION postgis`.
- GDAL
- A set of command line tools for modifying and converting geospatial data. If you're on a Mac, make your life easier and install it with homebrew. (If you installed PostGIS via homebrew, you got GDAL, too.)
- Optional: QGIS
- The best open-source GUI app for playing with geospatial data. Currently there are no perfect ways of installing QGIS. We've had some success with the Homebrew package on macOS and William Kyngesburye's installers.
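As a small illustration of the GDAL command line tools, this sketch writes a one-feature GeoJSON file to a temporary directory, inspects it with `ogrinfo`, and converts it to CSV with `ogr2ogr`. The file names and the sample point are made up, and the GDAL commands only run if GDAL is installed.

```shell
# Write a one-feature GeoJSON file to a temporary directory (made-up data).
GEO_DIR="$(mktemp -d)"
cat > "$GEO_DIR/point.geojson" <<'EOF'
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Sample point"},
      "geometry": {"type": "Point", "coordinates": [-87.63, 41.88]}
    }
  ]
}
EOF

if command -v ogr2ogr >/dev/null 2>&1; then
  # Summarize every layer in the file (-al) without listing each feature (-so).
  ogrinfo -so -al "$GEO_DIR/point.geojson"

  # Convert to CSV, writing the geometry as a WKT column.
  ogr2ogr -f CSV "$GEO_DIR/point.csv" "$GEO_DIR/point.geojson" -lco GEOMETRY=AS_WKT
  cat "$GEO_DIR/point.csv"
else
  echo "GDAL is not installed"
fi
```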
Most of our sites are dynamic and built on Django, but sometimes we deploy small static sites (like datamade.us) using Jekyll, a site generator built on Ruby, or more recently, GatsbyJS.
🚨 Note: If you're working on a new DataMade application, all of these dependencies should be containerized, i.e., you don't need to install them directly on your computer, and you can skip this section. If you're working on a legacy application that does not include containerization artifacts, read on for our installation tips.
- Node.js
- An environment to run JavaScript outside the browser, bundled with its very own package manager, `npm`. For Mac users, installation is as simple as `brew install node`.
- GatsbyJS
- A React-powered static site generator that works with a variety of data sources. The GatsbyJS team provides excellent documentation for installation and getting your feet wet.
- Ruby
- We recommend using a third-party Ruby version manager like RVM (follow the unofficial cheat sheet linked from their website, being sure to run `rvm install x.x.x` with the latest version of Ruby when you come to it) or rbenv (http://octopress.org/docs/setup/rbenv/) to manage different versions of Ruby.
- Bundler (`gem install bundler`)
- Bundler is Ruby's package manager, and it works a lot like pip.
- Jekyll
- A static site generator built on Ruby. Always check the Gemfile of the project you're working on to see which version of Jekyll you need to run. If you have multiple versions of Jekyll installed, you may have to prepend Jekyll commands with `bundle exec` (e.g. `jekyll serve` becomes `bundle exec jekyll serve`).