Skip to content

Latest commit

 

History

History
227 lines (123 loc) · 14.4 KB

100-learning-node-runtime.adoc

File metadata and controls

227 lines (123 loc) · 14.4 KB

Introduction

I believe the majority of developers learn Node the wrong way. Most tutorials, books, and courses about Node focus on the Node ecosystem – not the Node runtime itself. They focus on teaching what can be done with all the packages available for you when you work with Node, like Express and Socket.IO, rather than teaching the capabilities of the Node runtime itself.

There are good reasons for this. Node is raw and flexible. It doesn’t provide complete solutions, but rather provides a rich runtime that enables you to implement solutions of your own. Libraries like Express.js and Socket.IO are more of complete solutions, so it makes more sense to teach those libraries, so you can enable learners to use these complete solutions.

The conventional wisdom seems to be that only those whose job is to write libraries like Express.js and Socket.IO need to understand everything about the Node runtime, but I think this is wrong. A solid understanding of the Node runtime itself is the best thing you can do before using those complete solutions. You should at least have the knowledge and confidence to judge a package by its code so you can make an educated decision about using it.

The Node Knowledge Challenge

Let me give you a taste of the kind of questions you will be able to answer after reading this book. Look at this as your Node knowledge challenge. If you can answer most of these questions, this book is probably not for you.

  • What is the relationship between Node and V8? Can Node work without V8?

  • How come when you declare a global variable in any Node file it’s not really global to all modules?

  • When exporting the API of a Node module, why can we sometimes use exports and other times we have to use module.exports?

  • What is the Call Stack? Is it part of V8?

  • What is the Event Loop? Is it part of V8?

  • What is the difference between setImmediate and process.nextTick?

  • What are the major differences between spawn, exec, and fork?

  • How does the cluster module work? How is it different than using a load balancer?

  • What will Node do when both the call stack and the event loop queue are empty?

  • What are V8 object and function templates?

  • What is libuv and how does Node use it?

  • How can we do one final operation before a Node process exits? Can that operation be done asynchronously?

  • Besides V8 and libuv, what other external dependencies does Node have?

  • What’s the problem with the process uncaughtException event? How is it different than the exit event?

  • What are the 5 major steps that the require function does?

  • How can you check for the existence of a local module?

  • What are circular modular dependencies in Node and how can they be avoided?

  • What are the 3 file extensions that will be automatically tried by the require function?

  • When creating an http server and writing a response for a request, why is the end() function required?

  • When is it ok to use the file system *Sync methods?

  • How can you print only one level of a deeply nested object?

  • How come top-level variables are not global?

  • The objects exports, require, and module are all globally available in every module but they are different in every module. How?

  • If you execute a JavaScript file that has the single line: console.log(arguments); with Node, what exactly will Node print?

  • How can a module be both "requirable" by other modules and executable directly using the node command?

  • What’s an example of a built-in stream in Node that is both readable and writable?

  • What happens when the line cluster.fork() gets executed in a Node script?

  • What’s the difference between using event emitters and using simple callback functions to allow for asynchronous handling of code?

  • What’s the difference between the Paused and the Flowing modes of readable streams?

  • How can you read data from a connected socket?

  • The require function always caches the module it requires. What can you do if you need to execute the code in a required module many times?

  • When working with streams, when do you use the pipe function and when do you use events? Can those two methods be combined?

Fundamentals

Okay, I would categorize some of the questions above as fundamentals. Let me start by answering these:

What is the Call Stack and is it part of V8?

The Call Stack is definitely part of V8. It is the data structure that V8 uses to keep track of function invocations. Every time we invoke a function, V8 places a reference to that function on the call stack and it keeps doing so for each nested invocation of other functions. This also includes functions that call themselves recursively.

When the nested invocations of functions reaches an end, V8 will pop one function at a time and use its returned value in its place.

Why is this important to understand for Node? Because you only get ONE Call Stack per Node process. If you keep that Call Stack busy, your whole Node process is busy. Keep that in mind.

picturenode12

What is the Event Loop? Is it part of V8?

The event loop is provided by the libuv library. It is not part of V8.

The Event Loop is the entity that handles external events and converts them into callback invocations. It is a loop that picks events from the event queues and pushes their callbacks into the Call Stack. It is also a multi-phase loop.

The Event Loop is part of a bigger picture that you need to understand first in order to understand the Event Loop. You need to understand the role of V8, know about the Node APIs, and know how things get queued to be executed by V8.

💡
The Node API hosts functions like setTimeout or fs.readFile. These are not part of JavaScript itself. They are just functions provided by Node.

The Event Loop sits in the middle between V8’s Call Stack and the different phases and callback queues and it acts like an organizer. When the V8 Call Stack is empty, the event loop can decide what to execute next.

What will Node do when the Call Stack and the event loop queues are all empty?

It will simply exit.

When you run a Node program, Node will automatically start the event loop and when that event loop is idle and has nothing else to do, the process will exit.

To keep a Node process running, you need to place something somewhere in event queues. For example, when you start a timer or an HTTP server you are basically telling the event loop to keep running and checking on these events.

Besides V8 and libuv, what other external dependencies does Node have?

The following are all separate libraries that a Node process can use:

  • http-parser

  • c-ares

  • OpenSSL

  • zlib

All of them are external to Node. They have their own source code. They also have their own license. Node just uses them.

You want to remember that because you want to know where your program is running. If you are doing something with data compression, you might encounter trouble deep in the zlib library stack. You might be fighting a zlib bug. Do not blame everything on Node.

How come top-level variables are not global?

If you have module1 that defines a top-level variable g:

module1.js
var g = 42;

And you have module2 that requires module1 and try to access the variable g, you would get g is not defined.

Why?? If you do the same in a browser, you can access top-level defined variables in all scripts included after their definition.

Every Node file gets its own IIFE (Immediately Invoked Function Expression) behind the scenes. All variables declared in a Node file are scoped to that IIFE.

When is it okay to use the file system *Sync methods (like readFileSync)

Every fs method in Node has a synchronous version. Why would you use a sync method instead of the async one?

Sometimes using the sync method is fine. For example, it can be used in any initializing step while the server is still loading. It is often the case that everything you do after the initializing step depends on the data you obtain there. Instead of introducing a callback-level, using synchronous methods is acceptable as long as what you use the synchronous methods for is a one-time thing.

However, if you are using synchronous methods inside a handler like an HTTP server on-request callback, that would simply be 100% wrong. Do not do that.

How to Learn the Node Runtime

Learning Node can be challenging. Here are some of the guidelines that I hope will help along that journey:

Learn the good parts of JavaScript and learn its modern syntax (ES2015 and beyond)

Node is a set of libraries on top of a VM engine that can compile JavaScript, so it goes without saying that the important skills for JavaScript itself are a subset of the important skills for Node. You should start with JavaScript itself.

Do you understand functions, scopes, binding, the this keyword, the new keyword, closures, classes, module patterns, prototypes, callbacks, and promises? Are you aware of the various methods that can be used on Numbers, Strings, Arrays, Sets, Objects, and Maps? Getting yourself comfortable with the items on this list will make learning the Node API much easier. For example, trying to learn the fs module methods before you have a good understanding of callbacks may lead to unnecessary confusion.

Understand the non-blocking nature of Node

Callbacks and promises (and generators/async patterns) are especially important for Node. You need to understand how asynchronous operations are first class in Node.

You can compare the non-blocking nature of lines of code in a Node program to the way you order a Starbucks coffee (in the store, not the drive-thru):

  • Place your order | Give Node some instructions to execute (a function)

  • Customize your order, no whipped cream for example | Give the function some arguments: ({whippedCream: false})

  • Give the Starbucks worker your name with the order | Give Node a callback with your function: ({whippedCream: false}, callback)

  • Step aside and the Starbucks worker will take orders from people who were after you in line | Node will take instructions from lines after yours.

  • When your order is ready, the Starbucks worker will call your name and give you your order | When your function is computed and Node has a ready result for you, it’ll call your callback with that result: callback(result)

Learn the JavaScript concurrency model and how it is based on an event loop

The simplified picture has a Call Stack and some Event Queues and the Event Loop sits in the middle organizing the communication between them. The Node asynchronous APIs place callbacks in Event Queues and the Event Loop dequeues them to the Call Stack.

Understand how a Node process never sleeps and will exit when there is nothing left to do

A Node process can be idle but it never sleeps. It keeps track of all the callbacks that are pending and if there is nothing left to execute it will simply exit. To keep a Node process running you can, for example, use a setInterval function because that would create a permanent pending callback in the Event Loop.

Learn the global variables that you can use like process, module, and Buffer

They’re all defined on a global variable (which is usually compared to the window variable in browsers). In a Node’s REPL, type global. (with a dot) and hit tab twice to see all the items available (or simply hit the tab key twice on an empty line). Some of these items are JavaScript structures (like Array and Object). Some of them are Node library functions (like setTimeout, or console to print to stdout/stderr), and some of them are Node global objects that you can use for certain tasks (for example, process.env can be used to read the host environment variables).

picturenode1

You need to understand most of what you see in that list.

Learn what you can do with the built-in libraries that ship with Node and how they have a focus on “networking”

Some of those will feel familiar, like Timers for example, because they also exist in the browser and Node is simulating that environment. However, there is much more to learn, like fs, path, readline, http, net, stream, cluster, etc. The auto-complete list above has them all.

For example, you can read/write files with fs, you can run a streaming-ready web server using “http”, and you can run a TCP server and program sockets with “net”. Node today is so much more powerful than it was just a year ago, and it’s getting better by the commit. Before you look for a package to do some task, make sure that you can’t do that task with the built-in Node packages first.

The events module is especially important because most of Node architecture is event-driven.

Understand why Node is named Node

You build simple single-process building blocks (nodes) that can be organized with good networking protocols to have them communicate with each other and scale up to build large distributed programs. Scaling a Node application is not an afterthought – it’s built right into the name.

Read and try to understand some code written for Node

Pick a framework, like Express, and try to understand some of its code. Ask specific questions about the things you don’t understand.

Finally, write a web application in Node without using any frameworks. Try to handle as many cases as you can, respond with an HTML file, parse query strings, accept form input, and create an endpoint that responds with JSON.

Also try writing a chat server, publishing an npm package, and contributing to an open-source Node-based project.

This Book

This book is not for the beginner. I assume that the reader is comfortable with JavaScript and has a basic knowledge of Node. To be specific about this, if you don’t know how to execute a script with Node, require an NPM package, or use Node as a simple web server, you are probably not ready for this book. I create courses for Pluralsight and they have some good starting resources for Node. Check their Node.js path at pluralsight.com/paths/node-js.

All the examples I will be using in this book are Linux-based. On Windows, you need to switch the commands I use with their Windows alternatives.

Throughout the book, I use the term Node instead of Node.js for brevity. The official name of the runtime is Node.js but referring to it as just Node is a very common thing.