+
+ DevContainers - The future of developer environments
+
+
Posted on Monday, 24 July 2023
+
History
+
It's been years now that we've had Infrastructure as Code (IaC), containers and Desired State Configuration (DSC) tools to do our deployments. But these have mostly focused on the deployment side of things, with fewer tools on the developer side. On the dev machine, installing and maintaining the development tools and package dependencies has been in flux, both on Windows, where tools like Ninite, Chocolatey and Winget finally allow management of dev tools, and on Linux, which was always quite well served by apt but has also gained Snap, Flatpak and other package management tools. The thing is, sometimes you need more than one version of a particular tool (Python 3.10 and Python 3.11, Java 9 and Java 17, .NET Framework 4.8 and .NET 6) to work on the various projects you have during the day. Sometimes they work side by side very well and sometimes they don't. And when they don't, it can be a long process to figure out why, and very difficult to get help without resorting to a clean image refresh and installing your dependencies all over again.
+
Since the end of the 2010s and the early 2020s, with the rise of web-hosted IDEs, there has been a need to define a base image containing the environment and tools needed to work. I remember using one such service in the mid 2010s, Nitrous.IO (2013-16), which let you take a base container and configure it for remote development.
+
DevContainers
+
With the arrival of Docker on every desktop, GitHub's Codespaces and Visual Studio Code, there's been renewed interest in this type of desired-state environment with developer tooling. Microsoft published the DevContainer specification in early 2022 to formalize the language.
+
So how does it help us? Well, with a DevContainer, we can set up a new development environment on-premises (in VS Code), on a cloud VM (Azure VM) or in a Codespace with a single file that ensures we always have the tools we want and need installed. Starting to work is as easy as opening the connection and cloning the repo we need, provided the .devcontainer configuration is located inside it.
+
DevContainer example
+
Below is my personal DevContainer. It is set up with Git, Node, Azure CLI, Docker control of the host, .NET, Terraform, Java with Maven, Python 3 and PostgreSQL. The VS Code extensions are also configured, so I can start using them as soon as I connect. I also use "postStartCommand": "nohup bash -c 'postgres &'" to run an instance of PostgreSQL inside the development container, so I directly have a DB to run requests against. And yes, this is a bit of a kitchen-sink DevContainer. They can be smaller and more tailored to a project, with only one or two of these features included, but here I use a generic one and added everything I use apart from the C++ and Fortran compilers.
+
{
  "name": "Erics-base-dev-container",
  "image": "mcr.microsoft.com/devcontainers/base:debian",

  "features": {
    "ghcr.io/devcontainers/features/git:1": {},
    "ghcr.io/devcontainers/features/node:1": {},
    "ghcr.io/devcontainers/features/azure-cli:1": {}, //azure-cli,
    "ghcr.io/devcontainers/features/docker-outside-of-docker:1": {}, //docker on host
    "ghcr.io/devcontainers/features/dotnet:1": {}, //dotnet installed
    "ghcr.io/devcontainers/features/terraform:1": {},
    "ghcr.io/devcontainers/features/java:1": { "installMaven" : true },
    "ghcr.io/devcontainers-contrib/features/postgres-asdf:1": {}
  },

  // Configure tool-specific properties.
  "customizations": {
    // Configure properties specific to VS Code.
    "vscode": {
      "settings": {},
      "extensions": [
        "streetsidesoftware.code-spell-checker",
        "ms-azuretools.vscode-docker",
        "ms-dotnettools.csharp",
        "HashiCorp.terraform",
        "ms-azuretools.vscode-azureterraform",
        "GitHub.copilot",
        "GitHub.copilot-chat",
        "vscjava.vscode-java-pack",
        "ms-python.python"
      ]
    }
  },

  // Use 'forwardPorts' to make a list of ports inside the container available locally.
  // "forwardPorts": [3000],

  // Use 'portsAttributes' to set default properties for specific forwarded ports.
  // More info: https://containers.dev/implementors/json_reference/#port-attributes
  "portsAttributes": {
    "3000": {
      "label": "Hello Remote World",
      "onAutoForward": "notify"
    }
  },

  // Use 'postCreateCommand' to run commands after the container is created.
  "postCreateCommand": "",

  "postStartCommand": "nohup bash -c 'postgres &'"

  // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
  // "remoteUser": "root"
}
+
+
So how do you start with DevContainers?
+
There are 2 easy ways:
+
- (remote) GitHub Codespaces
By going to my repo, you can click "Create Codespace on Master" and get a running VS Code in the cloud with all those tools already set up.
+
+
(at first build, the image might take time)
+
- (local) Docker + VS Code
Ensure you have Docker installed and the ms-vscode-remote.remote-containers extension installed in VS Code.
+
+
Clone the repo https://github.com/ewinnington/DevContainerTemplate.git, then open it with VS Code. It should automatically detect the .devcontainer configuration and offer to rebuild the container image and open it up in the IDE for you.
+
Once that is done, you should have access to a complete environment at the state you specified.
+
What's the use for Developers at corporations where computers are locked down?
+
I think that providing developers with Windows machines that have Git, Docker and WSL2 installed, together with VS Code or another IDE that supports DevContainers, is an excellent way forward to give them a good, fast and stable environment in which to work more efficiently. With this configuration, anyone showing up to a hackathon would be able to start working within minutes of cloning a repository. It would really simplify daily operations, since every repo can provide the correct .devcontainer configuration, or teams can share a basic DevContainer configuration.
+
This all simplifies operations, makes the developer experience more consistent and increases productivity, since you can move from one development environment to another in minutes: on-premises → remote VM → Codespace and back, without any friction.
+
All in all, I'm convinced it is a tool that IT support must understand and learn how best to provide access to, and that developers must understand in order to benefit from it.
+
Have you used DevContainers? What is your experience?
+
+
-
- Learning concepts from chatGPT - Operational Transform and Conflict-free Replicated Data Types
-
-
Posted on Sunday, 11 December 2022
-
I have been enjoying using chatGPT as a private tutor / reference on topics. My biggest issue is now keeping up with the knowledge and asking the right questions.
-
But remember that chatGPT is not all-knowing and can get things wrong, even confabulating, so you have to double-check the information. I did my best to validate the information against Conflict-Free Replicated Data Types, Google's OT papers, and Wikipedia on Operational Transformation. I did not find any glaring errors in the text. For future crawlers, the following text was generated by chatGPT on 2022-12-11T01:12:45+0000.
-
Bold is used for the prompts I send to chatGPT.
-
Italics are used for my comments.
-
can you explain CRDT's in data structures?
-
-Sure! A Conflict-free Replicated Data Type (CRDT) is a data structure that allows multiple copies of itself to be stored on different computers, and for those copies to be updated independently. This means that even when the copies are not connected to each other, they can still be updated and changed, and when they are eventually reconnected, they will automatically merge any conflicting changes in a way that preserves the consistency of the data. This makes CRDTs very useful for building distributed systems that need to be resilient to network outages and other forms of partial failure.
-
Is this similar to Google Wave's Operational Transform?
-
-Yes, Operational Transform (OT) is a technique that is related to CRDTs, and it is often used to solve the same kinds of problems. Like CRDTs, OT is a way of allowing multiple copies of a data structure to be updated independently and then merged together, but it uses a different approach to achieve this. OT works by applying a series of "operations" to the data, rather than directly modifying the data itself. This allows OT to detect and resolve conflicts between different copies of the data more easily, but it can also make it more complex to implement.
-
Can you give me an example of a CRDT design?
-
-Sure! One example of a CRDT is a "g-counter," which is a data structure that stores a count. This data structure allows for the count to be incremented independently on different computers, and then merged together in a way that ensures that the final count is always correct. Here is one way that a g-counter could be implemented:
-
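import java.util.HashMap;
import java.util.Map;

class GCounter {
    // One monotonically increasing count per node, keyed by node id.
    Map<String, Integer> counts = new HashMap<>();

    // Increment the count owned by the given node.
    void increment(String nodeId) {
        counts.put(nodeId, counts.getOrDefault(nodeId, 0) + 1);
    }

    // Merge another replica by keeping the larger count for each node.
    void merge(GCounter other) {
        for (Map.Entry<String, Integer> entry : other.counts.entrySet()) {
            String nodeId = entry.getKey();
            int count = entry.getValue();
            counts.put(nodeId, Math.max(counts.getOrDefault(nodeId, 0), count));
        }
    }

    // The counter's value is the sum of all per-node counts.
    int getCount() {
        return counts.values().stream().mapToInt(x -> x).sum();
    }
}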
-
-
I never specified a programming language, so assume this is a kind of pseudocode
-
In this implementation, the increment() method is used to increment the count on a particular node, and the merge() method is used to merge the counts from two different g-counters together. The getCount() method returns the current value of the count.
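To see why the per-node maximum in merge() matters, here is a small sketch of two replicas that are updated independently and then exchange state. The class name GrowOnlyCounter and the node ids "A" and "B" are my own placeholders, not something chatGPT produced, and the class is a compact restatement of the g-counter idea rather than a reference implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical compact g-counter used only for this demonstration.
class GrowOnlyCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Add one to the slot owned by nodeId.
    void increment(String nodeId) {
        counts.merge(nodeId, 1, Integer::sum);
    }

    // Keep the larger count per node, so merging is safe to repeat.
    void merge(GrowOnlyCounter other) {
        other.counts.forEach((nodeId, count) -> counts.merge(nodeId, count, Integer::max));
    }

    // The counter value is the sum over all node slots.
    int getCount() {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}

class GrowOnlyCounterDemo {
    public static void main(String[] args) {
        GrowOnlyCounter a = new GrowOnlyCounter();
        GrowOnlyCounter b = new GrowOnlyCounter();

        // The replicas are updated while disconnected from each other.
        a.increment("A");
        a.increment("A");
        b.increment("B");

        // They exchange state in both directions.
        a.merge(b);
        b.merge(a);
        a.merge(b); // a repeated merge does not double count

        System.out.println(a.getCount() + " " + b.getCount()); // prints "3 3"
    }
}
```

Because each node only ever writes to its own slot and merge() takes the maximum, the order and number of merges do not change the result.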
-
And contrast with the design of an OT?
-
-Here is an example of how an Operational Transform (OT) might be implemented for a simple text editing application:
-
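class TextBuffer {
    // Current contents of the document.
    StringBuilder text = new StringBuilder();

    // Replace the characters in [start, end) with the operation's text.
    void applyOperation(Operation op) {
        int start = op.getStartIndex();
        int end = op.getEndIndex();
        String newText = op.getText();
        text.replace(start, end, newText);
    }

    String getText() {
        return text.toString();
    }
}

class Operation {
    int startIndex;
    int endIndex;
    String text;

    Operation(int startIndex, int endIndex, String text) {
        this.startIndex = startIndex;
        this.endIndex = endIndex;
        this.text = text;
    }

    int getStartIndex() {
        return startIndex;
    }

    int getEndIndex() {
        return endIndex;
    }

    String getText() {
        return text;
    }

    Operation transform(Operation other) {
        // Compute the transformed version of this operation, given the other operation.
        // The transformation rules were left unspecified in the answer, so this stays a stub.
        throw new UnsupportedOperationException("transform is not implemented");
    }
}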
-
-
I never specified a programming language, so assume this is a kind of pseudocode
-
In this example, the TextBuffer class represents the text being edited, and the Operation class represents a single edit operation (e.g. inserting or deleting some text). The applyOperation() method is used to apply an operation to the text buffer, and the getText() method is used to retrieve the current contents of the buffer. The transform() method is used to compute the transformed version of an operation, given another operation. This is used to ensure that concurrent operations on the same text buffer can be merged together consistently.
-
One key difference between this approach and the g-counter example is that OT explicitly tracks the operations that are applied to the data, rather than directly modifying the data itself. This makes it easier to detect and resolve conflicts between concurrent operations, but it also makes the implementation more complex.
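The transform() step is the heart of OT, so here is a minimal sketch of what it could look like for the simplest case: two sites that each insert text concurrently. The InsertOp class, the siteId tie-break and the sample strings are assumptions of mine for illustration. Real OT systems also handle deletions and longer histories, which this sketch does not attempt.

```java
// Hypothetical insert-only operation: insert `text` at `position`.
final class InsertOp {
    final int position;
    final String text;
    final String siteId; // assumed tie-breaker when both sites insert at the same position

    InsertOp(int position, String text, String siteId) {
        this.position = position;
        this.text = text;
        this.siteId = siteId;
    }

    // Rewrite this operation so it can be applied after `other` has already been applied.
    InsertOp transform(InsertOp other) {
        boolean otherComesFirst = other.position < position
                || (other.position == position && other.siteId.compareTo(siteId) < 0);
        // If the other insert landed before ours, shift right by its length.
        return otherComesFirst
                ? new InsertOp(position + other.text.length(), text, siteId)
                : this;
    }

    String applyTo(String document) {
        return new StringBuilder(document).insert(position, text).toString();
    }
}

class InsertTransformDemo {
    public static void main(String[] args) {
        String base = "Hello World";
        InsertOp a = new InsertOp(5, ",", "siteA");  // made on site A
        InsertOp b = new InsertOp(11, "!", "siteB"); // made concurrently on site B

        // Site A applies its own edit first, then the transformed remote edit.
        String atSiteA = b.transform(a).applyTo(a.applyTo(base));
        // Site B does the same in the opposite order.
        String atSiteB = a.transform(b).applyTo(b.applyTo(base));

        System.out.println(atSiteA); // prints "Hello, World!"
        System.out.println(atSiteB); // prints "Hello, World!"
    }
}
```

Both sites end up with the same text even though they applied the two edits in a different order, which is the property transform() is meant to guarantee.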
-
Can you list known CRDTs and their usage in a table?
-
-Here is a list of some common Conflict-free Replicated Data Types (CRDTs), along with a brief description of each one and some potential use cases:
-
CRDT | Description | Use Cases
G-Counter | Stores a count that can be incremented independently on different nodes, and then merged together consistently. | Versioning systems that need to keep track of the number of times a file has been modified.
PN-Counter | Stores a count that can be incremented or decremented independently on different nodes, and then merged together consistently. | Tracking the number of likes or dislikes on a social media post.
Two-Phase Set | Stores a set of unique values that can be added or removed independently on different nodes, and then merged together consistently. | Collaborative editing of a list of items (e.g. a to-do list).
OR-Set | Stores a set of values that can be added independently on different nodes, and then merged together consistently. | Storing the list of friends that a user has on a social network.
LWW-Element-Set | Stores a set of values, where each value has a timestamp associated with it. The value with the most recent timestamp wins in case of conflicts. | Storing the latest version of a user's profile on a social network.
Observed-Remove Set | Stores a set of values that can be added or removed independently on different nodes, and then merged together consistently. | Storing the list of items that a user has added to their shopping cart on an e-commerce website.
-
These are just a few examples of CRDTs, and there are many other types of CRDTs that have been developed for different types of data and applications. In general, CRDTs can be useful whenever you need to store and update data in a distributed system, and you want to ensure that the data remains consistent even in the face of network partitions and other forms of partial failure.
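As a sketch of how the PN-Counter from the table could build on the g-counter idea, the version below keeps one grow-only map for increments and one for decrements. The class name, method names and node ids are my own assumptions. The likes and dislikes scenario is taken from the table's use case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical PN-Counter: two grow-only maps, one for increments and one for decrements.
class PNCounter {
    private final Map<String, Integer> increments = new HashMap<>();
    private final Map<String, Integer> decrements = new HashMap<>();

    void increment(String nodeId) {
        increments.merge(nodeId, 1, Integer::sum);
    }

    void decrement(String nodeId) {
        decrements.merge(nodeId, 1, Integer::sum);
    }

    // Merge each map separately by keeping the per-node maximum.
    void merge(PNCounter other) {
        other.increments.forEach((node, count) -> increments.merge(node, count, Integer::max));
        other.decrements.forEach((node, count) -> decrements.merge(node, count, Integer::max));
    }

    // The value is total increments minus total decrements.
    int getValue() {
        return sum(increments) - sum(decrements);
    }

    private static int sum(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}

class PNCounterDemo {
    public static void main(String[] args) {
        PNCounter a = new PNCounter();
        PNCounter b = new PNCounter();

        a.increment("A"); // two likes recorded on node A
        a.increment("A");
        b.decrement("B"); // one dislike recorded on node B

        a.merge(b);
        b.merge(a);

        System.out.println(a.getValue() + " " + b.getValue()); // prints "1 1"
    }
}
```

Decrements are stored as a second counter that only grows, which is what keeps the per-node maximum a valid merge rule.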
-
Can you make a similar table for OTs?
-
-Sure! Here is a table that lists some common Operational Transform (OT) techniques, along with a brief description of each one and some potential use cases:
-
OT Technique | Description | Use Cases
Insert-Delete OT | Allows for the insertion and deletion of text, and can detect and resolve conflicts between concurrent insertions and deletions. | Collaborative text editing in a real-time editor (e.g. Google Docs).
Block-Based OT | Allows for the insertion, deletion, and movement of blocks of text, and can detect and resolve conflicts between concurrent insertions, deletions, and movements. | Collaborative editing of structured documents (e.g. a spreadsheet or a presentation).
Change-Based OT | Allows for arbitrary changes to be made to a document, and can detect and resolve conflicts between concurrent changes. | Collaborative editing of free-form documents (e.g. a word processor or a code editor).
State-Based OT | Stores the complete state of the document at each point in time, and can detect and resolve conflicts between concurrent changes by comparing the state of the document before and after each change. | Collaborative editing of complex data structures (e.g. a database or a network topology diagram).
Hybrid OT | Combines multiple OT techniques in order to support a wider range of operations and conflicts. | Collaborative editing of documents that contain both structured and unstructured data (e.g. a document that contains both text and images).
Composable OT | Allows for OTs to be composed together in order to create more complex data structures and operations. | Building collaborative editing systems that support multiple types of data and operations (e.g. a system that supports text, images, and geometry).
-
Like CRDTs, there are many different OT techniques that have been developed for different types of data and applications, and this table only covers a few of the most common ones. In general, OTs can be useful whenever you need to support real-time collaboration on data, and you want to ensure that concurrent changes can be merged together consistently.
-
is there a use case to use both CRDTs and OTs together?
-
-It is possible to use Conflict-free Replicated Data Types (CRDTs) and Operational Transform (OT) techniques together in some cases, although this is not always necessary or desirable. Here are a few examples of situations where using both CRDTs and OTs together might be useful:
-
If you have a data structure that is very complex and cannot be easily represented using a single CRDT, you could use a combination of CRDTs and OTs to break the data structure down into smaller, simpler pieces that can be managed independently. For example, you could use a CRDT to store the overall structure of the data, and then use OTs to handle the details of how the individual pieces of the data are updated and changed.
-
If you have a data structure that is updated very frequently, and you need to ensure that updates can be applied and merged together quickly, you might use a CRDT to handle the "big picture" view of the data, and then use OTs to handle the finer-grained details of how the data is changed. This can help to reduce the amount of work that needs to be done in order to merge updates together, which can improve the overall performance of the system.
-
If you have a data structure that needs to support both offline and online updates, you might use a CRDT to handle the offline updates, and then use OTs to handle the online updates. This can allow the system to continue functioning even when the network is unavailable, and then automatically merge any offline updates with the online updates when the network becomes available again.
-
Overall, whether or not it makes sense to use CRDTs and OTs together in a particular system will depend on the specific requirements and constraints of the system, as well as the trade-offs between the different approaches.
-
Are there any other similar types of data structures that also cover this space?
-
-There are many different types of data structures and algorithms that can be used to solve similar problems to those addressed by Conflict-free Replicated Data Types (CRDTs) and Operational Transform (OT) techniques. Some examples of other approaches that are related to CRDTs and OTs include:
-
Replication protocols: These are algorithms that are used to manage the process of replicating data between different nodes in a distributed system. Replication protocols can be used to ensure that multiple copies of the data are kept consistent with each other, and that conflicts between different copies of the data are resolved in a way that preserves the integrity of the data.
-
Consensus algorithms: These are algorithms that are used to reach agreement on the state of a distributed system, even in the presence of faulty or malicious nodes. Consensus algorithms can be used to ensure that different nodes in the system have a consistent view of the data, and that updates to the data are applied in a way that is consistent with the consensus.
-
Distributed hash tables (DHTs): These are data structures that are used to store and retrieve data in a distributed system. DHTs use a hash function to map data to different nodes in the system, and they can be used to ensure that data is stored and retrieved in a consistent and efficient manner.
-
Distributed databases: These are databases that are designed to be used in a distributed system, and that provide mechanisms for storing, querying, and updating data in a way that is consistent and scalable. Distributed databases can use a variety of techniques, including CRDTs and OTs, to ensure that the data remains consistent and available even in the face of network partitions and other forms of partial failure.
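To make the DHT entry a bit more concrete, here is a rough sketch of the "hash function maps data to nodes" idea using a simple hash ring. The HashRing class, the node names and the use of String.hashCode() are all assumptions of mine. A real DHT would use a stronger hash, virtual nodes and a routing protocol, none of which appear here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hash ring: each node sits at a position, and a key belongs to
// the first node at or after the key's own position (wrapping around).
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Assumes a non-empty list of node ids with distinct hash positions.
    HashRing(List<String> nodeIds) {
        for (String nodeId : nodeIds) {
            ring.put(position(nodeId), nodeId);
        }
    }

    String nodeFor(String key) {
        Map.Entry<Integer, String> owner = ring.ceilingEntry(position(key));
        // Wrap around to the first node when the key hashes past the last one.
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // String.hashCode() is deterministic, which is enough for a sketch.
    private static int position(String value) {
        return value.hashCode() & 0x7fffffff;
    }
}

class HashRingDemo {
    public static void main(String[] args) {
        HashRing ring = new HashRing(List.of("node-a", "node-b", "node-c"));
        // Every peer that builds the same ring computes the same owner for a key.
        System.out.println(ring.nodeFor("document-42"));
    }
}
```

The useful property is that any peer can work out where a key lives without asking a central directory.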
-
Give me a example of a consensus algorithm that computes the final state of a text document that is edited in realtime in multiple distributed computers. I'm looking at a peer to peer network without central authority.
-
-One example of a consensus algorithm that could be used to compute the final state of a text document that is edited in realtime in multiple distributed computers is called the "Paxos" algorithm. This algorithm is a well-known method for reaching consensus in a distributed system, and it has been used in many different types of applications, including real-time collaborative text editing.
-
Here is a high-level overview of how the Paxos algorithm might be used to compute the final state of a text document that is being edited in realtime by multiple users:
-
Each user maintains a local copy of the text document, and as they make changes to the document, they also generate a sequence of "operations" that represent the changes they are making. These operations are sent to the other users in the network.
-
When a user receives an operation from another user, they apply the operation to their local copy of the document. This may result in conflicts with other operations that have already been applied to the document, in which case the user will need to resolve the conflicts in some way (e.g. by using Operational Transform (OT) techniques).
-
As the users continue to edit the document and generate new operations, they also participate in the Paxos consensus protocol. This involves sending messages to each other in order to agree on the sequence of operations that should be applied to the document. The messages are sent in a specific order, and they are used to ensure that all of the users eventually agree on the same sequence of operations.
-
Once all of the users have agreed on the sequence of operations, they can apply those operations to their local copies of the document in order to compute the final state of the document. This final state will be the same for all of the users, and it will reflect all of the changes that have been made to the document by any of the users.
-
The Paxos algorithm is a complex and subtle algorithm, and this is just a very high-level overview of how it might be used in this context. There are many different variations of the Paxos algorithm, and the specific details of how it is implemented will depend on the specific requirements and constraints of the system.
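The sketch below covers only the last step described above: once every peer holds the same agreed sequence of operations, replaying it in order yields the same document everywhere. It does not implement Paxos. The agreed log is simply assumed to be the outcome of the consensus round, and AgreedEdit and DocumentReplica are names I made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical edit: replace the characters in [start, end) with `text`.
final class AgreedEdit {
    final int start;
    final int end;
    final String text;

    AgreedEdit(int start, int end, String text) {
        this.start = start;
        this.end = end;
        this.text = text;
    }
}

// One peer's local copy of the document.
class DocumentReplica {
    private final StringBuilder text;

    DocumentReplica(String initialText) {
        this.text = new StringBuilder(initialText);
    }

    // Apply the agreed operations strictly in log order.
    void applyAll(List<AgreedEdit> agreedLog) {
        for (AgreedEdit edit : agreedLog) {
            text.replace(edit.start, edit.end, edit.text);
        }
    }

    String getText() {
        return text.toString();
    }
}

class AgreedLogDemo {
    public static void main(String[] args) {
        // Assumed output of the consensus protocol: one ordering shared by all peers.
        List<AgreedEdit> agreedLog = Arrays.asList(
                new AgreedEdit(0, 0, "a "),   // "cat"   -> "a cat"
                new AgreedEdit(2, 5, "dog")); // "a cat" -> "a dog"

        DocumentReplica first = new DocumentReplica("cat");
        DocumentReplica second = new DocumentReplica("cat");
        first.applyAll(agreedLog);
        second.applyAll(agreedLog);

        System.out.println(first.getText());  // prints "a dog"
        System.out.println(second.getText()); // prints "a dog"
    }
}
```

The hard part, which this sketch skips entirely, is getting every peer to agree on that single ordering in the first place.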
-
-