
How WebAssembly Could Streamline Cloud Native Computing


As a small, fast, secure, cross-platform bytecode, WebAssembly should be ideal for cloud native patterns. Docker founder Solomon Hykes famously tweeted that WASM and WASI (the WebAssembly System Interface for safe access to system APIs) would have removed the need to create the container technology.

But since WASM gives you a binary rather than entire environment, unless you move the entire environment into the WebAssembly module, it’s not a replacement for containers. It’s a way of running arbitrary code where you want it, with the least possible overhead and the fewest (known) security issues.

That’s ideal for embedding third-party code where Kubernetes workloads need it. For example, the Envoy proxy (often used for network routing and load balancing in Kubernetes) already uses WebAssembly as a way to write filters, Open Policy Agent policy rules can be written in WASM and Kubewarden uses WebAssembly modules for admissions policies.

In other words, stop saying “WASM is the end of containers;” start saying “WASM plus containers.”

Kubernetes Helps WebAssembly

If you think of cloud native processes as something you create from an API when you need them at scale, scale up as you need more capacity and remove them when you don’t, WASM is an excellent way to run those processes. But you still need an environment to run them in. Cosmonic CEO Liam Randall compares WebAssembly to a virtual, and potentially universal, CPU because it acts as a universal compilation target but notes “there’s no virtual operating system that sits in there.”

And unless they’re embedded in another workload like Envoy filters or used as serverless functions in an environment like Cloudflare Workers, you need an orchestrator to coordinate them. “Scheduling and orchestration is still an open area for server-side WASM,” noted Renee Shah, a principal at the venture capital firm Amplify Partners.

There aren’t yet WASM-specific scheduling and orchestration systems, and it may not make sense to develop those from scratch rather than integrating with existing options. Kubernetes (K8s) certainly isn’t the only choice for orchestrating cloud processes, but it’s so well known and widely adopted that it’s logical to think about using it with WebAssembly.

With Krustlet, which is now a Sandbox project of the Cloud Native Computing Foundation, Kubernetes can natively orchestrate WebAssembly runtimes. Krustlets are Kubernetes Kubelets written in Rust. When the Kubernetes API schedules pods onto a Krustlet, Krustlet runs them in a WASM runtime; WebAssembly and container workloads running in the same cluster can communicate with each other and interact with the rest of the cluster using features like secrets, volume mounts and init containers.
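
To make that concrete, here is a minimal sketch of how a pod might be steered onto a Krustlet node, based on the patterns used in the Krustlet demos; the image reference is hypothetical and the exact taint keys may differ between Krustlet versions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-wasm
spec:
  containers:
    - name: hello-wasm
      # A WASM module published to an OCI registry (hypothetical reference)
      image: registry.example.com/hello-wasm:v1
  # Steer the pod onto the node that Krustlet registers as wasm32-wasi
  nodeSelector:
    kubernetes.io/arch: wasm32-wasi
  # Tolerate the taints Krustlet applies so container-only pods stay away
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: wasm32-wasi
      effect: NoExecute
    - key: kubernetes.io/arch
      operator: Equal
      value: wasm32-wasi
      effect: NoSchedule
```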

Matt Butcher, head of the Microsoft team building container and WebAssembly technologies like Helm, Draft and Krustlet, told The New Stack that Kubernetes is a “hidden gem” for WebAssembly because it brings not just scheduling but other key services.


Kubernetes “also brings with it the storage layer, which turns out to be a very big deal; you want to be able to move your workload around, you want to make sure that it can attach to the right storage wherever it lands, and that would have been a monster project to write on our own,” Butcher said. “Kubernetes is mature in that area. Secrets management, the networking layer; it’s like getting a bunch of freebies. Here’s a problem I don’t have to worry about, because Kubernetes will do it. That was one of the things that make Krustlet such a great experience starting up, because we could go from very basic to pretty interesting configurations because we only had to hook into Kubernetes instead of writing our own implementations.”

WebAssembly can take advantage of other container constructs too; with WAGI, the WebAssembly Gateway Interface as a workaround for building HTTP handlers that compile to WASM, you can reference and pull modules from OCI registries. (Kubewarden and Krustlet also take advantage of OCI artifacts to distribute WASM modules through OCI registries).

“The nice part about WebAssembly and containers is that they both play on the same trend of breaking up monoliths,” Shah pointed out. “If a company already uses microservices, it’s relatively straightforward to add new ones powered by WebAssembly.” And with Krustlet, they can run side by side in Kubernetes.

Better, Faster, Smaller: WASM Helps Kubernetes

There are plenty of areas where WASM will be useful and Kubernetes won’t be relevant, although another orchestrator may be needed for some of them. But there are also Kubernetes scenarios that WASM will improve significantly, so WebAssembly may become a common Kubernetes workload — again, without replacing containers.

WebAssembly isn’t about compiling an entire app into a module instead of a container; you’re creating a binary, and that fits in very well with the cloud native principle of having code that does one thing and does it well. WASM modules running in Kubernetes can take on some tasks for which containers have been the only option but aren’t well suited, but in most cases these WASM “pods” are going to be doing new work that wasn’t possible before, and maybe in new places.

What WASM in Kubernetes will be particularly good for is scenarios that need high density, fast availability and scaling, or to run with limited resources, Butcher told The New Stack. It’s also a lot of work to correctly secure Kubernetes and the default runtime configuration of containers. Containers aren’t a security boundary; WebAssembly does a lot more to prevent code escapes. The security of the WASM sandbox and the “denied by default” way WASM modules require explicit permissions for any capabilities have obvious advantages here.

The startup time for a WebAssembly runtime is in the 10 milliseconds range, compared to several seconds for a container (or several minutes for a virtual machine), which makes it a good match for serverless — but also for scaling.

“When you’re doing the kind of thing where you want to take a workload and say ‘OK, bring me up to five instances, well that’s not enough, bring me up 15 instances,’ you get instant scale and very high throughput,” Butcher said. That’s also useful for scale to zero (where, despite the name, you typically never scale below one instance because of the latency for a cold start), he notes. “It costs money to have a server that you have sitting there waiting for somebody to use it. Can we make it appear to be always available, but only because it’s so fast to start up that the end user is blissfully unaware that the process wasn’t running until they requested it?”

There’s a lot of emphasis on making container sizes smaller, but they’re often too large for edge environments with constrained memory and storage or minimal processing power. Running a 2MB WebAssembly module instead of a 25MB Docker container means workloads can be managed by Kubernetes in more environments (perhaps using minimized versions of K8s, like K3s or Kind).

That can significantly improve workload density, which might reduce cloud bills, Butcher noted: “There’s an old joke that cloud services are really just paying somebody else’s electrical bill, and density improvements mean lower cost.”

But the far more heterogeneous compute environments of edge are perhaps the biggest opportunity for the combination of WASM and Kubernetes, which could mean that “the cloud” moves beyond hyperscale data centers.

“We’ve been thinking about the cluster as a thing that lives in the data center, and we’ve deployed them as things that live in a data center,” Butcher said. That might be AKS on Azure or Kubernetes on bare metal servers, but it’s still a data center. “What Krustlet solves for us that we weren’t able to solve with Kubernetes and containers was that we needed to be able to extend Kubernetes clusters outside of just the data center and to devices that were getting a little more exotic,” he said.

Supporting Arm silicon was a big part of that, but so are the microcontrollers used in IoT devices, the many hardware AI acceleration options and the heterogeneous CPUs that even Intel is now producing. “We’ve got exotic architectures, we’ve got new emerging architectures, and we want a cross-platform, cross-architecture story and we want a cross-operating system story,” Butcher said.

Randall, of Cosmonic, echoed this idea. “The diversity of CPUs has made more obvious what limitations containers bring to the ecosystem,” he said. “In a CI/CD world, if you have to target dozens or hundreds of different CPUs, you need dozens or hundreds of specific CPU images.” Plus, he noted, “a lot of small devices will never run Linux.”

“What WebAssembly brings is we let go of our assumption that we’re building for a single CPU, because WebAssembly is simply a compilation target that we can bring our existing code and applications to.”

Edge — but Experimental

Because modules can run on many different devices without needing to be recompiled, Butcher suggested WebAssembly offers the “write once, run anywhere” promise familiar from Java but “in a form that’s really amenable to cloud native development.”

That could be part of a very diverse Kubernetes environment thanks to projects like Akri, which attaches IoT and other devices (from an IP camera to a field-programmable gate array) to a Kubernetes cluster as resources. “Wouldn’t it be awesome if we could make the cluster expand outward from just the data center into different devices elsewhere?” Butcher asked. “Wouldn’t it be cool if I had a cluster that lived on my phone and my laptop and whatever devices were closest to me, and as I got into proximity, they’d join and leave the cluster?”

Recently, the Akri and Krustlet teams have been experimenting with compiling Akri to WASM so it can be deployed into a Kubernetes cluster using Krustlet, which makes it even smaller and faster for the edge.

However, that depends on WASI to deliver the virtual operating system primitives that WASM lacks, and WASI is still emerging and can take a lot of work to make useful in key areas like networking. “The WASI specification still doesn’t have guest-side network access figured out; it’s always a matter of piping everything through the host runtime. It doesn’t have some of the basic library stuff figured out,” Butcher noted.

Fintan Ryan, a senior analyst at the analyst firm Gartner, told The New Stack, “WASI needs to move beyond an experimental stage for any significant adoption.”  And while they are being developed, those robust APIs are still six or 12 months away.

When they do arrive, more tools will need to be built and more integration work done between the WebAssembly and Kubernetes ecosystem, Butcher said. “What will it look like when our WebAssembly modules can open up a thousand server sockets and listen on those? How will Kubernetes wire it up so it’s got a thousand routes going into the WebAssembly module?”

In the meantime, he suggested, “WAGI and Krustlet and things like that give us a good way to get people in the door and looking at the technology today, hopefully getting excited about it and having their own ideas, and starting up their own GitHub repositories and startups and working groups.”

One of the ways he hopes to get people excited may provoke some strong reactions: a WAGI provider for Krustlet that, he says, “will give you a quick, easy developer experience to build applications in — I kid you not, 1996 style CGI — because the technology is rock solid and everybody in the universe has support for this!

“It’s a step we can take today to get this technology in front of people so they can go build small applications of microservices on it.”

The Possibilities of WASM plus Kubernetes

WASM could make the developer experience with Kubernetes as easy as Docker and containers, Butcher hopes: “I want a nice quick workflow where I can write an app and, somewhere in the background, it gets deployed out to somewhere like a Kubernetes cluster running Krustlet.”

Building containers may be easy, but deploying, testing and sharing the test environment with colleagues is more fraught, he said. “That is still a little bit slow and it has some rough edges, and we’re hoping that WebAssembly might be able to smooth over some of those as well.”

Cosmonic’s wasmCloud, now a CNCF Sandbox project, also aims to improve the developer experience and relies on Krustlet for some scenarios, although with its own runtime rather than the Bytecode Alliance’s Wasmtime runtime.

“The combination of Kubernetes and WebAssembly enables [scenarios where] you want to take wasmCloud and put it into a set of Kubernetes servers on multiple clouds and have them talk to each other,” Randall said. One customer, a large European bank, is experimenting with using it to fail over from one cloud Kubernetes service to another, for availability when one of its cloud providers has an outage.

There are also some fundamental patterns in microservices that future features in WebAssembly might be able to improve on.

“Your typical microservice involves every developer standing up their own HTTP server and writing their own JSON-to-object serialization,” Butcher pointed out. That adds up to around a thousand lines of boilerplate code in every microservice that has to be written, tested, secured and updated. The WebAssembly nanoprocess model puts the HTTP layer and serialization/deserialization in the WASM runtime.

“When two modules happen to be sitting next to each other on the same runtime, they just communicate directly,” he said. “But when they get spread across the network, then the host runtime takes care of the traffic between the two and the developer doesn’t have to maintain it. And the security model is great because you don’t have to update every piece of code every time OpenSSL has a bug.”

That’s still some time off. “We’re probably about a year out from being able to really even start solving that problem,” Butcher noted. “If we solve it, we’ll be able to build an entirely new class of applications.”

That’s still an “if” with a lot of moving pieces. “Nanoprocess is seen as even more experimental than WASI,” Ryan cautioned.

“The promise is huge, the amount of work to get there is also huge.”

Not all of the promise of WebAssembly and Kubernetes is that distant though. Another heterogeneous, resource-constrained area where Kubernetes is increasingly important, especially with tools like Kubeflow, and where WebAssembly could deliver significant gains: machine learning.

WASI doesn’t just deliver low-level OS interfaces like storage and networking; it can also offer high-level interfaces for something like a neural network. Trained machine learning models have to run on a wide range of devices with platform-specific runtimes and use different specialist hardware accelerators. Because it’s platform-independent, WASM would simplify deployment and the WASI-nn interface will let WebAssembly talk to multiple machine learning runtimes like ONNX, OpenVINO, PyTorch and TensorFlow.

“Machine learning is another area where WebAssembly might be safe enough, but low enough on the stack that it’ll be a little bit easier to integrate with the hardware than containers have been,” Butcher suggested.

“We’re seeing WebAssembly as a glue layer for some of the niggly parts that were hard to do with containers, that we might be able to do more easily with WebAssembly, and then join up the container ecosystem with some of the harder bits of, say, hardware management or deploying machine learning algorithms.

“I think that’s where you start to really see the story of how containers and WebAssembly are complementary to each other.”

 



How Do Authentication and Authorization Differ?


Authentication and authorization sound similar. They’re often mentioned together, they’re often both implemented with tokens, and the terms are sometimes used almost interchangeably — or as a portmanteau that suggests they’re the same thing.

Frequently, you’ll just hear the term “auth” and have to work out which of the two is meant: “modern auth” refers to authentication, but OAuth 2 is an authorization standard. And OpenID Connect, which extends OAuth, adds authentication as a layer on top of authorization.

They’re always related, but authentication and authorization are two different concepts that need to be separate steps in an access policy and may well be managed by different teams using different tools.

Authentication is verifying that a user is who they say they are: authorization is giving them permission to access a resource or perform a specific function.

The principles will seem familiar, suggested Alex Weinert, director of identity security at Microsoft. “If you’re walking into my store as a customer, you show your ID to say you’re 21 and you can buy alcohol. If you’re the manager, you can get into the back stockroom. If you come to the Azure portal and your claim says you’re an average user, you’re not going to be able to do very much, but if you come to the Azure portal and your claim says you’re a global administrator, you can do quite a lot.”

Typically authorization happens right after authentication, said Mike Hanley, GitHub’s chief security officer: “You are authorizing a user who you presumably figured out that they are who they say they are, via an authentication process.”

What you are allowed to do is based on who you are, but different systems and (usually) different teams handle the two steps of authentication and authorization. Authentication is the province of the identity provider, verifying those credentials using passwords, biometrics, one-time PINs, hardware keys or authentication applications, so the system knows it’s talking to the same person the identity was issued to — and then associating data, like their name and role, with them.

Authorization is something the security or application administrator handles and it’s based on what permissions are available for the system being accessed (either directly or mapped to the roles and attributes in the identity system used for authentication).
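
As a rough illustration of that division of labor, the sketch below keeps authentication (the identity provider verifying who you are) separate from authorization (an admin-owned mapping of roles to permissions). It is written in TypeScript and every name in it is hypothetical.

```typescript
// A minimal sketch of the authN/authZ split; all names here are invented.
type Identity = { userId: string; roles: string[] };

// Authentication: the identity provider's job is to verify credentials
// and hand back a trusted identity (or nothing).
function authenticate(
  token: string,
  verify: (t: string) => Identity | null
): Identity {
  const identity = verify(token); // e.g. validate a signed token with the identity provider
  if (!identity) {
    throw new Error("authentication failed: unknown or invalid credentials");
  }
  return identity;
}

// Authorization: the application or security admin owns this mapping
// of roles to the permissions the system actually understands.
const permissionsByRole: Record<string, string[]> = {
  manager: ["read:store", "enter:stockroom"],
  customer: ["read:store"],
};

function authorize(identity: Identity, permission: string): boolean {
  return identity.roles.some(
    (role) => (permissionsByRole[role] ?? []).includes(permission)
  );
}

// Typical flow: authenticate first, then authorize the specific action.
// const who = authenticate(requestToken, verifyWithIdentityProvider);
// if (!authorize(who, "enter:stockroom")) { /* deny */ }
```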

The user getting authenticated might be a machine rather than a human, using secrets and other credentials (usually X509 certificates that need to be provisioned securely) rather than a username, password or token, with the authenticated machine or workload being authorized to talk to another machine, or read and write data in the test but not the production environment.

Just as you don’t want users writing passwords down on sticky notes they keep on their keyboard, policies, secret scanning and credential rotation are important to avoid mistakes like checking credentials into repos, where they can be found and abused.

Stepping Up Authentication

As a user, you can usually see authentication happening (although it might be persistent, like staying logged into a website even if you close the browser tab) and you can often do things like changing your password or choosing which second factor you want to use. Users can’t change their authorization options and won’t see authorization happening.

But you might see another authentication request if you try to do something that’s considered important enough that your identity has to be verified again before you are authorized to do it. Some banks will let you log in to your account and make payments you’ve done previously with your username and password, but ask you to use 2FA to set up a new payee.

Conversely, authentication systems that use conditional access policies can recognize that you’re using the same device, IP address, location and network connection to access the same file share you access every day, and skip the authentication challenge so you stay productive.

“We have enough data about users who we’ve authenticated, the devices that they’re coming from, the workloads or applications that they’re trying to access, the network that they’re joining from that we can have systems reason about, ‘We’re pretty sure this is still you, so we’re going to allow this to happen without an additional step-up,’” Hanley said.

“You may not care if somebody accesses the payroll system from home to get their pay stub on their Windows 95 PC, but you’re definitely going to care if somebody is accessing the corporate finance system from anything other than likely a corporate-managed, fully patched and up-to-date machine that you own and operate.”

— Mike Hanley, chief security officer, GitHub

These are decisions about risk tolerance, he said: “We might say this application is so high risk or so high value that this workflow merits the friction, and I’ll dial up the friction only on this workflow to make sure that nothing bad ever happens.”

Authentication and authorization remain separate steps but with the increasing use of conditional access and Zero Trust approaches to identity, they are often entwined.

“One of the things that can play into an authorization decision is how strong your confidence is in the authentication,” Hanley said. “If you’re only username and password-authenticated, I might allow you to get to the lowest risk tier of applications, but I might step you up with a multifactor challenge or require you to [use] multifactor for any sort of medium-tier sensitivity. And then if you’re doing anything in the highest tier of sensitivity, I’m going to be constantly introducing friction to be sure that it is still you.”

That might include refusing authorization even when the authentication succeeds but reveals things about the connection that you’re uncomfortable with, Hanley noted. “I’ve authenticated Mike and Mike used his two-factor authentication, so I’m really sure that it’s Mike, but it’s coming from a network that I don’t recognize and a PC that I don’t trust because it’s not running the latest security updates, so the authorization decision changes.

“You may not care if somebody accesses the payroll system from home to get their pay stub on their Windows 95 PC, but you’re definitely going to care if somebody is accessing the corporate finance system from anything other than likely a corporate-managed, fully patched and up-to-date machine that you own and operate.”

Setting Higher Guardrails

Admins have the option of setting much richer policies to access high-value systems, whether that’s company finance or an important code repo. GitHub now insists you use tokens rather than passwords when authenticating Git operations, and those tokens can be set for very specific access. That’s similar to the Just Enough Admin authorization option for systems managed by PowerShell, where you can restrict what commands are allowed and even set a specific time period.

As Hanley explained, “Rather than sending somebody off with a bazooka when they don’t need it, you can give them a butter knife, and say come back to us when you actually need the bazooka and we’ll give it to you for a specific reason and a specific amount of time.”

Think of these approaches as guardrails, he suggested. “This instills a sense of operational safety and security because you’re less likely to make a human error that results in a bad situation from exercise of that excessive privilege. But it also protects you from the case that, even if you’ve strongly authenticated some user, it is still possible that their account could have been hijacked or somebody stole their session or somebody’s wrenched the laptop out of their hand.”

Continuous access evaluation catches cases like a user who is authenticated and authorized to use a particular application because they work in the finance division but then moves to another group or even leaves the company and still has access as long as their session token is valid. You want to revoke access immediately and again, Weinert pointed out, that needs coordination between authentication and authorization.

You may also need to consider (and monitor) application behavior, although app governance is also separate from authentication and authorization. Does an application have too many permissions that allow access to too much information that it doesn’t need and can be abused by attackers?

A ‘Blunt Force Tool’

Application roles and permissions are why authorization is a hard problem that sometimes gets punted over to the authentication system. Security teams don’t know the details of every application in every department or the nuances of what different roles need to be entitled to do.

As Weinert asked, “Can you as an admin really manage the intersection of all possible roles of all possible apps in your environment?”

Particularly if you’re using Single Sign-On to federate access to older systems that don’t have modern authentication built-in, those systems you’re being authorized to use may have user and admin roles, or even very specific roles like “wire transfer administrator” or “shelf stocker” versus “checkout specialist.”

But they often don’t have a granular approach to permissions either, Weinert noted. In many systems, he said, authentication is essentially the authorization for certain roles. The rule is that if you’re allowed to be introduced to something, you will be allowed to use it: “If you even got here, I’m going to assume that you have some business being here.”

Weinert suggests thinking of the different things you’re authorized to do — anything from accessing a database to sending an email or configuring a virtual machine — as verbs. “Everybody has different verbs and different roles and different ways of exposing those, so authentication becomes the blunt force tool that admins use to regulate over the top,” he said.

“In the world of authentication, if I don’t generate an access token for you, I’m performing a form of authorization.”

—Alex Weinert, director of identity security, Microsoft

The developer may not have planned it that way, but if an application deals with financial data, a security admin might decide to only allow authorization if authentication shows the user’s device meets the most stringent access policy. “In the world of authentication, if I don’t generate an access token for you, I’m performing a form of authorization,” Weinert said.

Reducing the complexity of this will require coming up with some common tasks that apps authorize users for, like reading and writing sensitive data, which Weinert suggests could work alongside a provisioning standard like SCIM, the System for Cross-Domain Identity Management that handles common attributes like names.

“You’ll see more of a shift from this coarse-grained access control that happens in the authentication step to a finer-grained authorization that can happen as we normalize some of that experience,” Weinert said.

OAuth (and OpenID Connect) are complicated, and better authorization tools are only slowly emerging.

MSAL, Microsoft’s multiplatform authentication library, avoids the need to work directly with OAuth for authorization, as well as extending OpenID Connect with options like conditional access.

Google’s Zanzibar API (open sourced as Keto) and the open source Oso library (which handles rules written in the Polar policy language) take a different approach, aiming to help decouple authorization from authentication and make it easier to build exactly the authorization you need.

The Advantages of Authentication Services

The authentication side is much better served, which is helping the move away from passwords. Historically, checking the user is who they say they are has been a single step: if I have the username and password, I must be the user.

But those days are drawing to a close.

“With phishing and credential stuffing and all the various attacks that feed into taking over accounts, that’s generally insufficient,” Hanley warned. “Passwords have all these deficiencies; they tend to be things that people write down, they tend to be things that people reuse. We give people complex password rules, which makes it difficult for them to adopt them as a primary factor, and that means they end up spilling or being abused or found in other places when they shouldn’t be.”

The FIDO2 and WebAuthn standards make it easier to move to multi-factor authentication that uses biometrics, authentication apps or hardware tokens because they’re built into common operating systems and browsers. Biometrics like Windows Hello cameras and Touch ID on Apple devices make this far more convenient for users when apps and websites use those for strong authentication.

There are some situations, like air-gapped networks, where you will have to create your own identity and authentication system. But when you can, picking one of the cloud identity provider services  (like Auth0, Azure AD B2B and B2C, Google Workspace, Okta, Twilio’s Authy and others), gives you a wider choice of authentication methods, as well as their expertise in the systems that underlie even something as common as SMS 2FA.

“We know supply chain security is under attack today, and there are a variety of interesting and exotic attacks out there. We need to protect the developers and the communities that are doing that work.”

—Mike Hanley, chief security officer, GitHub

“You have to manage this complex relationship with telcos,” said Catie Kolander, senior product marketing manager with Twilio’s account security business unit. “You may have to manage a pool of numbers to be able to send the one-time passcodes. Do you have access to a backup route in the event that delivery fails?

“You’re going to start seeing an increase in support costs when they say ‘I didn’t get my authentication code,’ and they send that message seven times.”

For internal users, organizations can mandate authentication methods; if you’re dealing with customers you may have to be more flexible, bearing in mind who will be authenticated, on what device and what they’re comfortable with, Kolander pointed out. “If you only offer something like SMS, are you excluding a large user base that doesn’t have a mobile phone? Do you need to have something like a voice call or email?”

That’s not the most secure option for authentication, but it may be necessary for some audiences. For others, you may need over-the-top methods like WhatsApp, voice calls, Time-Based One-Time Passcodes (commonly known as TOTP) through authenticator apps, or push authentication inside your mobile app — which takes more development work but is the only option that lets users explicitly deny access.

Identity services also simplify getting extra information about the connection beyond just the IP and device information to use in authentication decisions. As Kolander told us, “If it’s a phone number that’s coming from a country that you shouldn’t be seeing be a connection from, you have the ability to block that or provide extra layers of authentication.”

These levels of checks and challenges do mean more work for admins and users, but it’s important, especially for developers. “We are more than intentionally not just nudging but really pushing toward what we think good looks like,” Hanley said.

“We know supply chain security is under attack today, and there are a variety of interesting and exotic attacks out there. We need to protect the developers and the communities that are doing that work.

“We have to be a secure platform, but part of it is also pushing that account security out to the edge so that we can enable them to have better control over these user accounts, which are authorized to [access] and part of some of the biggest and most important open source projects that power all of the devices we use and software we interact with each and every day.”


JavaScript Forecast: What’s Ahead for ECMAScript 2022?


JavaScript continues to evolve, and the next annual update to ECMAScript, which formally standardizes the JavaScript language, will be approved in July 2022. That will include all the new feature proposals that have reached stage four by March 2022 — proposals that have been signed off by the ECMAScript editors, passed the test suite and shipped in at least two implementations.

A handful of proposals are already at that stage. And you can get an idea of what other features may make it into ECMAScript 2022 by looking at which have reached stage three, where the spec has been signed off and the tests have been passed. But these proposals need to be tried out in an implementation to see how well they actually work in practice.

For instance, the proposal to be able to work backward through an array starting with the last element the way you can in Python is currently implemented as a polyfill, in an attempt to discover if it clashes with any commonly used JavaScript frameworks.
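
That proposal appears to be the relative-indexing method Array.prototype.at(), which accepts negative indices that count back from the end of an array; a quick sketch:

```js
const releases = ["es2019", "es2020", "es2021"];

releases[releases.length - 1]; // "es2021" (the workaround needed today)
releases.at(-1);               // "es2021" (negative indices count from the end)
```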

We asked two of the co-chairs of the TC39 committee that standardizes ECMAScript, Rob Palmer (head of Bloomberg’s JavaScript infrastructure and tooling team) and Brian Terlson (principal architect on the Azure SDK for TypeScript and JavaScript, and former editor of the spec) to pick out some of the most significant proposals — including major improvements to working with dates.

Top-Level Await

JavaScript has had asynchronous functions since ECMAScript 2017, but developers have to explicitly declare a function to be asynchronous.

“If you’re running a module initializer — the top level of your code that runs when your module begins — that hasn’t always been regarded as asynchronous,” Palmer said.

Using the await keyword to wait on an asynchronous process at that top level might result in a syntax error if other code tries to read results that haven’t yet been returned, and attempts to avoid that by wrapping the code in an async main function make code more complicated and harder to test or perform static analysis on.

With Top-Level Await (which has reached stage four and is already implemented in the three main browser JavaScript engines), the module system coordinates all the async promises for you.

Top-Level Await is important for integrating WebAssembly modules with JavaScript.  This will also help developers who share samples and code snippets, said Terlson: “I don’t need to put the async function name around them to make them copy/paste-able.”
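
For example, a module can now await directly in its initializer; the URL below is purely illustrative.

```js
// config.mjs
// With Top-Level Await, the module system waits for these promises
// before running any module that imports from this one.
const response = await fetch("https://example.com/config.json");
export const config = await response.json();

// Elsewhere: import { config } from "./config.mjs";
```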

It’s taken a while for Top-Level Await to become part of ECMAScript because of concerns that it would make it more likely that developers would create deadlocks in their code.

“When we allow asynchronous operations to be put in there, people can write arbitrarily long-running operations — like querying a database, making a network request,” Palmer warned.

“If you are not careful with this feature, you could delay your entire app loading. You can’t even run the first main function because you’re now waiting on some arbitrary operation to complete.”

But calling async code is very common in browsers, he added, because blocking code, which makes everything else wait for one piece of code to finish running, can cause performance issues.

With Top-Level Await, ECMAScript is diverging from Node.js, which uses CommonJS modules, which will remain synchronous for backward compatibility. Developers won’t be able to load ECMAScript modules synchronously in a CommonJS file; they should use the dynamic import operator, which returns not the module but a promise for the module, loading it asynchronously.

Finally, Private Fields

By default, the properties in JavaScript classes are public and can be called from anywhere in your code. But sometimes that exposes some fields that you don’t want to have accessed (and possibly changed) from elsewhere in the code: you might want to have a stable interface in a library but be able to change the code that implements the class over time, which means making sure developers don’t take a dependency on any specific implementation detail.

So far, the convention has been to mark private fields by using an underscore at the beginning of the property name, but the JavaScript language hasn’t done anything to block access from outside the class. With private class fields, using # to start the field name means it can only be accessed directly inside the body of the class in which it’s declared.
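
A small example of the new syntax, including the related proposal for checking whether a private field exists:

```js
class Counter {
  #count = 0;              // private instance field
  static #instances = 0;   // private static field

  constructor() {
    Counter.#instances++;
  }

  increment() {
    return ++this.#count;
  }

  static isCounter(value) {
    return #count in value; // brand check: does this object carry our private field?
  }
}

const c = new Counter();
c.increment();   // 1
// c.#count;     // SyntaxError: #count is only accessible inside Counter
```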

Discussions about implementing private fields have gone on for a number of years (and they’re already available in Node.js). The suite of proposals that make up the new class fields is ready to be part of ECMAScript 2022, with a mix of proposals for public, private and static fields, methods and accessors, including a way to check if a private field exists.

All the browsers are now implementing key pieces. Being able to declare instance and static fields and make them private for data encapsulation has already been implemented in Chrome and Safari.

That unlocked the key proposals going to stage four earlier this year, according to Palmer.

Temporal, for Date-Related JavaScript

Temporal, which Terlson refers to as “the replacement for our broken Date object,” reached stage three earlier this year. It’s set to replace libraries like Moment.js, which have done a good job filling a gap in JavaScript that’s so common that it makes more sense to have the functionality as part of the language.

Temporal is a global object that will be a top-level namespace for the new date and time API, covering the full range of date, time, time zones, calendars and even public holidays.

Whether you need the current time, a Unix timestamp, the day of the week a date falls on in a particular year, how many days till Christmas, whether a business that lists opening hours is open or closing in the next few minutes, an easy way to sort dates and times or something more complex involving multiple time zones, the Temporal API covers it.
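
A few of those tasks, sketched against the stage three API (method names could still shift before the proposal is finalized):

```js
const today = Temporal.Now.plainDateISO();       // today's date in the ISO calendar
const christmas = Temporal.PlainDate.from({ year: today.year, month: 12, day: 25 });

today.dayOfWeek;                      // 1 (Monday) through 7 (Sunday)
today.until(christmas).days;          // how many days till Christmas
Temporal.Now.instant().epochSeconds;  // a Unix timestamp
```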

“Temporal is going to completely change the way we write date-related JavaScript code, [by] being able to actually do date math without falling into myriad different pitfalls,” Terlson said. “It has good ways to represent a calendar date or a time that might exist on a clock: these are things that we just have no way to do in JavaScript right now, it always has to be attached to a time zone and a particular UTC instance, which causes no end of problems.”

Going from a single Date object to multiple options in Temporal may feel like more work, but Terlson suggested it will be easier to use in practice because it’s more rigorous.

“The Date object that we have now only feels easy because you don’t realize you’re shooting yourself in the foot half a dozen times,” he said. “The new date system forces you to think about that complexity upfront, and so you’re just frontloading your bug-finding efforts.

Temporal is set to replace libraries like Moment.js, which have done a good job filling a gap in JavaScript that’s so common that it makes more sense to have the functionality as part of the language.

“Dates are one of those things where it’s hard to test; you can’t just change the system date easily in the middle of unit tests,” he added, “The right way to do it is to mock it — but a lot of people just don’t touch their date code so they don’t find [the problems].”

In addition, Terlson said, “you almost need to be a data expert to know what the interesting test cases are.” Not every developer would think to test for leap year behavior or whether their code will run correctly when a leap second is declared, he suggested. “Or a mobile app when the user changes time zone — is it going to fall over?”

Palmer calls Temporal “in some ways the least controversial proposal” he’s seen. Everyone wants to see it progress, he noted —  but because it’s such a large API, fully implementing and testing it and taking feedback may take a little time.

However, production-quality polyfills are already in development, and there’s a sandbox in the Temporal documentation powered by an earlier polyfill, so developers can try it out in their browser developer tools.

He also sees strong developer demand and expects Temporal to start getting used quickly, so developers can remove Moment.js from their applications, “making everyone’s code lighter and faster to download.” But the sheer size of the API means testing will take time.

“The reason these libraries are so big and complex is not accidental complexity,” Palmer said. “It is because the rules in our world for dates and times are so complex that you want to offload as much as you can to the experts that have thought about this and embedded all that knowledge in the API.”

Moving Temporal to stage four will also have to wait until the Internet Engineering Task Force (IETF) work to standardize the string formats used for calendar and time zone annotations is complete; that’s expected for 2022.

Chaining Error Cause

Also at stage three is a proposal Terlson views as a companion to the aggregate error that was standardized with Promise.any in ECMAScript 2021: Error Cause, which is already implemented in Chrome, Firefox, Safari and Node.js.

“Aggregate error is used for cases when you have different sources of errors: where you’re doing different processes and you get a bunch of errors that are only related by the fact that they are kicked off by the same activity,” Terlson said.

The Error Cause proposal is a way to link errors that cause other errors, like making a network request as part of a bigger operation that will use that network request. Often developers will just throw an error if the network request fails, which doesn’t provide any context for which operation failed because of the network request failure.

With Error Cause, developers can throw an error including an error code with a cause property that records how many requests failed, each of which has its own error with more information, and those can chain more errors in turn.

“It lets you catch an error and throw an error with more context, while preserving the original error,” Terlson said.
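
In practice that looks something like the sketch below; the endpoint is invented for illustration.

```js
async function loadOrders(customerId) {
  try {
    const response = await fetch(`https://example.com/orders/${customerId}`);
    return await response.json();
  } catch (err) {
    // Add context for the bigger operation while keeping the
    // original network failure reachable via error.cause.
    throw new Error(`Failed to load orders for customer ${customerId}`, { cause: err });
  }
}
```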

There are also some interesting proposals that have only reached stage two — where there’s a full specification and it’s likely the proposal will continue to be developed and become part of ECMAScript, but there’s not enough agreement to sign off a complete specification that’s ready to be tested in implementations.

Record and Tuple for Better Matching

User interface developers will be interested in the stage two Record and Tuple proposal that provides first-class support for the immutable data structures often used in interface development. Today that’s done with libraries like Immutable.js and Immer, but having this in the JavaScript language will make the new Record and Tuple data structures easier to debug.

“Record is similar to a frozen object and Tuple is similar to a frozen array,” Palmer said. Because they’re primitives rather than objects (which have identity) and they’re compared by value, you can compare two separate objects that contain the same things – like a person’s name – and they’ll be recognized as being the same, even though they’re in different objects, each of which has its own identity.

If you’re using that name as the key to a map, you want to be able to do a lookup in some other part of your code, where you generate a fresh record but want to find out if that person’s name is in the map. With Record and Tuple, you’ll get the match instead of it failing because the fresh record is a brand new identity object.
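
A sketch of how that lookup behaves under the proposal (the #{ } and #[ ] syntax is still stage two, so details could change):

```js
const alice = #{ first: "Alice", last: "Liddell" };
const sameAlice = #{ first: "Alice", last: "Liddell" };

alice === sameAlice;           // true: records are compared by value, not identity
#[1, 2, 3] === #[1, 2, 3];     // true: so are tuples

const salaries = new Map([[alice, 100_000]]);
salaries.get(#{ first: "Alice", last: "Liddell" }); // 100000: a fresh record still matches the key
```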

“Previously, people had to do tricks to try and make maps work well when you want to use a composite key with multiple things, like serializing the contents into a string with special markers between the different fields, so that the strings would be compared by value,” Palmer said.

Reading the contents of an object out into a string is an opportunity to introduce bugs; now you can use proper language structures and use the regular triple === equality comparison.

This will be particularly useful for comparisons in JSON-style deep object trees, Palmer suggested. “You’re getting a more semantic style of equality.”

Using === can also improve performance and reduce the compute load of rendering the user interface.

“A common functional pattern, like you see in React, is where you’re examining your data properties, and seeing how they changed since the last render,” Palmer said. “You can do optimizations: if my input data has not changed, I would expect the rendering map remains identical, so you don’t need to recompute it.”

He views the syntax as less cumbersome than the existing libraries: “This provides a really terse, really clear way of dealing with immutable data structures. Also, if you’re a library writer, you now have a common currency to exchange with the rest of the world. It standardizes the way of passing around immutable data.”

Pipeline Operator Still in the Pipeline

Some other interesting proposals are less likely to make the ECMAScript 2022 timeline.

Most scripting environments have a pipeline operator that lets you take the output of one function and “pipe” it into another function (pipes are also widely used in functional programming). Without a pipe operator, you have to build a hierarchy of nested functions which can be hard to read, or create — and manage — temporary intermediate variables to store and pass results from function to function.

So far, JavaScript hasn’t had a native pipe option (although there are many libraries that add it), and adding this was one of the top four requests in the State of JS 2020 survey.

There have been many suggestions for how to do pipes in JavaScript over the years and there are currently several competing proposals following the approach taken in F# or the Hack language.

The Hack-style proposal has reached stage two and there are already plugins for Babel and a feature flag in Firefox that support the new operator, but there are also ongoing discussions about concerns over the impact on browser engine memory and performance, so the pipe operator is still definitely experimental and seems unlikely to be in ECMAScript 2022.
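
For the curious, this is roughly what the Hack-style operator looks like; the topic token (written here as %) is one of the details still being debated, so treat this as a sketch rather than final syntax.

```js
const result = " 42 "
  |> %.trim()
  |> Number.parseInt(%, 10)
  |> % * 2;   // 84

// The same logic today, with the calls nested inside out:
const nested = Number.parseInt(" 42 ".trim(), 10) * 2;
```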


How TypeScript Helps Enterprise Developers


For many enterprises, web applications written in JavaScript are a standard part of their development process, for both customer-facing and internal line of business applications. When they have large teams of developers working at scale, TypeScript — a statically typed superset of JavaScript — is increasingly proving to be a more suitable choice than plain JavaScript.

That may be one of the reasons TypeScript jobs pay better than JavaScript roles, by nearly $5,000 a year on average.

TypeScript isn’t trying to be a programming language in its own right; it’s really a combination of tooling and optional, removable types. Technically, TypeScript is a combination of a static type checker with a compiler (that can also act as a transpiler). But the fundamental idea is to improve developer productivity in the most widely used language of all — JavaScript.

When TypeScript started at Microsoft back in 2011, JavaScript lacked key features for structuring large codebases: modules, classes, interfaces and especially a static type system. TypeScript was developed to give developers the productivity they were used to in languages like C#, which has those modern language constructs (and to enable features in developer tooling that relies on them).

JavaScript is both powerful and flexible. You can hack out a prototype quickly and build server backends as well as browser or desktop apps. But JavaScript doesn’t give developers as much help with managing and understanding large codebases as other languages.

“JavaScript is one of the most ubiquitous languages. It’s everywhere, but it has one of the weakest developer ergonomics or productivity experiences,” noted Dylan Schiemann, co-founder of Dojo, one of the earliest JavaScript frameworks and an early TypeScript enthusiast. “It’s simple because it’s simple, but it’s actually hard to do the right thing at times.”

Developers like the freedom to write code and to remove that code and replace it with something else but “without interfaces you can’t really do that,” he added. “And without types you don’t have interfaces.

“TypeScript is fixing up something that is fundamentally dirty. JavaScript is kind of dirty and TypeScript makes it less dirty.”

What You Get from TypeScript

Slightly confusingly, TypeScript refers to several different things:

  • The type-checker for the TypeScript and JavaScript languages.
  • A “superset” language that adds type syntax to JavaScript.
  • The official compiler that type checks and transforms your code.
  • The editing experience that’s co-developed with the compiler and is bundled in editors like Visual Studio and Visual Studio Code.

All those are needed to add static typing to JavaScript and deliver tooling that makes that easy and convenient to use. Calling TypeScript a strongly typed superset of JavaScript might make people think it changes how JavaScript works (or changes your code), noted Daniel Rosenwasser, senior program manager for TypeScript.

He prefers calling it statically typed, “since we’re just performing some checks before you run your code. Our vision is for TypeScript to complement and improve JavaScript in whatever way it can.”

The goal of TypeScript has evolved over the years, along with the JavaScript language itself; before ECMAScript 6 kicked off a now-annual cadence of improvements to the language, TypeScript’s role as a transpiler was an important way to get access to language improvements, like arrow functions and enums.

Now the language only implements new features that have reached stage three in the TC39 standardization process, those ready to be built into JavaScript implementations to get feedback. “The place where TypeScript innovates is the type system, not the run time,” said Titian Cernicova Dragomir, a TypeScript expert and Bloomberg software engineer.

TC39 co-chair Rob Palmer explained TypeScript as syntax: “It’s type annotations, just extra characters you can add to your JavaScript, to give more meaning to it and to allow the compiler to detect more errors and offer you more and more benefits, because it understands more about what you really meant for that code.”

There’s no conflict or competition with JavaScript. Think of TypeScript more like a linting tool or a code-enforcement tool than necessarily a programming language that deviates from JavaScript, Schiemann suggested: “It’s really a set of tools that make JavaScript better that get removed.”

That’s why Rosenwasser prefers to call TypeScript “JavaScript with erasable types.” Developers mark the types in their code;  the compiler checks them, then generates clean JavaScript code that looks as if an expert human developer wrote it.
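
A minimal example of what “erasable” means in practice; the function is invented for illustration.

```typescript
// TypeScript source: the annotations describe the contract.
function formatPrice(amount: number, currency: string): string {
  return `${amount.toFixed(2)} ${currency}`;
}

formatPrice("12", "USD");
// error TS2345: Argument of type 'string' is not assignable to parameter of type 'number'.

// The emitted JavaScript is the same code with the types stripped out:
// function formatPrice(amount, currency) {
//   return `${amount.toFixed(2)} ${currency}`;
// }
```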

You can also adopt TypeScript progressively, Dragomir noted. “If you look at all the compiler settings TypeScript has, you can look at them as a dial and start that on the lowest setting and progressively increase the security and strictness as you move along.”

That means you can pick the mix of expressiveness and correctness you want in a language that remains as flexible as JavaScript — because it is JavaScript.
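
One hypothetical starting point for that dial, expressed as tsconfig.json compiler options that can be tightened over time:

```jsonc
{
  "compilerOptions": {
    "allowJs": true,         // let .js and .ts files coexist during migration
    "checkJs": false,        // leave existing JavaScript unchecked at first
    "noImplicitAny": false,  // flip to true as types are added
    "strict": false          // the top of the dial: enable when the team is ready
  }
}
```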

Modern JavaScript code is valid TypeScript code and TypeScript runs anywhere JavaScript runs, on any device with a JavaScript runtime that supports ECMAScript 3 or higher — whether that’s a complex website, a backend service or a mobile device.

Why Enterprises Need TypeScript

Especially when you have multiple developers building large-scale web apps with many thousands of lines of code, the dynamic typing in JavaScript, which means type errors only surface when the code runs, makes it far too easy to end up with messy code that’s hard to debug and maintain. Enterprises can’t afford that kind of technical debt.

For complex code, enforcing types makes it harder to introduce bugs without noticing, because they tell both the compiler and developers how the code is supposed to behave.

Getting type errors as soon as you compile code, as you do with TypeScript, so you can see the problems while you’re still writing your code rather than hours later, makes it far easier to debug a project than when those errors show up at run time.

Static type checking doesn’t just help spot type errors and operations that won’t work because of the type; it makes it easier for an integrated development environment (IDE) to find mistakes like typos in method names.

Types allow for context-aware suggestions like autocomplete, for better code navigation (by finding all the references in code or taking you to the definition) and for better automatic refactoring. “Things that used to take a whole day take a second with refactoring, and that’s just invaluable,” noted Anders Hejlsberg, a technical fellow at Microsoft and core developer of TypeScript.

Types also make it easier for humans reading the code to understand the arguments and values of a method without digging into comments and documentation (for your code and for any libraries and frameworks).


You can document code and annotate types with JSDoc, but you still have to make sure the code and documentation match up: other developers have to trust that the code was documented correctly in the first place and that any changes are reflected in the documentation.

For large teams writing a lot of code, these benefits can be a huge productivity boost. “Teams like Outlook Web and others in Office were considering compiling from languages like C# to JavaScript just so they could get modern language constructs, type checking and good tooling,” Rosenwasser told The New Stack.

“We knew that that was not the right way to get those benefits because you couldn’t ignore the underlying platform — if you want to build the best JavaScript apps, you have to start with JavaScript.”

With TypeScript, enterprises can get the fast development they picked JavaScript for, with the type-checking and tooling that delivers developer productivity.

Enterprises Already Rely on TypeScript

Enterprises almost certainly use software that uses TypeScript. Not only is TypeScript itself written in TypeScript; so are Visual Studio Code, the web version of the Excel engine, Slack Desktop, Microsoft’s new Fluid framework and the Figma design tool — technologies that are powering the next generation of collaborative apps that enterprises will rely on, noted Rosenwasser.

“It’s clear TypeScript is going to power a lot of rich apps in that domain,” he said.

Dojo and Angular 2 were notable early adopters, Schiemann told The New Stack: “We knew the problems and we knew the pain points, and we knew this had the potential to help us.”

Now almost every major frontend framework is written in TypeScript, said Rosenwasser: “At some point, the question became, who isn’t using TypeScript?”

Airbnb has more than 2 billion lines of JavaScript and 100-plus internal npm modules. It decided to make TypeScript its official language for frontend web development after looking at six months of postmortems and discovering that TypeScript could have prevented 38% of the bugs that resulted in incidents in production.

“TypeScript is fixing up something that is fundamentally dirty. JavaScript is kind of dirty and TypeScript makes it less dirty.”

—Dylan Schiemann, co-founder of Dojo

Slack had a similar experience when moving to TypeScript for writing Slack Desktop, immediately finding a surprising number of small bugs. It also found the productivity improvements so significant that the company started using it for all new code within a few days of starting to convert the existing codebase.

“TypeScript gives us a guarantee that the structural dependencies in the code are sound,” Felix Rieseberg, an engineering manager at Slack when it moved to TypeScript, pointed out in a presentation about adopting the language.

There’s certainly some overhead in adopting types, but the developer productivity for large teams with large codebases is very appealing. Being able to lean on code completion and codebase navigation, both features of TypeScript, helps a team scale out, Rosenwasser said.

“Questions like, ‘How is this function supposed to be used?’ and ‘What properties can I access here?’ might be hard questions to answer without TypeScript,” he said.

“It’s very rare in our experience to see someone move away from TypeScript because that baseline experience is hard to give up,” he added, especially the code completion feature and its ability to help users avoid typos.

Dragomir echoed these sentiments. “I feel incapable of writing pure JavaScript. It’s so restricted,” he said. “Once you’ve gotten used to it, if you try to go back to JavaScript, you feel the pain.

“It’s that you have gotten used to the idea of it doing everything for you and you suddenly feel like you’re driving without a seatbelt, and that feels painful.”

But adopting the language involves a learning curve, Rosenwasser noted, and enterprise developer teams will need to take that into account as they move to TypeScript.

“Maybe TypeScript in its entirety won’t be the best tool for every project — especially very small programs and short scripts,” he said. “So we try to provide modes like checkJs, where you can write in plain JavaScript with JSDoc type annotations and get checking from TypeScript.”
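
A rough sketch of what that mode looks like in a plain JavaScript file (the file and function names are invented; checkJs and the // @ts-check comment are the real TypeScript options being described):

// math.js — plain JavaScript, checked by TypeScript when checkJs is enabled
// (or by adding the // @ts-check comment at the top of the file).
// @ts-check

/**
 * @param {number} price
 * @param {number} taxRate
 * @returns {number}
 */
function withTax(price, taxRate) {
  return price * (1 + taxRate);
}

withTax(100, 0.2);      // OK
// withTax("100", 0.2); // Error: Argument of type 'string' is not
//                      // assignable to parameter of type 'number'.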

TypeScript usage is a little less common in server-side code, although it’s used with Node.js and Deno, as well as many serverless runtimes; Bloomberg uses TypeScript in its unified frontend and server-side systems.

“While there’s a huge chunk of Node.js developers that use TypeScript, there’s a little bit less willingness in the Node world to add a compile step,” said Rosenwasser.

Webpack and Rollup have normalized that step as part of the build chain for frontend developers, he said. “TypeScript has an edge on the front end, and a big part of that is because build steps like bundling have become ubiquitous.”

TypeScript Tooling

TypeScript developers can use any JavaScript tools, libraries and frameworks; if the type definitions for those aren’t included in the package, they’re almost certainly available from DefinitelyTyped (and delivered through npm).

That network effect adds to the popularity of TypeScript, according to Palmer (and simplifies enterprise adoption).

“You can get type information for almost any library, any package in the world,” he said. “And of course, when you try to do something for every package in the ecosystem that requires work, people are willing to do that work once. But you’re not going to get the whole world to do that work for every single type system.”

Early on, the best place to use TypeScript was in Visual Studio, which meant using Windows and paying for a high-end Microsoft developer tool — or missing out on the tooling that’s a key part of the TypeScript experience. Enterprises may well have that expertise in existing development team members who are used to other languages, but not all new web development hires will.

The arrival of Visual Studio Code changed that, and other popular editors like Atom and Sublime Text also support TypeScript (directly or through plugins), using the language services that allow TypeScript (and your editor) to understand which properties and methods are available on certain objects.

The motto of Rust is “fearless concurrency.” Rosenwasser likes to suggest that TypeScript’s superpower is “fearless refactoring.”

“We’re motivated by common bugs we see in the real world,” he said. And with the scale of TypeScript usage, the team gets to see a wide range of issues in code.

TypeScript aims to recognize common JavaScript patterns so that you can write your JavaScript without remembering that you’re writing TypeScript, Rosenwasser said.

“In a sense, we’re making the language more expressive without trying to push the burden of expressing yourself to the type system,” he said. “The more we do this sort of thing, the more you can avoid type assertions” — or casts — “that often violate the type system’s assumptions.”

Favorite Features of TypeScript

Recent releases of TypeScript have added features to make it easier to work with async functions (an Awaited type for working with Promises in JavaScript) and for using different types when you set a property and when you retrieve it (because you might accept both integers and strings but transform those strings to integers before storing them).
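
As a brief sketch of what those two additions look like in practice (the Awaited utility type and diverging getter/setter types are the TypeScript features being described; the class and names here are invented for illustration):

// Awaited<T> unwraps Promise types, including nested ones.
type A = Awaited<Promise<string>>;          // string
type B = Awaited<Promise<Promise<number>>>; // number

// A setter can accept a wider type than its getter returns.
class Thermostat {
  private celsius = 20;

  get temperature(): number {
    return this.celsius;
  }

  set temperature(value: number | string) {
    this.celsius = typeof value === "string" ? parseFloat(value) : value;
  }
}

const t = new Thermostat();
t.temperature = "21.5";     // can be set as a string...
console.log(t.temperature); // ...but is always read back as a number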

One favorite feature Rosenwasser called out is the way TypeScript handles union types.

“TypeScript’s not the only type system with untagged structural union types, but it’s one you typically won’t see outside of type systems for dynamic languages like JavaScript,” he said. “There’s something very powerful about being able to mix and match the possible values a function can take.”
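
A minimal sketch of the kind of untagged union being described (the function is invented for illustration):

// One function can accept several shapes of value; narrowing with typeof
// recovers the specific type inside each branch.
function describe(id: number | string | string[]): string {
  if (typeof id === "number") return `numeric id ${id}`;
  if (typeof id === "string") return `string id "${id}"`;
  return `batch of ${id.length} ids`;
}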

Some features, like tail-recursion elimination on recursive conditional types, are extremely powerful and aimed at developers writing libraries. “Indirectly, we’ve built a small functional programming language within our type system, and one of the challenges is how to allow people to use those constructs efficiently and effectively,” said Rosenwasser.

Others, like snippet completion, will prove useful for many developers who work at scale. This feature lets you add default text and tab through the different options, choosing bits and pieces of code you might want to tweak. Instead of typing lots of boilerplate, developers can focus on the core logic of their code.

In fact, as TypeScript encodes more semantic understanding of how developers write JavaScript code, there are more places where it knows what types are required. As a result, developers don’t have to provide those type annotations themselves, getting the benefits of control flow analysis without extra work.

If a variable can be either a particular type or null, and the control flow shows it isn’t null at a given point, TypeScript automatically narrows the variable to the type it should be.
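
A small sketch of that narrowing in action (the function is invented for illustration):

function shout(message: string | null): string {
  // Here, message has the type string | null.
  if (message !== null) {
    // Control flow analysis narrows message to string in this branch,
    // so string methods are available without an annotation or a cast.
    return message.toUpperCase();
  }
  return "";
}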

For developers with experience in Python, C or Java, the tools TypeScript brings are familiar and have been missed. For younger developers, they are more likely to be novel — and Schiemann credits some of TypeScript’s recent popularity to this.

The language services in TypeScript mean editors like Visual Studio Code gain extra understanding of plain JavaScript code, without types to give it hints. That powers autocompletion and code navigation as well as refactoring, Palmer pointed out.

“A lot of people will think they’re just using Visual Studio Code,” he said. “And they won’t realize all those powers are coming from TypeScript running under the hood.”

TypeScript’s Growing Impact

TypeScript is also beginning to influence new languages like AssemblyScript for WASM and Bicep. Rosenwasser credits the familiarity of JavaScript for some of that, in much the same way that many languages adopt the conventions of C: “If your syntax looks close to something in the C family of languages with curly braces and the like, you get a lot just by being familiar.”

But it’s also worth noting that static types make TypeScript relevant in some domains JavaScript may not be appropriate for.

In many ways, TypeScript is the project that taught Microsoft how to do open source well. (Other open source project maintainers inside Microsoft credit the TypeScript team for shepherding them through their projects’ early stages.)

Thinking back to how they won over frameworks like Dojo and developers with large projects, Jonathan Turner, a former program manager for the TypeScript team, recalled: “opening our doors and treating everyone that comes to us as engineers with good juicy problems about how they work with JavaScript.”

From the perspective of a TypeScript user, Schiemann called the team that built the tooling “the most humble, warm, inviting group of geniuses.”

“They just keep looking at things that are hard to do, and how they can make it more ergonomic and easier to work with,” he said of the people behind TypeScript. “They say, how can we make this more useful for you? And they listen, and they do those things, and then people are happy.”

The post How TypeScript Helps Enterprise Developers appeared first on The New Stack.

Microsoft Brings eBPF to Windows

If you want to run code to provide observability, security or network functionality, running it in the kernel of your operating system gives you a lot of power because that kernel can see and control everything on the system.

That’s powerful, but potentially intrusive or dangerous if you get it wrong, whether that’s introducing a vulnerability or just slowing the system down.

If you’re looking for a way to take advantage of that kind of privileged context without the potential danger, eBPF is emerging as an alternative — and now it’s coming to Windows.

Not Just Networking

Originally, eBPF stood for “extended Berkeley Packet Filter”: an update to the open source networking tool that puts a packet filter in the Linux kernel for higher-performance packet tracing (the original is now often called cBPF, for classic BPF).

But it’s now a generic mechanism for running many kinds of code safely in a privileged context by using a sandbox, with application monitoring, profiling and security workloads as well as networking, so it’s not really an acronym anymore.

That privileged context doesn’t even have to be an OS kernel, although it still tends to be, with eBPF being a more stable and secure alternative to kernel modules (on Linux) and device drivers (on Windows), where buggy or vulnerable code can compromise the entire system.

The usual implementation is a kernel-based virtual machine for low-level packet processing; on Linux that lets you change the behavior of the kernel without recompiling it, to load your own event-driven code that will be executed in response to “hooks” — triggers like network events, system calls, function entries and kernel tracepoints.

That code can run a variety of functions by using helper calls, and it can change what the kernel would usually do: an eBPF program might mean a network packet is dropped, a system call is refused or an event is recorded for tracing.

It’s still something of a specialized technique, but developers get to work in a relatively high-level language like C++, Go or Rust while getting the impact of working directly with the kernel.

eBPF is used for high-performance networking and load balancing, delivering application-specific routing or quality of service, protecting against denial-of-service attacks and enforcing container runtime security. Kubernetes networking security using Calico or Cilium is extremely popular because it provides visibility into HTTP traffic that traditional security monitoring can’t see.

There are some interesting opportunities to use eBPF in the new open networking models being developed around SONiC, the open source, Linux-based network operating system that Microsoft (a founding member of the eBPF Foundation) created to simplify building its Azure infrastructure. SONiC is now supported by many network hardware vendors and used by smaller cloud providers and enterprises, especially in verticals like financial services, as well as by hyperscalers.

“One of the reasons Microsoft has been investing in eBPF is because we see the importance of making the data plane itself programmable and controllable by software,” SONiC founder and Azure Networking engineering lead Dave Maltz told The New Stack.

Disaggregating the network stack means network architects could choose which protocols they want to run and use just the software modules required for that; not running software that provides features you don’t need means less overhead and fewer potential bugs.

It also allows the network OS to use public APIs to provide functionality; the code that provides that API can be improved but code that depends on it won’t have to be updated when that happens, which allows for much more innovation in the networking stack.

DASH (Disaggregated APIs for SONiC Hosts) is a network project to do that for the software-defined networking data plane, using SmartNICs and other hardware to improve network performance for cloud services by doing more work in the network itself — like encryption or key management — in ways that merge compute and networking.

“We need to expose higher-level APIs for the control of that SDN data path and eBPF is a great way to implement that,” Maltz said.

On Linux, eBPF is the evolution of a capability that’s been around since the 90s but now has enough features to be broadly useful for an increasingly wide range of applications.

“Over the years, it’s grown from we can run some arbitrary code when a packet arrives, to we can run some arbitrary code when some other thing [happens]. We’re seeing more and more places where you can hook into the kernel,” Isovalent’s Chief Open Source Officer Liz Rice told us.

eBPF is ideal for debugging, application tracing and performance troubleshooting, getting observability data without the usual intrusiveness and overhead — and for creating workarounds and compatibility fixes for limitations in software you can’t change.

Most of what New Relic’s open source Pixie Kubernetes observability platform does relies on eBPF, and Splunk recently donated its Flowmill eBPF collector for application and kernel telemetry to the Cloud Native Computing Foundation‘s OpenTelemetry project.

Those telemetry collection components will help developers stitch together telemetry data from different layers and could help to make eBPF simpler to work with, suggested Dave Thaler, partner software engineer at Microsoft.

“eBPF is a very efficient and safe means of obtaining information about processes, with good flexibility around what and how information should be collected,” Thaler said.

Netflix has been using eBPF to get much deeper metrics for analyzing performance on Amazon Web Services than AWS itself originally thought was possible, but when it first started that work it was still a tool for experts. “It seems to be crossing into the mainstream now,” RedMonk co-founder James Governor told The New Stack.

It’s so useful and mainstream that developers and Microsoft customers started asking when they could get something similar in Windows.

Why Windows

“eBPF has emerged as a revolutionary technology enabling greater programmability, extensibility and agility,” Thaler told us. That’s as useful on Windows or any other OS as it is on Linux, and using common tooling and frameworks across Windows and other platforms provides engineering efficiency, he noted.

“For [developers] who already use eBPF on Linux, it is attractive to use eBPF on Windows to enable the same type of solutions to work on both platforms. Even for those who only use Windows, we believe that the programmability, extensibility, and agility benefits of eBPF will open up development to a wider audience.”

And he expects that the same speedups eBPF can bring with SmartNICs, using approaches like DASH, should be possible on Windows in the future.

Windows already has ways of extending low-level functionality by calling public APIs, such as NDIS, the Windows Filtering Platform, DTrace, the Driver Interception Framework (DIF) and so on. But extending the Windows kernel requires writing a driver and submitting it to Microsoft for signing which, Thaler points out, is a slow process.

“Since the eBPF verifier uses a formal methods-based analysis to check code safety, eBPF enables agility of rapid extensibility without having to wait for a long [approval] process, which can be especially useful in time-critical scenarios such as debugging or DDoS mitigation.”

“eBPF for Windows will enable developers to safely consume frameworks like these using multiple existing well-known programming languages without writing a kernel driver, and leverage the eBPF ecosystem of cross-platform tools and experience.”

Rice agrees: “eBPF gives us this really powerful platform for building things like observability tools and there’s no reason why that shouldn’t be just as applicable on Windows as it is on Linux.”

Importantly, eBPF is being built to run on Windows rather than to be part of Windows, which means it will run on existing versions of Windows rather than needing an update to the OS. The open source eBPF for Windows project supports Windows 10, Windows Server 2016 and later, with Microsoft contributing code to existing open source eBPF projects so that they work with Windows as well as Linux (and potentially other operating systems in the future).

And when the project is mature, it will move to what Microsoft describes as “a community-governed foundation in the eBPF ecosystem.”

Building on Open Source

Windows uses drivers where Linux uses kernel modules, and public APIs rather than system calls, so eBPF needs to be implemented slightly differently. The modular architecture Microsoft chose also means eBPF can be used in more scenarios.

Before it runs, eBPF bytecode is checked by the open source PREVAIL static verifier that runs in user mode rather than in the kernel: if it passes all the safety checks, the code is either compiled to native code by the open source uBPF JIT compiler that also runs in user mode or passed straight to the uBPF interpreter. The interpreter and the compiled native code both run in kernel mode.

Running the verifier and just-in-time (JIT) compiler in user mode is a big difference from eBPF on Linux, but it makes sense, Thaler told us, and not just because Microsoft wanted to build on an existing community project that had already made the decision to build for user mode.

There’s an increasing trend toward moving code out of the kernel, because if there’s a bug in the driver that causes it to crash, it won’t take down the whole OS.

But it also means that eBPF can be used to extend user mode daemons, not just the kernel. You could even run eBPF on one device and have it provide functionality and performance improvements on another machine. Thaler explained:

“We believe that an important property of code for eBPF building blocks like the verifier, interpreter, and JIT compiler, is that the same code should be reusable not just across multiple platforms, but be usable in different contexts such as built into an OS kernel, or run in user space, or run inside a Trusted Execution Environment, or even run on a separate machine from the OS being extended.”

That means that the verifier has to be very reliable, and it needs to be protected so that it’s as secure as the kernel.

There’s already a test suite for the verifier and it will be fuzz tested before it’s ready for production. The safety properties that the verifier provides for code are also stronger than the tests that Microsoft typically runs on drivers before signing them.

Both the verifier and compiler run in a privileged system service on Windows, and user-space APIs can only be called by admin accounts, Thaler notes, “We are investigating future models where the verifier and compiler can run inside a separate VM, or even on another machine.”

“Much of the security hardening work for eBPF for Windows still remains, which is why it is not yet signed for use in production, only test environments,” he warned. Although you can install and try it out today, you’ll have to put your Windows PC into developer mode, with test-signed binaries enabled; that’s common for developers but not suitable for production systems for security reasons.

You’ll also have to stick to interpreted mode unless you turn off the Hypervisor-enforced Code Integrity (HVCI) hardware virtualization feature Windows uses to protect kernel mode processes like the logon service from attack. The JIT compiler doesn’t sign code with a signature that HVCI trusts, so that code won’t currently run.

“JIT compiled mode is more efficient than interpreted mode, if all other things are equal,” Thaler points out, so Microsoft is looking at how to get JIT compilation working with HVCI.

Cross-Platform Code

Developers can create eBPF bytecode for Windows with existing eBPF tools like clang, include it in a Windows application or just type it into the Windows netsh command-line tool (much as Linux users would use Bpftool); they all call a shared library that exposes Libbpf APIs (and passes the code on to the verifier).

Microsoft plans to enable anything in Windows that can be called as a public API or implemented in a driver, and that’s relevant to eBPF, to be exposed to eBPF. “The eBPF runtime does provide additional constructs from eBPF, such as using shared memory via support for a wide variety of ‘map’ constructs.”

That code might or might not be the same eBPF code you’d run on Linux because it depends on whether you’re doing something that’s supported on Windows.

“Some eBPF APIs are inherently Linux-kernel-specific since they interact with or extend functionality in a way that depends on knowledge of internal implementation details. Other eBPF APIs extend functionality such as TCP/IP that is common across platforms,” Thaler explained.

Work with network sockets or bind to a common protocol like IPv4 or IPv6, and you can recompile Linux eBPF code to run on Windows.

“Our goal is to have a large set of cross-platform APIs, while enabling anyone to easily add additional APIs for their platform of choice, such as to extend functionality in their own drivers or user-space applications,” Thaler said.

That will mean if eBPF bytecode calls those APIs, the same code will work on Linux and Windows.

As for the hooks and the helper functions eBPF code can call, eBPF for Windows already supports two hooks and over 10 cross-platform helpers documented in bpf_helper_defs.h and ebpf_nethooks.h.

“For the core eBPF execution context, Microsoft has had to add implementations of various map types and helper functions since there was not an existing open source project that could be immediately leveraged,” Thaler said. “Our hope is that over time such implementation will move to a cross-platform project such as the generic-ebpf project, that eBPF for Windows can then use.”

Microsoft is focusing on the most commonly used ones first to enable popular applications, but any hooks and helpers that call public APIs can be contributed from the community, he said.

Because eBPF has so far only been available on Linux, eBPF tools like Libbpf, Bpftool or Kinvolk’s Inspektor Gadget suite for inspecting Kubernetes clusters (now owned by Microsoft) assume you’ll be using Linux and rely on Linux-specific functionality and implementations even of things that are available on other platforms.

“A number of helpers and hooks on Linux are inherently Linux-specific or expose structures that use details of the Linux implementation. Some of them wouldn’t apply to any other platform, and some would apply but other platforms use a different native format,” he explained.

“For those that apply but simply use a different native format, some cross-platform functionality can still exist by copying the data into the format exposed by the existing eBPF helper, with a slight performance cost.” Microsoft is working with the community to separate the native and cross-platform functionality in Libbpf and Bpftool so they can be used on Windows as well.

Higher-level tools like the L3AF lifecycle management and orchestration project for eBPF networking apps (including load-balancing, rate limiting, traffic mirroring, flow exporter, packet manipulation and performance tuning) that Walmart contributed to the LF Networking group are also currently only for Linux; again Microsoft is working with the L3AF community to bring support for Windows.

The number one request, ahead of even the observability and debugging features eBPF will provide, is the denial of service protection it will offer on Windows.

Another big opportunity for eBPF on Windows is for organizations to support their own legacy in-house applications when they need to tweak things like port redirection or socket handling. If you need to use port 8080 when the application expects port 80, or you need to work with a socket and the middleware only exposes an HTTP abstraction, eBPF makes it easier to create workarounds — without needing to write a custom driver and get it signed.

That’s something a lot of organizations with Windows servers will find useful.

The post Microsoft Brings eBPF to Windows appeared first on The New Stack.

Can TypeScript Help More Developers Adopt ECMAScript Modules?

Modules are how you organize code into self-contained chunks that you can reuse in different codebases and import as necessary. JavaScript didn’t have a standard module system until ECMAScript 2015 (with support for ES modules arriving in major browsers by 2018). Up until then, developers using JavaScript for larger and more complex development turned to community module systems like AMD (Asynchronous Module Definition) with RequireJS, bundlers like Webpack, or CommonJS (the module system in Node.js).

The ES module system was developed to be an approach that both AMD and CommonJS users would be comfortable with. Node.js added experimental support in v8.5.0 in 2017, with support marked stable in Node v15.3.0. Importantly, that gave Node developers access to the full range of published modules.
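
For readers less familiar with the two styles, here is a minimal sketch of the difference in syntax (the loadConfig function is invented for illustration, and a Node.js environment is assumed):

// CommonJS (the historical Node.js default):
// const { readFile } = require("node:fs/promises");
// module.exports = { loadConfig };

// ES modules (the standard introduced in ECMAScript 2015):
import { readFile } from "node:fs/promises";

export async function loadConfig(path: string) {
  return JSON.parse(await readFile(path, "utf8"));
}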

“ES modules was wildly successful as an authoring format,” TC39 co-chair Rob Palmer told The New Stack. “It’s the dominant form [in] which you see source code being authored, if you’re on GitHub. It will all look like ES modules.”

Most of the World Runs on CommonJS

Despite the popularity of ES modules for authoring packages, when those modules get executed, they may actually get compiled (either by Node or TypeScript) and run as CommonJS.

“Most of the world is still running on CommonJS, even if it looks like it’s being authored in ES modules,” Palmer explained.

By default, Node.js treats the JavaScript code you import as CommonJS modules, for backward compatibility. That means developers who think they’re writing ES modules actually aren’t, Palmer warned, meaning they don’t get access to ES modules features like the new top-level await coming in ECMAScript 2022. “That’s an ergonomic feature that most developers really love,” he added.
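
As a sketch of what developers miss out on when their code actually runs as CommonJS (the URL here is a placeholder, and a runtime with a global fetch is assumed): top-level await is only valid in an ES module.

// Legal at the top level of an ES module; in a CommonJS module this is a syntax error.
const response = await fetch("https://example.com/config.json");
export const config = await response.json();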

ES modules enable some other important Node.js features, like package exports — where you can encapsulate a package so that developers only get access to specific entry points, rather than being able to call anything in the package. Without that, someone could take a dependency on the way a feature is implemented, which might change in a later version.

“Previously it was a free-for-all in Node packages,” Palmer said. “You could just load any arbitrary file inside the package you like: just write the directory and the file name and you can load it. Whereas now, with package exports, you can say no, only these particular entry points are the ones that the public are allowed and everything inside is private to that package. This is a key piece that’s needed for the ecosystem and it’s one of the really wonderful treats we’ll get once TypeScript has support for Node’s flavor of ES modules.”

A Challenging Change

Supporting developers through the transition to ES modules has been “an ongoing pain point,” explained Daniel Rosenwasser, senior program manager for TypeScript at Microsoft. It’s complexity that developers will have to deal with whether or not they use TypeScript.

“JavaScript developers have built up a lot of expectations of how things should work based on their experiences with CommonJS, bundlers, and browser ES module support,” he said, “and things don’t work that way in Node.”

He described CommonJS as “extremely convenient to use, but ultimately different from what browsers needed”.

With its focus on developer productivity, adding ECMAScript module support for Node.js in TypeScript seems like an obvious step. It was originally planned for TypeScript 4.5 as the module: node12 and moduleResolution: nodenext modes, which would either match Node.js behavior or use TypeScript features to deliver the same functionality.

“To provide a bridge for developers, we’re experimenting with their support for ES modules and seeing where we can reduce some of the issues developers may run into,” Rosenwasser said.

But the new mode wasn’t ready for TypeScript 4.5, partly because of complexities in the ecosystem of tools like Deno, ts-node, Webpack and Vite; and partly because it’s already too easy for package authors to misconfigure packages in ways that make it hard for developers to work with them. So the TypeScript team doesn’t want to make that worse. There’s also some debate about whether the default for using the new mode should be node12 or nodenext, since Node 12 doesn’t have top-level await.

“Now it’s time for TypeScript to support this, because the rest of the ecosystem has adopted it and people want access to the features that Node has provided here,” Palmer told us. “But it’s really challenging to do that. This is nothing to do with TypeScript; the challenges are inherent in the compatibility between the old format and the new.”

The Difficulties of Making the Two Systems Interoperate

The two module systems don’t just have different syntax, they also work rather differently.

“The root of that conflict is that CommonJS is synchronous,” Palmer continued. “When it runs, it just runs straight line code: there’s no gaps, there’s no time delays. ES modules supports asynchronous work, where you can split your work up over time.”

That’s important, because when you write code that will run in a browser, you have to be able to make sure that the rest of the page isn’t held up waiting for a long-running script.

Because they’re synchronous, CommonJS modules are resolved dynamically during execution — so you can call require from inside an if statement, because the dependency graph is built while the program is running. Because ES modules are asynchronous, the interpreter builds a dependency graph before they run, which means that it can optimize the code (so code that won’t be called isn’t included). Note: tree shaking to remove unnecessary code is possible with CommonJS, but bundlers don’t usually do it by default.
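
A small sketch of that practical difference (module paths and the environment variables are invented for illustration, and a Node.js environment with @types/node is assumed): CommonJS can decide what to load while the code runs, while ES module imports are fixed up front, with the asynchronous dynamic import() form covering conditional cases.

// CommonJS resolves dependencies while the code runs, so this is legal:
// let parser;
// if (process.env.USE_FAST_PARSER) {
//   parser = require("./fast-parser");
// }

// ES module imports are static and known before execution, which is what
// lets tools build the dependency graph ahead of time and drop unused code:
import { parse } from "./parser.js";

// Conditional loading is still possible, but through the asynchronous
// dynamic import() form rather than a synchronous require():
const extras = process.env.USE_EXTRAS ? await import("./extras.js") : null;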

There’s an additional complication in the way module files are named. The .mjs and .cjs extensions make it explicit whether you’re using ECMAScript or CommonJS modules (the equivalent for TypeScript would be .mts and .cts). However, they’re optional because some developers feel strongly that JavaScript code should always use .js as the extension, even for modules.
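
Under the experimental nodenext mode, the TypeScript equivalents are expected to work along these lines (file names are invented for illustration): the import specifier uses the extension of the emitted JavaScript file rather than the source file.

// greet.mts — compiled by TypeScript to greet.mjs
export function greet(name: string): string {
  return `Hello, ${name}`;
}

// main.mts — the import references the output file's .mjs extension,
// not the .mts source extension
import { greet } from "./greet.mjs";
console.log(greet("Node"));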

Where TypeScript Can Help

With the new Node.js ES module support, TypeScript will understand these extensions. But if developers don’t use them, TypeScript will need to figure out whether a file of code is a module or not, as well as what import syntax to use. Picking the wrong one could end up fetching a different file from the package, making it tricky for the TypeScript team to choose the right import default. If it’s complex for the people designing TypeScript to understand what the right behavior will be, explaining to developers who aren’t experts in module systems what they should be doing will be even harder.

Those differences add to the difficulty of making the two systems interoperate. So far, the new module support is only available as an experimental flag in the nightly builds of TypeScript, so developers have to explicitly opt in to it.

“People would love TypeScript to just solve this,” Palmer noted. “In some ways, it’s an unfair expectation that TypeScript will provide a magic solution, because it’s not possible for them to play that role entirely.”

But despite all of this complexity, he praises the way TypeScript is tackling support for Node.js-style ECMAScript modules. “What I really love is that they started to implement this and then, because they’re famous for developer experience, they found [that] when people try to use this […] the experience is not what they hope it to be.”

“It’s maturing and feedback is going in and they’re responding,” Palmer concluded. “But this is a real example where they didn’t just throw it out the door; they’re working very hard to make up for problems they did not create, problems that some other part of the ecosystem created.”

Conclusion

What TypeScript is hoping to offer is a large feature that will deliver a lot of the improvements developers have been asking for. The JavaScript community is so broad that there continues to be debate about what TypeScript should do here, so work in the broader ecosystem of JavaScript tools to support the new module options is ongoing. But many of the issues that led to postponing the feature from TypeScript 4.5 have been addressed, and the current plan is to include it in TypeScript 4.7 (which is due in May).

Currently, it’s just too easy for developers to get things wrong when configuring ES modules in Node.js. With the extra discussion that resulted from the delay, what ships should be a feature that makes working with ES modules easier to understand and debug.

The post Can TypeScript Help More Developers Adopt ECMAScript Modules? appeared first on The New Stack.

Not Just for Telcos: Open RAN Opens 5G Networks to Devs, Enterprises

Mobile connectivity has powered an app revolution on smartphones and the Internet of Things for consumers and industry — but the mobile networks that deliver that connectivity are complex, proprietary legacy systems that are slow to develop and update.

For mobile operators and telecoms networks, opening up the radio access network — through emerging 5G mobile networks — is a chance to take advantage of the cost savings, virtualization, automation and interoperability that storage, networking and compute stacks have already achieved by switching from monolithic proprietary hardware to modular, software-driven approaches using off-the-shelf commercial servers.

Fundamentally, Open RAN is a giant step forward. It uses an open, software-driven architecture to put a generic compute layer into the special-purpose telecoms stack, which means operators will be able to automate and upgrade networks more easily, roll out new technology faster and offer the kind of services that make 5G more than just a faster mobile phone network.

Getting the Names Straight

The idea of Open RAN is that instead of cramming everything into the limited space and power available at a cell tower or base station, you split the mobile base station into some well-defined parts that communicate over open, defined interfaces.

That means you can strip the mobile base station down to just the antenna that receives and transmits the radio signal (plus a power source) with the compute to process those radio signals done on off-the-shelf hardware, maybe even in a nearby data center or in the cloud.

That makes the hardware easier to maintain and manage, and to scale up when you have more traffic. Operators can just allocate more processing power instead of trucking in temporary base stations to erect at festivals and sporting events when a location that’s usually almost empty suddenly has thousands of users and devices watching videos and uploading images.

With more space and power, it’s cheaper to offer edge computing services that you want to run as close as possible to the source of data for low latency and fast response. Open RAN may also allow operators to deploy new generations of mobile networks on existing hardware rather than having to roll out new systems each time.

Plus, 5G and 6G depend on having enough base stations to provide coverage and deliver high bandwidth and low latency: if Open RAN can create smaller, lower-power base stations that can be deployed in more locations, it might speed up network rollouts.

Technology Definitions

But talking about Open RAN quickly gets confusing, because as well as the general concept of the technology, it can also refer to some specific specifications and groups. Unless you’re actually building and running a telco network you won’t usually need to know the difference, and they’re sometimes used interchangeably, but getting it clear can make it easier to understand what a particular product, service or network is likely to deliver.

RAN stands for Radio Access Network; it’s the base stations and other infrastructure that delivers the “Air Interface” that connects phones, IoT devices and anything else that gets a cellular signal to the core network, with access to services delivered by the operator — and internet connectivity to use online apps and services.

O-RAN refers to the O-RAN Alliance (and that sometimes gets shortened to ORAN): an industry forum of vendors, mobile operators and researchers aiming to push the RAN industry towards “more intelligent, open, virtualized and fully interoperable mobile networks”.

Open RAN, which can confusingly also be shortened to ORAN, refers to the open specifications for RAN architectures published by the O-RAN Alliance (and it’s often used as shorthand for the general idea of these more open networks).

The point of these open specifications is to create interoperable building blocks. “If we don’t build every building block unique for every different service provider, there’s a gain here,” Jack Murray, co-chair of the O-RAN Software Community technical oversight committee, told us. “As we can agree on reusing more code, more APIs, more toolkits, more agreement where the networks touch each other we get exchange points, which become very valuable and enable growth and opportunity.”

This also takes advantage of the way open source unlocks community innovation. “What’s nice about open source projects is the community keeps adapting as the needs grow and bring more functionality based on what people need,” he noted.

That’s why the O-RAN SC is a joint project from the O-RAN Alliance and the Linux Foundation, focusing on the open source projects like ONAP, Anuket, Magma and others that deliver a lot of the Open RAN concepts — although there are also some projects with FRAND-licensed IP in the community and Open RAN implementations can include proprietary code for specialized functions.

Courtesy of The Linux Foundation

OpenRAN (without a space in the middle) means the OpenRAN group of the Telecom Infra Project: operators and vendors working on the implementation of interoperable products from multiple hardware and software vendors, with the goal of making telecom infrastructure more affordable in order to reach more of the world’s population — using the open RAN standards defined by the O-RAN Alliance.

VRAN, or Virtualized RAN, refers to taking the networking and baseband processing functions that used to be implemented in proprietary hardware (usually using ASICs for performance and efficiency), rewriting them as software and virtualizing them on standard server hardware. As with cloud, separating the data and control planes means you can scale them independently, and move functions that aren’t real time or very latency-sensitive onto centralized servers where you can increase utilization.

Cloud RAN (C-RAN) is a VRAN that’s designed to be cloud native so it can take advantage of microservices, containers, continuous delivery and other familiar DevOps techniques. That would let network operators take advantage of common cloud patterns like offering canary deployment to a certain percentage of users to test the new version of a service.

A VRAN is one of the ways you can implement the standardized, open interfaces between the different components of the RAN that ORAN describes — but VRAN doesn’t have to be open or multivendor, and Open RAN doesn’t have to be virtualized.

“VRAN is about making the RAN much more software-defined and programmable and shifting functionality from hardware to software,” analyst Richard Webb of research firm CCS Insight explained to The New Stack.

VRANs and C-RANs focus on disaggregating the network; Open RAN is about interoperability, with a set of standards focused on open networking that define profiles and interfaces for accessing functionality inside the RAN. Think of the interfaces as APIs between the different parts of the RAN and the outside world.

That means apps and services can use information that used to be locked inside the proprietary network stack, either to make the network work better or to take better advantage of the network.

The Benefits of Open RAN

Network operators like the idea that moving away from proprietary stacks from a single vendor to an open ecosystem where they can mix and match components creates a competitive environment.

But just focusing on the potential for reducing prices misses the point, Webb warned: “Open RAN is really about the diversity of functionality in the RAN. It should mean you’re engendering a more flexible, highly available network.”

Breaking the RAN up into disaggregated, software-defined pieces means you can put more features from different suppliers in the network, including smaller players with expertise in specific areas.

As well as the open interfaces, the big piece Open RAN brings compared to vRAN is a RAN Intelligent Controller (RIC) that can make the system smarter and far more automated by using the data that was previously locked inside the RAN for analytics. Not only can operators choose a RIC from a different vendor from the rest of the base station: the RIC itself may let them pick different functionality from something like an app store.

Courtesy of ORAN Alliance

The first thing the network data will be used for is enhancing the performance of the RAN itself in near-real-time, improving spectral efficiency and reliability.

“RANs are complicated environments,” Murray pointed out. “If you’re close to the cell, far [away] from the cell, overlapping cells… With multiple spectrum, now that we can hand off different users into different bands, the management is becoming much more complex.”

There are multiple spectrum bands to take into account for different flavors of 3G, 4G, 5G and (in future) 6G, plus other networks to interoperate with, like Wi-Fi 6E and 7 (and LoRaWAN, Zigbee and Z-Wave in some scenarios). There might be interference, congestion or an increase in demand in different parts of the network. With 5G, radio resources will be assigned and reassigned from millisecond to millisecond.

Diagram courtesy of Juniper Networks.

With 5G, operators can deliver multiple virtualized, isolated logical networks on the same physical network infrastructure: one for gamers who need low latency and high bandwidth alongside a network for regular phone users and another for IoT devices where low power is more important. They might also want to prioritize some users, like hospitals, schools and first responders who need guaranteed quality of service.

This network slicing makes it easier to support the specialized needs of different groups of users, who care about different network characteristics (and are often looking for very different price plans).

“If you can tune the network, it’s better for the user and for the provider, because we’re matching the right resources to the right group of needs, rather than treating a remote sensor that has a very low duty cycle like a phone,” Murray said.

Operators can use the real-time information on RAN resource utilization to create and maintain those network slices “powered by AI-based algorithms used for traffic routing, location prediction, channel quality prediction and user selection,” Cristina Rodriguez, vice president of Intel’s Network & Edge group, told us.

You need a lot of data about what’s happening in the network to manage so dynamic a network and do more granular traffic management, but you also have to handle that data efficiently, Murray warns, “because otherwise, your network becomes all about the data and not about the service that you’re providing.”

Some RIC workloads don’t need to be so close to real-time and Open RAN can give network tooling access to data without having to move large amounts of data around. “With the disaggregated architecture where we now have a distributed near-real-time direct RIC versus a non-real-time RIC, you can put analytics close to where the data is generated and not have to move the data as far.”

Open RAN for Developers 

Implementing the dynamic network relies on automation, another cloud native concept that’s new to the operator world where the networks aren’t just proprietary but heavily customized and updates are slow, complex and scheduled many months or even years in advance.

Open RAN promises remote automated software installation, zero-touch provisioning, test and validation, rolling updates and upgrades using CI/CD pipelines, and ultimately, self-healing and self-optimizing networks using telemetry, analytics and other AIOps features, with many solutions relying on the Kubernetes ecosystem for configuration and orchestration.

A lot of that automation and intelligence will come from machine learning, Rodriguez suggested. “A cloud native, software-based RAN architecture [like Open RAN] is ideal for integrating AI and machine learning, which allows operators to automate operational tasks in their networks to make them more power, resource and cost-efficient.”

Operators can then use that intelligence to offer extra services and SLAs to enterprise and high-end consumers in ways they couldn’t with 4G, she suggested. “Organizations such as media providers, factories, retailers, healthcare systems, and smart cities can benefit from guaranteed throughput, latency, reliability and increased quality of experience.”

“An automated factory assembling car parts requires ultra-reliable, low-latency communications to ensure that it meets production targets without costly downtime. With AI-powered networks, operators can bolster delivery of performance-related SLAs and sell that as a service to the manufacturer.”

But developers can also use the RIC data themselves, Webb noted. “When you’ve got that RIC, which is predicated on disaggregated software functionality, then you’ve got a stable platform for third-party apps to get access to network functionality, and then do their thing. App developers want access to that connectivity; they need that network to access edge computing, to access AI and other functions and the network is the route to that.”

In the past, network operators tried to compete with cloud providers by offering their own services or encouraging app developers to create apps specifically for their network, but usually failed because they didn’t have the scale or the skills.

This time, Webb said, operators want to create a rich sandpit where app developers can leverage network assets (and use cloud technologies to work with them) to create network-intelligent apps that can use information from the mobile network.

“They want their network to be at scale, high capacity, ultra-low latency, but also a foundational platform for access to edge computing and processing power, to AI and machine learning and other capabilities that 5G and other network technologies will connect them to. And then they step back and said, OK developers, we’ve given you this playground, go play.”

Near-real-time response, guaranteed reliability and privacy will be important for industry applications like robotics and autonomous vehicles as well as manufacturing automation and quality assurance, and for safety applications like smart highways as well as AR and VR.

Smart cities will need the massive machine-to-machine communications of low-power sensors and IoT devices, as will pipeline monitoring. Not only will Open RAN make it easier to deliver basic connectivity to a stadium or racetrack when it’s packed with fans, but it will also let the network offer enhanced mobile broadband with the higher data rates that VR and video streaming need.

Apps and services may be very specific to different industries, Webb suggested. “The healthcare app development community will be able to leverage that connectivity and access to functionality to create apps that healthcare cares about, and the same for smart manufacturing or smart city sectors. It’s about that specialization and those vertical markets having enough access to both the technology and the scale of networking to make them rich environments for the specialist communities to develop what those specialist markets require.”

Imagine a parking lot that used the compute possible with Open RAN to update self-driving cars with the specific details they need to park efficiently on that specific lot without needing to communicate with every other car trying to park there at the same time. That’s more convenience for drivers, and more business for the parking lot operator, since you can fit in about twice as many cars if they’re self-parking.

Or you could digitize an entire road, using information from cameras and sensors to create a digital twin that lets you treat it as an API that apps can call or infrastructure can use to adapt in real time to the current situation. Custom object recognition models can use camera feeds to find safety hazards like debris on the road or vehicles that have broken down, which could trigger the lane being closed and warning signs being updated.

Ferrovial is building a smart road system using Azure Public MEC services with a Kubernetes cluster deployed on Azure Arc to do that with both public and private 5G networks; the first roads will open in Virginia and Texas this year.

With Open RAN that compute can have the lowest possible latency, so the lane closure happens in time to avoid an accident and the system can provide other real-time services like dynamic traffic management or in-road charging for electric vehicles.

Barriers to Open RAN

Open RAN has a lot of benefits, but it’s also a major change in how operator networks are designed, built and maintained. Telco networks are complex, demanding, real-time environments where low latency and reliability are crucial.

The move to 5G means new architectures that bring an opportunity for a more disaggregated RAN but the quality, efficiency, reliability and resiliency have to match what you can get from proprietary, integrated RAN stacks, Murray warns. “Spectrum is a critical resource that costs a lot of money and a lot of the stack is about optimizing spectral efficiency. You have a very large user base that’s very demanding.”

“People rely on these networks: cars rely on them, safety personnel rely on them. It’s very hard to turn things off and migrate everything going forward.”

Some networks have already deployed Open RAN commercially at greenfield sites where they don’t have to deal with interoperability with legacy systems or in rural areas with less network traffic (so RAN performance isn’t as critical).

Many more have made commitments to using it in a significant proportion of their network, including those looking for alternatives to Huawei, for example in the UK, where the vendor is banned from the 5G telecoms network and the government wants a third of mobile network traffic to be carried over Open RAN by 2030. Dish is running Open RAN and 5G Core software on Amazon Web Services (plus AWS Outposts in its own network) to build a new 5G network far more quickly than usual.

But Open RAN covers a large number of specifications and not all operators are implementing all of the interfaces or including the RIC. That might be because of the cost and complexity of integrating Open RAN into existing networks, concerns about a more complex architecture where they may not have the right skills and can’t turn to a single vendor for support, or the lack of edge data centers with fiber connections to cell towers for moving compute out of the base station. Reference designs, blueprints and integration testing will help, as will the speed at which hyperscale cloud vendors and hosters are rolling out edge compute. But there are also questions of maturity — the RIC is what really differentiates Open RAN but is still very basic in many cases — and hardware performance.

Existing commodity hardware may not always be suitable for Open RAN and as many RAN network functions have only recently been virtualized they aren’t always optimized and can take up most of the cores in a commodity server that operators are hoping to offer as a compute platform. Arm, Intel, Qualcomm, Marvell, Xilinx and other silicon vendors are building new chipsets and accelerators to deliver both the performance and low power consumption that traditional, physical RAN gets by using specialized chipsets like ASICs.

The hardware situation is better than it was 12 months ago, Webb suggested. “Intel and the like have had to respond to the sheer scale and workload and processing requirements for [servers] that are going to be in telco environments, versus say an enterprise environment because the deluge of data that’s going over those networks is an order of magnitude bigger.”

“They’ve had to roll their sleeves up and say ‘we need to think bigger in terms of what we’re offering to operators for their network processing requirements if they’re going to virtualize their networks.’”

Network virtualization is a hard problem. AT&T — which is perhaps furthest along with some three-quarters of its core network functions virtualized — decided in 2021 to move its 5G core software to Azure, in exchange for Microsoft buying and running its Network Cloud platform that the AT&T 5G core network runs on.

Open RAN will require even more infrastructure, automation, development and cloud native expertise, which is rare in telecommunications, where the know-how lies in other areas. It’s worth network operators making the effort to gain those skills, because to truly deliver the promise of 5G and 6G they may need the applications and services Open RAN can enable by providing a distributed edge compute option within the network.

But Open RAN is also important for the private LTE and 5G networks larger organizations are building for their warehouses, manufacturing plants and other locations where they need more flexibility than a wired network but with lower latency than Wi-Fi can offer.

“If a factory wants to go smarter and untether its production robots and connect them via 5G so they can be moved around different parts of the production floor, Wi-Fi can’t handle that kind of capacity and mobility,” Webb explained. That requires not just connectivity but AI and other processing that depends on the kind of low-latency edge compute Open RAN can deliver in the network.

The pre-integrated solutions and appliances that familiar vendors like HP and Intel are offering for Open RAN with 5G specialists like Rakuten and familiar enterprise network partners like Juniper, often in conjunction with cloud providers like Azure and AWS, will work well here to simplify network integration, while still giving developers access to the low-latency edge compute platform Open RAN promises. Mobile operators are starting to create Open RAN-based small cell appliances that can connect to their network or a private 5G network.

There are around a thousand private LTE networks globally (and perhaps another 300 on 5G) but that could increase tenfold in the next few years. In fact, he suggested, private networks and small cell deployments might prove to be “the backdoor through which Open RAN goes mainstream for operators because they see it being so successful in those specialized enterprises and industrial environments”.

The post Not Just for Telcos: Open RAN Opens 5G Networks to Devs, Enterprises appeared first on The New Stack.

Igalia: the Open Source Powerhouse You’ve Never Heard of

Earlier this year Mozilla decided to stop development on its mixed reality browser. Rather than shuttering the project completely, it passed the source code to open source consultancy Igalia, which is using it to build the Wolvic browser. If you’ve been following browser and JavaScript development closely, then you may know about Igalia. Chances are though, you’ve never heard of them. Yet you’re almost certainly using something that Igalia helped build.

That includes big-ticket items like CSS Grid and dozens of other improvements, optimizations and even foundational web features.

Key Igalia Contributions 

Igalia was involved in the arrow functions and destructuring that were standardized in ECMAScript 2015, major features now used universally. It worked on generators and on the async functions standardized in ECMAScript 2017, which offer cleaner, less verbose code than the manual promise chains developers previously had to write. It also worked on async/await (which Igalia implemented in V8 and in JavaScriptCore for WebKit) and top-level await.
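To see why those features matter to working developers, here is a minimal sketch (written in TypeScript, with a hypothetical /api/users endpoint) comparing a manual promise chain with the same logic written using async/await:

```typescript
// Fetching JSON with a manual promise chain: each step chains on the last,
// and error handling has to be threaded through .catch().
function loadUserWithPromises(id: string): Promise<string> {
  return fetch(`/api/users/${id}`)
    .then((response) => response.json())
    .then((user) => `${user.name} <${user.email}>`)
    .catch((err) => {
      console.error("request failed", err);
      throw err;
    });
}

// The same logic with async/await reads like ordinary sequential code,
// and an ordinary try/catch replaces the .catch() chain.
async function loadUserWithAwait(id: string): Promise<string> {
  try {
    const response = await fetch(`/api/users/${id}`);
    const user = await response.json();
    return `${user.name} <${user.email}>`;
  } catch (err) {
    console.error("request failed", err);
    throw err;
  }
}
```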

For BigInt, Igalia was involved in the spec and testing and implemented the feature in both SpiderMonkey and JavaScriptCore. Igalia contributors are working on Class Fields, a long-awaited approach that will make plain JavaScript classes powerful enough to express the constructs developers currently rely on internal, proprietary class systems for; the “universally adored” Temporal replacement for the JavaScript Date object; and more speculative features like type annotations and erasable types. It’s also on track to finally produce a MathML Core specification that browsers will adopt, resolving a process that predates the W3C.
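For context, class fields let a plain class declare public and private state directly, without the constructor boilerplate or naming conventions older class systems relied on. A minimal sketch with illustrative names:

```typescript
// With class fields (public and #private), plain classes can express state that
// previously needed constructor boilerplate, closures or underscore conventions.
class Counter {
  count = 0;     // public field with an initializer
  #step: number; // private field: inaccessible outside the class body

  constructor(step = 1) {
    this.#step = step;
  }

  increment(): number {
    this.count += this.#step;
    return this.count;
  }
}

const clicks = new Counter(2);
clicks.increment(); // 2
// clicks.#step;    // syntax error: private fields stay private
```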

“Igalia is the premiere web standards consultancy and their mission is to improve the commons.”

Rob Palmer, Bloomberg

In 2019, Igalia was the second largest contributor to both Chromium (after Google) and WebKit (after Apple), as well as a major contributor to Mozilla’s Servo. “Igalia has contributed to many efforts in the web platform, including moving standards forward, implementing missing features, and fixing bugs that positively impact both web developers and browser users,” the Microsoft Edge team told us when we asked how a browser maker views their work.

It’s not just browsers. The consultancy is also involved with projects like Node.js and Wayland, and Igalia’s work also shows up on the Steam Deck because of its contributions to graphics libraries like Mesa and Vulkan.

But who is Igalia and how can it make such significant contributions to the web (and related platforms)?

Expertise and Connections

“Igalia is the premiere web standards consultancy and their mission is to improve the commons,” said Rob Palmer, head of Bloomberg’s JavaScript Infrastructure and Tooling team and co-chair of the TC39 ECMAScript standardization committee.

It’s not a typical consultancy and much of its success comes from how different it is: “We are a worker-owned cooperative,” explains Brian Kardell, a developer advocate at Igalia known for his work on the Extensible Web Manifesto and HitchJS. “We have a flat structure. There are no bosses, there are no shareholders. It’s our lives, our company and we want to work on something that is valuable.” For Igalia, that means focusing on open source and free software almost exclusively, and on filling gaps: “we try very hard to improve what we think are failures in the status quo and create a system that is healthier for everyone”.

Although the company is based in Spain and the pay may not match Silicon Valley, being able to work fully remote on technology they view as significant allows Igalia to hire an almost unique combination of experts.

“We have a flat structure. There are no bosses, there are no shareholders. It’s our lives, our company and we want to work on something that is valuable.”

Brian Kardell, Igalia

“Because the mission is so attractive, you get top tier candidates, people who have worked directly on the engines for the browsers and other projects but choose to work for Igalia because they believe in that fundamental mission to improve the web and improve the commons for all,” Palmer suggests.

Calling Igalia influential and well respected in the browser development community is almost an understatement. In recent years, a number of senior developers have moved to Igalia from the browser engineering teams at Apple, Firefox, Google and other projects, giving the company expertise in codebases like WebKit, Gecko, Servo, SpiderMonkey, V8, Chromium and Blink, along with excellent connections to those projects, often with commit rights and membership of groups like the Blink API owners (which makes decisions about which developer-facing features become available in Chromium).

That means Igalia has the technical ability to work on significant features (which isn’t necessarily rare) and can also help get the code to deliver them into multiple browsers at almost the same time (which is rare).

“Igalia brings expertise in standardization,” Palmer explains. “Consensus building, having the relationships and the expertise to engage and to make forward progress, which is a very tough thing to do in this world because we’re trying to get many disparate parties to all agree. But also, they’re not just doing the standardization, they’re also doing things like implementation and test: the full end to end story of what is required.”

All the major web browser engines are open source and, in theory, anyone can contribute to the underlying projects. But not everyone can invest the necessary time; plus, those projects have a core group of maintainers who decide what code goes into them. “For Chromium, the Chrome API owners have to agree that it’s something that largely fits the architecture and principles of the web,” Kardell points out. “Not every contribution would be accepted.” But Igalia’s contributions almost always are.

“We have expertise. We belong to all the standards bodies, we have good relationships with people in all the standards bodies, we belong to a lot of working groups with members who are actively involved and we do implementation work. We are core contributors, API owners, reviewers for all kinds of things in all those browsers,” he explains.

Open Source without Burnout

Part of what attracts browser engineers with this level of expertise is Igalia’s funding approach, which avoids common problems of burnout and unsustainable business models, Kardell says.

“Open source is great in many ways. You can take software and try it out, inspect it, you can mold it and fork it and help evolution happen. You can create a startup very quickly. There are all kinds of things I love about open source, but what I don’t love is that it can become a source of burnout and non-compensation.”

“There are all kinds of things I love about open source, but what I don’t love is that it can become a source of burnout and non-compensation.”

Brian Kardell, Igalia

Igalia does work directly for paying clients, encouraging them to use open source and contribute the technology it builds to the commons. It also works with sponsors like Bloomberg, Salesforce and the AMP Project (which is part of the OpenJS Foundation). And most recently it experimented with fundraising from smaller organizations and individual web developers, to have the web community rather than a single paying client drive the implementation of a missing feature.

Even organizations that don’t sponsor any work through Igalia welcome its contributions. “We believe that the evolution of the web is best served through open debate from a wide variety of perspectives, and we appreciate the perspective that Igalia and others bring to standards discussions and the Chromium community,” Microsoft told us.

Unblocking Progress for the Commons

A single organization might sponsor a feature, but the result is something that’s useful for a lot of web developers, even — or especially — when the differing priorities of the browser makers meant there hadn’t been significant progress before.

“We helped unblock container queries, which was the number one ask in CSS forever,” Kardell told us. “We unblocked has(), which is now in two browsers.” The has() selector had been in the CSS spec since the late 1990s and was also a top request from developers, but it was a complex proposal and so browser makers were concerned it would affect performance. Kardell tried to make progress on it in the CSS working group: “every year or two I would say ‘let’s do something about this, it’s a big deal’ and we just could not make it happen.”

When Eyo, the company behind AdBlock, sponsored Igalia to work on it so they could use CSS for their rules, they were able to get past what he terms a ‘nuclear standoff’. “With a little investment and research showing that it could work, and it could be performant, once we did that Apple said ‘we can do that’ and they did it and in fact they landed it already.”
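For developers, :has() removes a whole class of DOM-walking workarounds. A minimal sketch, assuming a hypothetical article.card markup structure; in supporting browsers the same relational selector works from script via querySelectorAll as well as in stylesheets:

```typescript
// In browsers that ship :has(), find every card that contains an image.
const cardsWithImages = document.querySelectorAll<HTMLElement>("article.card:has(img)");
cardsWithImages.forEach((card) => card.classList.add("has-media"));

// The pre-:has() equivalent: select the child, then climb back up to the parent.
const legacyCards = Array.from(document.querySelectorAll("article.card img"))
  .map((img) => img.closest("article.card"))
  .filter((card): card is Element => card !== null);
```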

Some browser engineers say that if it wasn’t for Igalia, CSS Grid might not have become widely available.

It’s a similar story with CSS Grid, which lets developers achieve much more advanced and custom layouts than Flexbox: Palmer calls it “a huge feature that’s loved by developers”. But some browser engineers say that if it wasn’t for Igalia, it might not have become widely available. Microsoft started work on what became the original CSS Grid Layout specification, shipping the prefixed version in IE10 in 2012; Google started to add support for CSS Grid to WebKit in 2011 but then forked WebKit to create Blink in 2013, while Mozilla didn’t adopt it because it was focused on its own XUL grid layout.

Bloomberg uses web technologies both for server-side operations and for rendering on the Bloomberg terminal, which Palmer describes as “a data-intensive real-time rendering system that really pushes the limits of Chromium”; in 2013, it sponsored Igalia for a multi-year project to work on a new approach to CSS Grid, which it implemented in both Blink and WebKit.

“It’s in our interests, to truly become successful, for us to build amazing fast and rich applications for our users,” Palmer told us. “But when we can do more [with web technologies], the world can do more as a result. We run into bottlenecks that we find are worth optimizing that maybe not everyone runs into, and when we fund those optimizations, everyone benefits, because everyone’s browser goes a little bit faster.”

“If there is any uncertainty about whether there is demand, about whether everyone will step forwards together, we can help provide that push. We can say ‘these two browsers are moving ahead [with a feature] because it’s the top of their priority list and this one is not, so we should fund the one that is behind, we should fill that gap’. And by achieving that completeness, everyone moves forward.”

He refers to the work Bloomberg and Igalia do as “pipe cleaning a process,” because it isn’t just getting a new feature into browsers or the JavaScript runtime: Igalia also works on the toolchain required to use it and develops test suites to help drive interoperability between different browser engines. Sometimes it can also lead to more significant features in future.

BigInt in ECMAScript was a sponsored improvement that Bloomberg wanted for working with numbers bigger than can be expressed in IEEE double-precision floating point; BigInt means those values can be passed around ergonomically. But the precedent of adding a new numeric type to JavaScript may make it easier to add the decimal numbers everyone uses in daily life. Bloomberg wants that because financial market data is often supplied as 64-bit decimal numbers, but it would also help any developer who finds simple arithmetic — like adding up 0.1 and 0.2 (which doesn’t equal 0.3 in any language that uses IEEE floating point) — counterintuitive in JavaScript. “This would solve one of the most frequently reported problems with the language,” Palmer suggested.
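A quick illustration of both problems, using standard JavaScript behavior (shown here as TypeScript):

```typescript
// IEEE 754 double precision can't represent 0.1 or 0.2 exactly, so the sum comes
// out as 0.30000000000000004 in JavaScript (and most other languages).
console.log(0.1 + 0.2 === 0.3); // false
console.log(0.1 + 0.2);         // 0.30000000000000004

// Above Number.MAX_SAFE_INTEGER (2^53 - 1), plain numbers silently lose precision.
console.log(9007199254740992 === 9007199254740993); // true (!)

// BigInt keeps arbitrary-precision integers exact, at the cost of a separate type.
const big = 9007199254740993n;
console.log(big + 1n);   // 9007199254740994n
// console.log(big + 1); // TypeError: can't mix BigInt and Number
```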

Resources are Finite, Prioritization Hard

It’s clear how important Igalia’s contributions are to the web platform, but there’s sometimes confusion over why they come from Igalia — although the occasional misunderstanding or controversy is often for political rather than technical reasons. It may seem odd that, for example, both Google and the web community effectively pay Igalia to work on features in WebKit that Apple hasn’t prioritized. While Apple has been hiring well-respected developers to expand the Safari team and is now adding key features after a period of underinvestment, it’s also salutary to note how many more web platform features (both experimental and stable) are unavailable only in Safari.

Historically, browser makers like Apple, Firefox, Google and Microsoft have acted as what Kardell terms “stewards of the web,” with pressure from the broader web community pushing them to implement W3C standards. But while the commons of the web has become fundamental to systems far beyond the browser, in everything from TVs to cars, adopting those standards is still completely voluntary.

Different browsers have their own different priorities — and even the largest budget has limits.

“It’s not great that we’ve set up a system in which everything is dependent on the completely voluntary budget and participation of what is effectively three organizations. It’s great that we’ve gotten it this far: it’s open and we have multiple contributors.” But different browsers have their own different priorities — and even the largest budget has limits.

With the web platform being at least as large and complex as an operating system, building a browser takes a wide range of expertise. Inevitably, even though browser makers want to be competitive by pushing the web platform forward (or at least not being the last browser to implement a feature), their priorities and commitments dictate what gets implemented and what doesn’t.

The strength of the W3C is the breadth of who is involved beyond the browser makers — there are over 500 members, although many are involved with a single working group rather than contributing broadly — but that also leads to what Kardell characterizes as “potentially long, difficult, incredibly complex discussions, that can take an extraordinary amount of time from your limited resources”.

“A lot of things just don’t move forward because implementers are in the critical path, it’s completely voluntary, and it’s independently prioritized by them. Getting all those stars to align is really, really, really hard.”

That’s the problem Igalia is so good at unblocking.

Cross-Browser Compatibility 

Most web developers care less about the priorities of individual browsers and more about not relying on features that aren’t supported across all browsers. Normally, Palmer notes, “new features turn up in all the browsers and that’s what makes things wildly adoptable and it’s easy to think that this is a natural flow — a fountain of features where the platform just gets better and all by itself”.

Actually, it takes a lot of hard work and funding and time: not just writing the code, but getting it reviewed, tested for compliance, put through QA and accepted into multiple codebases.

“It’s almost a superpower that Igalia has,” says Palmer: “to work across browsers and help everyone move forward in consensus-based lockstep.”

That’s something individual browser makers, with their individual priorities and expertise in their own specific codebase, find difficult to do.

“If you come to us and you have a reasonable case, if we think there is some ‘there’ there that we can help you with, then you can pay us and we can help you,” Kardell explains. “We can be the implementor that you need to have to move the conversation.”

“It’s almost a superpower that Igalia has, to work across browsers and help everyone move forward in consensus-based lockstep.”

Rob Palmer, Bloomberg

Even if a feature is a high priority for all the browser makers, it can also be more difficult to implement a feature in one browser than it is in another: “what it will cost to do it for Chrome isn’t what it will cost to do it for Safari and isn’t what it will cost to do it for Firefox,” he notes. Standards require multiple implementations, which means a significant commitment from multiple browser makers, which is where some proposals get stuck.

The shortage of people with the deep expertise to build browsers results in the kind of nuclear standoff that held up has(), he explains. “Where there’s something that’s going to be hard and potentially expensive and we don’t know how valuable yet because we haven’t had the discussion, we just know we can’t afford to do it because doing it means not doing something else. So it gets to where nobody’s willing to be the first one to pull the trigger and you have these things that linger for lots and lots and lots of years. They can’t get past go. But once someone gets past go, suddenly people are like, ‘okay, I guess we’re going to have to figure this out’ and Igalia plays that role sometimes.”

In some cases, a feature is important for one particular use case — like embedded systems — and mainstream browser makers don’t see it as a priority even though they would benefit from it.

While Apple controls the way WebKit powers Safari, WebKit-based browsers on PlayStation, Epiphany and embedded devices like smart TVs and refrigerators, digital signage and in-vehicle displays use WPE WebKit, which Igalia maintains. Appliance makers like Thermomix (which uses the embedded browser for the screen of its smart food processor) and set-top box manufacturers come to Igalia for help with it; and their investment has driven major improvements in Canvas and SVG hardware acceleration.

Despite having developed for the web since the mid-90s, even Kardell didn’t expect JavaScript’s OffscreenCanvas to be relevant to him. “The number of times that I have ever professionally programmed against Canvas is zero — but I use Canvas every single day without realizing it and I have used libraries that use Canvas to do things.” Maps, blob databases and Google Docs all use Canvas, and the way Canvas blocked the main thread (so everything else in the browser was interrupted while you pan or zoom) might be bearable on a high-end device, but it was a significant problem for performance on resource-constrained embedded devices. Fixing that improves the experience for everyone.
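OffscreenCanvas is what makes that fix possible: a page can transfer control of a canvas to a worker, so rendering happens off the main thread. A minimal sketch of the handoff, with illustrative file names:

```typescript
// main.ts: hand the canvas to a worker so rendering no longer blocks the UI thread.
const canvas = document.querySelector("canvas")!;
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker("render-worker.js");
worker.postMessage({ canvas: offscreen }, [offscreen]); // transfer, don't copy

// render-worker.ts: draw on the transferred canvas; panning and zooming on the
// page stay responsive because this loop runs off the main thread.
self.onmessage = (event: MessageEvent<{ canvas: OffscreenCanvas }>) => {
  const ctx = event.data.canvas.getContext("2d")!;
  let x = 0;
  const draw = () => {
    ctx.clearRect(0, 0, 300, 150);
    ctx.fillRect(x, 60, 30, 30);
    x = (x + 2) % 300;
    requestAnimationFrame(draw);
  };
  draw();
};
```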

That’s a clear example of why prioritizing features in browser development is so hard, he suggests. “When you ship Off-Screen Canvas, a whole bunch of the world will say: why don’t you do this instead? This is clearly more important — but the problem is it’s all the most important.”

Who Should Fund the Web

Rather than letting anyone “buy a standard”, sponsorship is a way to get responsible development of the features developers are asking for: development that involves collaboration and co-design with the different browser makers and thorough testing with developers, without expecting those developers to work for free.

Kardell understands the concern because he felt it himself before learning more about Igalia, but he’s clear that it doesn’t work like that. “If we agree to work with you, it’s because we think there’s a chance of us helping you do something valuable. What you can buy is us championing [your feature] and the priority of someone who has implementer experience and implementer credibility, who has the right skills and ability to help move that forward.”

“They don’t just do anything that is asked of them: they consider the impact, whether it is good for the community, whether it’s the right thing for the platform,” Palmer agrees.

“Because all the work is open anyway, you can’t just subvert it by saying ‘I want my pet feature in the web platform’. It always involves going through that consensus-building committee process.”

In fact, this is an advantage of having an open ecosystem rather than centralized decision-making, he suggests. “You can spin this either way. On one hand, you can say, why is the trillion-dollar company not moving things forward themselves? But the other way of looking at it is, wow, these browsers are open source and we’re able to contribute the features that we want.”

“This is the opportunity given by open source, let’s celebrate that. Let’s encourage more people to get involved and contribute, let’s encourage more people to fund that style of development, because it means that then the priorities can be more and more set by the community and a large, wide base of developer interests.”

“Companies like Igalia can help bring attention to new customer problems that aren’t already being discussed by browser vendors.”

Microsoft representative

Having Igalia work on a particular web feature doesn’t guarantee that it will happen but it’s a signal to browser makers that the feature is worth taking seriously. “Companies like Igalia can help bring attention to new customer problems that aren’t already being discussed by browser vendors,” Microsoft told us.

In a way, Igalia can act as a filter for all the requests that browser makers get, Kardell suggests. “The trouble with being at the core of everything in the whole world is that everybody can see very clearly the problem that they have, and they send it into the bucket — but the bucket is the size of an ocean.”

He also hopes the Open Prioritization experiment can help with highlighting what organizations like Igalia should work on. The idea came from the question: why do we need single, very, very rich companies to fund something? “It would be great if we had diversity of funding that would help the web last, that would help it reach its potential.”

That could be smaller companies or working groups or even individuals. “It could be all of us or a few of us that sponsor the work and unblock it and make the thing happen, and then we control the priority.”

“Why couldn’t a million developers democratically decide ‘this is worth a dollar’ and if you collected a million dollars in funding, then you could do a million dollars’ worth of work and that’s amazing.”

The post Igalia: the Open Source Powerhouse You’ve Never Heard of appeared first on The New Stack.


Turn Deployment Environments into Commodities with Azure

$
0
0
deployment

When you think about cloud services for developer productivity, the emphasis is usually on the local development environment, where you can reduce hours of work to a few seconds of setup. Even using scripts to get up and running, it used to take new employees at GitHub half a day to get a local instance of GitHub.com ready to work on, and then another 45 minutes every time they wanted to move to a branch with new dependencies, or help a colleague with a coding problem in a different branch. GitHub Codespaces gets that down to a few seconds for working on repos in Visual Studio Code (Google Cloud Workstations offers something similar with various JetBrains IDEs) and Microsoft Dev Box gives you an entire preconfigured developer workstation in the cloud.

But when it comes to deploying the complex environments that applications run in, which might require multiple cloud services and subscriptions and need to be set up multiple times for test, staging and production as part of a build pipeline, there’s a tension between making it fast and easy for developers and keeping resources secure and managed.

Azure Deployment Environments

The new public preview of Azure Deployment Environments, announced today at the Microsoft Ignite conference, allows developers to spin up on-demand app deployment environments using a catalog of Infrastructure as Code templates with policies and subscriptions pre-configured by the infrastructure management team. The deployments use managed identities and developers don’t need to have access to the subscriptions and resource groups that applications will use (so they’re not tempted to save the credentials or hard-code them into scripts).

Azure Deployment Environments apply the right mix of policy to app environments and give developers the permissions they need at different stages of development and deployment; Image via Microsoft.

The templates also improve the security of the environments the applications run in, Anthony Cangialosi, who works on developer tools and services at Microsoft, told The New Stack.

“One of the concerns and challenges we hear organizations talk about is when we give developers access to a subscription, it’s like the wild west. We don’t know that they’re configuring these environments in a secure way. We don’t know if they’re configuring them consistently in the way the application was designed to work. So, when changes get submitted into the main branch, we often find issues as it gets rolled out.”

Azure Deployment Environments allows infrastructure and operations teams to create standard definitions for an application runtime environment. Developers can deploy them from the developer portal, from the CLI or as part of the CI/CD pipeline.

What Will It Be Used for

Unlike Azure DevTest Labs, where the advanced configuration and cost control features focus on VMs, templates built using Azure Resource Manager can configure and deploy any service in Azure. DevTest Labs has only basic options for spinning up PaaS resources using ARM templates and it creates all resources in the same subscription, so it’s only really useful for the development and test scenarios the name suggests. Deployment Environments are useful all the way through to production deployments.

Bicep and Terraform support is on the roadmap for Azure Deployment Environments and maybe Pulumi, Ansible and other infrastructure-as-code frameworks further down the road.

Different policies can apply to the templates for sandbox, dev, test, staging, pre-prod and production environments — like automatically turning a test environment off after it’s been used, or giving a developer more permissions in the development or test environment than in production.

“Being able to define different templates for production, pre-production, and testing and staging allows teams to define here is what an application needs to run correctly at the right sizing to manage cost for these different stages and to ensure that they’re configured securely and correctly, giving the developer the flexibility to rapidly create these without having to take on the burden of learning how to do that,” Cangialosi explained.

“It gives teams the confidence that they’re working with consistent environments. Now a developer can publish their changes into a deployment environment before they submit a PR, allowing a team to collaborate on the actual changes running in an instance that is representative of how it’ll run in production, with the actual services, not just the isolated component running on your local workstation.” Even with containers and mocking you can’t replicate the entire delivery environment on a developer laptop.

Security and Productivity

This isn’t about locking down developer environments or limiting what they can do. “There’s often this tension between the IT organization, as it is trying to manage security and compliance and governance, and developers looking for flexibility and control. These services help balance and give developers a greater degree of flexibility within the constraints.”

Being able to define and standardize app environments gives developers and infrastructure teams a way to work together and experiment with the impact of different configurations, he suggested.

“They can collaborate on changes; you can deploy them in a separate environment for developers to try out to see whether or not they’ll support the product, what impact they might have on performance, perspective, scalability, [and] any functional issues that might happen before they’re deployed broadly in pre-production or production environments.” Templates are stored in repos with versioning and pull request processes, just like any other code.

Azure Deployment Environments show up in the same Developer Center as Dev Box instances, which gives infrastructure teams one place to manage permissions and shared authorization. “As you have a development team that’s working on a common project, you can define the developers that have permissions to set up the developer workstation through Dev Boxes, as well as deployment environments that are predefined for them, that they can create in the same service.”

For developers, the appeal of Dev Box is getting to work faster. Like Azure Deployment Environments, it makes it easier to switch between roles like development and test or data engineering (or different clients, for consultants and contract staff), he pointed out. “You don’t have to have a separate laptop. You don’t have to worry, as you’re taking on different tasks, about potentially poisoning your existing environment with configurations that might make it difficult to rebuild your environment as you bring in new tools and SDKs that might reset or regress where you are.”

Right-Sizing

Developers also get the benefit of hardware upgrades without the disruption that usually means, and the economics are appealing to enterprises for the same reasons they adopted IaaS. “When you’re sending a physical workstation, you have to size for the maximum of what developers need, even though they may not be using that all the time.” Like any other cloud service, Dev Box can scale up and down as required. “As you move these workstations into the cloud, you have so much more elasticity to work with. You can right-size for a particular task or a particular role or project and as those roles and needs change, you can also change the hardware and the capacity that a developer has available to them.”

That matters more than ever for hybrid and remote work. “The last couple of years have taught us a lot about the need for more flexible, managed development and production environments,” RedMonk analyst James Governor told the New Stack. “Onboarding and offboarding new employees and contractors, for example, and finally bringing development environments fully into the cloud era. Guardrails are the watchword, as we balance developer productivity and developer experience with the needs of the business in terms of security and compliance. Dev Box allows a Windows shop to turn developer workstations into fully managed cloud services. Azure Deployment Environments should help enterprises standardize management of workflows from dev to production.”

The private preview has been especially popular in the financial services industry, where many organizations require developers to reimage their machines every two to four weeks for compliance reasons. But even without those somewhat extreme policies, Cangialosi pointed out, “You are a new developer in a new space at different points in your career, even when you don’t leave your current job.”

The post Turn Deployment Environments into Commodities with Azure appeared first on The New Stack.

Browser Vendors Aim to Heal Developer Pain with Interop 2022

$
0
0
browsers

The web is supposed to be a platform where multiple implementations deliver the same standards: there’s a new emphasis on having browsers actually deliver that interoperability.

Interoperability has always been the promise and the frustration of the web, but getting different browsers to do the same thing remains onerous for many web developers. Given that W3C, TC39 and other web standards processes all require implementations in multiple browsers and engines before a proposal can become a standard, you might expect all browsers to behave in the same way for features that are based on web standards. In practice, given those browsers are built by different vendors and they have to run on different devices where not everything will work the same, behaviors can be frustratingly different.

Web developers regularly call compatibility and interoperability issues between browsers their top frustration in surveys like the MDN Developer Needs Assessment (explore the data from those studies in the Browser Compat Data repo).

The MDN survey results show how frustrated developers are with browser compatibility issues.

Interop 2022 is part of an ongoing project to make that less painful for all web developers: “how do we reduce the number of web developer tears per second in the world,” asks Philip Jägenstedt, a software engineer on the web platform team inside Google — and he’s only partly joking.

Web of Pain

Take the developer at Google described by Rick Byers, director of Google’s web platform team, as “one of our biggest proponents of the web platform”. That person’s feedback when they moved to work on Gmail for iOS instead of the web version: “it sure feels nice to not feel like I’m fighting the platform all the time anymore”.

That kind of reaction came as a surprise to people building browsers, because as Alex Russell (a former colleague of Byers and now working on the web platform at Microsoft) pointed out at the Polymer Summit back in 2017, “browser engineers tend not to viscerally feel the problems that we feel as web developers.”

Developers care less about standards in the abstract, Igalia developer advocate Brian Kardell notes: “They want to know am I going to experience pain if I try to use this?”

“Whatever you think that you want as a developer, the first thing that you want is for all the sites to actually work the same.”

Instead, they would experience interoperability problems in features as widely used as Flexbox. “Flexbox has been around for a really, really long time and people use it all the time,” said Kardell, who estimates that 70% of sites on the HTTP Archive use Flexbox and that there were a thousand bugs to fix.

The “last mile” work of running tests and fixing bugs to get perfect interoperability isn’t exactly exciting and it takes a lot of effort, he noted. “It’s really hard to operate with perfect interoperability with varying architectures across varying operating systems.”

Taking Tests Seriously

Byers blames much of the frustration developers have experienced building on the web on how browser makers had, naturally enough, focused on their own browsers rather than the web itself. “Our job on the Chrome team is not to build Chrome. Our job is to help the larger community build the web.”

Changing that priority made it clearer how to improve the web developer experience. “We need to really approach the web platform with the engineering discipline it deserves if we want developers to take it seriously as a serious platform.”

“It should be no surprise that developers are frustrated. The web behaves inconsistently. We have applied none of our engineering expertise. Step one in software engineering is you should have a coherent test suite if you want reliability: the web didn’t have a coherent test suite.”

Test suites for browsers did exist, Jägenstedt is quick to point out. “The problem was we didn’t care very much about it.”

Too often, the testing happened after features had been shipped and often it was outsourced.

Once the Chrome team realized that they did indeed care about web platform tests and announced a session on the predictability of the web at BlinkOn, representatives of other browsers who had been advocating for tests approached them enthusiastically, Byers remembers.

“We didn’t have to push a big rock up a hill, we just had to say we’re going to take this seriously and the entire community came together in a really positive and constructive way to say well, of course, this is the right way to engineer things. We’re all engineers at heart, clearly, we’ve been just dropping the ball on engineering. Let’s get together and work together as a community on really engineering the web with a common test suite that’s a first-class citizen.”

The community created conference sessions, mailing lists, dashboards and bug trackers focused on interoperability: things Jägenstedt describes as “trying different things to see what’s going to work to give us all incentives to improve the web platform for web developers”. One of those was the Compat 2021 project (later renamed to Interop 2021) that Google, Igalia and Microsoft collaborated on, covering five specific areas highlighted in the MDN surveys and Mozilla’s very comprehensive Browser Compatibility Report as particular problems for developers: Aspect Ratio, Flexbox, Grid, Sticky Positioning and Transforms.

The idea was to fix problems where there’s a standard that’s clearly specified and widely accepted enough to have multiple implementations, but those implementations have bugs — or are just incomplete.

The MDN browser compatibility report shows some very specific frustrations.

“We picked some areas that seemed to be problematic based on our understanding of the web developer pain points,” he explains. For each of those, they looked at the specifications and relevant tests to come up with a metric to track how well the different browser implementations do.

Participation and Priorities 

Interop 2022 brings together more participants — adding Apple and Mozilla, as well as Bocoup (a consulting cooperative with a particular focus on accessibility and inclusion) — and again, the focus is on features that improve web developers’ lives, where the technology is mature and the impact clear, rather than trying to tackle everything in Web Platform Tests (which covers standards and proposals that are at many different stages of development, or even experimentation).

As Brent Fulgham, who heads up the WebKit web compatibility team at Apple, put it in a tweet, “Interop 2022 represents the technologies that we (web engine maintainers, standards bodies, and web developers) targeted as the most important metrics for good web compatibility and interoperability.”

A spokesperson from Microsoft told us that Interop 2022 is about “ensuring that web developers can rely on those standards and deliver innovative experiences to their customers”.

The Interop project is about participation and mutual consensus on shared interests, Google’s Jägenstedt emphasized. “We recognize that we can’t force each other to do things we don’t want to. This is not an effort to make Mozilla implement a thing that we like, but they don’t; that was off the table.”

“We didn’t try to negotiate a shared list of priorities. We just came together and looked at what could we agree to; and if someone didn’t want to say, why that wasn’t a part of the process.”

“What are the things that we can all mutually agree to that are important that we’ll focus on together? These are things that we all want to invest in and we can do it so much better if we do it at the same time and in collaboration. It’s a win-win situation.”

Rather than simply good intentions about improving interoperability, Interop is measurable: again, it has metrics generated by automated tests, making it clear how much work needs to be done and how much progress has been made. The scores on the Interop dashboard are calculated by weighting the 2022 and 2021 priorities, which reflect technologies that developers clearly want to use. “We wanted to make sure that this is measuring the real-world interoperability of the feature,” he notes.

But the metrics also mean the work is effectively being done in public. Picking the focus areas meant deciding “is this strong enough that we want to, if not quite commit to doing the engineering work, but be willing to have this metric out in public that’s going to make us look bad if we don’t do the work.”

Working on interoperability in a coordinated way means not only do all the browsers improve but they improve together.

10 New Focus Areas for 2022

As well as proposals collected from developers through the Web Platform Tests repo and bug reports from end users that turn out to be about implementation differences between browsers, Jägenstedt says the State of CSS study played a big part in picking the ten new focus areas for 2022 “because that’s where we had the most and the best data”.

The full list is Cascade Layers, Color Spaces and Functions, Containment (which is a first step to addressing the Container Queries support many developers have been asking for), Dialog Element, Form Fixes (like the work Open UI is doing on making forms work the same way across browsers), Scrolling, Subgrid, Typography and Encodings, Viewport Units and Web Compat (not a specific feature or technology, but a catchall for known problems with various features that have already shipped in the various browsers, but that have bugs or mismatches with the standards that stop sites working the way they’re designed to).

“It’s a mix of both fixing the old things that are slightly broken and also some more forward-looking things where we nevertheless had some implementation experience. We’re not writing new standards here, we’re not doing totally greenfield things,” said Jägenstedt. “All the things that we’ve prioritized are things that had some implementation experience at the beginning of the year that we think can benefit from accelerating and making them available everywhere.”

“There were some things that each of us considered important that maybe didn’t make the cut where we did need to do more work before we can get there. What we were left with was a set of things that, if I may say so, I think is going to make developers rather happy when it is available everywhere.”

For example, CSS Subgrid is already in Firefox, making it easier to line up objects in complex grid layouts; Safari has implemented support over the course of 2022, and it’s coming “pretty soon” to Chrome and shipping Subgrid in Edge (which is based on Chromium) is what Microsoft is primarily focusing on as part of its Interop 2022 work.

Viewport units are another great example of how much of a difference adding a focus area to Interop can make. Developers complain about viewport units repeatedly in the Browser Compatibility Report and the CSS Survey, because they don’t work well on mobile, where you have to allow for the address bar and the way the size of the browser’s viewport changes as you scroll through the page. Dynamic viewport units take care of that, but browser support wasn’t there. “That actually started out at zero for everyone, but we can see that in a pretty short timespan it’s gone from zero to 100%. That usually doesn’t happen in the space of a single year,” Jägenstedt points out.

“This is more effective. We’re not doing more work. We’re probably doing less work in total by focusing on it at the same time. It’s simply a win-win to do this coordination.”

Scrolling is another area where mobile brings a number of different interoperability issues for developers. “What happens when you touch your phone and move your finger across browsers is not super consistent, the way the web developers see it: what happens to the address bar, what events are fired, what are the values reported to JavaScript for the width and height of different things?”

Interop 2022 is focusing on overscroll behavior — what happens when you scroll all the way down the page and start dragging — and scroll snap, which is used for swiping through product carousels. Because that’s not as interoperable across mobile and desktop browsers, developers have to code it in JavaScript, which doesn’t give as smooth an experience.

Byers points at touch, tap and scroll behavior in general as an example of building browsers as products rather than doing the engineering to get a high-quality web platform.

“When we all started taking our desktop browsers and our desktop browser engines and porting them over [to] mobile, we thought of it as a browser engineering problem and we didn’t think of it as a platform engineering problem as much as we should have. We all shipped browsers that worked on mobile, without really thinking: how do we need to update the standards?”

To make the web work well on mobile devices, browsers needed to put scrolling on a separate thread. “The only way scrolling was going to be smooth on low-power mobile devices was if we did it asynchronously for JavaScript. That fundamentally changed the model of the web platform and we didn’t update a single spec for it, I don’t think,” said Byers. Browsers added features like pinch to zoom, hiding address bars and the different ways tap and touch behave on screen, but the standards didn’t necessarily change to match.

“We accumulated all this debt in terms of the standards and the interoperability and the test suites.”

Initiatives like Interop are helping mobile catch up though, he believes. “The cost you pay to have an interoperable open platform is that it’s slow to catch up, but we’re finally getting to the point that the web platform on mobile is a serious computing platform. It’s not just that we have good browsers, but in the underlying platform increasingly we’ve got the standards [and] we’ve got the consistent behavior you can rely on. There’s still work to do, but I think we’re getting to the point that the web on mobile is now the predictable reliable platform that you can bet your business on the way it became on desktop a decade ago.”

Looking Forward Together 

The work on color spaces and functions is important because while displays are getting better and better, the specifications for color management and taking advantage of the wider color gamut haven’t quite kept up. The CSS Color Module is trying to address that. “We have displays that show color more vibrantly and gradients that just look better than before,” said Jägenstedt. The Interop work will help browser makers figure out “how do we actually go and implement this in our browsers, without using more memory.” Without this feature, pages can fall back to less vibrant colors; with it, “designers will make use of this to make more vibrant designs”.

Work is also continuing on the five focus areas from 2021: Safari has continued to catch up on aspect ratio, flexbox, grid and transforms. But more importantly, all four browsers are much closer to parity on all the Interop priorities than they were six months ago (which was already a big improvement over the level of interoperability before Compat 2021).

Having clear metrics means you can see how good the progress on interoperability is already.

Three of the focus areas for Interop 2022 are slightly different though. Rather than focusing on areas where browsers are ready to move ahead and can start work immediately, these are what Jägenstedt terms “investigation efforts” into areas where developers or web users are seeing problems but there isn’t yet a shared test suite to use as a metric. “We’ve said let’s come together and do the work to figure out where the specs are lacking and what tests we need to write, so that come next year we could include these as focus areas in 2023 if the evidence is there.”

Viewport measurement is a good example. “We know that there are differences between browsers in what happens with these JavaScript APIs and events that have fired and CSS units,” Jägenstedt said — but the automated tests aren’t finished, so engineers from the different browsers have collaborated on a list of issues that need resolving. “I’m bringing those to the CSS working group so we can get these specs refined and then write the tests the next time around.”

“Think about it as a trampoline. When we know something’s important, but we don’t know how to do it yet, this is like a stepping stone to include it the next time around.”

These more speculative investigations do contribute to the Interop metrics, but all browsers get the same score for them, underlining the point that making progress on new standards and tests is a team effort.

“It’s not just everyone deciding to push,” Byers suggests. “It’s doing the work to find the areas of common interest so that we can all agree on what’s worth pushing on.”

“This is one of the things I love about working on the web platform, is that when you can find allies that care about the same things you can move mountains by saying ‘hey, we all want this done, let’s agree we want this done, let’s get it done’ and we all benefit.”

“The total value of the web is more than just the sum of the parts or what any of us could do individually. Ultimately, I think the superpower of an open platform like the web platform is it can be so much more powerful, so much more resilient than any platform owned by a single company.”

The post Browser Vendors Aim to Heal Developer Pain with Interop 2022 appeared first on The New Stack.

How Bindle Makes It Easy to Store and Distribute Cloud Native Apps


The aggregate storage model for Bindle, an open source package manager, is such a good fit for WebAssembly that it could be the basis of the WASM component registry.

Cloud native architectures rely on microservices that break complex applications up into smaller composable pieces, but there’s still the problem of storing and installing those individual pieces, which often requires developers to deal with multiple runtimes, cloud services and related artifacts.

Even web applications have components in HTML, CSS, JavaScript — often including multiple libraries and dependencies — and whatever images and other media are needed: the application is an aggregate of multiple pieces. That’s even more true of WebAssembly, where application binaries need to be portable to many different systems. Depending on deployment choices or the resources available on the system where the application will run, different components might be required.

How do you easily represent complex interdependencies, like an application that can be configured to use MySQL, PostgreSQL or SQLite as its database, needs a helper library if you pick MySQL and needs a shim library if you’re deploying on a system that doesn’t have a GPU, all without the package being so large that it’s impractical to deploy when bandwidth is tight?

That’s the kind of common scenario that the Bindle package repository system (soon to reach its 0.9 release) was designed to address, grouping related objects for distribution using aggregate storage where clients can retrieve just the parts of the package they need.

“We knew that as WebAssembly matured, applications in WebAssembly were going to be built of composite binaries, where you have a whole bunch of different binaries stored together that could link with each other in different configurations and we built Bindle to do that,” Bindle maintainer and Fermyon Technologies CEO Matt Butcher told the New Stack.

“The system was set up to conceptualize applications the way we want to in the cloud native ecosystem. An application shouldn’t be considered a binary: that was the way we were thinking about applications years and years ago. Now we have to think of applications in terms of a conglomerate of different microservices and maybe even some files and objects and things like that. They all have to fit together in a very particular way in order to accomplish their job but they are actually several different binaries.”

The Silverware Drawer

Butcher and Bindle maintainer Taylor Thomas use the metaphor of the silverware drawer in the kitchen, where you might keep chopsticks and straws and tea strainers as well as knives, forks and spoons, with rules like “you need two chopsticks and probably a spoon” or “this spoon is only used for soup and this spoon is only used for tea”.

“The silverware drawer is the idea of saying regardless of the shape and size you should be able to store all this stuff in that same thing and explain to the system this is how I need to retrieve it and these are the things I get back.”

Bindle (named after the handkerchief-wrapped bundle tied to the end of a long stick that you can imagine holding everything you need) handles groups of related objects, with each “bindle” package having an invoice: a manifest explaining what the bindle does and listing all the parcels that make up the package.

Most package managers are geared towards distributing individual packages either as a compressed file or an entire repository. When a Bindle client retrieves the invoice and reads the list of parcels — which might be a combination of WASM modules, text files, JavaScript, CSS, images, videos, shims or dependencies — it can use the conditional groups and feature descriptions in the invoice to pick which components, dependencies and add-ons it needs and only download those.

Invoices have to have version numbers (Bindle uses semantic versioning, so if you search for v1.2 and v1.2.4 is the latest version, you’ll be offered that; if you know you need v1.2.3 instead, you can search for it explicitly).

Conditional groups can list requirements (the application needs a Bourne-style shell, so at least one of Bash, Korn, Zsh or Busybox needs to be installed), optional add-ons or chains of dependencies (the application needs at least one web server and, if NGINX is installed, a particular NGINX module must also be installed).

“That’s how we group things,” Thomas explains, “there are things we use together that are entirely different from each other.”

You’d use a group to list the database options (SQLite, PostgreSQL or MySQL plus its helper library) and a feature for the shim library required when there’s no GPU. Instead of downloading six components, people get the two or three they need, saving network bandwidth and storage. And if one of the parcels is already installed because another package uses the same version of Postgres, the Bindle client will use that rather than downloading it again.
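Conceptually, the invoice for that kind of application might look something like the sketch below, modeled as a TypeScript object purely for illustration; the field names and digests are assumptions, not Bindle’s actual on-disk invoice format.

```typescript
// A conceptual model of a bindle invoice: a manifest plus parcels that belong to
// named groups. Field names and digests here are illustrative only.
interface Parcel {
  name: string;
  sha256: string;      // parcels are content-addressed
  memberOf?: string[]; // conditional groups this parcel can satisfy
}

interface Invoice {
  name: string;
  version: string;     // semantic version, e.g. "1.2.3"
  groups: { name: string; required: boolean }[];
  parcels: Parcel[];
}

const invoice: Invoice = {
  name: "example.com/my-app",
  version: "1.2.3",
  groups: [
    { name: "database", required: true },      // pick one of the database options
    { name: "mysql-helper", required: false }, // only needed if MySQL was picked
    { name: "no-gpu", required: false },       // only needed on GPU-less systems
  ],
  parcels: [
    { name: "app.wasm", sha256: "1a2b3c" },
    { name: "postgres-driver.wasm", sha256: "4d5e6f", memberOf: ["database"] },
    { name: "mysql-driver.wasm", sha256: "7a8b9c", memberOf: ["database"] },
    { name: "sqlite-driver.wasm", sha256: "0d1e2f", memberOf: ["database"] },
    { name: "mysql-helper.wasm", sha256: "3a4b5c", memberOf: ["mysql-helper"] },
    { name: "gpu-shim.wasm", sha256: "6d7e8f", memberOf: ["no-gpu"] },
  ],
};

// A client that picks MySQL on a machine without a GPU downloads app.wasm,
// mysql-driver.wasm, mysql-helper.wasm and gpu-shim.wasm, and skips the rest.
console.log(invoice.parcels.length); // 6 parcels stored, but only 4 fetched here
```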

“You save a lot of bandwidth with Bindle,” he points out. “Instead of having to download whole tarballs where half of it you don’t need, you can now just download exactly what you need, given your circumstances.”

These invoices can be as simple or complex as the application requires.

“You don’t have to fit your application into an existing mold,” Butcher explained: “you can store the application the way you want it stored and then reassemble it the way you need to reassemble it later.”

Store and Deploy

He gives the example of a game he coded for a colleague with a very picky cat. The Finicky Whiskers app has seven microservices packaged into a single bindle: one for the scoreboard, another that handles the button you click to offer the cat different types of food. “When we refresh the Finicky Whiskers site, it pulls the bindle out and reconstitutes the application from that.”

“That’s the flexibility of being able to store things in Bindle. You can package up seven different microservices all in one big application and deploy it wherever [you need]. You can store a single binary, you can store an application that has dozens of different binaries: you can probably store a file system in there if you want. It’s proven to be a very nimble format for us.”

“It’s surprising to me that no one has even thought about doing it this way,” Thomas told us. “Everyone wants to think of it as a tarball or some sort of very flat structure and Bindle is taking the approach that you might want to assemble things from different parts. Internally in Cosmonic we’ve already seen multiple instances where we need to be able to version something that has a couple of different disparate components and put it together.”

The idea behind Bindle might actually sound familiar if you’ve come across Porter, a packaging tool created by colleagues of the Bindle creators that recently reached 1.0, or the Cloud Native Application Bundle (CNAB) specification it implements. “CNAB is for installing applications, for taking an application and installing it everywhere and Bindle is for storing those things,” Thomas explained. “Those two projects came from the same team because we were thinking about that from two different angles: one is the actual deployment of the application, and one is how you store it.”

Bindle was originally planned as a way to store and share WebAssembly applications and binaries although it’s proving useful more broadly. Butcher characterized CNAB as a similar approach for the container ecosystem, calling Bindle “what we learned working with registries and CNAB — that there should be a better way to store microservice-based applications and then deliver that microservice-based application”.

CNAB and Bindle could work together, Thomas suggested. “You could have CNAB pull the artifacts that it needs from the Bindle server.”

WebAssembly isn’t the only place Bindle could be useful; “there’s many other things that we have found that don’t have anything to do with WebAssembly where it comes in handy.” The principle of having implementation details that vary between systems and allowing the runtime to make the decision about what to pull down works for a lot of scenarios.

Bindle could handle file system snapshots for building infrastructure where you need specific numbered releases, or where you want to deploy at the edge where you can’t control what hardware will be available (and don’t want to have to create multiple container images for different hardware). “I’m assuming that if Bindle continues to grow, we’ll start to see like things that are Bindle aware for specific runtimes and how to leverage those,” Thomas said.

“We’ve needed this for a long time.”

Secure Package Registry

The first special interest working group at the Bytecode Alliance (the industry group building shared implementations of WebAssembly standards) is SIG-Registries, which is working on the specification for a package registry for WASM components. “A central hub for us to share the different WebAssembly applications and binaries that we’re building and then allow people to pull from that and assemble their own applications and upload those,” Butcher explained, comparing it to npm or Docker Hub for WASM.

Bindle has been proposed as the storage layer for that, with a package API layered on top and may become the reference implementation.

With the WebAssembly component model, applications are built out of small units like Lego blocks. “A component alone depends on many other components: if you get one component, that’s not enough to run anything,” Bailey Hayes from the SIG-Registries working group explained. Bindle makes it easy to pull all the components and nested dependencies you need. “With SIG-Registries, I think we’re going to build on a lot of what Bindle has come up with and the way that all those pieces work.”

The appeal of WebAssembly isn’t just that code can run in many different places but the combination of performance and security. WebAssembly code should be designed to have only the capabilities and permissions it really needs and it’s that as much as the sandbox that offers a better security model, because components can do things like opening sockets that essentially punch a hole in the sandbox. Importing components means you’re relying on them behaving well and being well written, so the component registry needs to implement a chain of trust, Hayes said.

“It’s a complete rethinking of how we’ve been doing registries in the past, how we want to do them in the future and just like many of the other WebAssembly standards that have come before us, it’s all about having security as a first principle.”

Bindle is a good fit for that because it was designed to have security on by default. Bindles are immutable: they’re cryptographically hashed and signed and neither the name nor the contents can be changed.

“There should be no question that when I pull something back out of the registry, it is the same exact thing that somebody put into the registry before,” Taylor explained. “There is no sense in which it can be tampered with, or that a bad actor could come in and redirect one thing to another thing.”

The upcoming 0.9 release will require all bindles to be signed; “that’s always been stated since the beginning that it would be the plan,” Thomas noted, but there will be an RC release to make sure it doesn’t break anything for organizations like Fermyon, Cosmonic (where Thomas and Hayes work) and Suborbital that are already using Bindle.

Because Bindle was designed for signed bindles, this won’t significantly change the developer experience, although developers will need to do a little more work. “The Bindle key spec [means] the base default is for me to get at least the most simple level of security, which is saying I trust this host, I must know every single thing I’ve pulled down from that host is actually coming down from that host.”

On top of that organizations will be able to build more restrictive policies in WebAssembly and Bindle but still give developers flexibility, Cosmonic CEO Liam Randall pointed out. “You can say ‘only allow the import of modules that have a scan or that meet certain requirements.’ You can have organization-specific policies that would show me choices that meet my policies. Within a regular development pipeline, perhaps you have major and minor numbers of a component, and only the majors get a security assessment, and maybe in prod you would just want those but in dev, you would want other options.”

“I can look at these things at a glance and say what level of risk I want to take on,” Hayes noted.

Getting to Bindle 1.0

The 1.0 release of Bindle may not follow the 0.9 release until there’s enough usage to make it clear the spec hasn’t missed any edge cases that would need breaking changes, Thomas explained.

“We haven’t designed a registry, which is basically a storage system from the ground up. We have a background from people who’ve done little bits and pieces of it and worked with OCI and different parts of the ecosystem but because this is an entirely new way of doing things, we knew we probably got some stuff wrong and so we didn’t want to have to break the spec too soon.”

Having worked on projects like Helm, the Bindle maintainers have strict requirements around semantic versioning and backward compatibility, and the only breaking changes on the way to 1.0 would be security related. “That’s a rule we follow because all of us have been bit by trying to upgrade from one supposedly minor version of Kubernetes to the next minor version of Kubernetes and having three APIs break on you and we don’t want to do that to people.”

That doesn’t mean Bindle isn’t ready to use in production now: platforms like Fermyon and Cosmonic are built on Bindle (and Cosmonic uses it both for WebAssembly and for parts of the platform that are not built with WebAssembly). “It’s solid enough to use, it’s just not 1.0 because we wanted to reserve the right to make a breaking API change or a breaking spec change.”

That’s particularly important for a specification that Thomas says aims to be “simple, straightforward and open” and will be implemented in multiple ways for the different ways platforms might need to set up and run Bindle. There’s already an F# implementation and a proposal for implementing Bindle as a set of wasmCloud actors.

The flexibility of the silverware drawer metaphor is part of what makes Bindle so powerful because it allows for experimentation when exploring projects like SIG-Registries (which may ultimately have a more rigid package API than Bindle requires), Taylor suggested. “People can try different things and see when we make it rigid, which set of assumptions are going to hold.”

“It’s malleable enough to move with the standard but at the same time the internals are pretty solid.”

That comes from the frustration of having to constantly rebuild otherwise working systems, Butcher said. “As developers, we tend to build solutions for a specific problem but technology evolves so rapidly, that very often, it’s only a few years before we’d say, oops, we made an assumption and the assumption no longer fits the design.”

“I’ve been a software developer for 25 years now and I think no lesson is learned so clearly by experience in this industry as the lesson that technology as a whole moves forward, but individual technologies have a very short shelf life. Consequently, we constantly have to revisit assumptions: yesterday’s assumptions that seemed safe yesterday, may not be safe today and will certainly be out of date tomorrow. As an industry moving toward more flexible solutions is going to be an important thing: if we want to somehow break out of the trap of having to continually re-architect particularly the low-level stable things then we need to get better at designing with flexibility in mind.”

The post How Bindle Makes It Easy to Store and Distribute Cloud Native Apps appeared first on The New Stack.

Confidential Compute on Azure with Kubernetes


Unless you’re extremely good at security, hyperscale cloud providers are probably better at it than your organization: they have more security expertise, they patch faster, they run background checks on admins, and they have strong operational security.

But there are still risks, whether from your own admins or attacks on cloud data centers, vulnerabilities in the guest or host OS — or just the fact that while data is encrypted at rest and in motion, it’s not usually encrypted when you’re using it.

For extremely sensitive data and confidential workloads in regulated industries, that might be enough to make the cloud unsuitable. Confidential computing keeps data encrypted even in use, in memory and during computation, so you stay in control of your data from the time it’s created to when you delete it, and it’s never exposed to malicious insiders, admins or hackers, even if there are security vulnerabilities in the OS or hypervisor (assuming there are no bugs in the confidential computing stack itself).

That’s because the computation happens in a hardware-based trusted execution environment (TEE), where you have verifiable assurance for data and code integrity and data confidentiality. As well as the memory and the data in it being encrypted, the code you run in the cloud is protected, and you can verify that it hasn’t been tampered with (and the activity history is immutable and auditable too).

Confidential Cloud

With confidential cloud, “data is in the control of the customer during its entire lifecycle, whether that’s at rest, in transit, or in use,” Vikas Bhatia, head of product for Azure Confidential Computing, explained at the recent Microsoft Ignite conference. “The cloud provider is outside the trusted computing base. The code that you’re running in the cloud is protected and verified by you, the customer, with remote attestation capabilities.”

“What we see today is our customers are looking to trust as little as possible,” he noted. “They want full control over the data lifecycle.”

Analysts at Everest Group predict that confidential computing is going to grow quickly and could become a standard for end-to-end security, particularly for the public sector and for enterprises in banking, financial services, insurance, healthcare, life sciences, defense and other regulated industries, or where critical infrastructure is involved.

Bhatia listed early Azure confidential users including regulated industries like telcos, healthcare teams in disease diagnostics working with data from multiple healthcare providers in a confidential environment they can tear down completely when the research is complete, retail and advertising companies who want to do multiparty machine learning and financial services organizations building anti-money laundering systems.

“This is enabling net new scenarios in confidential computing that were not possible before,” Amar Gowda from the Azure Confidential Computing team explained in a session. “This is allowing two different institutions that could not collaborate on data because it has PII into this environment. Now because of attestation and memory protection and integrity protection, you can rest assured that the data does not leave the boundaries [or get] in the wrong hands.”

Confidential Computing on Azure

Confidential computing starts with the hardware root of trust. Azure has confidential virtual machines using Intel SGX and AMD SEV-SNP (in preview this month), as well as Nvidia A100 Tensor Core GPUs with Ampere-protected memory, which have a secure channel between trusted execution environments on both the CPU and the GPU (in limited preview for ML training and large-data AI workloads where confidentiality and integrity are key).

Intel SGX stands for Software Guard Extensions, where you partition your app into an untrusted and trusted region, and the sensitive code goes into the trusted environment. Both VM memory and RAM are encrypted, and there’s Enclave Protected Cache (EPC) memory specific to the application. With SGX the amount of encrypted memory used to be quite small, but the current DCsv3 generation of VMs has much more encrypted memory (up to 256GB) for large data workloads.

AMD SEV-SNP stands for Secure Encrypted Virtualization and Secure Nested Paging, which offers hardware protection against malicious hypervisors as well as encrypted memory: the virtual machine memory is entirely encrypted and integrity protected with keys generated by the AMD CPU that can be kept in Azure Key Vault or Azure Managed HSM (which itself relies on confidential computing), plus you can choose to pre-encrypt the operating system disk. That doesn’t need any code changes, so just deploying a workload into a confidential DCasv5 or ECasv5 AMD EPYC VM makes an application confidential with minimal performance difference.

Not having to rewrite code to use confidential computing makes it easier for Microsoft to provide its own cloud services as confidential computing services (and offer commercial clouds like the new Microsoft Cloud for Sovereignty aimed at governments who want to use the public cloud).

Azure SQL has had an Always Encrypted option using Intel SGX for several years, but you can now run SQL Server IaaS on AMD confidential VMs, Azure Virtual Desktop can now run Windows 11 in the cloud on AMD confidential VMs, and a confidential computing version of Azure Databricks is likely to be announced this year.

If you’re using the open source Confidential Consortium Framework to build decentralized networks, it’s likely that there are multiple organizations involved: the new Azure Managed Confidential Consortium Framework (built on Azure confidential computing and currently in private preview) avoids one organization having to run the infrastructure for that network.

Azure Confidential Ledger, a secure, tamper-proof, blockchain-based ledger, uses CCF running on Azure Kubernetes Service (AKS), Graham Bury, a principal PM manager on the Azure confidential computing team, told The New Stack: “You can think about a lot of these services that Microsoft builds as managed PaaS services that just happen to run on Kubernetes leveraging confidential computing.”

But Microsoft also wants to enable confidential computing for customers who are building on Kubernetes.

Confidential Clusters

Confidential computing can protect containers as well as VMs. Azure Container Instances (which is good for isolated container scenarios that don’t need orchestration, like machine learning and AI workloads or short-lived workloads you want to burst to the cloud securely) now has serverless confidential computing in limited preview (a public preview is due soon).

This doesn’t need changes to your container images and gives you a dedicated hypervisor with in-memory encryption per container group. It also has full guest attestation, so you can verify that the container is only running the components that you expect to run.

Microsoft says this is popular with data scientists deploying Python containers to use with Azure Machine Learning, which can be made confidential without code changes that might affect the model.

When you do need orchestration, you can get the same protection on AKS: each node is a virtual machine in a Virtual Machine Scale Set that you provision in Azure, and those VMs can be confidential VMs.

AKS has actually had confidential computing support since 2020, when it became the first cloud service to use Intel SGX VMs to run containerized apps built with the Open Enclave SDK. But that meant either changing those applications to partition them so the trusted code runs in the SGX enclave, or using third-party tools like Anjuna, Edgeless, Fortanix or SCONE that handle that for you. Signal, the secure messaging service, uses Intel SGX nodes in AKS for storing user contact details where neither the admins at Signal nor Microsoft can view them.

Now Azure is the first cloud service to support AMD SEV-SNP confidential computing in Kubernetes. Because confidential virtual machine node pools can now use AMD confidential VMs, you can just lift and shift your containers into a confidential environment, or move an existing AKS cluster to a more secure state by adding a confidential node pool to it — but you don’t have to make the entire cluster confidential if you only need the extra protection on specific nodes where you’re processing sensitive data.

Confidential node pools work with the full AKS feature set, like autoscaling, AKS add-ons, Azure CNI, Azure Defender for Containers and the rest. They use a customized Ubuntu 20.04 image (Microsoft is partnering with Canonical to make sure that all Azure confidential services will be supported for Ubuntu). ARM64 and Mariner images aren’t currently available, but Gowda said that Windows Server nodes will be coming soon.

AKS supports heterogeneous clusters where not all node pools use confidential computing.

You deploy a confidential node pool for AKS the same way you currently deploy node pools, using an ARM template or the Azure CLI or portal to create a node pool and picking the VM size to use for it — just pick the DC-series or EC-series VMs that offer confidential compute. You don’t need to change the code that runs in the container or the container image: just edit the pod YAML spec to target the confidential node pool (if you use node affinity, you can select the confidential computing node pools that way).
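
As a rough sketch (the resource group, cluster and pool names are placeholders, and Standard_DC4as_v5 is just one of the AMD confidential VM sizes), adding a confidential node pool to an existing cluster with the Azure CLI looks something like this:

    # Add an AMD SEV-SNP confidential node pool to an existing AKS cluster
    # (names and the VM size below are placeholders, not recommendations)
    az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster \
        --name confpool --node-count 3 --node-vm-size Standard_DC4as_v5

Workloads can then target the pool with a node selector or node affinity on the pool’s node labels, exactly as they would for any other node pool.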

As well as the data-in-use protection of using memory encryption, you also have remote guest attestation so you can be confident your workloads are deployed in the environment you expect and that only what you put in those containers is running on that hardware.

Confidential virtual machine AKS node pools are generally available now. The AMD confidential VMs they’re based on are already available in the East US, West US, North Europe and West Europe regions, with Southeast Asia coming soon, and Microsoft plans to expand to more regions “in the near future”; confidential node pools are available in all the regions that have those VMs.

Confidential Becomes Common

Not needing to rewrite code to use confidential computing in AKS makes this an appealing option for customers who have Kubernetes-based applications on their own infrastructure, where they’re unlikely to have confidential computing enabled because the server hardware to do this is very new.

They can now lift-and-shift applications they’ve not been comfortable taking to the cloud because of concerns about privacy, security compliance or data regulations, Bury suggested. “The aspiration is how can I move more of the workloads to the public cloud because I want it there by default: bring my whole container workload as is and I don’t even have to think about it because it’s encrypted memory.”

That applies to IT organizations running code build agents and code signing to achieve software supply chain hardening, financial institutions to execute data processing pipelines for dynamically spinning up containerized jobs, and telecommunication providers to meet Schrems II and other data regulatory compliances, he said.  “As well, our internal Microsoft code signing services are onboarding to our confidential VM capabilities in AKS.”

If you want to run Kubernetes on Azure yourself, you can use confidential computing VMs to host it and manage the Kubernetes deployment yourself. Some customers do that with AKS-Engine or the Kubernetes Cluster API Provider for Azure, Bury noted, “but most of the customers we talk to look for us to bring confidential computing and added isolation directly to our managed AKS.”

In the long term, once the hardware is widely available, Microsoft expects confidential computing to go from a specialist requirement for things like multiparty data analytics that needs “heroic” data protection to being as standard as encrypting data at rest and in transit.

“We’re expecting to see compute to evolve in general from computing in the clear to computing confidentially and both in the cloud and on the edge,” Bury said. “We do see it being much more general purpose over time. We expect to have confidential computing capabilities be pervasive across our infrastructure platform over time as we can make the hardware with those data protection capabilities pervasive. If we could update all of our hardware overnight and have these capabilities that would be fantastic!”

Confidential computing will also start to align with software developments intended to create more isolation in Kubernetes.

At Kubecon+CloudNativeCon this year, Microsoft announced an upcoming limited preview of support for Kata Containers in AKS, with a lightweight VM that runs in a dedicated kernel per pod, using VT virtualization extensions to create stronger workload isolation for network, memory and I/O. That promises higher security for different workloads on the same cluster and Bury hinted that isolation could make sense as one of the ways that Microsoft contributes confidential computing concepts into the open source Kubernetes space.

“With Kata containers, we can see a unification of the open-source isolation technologies available across AKS and specifically leveraged for our confidential computing stack in AKS too,” Bury said.

“AKS and Kubernetes in general can benefit from things like Kata containers to bring that level of container isolation with that specially tuned kernel, and then we can think about what we do in confidential computing, where you have a specific piece of hardware [that] enables you to verify you’re running on that hardware with that added data protection, with memory encryption. It’s just a matter of time when we can get all of these things working together,” he suggested. “We can create more and more isolation and data security and protection while keeping that Kubernetes native experience.”

He also pointed to the confidential computing support in ACI as an example of what Azure wants to offer for Kubernetes. “How can we be very user-friendly with their containers having as much isolation and data protection [as possible] from a security and privacy point of view?”

Pick a confidential compute VM for your node pool and AKS automatically enables memory encryption for that node pool

The post Confidential Compute on Azure with Kubernetes appeared first on The New Stack.

Kubernetes for Windows


Although most people think of Kubernetes and containers generally as Linux technology, Linux is not the only OS where you can use containers. Once you start running multiple containers and microservices on one or more hosts, you will need the kind of features that a container orchestrator like Kubernetes provides, such as load balancing, high availability, container scheduling, resource management, etc. Although the Kubernetes control plane currently only runs on Linux, you can still run Windows containers on Kubernetes.

Windows on Kubernetes

Windows Server 2016 introduced containers (using job objects and silo kernel objects, whereas Linux uses control groups and namespaces). Work on Windows support for Kubernetes started in 2016, with the stable release shipping in Kubernetes 1.14 in 2019. The goal wasn’t to move the entire control plane to Windows but to offer Windows Server as a compute node for Kubernetes, giving organizations an environment that would let them run all their apps in the same place.

Think of it less as bringing Kubernetes to Windows and more as bringing Windows, .NET, IIS and other Windows programming frameworks to Kubernetes so that Windows developers can use cloud native tools to build and deploy distributed apps while reducing the costs of supporting existing apps and streamlining migration off older versions of Windows as they lose support.

Now you can manage Windows and Linux containers side by side in the same Kubernetes cluster by adding Windows Server worker nodes that can run Windows containers to that cluster: They just have to be running Windows Server 2019 or later (and you need to use a CNI that’s compatible with both Windows and Linux, like Calico or flannel).

For instance, Microsoft runs many of the services that power Office 365 and Microsoft 365 in Windows containers on Azure Kubernetes Service.

Clusters with Windows support will be a mix of Windows and Linux nodes, even if the Linux node is only used for leadership roles like the API server and scheduler. But you can also deploy a Linux container running a reverse proxy or Redis cache and an IIS application in a Windows container in the same cluster or even as part of the same app, and use the same pipelines for deployment and the same tools for monitoring all the different pieces of the app.

This makes Windows containers a good way to modernize applications: You can start by “lifting and shifting” an app into a container, and then add more cloud native features at your convenience.

Windows Containers in Kubernetes

Supporting Windows containers on Kubernetes doesn’t make Windows work like Linux; admins will still be using familiar Windows concepts like ACLs, SIDs, and usernames rather than Linux-style object permissions, userIDs and groupIDs, and they can use \ in file paths the way they’re used to doing.

Linux features like huge pages aren’t available in Windows containers because they’re not a Windows feature, and you can’t make the root file system read-only the way you can in Linux containers, because the Windows registry and system processes need write access.

There are also a number of features where you need to use a slightly different option on Windows, like runAsUserName rather than runAsUser, to pick what user a container runs as or to restrict the container administrator account rather than the root user.

Windows Server containers have two user accounts by default (neither of which is visible to the container host): container user and container administrator. Container user is for running workloads that don’t need extra privileges, and it’s definitely the best choice if you’re deploying containers in a multitenant environment. Container administrator lets you install user mode services and software that persist (like IIS), create new user accounts and make configuration changes to the container OS.
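
As an illustrative sketch (the pod name and image are placeholders, and the image version has to match the Windows version of the node), running a workload as the low-privilege built-in account uses the standard Windows security context fields:

    apiVersion: v1
    kind: Pod
    metadata:
      name: iis-example                    # placeholder name
    spec:
      securityContext:
        windowsOptions:
          runAsUserName: "ContainerUser"   # low-privilege built-in account; ContainerAdministrator is the elevated alternative
      nodeSelector:
        kubernetes.io/os: windows          # keep the pod on Windows nodes
      containers:
      - name: web
        image: mcr.microsoft.com/windows/servercore/iis   # example Windows image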

You can also create user accounts with the specific permissions you need. Although you can specify file permissions for volumes for Linux containers, they are not currently applied for Windows containers, but there’s a proposal to use Windows ACLs to support that in the next version of Kubernetes.

Generally, identity is one of the places where Kubernetes for Windows is most different because it needs to support Active Directory to give applications access to resources. A Windows app talking to an external database server or file share will likely use a Windows identity for authorization and won’t get access without that AD account. But containers can’t be domain joined.

Instead, workloads can use Group Managed Service Accounts (GMSA), which assign an AD identity to the container and handle password management, service principal name management and delegation to other administrators in a way that can be orchestrated across the cluster. If a node fails and the workload gets migrated to another node in the cluster, the identity goes with it, as long as all the Windows hosts in the cluster where the pod might land are domain joined.

You can use Kubernetes with Azure Active Directory through Azure AD workload identity for Kubernetes (for both Windows and Linux container workloads). This enables Azure AD workload identity federation, so you can access resources protected by Azure AD — everything from your own Microsoft 365 tenant resources to Azure services like Azure Key Vault — without needing secrets.
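
A minimal sketch of that setup, assuming the Azure AD workload identity webhook is installed in the cluster (the names and client ID below are placeholders): the Kubernetes service account is annotated with the Azure AD application or managed identity it maps to.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: app-workload              # placeholder name
      namespace: production           # placeholder namespace
      annotations:
        # placeholder: client ID of the Azure AD app or managed identity to federate with
        azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"

Depending on the webhook version, the pod (or the service account) also carries the azure.workload.identity/use: "true" label to opt in, and the Azure AD side needs a federated credential configured for this service account’s issuer and subject.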

Storage and Networking for Windows Containers

Other areas of Kubernetes on Windows are increasingly becoming similar to the way they work on Linux.

Early on, Windows containers on Kubernetes could access only a limited range of storage types. Support for Container Storage Interface plugins (introduced in Kubernetes 1.16 and stable since 1.22) through CSI Proxy for Windows means Windows nodes can now work with a wide range of storage volume systems by using existing CSI plugins.

Similarly, Windows Kubernetes networking has moved from relying on the Host Network Service to using overlay networking to support CNI plugins, kube-proxy and network control planes like Flannel. The updated kube-proxy Next Gen is also being ported to Windows.

If you’re running into problems, this is an excellent list of troubleshooting tips for Kubernetes networking on Windows, and these tips should help you find out what’s a problem with Kubernetes or with Windows.

Understanding Isolation in Kubernetes for Windows

Windows containers can use the traditional Linux container isolation model, known in Windows Server as process isolation, where containers share the same kernel with each other (and the host), or they can use Hyper-V isolation, where each container runs in a lightweight VM, giving it its own kernel. That’s similar to the improved isolation Kata containers offer on Linux.

With process isolation, containers share the kernel with other containers on the same host as well as with the host itself, which means the kernel versions of the container image have to match.

Unless you’re using Windows Server 2022 or Windows 11 on current versions of Kubernetes, where you can carry on using an existing Windows Server 2022 or Windows 11 container image even if you update the container host, Windows containers need the OS version of the host and container image to match down to the build number (which changes in each new version of Windows). For Windows Server 2016 and older versions of Kubernetes, the match needs to be even closer: down to the revision number, which changes when you apply Windows updates.

Hyper-V containers would avoid that problem because the kernel would no longer be shared with other containers or even the container host; instead, Hyper-V would load whatever kernel a container needs, giving you backward compatibility so you can move Windows nodes to a new version of the OS without rebuilding your container images and updating apps to use them.

However, Hyper-V containers aren’t currently supported in Kubernetes: There was alpha support using the Docker runtime and Hyper-V’s Host Compute Service in earlier versions of Kubernetes, but it only supported one container per pod. That’s been deprecated, and the work to enable Hyper-V containers with the containerd runtime and v2 of the Host Compute Service is proceeding slowly.

But another long-awaited container option is now available. Although Windows containers don’t support privileged containers, you can get similar functionality with the new HostProcess containers. HostProcess containers have access to everything on the container host, as if they were running directly on it. You don’t want to use them for most workloads, but they are useful for administration, security and monitoring — including managing Kubernetes itself with tasks like deploying network and storage plugins or kube-proxy.

A HostProcess container can access files and install drivers or system services on the host. That’s not a way to deploy server workloads, but it gives you one place where you can run cluster management operations, which means you can reduce the privileges needed for other Windows nodes and containers. Networking and storage components, or tasks like log collection, installing security patches or managing certificates, can now run in (extremely small) containers that run automatically after you spin up a new node, rather than you having to log in manually and run them directly as Windows services on the node.

The stable release of HostProcess containers is on track for Kubernetes 1.26, which is due for release in December 2022.

Scheduling Windows Containers in Kubernetes

You can deploy Windows nodes to a cluster with kubeadm or the cluster API, and the kubectl commands for creating and deploying services and workloads work the same way for Linux and Windows containers. But you need to do some explicit infrastructure planning (and remember, you’ll be deploying those nodes by interacting with the Kubernetes control plane, running on Linux).

You can’t mix and match container types on a single pod: A pod can run either Windows or Linux containers. Windows nodes can only run Windows containers, and Linux nodes can only run Linux containers, so you need to use node selectors to pick what operating system deployment will run on.

The IdentifyPodOS feature gate that adds an OS field to the pod spec defaults to enabled in Kubernetes 1.25, so you can use that to mark which pods run Windows Server, allowing the kubelet to reject pods that shouldn’t run in the node because they have the wrong OS — but it’s not yet used for scheduling. It’s still worth using because it gives you much clearer error messages if a Windows container fails because it ends up on a Linux node (or a Linux container fails because it ends up on a Windows node).

If you’re adding Windows nodes to a cluster where you already have Linux workloads deployed, you will want to set taints on the Windows nodes so that if a Linux node fails over, those applications won’t end up on a Windows node (or vice versa). You can also use taints to mark every Windows node with the OS build it runs (because while you can run multiple Windows Server versions in a cluster, the Windows Server version on the pod and node need to match). You can simplify that by using the RuntimeClass variable to encapsulate the taints and tolerations that define the build of Windows that you need.
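
Putting those pieces together, a hedged sketch of a Windows pod spec might look like the following (the pod name, image and the os=windows taint are examples rather than required values; kubernetes.io/os is the standard label the kubelet sets):

    apiVersion: v1
    kind: Pod
    metadata:
      name: win-app                          # placeholder name
    spec:
      os:
        name: windows                        # the pod OS field added by IdentifyPodOS
      nodeSelector:
        kubernetes.io/os: windows            # schedule only onto Windows nodes
      tolerations:
      - key: "os"                            # example taint you might have applied to Windows nodes
        operator: "Equal"
        value: "windows"
        effect: "NoSchedule"
      containers:
      - name: app
        image: mcr.microsoft.com/windows/servercore:ltsc2022   # example image; the build must match the node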

If you’re using Helm charts for deployment, check that they cope with heterogeneous clusters or add taints and tolerations to steer containers to the right nodes.

Another thing to consider when adding Windows nodes to a cluster is increasing the resources you specify in the template. While they don’t need significant amounts of memory when in active use, because read-only memory pages are shared between multiple containers, Windows Server containers tend to need more memory to start up successfully — and the startup time for the first container may be longer. Containers will crash if applications need more memory than they have access to, and the Windows background services running in containers mean the memory allocation likely needs to be larger than for a Linux container. If your templates were specified for Linux containers, increasing the memory allocation will avoid issues for Windows containers — while still giving you much higher density than you would get with virtual machines.
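
For example, the resources section of a Windows container in a pod template might be bumped up along these lines (a sketch only; the numbers are illustrative, not recommendations):

    containers:
    - name: web
      image: mcr.microsoft.com/windows/servercore/iis   # example image
      resources:
        requests:
          memory: "800Mi"    # illustrative: higher than a comparable Linux container to cover startup and background services
          cpu: "500m"
        limits:
          memory: "1.5Gi"    # headroom so the app isn't killed for exceeding its limit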

Resource management is slightly different for Windows nodes too. CPU and memory requests in pod container specifications can help avoid overprovisioning a node, but they won’t guarantee resources if a node is already overprovisioned.

The metrics for operations like pod scaling are the same as for Linux. Node Problem Detector can monitor Windows nodes, although it’s not yet using Windows Management Instrumentation (WMI), so only a few metrics are included. Use Microsoft’s open source LogMonitor tool to pull metrics from the Windows log locations like ETW, Event Log, and custom log files that Windows apps typically use.

Which Windows Server Versions Are Best for Kubernetes

Because Windows Server versions have end-of-support dates, you may need to think about upgrading the OS version of containers, which is easier than with a virtual machine: You can just edit the dockerfile for the container (and upgrade the node so that it matches), although that doesn’t help with any changes you might need to make to an app when the version of Windows changes.

Windows Server 2016 used to be supported by Kubernetes, but that didn’t allow multiple containers per pod. Windows Server 2019 made significant changes to overlay networking that enabled that, adding support for CNI networking plugins like Calico, so you now need to use Windows Server 2019 or later for Kubernetes pods, nodes and containers. But different Windows builds are supported depending on which version of Kubernetes you’re running.

This was a little more complicated when Windows Server had more frequent Semi-Annual Channel (SAC) releases, but Microsoft is now suggesting that organizations that want to upgrade versions of Windows Server more quickly to get improvements in container support move to Azure Stack HCI and use Azure Kubernetes Service, so you only need to think about that if you’re already running existing SAC releases. With Kubernetes 1.25, Windows Server 2019, Windows Server 2022 and Windows Server 20H2 (the final SAC release) are supported.

Older SAC releases are supported on Kubernetes 1.17 to 1.19, but the point of using SAC releases was to take advantage of new features more quickly, so most organizations affected by this should be in a position to upgrade to Windows Server 2022. That has smaller base container images but also includes more container features like virtualized time zones for distributed apps, running apps that depend on Active Directory without domain-joining your container host using GMSA, IPv6 support for Windows containers and other networking improvements.

If you’re using GKE, you can’t create new containers using SAC images anymore.

Going forward, you will be able to run a Windows Server 2022 container image on all new versions of Windows 11 and Windows Server until the next Long-Term Servicing Channel release, so you can build a Windows container image now using the Windows Server base OS image, and it will run on releases up to and including Windows Server 2025 (or whatever Microsoft calls the next LTSC release). At that point, Microsoft will add a deprecation scheme, so the base OS image for Windows for that new release will run on the next LTSC after that.

That gives Microsoft more freedom to change the APIs between user and kernel mode as it needs to, while allowing users to run one container image for longer by using process isolation.

Kubernetes Runtimes on Windows

For many developers, Kubernetes is synonymous with Docker containers, but while the Docker runtime has been widely used, containerd has been supported as a Kubernetes runtime since 2018 and has been the interface for Windows containers since Kubernetes 1.18. Using containerd as the runtime will eventually allow Hyper-V isolated containers to run on Kubernetes, giving you a secure multitenant boundary across Windows containers; it is also required for node features like TerminationGracePeriod.

When Mirantis bought Docker Enterprise and renamed it the Mirantis Container Runtime, it also committed to maintaining the dockershim code with Docker: Windows containers using that runtime will still build and run in the same way, but support for them now comes from Mirantis rather than Microsoft. You can also use the Moby runtime for Windows containers.

Kubernetes on the Windows Desktop

If you’re running Kubernetes infrastructure, you likely don’t want the overhead of virtualization. But if you want to work with Kubernetes on your Windows desktop — for development or just to learn the API — you can run it in a Linux VM with Hyper-V, or use WSL to run a Linux distro directly.

If you’re using Docker for Windows, Rancher Desktop, or minikube and a recent build of Windows 10 or 11, they integrate with WSL 2, so you get better performance, better memory usage and integration with Windows (simplifying working with files). Kind and k3s will both run on WSL 2 or Hyper-V, but you may need some extra steps (and as Kind stands for Kubernetes in Docker, you’ll need that or Rancher Desktop anyway). You can install Docker on WSL 2 without Docker Desktop if you only want Linux containers.

Alternatively, if you’re getting started with Kubernetes on Windows and you want to quickly build your first Windows Kubernetes cluster to try things out — or to create a local development environment — the SIG Windows dev-tools repo has everything you need to create a two-node cluster from scratch, with your choice of production or leading-edge Kubernetes versions. This uses Vagrant to create VMs with Hyper-V or VirtualBox, create and start the Kubernetes cluster, spin up a Windows node, join it to the cluster and set up a CNI like Calico.

Is Kubernetes Right for All Windows Apps?

The .NET, ASP.NET and IIS applications many enterprises run can be containerized, as can applications that consume Windows APIs like DirectX (like a game server), but as always with containers, you need to think about the state. The rule of thumb is that you can containerize Windows apps on Kubernetes if critical data like the state is persisted out of the process and rebooting the app fixes common errors. If that would lose state, you’ll need to think about rewriting the app or adding extra jobs to the workload. If more than one process can work on shared data, Kubernetes should be a good way to scale the app.

Many IIS and ASP.NET apps have hardcoded configuration in web.config. To migrate to Kubernetes, you’ll want to move application configuration and secrets out of the code and into environment variables (so the workload knows whether it’s running in a test or production environment, say), referenced in both the web.config file and the YAML file for the application. If you don’t want to rewrite the code, you can do that by calling a PowerShell script from the dockerfile for the container image.
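
A minimal sketch of that pattern (the base image is the standard ASP.NET Framework image, Set-WebConfig.ps1 is a hypothetical script that rewrites web.config values from environment variables, and ServiceMonitor.exe is the process that image normally uses to watch IIS):

    # Hypothetical Dockerfile: run a config-rewriting script, then start IIS as usual
    FROM mcr.microsoft.com/dotnet/framework/aspnet:4.8
    COPY ./app/ /inetpub/wwwroot/
    COPY Set-WebConfig.ps1 C:/Set-WebConfig.ps1
    # Rewrite web.config from environment variables, then hand off to the base image's ServiceMonitor
    ENTRYPOINT ["powershell", "-Command", "C:/Set-WebConfig.ps1; C:/ServiceMonitor.exe w3svc"]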

There are tools built in to the Windows Admin Center to help you containerize existing Windows apps: It has the option to bring containers to Azure, but you can use it to create images and use them on any Kubernetes infrastructure.

Will the Kubernetes Control Plane Ever Come to Windows?

Running Kubernetes means either running your own Linux infrastructure or using a cloud Kubernetes service, but even enterprises with large numbers of Windows Server workloads increasingly have Linux expertise. Although there have been discussions about bringing the Kubernetes control plane to Windows (and even a few prototypes, because most of the components needed to run nodes as leaders can be ported to run on Windows), the broader Kubernetes ecosystem for logging, monitoring and other operations tooling is based on Linux. Even with technologies like eBPF coming to Windows, replacing or migrating all of that to Windows would be a significant amount of work, especially when VMs and WSL can handle most scenarios.

But as Kubernetes is increasingly used at the edge, especially in IoT scenarios where resources are often severely constrained, the overhead of a Linux VM to work with Windows containers can be prohibitive. There are a lot of edge locations where IoT devices and containers collect and process data — an automated food kiosk in a shopping mall, a pop-up store at a festival or an unattended drill head on a small oil field — where Kubernetes would be useful but running a management server is challenging.

Brendan Burns, Kubernetes co-founder and corporate vice president at Microsoft, mentioned in a recent Azure event that while the team had expected that customers would deploy bigger and bigger clusters, instead “people were deploying lots and lots of small clusters.” IoT is likely to make that even more common.

Microsoft’s new AKS-lite Kubernetes distribution designed for edge infrastructure runs on IoT Enterprise, Enterprise, and Pro versions of Windows 10 or 11 on PC class hardware, with Kubernetes or k3s running in a Linux VM (the private preview initially ran only Windows containers using Windows IoT images, although Linux container support is available in the public preview).

The value of Kubernetes is the API it delivers more than the specific Linux implementation that delivers that, and the strong CNCF certification process means that the many Kubernetes distributions compete on the tools and enhancements they include and the choices they make about runtimes, networking and storage to suit particular scenarios, rather than on the fundamentals of Kubernetes. If a scenario like orchestrating IoT containers makes it useful, perhaps a future Windows Kubernetes distribution that doesn’t rely on Linux VMs will join the list.

The post Kubernetes for Windows appeared first on The New Stack.

Kubernetes Dashboards: Everything You Need to Know


Kubernetes comes with its own web UI, known as Dashboard, for deploying containerized applications to a cluster using wizards, troubleshooting workloads and managing cluster resources. But there are other options as well.

What Is a Kubernetes Dashboard?

Having one place to see logs, metrics and error reports is useful, and if you’d rather use a graphical interface than the command line, you can initiate rolling updates, scale pods and nodes, create and restart pods and other resources like jobs and deployments, or see all the running services and edit them.

The Kubernetes Dashboard is actually a container that you have to choose to deploy, so it won’t be consuming resources if you choose to use a different Kubernetes dashboard for managing, monitoring and troubleshooting applications and the cluster itself.

Benefits of Kubernetes Dashboards

There’s a wide range of alternative Kubernetes dashboards: some aim to simplify operations and provide guardrails, others add advanced features or integrate Kubernetes with existing management ecosystems. Some run on the cluster, some on your local workstation and you can also consider SaaS solutions like Datadog and ContainIQ: what you pick will depend on whether you want to monitor the Kubernetes cluster itself, the performance of host servers, the application and service layer or the entire stack.

Additional Kubernetes Dashboard Options

You may also want to consider the implications of dashboards like Headlamp, Lens or Octant having both read and write access, so you can use them to make changes the way you would with kubectl. This can be helpful, but if you use GitOps for automated management and want to avoid out-of-band changes through these tools, remember to use RBAC and admission controllers.

Headlamp

Kinvolk’s web UI for Kubernetes starts out minimal with panes for cluster, workload, storage, network and security information with a mix of headline metrics and details of different objects, but you can extend Headlamp significantly or just modify the interface with frontend plugins (which are easy to author as they’re written in JavaScript).

Even without that, Headlamp is extremely capable, has good performance — and we like that it gives you the documentation for Kubernetes objects right in the interface.

Unusually, Headlamp gives you the choice of running it locally on your desktop or directly in the cluster if it’s used frequently. Authentication is through OpenID Connect (OIDC).

Skooner (Previously K8dash)

A fully web-based dashboard with both a detailed desktop view and a simpler interface optimized for running on mobile devices so you can get a real-time view of your cluster from almost anywhere, Skooner is both powerful and popular — and it’s now a CNCF sandbox project, which may see missing features like multicluster support arrive in future.

If you’re already using OIDC to authenticate to your cluster, you can use that to log into the dashboard by adding the relevant environment variables; if not, you can use service account tokens or a NodePort service. Skooner is easy to install and only runs one service, but it does rely on metrics-server for real-time metrics. You can monitor and manage a range of cluster objects — namespaces, nodes, pods, replica sets, deployments, storage, RBAC configurations and workloads — see logs and documentation, edit resources with the YAML editor or SSH into a running pod using a terminal in your browser window.

Weave Scope

Open source maintained by Weaveworks, Scope is a monitoring, visualization and management web platform for Docker and Kubernetes that automatically generates maps of your container infrastructure and specific applications to help you troubleshoot stability and performance issues. You can see topologies for processes, containers, orchestrators and hosts and drill in for details and metrics, or launch a command line to interact with containers if you want to do more than pause, stop, restart or delete.

Lens

Increasingly, developers need to do at least a modicum of Kubernetes operations. Lens is actually a desktop IDE that catalogs your Kubernetes cluster and lets you explore and create resources even in extremely large clusters with thousands of pods. Lens Desktop is open source and maintained by Mirantis, and there are also various commercial modules for security scanning and collaboration that integrate with it.

Getting set up is fairly straightforward because the installer uses the existing kubeconfig file on your device so you don’t have to set up extra permissions or authentication options. If you want to display metrics rather than a catalog view (think directories of files or Slack channels) that lets you view and edit pods, deployments, namespaces, storage, networks, jobs, custom resources and other objects in your cluster, you can configure it to work with Prometheus to see live statistics and real-time log streams. Developers can use Lens to install apps from Helm charts, edit objects in a terminal window rather than using kubectl or add a limited range of plugins for more functionality.

If you just want a view of a Kubernetes cluster in your IDE, Kubernator is a Visual Studio Code extension that gives you a Kubernetes object tree viewer and manifest editor — but no dashboards.

Octant

Originally developed by Heptio and adopted by VMware as part of its Tanzu project, Octant appears to no longer be in active development and it hasn’t reached a 1.0 release. However, it’s still popular and you may find it useful for visualizing cluster workloads, namespaces, metadata and other real-time information that developers need to get an overview of how applications run on the cluster — especially if you don’t have a deep knowledge of Kubernetes already.

Octant runs locally on the workstation you use to manage Kubernetes clusters so it doesn’t use up cluster resources (which will be especially welcome if you’re using k3s, KIND or similar options for getting Kubernetes on systems with minimal resources). Like Lens, it uses your kubeconfig file for setup.

Octant calls itself “a platform with a dashboard view” rather than a dashboard. The interface is organized into applications, an overview of the Kubernetes namespace, a cluster overview and a panel for your choice of plugins (the extensibility is one of the big advantages of Octant), with diagrams to show the structure of applications, label filtering to narrow down what you see and color coding to help you spot potential issues.

Kube-Ops View

Kubernetes Operational View (KOV) is a read-only system dashboard that runs locally and is designed to give you an overview of multiple clusters on one screen. The rather retro console-style GUI is actually written in WebGL. KOV isn’t designed as a Kubernetes Dashboard replacement because you can’t manage applications or interact with the cluster, but you can see status and resource usage at a glance.

 

Prometheus

You might already be using Prometheus to monitor containerized workloads since this graduated CNCF project is an extremely popular monitoring and alerting toolkit with service discovery for many tools and clouds and integrations for a wide range of applications, usually used in conjunction with Grafana for creating dashboards. Kubernetes exposes metrics directly in the Prometheus format through the kube-state-metrics service, so you can use it to monitor your Kubernetes infrastructure as well as container workloads. This Grafana template for setting up the Kubernetes monitoring dashboard is a quick way to get started — or you can use a tool like Lens to consume Prometheus metrics.
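
As a hedged illustration (the metric names come from kube-state-metrics and cAdvisor; the queries are sketches rather than recommended alerts), dashboard panels are typically built from PromQL queries like these:

    # Pods per namespace that are not in the Running phase (kube-state-metrics)
    sum by (namespace) (kube_pod_status_phase{phase!="Running"})

    # Container memory working set as a fraction of the configured limit (cAdvisor)
    container_memory_working_set_bytes / container_spec_memory_limit_bytes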

Prometheus is powerful, fully free and has an active community that can help you get up to speed. Bear in mind that it is a system to collect and process metrics, not an event logging system, so you may prefer something like Grafana Loki for handling logs; and it’s not designed to store data for a long time, so it’s less useful for long-term trend analysis.

The post Kubernetes Dashboards: Everything You Need to Know appeared first on The New Stack.

How to Build The Right Platform for Kubernetes


Kubernetes is an orchestrator. It’s how you deploy, network, load balance, scale and maintain containerized apps. And each of those workloads has its own architecture, whether that’s stateful or stateless, a monolith that you’ve containerized or microservices that you use with a service mesh, batch jobs or serverless functions.

But you also need to think about the architecture of your Kubernetes infrastructure itself: how you build the platform where Kubernetes runs.

Kubernetes is flexible enough to deploy almost any kind of application on almost any kind of hardware, in the cloud or elsewhere: in order to be both that generic and that powerful, it’s extremely configurable and extensible. That leaves you with a lot of architectural choices to make.

These choices include whether you make all the individual configuration choices yourself, follow the default options in tools like VMware Tanzu or Azure Arc that offer a more integrated approach to deploying and managing infrastructure — or go with a managed cloud Kubernetes service that still gives you choices about the resources you deploy but will have quick starts, reference architectures and blueprints designed for common application workloads.

Planning Kubernetes Resources

Your Kubernetes infrastructure architecture is the set of physical or virtual resources that Kubernetes uses to run containerized applications (and its own services), as well as the choices that you make when specifying and configuring them.

You need to decide what virtual machines (or bare metal hardware) you need for the control plane servers, cluster services, add-ons, clusters, data store and networking components, how many nodes you need on your cluster and what memory and vCPU they should have based on the workload and resource requirements for pods and services.

Autoscaling lets you adjust capacity up or down dynamically, but you need to have the underlying capacity available. You need to think about the best platform for hosting your Kubernetes clusters: infrastructure in your own data center, at the edge, with a hosting provider, or in a public, private or hybrid cloud.

Some of that will be dictated by the needs of your workloads: if they’re primarily stateless (or if it’s easy to store that state externally), you can keep cloud costs down by using spot instances, which are deeply discounted but might also be interrupted suddenly. You need to know something about the size, complexity and scalability of the applications you plan to run and the amount of control and customization you’ll need, as well as factoring in the performance, availability and cost of the resources you’ll be using.

Originally, Kubernetes was built with the assumption that all the hardware it was running on would be fundamentally similar and effectively interchangeable, because it was developed to take advantage of the commodity servers common in cloud Infrastructure as a Service (IaaS).

But even in the cloud, different workloads still need very different resources and Kubernetes has evolved to support much more heterogeneous infrastructure: not just Windows nodes as well as Linux, but GPUs as well as CPUs, Arm processors as well as x86. There is even the option to use certain classes of Linux devices as nodes.

If you’re using cloud IaaS for your Kubernetes virtual machines or a managed cloud Kubernetes service like AKS or EKS, you can choose the appropriate instances for your VMs. If you’re building your own Kubernetes infrastructure at the edge, you might pick Arm hardware or consumer-grade Intel NUCs to run a less demanding Kubernetes distribution like k3s in a restaurant or retail store, where you don’t have the facilities for data-center grade hardware.

Depending on the Kubernetes distribution you choose, you may also need to think about the host OS you want and which container runtime you’re going to use. Will you run your own container registry or only pull images from public registries? Where will you store secrets? Using HashiCorp Vault or a managed key store from your cloud provider means you won’t have credentials in your deployment pipeline where they might leak.

Multi-Cluster K8s Infrastructure Architecture

You also need to think about possible failure: do you need highly available clusters that run multiple replicas of key control plane components or will you be running a multi-cluster architecture?

For smaller Kubernetes infrastructure, you can separate different workloads using namespaces: logical partitions that let you isolate and manage different applications, environments and projects on one cluster. But you can also use a single Kubernetes control plane to manage multiple clusters of nodes, putting workloads on distinct clusters for better security and performance.

If you have regulatory requirements or strict limits on what latency is acceptable, need to enforce different policies and permissions, or want to avoid a single point of failure for an application that requires zero downtime, this lets you orchestrate applications in different locations – including on different cloud providers – but still have one place to access that infrastructure. That simplifies migrating applications from cluster to cluster, whether that’s for scaling or disaster recovery, although it also introduces significant complexity.

Networking your Kubernetes Infrastructure

You also need to plan your service discovery options and network topology, including the firewall and VPN connectivity, as well as the network plugins, DNS settings, load balancer and ingress controller for the cluster.

Think about access management: you will need to deploy role-based access control (RBAC) to enforce fine-grained permissions and policies for your users and resources, and make sure you’re securing admin access. But you also need to manage machine identities for workloads that need access to existing data stores.

Native Kubernetes user authentication uses certificates: if you need centralized control and governance for user access, you will probably want to use your existing identity provider for authentication.

Architect for Managing Kubernetes

Since Kubernetes is built to make it easy to scale applications, while you can make manual changes to individual settings like liveness and readiness probes, it’s really designed for declarative configuration management. You write configuration files in YAML (or use a tool that emits those for you) to tell Kubernetes how an application should behave, and Kubernetes handles making that happen.

Instead of tweaking settings, you should focus on automating for repeatability using Infrastructure as Code: set up the configuration as version-controlled, auditable code and apply it as often as you need (or restart it if there’s a problem), getting the same system every time.

Repeatable, immutable infrastructure where you treat clusters as cattle (rather than pets that you name and hug and care about individually) is what Kubernetes is designed for. Preparing for that is how you reduce the effort of ongoing management and actually operating containers in production.

You can extend this to policy management and governance as well as application delivery using a GitOps workflow with Flux or Argo CD that deploys application updates and keeps clusters in the desired state all the way from bootstrapping to configuration updates. You’ll want to collect metrics and track performance: most workloads emit Prometheus metrics but you’ll also need to think about a monitoring dashboard and what logging you want to enable.

You’ll need to monitor your container infrastructure for threats and security risks, as well as making sure your VM hosts are appropriately hardened. Again, thinking about the tools and processes you’ll use for that while you’re planning your Kubernetes infrastructure architecture will make it easier to make sure you don’t miss anything.

Understanding Kubernetes Architecture

Putting all of this together isn’t trivial and you can learn a lot from how other Kubernetes users have structured their infrastructure architecture.

“You’re trying to acquire eight years of Kubernetes development before you can be productive with it. That’s too much to ask. You need an almanac that helps you navigate and avoid the icebergs,” cautioned Lachlan Evenson, former Kubernetes release lead and steering committee member. Evenson co-authored “Kubernetes Best Practices” with Kubernetes co-founder Brendan Burns as a companion guide that offers some of that.

But you should still expect to spend time figuring out what infrastructure architecture will best suit your particular workloads and acquiring the expertise to run it.

The post How to Build The Right Platform for Kubernetes appeared first on The New Stack.


Will JavaScript Become the Most Popular WebAssembly Language?


Since it grew out of the browser, it’s easy to assume that JavaScript would be a natural fit for WebAssembly. But originally, the whole point of WebAssembly was to compile other languages so that developers could interact with them in the browser from JavaScript (compilers that generate Wasm for browsers create both the Wasm module and a JavaScript shim that allows the Wasm module to access browser APIs).

Now there are multiple non-browser runtimes for server-side WebAssembly (plus Docker’s Wasm support), where Wasm modules actually run inside a JavaScript runtime (like V8), so alignment with JavaScript is still important as WebAssembly becomes more of a universal runtime.

Wasm is intentionally polyglot and it always will be; a lot of the recent focus has been on supporting languages like Rust and Go, as well as Python, Ruby and .NET. But JavaScript is also the most popular programming language in the world, and there’s significant on-going work to improve the options for using JavaScript as a language for writing modules that can be compiled to WebAssembly (in addition to the ways WebAssembly already relies on JavaScript), as well as attempts to apply the lessons learned about improving JavaScript performance to Wasm.

Developer Demand 

When Fermyon released SDKs for building components for its Spin framework using first .NET and then JavaScript and TypeScript, CEO Matt Butcher polled customers to discover what languages they wanted to be prioritized. “[We asked] what languages are you interested in? What languages are you writing in? What languages would you prefer to write in? And basically, JavaScript and TypeScript are two of the top three.” (The third language developers picked was Rust — likely because of the maturity of Rust tooling for Wasm generally — with .NET, Python and Java also proving popular.)

Suborbital saw similar reactions when it launched JavaScript support for building server-side extensions, which quickly became its most popular developer language, Butcher told us.

It wasn’t clear whether the 31% of Fermyon customers wanting JavaScript support and the 20% wanting TypeScript support were the same developers or a full half of the respondents, but the language had a definite and surprising lead. “It was surprising to us; that momentum in a community we thought would be the one to push back the most on the idea that JavaScript was necessary inside of WebAssembly is the exact community that is saying no, we really want [JavaScript] support in WebAssembly.”

Butcher had expected more competition between languages for writing WebAssembly, but the responses changed his mind. “They’re not going to compete. It’s just going to be one more place where everybody who knows JavaScript will be able to write and run JavaScript in an emerging technology. People always end up wanting JavaScript.”

“I think at this point, it’s inevitable. It’s going to not just be a WebAssembly language, but likely the number one or number two WebAssembly language very quickly.”

While Butcher pointed at Atwood’s Law (anything that can be written in JavaScript will be), director of the Bytecode Alliance Technical Steering Committee Bailey Hayes brought up Gary Bernhardt’s famous Birth and Death of JavaScript (which predicts a runtime like WebAssembly and likens JavaScript to a cockroach that can survive an apocalypse).

“Rust can be hard to learn. It’s the most loved language, but it also has a pretty steep learning curve. And if somebody’s just getting started, I would love for them to start working with what they know.” Letting developers explore a new area like WebAssembly with the tools they’re familiar with makes them more effective and makes for a better software ecosystem, Hayes suggested. “Obviously we’re all excited about JavaScript because it’s the most popular thing in the world and we want to get as many people on WebAssembly as possible!”

What Developers Want to Do in JavaScript 

Butcher put WebAssembly usage into four main groups: browser applications, cloud applications, IoT applications and plugin applications. JavaScript is relevant to all of them.

“What we have seen [at Fermyon] is [developers] using JavaScript and WebAssembly to write backends for heavily JavaScript-oriented frontends, so they’ll serve out their React app, and then they’ll use the JavaScript back end to implement the data storage or the processing.”

There are obvious advantages for server-side Wasm, Hayes pointed out. “Folks that do server-side JavaScript are going to roll straight into server-side Wasm and get something that’s even smaller and starts even faster: they’re going to see benefits without hardly any friction.”

“People are very excited about running WebAssembly outside the browser, so let’s take the most popular language in the world and make sure it works for this new use case of server-side WebAssembly.”

There were some suggestions for what else JavaScript in WebAssembly would be useful for that struck Butcher as very creative. “One person articulated an interesting in-browser reason why they want JavaScript in WebAssembly, that you can create an even more secure JavaScript sandbox and execute arbitrary untrusted code inside of WebAssembly with an interface to the browser’s version of JavaScript that prevents the untrusted JavaScript from doing things to the trusted JavaScript.”

Being able to isolate snippets of untrusted code in the Wasm sandbox is already a common use case for embedded WebAssembly: SingleStore, Scylla, Postgres, TiDB and CockroachDB have been experimenting with using Wasm for what are effectively stored procedures.

Fastly’s js-compute runtime is JavaScript running on WebAssembly for edge computing, Suborbital is focusing on plugins (where JavaScript makes a lot of sense), Shopify recently added JavaScript as a first-class language for WebAssembly functions to customize the backend, and Redpanda shipped WebAssembly support some time ago (again using JavaScript).

Redpanda’s WebAssembly module exposes a JavaScript API for writing policy on how data is stored on its Kafka-compatible streaming platform, and CEO Alex Gallego told us that’s because of both the flexibility and popularity of JavaScript with developers.

The flexibility is important for platform developers. “When you’re starting to design something new, the most difficult part is committing to a long-term API,” he noted. “Once you commit, people are going to put that code in production, and that’s it: you’re never going to remove that, you’re stuck with your bad decisions. What JavaScript allows you to do, from a framework developer perspective, is iterate on feedback from the community super-fast and change the interface relatively easily because it’s a dynamic language.”

With JavaScript, developers get a familiar programming model for business logic like masking social security numbers, finding users in specific age groups, or credit-scoring IP addresses — all without needing to be an expert in the intricacies of distributed storage and streaming pipelines. “The scalability dimensions of multithreading, vectorization instructions, IO, device handling, network throughput; all of the core gnarly things are still handled by the underlying platform.”

JavaScript: Popular and Performant

Appealing to developers is a common reason for enabling JavaScript support for writing WebAssembly modules.

When a new service launches, obviously developers won’t have experience with it; but because they know JavaScript, it’ll be much easier for them to get up to speed with what they want to do. That gives platforms a large community of potential customers, Gallego noted.

“It gives WebAssembly the largest possible programming community in the world to draw talent from!”

“WebAssembly allows you to mix and match programming languages, which is great. But in practical terms, I think JavaScript is the way to go. It’s super easy. It’s really friendly, has great packaging, there are a million tutorials for developers. And as you’re looking at expanding talent, right, which is challenging as companies grow, it’s much easier to go and hire JavaScript developers.”

“When it comes to finding the right design for the API that you want to expose, to me, leaning into the largest programming community was a pretty key decision.”

“JavaScript is one of the most widely used languages; it’s always very important because of adoption,” agreed Fastly’s Guy Bedford, who works on several projects in this space. “WebAssembly has all these benefits which apply in all the different environments where it can be deployed, because of its security properties and its performance properties and its portability. All these companies are doing these very interesting things with WebAssembly, but they want to support developers to come from these existing ecosystems.”

JavaScript has some obvious advantages, Butcher noted: “the low barrier to entry, the huge variety of readily available resources to learn it, the unbelievably gigantic number of off-the-shelf libraries that you can pull through npm.”

JavaScript could become the SQL equivalent for Wasm

Libraries are a big part of why using JavaScript with WebAssembly will be important for functionality as well as adoption. “If you’ve developed a library that’s very good at matrix multiplication, you really want to leverage the decade of developer hours that it took you to build that library.” With those advantages, JavaScript could become the SQL equivalent for Wasm, Gallego suggested.

The 20 years of optimization that JavaScript has had are also a big part of the appeal. “There’s so much money being poured into this ecosystem,” he pointed out. “Experts are very financially motivated to make sure that your website renders fast.” The programming team behind the V8 JavaScript engine includes the original creator of Java’s garbage collector. “The people that are focused on the performance of JavaScript are probably the best people in the world to focus on that; that’s a huge leg up on anything else.”

“I think that’s why JavaScript continues to stay relevant: it’s just the number of smart, talented people working on the language not just at the spec level, but also at the execution level.”

“Single thread performance [in JavaScript] is just fantastic,” he noted: that makes a big difference at the edge, turning the combination of WebAssembly and JavaScript into “a real viable vehicle for full-blown application development”.

Similarly, Butcher mused about the server-side rendering of React applications on a WebAssembly cloud to cater to devices that can’t run large amounts of JavaScript in the browser.

“V8 has all of these great performance optimizations,” he agreed. “Even mature languages like Python and Ruby haven’t had the same devoted attention from so many optimizers [as JavaScript] making it just a little bit faster, and just a little more faster.”

“The performance has been pretty compelling and the fact that it’s easy to take a JavaScript runtime and drop it into place… I looked at that and of course, people would want a version that would run in WebAssembly. They can keep reaping the same benefits they’ve had for so long.”

But WebAssembly isn’t quite ready for mainstream JavaScript developers today.

“JavaScript has this low barrier to entry where you don’t have to have a degree or a bunch of experience; it’s a very accessible language. But if you’re a JavaScript developer and you want to be using WebAssembly it’s not easy to know how to do that,” Bedford warned.

Different Ways to Bring JavaScript to Wasm

You can already use JavaScript to write WebAssembly modules, but “there are significant updates coming from the Bytecode Alliance over the next few months that are going to enable more JavaScript,” Cosmonic CEO Liam Randall told us.

“When we think about what the big theme for WebAssembly is going to be in 2023, it really comes down to components, components, components.”

“There have been significant advancements this year in the ability to build, create and operate components and the first two languages down the pipe are Rust and some of this JavaScript work,” Randall continued.

Currently, the most popular approach is to use the very small (210KB) QuickJS interpreter, popularized for this purpose by Shopify and now included in a number of WebAssembly runtimes. For example, Shopify’s Javy and Fermyon’s spin-js-sdk use QuickJS with the Wasmtime runtime (which has early bindings for TypeScript but doesn’t yet include JavaScript as an officially supported language), and there’s a version of QuickJS for the CNCF’s WasmEdge runtime that supports both JavaScript in WebAssembly and calling C/C++ and Rust functions from JavaScript.

QuickJS supports the majority of ECMAScript 2020 features, including strings, arrays, objects and the methods that support them, async generators, JSON parsing, RegExps, ES modules and optional operator overloading, big decimal (BigDecimal) and big binary floating-point numbers (BigFloat). So it can run most JavaScript code. As well as being small, it starts up fairly quickly and offers good performance for running JavaScript — but it doesn’t include a JIT compiler.
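
To make that concrete, here is the kind of everyday modern JavaScript (an async generator, JSON parsing, a RegExp and ES module syntax) that a QuickJS-based Wasm runtime can be expected to handle; the snippet is purely illustrative rather than taken from any particular SDK:

```js
// Illustrative ES2020-era JavaScript: async generator, JSON, RegExp, ES modules.
export async function* wasmRecords(lines) {
  for await (const line of lines) {
    const record = JSON.parse(line);
    // Nullish coalescing (ES2020) and a RegExp test, both supported by QuickJS.
    if (/^wasm/i.test(record.runtime ?? "")) {
      yield record;
    }
  }
}
```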

Using QuickJS effectively means bundling in a JavaScript runtime, and there’s a tradeoff for this simplicity, Hayes noted: “you typically have a little bit larger size and maybe the performance isn’t as perfect as it could be — but it works in most cases, and I’ve been seeing it get adopted all over.”

Fermyon’s JavaScript SDK builds on the way Javy uses QuickJS but uses the Wizer pre-initializer to speed up the QuickJS startup time by saving a snapshot of what the code will look like once it’s initialized. “Wizer is what makes .NET so fast on WebAssembly,” Butcher explained. “It starts off the runtime, loads up all the runtime environment for .NET and then writes it back out to disk as a new WebAssembly module. We discovered we can do the same thing with QuickJS.”

“When you run your spin build, the SDK takes the JavaScript runtime, takes your source files, optimizes them with Wizer and then packages all of that up and ships that out as a new WebAssembly binary.”

If the idea of getting a speed boost by pre-optimizing the code for an interpreted language sounds familiar, that’s because it’s the way most of the browser JavaScript engines work. “They start interpreting the JavaScript but while they’re interpreting, they feed in the JavaScript files to an optimizer so that a few milliseconds into execution, you flip over from interpreted mode into the compiled optimized mode.”

“One of the biggest untold stories is how much, at the end of the day, WebAssembly really is just everything we’ve learned from JavaScript, Java, .NET — all the pioneering languages in the 90s,” Butcher suggested. “What did we learn in 15-20 years of doing those languages and how do we make that the new baseline that we start with and then start building afresh on top of that?”

Adding JIT

Shopify also contracted Igalia to bring SpiderMonkey, the Mozilla JavaScript engine, to Wasm, while Fastly (which has a number of ex-Mozilla staff) has taken an alternative approach with componentize-js, using SpiderMonkey to run JavaScript for WebAssembly in the high-speed mode it runs in the browser, JIT compiling at least part of your JavaScript code and running it inside the WebAssembly interpreter.

Although WebAssembly modules are portable enough to use in many different places, it’s not yet easy to compose multiple Wasm modules into a program (as opposed to writing an entire, monolithic program in one source language and then compiling that into a single module). Type support in Wasm is primitive, the different WebAssembly capabilities various modules may require are grouped into different “worlds” (like web, cloud and the CLI) and modules typically define their own local address space.

“The problem with WebAssembly has been that you get this binary, but you’ve got all these very low-level binding functions and there’s a whole lot of wiring process. You have to do that wiring specifically for every language and it’s a very complex marshaling of data in and out, so you have to really be a very experienced developer to be able to know how to handle this,” Bedford told us.

The WebAssembly component model adds dependency descriptions and high-level, language-independent interfaces for passing values and pointers. These interfaces solve what he calls “the high-level encapsulation problem with shared nothing completely separated memory spaces.”

“You don’t just have a box, you have a box with interfaces, and they can talk to each other,” he explained. “You’re able to have functions and different types of structs and object structures and you can have all of these types of data structures passing across the component boundary.”

That enables developers to create the kind of reusable modules that are common in JavaScript, Python, Rust and other languages.

Componentize-js builds on this and allows developers to work with arbitrary bindings. “You bring your bindings and your JavaScript module that you want to run and we give you a WebAssembly binary that represents the entire JavaScript runtime and engine with those bindings. We can do that very quickly and we can generate very complex bindings.”

This doesn’t need a lot of extra build steps for WebAssembly: JavaScript developers can use familiar tooling, and install the library from npm.
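
As a rough sketch of that workflow (the package name, the componentize() signature and the WIT syntax below are assumptions based on the project’s early published examples and may well have changed):

```js
import { componentize } from "@bytecodealliance/componentize-js";
import { writeFile } from "node:fs/promises";

// A JavaScript module plus a WIT "world" describing the interface it exports...
const jsSource = `export function greet(name) { return "hello, " + name; }`;
const witWorld = `
  package local:greeter;
  world greeter {
    export greet: func(name: string) -> string;
  }
`;

// ...come out the other side as a single Wasm component embedding SpiderMonkey.
const { component } = await componentize(jsSource, witWorld);
await writeFile("greeter.component.wasm", component);
```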

Although the SpiderMonkey engine is larger than QuickJS — Bedford estimates a binary with the JavaScript runtime and a developer’s JavaScript module will be 5-6MB — that’s still small enough to initialize quickly, even on the kind of hardware that will be available at the edge (where Fastly’s platform runs).

Again, this uses Wizer to optimize initialization performance, because that affects the cold start time. “We pre-initialize all of the JavaScript up until right before the point where it’s going to call your function, so there’s no JavaScript engine initialization happening. Everything is already pre-initialized using Wizer.”

“You’re just calling the code that you need to call so there’s not a whole lot of overhead.”

That isn’t AOT (Ahead Of Time) compilation, but later this year and next year, componentize-js will have more advanced runtime optimizations using partial evaluation techniques that Bedford suggested will effectively deliver AOT. “Because you know which functions are bound you can partially evaluate the interpreter using Futamura projections and get the compiled version of those functions as a natural process of partially evaluating the interpreter in SpiderMonkey itself.”

Componentize-js is part of a larger effort from the Bytecode Alliance called jco — JavaScript component tooling for WebAssembly — an experimental JavaScript component toolchain that isn’t specific to the JavaScript runtime Fastly uses for its own edge offering. “The idea was to build a more generic tool, so wherever you’re putting WebAssembly and you want to allow people to write a small bit of JavaScript, you can do it,” Bedford explained.

Jco is a project “where you can see the new JavaScript experience from stem to stern”, Randall noted, suggesting that you can expect to see more mature versions of the JavaScript and Rust component work for the next release of wasmtime, which will be aligned with WASI Preview2. It’s important to note that this is all still experimental — there hasn’t been a full release of the WebAssembly component model yet and Bedford refers to componentize-js as research rather than pre-release software: “this is a first step to bring this stuff to developers who want to be on the bleeding edge exploring this”.

The experimental SlightJS is also targeting the WebAssembly component model, by creating the Wasm Interface Types (WIT) bindings that let packages share types and definitions for JavaScript. So far the wit-bindgen generator (which creates language bindings for programs developers want to compile to WebAssembly and use with the component model) only supports compiled languages — C/C++, Rust, Java and TinyGo — so adding an interpreted language like JavaScript may be challenging.

While spin-js-sdk produces bindings specifically for Spin HTTP triggers, SlightJS aims to create bindings for any WIT interface a developer wants to use. Eventually, it will be part of Microsoft’s SpiderLightning project, which provides WIT interfaces for features developers need when building cloud native applications, adding JavaScript support to the slight CLI for running Wasm applications that use SpiderLightning.

Currently, SlightJS uses QuickJS because the performance is better, but as the improvements to SpiderMonkey arrive it could switch, and Butcher pointed out the possible performance advantages of a JIT-style JavaScript runtime. QuickJS itself has largely replaced an earlier embeddable JavaScript engine, Duktape.

“There’s a real explosion of activity,” Bedford told us: “there’s very much a sense of accelerating development momentum in this space at the moment.”

Improving JavaScript and Wasm Together

You can think of these options as “JavaScript script on top and WebAssembly on the bottom,” suggested Daniel Ehrenberg, vice president of the TC39 ECMAScript working group, but another approach is “JavaScript and WebAssembly side by side with the JavaScript VM beneath it”.

The latter is where Bloomberg and Igalia have been focusing, with proposals aimed at enabling efficient interaction between JavaScript and WebAssembly, like reference-typed strings to make it easier for WebAssembly programs to create and consume JavaScript strings, and WebAssembly GC for garbage collection to simplify memory management.

Making strings work better between the two languages is about efficiency, TC39 co-chair and head of Bloomberg’s JavaScript Infrastructure and Tooling team Rob Palmer explained.

“This unlocks a lot of use cases for smaller scale use of WebAssembly [for] speeding up some small amount of computation.”

“At the moment they cannot currently really be efficient, because the overhead of copying strings in between the two domains outweighs the benefit of higher speed processing within WebAssembly.”

GC goes beyond the weak references and finalization registry additions to JavaScript (in ECMAScript 2021), which provide what Ehrenberg calls a bare minimum of interoperability between WebAssembly’s linear memory and JavaScript heap-based memory, allowing some Wasm programs to be compiled. The GC proposal is more comprehensive. “WebAssembly doesn’t just have linear memory; WebAssembly can also allocate several different garbage-collection-allocated objects that all point to each other and have completely automatic memory management,” Ehrenberg explains. “You just have the reference tracing and when something’s dead, it goes away.”

Work on supporting threads in WASI to improve performance through parallelization and give access to existing libraries is at an even earlier stage (it’s initially only for C and it isn’t clear how it will work with the component model) but these two WebAssembly proposals are fairly well developed and he expects to see them in browsers soon, where they will help a range of developers.

“Partly that’s been enabling people to compile languages like Kotlin to WebAssembly and have that be more efficient than it would be if it were just directly with its own memory allocation, but it also enables zero-copy memory sharing between JavaScript and WebAssembly in this side-by-side architecture.”

For server-side JavaScript, Ehrenberg is encouraged by early signs of better alignment between two approaches that initially seemed to be pulling in different directions: WinterCG APIs (designed to enable web capabilities in server-side environments) and WASI, which aims to offer stronger IO capabilities in WebAssembly.

“You want WinterCG APIs to work in Deno but you also want them to work in Shopify’s JavaScript environment and Fastly’s JavaScript environment that are implemented on top of WebAssembly using WASI,” he pointed out. “Now that people are implementing JavaScript on top of WebAssembly, they’re looking at can JavaScript support the WinterCG APIs and then can those WinterCG APIs be implemented in WASI?”

The Promise of Multilanguage Wasm 

The flexibility of JavaScript makes it a good way to explore the componentization and composability that gives the WebAssembly component model so much promise, embryonic as it is today.

Along with Rust, JavaScript will be the first language to take advantage of a modular WebAssembly experience that Randall predicted will come to all languages, allowing developers to essentially mix and match components from multiple WebAssembly worlds in different languages and put them together to create new applications.

“You could use high performance and secure Rust to build cloud components, much like wasmCloud does, and you could pair that with less complicated to write user-facing code in JavaScript. I could take JavaScript components from different worlds and marry them together and I could take cargo components written in Rust, and I can now recompose those in many different ways.”

“You can have Rust talking to JavaScript and you can be running it in the sandbox or you could have a JavaScript component that’s alerting a highly optimized Rust component to do some heavy lifting, but you’re writing the high-level component that’s your edge service in JavaScript,” agreed Bedford.

The way componentize-js lets you take JavaScript and bundle it as a WebAssembly component will translate to working in multiple languages with the jco toolchain and equivalent tools like cargo-component that also rely on the component model.

Despite WebAssembly’s support for multiple languages, using them together today is hard.

“You have to hope that someone’s going to go and take that Rust application and write some JavaScript — write the JavaScript bindgen for it and then maintain that bindgen,” Bedford explained. “Whereas with the component model, they don’t even need to think about targeting JavaScript in particular; they can target the component model, making this available to any number of languages, and then you as a JavaScript developer just go for it.”

“That’s what the component model brings to these workflows. Someone can write their component in Rust and you can very easily bring it into a JavaScript environment. And then [for environments] outside the browser you can now bring JavaScript developers along.”

That will also open up JavaScript components for Rust developers, he noted. “Jco is a JavaScript component toolchain that supports both creating JavaScript components and running components in JavaScript.”
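
In outline, the second half of that (running a component from JavaScript) looks something like the sketch below; jco’s transpile command is real, but the exact flags and generated file name here are assumptions:

```js
// First, turn a component (written in any language) into a JavaScript module
// plus core Wasm files:
//
//   npx @bytecodealliance/jco transpile greeter.component.wasm -o dist/
//
// Then import the generated bindings like any other ES module, regardless of
// which language the component was originally written in.
import { greet } from "./dist/greeter.component.js";

console.log(greet("WebAssembly"));
```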

In the future, the wasm-compose library “that lets you take two components and basically smoosh them together” could help with this, Hayes suggested. As the component model becomes available over the next few years, it will make WebAssembly a very interesting place to explore.

“If you support JavaScript and Rust, you’ve just combined two massive language ecosystems that people love, and now they can interop and let people just pick the best library or tool.”

“I’m so excited about WebAssembly components because, in theory, it should break down the silos that we’ve created between frontend and backend engineers and language ecosystems.”

The post Will JavaScript Become the Most Popular WebAssembly Language? appeared first on The New Stack.

Rising interest in using JavaScript to write WebAssembly is driving maturity in tools, while the component model points to polyglot programs.

The New JavaScript Features Coming in ECMAScript 2023


This year’s annual update to ECMAScript, which formally standardizes the JavaScript language, will be approved in July 2023, but four proposals for new language features have already reached stage four. This means they’ve been signed off by the editors of ECMAScript in the TC39 working group that manages the language standard, have passed the test suite, and shipped in at least two implementations to check for real-world performance and issues.

Small but Helpful

Symbols as WeakMap keys fills in a small gap in the language, explained Daniel Ehrenberg, vice president of Ecma (the parent organisation of TC39) and a software engineer working on JavaScript developer experience at Bloomberg, who worked on the proposal. Introduced in ECMAScript 2015, WeakMap lets you extend an object with extra properties (for example, to keep track of how often the object is used) without worrying about creating a memory leak, because the key-value pairs in a WeakMap can be garbage collected.

Initially, you could only use objects as keys in a WeakMap, but you want the keys to be unique “and symbols were defined as a new immutable way, that cannot be recreated, so having those as a unique key in the weak map makes a lot more sense”, developer advocate and browser engineer Chris Heilmann told us. This integrates symbols more with these new data structures and might well increase usage of them.
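
A minimal example of the new capability: a unique (unregistered) symbol can now act as a WeakMap key, so any metadata attached to it can be garbage collected along with it.

```js
const usage = new WeakMap();

// Unique symbols (not ones created with Symbol.for) are now valid WeakMap keys.
const sessionToken = Symbol("session-token");
usage.set(sessionToken, 0);

usage.set(sessionToken, usage.get(sessionToken) + 1);
console.log(usage.get(sessionToken)); // 1
```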

Two of the proposals improve working with arrays, which he notes are becoming increasingly powerful in JavaScript, avoiding the need to write functions and loop over data to process it.

“Now you can do a filter or map, and just have a one-liner for something that was super complex in the past.”

Change Array by Copy gives developers new methods for sorting, reversing and overwriting data without mutating the array it’s stored in. “You’ve always been able to sort arrays, but when you call a sort function it would change the current array; and in functional programming and the functional patterns that have become very popular [in JavaScript], people like to avoid mutations,” TC39 co-chair and head of Bloomberg’s JavaScript Infrastructure and Tooling team Rob Palmer explained.

This proposal lets developers call a method to change a single element in the array, using with or toSpliced, and get a new array with that single change — or sort and reverse an array into a fresh array but leave the original array unmodified. This is simpler for developers because it makes array and tuple behavior more consistent, Heilmann pointed out. “The inconsistency between whether array prototypes change the original array or not is something that drove me nuts in PHP. You do a conversion, send it to a variable and then there’s nothing in the variable, because some functions don’t change the original one and others do change it. Any consistency we can bring to things so people don’t have to look it up every day is a very good idea. And anything that allows me to not have to deal with index numbers and shift them around is also a good thing!”
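
For example, the new copying methods leave the original array untouched:

```js
const scores = [5, 1, 4];

const sorted = scores.toSorted((a, b) => a - b); // [1, 4, 5]
const patched = scores.with(1, 10);              // [5, 10, 4], index 1 replaced
const reversed = scores.toReversed();            // [4, 1, 5]
const trimmed = scores.toSpliced(0, 1);          // [1, 4], first element removed

console.log(scores); // still [5, 1, 4]; the original is never mutated
```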

Array find from last also does exactly what the name suggests, returning matching elements in an array starting at the end and working back, which can improve performance — or save writing extra code. “If you have a huge array, it’s really beneficial because you don’t have to look through the whole thing or reverse it before you look it up, so you don’t have to make a temporary duplicate — which developers do all the time,” Heilmann explained.
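
In code, that looks like this:

```js
const readings = [
  { sensor: "a", ok: true },
  { sensor: "b", ok: false },
  { sensor: "c", ok: true },
];

// Search from the end without reversing or copying the array first.
const lastGood = readings.findLast((r) => r.ok);           // { sensor: "c", ok: true }
const lastGoodIndex = readings.findLastIndex((r) => r.ok); // 2
```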

Most comments in JavaScript are there for developers working in the source code, to document how it works or record why it was written that way. Hashbang comments, which start with #!, are for specifying the path to the JavaScript interpreter that you want to use to run the script or module that follows the comment (a convention inherited from UNIX script files). CLI JavaScript hosts like Node.js already strip the hashbang out and pass the valid code onto the JavaScript engine, but putting this in the standard moves that responsibility to the JavaScript engine and makes sure it’s done in a uniform way.
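
A hashbang-prefixed module is otherwise ordinary JavaScript:

```js
#!/usr/bin/env node
// With hashbang grammar in the standard, the engine itself ignores this first
// line, so the same file can be run directly as a CLI script or imported as a
// normal module.
console.log("hello from an executable JavaScript file");
```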

Making hashbang grammar official in JavaScript gives it more consistency with the rest of the languages out there, he noted.

While serverside JavaScript is far from new, he said, “it feels to me like JavaScript has finally arrived as a serverside language with this, because when I think of Perl or PHP or all the other languages, you always have the hashbang.”

Although it’s another small change, it’s possible this will make it easier for JavaScript to participate in the AI and machine learning ecosystem, where Python is currently the dominant language.

Larger Proposals

These four proposals are very likely to be everything we see in ECMAScript 2023, which Ehrenberg noted is a small update, but there are also some important larger proposals that have already reached stage three (which means the spec has been agreed, but can’t be developed further without a full test suite and the real world experience of shipping the feature in at least two implementations).

Reaching stage three isn’t a guarantee that a feature will make it into the standard (because the implementations can reveal that changes need to be made). But iterator helpers, Temporal, explicit resource management and decorators are all stage three proposals making good progress that could be on track for ECMAScript 2024.

Iterator Helpers (and the companion stage two proposal for async iterator helpers) aim to make extremely large (including possibly infinite but enumerable data sets) as easy to work with as finite data structures like arrays. This includes methods like find, filter, map and reduce, which Python developers will be familiar with from generator expressions and itertools (and are available for JavaScript developers through libraries like lodash). “You can have an iterator and then map or for each or check whether some elements are there,” Ehrenberg explains.

Like change array by copy and hashbang grammar, this again brings useful options from other languages, because it’s the kind of feature that’s already widely used in languages like Python, Rust and C#.
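
Since the proposal is still at stage three, this sketch assumes a polyfill (such as core-js) or an engine with the feature enabled, but it shows the shape of the API against a potentially infinite generator:

```js
// A lazy, infinite sequence of natural numbers.
function* naturals() {
  let n = 0;
  while (true) yield n++;
}

// Iterator helpers chain lazily, so only as many values as needed are produced.
const firstFiveEvenSquares = naturals()
  .filter((n) => n % 2 === 0)
  .map((n) => n * n)
  .take(5)
  .toArray(); // [0, 4, 16, 36, 64]
```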

“I feel like we’re making pretty good progress towards catching up with Python from fifteen or twenty years ago.”

Almost Time for Temporal

We’re still waiting for Temporal, which former ECMAScript co-chair Brian Terlson once described to us as “the replacement for our broken Date object” (other developers call Date “full of many of the biggest gotchas in JavaScript”). This eagerly awaited top-level namespace for a new date and time API that covers the full range of date, time, time zones, calendars and even public holidays worldwide will give developers far better options for working with the complexities of dates and times.

Although Temporal reached stage 3 in 2021, it’s been waiting for the Internet Engineering Task Force (IETF) to standardize string formats used for calendar and time zone annotations. While there were hopes that it would be completed in 2022, it’s still in draft stage. However, there are no major open issues and Carsten Bormann, one of the editors of the IETF date format proposal, told The New Stack that he believes it’s ready for IETF last call. The delay has been down to procedural questions about amending RFC 3339, internet timestamps, rather than any issues with Temporal or the IETF date and time formats it will use, and that’s being worked through, he said. “We have wide agreement on the parts that Temporal needs; we just need to clear that process hurdle.”

It’s still possible that there could be, for example, changes to the calendar format Temporal uses, but developers can start using Temporal now with polyfills (although you may not want to use that in production). Once the IETF draft is officially adopted, there will still need to be two implementations before it can reach stage four but a lot of that work is already underway.
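
With one of those polyfills (the import below assumes the @js-temporal/polyfill package, but any compliant polyfill should behave the same), the API already looks like this:

```js
import { Temporal } from "@js-temporal/polyfill";

// A zoned date-time, converted between time zones without any mutation.
const meeting = Temporal.ZonedDateTime.from("2023-07-12T09:00[Europe/London]");
const inTokyo = meeting.withTimeZone("Asia/Tokyo");

console.log(inTokyo.toString());                  // 2023-07-12T17:00:00+09:00[Asia/Tokyo]
console.log(meeting.add({ days: 30 }).dayOfWeek); // date arithmetic returns a new object
```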

“I’m really hopeful that this will be the year when we will see Temporal ship in at least one browser.”

“This is being implemented so many times,” Ehrenberg told us. “The implementation is in progress in V8, in [WebKit’s] JSC, in SpiderMonkey; in LibJS, the Serenity OS JavaScript engine, they have a pretty complete Temporal implementation and there are multiple polyfills. In addition to the IETF status, there have also been a number of small bug fixes that have been getting in, based on things that we’ve learned over the course of implementing the feature.”

“Hopefully, in the next few months we will be coming to an end with those bug fixes. And I’m really hopeful that this will be the year when we will see Temporal ship in at least one browser.”

While Temporal isn’t one of the priorities for this year’s Interop browser compatibility project, it did get a lot of votes from developers as an API to consider. “This is visible to browsers — to everyone — that this is high priority,” Ehrenberg said.

Delivering Decorators

The TC39 working group has spent more than five years working on different iterations of the Decorators proposal: a way of adding functionality to an object without altering its original code or affecting other objects from the same class. Decorated functions are available in other languages, like Python and C#, and JavaScript developers have been using transpilers like Babel and TypeScript to get them. Those differ slightly from what the ECMAScript Decorators proposal will finally deliver, but with the help of a proposal from the TypeScript team, TC39 was able to avoid a breaking change.

“A lot of people are using experimental TypeScript decorators or Babel legacy decorators,” Ehrenberg noted: “in either case, you need to explicitly opt into it, but a lot of frameworks do use decorators and do have presets that include them — and those original decorators are a little bit different from what ends up being stage three Decorators.”

“We went through many iterations of the Decorator proposal and we finally arrived at one that we could agree met both the use cases and the transition paths that were needed from previous decorators and the implementability concerns from browsers. We were finally able to triangulate all of that. It does mean that there are some differences, but at the same time we’ve really tried to make sure that the transition is smooth.”

For example, when you export a class that has a decorator, the first Decorators proposal put the decorator before the export keyword — but a later version of the proposal changed the syntax, putting the decorator after the export.

“A lot of the community was pretty upset about the change because it would have transition costs and there were lots of strong opinions in both directions. And at the very last minute, we decided, you know what, you’re allowed to do either — but not both. In one particular exported class declaration, the decorators can come either before or after the exported keyword, because we saw that the transition path from existing use of decorators was important. We want to enable incremental adoption and treat the existing ecosystem as real: we’re not designing this in a vacuum.”

Palmer credits the TypeScript team with putting in extra effort to make sure that TypeScript and JavaScript continue to be aligned. Ehrenberg agreed.

“There was a scary moment where we thought that TypeScript might ship decorator before export without JavaScript allowing it; and I’m really glad that just in time, we were able to convince everyone to agree on the same thing. That’s the birth of standards.”

There will be a slight difference in behavior depending on which order you pick: If you put the decorator before the export keyword, then it won’t be included in the Function.prototype.toString() text. If the decorator comes after export or export default (or is in a class that isn’t exported), it will be included in the string.
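
As a minimal sketch (registered here is an illustrative decorator, not anything built into the language, and running it today needs a transpiler such as Babel or TypeScript 5 with standard decorators support):

```js
// An illustrative class decorator that records decorated classes in a registry.
const registry = new Map();

function registered(value, { kind, name }) {
  if (kind === "class") {
    registry.set(name, value);
  }
  return value;
}

// The decorator can come before the export keyword...
@registered
export class BeforeExport {}

// ...or after it, though a single declaration can't use both positions. Only
// the decorator placed after export shows up in Function.prototype.toString().
export @registered class AfterExport {}
```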

Making Resource Management Obvious

Having garbage collection doesn’t mean that JavaScript developers don’t need to think about managing memory and cleaning up resources, like file handles and network requests that are no longer needed. Some of the options for doing that work differ depending on where your code will run: you return a JavaScript iterator but close a Node.js file handle. And they depend on developers remembering to write the code and getting the code right.

“This makes it difficult to translate front-end development skills, where you might primarily work with the DOM, to back-end development, where you might be working with something like Node’s API, and vice versa. This inconsistency also makes it difficult for package authors to build lifetime management into their packages in a way that allows for reuse both on the web and on the server,” the creator of the proposal, Ron Buckton, told us.

Explicit Resource Management adds a new using statement (or await using for async code) to JavaScript, that’s similar to the with statement in Python or using in C#. Like const, it uses block scoping which developers will be familiar with since it’s been in JavaScript since ECMAScript 2015. You can open a resource like a file with using, work with the file, and then at the end of the block of code, the file will be automatically closed by the Symbol.dispose or Symbol.asyncDispose method in the using declaration.
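
A minimal sketch, assuming a hypothetical openTempFile() helper whose return value implements Symbol.dispose, and an environment (or transpiler and polyfill) that supports the stage three proposal:

```js
function openTempFile(path) {
  // A hypothetical resource: in real code this would wrap an OS file handle.
  return {
    write(text) { console.log(`writing "${text}" to ${path}`); },
    [Symbol.dispose]() {
      // Runs automatically when the enclosing block exits, even on an exception.
      console.log(`closing ${path}`);
    },
  };
}

{
  using tmp = openTempFile("/tmp/report.txt");
  tmp.write("done");
} // tmp[Symbol.dispose]() has already been called by this point
```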

“If closing the file means persisting something to a database you can make sure that you wait for that persistence to happen,” Ehrenberg explained.

If you need to compose multiple resources that will be used and then disposed of, there are container classes — DisposableStack and AsyncDisposableStack — which Buckton says were inspired by Python’s ExitStack and AsyncExitStack — that also let you work with existing objects that don’t yet use the new API.

The asynchronous version, await using, was temporarily split off to a separate Async Explicit Resource Management proposal, because the syntax for that wasn’t as easy to decide on. Now it’s been agreed and has also reached stage three, so the proposals are being combined again and implementations are currently underway, Buckton says. According to Palmer:

“This is great for robust, efficient code, to really make sure you’re cleaning up your resources at the correct time.”

“I think this will be a big win for JavaScript developers, because previously, to get this effect reliably, you had to use try finally statements, which people would often forget to do,” Ehrenberg added. “You want to make sure to dispose of the resource, even if an exception is thrown.”

The feature is called “explicit” to remind developers that the resource cleanup will be done immediately and explicitly, as opposed to the implicit and somewhat opaque resource management you get with WeakMap, WeakRef, FinalizationRegistry or garbage collection. A using declaration gives you an explicit, well-defined lifetime for an object that you know will be cleaned up in a timely way, so you can avoid race conditions if you’re closing and reopening a file or committing transactions to a database.

“The garbage collector can run at weird and magical times, and you cannot rely on the timing,” Palmer warned.

It’s also not consistent across environments. “All JavaScript engines reserve the right to have reference leaks whenever they feel like it and they do have reference leaks at different times to each other,” Ehrenberg added.

“There are a lot of use cases for explicit resource management, from file IO and Stream lifetime management, to logging and tracing, to thread synchronization and locking, async coordination, transactions, memory management/resource pooling, and more,” Buckton said.

It will be particularly important for resources that have a significant impact on performance, but also drain battery. “I’m hoping that, as this proposal gets adopted by various hosts, we’ll soon be able to use ‘using’ and ‘await using’ with WebGPU and other DOM APIs where resource lifetime and memory management are extremely important, especially on mobile devices.”

Building on What’s New

Having proposals become part of ECMAScript doesn’t mean they don’t carry on developing, as implementers get more experience with them — and as new language features offer ways to improve them.

After a good many years, class fields (including private fields) were included in ECMAScript 2022, “but even though they’ve been shipping in browsers for years, some of the Node community found that there were some performance penalties in using these,” Palmer told us. To address that, Bloomberg funded Igalia to optimize private field performance in V8. “Now private field access is at least as fast as public fields and sometimes it’s even faster.”

Other work made it easier for developers to work with private fields by making them accessible inside the Chrome developer tools. From the top level of the console, you can now jump into private fields or look into them while inspecting an object. That doesn’t break any security boundaries, Palmer noted, because you’re in a development environment: “it makes life easier for the developer, and they are entitled to see what’s inside the class”.

In the future, Ehrenberg suggested, there might be a capability for authorized code to look into private fields, based on the stage three decorators proposal, which has features that aren’t in the existing decorators features in Babel and TypeScript. “When you decorate a private field or method, that decorator is granted the capability to look at that private field or method, so it can then share that capability with some other cooperating piece of code,” he explained.

“The new decorators provide a path towards more expressive private fields.”

As always, there are other interesting proposals that will take longer to reach the language, like type annotations, AsyncContext and internationalization work that — along with Temporal — will replace some commonly used but large libraries with well-designed, ergonomic APIs built into the language. There are also higher-level initiatives around standardizing JavaScript runtimes, as well as the long-term question of what ECMAScript can address next: we’ll be looking at all of those soon.

The post The New JavaScript Features Coming in ECMAScript 2023 appeared first on The New Stack.

The next JavaScript update brings smaller additions familiar from other languages, but there are more significant developments in the wings. 

How WASM (and Rust) Unlocks the Mysteries of Quantum Computing


WebAssembly has come a long way from the browser; it can be used for building high-performance web applications, for serverless applications, and for many other uses.

Recently, we also spotted it as a key technology used in creating and controlling a previously theoretical state of matter that could unlock reliable quantum computing — for the same reasons that make it an appealing choice for cloud computing.

Quantum Needs Traditional Computing

Quantum computing uses exotic hardware (large, expensive and very, very cold) to model complex systems and problems that need more memory than the largest supercomputer: it stores information in equally exotic quantum states of matter and runs computations on it by controlling the interactions of subatomic particles.

But alongside that futuristic quantum computer, you need traditional computing resources to feed data into the quantum system, to get the results back from it — and to manage the state of the qubits to deal with errors in those fragile quantum states.

As Dr. Krysta Svore, the researcher heading the team building the software stack for Microsoft’s quantum computing project, put it in a recent discussion of hybrid quantum computing, “We need 10 to 100 terabytes a second bandwidth to keep the quantum machine alive in conjunction with a classical petascale supercomputer operating alongside the quantum computer: it needs to have this very regular 10 microsecond back and forth feedback loop to keep the quantum computer yielding a reliable solution.”

Qubits can be affected by what’s around them and lose their state in microseconds, so the control system has to be fast enough to measure the quantum circuit while it’s operating (that’s called a mid-circuit measurement), find any errors and decide how to fix them — and send that information back to control the quantum system.

“Those qubits may need to remain alive and remain coherent while you go do classical compute,” Svore explained. “The longer that delay, the more they’re decohering, the more noise that is getting applied to them and thus the more work you might have to do to keep them stable and alive.”

Fixing Quantum Errors with WASM

There are different kinds of exotic hardware in quantum computers and you have a little more time to work with a trapped-ion quantum computer like the Quantinuum System Model H2, which will be available through the Azure Quantum service in June.

That extra time means the algorithms that handle the quantum error correction can be more sophisticated, and WebAssembly is the ideal choice for building them, Pete Campora, a quantum compiler engineer at Quantinuum, told The New Stack.

Over the last few years, Quantinuum has used WebAssembly (WASM) as part of the control system for increasingly powerful quantum computers, going from just demonstrating that real-time quantum error correction is possible to experimenting with different error correction approaches and, most recently, creating and manipulating for the first time the exotic entangled quantum states (called non-Abelian anyons) that could be the basis of fault-tolerant quantum computing.

Move one of these quasiparticles around another — like braiding strings — and they store that sequence of movements in their internal state, forming what’s called a topological qubit that’s much more error resistant than other types of qubit.

At least, that’s the theory: and WebAssembly is proving to be a key part of proving it will work — which still needs error correction on today’s quantum computers.

“We’re using WebAssembly in the middle of quantum circuit execution,” Campora explained. The control system software is “preparing quantum states, doing some mid-circuit measurements, taking those mid-circuit measurements, maybe doing a little bit of classical calculation in the control system software and then passing those values to the WebAssembly environment.”

Controlling Quantum Circuits

In the cloud, developers are used to picking the virtual machine with the right specs or choosing the right accelerator for a workload.

Rather than picking from fixed specs, quantum programming can require you to define the setup of your quantum hardware, describing the quantum circuit that will be formed by the qubits as well as the algorithm that will run on it — and error-correcting the qubits while the job is running — with a language like OpenQASM (Open Quantum Assembly Language); that’s rather like controlling an FPGA with a hardware description language like Verilog.

You can’t measure a qubit to check for errors directly while it’s working or you’d end the computation too soon, but you can measure an extra qubit (called an “ancilla” because it’s used to store partial results) and extrapolate the state of the working qubit from that.

What you get is a pattern of measurements called a syndrome. In medicine, a syndrome is a pattern of symptoms used to diagnose a complicated medical condition like fibromyalgia. In quantum computing, you have to “diagnose” or decode qubit errors from the pattern of measurements, using an algorithm that can also decide what needs to be done to reverse the errors and stop the quantum information in the qubits from decohering before the quantum computer finishes running the program.

OpenQASM is good for basic integer calculation, but it requires a lot of expertise to write that code: “There’s a lot more boilerplate than if you just call out to a nice function in WASM.”

Writing the algorithmic decoder that uses those qubit measurements to work out what the most likely error is and how to correct it in C, C++ or Rust, and then compiling it to WebAssembly, makes the work more accessible. It also lets quantum engineers use more complex data structures (vectors, arrays, tuples and other ways to pass data between functions) to write more sophisticated algorithms that deliver more effective quantum error correction.

“An algorithmic decoder is going to require data structures beyond what you would reasonably try to represent with just integers in the control system: it just doesn’t make sense,” Campora said. “The WASM environment does a lot of the heavy lifting of mutating data structures and doing these more complex algorithms. It even does things like dynamic allocation that normally you’d want to avoid in control system software due to timing requirements and being real time. So, the Rust programmer can take advantage of Rust crates for representing graphs and doing graph algorithms and dynamically adding these nodes into a graph.”

The first algorithmic decoder the Quantinuum team created in Rust and compiled to WASM was fairly simple: “You had global arrays or dictionaries that mapped your sequence of syndromes to a result.” The data structures used in the most recent paper are more complex and quantum engineers are using much more sophisticated algorithms like graph traversal and Dijkstra’s [shortest path] algorithm. “It’s really interesting to see our quantum error correction researchers push the kinds of things that they can write using this environment.”
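
As a rough illustration of that first lookup-table approach, here is the idea sketched in JavaScript. The real decoders are written in Rust (or C/C++) and compiled to WebAssembly, and the syndrome patterns and corrections below are invented.

```js
// The lookup-table idea: map each measured syndrome pattern to a correction.
// Patterns and corrections are made up for illustration.
const corrections = new Map([
  ['000', 'no error'],
  ['011', 'flip qubit 0'],
  ['101', 'flip qubit 1'],
  ['110', 'flip qubit 2'],
]);

function decode(syndrome) {
  // syndrome is the string of ancilla measurement outcomes, e.g. '101'
  return corrections.get(syndrome) ?? 'unknown syndrome';
}

console.log(decode('101')); // 'flip qubit 1'
```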

Enabling software that’s powerful enough to handle different approaches to quantum error correction makes it much faster and more accessible for researchers to experiment than if they had to make custom hardware each time, or even reprogram an FPGA, especially for those with a background in theoretical physics (with the support of the quantum compiler team if necessary). “It’s portable, and you can generate it from different languages, so that frees people up to pick whatever language and software that can compile to WASM that’s good for their application.”

“It’s definitely a much easier time for them to get spun up trying to think about compiling Rust to WebAssembly versus them having to try and program an FPGA or work with someone else and describe their algorithms. This really allows them to just go and think about how they’re going to do it themselves,” Campora said.

Sandboxes and System Interfaces

With researchers writing their own code to control a complex — and expensive — quantum system, protecting that system from potentially problematic code is important and that’s a key strength of WebAssembly, Campora noted. “We don’t have to worry about the security concerns of people submitting relatively arbitrary code, because the sandbox enforces memory safety guarantees and basically isolates you from certain OS processes as well.”

Developing quantum computing takes the expertise of multiple disciplines and both commercial and academic researchers, so there are the usual security questions around code from different sources. “One of the goals with this environment is that, because it’s software, external researchers that we’re collaborating with can write their algorithms for doing things like decoders for quantum error correction and can easily tweak them in their programming language and resubmit and keep re-evaluating the data.”

A language like Portable C could do the computation, “but then you lose all of those safety guarantees,” Campora pointed out. “A lot of the compilation tooling is really good about letting you know that you’re doing something that would require you to break out of the sandbox.”

WebAssembly restricts what a potentially malicious or inexpert user could do that might damage the system but also allows system owners to offer more capabilities to users who need them, using WASI — the WebAssembly System Interface that standardizes access to features and services that aren’t in the WASM sandbox.

“I like the way WASI can allow you, in a more fine-grained way, to opt into a few more things that would normally be considered breaking the sandbox. It gives you control. If somebody comes up to you with a reasonable request that would be useful for, say, random number generation, we can look into adding WASI support so that we can unblock them, but by default, they’re sandboxed away from OS things.”
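
The opt-in model is easiest to see with the plain WebAssembly JavaScript API, where the host decides exactly what a module can call by choosing what goes into the import object; WASI standardizes this kind of capability granting. The module file and import names below are hypothetical.

```js
// Sketch of capability granting: the module gets only what the host
// explicitly passes in. File and import names are hypothetical.
const bytes = await fetch('./decoder.wasm').then((r) => r.arrayBuffer());

const imports = {
  host: {
    // The one capability we've chosen to grant: random numbers.
    random: () => Math.random(),
  },
};

const { instance } = await WebAssembly.instantiate(bytes, imports);
// The module can call host.random, but it has no access to files,
// sockets or anything else unless the host adds it here.
```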

In the end, esoteric as the work is, the appeal of WebAssembly for quantum computing error correction is very much what makes it so useful in so many areas.

“The web part of the name is almost unfortunate in certain ways,” Campora noted, “because it’s really this generic virtual machine-stack machine-sandbox, so it can be used for a variety of domains. If you have those sandboxing needs, it’s really a great target for you to get some safety guarantees and still allows people to submit code to it.”

The post How WASM (and Rust) Unlocks the Mysteries of Quantum Computing appeared first on The New Stack.

The performance, portability and sandbox that make WebAssembly so appealing are now helping to improve quantum error correction.

What’s Next for JavaScript: New Features to Look Forward to

When you look at the new features going into JavaScript each year through the ECMAScript standards process, it might not always seem that the language is making major changes. A lot of improvements are what TC39 co-chair and head of Bloomberg’s JavaScript Infrastructure and Tooling team Rob Palmer calls “paving the cowpath” — by building something you could already do with a tool or a framework into the language — or syntactic sugar, to make it easier to use an existing feature without making mistakes.

“We see this iteration of tools, frameworks [and] patterns, and over time, we in the standards world start to pick out where the convergence has happened and we try to pave that cowpath, which has the effect of decreasing the complexity in the application stack. So the size of your Node modules directory decreases, because more of that functionality is provided by the language itself.”

Taken with major developments like adding native async in 2017, those incremental features add up, Vercel’s TC39 delegate Justin Ridgewell told The New Stack. “Over a span of years, they add up to a lot of new APIs with new functionality that we’re adding very quickly and then while the smaller things are being added, we’re working on massive features.”

“These massive features, they don’t land every year: maybe every two years, maybe every three years. But when they do get added they really advance the language forward.”

Some significant new features, like Temporal, are almost ready for adoption; others are in development and will arrive in the next few years. We’ve picked some of the most interesting here, as well as asking people involved in building the JavaScript standards to explain how the language is progressing and what it might make sense for JavaScript standardization to tackle next.

Types That Don’t Turn JavaScript into TypeScript 

TypeScript was developed to make JavaScript developers more productive, rather than to replace JavaScript, but it’s also been a source of improvements to the language. Currently, you use TypeScript to make types explicit in your code while you’re writing it — but then you remove them when your code runs.

Still some way off, the stage 1 Type Annotations proposal, which would allow type information to be included in JavaScript code but treated as comments by JavaScript engines, is important because it converges TypeScript and JavaScript for consistency in a way that keeps them aligned, while also making it clear that they work at different layers.

Developers could use first-class syntax for types, whether that’s TypeScript or Flow style, instead of long JSDoc comment blocks, and know that their code is still compatible with JavaScript engines and JavaScript tooling — avoiding the complexity of needing a build step to erase the types before their code will run, Palmer pointed out.
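
For a rough sense of what that could look like, based on the stage 1 proposal (the exact grammar is still being worked out): the annotations below would simply be ignored by a JavaScript engine at runtime, while editors and type checkers could still use them. Today you would still need TypeScript or a build step to run this.

```js
// Under the Type Annotations proposal, an engine would treat these
// annotations like comments, so no build step is needed to strip them.
function stringRepeat(text: string, count: number): string {
  return text.repeat(count);
}

let total: number = 0;
total += stringRepeat('ha', 3).length;
```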

“There’s huge value just in having static types that only exist during development and are fully erased during runtime,” he explained. Some of that value is obvious: “The fact that you can provide type checking and can tell you when you’ve made a mistake, and you’ve dereferenced a property that doesn’t exist: that’s great. But above and beyond that, the type information also powers a number of quality of life improvements for the developer, like the ability to refactor things — such as renaming variables and properties, all at once just automatically in the IDE, as well as code navigation.”

“If you’re working with a lot of different code that you didn’t write, then it helps to have the type view of it,” suggested Daniel Ehrenberg, vice president of Ecma (parent organization of TC39) and a software engineer working on JavaScript developer experience at Bloomberg.

And given how contentious static typing has been in the JavaScript community, despite continuing demand (it’s been the top missing feature in the State of JavaScript survey for three years in a row), it’s a testament to the standards process that this approach could bring a hugely useful feature to the language without affecting the simplicity that attracts many users in the first place.

Simplifying Localization with Smarter Message Formats

Localizing websites (and web apps) is more complicated than just swapping out the message strings in the user interface, because if you want those messages to make sense and be grammatically correct, you can’t just swap in words without thinking about how numbers, ordinals (like first and second), dates, plurals and other constructions are handled in different languages.

There are libraries to help with this, like FormatJS, but it’s more work for developers and translators in JavaScript than in languages like Java and C, which have built-in capabilities for translating and formatting strings using the International Components for Unicode (ICU) libraries, ICU4J and ICU4C.

“Plurals are really difficult,” Igalia’s Romulo Cintra told us. “All the grammatical concepts, inflections that depend on gender and number, and then different placeholders can vary in different languages; [handling] this complexity normally relies on those libraries, but also requires tons and tons of data.”

In fact, browsers already use those components for internationalization and for building APIs like date time format and relative time format. So why not bring similar capabilities to the web as built-in options for JavaScript developers that include that linguistic expertise — after all, it’s the language so many interfaces are written in.
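
Those existing, ICU-backed APIs already handle a lot of locale-specific logic, for example:

```js
// Relative times, plural categories and dates, formatted per locale.
const rtf = new Intl.RelativeTimeFormat('pt', { numeric: 'auto' });
console.log(rtf.format(-1, 'day')); // "ontem" ("yesterday" in Portuguese)

const plurals = new Intl.PluralRules('pl');
console.log(plurals.select(3)); // "few"; Polish has more plural forms than English

const dates = new Intl.DateTimeFormat('de-DE', { dateStyle: 'long' });
console.log(dates.format(new Date(2023, 7, 1))); // "1. August 2023"
```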

Intl MessageFormat is another stage 1 TC39 proposal, in conjunction with the Unicode Consortium’s Message Format Working Group, for templated strings that include internationalization and localization logic, with a built-in engine in JavaScript for filling in those templates correctly in different languages.

The work to bring internationalization APIs to the web is so comprehensive that it triggered a major update to the 20-year-old ICU message format API. “It only relies on strings, it’s very rigid, not modular — so why not solve the problem from the root and start a new standard on Unicode,” Cintra wondered. That became MessageFormat 2.0 (MF 2.0 for short) which is designed to be a common ground, handling internationalization for software and the web alike, and it underpins Intl MessageFormat, which he views as the most needed international API to help the web reach the next billion users.

“It’s closing the cycle of providing a more accessible web at the level of localization and personalization.”

“To me as a non-native English speaker,” said Cintra, “it’s extremely important as the web grows and everybody has access [to it], having something like this, that puts in our hands the capability of making more accessible all the software we write, is fantastic!”

Currently, a lot of localization relies on mostly proprietary specifications for custom message formats that have to be parsed at runtime. Mozilla uses its Fluent tool to translate all its interfaces: Bloomberg has an internal tool as well, Igalia’s Ujjwal Sharma told us. “Everybody is trying to tackle this problem through custom tooling that does different things.” While Intl MessageFormat will let organizations who are already doing internationalization build on a common standard, with all the usual advantages of open collaboration, Sharma hopes it will also help smaller organizations that don’t yet have a process for translating sites.

Beyond simply translating text strings, MF 2.0 will include metadata and comments that can be used to mark up everything from the tone of the writing — whether it’s formal or informal — to hints for speech synthesis that will be useful for smart speakers like Siri and Alexa as well as screen readers. That could also remove a bottleneck: “A lot of the innovation in speech and interfaces is happening on the client side,” Sharma noted, but the amount of data has made that impractical for localization: “Intl MessageFormat could enable us to do a lot more client-side”.

Localization is a multimillion-dollar industry, so weighing compatibility against the necessary improvements is something of a balancing act. “Providing an easy-to-use intuitive API that can still somehow work with all the legacy effort is a challenging task,” Sharma noted. “But I think if we can do it well, then we can really change some things around how people think about websites.”

A Language for Translating Languages 

The MF 2.0 spec defines what you can think of as a simple programming language with name binding (“let” declarations) and pattern matching (selectors), explained Igalia’s Tim Chevalier, who is working on the ICU implementation. He suggested thinking of it as “a domain-specific language for writing translatable messages” that can draw on what we know about writing compilers and interpreters.

“Hopefully, the developer experience using MF 2.0 will be less like writing cryptic strings of characters and more like programming in a special-purpose language embedded within their general-purpose language of choice, such as JavaScript.” He compares that to going from hard-coded query strings for working with databases to SQL.

“There was no reliable way to write programs that manipulated the query strings themselves in order to generate variations on a given query, because there was no knowledge about the query language embedded into the general-purpose languages like C++, Java and JavaScript. Modern language tools provide richer ways to construct queries than just writing a string and passing it as an argument to a function.” MF 2.0 promises a similar developer experience.

It’s extensible: developers get an interface to create new abstractions of their own in JavaScript (or other languages) to deliver formatting functions. The spec is also intended to make it easy to write higher-level translation tools that can provide a friendlier user interface to make it easier for translators to work with messages, without having to be programmers themselves.

The ICU work on MF 2.0 is still at an early stage, with a technical preview; initially, that’s only in ICU4J, but it’s being ported to ICU4C, the C/C++ implementation used by most JavaScript engines. The TC39 proposal relies on MF 2.0, but “as soon as there is a stable version of Message Format 2 implemented, I think that things on the TC39 side will go smoothly,” Cintra predicted. A stable release will also allow browsers to start implementing the API to see how it works in the wild.

If you want to try this out, FormatJS has early support for the initial IntlMessageFormat proposal and there’s an experimental polyfill under development. Because the polyfill is based on tools that are already widely used for internationalizing React and other applications, it will give developers experience with the new syntax required.
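
For a feel of what templated, plural-aware messages look like today, here is FormatJS’s existing IntlMessageFormat class with classic ICU MessageFormat syntax; the built-in Intl.MessageFormat API and the MF 2.0 syntax it will use are still being finalized and will differ in the details.

```js
import { IntlMessageFormat } from 'intl-messageformat';

const msg = new IntlMessageFormat(
  'You have {count, plural, =0 {no new messages} one {# new message} other {# new messages}}.',
  'en-US'
);

console.log(msg.format({ count: 1 })); // "You have 1 new message."
console.log(msg.format({ count: 5 })); // "You have 5 new messages."
```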

Milestones, but Still Plenty to Do

JavaScript will continue to evolve slowly, Palmer suggested. “Some argue that things should shift faster so that we can get feedback and recognize the importance and iterate; our style in TC39 tends to be more conservative.”

Removing something from the web platform, if it turns out to have been designed the wrong way and needs changing — like the first implementation of web components — has an extremely high cost. Polyfills and tool-based implementations of features that are candidates to become part of the standard allow for the faster feedback cycle that helps protect web compatibility, without making developers wait for all the steps of the formal process.

“I do think it makes sense for JavaScript to be developed conservatively.”

“We have many implementations; we have a lot of usage,” Ehrenberg said. “Other platforms can be these more experimental environments, and we can be the more conservative implementation environment.”

Following that pattern, he said JavaScript now contains natively many of the features that developers used to have to turn to tools for (and some proposals, like iterator helpers, are inspired by other languages).

“When you look at class fields, including private fields and decorators, but also small things like hashbang grammar, and then hopefully soon our type annotations proposal and on-going modules work, they kind of complete the pantheon of things people are doing through tools, that we want to bring into the language,” said Ehrenberg. “CSS is at a similar point, where they’re also bringing in many features that were done through tools into the core language, like nesting — the classic example — or scoping, or layers and variables.”
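
Class fields and private fields, for example, are now plain JavaScript, with no transpiler required:

```js
class Counter {
  static created = 0;  // static public field
  #count = 0;          // private instance field

  constructor() {
    Counter.created++;
  }

  increment() {
    return ++this.#count;
  }
}

const c = new Counter();
console.log(c.increment(), Counter.created); // 1 1
```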

But standardization also helps minimize the configuration involved in creating your project, by giving developers a clear default for how to set things up; and that inspires new ideas for what JavaScript could come to include. One area that particularly interests Ehrenberg for future consideration: core components for reactive user interfaces, like signals and cells, could perhaps become part of the language; that’s something developers rely on frontend frameworks for today.

That would follow what he calls the same “incremental shared path of discovery” that JavaScript is known for, with experimentation turning into a new baseline for core features that then get built into the language. There’s plenty of scope for the evolution of the language yet, he maintained.

“I think we’re nowhere near yet running out of things to do that are inspired in that direction.”

The post What’s Next for JavaScript: New Features to Look Forward to appeared first on The New Stack.

Useful types and modern tools to make internationalizing your sites and web apps easier, plus hints at what might come later for JavaScript.

Beyond Browsers: The Longterm Future of JavaScript Standards

It’s a testament to how important the web is that JavaScript, by many rankings the most popular and widely used language, has emerged from browsers and become useful far beyond its initial platform. But the ECMAScript standard for the language has continued to be driven primarily by the needs of browsers. Even in new environments, like serverside and embedded JavaScript runtimes, those runtimes are still using the JavaScript engines from browsers — though their requirements are often rather different.

The first feature that’s coming into the language from those serverside runtimes — async context — also turns out to be useful in browsers, and even for building some browser features. That might mean the focus for new JavaScript standards extending beyond the browsers that have tended to dominate the language’s evolution: at the very least, we can expect more collaboration and compatibility between serverside runtimes.

JavaScript Beyond the Browser

One of the more ambitious ideas for JavaScript is to standardize not JavaScript code itself but a specification that fully parameterizes what’s required to create a JavaScript runtime. That used to be something only browser makers did, and they already have their own bytecode specifications. But as serverside JavaScript becomes more common — and the definition of “server” much more varied, from edge and IoT devices to WebAssembly environments — more people are interested in having a JavaScript runtime that suits their own specific needs, which might be about creating the smallest possible runtime.

“Normally we pick a runtime like NodeJS or Deno or the browser and that runtime comes with certain execution assumptions, certain global objects,” explained Fastly’s Guy Bedford, who works on projects like Componentize-js for compiling JavaScript to WebAssembly. Being able to choose specific runtime characteristics, like what global objects are available and how you link them together, would be very helpful for flexibility and efficiency. “If you’re writing a little JavaScript plug-in system for security, you can decide ‘do I want to support npm packages that can work in that environment?’”

That’s a very complex and difficult problem, Bedford warned, because JavaScript developers (and the JavaScript code they write) expect various things to be available, so there needs to be a baseline of what will be supported by a runtime. “The long-term development of this project is fully parameterizing those decisions so that it is like building your own JavaScript runtime out of these primitives.”

These questions about what belongs in JavaScript runtimes, both in and beyond the browser, are one of the reasons the WinterCG (Web-interoperable Runtimes) community group at the W3C started last year: to focus on cross-runtime interoperability between browsers and non-browser runtimes, advocating for features important to serverside JavaScript environments that may not be as critical for browsers — which is all that the WHATWG charter covers.

Serverside JavaScript runtimes and platforms like Node, Deno, Bun, Cloudflare Workers, Netlify and Vercel are obviously different from browsers, but the Web Platform APIs that browsers have been adding for the last few years are often very relevant there as well, maintains James Snell, an engineer working on Cloudflare Workers and member of the NodeJS technical steering committee. “Yes, Node is a different environment, but you still parse URLs, you still deal with streams, there are still common things.”

In 2016, he implemented the standard URL class for NodeJS, and the various runtimes now support more of these standard APIs. But the browser focus at WHATWG and, to some extent, the W3C has meant that the slightly different needs of serverside runtimes aren’t always a priority in API design — especially as the different runtimes all have different goals and requirements (for example, some have full file system access, others have no local file system at all).

That disconnect can lead to multiple non-browser runtimes creating different ad hoc solutions for the same functionality, which means developers need to use polyfills to make code run on multiple environments and may face some odd performance issues between different implementations of those standard APIs.

WinterCG isn’t a competing standards body: instead, it’s about coordination — identifying common needs and working through that in standards bodies or elsewhere. “It gives us a venue where it’s not these separate one-off conversations with WHATWG,” Snell explained. “We have a venue now to talk and discuss issues.”

“Getting these runtimes that are doing the same things to do them, in the same way, benefits everybody, whether it’s part of the language standard or not. There don’t need to be different APIs for parsing a URL, there doesn’t need to be a different API for sending an HTTP request: it’s a single, consistent way of doing it that works, whether you’re in Node or Deno, Workers, Bun or wherever. That’s really where we’re trying to get to.”
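
That is already true for the basics: the same standard URL and fetch calls now run unchanged in browsers, Node.js, Deno, Bun and Cloudflare Workers (the URL below is just a placeholder).

```js
// Standard Web APIs, identical across runtimes.
const url = new URL('/search', 'https://example.com');
url.searchParams.set('q', 'wintercg');

const response = await fetch(url);
console.log(response.status, await response.text());
```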

This kind of (even unofficial) standardization gives organizations more confidence that they can depend on serverside JavaScript runtimes, which have sometimes been prone to rather erratic governance in the past. “A number of the folks that have wanted to get involved [with WinterCG] are not developing runtimes; they’re the companies that are using the runtimes. They’re saying, we want to build our business on top of this, but we need to make sure that we have continuity, that we have multiple options,” Snell said.

Standards that Suit Serverside

Developers don’t want to write their code multiple times to run on different serverside runtimes, which they have to do today. It slows down development, increases the maintenance load and may put developers off supporting more platforms like Workers, Snell noted. Library and framework creators in particular are unhappy with that extra work. “They don’t want to go through all that trouble and deal with these different development life cycles on these different runtimes with different schedules and having to maintain these different modules.”

That’s a disadvantage for the runtimes and platforms as well as for individual developers wanting to use tools like database drivers that are specifically written for one runtime or another, he pointed out. “We talk to developers who are creating Postgres drivers or MongoDB drivers and they don’t want to rewrite their code to fit our specific API. It would be great if it was an API that worked on all the different platforms.”

WinterCG is trying to coordinate what Ehrenberg calls “a version of fetch that makes sense on servers” (leaving out requirements like CORS that don’t apply to servers) as well as supporting Web APIs like text encoder and the set timeout APIs (which are part of the HTML and DOM specs rather than JavaScript) in a compatible way across runtimes. This isn’t creating a new spec, Snell noted: “We’re starting with the actual fetch spec, and we’re going to line out the parts that don’t apply to servers.”

The idea of the server-specific subset of an API with consistent serverside implementations underpins a bigger WinterCG project. The Minimum Common Web Platform API list is starting fairly modestly by documenting which Web Platform APIs are already implemented in Node.js, Deno, and Cloudflare Workers. But it’s also a first step towards a comprehensive unified API surface that JavaScript developers should be able to rely on in all Web-interoperable runtimes.

Supporting a subset of what’s available in web browsers in a common way would make building new functionality more affordable and sustainable for both developers and runtime creators, Ehrenberg noted. “Everyone’s talking about React server components or SolidStart or SvelteKit; they all have this common base of some of the code runs in the server and you have an environment that is somewhat rich and has some web APIs.” The common API list would make an ideal base for building that system on, and alternatives would focus their energy on what they want to do differently.

Between this standard base and the API development, Palmer described WinterCG as “promoting universal JavaScript, increasing the set of JavaScript that can work both in the client environment and on the server side, across multiple servers.”

Keeping Track of Distributed Context

A universal JavaScript would include features that didn’t start in the browser. “Sometimes there are small Web APIs that are not in browsers but could be, and have a lot of value for server environments but also have value for browsers, that haven’t been prioritized by browsers,” added Ehrenberg.

The first of those serverside ideas to become a TC39 proposal is async context, a way to simplify what Ehrenberg calls “code which follows other code” to trace what’s happening, by having a variable in common.

“[A user clicking] a button may lead to a fetch of information from the server, followed by processing the information in a worker, and finally writing it onto the screen. To measure how long such a chain of events takes (which is important for optimizing performance), you first have to know which chain you’re in: JavaScript, whether on the server or the client, can juggle multiple things at a time. AsyncContext is the perfect tool for this.”

The async context variable could hold something like an OpenTelemetry span ID. The event listener that you use to track that button click fetches JSON and puts it in the DOM tree. “It would be really nice to know that that fetch took 10 seconds and so it’s very, very slow for the user. That’s all solved by OpenTelemetry,” explained Vercel’s TC39 delegate Justin Ridgewell, one of the champions of the proposal.
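
Here is a rough sketch of the idea, based on the shape of the proposal at the time of writing; it isn’t shipped anywhere yet, so this won’t run today, the API may still change, and the span IDs are invented.

```js
// AsyncContext.Variable carries a value across awaits, timers and callbacks.
const span = new AsyncContext.Variable();

function handleClick(spanId) {
  span.run(spanId, async () => {
    await new Promise((resolve) => setTimeout(resolve, 10)); // the slow fetch
    // After the await, we still know which chain of work we're in.
    console.log('finished work for', span.get());
  });
}

handleClick('span-1');
handleClick('span-2');
```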

This kind of distributed tracing (or adding annotations to logs) is vital for performance monitoring and debugging, Snell pointed out. “There’s so much that can go wrong. Applications today are so complicated and involve so many different moving pieces; unless we have that telemetry there’s no way to know what’s actually happening. When you have a server that is serving millions of requests from different users, all within a single application, the ability to trace a request context through an entire flow while all these different things are happening concurrently is very important.”

The asynchronous functionality in JavaScript, like promises, is key to writing web applications but they complicate tracing. “The way that promises work in JavaScript hides so much detail. An asynchronous flow using promises fails and I know that a promise failed but I don’t know where the promise was created. When the promise is rejected, it only tells you where it was rejected. It doesn’t tell you exactly which promise it was and where that came from. We have to go through all these different hacks to try to identify where that promise was created.”

Async context propagates the context of what’s happening through multiple levels of code — for example, if you have framework-level code that needs to call on user-level code which then calls back into the framework-level code.

“Without an observability mechanism like async context, where you can say ‘here’s an additional piece of data that travels with this thing’ so you have this additional information it’s impossible,” Snell noted. “You can’t have that awareness, that observability, unless you can actually propagate that information.” The current options for doing that are “hacks with lots of holes and lots of ways they don’t work: we need to build it into the runtime”.

Node already has a similar feature, async local storage, which Vercel also implemented in the Next.js Edge Runtime. And there’s a polyfill for Deno — although implementing something that wasn’t a web platform standard API, because popular libraries need it, was slightly controversial for some of the runtimes, highlighting the need for a standards-based approach. Datadog also uses async local storage in their runtime.
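
Node’s AsyncLocalStorage, which is stable today, shows what that looks like in practice:

```js
import { AsyncLocalStorage } from 'node:async_hooks';

const als = new AsyncLocalStorage();

function handleRequest(requestId) {
  als.run({ requestId }, async () => {
    await new Promise((resolve) => setTimeout(resolve, 10));
    // Anywhere in this async flow, the request context can be recovered.
    console.log('handled request', als.getStore().requestId);
  });
}

handleRequest(1);
handleRequest(2);
```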

But there can be architectural and performance issues with the approach in Node: using async local storage, and the async hooks it relies on, to track async resources like HTTP requests through a side channel (because the JavaScript engine and language don’t support this directly) can result in a lot of overhead that would be avoided by bringing this approach into the language.

“Async context is a minimal subset of async local storage; it’s the core functionality implemented in a much less complicated way,” Snell said.

To help Cloudflare Workers implement async local storage for compatibility with Node (and so it could use it for internal tracing that makes it easier to support customers), the WinterCG group created a portable subset of the API. “This is the subset of async local storage that we can implement in the same way we would implement async context. It doesn’t have all the APIs Node has, only the ones that we know will be compatible with async context.”

The async context proposal is designed to coexist with async local storage, Ridgewell told us. “There shouldn’t be any special ability that you get with one you can’t get with the other. This allows you to write the small wrapper class so that if you already have async local storage, you can write the interface for async context and then implement it without implementing the entire thing. You can reuse async local storage, or vice versa. If you already have async context, because it’s provided by the JavaScript API, then async local storage is a small wrapper implementation around async context.”
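
A minimal sketch of that wrapper idea, assuming an AsyncContext.Variable along the lines of the proposal; it covers only the core run and getStore methods, not Node’s full API.

```js
// Not a complete AsyncLocalStorage implementation; just the core shape.
class AsyncLocalStorageShim {
  #variable = new AsyncContext.Variable();

  run(store, callback) {
    return this.#variable.run(store, callback);
  }

  getStore() {
    return this.#variable.get();
  }
}
```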

Ridgewell expects a massive performance increase from moving to async context from async local storage, because the implementation strategy will store the current request data for your concurrent request handling on your server. “That all gets much, much, much, much faster.”

Clientside Needs Context too

It turns out that serverside runtimes aren’t the only place developers need this functionality. As you might expect for distributed tracing, it’s useful on clients too, for collecting performance information or annotating logs.

Angular’s Zone.js is essentially the same thing as async local storage with a couple of extra APIs, only for clients, Ridgewell pointed out. “It’s the exact same need that is solved by async local storage and async context. React is going to need async context in the future for async functions and async components. Next.js needs async context for its request servers. An OpenTelemetry library requires this and everything currently has to use Zone.js because it’s the only thing that will work on clients.”

But like the hacks for serverside runtimes, the polyfills that implement this have limitations that developers need to know about and work around. If you use the native async await API — native async functions that allow you to wait on a promise — you can’t instrument it. “It skips all possible monkey patching that you could do on a promise so it’s impossible to use OpenTelemetry or Zone.js client-side implementations,” he explained. There are various ways to transpile native async await and use promises by monkey patching the Promise.prototype.then() method, but that’s more work for developers. “There are cases where the current client-side implementations can’t work. You have to install a polyfill and that polyfill is flawed. If we can have a real API that covers all of this, then these client use cases are finally possible without these onerous difficulties we have to place on the user code to be aware of the limitations of our implementation.”
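
A simplified sketch of what that monkey patching looks like (the variable names are illustrative); the catch, as Ridgewell describes, is that native await never goes through the patched then, so the captured context is lost.

```js
// Zone.js-style context tracking by patching Promise.prototype.then.
let currentContext = null;

const originalThen = Promise.prototype.then;
Promise.prototype.then = function (onFulfilled, onRejected) {
  const captured = currentContext;
  const wrap = (fn) =>
    typeof fn === 'function'
      ? (value) => {
          const previous = currentContext;
          currentContext = captured; // restore the context for the callback
          try {
            return fn(value);
          } finally {
            currentContext = previous;
          }
        }
      : fn;
  // Native `await` bypasses this patched method entirely.
  return originalThen.call(this, wrap(onFulfilled), wrap(onRejected));
};
```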

Async context will even be useful for features inside browsers: optimizing how resources on a web page are loaded by setting priorities and tracking which tasks depend on those resources with the new Prioritized Task Scheduling API, or keeping track of long-running tasks that might block user input with the new Long Tasks API and letting developers push those tasks into the background so they won’t run until the computer is idle. Browser makers are also very interested in tracking performance in Single Page Applications, where tasks create new URLs for the app so that the browser back button still works, which is tricky to do today: async context could help with that.

“They need the exact same core primitive capability that async context provides,” Ridgewell noted: “these APIs could be implemented in terms of async context.”

Being useful to the browsers might speed up implementation of async context, which has been in discussion for several years and recently made very swift progress to Stage 1 and then Stage 2. “Future advancement will be a bit slower,” warned Ehrenberg. “We are working on a draft implementation in V8+Chromium and specification updates on both the JS and HTML side, before proposing Stage 3.”

Getting the agreement that TC39 wants to go ahead with the proposal, plus a concrete spec that explains how it can be implemented (the requirements for Stage 3), might happen this year. But getting the implementation and signoff required for Stage 4 will take longer because, to be useful, async context needs to integrate with everything that creates scheduling events — and it needs to be on by default. TC39 can specify how async context will work with promises, because they’re part of the JavaScript specification, but it also needs to work with the set timeout and set interval APIs that are part of the HTML and DOM specs, as well as event listeners, the fetch API and many more. “You have to be able to propagate that async context through a set timeout; when that timer fires later on you have to be able to restore the proper context,” Snell noted, suggesting that the proposal won’t reach Stage 4 until at least next year and maybe later.

That means WinterCG, TC39 and WHATWG are all key to moving this forward. “We have this entire ecosystem that we need to make aware of this new proposal and get them to adopt it so that we can go from Stage 3 to Stage 4 with a useful set of features,” Ridgewell noted.

But there’s also a lot of motivation for everyone in the ecosystem to pitch in. As soon as Cloudflare Workers enabled their implementation of async context for Vercel, “they were able to replace their hacky workaround that was actually causing us problems internally,” Snell told us. “Enabling the API made things much more reliable.”

Another Approach to Runtime Standards

At this point, it’s not clear whether async context will be a one-off or something that ushers in a new path for features to arrive in JavaScript, because it’s unusual in several ways, Snell noted. “It can be in the web platform or it can be language level: that’s unique about it. It’s one of the first that has a significant ecosystem implementation already in the form of async local storage. It does make for a really interesting example case: if it works it could open the door for maybe standardizing things that are coming more from the ecosystem rather than from the language or the runtime.”

But async context may prove to be a rare example of an API coming from the serverside JavaScript community, Ridgewell suggested, both because the major implementation work done by the browser makers will always have a lot of influence on how the language advances, and because a universal JavaScript would also cover even more environments than browsers and servers.

“I don’t think we’ll see more server-only use cases added to the language as a whole. There are a lot of competing ideas about how we want to evolve the language and one of the things that we’re concerned with is that the TC39 specification should be useful for all platforms that implement JavaScript. There’s server obviously, there are clients, obviously — but there are also micro embedded chips that run JavaScript, there are edge functions that are running JavaScript that are not the classic server environment.”

“If the use case is only applicable to servers, I think we’re going to have a really difficult time getting committee consensus on adding a feature, but if we can show that it applies to servers and to another environment, it becomes a little bit easier, because now we have to coordinate how this thing works across the environments.”

For features that don’t get picked up by WHATWG, W3C or TC39, WinterCG can be “the venue where we could still collaborate and push forward on common APIs,” Snell suggested. “Bun hasn’t been too involved in the conversations yet, but we want them to be more involved.”

His hope is that multiple runtimes will implement the fetch subset so that the serverside fetch implementations all work in a consistent way, and that WinterCG can also help standardize a Connect API for outbound TCP sockets. “Node has its net and tls APIs and they’re very specific and they’re very complicated. Deno has its own API, Deno.connect. Bun is doing some stuff where they’re implementing those APIs, but they also have their own Connect.” Workers has its own connect() API that attempts to simplify this, which Cloudflare will contribute to WinterCG.
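
For a sense of the shape of Workers’ API, here is a sketch based on Cloudflare’s documented cloudflare:sockets module at the time of writing; the hostname and port are placeholders, and other runtimes currently expose quite different APIs for the same job, which is exactly the inconsistency WinterCG wants to remove.

```js
import { connect } from 'cloudflare:sockets';

export default {
  async fetch(request) {
    // Open an outbound TCP connection and write a few bytes to it.
    const socket = connect({ hostname: 'example.com', port: 4000 });
    const writer = socket.writable.getWriter();
    await writer.write(new TextEncoder().encode('ping\n'));
    await writer.close();
    return new Response('sent');
  },
};
```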

“What I’m hoping is we can keep this momentum going. Let’s use WinterCG. We’re going to bring our Connect API to WinterCG and say let’s standardize a TCP API, so we can do this in a consistent way.”

“We can’t underestimate how important it is to the stability of the ecosystem moving forward; it’s not maintainable that we ask everybody to adapt to each individual runtime, especially as the number of runtimes continues to grow. Whether it’s part of the language or not, having these standards is important. Having support for these things, implementing them in a consistent way, it’s something we as an ecosystem have to embrace.”

The post Beyond Browsers: The Longterm Future of JavaScript Standards appeared first on The New Stack.

How the scope of ECMAScript is changing as JavaScript matures; and where the opportunities for adding new features are coming from.

What Does It Mean for Web Browsers to Have a Baseline

For users, the promise of the web is simplicity — you don’t have to install anything, just type in a URL or search. But for developers, it’s about reach and portability — and that requires strong compatibility and interoperability between browsers.

But even with Internet Explorer gone, the top frustrations that show up in survey after survey of web developers are all about the lack of compatibility: avoiding features that don’t work in multiple browsers, making designs look and work the same way in different browsers, and testing across multiple browsers. “Making things work between browsers is their biggest pain point,” Kadir Topal, who works on the web platform at Google, told The New Stack.

For the last few years the Interop project (and the Compat 2021 effort that preceded it) have helped to reduce and eventually remove a number of these compatibility pain points. But even if they know about focus areas targeted for improvement through Interop, web developers aren’t likely to keep checking the Web Platform Tests dashboards when they’re deciding what features to use on a site, let alone follow the various draft and approved stages of specifications on their sometimes slow progress through standards development, even with the W3C’s fairly comprehensive list of browser specifications.

Despite the name, Chrome Platform Status covers more than one browser; but entries aren’t usually updated after a feature ships in Chrome, so you can’t rely on the compatibility details to stay current. Apple no longer publishes a WebKit status page, although you can look up its position on various proposed web standards; Mozilla keeps a similar list of its own positions on specifications, but both are mainly a glimpse of the future. Developers can check the CSS database and bug trackers for Chromium and Firefox, look up polyfills at Polyfill.io, or check on feature status on MDN and caniuse.com.

Or they might just stick with what they know works already. If we don’t want to lose out on the promise of the web platform with evergreen browsers and living standards, how do we make it easier for web developers to know which web platform features are ready for mainstream use?

Setting a Baseline

Because different browsers are developed and updated on their own schedule, there’s no one moment when everything in a standard becomes available universally. Safari 16.4 was a major release with a long list of new features — some of which have been supported in other browsers for five or more years.

Release notes might attract some attention, but if developers hear about an interesting new feature in a conference talk or a blog and look it up only to find it works in only one browser, the excitement about it can easily dissipate. Developers want to know what works everywhere, and even when features are in multiple browsers, “they’re often available with bugs and inconsistencies, and therefore developers often deem them impossible to use in production,” warned Topal, who worked on MDN for many years.

What it adds up to is that while the caniuse site is invaluable, “developers are unclear on what is actually available on the platform”.

Baseline is a project from the WebDX Community Group that attempts to remove that confusion, “making it really clear to developers what they can and cannot use on the web platform” by listing the set of features that are available in all the major browsers and (in future) making it easier to track new features that are under development.

Rather than adding features as they get released, which could turn into just one more thing to try and keep track of, the list will be compiled once a year. “We’re hoping that once a year we can do this cut of the platform and say, ‘this is the web platform baseline 2023, 2024 or 2025’. Then each year we can talk about what’s new: what are the things that you can use now that you couldn’t use before, not just that they’ve landed in a browser, but are actually available to you because they are already widely available.”

The criteria for a feature to be included in an annual baseline are actually stricter than for most web standards, which require only two implementations: baseline features have to be supported in the current and previous versions of Chrome, Edge, Firefox and Safari. “The idea of Baseline is to provide a clear signal for when something is broadly available, and shouldn’t end up causing any problems, rather than just leaving it up to developers to work it out,” explained Mozilla engineer James Graham.

Baseline isn’t a return to waterfall engineering, by deciding in advance what features will be in the next year’s web platform, or an attempt to force all the browsers to coordinate the features they ship, Rick Byers, director on Google’s web platform team, noted. It just records what features are actually broadly available in browsers, in a way that’s easy to spot in documentation or highlight in a blog. “It’s breaking the assumption that the pace of developer understanding has to match the pace of standards development.”

Communicating to busy developers has been the missing piece of standards development. “As browser vendors, we’ve been focusing a lot on the things that we ship in our own browsers, but for developers what really matters […] is what is it that they can use now,” Topal said. “Once features are available across browsers, and once they are interoperable, we still need to go out there and make sure that developers are aware of them. Because for the longest time we basically trained developers to look at features that land in browsers as things that they might be able to use in a decade from now.”

Web standards are changing quickly and there’s still plenty of experimentation pushing it forward, but the platform is also getting less fragmented, he maintained. “Now that we have more collaboration between browsers and things are shipping faster across browsers and in a more interoperable way, we also need to change the mindset of developers that the web platform is actually moving forward.”

“Baseline is one way for us to get that across in a way that’s not chaotic.”

Google will be using the Baseline logo in articles and tutorials on web.dev, but perhaps more importantly it will also be on MDN and — hopefully by the end of 2023 — on caniuse. There will also be widgets that make it easy to include the Baseline status of a feature in a blog or other documentation.

One of the first MDN pages to highlight the Baseline status of a feature.

“We’re excited to be displaying Baseline support on relevant MDN pages. Through our research, we found web developers lack a quick and reliable way to see the status of features. And while our browser compatibility tables are useful and accurate, they are detailed and more suited to a developer’s deeper support research,” Graham noted. “It’s still early days, but we’re looking to roll it out further over the next few months. This will allow us to gain feedback from our users to make sure it’s a useful and relevant feature for them.”

So far, the Baseline information is only on a few MDN pages, and not even on all of the pages documenting some recent features Google calls out as qualifying for Baseline status. Partly that’s because it takes time to add the information to MDN and the Open Web Docs project that MDN draws from, and for the caniuse team to integrate it, but he also added, “Discussions about exactly how to decide when a feature meets the bar of being broadly available are ongoing.”

“The point of Baseline is to make it clear when features are safe to use without worrying about running into bugs and compatibility issues,” he explained.

Baseline or Lowest Common Denominator?

There’s always a tension between making information clear enough to grasp quickly and detailed enough to be useful.

The caniuse site doesn’t give developers the yes or no answer they might be looking for. But the browser landscape is equally complex and not everyone updates to the latest browsers as soon as they ship — or uses the four main browsers that will be covered by the annual Baseline feature list. A commercial website or web application may be able to dictate what browsers customers can use with it. But a government department or a service provider building a website will need to support a very wide range of users and devices, and may need to use polyfills and progressive enhancement to cover all the browsers they need features to work with.
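
In practice that often means feature detection plus a fallback, for example:

```js
// Use a newer CSS feature only where it's supported, with a fallback class.
if (typeof CSS !== 'undefined' && CSS.supports('grid-template-rows', 'subgrid')) {
  document.documentElement.classList.add('has-subgrid');
} else {
  document.documentElement.classList.add('no-subgrid');
}
```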

“Developers have situated needs regarding interoperability, which is why tools like caniuse are so helpful,” Alex Russell, Partner Product Manager on Microsoft Edge, cautioned. Sometimes you need the extra detail. “Caniuse allows developers to import their logs to weight feature availability to represent their user’s needs with higher fidelity than a general-purpose lowest-common-denominator view can provide.”

You can pair the compatibility matrix on caniuse with usage statistics for a detailed view of where a particular feature will and won’t work — IWantToUse has a friendly interface for doing that for multiple features — but even so developers won’t always find the information they need, Graham pointed out.

“In some cases, the specific APIs you’re interested in don’t directly map to something in caniuse, so you need to look at the MDN browser compatibility tables and work out for yourself whether the users you’re hoping to support are likely to have access to the feature.”

That compatibility data is on individual MDN pages, so developers have to check one API at a time — or run a query against the data in the browser-compat-data repo and the W3C’s browser implementation status repo, which adds in Chrome Platform Status and caniuse data but still isn’t a comprehensive list of all web features.
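
Querying that data directly is possible but fiddly; a rough sketch with the @mdn/browser-compat-data npm package might look like the following, though the feature path and the exact shape of the support data are assumptions to check against the package’s documentation.

```js
// CommonJS style; import mechanics vary between package versions.
const bcd = require('@mdn/browser-compat-data');

// Hypothetical feature path: the structuredClone() global.
const support = bcd.api.structuredClone.__compat.support;
for (const [browser, info] of Object.entries(support)) {
  const entry = Array.isArray(info) ? info[0] : info;
  console.log(browser, entry.version_added);
}
```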

These different resources don’t always match up completely. BCD covers some 14,000 features — down to API interfaces and CSS properties — while caniuse has a higher-level list of around 520 features, and the 2,200-odd entries in Chrome Platform Status are a mix of both, but from the viewpoint of people building a browser rather than a website, so there might be separate listings for different interfaces in an API like FileReader.

“All sites are different since they have different needs and audiences, so it’s hard to pick a line that works for everyone all of the time,” he noted. Baseline may have less detail but it will also be much simpler for developers to keep track of.

“The aim is that we get to a place where developers trust that if something’s in Baseline, they feel confident to go ahead and use it for any kind of website that doesn’t have really unusual compatibility requirements. And by putting it directly on MDN, we hope that developers are able to learn when features have reached that threshold of usefulness much faster than they do with the current processes.”

Priorities and Politics

One of the biggest advantages of the Baseline project may be the opportunity to make more web developers familiar with the cycle that moves the web platform forward — features emerging in origin trials in browsers; being tested, stabilized, standardized and made interoperable through projects like Interop, with features that score well enough on interoperability graduating into each year’s baseline.

Subgrid is a good example of that pipeline. Currently, it’s not something most developers can use. “Features like subgrid that haven’t shipped everywhere — subgrid isn’t in stable Chromium even though it’s been in Gecko and WebKit for a while — are really hard to use on mainstream sites without causing problems for users,” Graham cautioned. But it’s also a focus area in Interop 2022 and continuing in 2023 to make sure it ships as an interoperable feature. “The hope is that once features ship, they’re already in a usable state and so developers are able to use them on production sites much sooner than they could in the past. This in turn should mean things reach Baseline much sooner than they might have historically.”

Indeed, subgrid is now starting to ship in browsers, Topal said. “Next year it’s going to be in Baseline: it’s going to be widely available and we’re going to talk about it again because that’s when most developers, most of the time, will be able to use it.”

Knowing the cycle works could encourage more developers to bring up their interoperability and compatibility issues in the open bug tracker that feeds each year’s Interop priorities.

But it’s also important that a browser baseline doesn’t limit developers to only consider features that all the browsers agree on, in a way that holds the web back if some of the browser makers fall behind on features that don’t make it into the Interop focus areas. Baseline can’t be a “good enough” bar that allows browser makers to skate on delivering further progress.

For all the community positivity around Interop and the advantage of having the most influential browsers involved and making commitments to fully develop and support features, the price of that involvement is that they also have a veto. And while the bug tracker and the web platform test results are public, the governance of the process for reaching consensus and committing to the focus areas each year isn’t as open.

That underlines Interop’s complicated balancing act: getting browser vendors, who nearly all also have other platform interests beyond the web, to commit to moving the web platform together compatibly is an enormous achievement, but the process has to accommodate the various commercial pressures they all face to keep them involved.

As well as driving improvements in web platform test scores across all the main browsers, Interop (and the web platform test suite that underlies the project) has clearly helped draw more attention to the importance of compatibility and interoperability between browsers. Last year, the HTTP Archive’s Web Almanac included a section on interoperability for the first time, and Baseline is a continuation of this new focus.

But arguably, the reason we’re now seeing much faster progress in browsers like Safari (where Apple has hired a much larger team in recent years and is updating the browser far more frequently) is due not just to Interop providing a way for browser makers to jointly set priorities for improving compatibility, but also to the impact of regulators (like the UK’s CMA and the Japanese equivalent) investigating competition in mobile ecosystems and what part browsers play in that.

In the end, the continuing success of Interop likely depends on correspondingly continuing pressure from web users, developers and regulators demanding a web platform that is powerful and compatible. Broader participation in Interop, perhaps driven by developer awareness as part of the Baseline project, could help. “The thing that I would like to happen next time for Interop 2024 is for more people to know about the process,” Daniel Ehrenberg, vice president of Ecma (parent organization of the TC39 committee that standardizes JavaScript) told The New Stack.

Alongside Baseline, the WebDX group is also involved in research like the State of CSS and State of JS surveys, along with short surveys on MDN: “They’re really quick to fill out, and limited in scope, so that we can get input from people who don’t necessarily have the time to spend on a longer form feedback process,” Graham explained.

All that will feed forward into Interop 2024 by identifying the things on the web platform that need acceleration, Topal suggested.

“Instead of ad hoc asking about things that we could do in Interop, what we want to get to is a shared understanding of developer pain points between browser vendors. Even though we all already have individual product strategies, we’re still addressing the same audience. It’s the same web developers. We want to get on the same page since we own this platform together, we maintain this platform together. We want to make sure that we together have a shared understanding of the developer pain points.”

New Ways of Creating Standards?

What’s also interesting about Baseline is that like async context and Open UI, it’s emerging from a W3C community group rather than a standards body.

Since the days of HTML5, the WHATWG, W3C and (to a lesser extent) ECMAScript approach of “paving the cowpath” by codifying the most common patterns found on websites in the standards for browsers has meant that standards often reflect patterns adopted because browsers already support them.

Open UI and WinterCG incubate draft proposals that are brought to those standards bodies for consideration, aligning more with the Origin Trials that Chrome and Edge use for features they want to bring to the web, which solicit developer feedback and produce tests and specifications.

Separating design from standardization like this can have the advantage of working faster — and failing faster — than a formal standard process, with a tighter feedback loop with the developers who are interested in a new feature. Iteration and experimentation in a community group can preserve momentum even when ideas don’t work out the first time. It also avoids everyone getting stuck with the first implementation of a feature when that turns out to have design flaws that can’t be changed because developers have already taken dependencies on them.

The Web DX community group includes not just the Interop participants Apple, Google, Microsoft and Mozilla, along with Igalia and Bocoup, but also organizations like LG that aren't as well known for making browsers. “It's a new era of collaboration on the web platform,” Byers suggested.

Having Baseline emerge from a community focused on developer experience should help it become something that’s useful for developers, rather than something that lets web browser makers pat themselves on the back for how well they’re doing, and likely means we’ll see iterations in the way the annual Baseline is decided on and what it includes over time. If it takes off, it could add another level to the way the web platform creates and adopts the standards that make it powerful.

The post What Does It Mean for Web Browsers to Have a Baseline appeared first on The New Stack.

A new project called Baseline aims to clarify what developers can rely on across browsers and the web platform in general.

How Interop 2023 Will Move the Web Forward

“It’s in society’s interest as a whole when browser vendors feel the need to be interoperable.”

Government regulation on competition is one part of pushing that interoperability forward; the Interop project is another key part, Rick Byers, director of Google’s web platform team, told The New Stack. Last year Interop 2022 delivered significant improvements in all four main browsers. This time around, it’s not just about getting individual browsers to make feature implementations more compatible with the same feature in other browsers, but about measuring how well all the browsers deliver the same web platform.

For browser makers, the need to allocate development resources can make it feel like there’s a tension between adding new features to the web platform and going back to fix interoperability issues in features that have already been shipped. Interop focuses on technology that is already specified in web standards with shared test suites in web-platform-tests (WPT), but it covers a mix of features that have already shipped (in some or all browsers) and features that are still being implemented — some rather belatedly.

Interop is open to any organization implementing the web platform, but since it’s about having web platform technologies work the same way across multiple browsers, it’s run by the four major browser implementers and two open source consultancies who do a lot of work on the web platform: Apple, Google, Microsoft, Mozilla, Bocoup and Igalia.

Interop progress as of August 2023.

“It’s really good that there’s companies out there that realize that the web standard process is not only for people that build a browser and want to show ads, but that everybody benefits from it, because we can write more secure and stable applications if the platform gives us solutions that we don’t have to simulate [in JavaScript],” explained developer advocate Christian Heilmann — who used to work on the Firefox and Edge browser teams.

For instance, he pointed out, before the dialog element was supported across browsers, every developer had to build their own with a positioned div element, write the JavaScript code to show and hide the dialog — and usually with tweaks for different browsers. It might sound like a trivial example, but that’s a lot of unnecessary work repeated on every project.

Moving the Whole Platform Forward Together

Browser makers could (and do) fix interoperability and compatibility issues individually using WPT, but the value of Interop is that it makes for more coordination of what all the browsers work on each year by focusing on what developers are seeing problems with. Those developer pain points are gathered through the surveys that MDN runs — both the big annual State of CSS and State of JS research projects and, in future, shorter regular surveys on MDN — and a bug tracker for issues submitted via GitHub, then turned into formal proposals for the Interop participants to vote on in November 2022.

This time around, that generated a lot of requests and suggestions, Igalia developer advocate Brian Kardell told us. “Last year, we weren’t very proactive. We had a wider call for this year, and we left it open a little longer and we had, at peak, maybe 90 different issues open.”

“Interop 2023 is the biggest, most aggressive attempt at interop I think we’ve ever made.” — Rick Byers, director of Google’s web platform team

There are 26 focus areas, compared to 15 in 2022 (eight of which have been carried forward from previous years), plus several investigations — where there is work to be done but the standard or the web platform tests aren’t mature enough to start implementing. “Among those focus areas are some things that developers have asked us for forever.”

The 26 areas all the browsers agreed to work on range from features everyone uses on the web without realizing, to those last annoying paper cuts in otherwise finished areas.

The point of Interop is often getting multiple browsers to the same stage. Firefox has supported multicolor fonts for a long time; the vector color font support that's part of Font feature detection and palettes brings that to all the main browsers.

Browser progress chart

On the other hand, Firefox lagged on Modules in Web Workers. “Right now, Web Workers don’t allow me to use other people’s JavaScript modules. I can send data in and get data out: I cannot have any interaction in between [anything with third-party dependencies],” Heilmann explained. “Web Workers become more important as we do more high-performance stuff in the background, especially with machine learning: you don’t want that on the main thread.”
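For anyone who hasn't tried module workers, here's a minimal sketch of what the feature enables once it works everywhere; the file names are illustrative, not from any real project.

```typescript
// main.ts: spin up a worker that can itself use ES module imports.
// Passing { type: "module" } is what makes third-party modules usable
// inside the worker at all.
const worker = new Worker(new URL("./heavy-lifting.js", import.meta.url), {
  type: "module",
});

worker.postMessage({ pixels: new Float32Array(1024) });
worker.onmessage = (event: MessageEvent<number>) => {
  console.log("result from worker:", event.data);
};

// heavy-lifting.js (the worker) can then do this at the top level:
//   import { runModel } from "./third-party-ml-lib.js";
//   self.onmessage = (e) => self.postMessage(runModel(e.data.pixels));
```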

That’s already improved since Interop 2023 began, as has Firefox support for Color Spaces and Functions, going from passing just over half the tests to almost 95%. That means designers can specify uniform gradients and color shifts so sites look the same in different browsers and on screens with different color gamuts, and developers can lighten or darken colors in CSS without having to recompute them. Operating systems are already beginning to support better color formats and if the web platform follows, “this world of more colorful, rich vibrant things becomes possible,” Kardell explained.
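As a rough illustration of what that unlocks, here's how a site might feature-detect the newer color syntax and lighten a brand color directly in CSS; the class name and colors are invented for the example.

```typescript
// oklch() and color-mix() are part of the newer CSS color functionality;
// CSS.supports() lets you check for them before relying on them.
const supportsColorMix = CSS.supports(
  "color",
  "color-mix(in oklch, hotpink 60%, white)"
);

const button = document.querySelector<HTMLButtonElement>(".cta");
if (button) {
  button.style.backgroundColor = supportsColorMix
    ? // Lighten the brand color in CSS itself, no recomputation in JS needed.
      "color-mix(in oklch, var(--brand, #6b21a8) 80%, white)"
    : "#8b5cf6"; // precomputed fallback for older engines
}
```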

Similarly, at the beginning of 2023, Safari had much better support for Math Functions, which let developers do things in CSS (like tracking mouse cursor position) that used to need Canvas or precompilers: now all three browsers score in the high 90s. Chromium browsers started the year with less support for Masking in CSS: applying the kind of image effects you’d use in a graphics application to a web page, like using an image or a gradient to mask or clip an image. Again, doing that in CSS avoids the need to use canvas for something that helps web apps feel more native on different platforms. Animating a graphic along a custom motion path with CSS is supported in all three browser engines, but doesn’t work quite the same way in all of them.

Making Less Work for Developers 

Many focus areas improve developer productivity, like being able to rely on the border-image CSS property in all browsers for replacing the default border style “with one element rather than five nested elements just to get a frame around something,” Heilmann said. And some go back to basics: URL is about getting all browsers to agree on an implementation of URLs that matches what’s defined in the standard.
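As a small sketch of why consistent parsing matters, here's a check built on the standard URL constructor; the allowlist logic is only an illustration, and it only works as a safety check if every engine parses the same input the same way.

```typescript
// Parse a link the way the WHATWG URL Standard defines and only allow
// http(s) destinations; surprising strings (like javascript: URLs) can
// still be valid URLs, which is why consistent parsing matters.
function isSafeLink(input: string): boolean {
  let url: URL;
  try {
    // Relative inputs resolve against a base; the base here is illustrative.
    url = new URL(input, "https://example.com/");
  } catch {
    return false; // not a valid URL at all
  }
  return url.protocol === "https:" || url.protocol === "http:";
}

console.log(isSafeLink("/docs/page"));          // true
console.log(isSafeLink("javascript:alert(1)")); // false
```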

“It’s quite amazing how many things are a valid URL and how dangerous that could be.” — Christian Heilmann

Drawing graphics on the screen with the canvas element and API lets you script graphics, but running that on the main browser thread can block anything else a user is trying to do on the page. Offscreen canvas, as the name suggests, puts that offscreen where it won’t interfere with rendering handled in a Web Worker. It’s widely used in game development, Heilmann explained: “We [always] had one canvas that showed the game and we had one canvas that did all the calculations; and that way, the performance was so much better. Standardizing that rather than having to hack it every single time would be a really, really good thing.”

It’s not just a specialized technique though; most web developers use offscreen canvas already but without realizing it, because they use it through a library, Kardell pointed out. “A very small number of libraries need to update to take advantage of that and then suddenly, everybody in all the places that they’re using it will get better performance on maps and drawing tools and Figma and all kinds of cool stuff.”
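Here's a minimal sketch of the hand-off that offscreen canvas standardizes, assuming a worker script of your own; the element ID and file names are illustrative.

```typescript
// main.ts: hand rendering to a worker so drawing never blocks the UI thread.
const onscreen = document.querySelector<HTMLCanvasElement>("#chart")!;
const offscreen = onscreen.transferControlToOffscreen();

const renderWorker = new Worker(new URL("./render-worker.js", import.meta.url), {
  type: "module",
});
// OffscreenCanvas is transferable, so it moves (not copies) to the worker.
renderWorker.postMessage({ canvas: offscreen }, [offscreen]);

// render-worker.js would then do roughly:
//   self.onmessage = ({ data }) => {
//     const ctx = data.canvas.getContext("2d");
//     ctx.fillRect(0, 0, 100, 100); // draws to the visible canvas, off the main thread
//   };
```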

Custom properties (also known as CSS variables) are another long-standing request that will make it much easier to reuse values: for example, defining colors, font sizes and other settings once, directly in your CSS, rather than repeating them in every selector block, which will simplify switching a site between light and dark mode. This focus area is concentrating on @property, which lets you set default and fallback values when you define a custom property in a stylesheet; again, this isn't new, but it hasn't been consistent between browsers.
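The script counterpart to @property is CSS.registerProperty, which gives a feel for what registering a typed custom property involves; the property name and values below are examples rather than anything from the spec text.

```typescript
// Register a typed custom property; @property in a stylesheet is the
// declarative equivalent this focus area concentrates on.
if ("registerProperty" in CSS) {
  CSS.registerProperty({
    name: "--accent",
    syntax: "<color>", // the browser now treats it as a real color value
    inherits: false,
    initialValue: "rebeccapurple", // fallback when nothing sets --accent
  });
}

// Switching light and dark mode becomes one property flip instead of many overrides.
document.documentElement.style.setProperty("--accent", "oklch(80% 0.1 90)");
```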

CSS Pseudo-classes add a keyword to specify a special state, like the way a button looks when you hover over it, so you can style it. That will be useful for input, but also for media handling (full screen, modal and picture-in-picture), and interoperability is particularly important here, Heilmann noted. “We need to actually make sure that every browser does them the same, because if we have yet another one that is only in Chrome it costs us a lot of time and effort.”

Isolating part of the page with Containment in CSS, so it can be rendered independently, whether that's a navigation bar or an animation, is “very good for performance,” Heilmann said. Although it can be somewhat complex to work with, because it requires some understanding of rendering and layers, he suggested most developers will use it through tools like GreenSock rather than directly.

Other focus areas include substantial work on long-standing web developer priorities that will be widely used. “has() and Container queries are literally the number one and number two requests for a decade from web developers and we’re getting both,” Kardell enthused.

How to Unblock Progress

“Container queries is a CSS mechanism to reason locally about layout decisions,” Byers explained: “To understand the context, the container you're in, and do responsive design in a way that works well across components, so you can build more reusable components that work in whatever layout environment they're put into.”

If you put a component in a container that doesn’t have as much space, you could pick which elements should be hidden, or just switch everything to a smaller font.
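As a rough sketch, a container query for that kind of card component might look like the CSS below; the class names and breakpoint are invented, and it's injected from script only to keep these examples in one language.

```typescript
// A card that adapts to the space its container gives it, not the viewport.
const style = document.createElement("style");
style.textContent = `
  .card-host {
    container-type: inline-size;      /* make the wrapper a query container */
  }
  .card .details { display: block; }
  @container (max-width: 400px) {
    .card { font-size: 0.85rem; }     /* smaller type in tight spots */
    .card .details { display: none; } /* hide secondary info when cramped */
  }
`;
document.head.append(style);
```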

“This is a direct answer to the needs of the componentized web for something like reusable components to put together bigger applications,” Heilmann told us.

To Byers, it also highlights the opportunity Interop gives developers to highlight what they need.

“Just a couple years ago, most of the browser engines were saying, ‘we don’t think this can be a feature, we don’t think this is something we can put in our engines’. And now not only does it exist in Chromium and WebKit, and it’s coming in Gecko, but it’s something all three of the major engines believe — that by the end of the year we can have working interoperability and stability and something you can actually depend on. For web developers, that should be enormously exciting, not just because that feature is exciting to them, but because it signals this is the kind of thing that web developers can get done on the web. When they come together and push and say, ‘Hey, this is a capability we really want in the web’.”

Similarly, the idea of having a parent selector (has) to go with the child selector has been in the CSS spec since the first draft of CSS 3 in 1999. Heilmann suggested thinking of it more as a family selector: “It can allow you to look around you in the DOM in each direction, which we couldn’t do with CSS before, and I think that’s a huge step towards people not having to use JavaScript to do some things.” That would make it easy to have one style for an image that has a caption and a different style for pictures that don’t.
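A minimal sketch of that caption example with :has(), both in a stylesheet and in a selector query; it assumes ordinary figure and figcaption markup.

```typescript
// Style figures differently depending on whether they contain a caption,
// which previously needed JavaScript to walk the DOM and toggle classes.
const figureStyles = document.createElement("style");
figureStyles.textContent = `
  figure:has(figcaption) { border: 1px solid #ccc; padding: 0.5rem; }
  figure:not(:has(figcaption)) { border: none; }
`;
document.head.append(figureStyles);

// :has() also works in the selector APIs, e.g. finding uncaptioned figures:
const uncaptioned = document.querySelectorAll("figure:not(:has(figcaption))");
console.log(`${uncaptioned.length} figures are missing captions`);
```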

Like container queries, the implementation was held up by worries about slowing down page loading (because the rendering engine might have to go up and down the DOM tree multiple times to apply a developer’s CSS rules correctly).

“Performance has always been the barrier, because of the way pages are assembled and because of the way CSS and rendering engines have been optimized in browsers,” explained Igalia web standards advocate Eric Meyer, who built one of the earliest CSS test suites. “When we lay out a page, we generally do one pass: we do not want to have to do multi-pass [rendering] because that’s a performance killer.” Two decades on, computers are faster than when has() was first proposed, but “you could still grind a browser to a halt with container queries or the ‘parent’ selector”.

“It’s a scary thing that no one wants to take up because it’s computationally complex, and it could just really blow up performance,” Kardell added. Years of discussions about these features might have helped make them seem impossible, he suggested.

“When you have things that have been discussed a lot, that seem hard, that look like you could invest a lot of money and a lot of time and it might not go anywhere — it’s unsurprising that nobody wants to break the standoff!” — Brian Kardell, Igalia

In the end, investigation and experimentation by multiple browser teams (and a bottom-up approach suggested by Igalia) showed that these features could work without degrading performance, including compromises to avoid the risk of circular dependencies and undefined behavior.

Some of the work this relies on was done in a previous iteration of Interop, explained Kadir Topal, who works on the web platform at Google, highlighting a new pattern in browser development that's emerging from the collaborative approach to compatibility. “Since all the browsers shipped that [work], there was something that we could build on top of. I think we're going to see more and more of that, where we can ship together something in one year, and then build on top of that in the next year. I can already see some of that coming for the next year.”

The technical work is critical for unblocking implementations that give developers what Byers considers “most of what they were asking for” with container queries, but he also noted that this is part of a different philosophy in building the web platform, one that's not just about what browser engines need.

“The larger story for me is the shifting of how we approach platform design to just be more humble and listening to developers more. Browser engineers used to say ‘anything that can introduce cycles and delays sounds really scary, so like I refuse to go there on principle.’ Over the last several years, I think the industry as a whole, but certainly the Chrome team, has really had this transformation of saying our number one job is to serve developers and really listen and really be empathetic to their pain points. We're hearing consistently that developers are having problems with this sort of thing. It's a legitimate problem. What are we going to do about it rather than just say it's not possible?”

Finishing the Last Mile

Some focus areas continue from previous years: CSS subgrid has now landed in the experimental version of Chromium, using code contributed by Microsoft. Before Interop 2022, only Firefox supported subgrid; Safari added it in 2022 and now it’s going to be broadly available — and interoperable.

Byers compared that to Flexbox, which had been in browsers for years “but it was just so different in different engines for so long, and there were paper cuts all over the place that we’re still cleaning up from. The way grid has happened; as much as we wish some of it would have gone faster, I think it’s the model for how big new things can get added to the web in a way that’s high quality and consistent across browsers, and not that far off timewise from each other in the different engines.”

Flexbox scores are already in the 90s for all the browsers, although polishing off the last bugs takes time.

“When you look at the numbers [for some focus areas], there’s almost a question of ‘why are they in Interop?’” Meyer noted. “That seems really interoperable, how come that’s there?” For example, the stable and experimental numbers for Media Queries were already high.

“That actually points to one of the things that Interop was intended to do: in some cases where things are almost but not quite universal, let’s get them there. There’s just a few bugs across the various browsers that keep us from being 100% across-the-board compatible, so let’s get there.”

With such high scores, he explained, browsers might be tempted to prioritize areas like Web Codecs, where Safari was only scoring 59% at the beginning of 2023, over Media Queries with 99% compatibility. “It would be very easy to say that one percent isn’t worth devoting time to when we have these other areas that really need to be dealt with. Interop is a way of drawing people back to say ‘Hey, we really do need to correct this’.”

In the case of Media Queries, the holdup is Firefox with a score that’s now gone from 82.7% at the beginning of the year to 99% already (and 99.9% in nightly builds). “They really only need to fix whatever that percent tests, and Interop is meant to encourage that.”

Consistency Is When Everyone Wins

One of the most interesting charts in WPT has always been the number of tests that fail in just one browser, neatly illustrating the compatibility failures that can bite developers. This year Interop is highlighting the inverse of that: the number of tests that are passing in all the engines (Blink, Gecko and WebKit). That's more important than any one browser having a higher score, Byers said. “It's not about who wins: I want developers to win by that line getting as high as we can make it.”

“It’s the only number that really matters,” Kardell suggested. Browsers can score in the 80s or even the high 90s individually, but the Interop score for the focus area might be much lower. “By old standards, that’s off to a pretty good start, but if it turns out that each of them did a completely different 80%, then that’s not the case in any way.”

While Chromium browsers are slowly improving scores for CSS Masking (from 65.8% at the beginning of the year to 69.1% in experimental builds at the time of writing), the Interop score for this focus area is improving faster — from 56.7% to 64.3% — because the work in Chromium is happening at the same time as Firefox and Safari are investing further in an area where they were already scoring in the 80s.

Another good example is pointer and mouse events, where the lowest individual score was 46.6% with other browsers achieving 67% and even 80%. “Looking at the individual browser numbers, you might think, well, the worst one is almost half support,” Meyer warned. “But if you create a Venn diagram and in the middle is what they all support consistently, where the three overlap is only a third [of the standard].”

While that 33.3% has improved to 61.3% in experimental browser builds (at the time of writing), this Interop focus area doesn’t cover using touch or a pen with pointer-events — which is important on tablets and phones.
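For context, the unified API itself is small; a sketch like the one below (with an invented element ID) is exactly the code developers want to behave identically for mouse, touch and pen.

```typescript
// One listener covers mouse, touch and pen; pointerType says which it was.
const surface = document.querySelector<HTMLElement>("#drawing-surface")!;

surface.addEventListener("pointerdown", (event: PointerEvent) => {
  switch (event.pointerType) {
    case "pen":
      console.log("pen pressure:", event.pressure);
      break;
    case "touch":
      console.log("finger at", event.clientX, event.clientY);
      break;
    default: // "mouse"
      console.log("mouse button", event.button);
  }
  // Keep receiving events even if the pointer leaves the element mid-gesture.
  surface.setPointerCapture(event.pointerId);
});
```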

There are a lot of IP challenges and patent issues in this area and a messy history (when Microsoft first proposed pointer events, Apple suggested its patents on touch events might block the W3C from adopting it as a standard, and Apple's resistance to supporting pointer events led to Google planning to remove it from Chrome at one point). But while that explains the significant differences between browsers in this focus area, vendor politics isn't behind leaving out touch and pen: it's a more prosaic problem that WPT doesn't include the tests to cover it, explained Byers (who did a lot of the early work on touch and pointer events and was an editor on the spec until recently).

“A lot of the time [when] things don’t make it into Interop, it’s because the specs aren’t ready or we don’t have tests we can push for conformance.” — Rick Byers

“We have to do a lot of work to pay back the debt of not having touch input well tested,” continued Byers. “And sometimes there’s infrastructure issues like, does WebDriver support simulated touch input on all the major browsers? If not, then we can’t realistically push for common touch behavior across all browsers. We need to do this groundwork first.”

“There’s almost certainly challenges around actually even validating that touch behaves consistently across a Windows computer running Firefox and an iPhone running Safari, and all those different devices. Safari doesn’t support touch on desktop; generally, MacBooks don’t have touchscreens.” Even getting the infrastructure to run the tests will be tricky: “We can get desktops in the cloud to run our tests: it’s just harder to do that for mobile.”

Mobile testing is actually one of the active investigation areas in Interop 2023, because it’s currently not part of WPT’s CI infrastructure, and Topal indicated there would be more investment there in the second half of this year. Investigation areas often lay the groundwork for future focus areas. “Mobile testing as part of the Interop dashboards and scoring is something that we’ll hopefully be able to do next year,” he confirmed.

Always Improving, Never Finished

Between September and November this year, the Interop participants will look at what needs to go into the focus areas for 2024, and that includes assessing progress on Interop 2023. “That’s where the decision gets made on which of the features are now basically interoperable and where there are things we [still] need to keep track of,” Topal explained.

The Web Compatible focus area covers what Byers referred to as “a little grab bag of paper cuts” for “little niggly things that wouldn’t make sense on their own” but didn’t need to be an entire focus area. Some of these were focus areas in previous years: the work isn’t finished, but enough progress has been made that there are only a few issues to clean up.

That’s not just about going the last mile in developer experience, important as that is; there’s a bigger point about the continuous nature of web standards, Kardell pointed out.

“Interop is a strange project because it seems like it shouldn’t exist, because the point of web standards is that they’re standards.” — Brian Kardell

But even when browsers score 100% on these tests, it doesn’t mean that interoperability is done, especially for areas like viewport, which needs to support new classes of hardware as they come out, like folding phones, so the focus area was carried forward even though all the work that was agreed on last year got done. “There will be test cases that we haven’t thought of. There will be places where you still wind up getting a different result. The thing with standards is eventual consistency increasing interoperability, that is hopefully stable.”

As Topal noted: “It’s a living platform: it keeps getting developed, it keeps getting extended.”

That doesn’t mean the Interop list will just keep getting longer. “Everything we had in 2021 is still in the list somewhere,” Byers noted. “I’m not aware off the top of my head of an area that has reached 100% of all tests passed across the board. But whatever the numbers are, if it feels that [tracking an area] isn’t providing value to developers anymore, there’s no sense in us tracking it [in Interop].

What Interop Doesn’t Cover

The Interop project is ultimately pragmatic. “What makes it into Interop are things that browser makers either are already working on or are about to work on,” Meyer explained. “Things that any browser maker has no intention whatsoever of working on, [they] don’t make it.”

MathML hasn’t been included in Interop because at least one of the browser makers objected to it and the group accepts vetoes. “If anyone puts in an objection, even if everyone else thinks it should be in there, it doesn’t go in unless you can convince the objector to rescind their objection.”

Those vetoes are usually about resources and priorities. “Whoever was objecting, would say something like ‘this is really cool, and we would love to work on it, but we can’t work on this and all these other things’.” That’s not an objection to the technology: it might be about a specification that’s not yet complete. “There’s no sense [in] us adding this to Interop when the specification might change halfway through the year and invalidate everyone’s work. Let’s wait until the spec is ready and then maybe next year, we can add it.”

That’s a realistic approach that underlines that the browser makers are serious about Interop, he suggested. “It’s nice to see browser teams saying ‘We have to set priorities’. They were actually thinking through ‘can we do these things’ instead of ‘sure, put it on the list and if we get to it, great’. There was none of that.”

“Our goal here is to prioritize what we think is the meaningful work that we just have to get done,” Kardell agreed.

The focus areas in Interop 2023 continue to concentrate on CSS, although they include JavaScript elements like Web Components and the catchall Web Compat category, which includes areas like regex look behind.

Partly that’s because there weren’t many submissions for JavaScript incompatibilities, Kardell told us (which may be a testament to the ECMAScript process).

But he also noted that while the web platform tests include some JavaScript tests, they don’t yet incorporate the ECMAScript Test262 test suite (new features can’t become part of ECMAScript without submitting tests to this suite), so tracking JavaScript focus areas would require doing that integration work. Some investigation has been done on how to keep the different test suites in sync, “but I don’t think we’re there yet” he suggested.

“[Web standards] are constantly a learning process,” Kardell said, pointing out that for many years those standards didn’t even include formal test suites that went beyond individual browsers.

“Our idea of how we manage all this is slowly evolving and we’re learning and figuring it out better.”

The post How Interop 2023 Will Move the Web Forward appeared first on The New Stack.

Interop is a welcome addition to web standards work, and is galvanizing browsers to drag their feet less and support developers better.

Why Microsoft Has to Save OpenAI

The chaotic, slapstick unravelling of generative AI darling OpenAI started with the technology family equivalent of an early Thanksgiving argument that turns unexpectedly bitter. It may or may not have been ended by a firm but friendly intervention from Microsoft, looking rather like the adult in the room. But through all the twists and turns — and there may still be more of both — Microsoft’s intervention to keep the OpenAI technology (if not the company) stable is inevitable.

More Than Money

Microsoft’s recent $10 billion investment in OpenAI was hardly chump change (although it was paid for in part by wide-ranging layoffs that tarnished the impressive culture change that CEO Satya Nadella delivered at the company), but it’s already proved something of a funding Ouroboros, with a substantial amount of the money Microsoft has invested in OpenAI over the years apparently spent on (Azure) cloud computing to run OpenAI’s large language models.

Never mind the far-distant plans to create an AGI that may never materialize. Microsoft — which wants you to think about it as “the AI company,” and specifically “the Copilot company,” rather than “the Windows company” — will effectively get the technology underpinnings of ChatGPT for about half what it paid for Nuance in 2021 or slightly less than the $7.5 billion it spent on GitHub in 2018 (adjusted for inflation). It wasn't all spent on cloud, but Microsoft's capex for Q1 2023 alone was $7.8 billion.

Despite having its own impressive roster of AI researchers, and its own extremely large foundation models, Microsoft cares an enormous amount about OpenAI’s ChatGPT LLMs because of the equally enormous investments in hardware and software it’s made to support them, and because of the dependency it’s taken on with OpenAI technology in almost all of its divisions and product lines.

Nadella’s opening keynote at the Ignite conference was peppered with references to OpenAI, including a preview of the GPT-4 Turbo models. Microsoft’s own products are equally seasoned with OpenAI technology, which is at the heart of the many Copilots.

Making Foundation Models Economical

LLMs and other foundation models take a lot of data, time and compute power to train. Microsoft’s solution is to treat them as platforms, building a handful of models once and reusing them over and over again, in increasingly customized and specialized ways.

Microsoft has been building the stack for creating Copilots for five years — changing everything from low-level infrastructure and data center design (with a new data center going live every three days in 2023) to its software development environment to make it more efficient.

Starting with GitHub Copilot, almost every Microsoft product line now has one or more Copilot features. It’s not just generative AI for consumers and office workers with Microsoft 365 Copilot, Windows Copilot, Teams, Dynamics and the renamed Bing Chat, or the GPT-powered tools in Power BI; there are Copilots for everything from security products (like Microsoft Defender 365), to Azure infrastructure, to Microsoft Fabric and Azure Quantum Elements.

Microsoft customers are also building their own custom copilots on the same stack. Nadella name-checked half a dozen examples — from Airbnb and BT, to Nvidia and Chevron — but the new Copilot Studio is a low-code tool for building custom copilots using business data and Copilot plugins for common tools like JIRA, SAP, ServiceNow and Trello, which could make OpenAI essentially ubiquitous.

To make that happen, Microsoft has built an internal pipeline that takes new foundation models from OpenAI, experiments with them in smaller services like the Power Platform and Bing, and then uses what it's learned from that to build them into more specialized AI services that developers can call. It has standardized on Semantic Kernel and Prompt flow for orchestrating AI services with conventional programming languages like Python and C# (and has built a friendly front end around that for developers in the new Azure AI Studio tool). These tools help developers build and understand LLM-powered apps without having to understand LLMs — but they rely on Microsoft's expertise with the OpenAI models that underpin them.
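As a very rough sketch of the kind of call those orchestration layers wrap for developers, here's a plain HTTP request in the Azure OpenAI chat-completions style; the resource name, deployment name and API version are placeholder assumptions for illustration, not details from Microsoft.

```typescript
// Hypothetical values: the resource, deployment and API version are placeholders.
const endpoint = "https://YOUR-RESOURCE.openai.azure.com";
const deployment = "gpt-4-turbo-example";

async function ask(question: string): Promise<string> {
  const response = await fetch(
    `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=2023-05-15`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "api-key": process.env.AZURE_OPENAI_KEY ?? "",
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: question }],
      }),
    }
  );
  const data = await response.json();
  // Chat-completions responses put the generated text in choices[0].message.content.
  return data.choices[0].message.content;
}
```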

Hardware Is a Real Commitment

Microsoft would have made significant investments in the Nvidia and AMD GPUs that OpenAI relies on, along with the high bandwidth InfiniBand networking interconnections between nodes and the lower latency hollow-core fiber (HCF) manufacturing it acquired Lumenisity for last year, whichever foundation models it was using.

Microsoft credits OpenAI with collaboration on not just the Nvidia-powered AI supercomputers that now routinely show up on the Top500 list but also some of the refinements to the Maia 100. It doesn't just sell those Azure supercomputers to OpenAI; that's the public proof point for other customers who want similar infrastructure — or just the services that run on that infrastructure, which is now effectively almost every product and service Microsoft offers.

But previously, its main approach to AI acceleration was to use FPGAs, because they allow for so much flexibility: the same hardware that was initially used to speed up Azure networking became an accelerator for Bing search doing real-time AI inferencing and then a service that developers could use to scale out their own deep neural network on AKS. As new AI models and approaches were developed, Microsoft could reprogram FPGAs to create soft custom processors to accelerate them far faster than building a new hardware accelerator — which would quickly become obsolete.

With FPGAs, Microsoft didn’t have to pick the system architecture, data types or operators it thought AI would need for the next couple of years: it could keep changing its software accelerators whenever it needed — you can even reload the functionality of the FPGA circuit partway through a job.

But last week, Microsoft announced the first generation of its own custom silicon: the Azure Maia AI Accelerator, complete with a custom on-chip liquid cooling system and rack, specifically for “large language model training and inferencing” that will run OpenAI models for Bing, GitHub Copilot, ChatGPT and the Azure OpenAI Service. This is a major investment that will significantly reduce the cost (and water use) of both training and running OpenAI models — cost savings that only materialize if training and running OpenAI models continue to be a major workload.

Essentially, Microsoft just built a custom OpenAI hardware accelerator it won’t be deploying into data centers until next year, with future designs already planned. That’s hardly the perfect time for its close partner to have a meltdown.

Keeping the Wheels Turning

Although it’s likely made overtures over the years, Microsoft didn’t start out wanting to acquire OpenAI. Originally, it deliberately chose to work with a team outside the company so that it knew the AI training and inference platform it was building wasn’t only designed for its own needs.

But with OpenAI’s models staying so far ahead of the competition, Microsoft has bet on them more and more. Only a year after launch, ChatGPT claims 100 million users a week and OpenAI had to pause ChatGPT Plus signups because new subscribers were overwhelming capacity — and that’s not counting the OpenAI usage by Microsoft’s direct customers.

Whether you use ChatGPT from OpenAI or an OpenAI model built into a Microsoft product, it all runs on Azure. The lines between what Microsoft calls a ‘first party service’ (its own code) and ‘a third party service’ (from anyone else) have become rather blurred.

Theoretically, Microsoft could back out and pivot to a different foundation model, and almost all the foundation models from key players already run on Azure. But not only is changing horses in mid-stream messy and expensive, leaving you likely to lose a lot of ground, it’s also likely to damage you in the stock market and with customers. Far better to make sure that the OpenAI technology survives and thrives — whatever happens to OpenAI the company.

While the developer relations team at OpenAI has been reassuring customers that the lights are still on, the systems are still running and the engineering team is on call, OpenAI customers have reportedly been reaching out to rivals Anthropic and Google, which might include Azure OpenAI customers that Microsoft won't want to lose. LangChain, a startup building a framework for creating LLM-powered apps that has just announced significant integration with Azure OpenAI Service, has been sharing advice with developers on how switching to a different LLM requires significant changes to your prompt engineering (and most examples today are for OpenAI models).

The OpenAI Dependency

If the internal customers at Microsoft — which means pretty much every division and product line — are having the internal version of those same conversations, bringing as much OpenAI expertise as possible in-house is going to ease whatever transitions it needs to make if OpenAI itself does fragment or fade away.

Yes, Microsoft has what CFO Amy Hood described as “a broad perpetual license to all the OpenAI IP” up until AGI (if that ever happens) even if its partnership with OpenAI ends, but generative AI is moving so fast that just keeping today’s models running isn’t enough. Microsoft needs to count on getting future LLMs like GPT-5.

Despite the name, OpenAI has never been primarily an open source organization, with only a handful of releases and none of them the core LLMs. But it's instructive to remember that the significant points in Microsoft's slow embrace of open source weren't just when it released core projects like PowerShell and VS Code as open source, but when it started taking dependencies on open source projects like Docker and Kubernetes in Windows Server and Azure.

The dependency it’s taken on with OpenAI is even more significant, and one that’s ironically proved to have less stability and governance. One way or another, Microsoft is going to ensure that what it needs from OpenAI survives.

The post Why Microsoft Has to Save OpenAI appeared first on The New Stack.

Hire the CEO, hire the team, send the CEO back, take a seat on the board — Microsoft will do just about anything to keep OpenAI afloat.

OpenUSD Could Enable a Real Metaverse

A short film about a girl and her robot, created in Autodesk’s upcoming OpenUSD-based Flow industry cloud

Too often, the term “metaverse” is just used to mean the kind of virtual reality we’ve been seeing for a decade, but it’s supposed to mean interoperable 3D and VR environments where you can share and integrate components from different sources. Now that Pixar’s Universal Scene Description file format is not just an open source project (OpenUSD) that acts as a de facto specification, but is being developed as a standard through the Linux Foundation by the Alliance for OpenUSD, USD could emerge as a way to make that real.

“OpenUSD is a highly extensible framework for describing, composing, simulating, and collaboratively navigating, and constructing 3D scenes at any scale,” Aaron Luk, who oversees USD engineering at NVIDIA and previously co-developed USD at Pixar, told The New Stack.

“OpenUSD’s unique composition capabilities provide rich and varied ways to aggregate assets into larger assemblies and enable collaborative workflows that accelerate individuals and teams across their multi-app workflows and projects.”

Unlike existing 3D interchange options, USD — which was originally created to enable workflows that moved millions or even billions of individual objects in an animated movie scene between multiple 3D tools — isn’t just a file format for importing and exporting. “You can have streaming or in-memory representations of USD,” AOUSD chairman and Pixar CTO Steve May told us at the launch of the Alliance.

USD assets can have multiple topologies, allowing them to render differently in different conditions. It’s also extensible using schemas, which already cover 3D metadata like BIM data, sensors and other object properties. It handles hierarchies of scene layering and composition, not just the geometry of individual 3D assets — a scene from a Pixar movie could be made up of hundreds or thousands of USD files.
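To make the layering idea concrete, here's a rough sketch of a scene layer in USD's human-readable .usda format that references a separately authored asset and only overrides its placement; the file and prim names are invented, and it's written out from TypeScript purely to keep the examples in one language.

```typescript
import { writeFileSync } from "node:fs";

// A minimal composition sketch: the room layer doesn't copy the chair's
// geometry, it references a separately authored chair.usda and overrides
// only the placement. File and prim names are invented.
const sceneLayer = `#usda 1.0
(
    defaultPrim = "Room"
)

def Xform "Room"
{
    def "Chair" (
        prepend references = @./chair.usda@
    )
    {
        double3 xformOp:translate = (1.5, 0, 2.0)
        uniform token[] xformOpOrder = ["xformOp:translate"]
    }
}
`;

writeFileSync("room.usda", sceneLayer);
```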

“That’s where the power of AOUSD really lies, in that ability to aggregate and modify large numbers of assets and then combine them into a complete picture,” he said.

And that’s relevant far beyond entertainment, May maintained.

“Whether it’s immersive 3D content, interactive experiences, new spatial computing platforms, or scientific and industrial applications, OpenUSD will become the fundamental building block on which all 3D content will be created.”
Steve May, AOUSD chairman and Pixar CTO

From Media to Metaverse

OpenUSD is already a de facto standard supported by a wide range of tools, from Adobe Photoshop, Blender, Autodesk Maya, ESRI and Adobe’s new line of Substance 3D tools, to AR tools like Adobe Aero, Microsoft’s Mixed Reality Toolkit, Nvidia Omniverse — and the visionOS SDK for Apple Vision Pro (which uses USDZ, a zipped version of USD).

But as May explained, “right now, the behavior of OpenUSD is defined by what’s in that open source distribution that we provide, which means that if something in the code changes, effectively the way that works can change.”

With Apple and others investing so much in USD, it’s time to turn it into formally defined data specifications that allow for interoperability between tools and ecosystems that stretch beyond media and entertainment.

The open source nature of OpenUSD (and the influence of Pixar on the software market for illustration and animation) means USD is already widely used in game development as well as the entertainment industry (in visual effects as well as animation). Now, it’s being adopted in architecture, engineering, construction, automotive and manufacturing — anywhere that complex 3D assets and environments are important.

IKEA already uses computer-generated images instead of photographs in its catalog: it recently joined AOUSD to be involved in creating a standard it can use to manage 3D content for furniture that can go from CAD models to manufacturing guides, to marketing that shows an entire 3D scene with multiple items, to an AR app that shows shoppers what those items will look like in their own home, and to the assembly instructions that come in the package. Lowes has also signed up, for similar reasons.

If OpenUSD takes off as the format for that, you can imagine picking furniture and décor from different stores to see together — because you don’t buy everything in the same place. Use a 3D phone scanning app like MagiScan or Luma AI and you can create your own USD assets of physical objects to include in a 3D environment, so you can include what you already own.

“[OpenUSD] enables robust interchange between digital content creation, CAD, and simulation tools with its expanding ecosystem of schemas, covering domains like geometry, shading, lighting and physics,” Luk explained.

That’s why it is now proving ideal for industrial applications, from augmented reality and spatial computing to Industrial IoT (IIoT), factory digital twins and computer vision (think robots and self-driving cars). It’s also relevant for simulations and scientific computing like computational fluid dynamics and finite element analysis, where you work with 3D meshes; an area that increasingly feeds into product design and manufacturing processes.

New USD schemas could include the electrical, physical and mechanical properties of materials and objects. NVIDIA has already contributed UTF-8 support for international characters, geospatial coordinates, metrics assembly, content validation and visualization as a service.

Having a universal data interchange with interoperability across different ecosystems of tools is going to be key to an industrial metaverse that’s more than just a marketing slogan. Building a metaverse of 3D virtual worlds, interactive experience and 3D industrial environments will require layering and compositing a lot of different elements — like the 3D equivalent of the web, Luk suggested.

“OpenUSD has the potential to be the ‘HTML of the 3D world’ – an open, unifying technology that doesn’t require the obsolescence of other formats, but rather can contain and integrate them as necessary.”
Aaron Luk, who oversees USD engineering at NVIDIA

If you’re an architect, you could use it to build a model of how the sun lights a landscape, with accurate shadows to help you forecast energy needs for heating and cooling — and do it once, then use that in multiple designs rather than creating the path of the sun to visualize a sports stadium, and then doing it all over again for an apartment block or a new road layout. Or even easier, you could get an OpenUSD plugin that does that, across the multiple applications you use.

BMW is using NVIDIA’s OpenUSD-based Omniverse platform to animate processes in the virtual factories it uses to plan manufacturing facilities before it builds them, laying out factory workflows, checking for potential collisions, or experimenting with the best place to put an industrial robot by combining 3D data from systems used to design buildings, vehicles, equipment and logistics with the simulation of processes and human workers. The BMW Factoryverse lets teams walk through the virtual factory and make changes to different layers without interfering with each other’s work.

Generative AI Goes 3D

With tools like Cesium, you can combine your own 3D assets and data from digital twins with real-world 3D geospatial data from multiple sources: for example, you could take an architectural model built in Autodesk and place it on Google’s Photorealistic 3D Tiles, or lay out an entire interactive city built in Esri ArcGIS on Bing Maps imagery.

If you want a smaller-scale scene for your virtual environment, you could shell out for a 360-degree camera — or you could call a generative AI service to build you a photorealistic panorama from a prompt: Adobe’s Firefly generative models will be available as APIs in NVIDIA Omniverse, and they let designers start with an image or a sketch. “We are also working on 2D-to-3D USD generative AI models,” Luk noted.

Or you could add in AI-animated characters using Wonder Studio or generate facial animations and gestures from audio files with Omniverse Audio2Face, again using OpenUSD to bring them into the environment.

“OpenUSD is the portal for 3D workflows to access generative AIs,” Luk said. “Software vendors and tool builders with OpenUSD-connected applications can create their own proprietary large language models to act and operate across their portfolio of tools — vastly streamlining and enhancing the user experience.”

In fact, NVIDIA has built a foundation model of its own, ChatUSD, which software developers can fine-tune with their own data and which can parse USD scenes and generate USD scripts.

That lets developers treat Omniverse as an OpenUSD portal for tools from multiple software providers that they might not be familiar with: “A proprietary LLM or RAG chatbot becomes a co-pilot to the user and orchestrates various ChatUSD or 2D-to-3D-based agents in each of the software vendors’ tools to perform actions. From the user’s perspective, they prompt an arbitrary UI in a viewport that can visualize USD data to complete a task, and tasks are coordinated and performed automatically without having to complete them manually in separate tools.”

Not only does OpenUSD provide a common framework for creating, describing and sharing 3D scenes and assets: it could also be a way to stitch together a whole workflow by combining different tools and AI services.

Making that work will rely on OpenUSD being a comprehensive and robust standard.

Evolving a Standard

Making sure the schemas that make OpenUSD extensible work across a wide range of industries and ecosystems also means making sure the data models that underlie schemas for all this extensibility are specified consistently. That’s a big part of the work the AOUSD will handle, along with creating a full, normative specification for USD, covering Foundational Data Types, Foundational Data Models, Core File Formats, Composition Engine, and Stage Population.

“This is critical work that will be foundational for specifications in areas such as materials, physics, and solid modeling in the future,” Luk told us: the first draft of the specification is due in 2024, with final drafts expected to be ratified before the end of 2025.

The Alliance was formed in August 2023 by Apple, Pixar, Adobe, Autodesk and NVIDIA: it recently announced that Meta has now joined too, along with Cesium, Chaos, Epic Games, Foundry, Hexagon, OTOY, SideFX, Spatial and Unity — and IKEA and Lowes. It’s also now collaborating with the Khronos group, which will reassure developers who are currently faced with handling two 3D asset standards that are starting to overlap.

Khronos’ existing glTF format is already an open source ISO standard with very efficient 3D object representation and it covers scenes as well as 3D assets, but it’s best thought of as “JPEGs for 3D” rather than the high resolution, ultra-realistic 3D scenes USD excels at. While OpenUSD focuses on interoperability and authoring workflows, glTF’s strength is as a publishing format for real-time display. Plus, scenes in 3D environments like the metaverse will likely include other objects and formats, like audio, video and other media, so the two groups are working together in the Metaverse Standards Forum’s 3D Asset Interoperability Working Group to make sense of how the two standards will work together.

There are already a variety of tools that promise to convert between glTF and USD but that doesn’t always preserve all the details, so the group will work on defining “scene elements such as objects, geometry, materials, lights, physics, behaviors in a form that allows straightforward and lossless conversion” between glTF and USD, especially for interactive environments as well as complex static scenes.

AOUSD is starting out under the governance of the Linux Foundation as part of the Joint Development Foundation, which May described as having expertise in incubating early-stage technologies into standards, complementary to the Academy Software Foundation — which already has a large working group on OpenUSD and focuses on supporting the use of open source software in the film industry. Picking the JDF is a sign of the broader opportunities for USD, and you can expect to see the specification transition to a larger standards body like ISO or ECMA in the long run, May told us. “The Alliance comprises its own standards body as an incubation point to develop it to the point where we can then move on to common standards bodies.”

Delivering a consumer metaverse will depend on navigating the complexity of competing economic interests between organizations that fiercely defend their expensive IP and content as much as on any technical interoperability. But for the industrial metaverse, there are much stronger incentives for cooperation and collaboration. OpenUSD is starting to gather the right momentum for delivering technology that could actually deserve to use the name.

The post OpenUSD Could Enable a Real Metaverse appeared first on The New Stack.

Don’t think moving your character from game to game: a likely outcome is that scientific and industrial computing get a common, interchangeable and standardized way to describe 3D scenes.

LinkedIn Shares Its Developer Productivity Framework

Between remote work, the Great Resignation, AI coding assistants and almost a quarter of a million layoffs in the technology industry in 2023, developer productivity continues to be a thorny topic.

Finding the frustrations and bottlenecks that slow down developers, improving the developer experience, and making sure devs have the right access and tools to work through the ever-present backlog — all of this ought to help organizations and developers alike. This year, Atlassian reported good results from telling its developers to spend 10% of their time “improving the things that make their day job suck”.

But measuring developer productivity isn’t as straightforward as it sounds. Done wrong, it can feel like intrusive micromanagement — potentially adding to developer burnout rates that have barely fallen since the height of the pandemic. Or the process ends up getting gamed and maybe even harming overall code quality.

Some productivity issues, like slow development infrastructure or speccing powerful enough machines to run demanding developer tools, are best handled centrally; issues around release speed, code quality or collaboration are probably something to tackle with the specific team. But how do you find what’s holding back developers and check whether changing things ends up making them better or worse?

LinkedIn hopes the Developer Productivity and Happiness (DPH) Framework it has just open sourced might help. The framework describes the guidelines LinkedIn uses to build the system it uses to understand its developers, the success (or otherwise) of engineering work, and what to focus efforts on (like its Developer Insights Hub).

Max Kanat-Alexander, one of the technical leads for developer productivity and happiness at LinkedIn, explained how it works to The New Stack. “When you talk to engineering executives, very often they will tell you ‘I really want to help people be more productive, but I don’t know where to start. I don’t know where the problem is, I don’t know where my investment will pay off the most’. And it’s really having data and feedback that lets you do that.”

Kanat-Alexander was previously the technical lead for code health at Google, and chief architect of the Bugzilla project before that, so he’s been working on developer productivity for almost two decades. In the last few years, interest in using data and feedback to improve developer productivity and happiness has increased significantly across the software industry, he told us. “This whole subject of data and feedback has really just taken wing.”

Find Your Own Metrics

It’s tempting to pick lists of metrics to track from frameworks like DORA (the DevOps Research and Assessment research group) and SPACE (work done at GitHub and Microsoft that extends DORA from the initial, very functional metrics to include satisfaction and well-being; performance; activity; communication and collaboration; and efficiency and flow). DORA has expanded enormously from the initial four metrics (deployment frequency, lead time for changes, change failure rate and time to restore service) and SPACE has a matrix of possible metrics — but it’s a mistake to treat those like a menu.

“Most of the time the most effective and successful metrics are going to be very specific to a team and their situation,” Kanat-Alexander warned.

“Trying to find a single set of cookie cutter metrics almost always falls flat in terms of their effectiveness at moving productivity for the organization and really making things better for developers.”

“What people need is a way to define their own metrics: they need to understand how to come up with metrics.” That’s what the DPH Framework tries to help with.

Image via LinkedIn; Teams at LinkedIn track a mix of metrics that are used across the company and some specific to them: the DPH Framework is a guide to finding what those might look like for your own team.

There are obviously useful metrics in frameworks like DORA and SPACE but they may be at such a high level that you need to implement metrics and gather other data to make them useful for your own situation.

Take the end-to-end time to deliver a change, which sounds like a good thing to track. “That might show changes at a very, very macro level in a very large company. But to know what drives those changes, you’d have to have a lot more information and you also have to have a very deep understanding of developer productivity.” And even with all that information, it might not be clear that the work a developer did made a difference to that final figure, while a more specific metric could be more informative.

“You might be able to have another metric that’s very clear, that shows some improvement that’s very related to the specific work that you’re doing, and really demonstrates the value of the work that you’re doing to the business and also allows you to understand where the biggest pain points are so that you know that you’re doing the most impactful work you possibly can.”

Similarly, the response time for code review can be a useful metric, and it’s one that closely tracks the size of code changes: “The relationship is super linear to size,” Kanat-Alexander pointed out; “it becomes much harder to get fast responses as changes become larger”. If you decide to track review response time to improve review throughput, you might need to split it by language or by framework or by ecosystem – but it’s more important to know what you’re optimizing for and why because encouraging smaller code changes can improve software velocity; it can also be an easy metric to game.
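As a sketch of what splitting the numbers by size might look like in practice, here's a small calculation of median review response time bucketed by change size; the data shape and bucket boundaries are invented rather than taken from LinkedIn.

```typescript
interface ReviewSample {
  linesChanged: number;
  minutesToFirstResponse: number;
}

// Bucket changes by size, then take the median response time per bucket.
function medianResponseBySize(samples: ReviewSample[]): Map<string, number> {
  const bucketOf = (lines: number) =>
    lines <= 50 ? "small (<=50 lines)" : lines <= 300 ? "medium (<=300)" : "large (>300)";

  const buckets = new Map<string, number[]>();
  for (const sample of samples) {
    const key = bucketOf(sample.linesChanged);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key)!.push(sample.minutesToFirstResponse);
  }

  const medians = new Map<string, number>();
  for (const [key, values] of buckets) {
    const sorted = [...values].sort((a, b) => a - b);
    // For a sketch, the middle element is close enough to a true median.
    medians.set(key, sorted[Math.floor(sorted.length / 2)]);
  }
  return medians;
}
```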

“You have to have the data and you have to look at the data and then you have to talk to developers and ask ‘Is this a problem? Are slow review times a thing you care about? Is there something wrong with the review tool that’s a blocker today?’. If you haven’t told us that it’s a problem and we give you a solution, will you appreciate it? Will you even use the solution?”

“First find out from your developers what they think the problem is, and then instrument that area.”

Although the framework includes examples of some of the metrics LinkedIn itself uses, those are purposely left until the end of the discussion, which starts with the Goals-Signals-Metrics approach used to choose them.

“Goals are actually the most important and those are the things that are much easier to align up and down the org chart, because you can always ask why. For any goal you have, you say: why is that my goal? And if you keep playing that game, like a two-year-old who just keeps saying ‘why why why’, eventually you will get to the company’s mission.”
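One lightweight way to write a Goals-Signals-Metrics chain down is as plain structured data a team can review together; the goal, signal and metric below are made-up examples for illustration, not anything prescribed by the framework.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    description: str

@dataclass
class Signal:
    description: str              # observable evidence the goal is (or isn't) being met
    metrics: list[Metric] = field(default_factory=list)

@dataclass
class Goal:
    statement: str                # should trace back, via "why?", to the company mission
    signals: list[Signal] = field(default_factory=list)

code_review_goal = Goal(
    statement="Developers get feedback on changes quickly enough to stay in flow",
    signals=[
        Signal(
            description="Authors rarely wait long for a first review response",
            metrics=[Metric(
                "median_review_response_hours",
                "Median time from review request to first reviewer response",
            )],
        )
    ],
)
```

Writing the chain out this way makes the "keep asking why" exercise explicit: if a metric can't be attached to a signal and a goal, it's probably a vanity metric.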

For organizations and managers, the right metrics can help them understand what needs changing, especially when there are limited resources, he noted. “What do we do, what’s impeding developers the most, and what if I can only assign a limited number of people to do that work? What would be the most effective work to do?”

Avoid vanity metrics that won’t help you make decisions with the data you collect – which is, after all, the reason for collecting data in the first place. The organization has to be willing to use the metrics to make systemic changes, rather than trying to use them for individual performance reviews. “You always have to have metrics that you’re willing to be bad because that’s when you learn that there [are] problems and that’s when they’re effective,” Kanat-Alexander pointed out.

“The great news about bad news is it’s your best opportunity. It’s a thing that you can change and you can do something about.”

Happy Developers, Happy Business

Don’t be surprised that the framework doesn’t just talk about productivity. “Happiness is a really important signal for leaders to pay attention to. People sometimes think this is some sort of touchy-feely thing where you want people to be happy and they’re saying ‘I don’t need people to be happy. I just need people to do work.’ If you look into the research, it is very difficult to separate productivity and happiness. They’re very tied together.”

“It’s really never ‘how do I, by myself, code faster?’ It’s really how do we as a team, or an organization or a business, move as quickly as we can with the highest quality that we can and also keep everybody happy. Because happy people are productive, productive people are happy, happy people produce better products.”

“‘Everybody feels better’ almost always means the company is more effective at accomplishing its goals.”

Thinking about happiness alongside productivity also helps solve the problem of connecting what can be abstract metrics to the problems developers have that stop them from being productive, he suggested.

“A lot of the time what you see is that somebody thinks some metric represents productivity, but the developer doesn’t think that doing that thing represents them being productive, [or] represents their impact, or […] producing value. Happiness gives you that insight of, ‘Hey, you missed something; maybe you’re measuring the wrong thing.’ And very often, the happiness signal is actually the broader and more encompassing signal than the quantitative productivity signals.”

The human element matters because software engineering, especially in a large organization, is best thought of as a team sport; and the issues developers run into are human issues as well as technical issues. That might be that the goals for the team aren’t clear, or the way the organization is structured isn’t efficient. “Maybe they’re involved in a lot of alignment meetings or they don’t have a clear planning process; and that ends up taking up a lot of the time that could otherwise be going to software development.”

Whether or not you believe that every company is now a software company, you might discover that poor developer productivity is actually exposing bigger underlying problems for the organization. “Theoretically, that data can absolutely help you make more intelligent decisions about how to focus your organization.”

But the framework also encourages everyone to be pragmatic. If people tell you that having the data isn’t actually going to make any difference to their behavior, you know that’s a metric that isn’t worth collecting and an analysis that isn’t worth doing.

The post LinkedIn Shares Its Developer Productivity Framework appeared first on The New Stack.

LinkedIn's new open source software development framework mixes hard data with the importance of the human element.

LinkedIn Open Sources Interactive Debugger for K8s AI Pipelines


Kubernetes is increasingly popular as a platform for building machine learning projects, but the developer experience on Kubernetes remains challenging, often requiring more infrastructure expertise than many coders are interested in acquiring.

And despite the promise that containers encapsulating applications and their dependencies means a portable, consistent environment throughout the development cycle, that’s just not practical for the largest models, like those used in generative AI, where neither the dataset nor the GPU hardware is available to developers working locally.

To improve the developer experience, LinkedIn created FlyteInteractive, which provides the user with “an interactive development environment that allows them to directly run the code, connect with Microsoft VSCode Server inside Kubernetes pods with access to resources and large scale data on the grid. It’s like doing SSH to the GPU port and developing directly from there. Everything’s exactly the same,” explained Jason Zhu, a LinkedIn machine learning engineer who helped create the software.

Instead of writing a mock dataset to use with their model, developers have access to the real dataset that’s on the cluster using VSCode’s remote development support, which avoids wasting time on a model that can’t cope with the full-size dataset. “As we’re pushing towards larger and more complex architectures, it’s almost impossible to develop the code locally and get it tested,” he explained.

“The resources available for local development don’t include the same high-end, high-priced GPUs that are used in production, the same amount of memory — or the complexities of a distributed system. You can compromise the model size and complexity [to run it locally] but that will also compromise the chance that, once you upload the model to real production data, it will succeed.”

Early Flyte

As the name suggests, FlyteInteractive is a plug-in for adding more features to another open source project already in use at LinkedIn, Flyte.

Originally developed and open sourced by Lyft, Flyte is a workflow orchestrator for Kubernetes, written in Go, that’s designed specifically for data and machine learning pipelines. Its interface lets developers build workflows in the most popular language for machine learning: Python, with strong type checking so more bugs are caught at compile time (which can save money as well as time, given the expensive infrastructure required for machine learning).
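For illustration, a minimal Flyte workflow using the flytekit Python SDK might look like the sketch below; the task names, types and logic are placeholders rather than anything LinkedIn runs.

```python
from flytekit import task, workflow
import pandas as pd

@task
def load_features(path: str) -> pd.DataFrame:
    # In a real pipeline this would read from shared storage such as HDFS or S3.
    return pd.read_parquet(path)

@task
def train(features: pd.DataFrame, learning_rate: float) -> float:
    # Placeholder "training" step that just returns a fake validation score.
    return 0.9 if learning_rate < 0.1 else 0.5

@workflow
def training_pipeline(path: str, learning_rate: float = 0.01) -> float:
    # Flyte checks these input and output types when the workflow is registered,
    # so passing a DataFrame where a float is expected fails before anything runs.
    features = load_features(path=path)
    return train(features=features, learning_rate=learning_rate)
```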

Flyte graduated from the LF AI & Data Foundation in early 2022 and is already in use at HBO, Intel, Spotify and LinkedIn, which uses AI extensively and has already migrated all its LLM workloads, as well as some traditional machine learning workloads, to Flyte. Flyte covers more scenarios than Kubeflow and doesn’t demand as much Kubernetes expertise from developers (though it also has Kubeflow integrations for popular packages like PyTorch and TensorFlow).

A big part of the appeal for large organizations is the scalability it offers, according to Byron Hsu, a committer to the Flyte project who works on scaling machine learning infrastructure at LinkedIn. “We have over a thousand workflows every day and we need to make sure every workflow can be brought up quickly.”

Flyte also helps with the kind of rapid experimentation that’s so important for machine learning, where datasets change and new algorithms come out frequently. “The scheduling time is super, super fast so users can do experiments quickly,” Hsu told The New Stack.

The Python interface also makes Flyte easy for machine learning developers to pick up: “If you want to add a custom Python task to your workflows, it’s intuitive and easy in Flyte. It definitely makes machine learning developers much faster.”

Flyte also brings some familiar DevOps features that speed up machine learning development, explained Zhu, who works with extremely large models like the one that drives the LinkedIn homepage feed. “Previously, every time we built our pipeline we had to pull in the dependencies locally and we had to wait for that [to happen]. But because Flyte is image-based, we can bake in all those dependencies in the image ahead of time, so it just takes several seconds for a user to upload their job and the process of putting in all those dependencies happens at runtime.” That saves a significant amount of time, including every time you update a workflow and run your machine learning job again.
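As a rough sketch of that image-based approach, recent versions of flytekit let you declare the dependencies to bake into a task’s container image up front; the image name, registry and package list below are assumptions for illustration, and LinkedIn may build its images differently.

```python
from flytekit import ImageSpec, task

# The image is built once with the pipeline's dependencies baked in, so runs
# don't have to resolve packages locally every time; registry and packages
# here are placeholders.
training_image = ImageSpec(
    name="feed-ranking-train",
    packages=["torch", "pandas"],
    registry="registry.example.com/ml",
)

@task(container_image=training_image)
def train_step(learning_rate: float) -> float:
    # Placeholder training step; in practice this would import torch and train.
    return 0.0
```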

To encourage code reuse and avoid every team rebuilding the same components and processes for each new project, LinkedIn has created a Component Hub on top of Flyte, which already has more than 20 reusable components that save a lot of repetitive work. “It’s for common tooling like data pre-processing or training or inference,” Hsu explained. “The training team can build a training component like a TensorFlow trainer and all the ML engineers at LinkedIn can use that without reimplementing it.”

This also makes more powerful and complex techniques, like the model quantization Zhu has been working on recently, much more widely available by turning them into a function or API call. There are multiple algorithms for converting a model’s representation from high to low precision so you can compress models and serve them using the fewest possible resources; usually, machine learning developers would need to research the latest developments, pick an algorithm and then implement it for their own project.

“We built it as a component and because Flyte has the concept of reusable components, for every other user’s pipeline, they can choose to call that as an interface or an external API. So they can quantize their model after the model has been trained, no matter whether it’s a model for summarization, or it’s a model for reasoning or is a model for entity extraction,” Zhu said.
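A hedged sketch of what calling such a shared component from a pipeline might look like: `quantize_model` and its parameters are hypothetical stand-ins for LinkedIn’s internal Component Hub, not its actual API.

```python
from flytekit import task, workflow

# Hypothetical shared component: in practice this would be imported from an
# internal Component Hub package rather than defined inline.
@task
def quantize_model(model_uri: str, algorithm: str = "int8") -> str:
    # Convert the trained model's weights to lower precision and return the
    # URI of the quantized artifact (placeholder implementation).
    return f"{model_uri}.{algorithm}"

@workflow
def summarization_pipeline(trained_model_uri: str) -> str:
    # Any pipeline (summarization, reasoning, entity extraction) can call the
    # same component after training, swapping algorithms to compare trade-offs.
    return quantize_model(model_uri=trained_model_uri, algorithm="int8")
```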

Developers can explore multiple algorithms quickly, because they can just plug them into their workflow to test their effect on both resource usage and the accuracy of the model.

“If you’re not sure whether quantization will work on your specific use case, we can have a centralized hub with all the different quantization algorithms, so you can test them all and look at the matrix of results and the latency to understand the trade-offs and figure out the right approach. As the field evolves, more quantization algorithms will come up so we have to have a very flexible platform that we can test all these algorithms and add them to the centralized hub that all the downstream pipelines can benefit from,” Zhu said.

Remote Interactive Debugging

Being able to write your pipeline more quickly and reuse components speeds up machine learning development enough that the software engineers at LinkedIn started to notice the other things slowing down their workflow: having to work with smaller mock datasets that turned out not to match production datasets well enough; a local development and testing environment lacking the hardware and resources of production, which meant artificial limits on the size of models; and the long cycle of debugging, waiting for code to be deployed before finding out whether it actually fixed the bug.

Thanks to the differences between the local and production environments, only about one in five bugs was fixed the first time around, with each code push taking at least 15 minutes to get into production. Tracking down even a minor bug could take dozens of attempts; in one case, it took nearly a week to find and fix an issue.

These problems aren’t unique to machine learning development, but they’re exacerbated not only by the sheer size of machine learning models, the datasets they work on and the expensive infrastructure required to run models in production, but also by an ecosystem that doesn’t always offer tools developers in other areas take for granted, like code inspection and remote debugging.

Even the smallest generative AI model of reasonable quality can’t run on a CPU, Zhu pointed out. “When you get to that stage, it’s really natural for us to move the coding and debugging process into the Kubernetes pod or GPU clusters with the real data and the same resources as you would run in production.”

FlyteInteractive can load data from HDFS or S3 storage and it supports both single-node jobs and more complex multinode and multi-GPU setups.

Developers can just add the VSCode decorator to their code, connect to the VSCode server and use the Run and Debug command as usual to get an interactive debugging session that runs their Flyte task in VSCode. Flyte caches workflow output to avoid rerunning expensive tasks, so VSCode can load the output of the previous task.
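As a sketch of that flow, the decorator sits on top of an ordinary Flyte task; the exact import path and decorator options may differ between plugin versions, so treat the details below as assumptions.

```python
from flytekit import task, workflow
from flytekitplugins.flyteinteractive import vscode  # import path assumed; may vary by version

@task
@vscode
def debug_training(learning_rate: float) -> float:
    # With the decorator applied, the task's pod starts a VSCode server;
    # connect to it to set breakpoints and step through this body against
    # the real data and GPUs the pod can reach.
    return 0.0  # placeholder body

@workflow
def debug_pipeline(learning_rate: float = 0.01) -> float:
    return debug_training(learning_rate=learning_rate)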

You get all the usual options, like setting breakpoints (even on a distributed training process) or running local scripts, as well as the code navigation and inspection tools that make it easier to understand the complex code structure of a large model with multiple modules and see how the data flows into the model.

You can also set the plug-in to automatically run if a Flyte task fails, which stops the task from terminating and gives you a chance to inspect and debug from the point of failure. When you’ve worked out what the problem is and rewritten your code, you can shut down the VSCode server and have Flyte carry on running the workflow. “You can resume the workflow with the changed code: you can just click a button and then the task will run with the new changed code and the whole workflow will continue,” Hsu explained.
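A sketch of that run-on-failure mode, assuming the plugin exposes a flag for it on the same decorator; the parameter name `run_task_first` is an assumption and may differ by plugin version.

```python
from flytekit import task
from flytekitplugins.flyteinteractive import vscode  # import path assumed; may vary by version

@task
@vscode(run_task_first=True)  # assumed flag: run normally, open VSCode only if the task fails
def flaky_preprocessing(path: str) -> int:
    # If this raises, the pod stays alive with a VSCode server attached so the
    # failure can be inspected and fixed from the point where it happened.
    raise RuntimeError("simulated failure to inspect interactively")
```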

The Jupyter notebook support in FlyteInteractive will also be helpful, he suggested: “It’s a quick orchestrator with the capability of Jupyter notebooks and interactive debugging, so you can use it to both quickly experiment and also for a scheduled job or batch job.”

Although it’s currently a plugin, now that it’s open source he hopes that, with community input, it will become a built-in feature of Flyte.

Exploring Resources and Code

FlyteInteractive has already saved thousands of hours of coding and debugging time at LinkedIn; it might also help with cost control with its resource monitoring option. “If a pod is idle for a certain period of time, we’ll just clean it up and send an email to notify the user ‘Hey, your pod has been idle for a while. Think about releasing the resource or doing something to take some action on it’.” In the future, Hsu told us that will be finer-grained. “For example, we want to detect GPU utilization. If they occupied a GPU, but they don’t actually use it, we might want to kill it after say ten minutes, so we have better budget control for our GPU system.”

That relies on the checkpointing support in Flyte, because taking checkpoints is expensive and not usually a good fit for the iterative training loops used in machine learning. “We have to provide good checkpointing so that when [a] user job gets pre-empted, it also has the model saved.”

But for developers, the most appealing feature isn’t even the fast debugging, Zhu suggested. “I like the code inspection feature because it allows me to understand the inner working mechanism of algorithms quickly and also helps me to come up with some new approaches.”

That’s not just useful for your own code, he pointed out. “Not only can engineers apply this to their internal repos but they can also apply that to open source repos. As a field, ML is super fast: new algorithms come up every week that engineers like us have to test out. We can just point this tool at an open source repo and quickly understand whether it’s a technique that we want to go with.”

The post LinkedIn Open Sources Interactive Debugger for K8s AI Pipelines appeared first on The New Stack.

Based on Lyft's Flyte Kubernetes scheduler, FlyteInteractive connects with VSCode Server inside Kubernetes pods to access resources and large-scale data on the clusters.



