
How Docker on Microsoft Azure Helps Developers Get More Into DevOps


When Microsoft first announced support for Docker on Azure, it looked like just another item to add to the list, alongside its support for container management tools like Docker libswarm and Kubernetes rather than just the command-line tools, while we waited for native support of the Docker engine in the next version of Windows Server. But although Azure is built on Windows Server, it’s also where Microsoft increasingly prototypes new server features that are well suited to the cloud. For example, Azure CTO Mark Russinovich says that Azure is a “key driver, working closely with the Windows team on requirements [for Docker] in the next version of Windows Server, because Azure will be the primary customer.”

That’s why over the last six months Azure has gone from letting you spin up Docker virtual machines to adding more and more support for the management platform. And it’s the management that makes Docker so interesting for devops, and the often-neglected developer side of devops.

Azure Integration

The latest development on Azure is putting the Docker engine in the Azure Marketplace, ready to go in an Ubuntu image. Before this you had to create and run your Linux VM and then install the Docker Azure extension in the running VM – and you can still do that if you want to work with a different Linux like CoreOS. That takes advantage of the agent that’s built into all VMs running on Azure, which you can use to inject management agents like Chef and Puppet or anti-malware software. The agent makes Azure’s IaaS offering a little bit more like PaaS, but it’s still a manual step to inject the agents you want into VMs.

But if you’re happy with Ubuntu, this is faster and simpler; you choose Docker on Ubuntu Server from the gallery of VMs, click Create and fill in either a password or an SSH key. (And with Microsoft calling this only the first of many Docker integrations that will be in the Azure Gallery – which is adding more and more open source content – other Linux integrations seem likely to follow.)

“It’s a further simplification of being able to launch and manage the Docker host in Azure,” Corey Sanders of the Azure team told us. “Now you can go into the Marketplace and launch an Ubuntu image with the Docker engine pre-installed so you can launch and manage Docker with just a few clicks. The idea is how do we get it so you can very simply and quickly get to that Docker environment that you want to launch and deploy.”

That’s ideal for both devs and devops deployment practices, he observes. “The most common use case that we’re seeing is the way it’s just few clicks – or if you automate it, a single click — to go from your developer environment to your test environment to your production environment with only a few changes.

“From a developer point of view, you can move whatever app solution you’ve written with ease across lots of environments without having to care about them. Whether you’re thinking about multiple cloud providers or your desktop, it can deploy anywhere. Previously you had a lot more that you had to deal with as you moved between environments.”

If you’re deploying inside an enterprise or creating a hybrid deployment, you might be looking at Docker Hub Enterprise which hosts private Docker images; you can also put that on Azure Storage in a blob, using the Docker Azure storage driver.

And if you’re waiting for the planned Docker orchestration APIs, they’ll be coming to Azure; MS Open Technology, the subsidiary that creates open source implementations of technologies that Microsoft is interested in, has already contributed code for the alpha release of the Docker Machine service (https://github.com/bfirsh/docker/pull/1). When the Docker orchestration APIs are defined, you’ll be able to use them with Azure, through the Azure Management portal. That — and the native Windows Server Docker engine — is when Azure will really pull all the pieces of Docker together.

Cloud Services Put the Developer in DevOps

A cloud service like Azure is a good match for Docker because the appeal of Docker for developers is the portability (as well as the ability to build on what other people have already built, thanks to the stacked virtualization model that lets you compose multiple layers of virtual file systems to get the combination you need by referencing base images rather than duplicating them), both of which give you faster development cycles.

Docker Vice President of Marketing Dave Messina suggests that Docker created “a separation of application and infrastructure constraints – you get integration of distributed applications and the infrastructure folks can focus on infrastructure management. We have customers who used to take months to get code changes into production being able to get a change into production in a matter of weeks – or faster.”

That’s giving developers more of the benefits of devops. “Even though the word is devops, most of the emphasis has been on the ops side,” he notes. “What has made Docker unique is it fundamentally changes the productivity of the developer, because of how much more productive they can be by focusing on the container. You don’t have to deal with the headache of ‘it worked on my laptop’; they can focus on optimizing the right infrastructure for the apps to run on knowing they have portability.”

Making Docker simpler to start and better integrated with Azure matters when you’re trying to take advantage of the agility Docker can enable. And having it in the cloud means that you can get the benefits of devops without a lot of internal changes. “With devops, part of the change is organizational change. The great thing about this model is it separates your apps from the constraints of infrastructure. You can change your org structure [to do devops with Docker] – but you don’t have to.”

Feature image: “Ancona, Marche, Italy – Porto / containers” by bygdb – Gianni Del Bufalo, licensed under CC BY-SA 2.0.

The post How Docker on Microsoft Azure Helps Developers Get More Into DevOps appeared first on The New Stack.


The Switch to Open Source is Working for .NET and Microsoft


Microsoft had a clear strategy when it open sourced .NET. How is it playing out? Making .NET open source lets Microsoft build it faster, get it onto more platforms (you can run the common language runtime [CLR] on OS X now) and take advantage of community contributions. It’s also changing the way .NET is developed.

Some of that is the nature of open source. For one thing, taking contributions from the community and implementing pull requests means that contributors scratch their own personal itches, down to improving comments or fixing the formatting of the code.

Seemingly trivial changes in Microsoft open source projects don’t get rejected. The mind-set is changing from the view that a product team is letting people play with its toys, to an appreciation of the fact that the more you let the community have an impact, the more committed they’ll be to the product.

And when it comes to internationalization issues, something as simple as a comma can be the problem.

Getting that community involvement is a priority, because it’s what will make .NET truly cross-platform. As Immo Landwerth of the .NET team puts it, “if you want to do cross-platform for real, you kind of have to be an open source stack; look at all the other cross-platform tools out there, like Linux and browsers. If you move it out in the open, the people who care about a certain architecture will jump in.” And they did.

Of the first 100 pull requests for .NET, more than 40 came from the community, according to Dave Kean of the .NET team. Many of those were small changes, but there have already been more substantial contributions. By the end of January, there were too many forks for GitHub to display as a graph: over 1,000. By that time, there had been nearly 250 pull requests and 51% of those came from the community.

In fact, it looks like Microsoft was surprised by the amount of participation and contribution, especially for OS X. CoreCLR initially had support for Windows and Ubuntu, with OS X support planned for later, after the team had done some more preparation. But so many people wanted to work on it — including Geoff Norton from the original Mono team who had worked on Mono for OS X and contributed the code to get CoreCLR onto OS X — that Microsoft quickly created a feature branch and migrated the work it had already done on that preparation to GitHub so the community could join in.

In practice, that discussion and collaboration took a few days to resolve, with some frustration along the way. Microsoft engineers have a lot more experience running internal projects than they do running high-profile open source projects, and they’re still getting up to speed with the process. But it did get resolved, and Microsoft did the right thing; it’s clear they’re committed to this, even if they don’t always move as fast as the community.

“We’re still learning,” says Landwerth repeatedly.

They’ve recently highlighted a number of areas as ‘up for grabs’ to make it clear what they’re not working on (and you can see what they are working on as it moves into public feature branches). The new .NET Foundation Advisory Council (http://www.dotnetfoundation.org/blog/welcoming-the-newly-minted-advisory-net-foundation-advisory-council-members) isn’t a rubber-stamp list of Microsoft staff and existing .NET projects either; there are some familiar, credible, open source names there, like Debian and GitHub.

Just putting .NET onto GitHub meant some fairly substantial changes in the way the team works. Their internal engineering systems were all based on Microsoft’s Team Foundation Server, which is great for collaborating with people inside Microsoft, using Visual Studio, but it wasn’t going to work for everyone else.

“To bring .NET into the open source, we needed the engineering system to be open source,” Kean points out. “We don’t want to have snapshots of the source code that we give you every once in a while. We need to give you the real deal.” That meant changing the way the internal systems work by adding a two-way mirror to GitHub so the code could be live on both systems. To prepare the code, Microsoft also used Roslyn to write an automatic code formatter to convert it all into the .NET-approved coding style (older, core libraries often had different coding styles).

That’s also why the codebase has been going up piece by piece: first the out-of-band libraries, like mutable collections, and the metadata reader used for cross assemblies and xml; then bigger pieces like primitives, the registry, the file system and the CoreCLR execution engine — which even includes the garbage collector. (The garbage collector is big in more than one sense. It’s too large a file for GitHub to pretty print; at one point Microsoft’s build system had to be rewritten to handle the file size. It was originally written in LISP, then machine translated to C; and one of the Microsoft Technical Fellows has been working on it for years, adding features.)

A bigger change is moving internal processes into the community. If the .NET Core code that’s getting a code review is on GitHub, the code review now happens on GitHub — even if it’s code from an internal Microsoft developer. The idea is that “there’s really no difference between the external and the internal contributor; both will submit a pull request, both will get feedback from Microsoft people as well as the community, and then it gets decided whether things are in or not in,” says Landwerth. “This really blurs the line; it doesn’t matter who you are.”

In time, internal and external pull requests will run through the same internal code, checking for things like internationalization and compatibility. That might slow things down, because the compatibility testing can take several days, but it paves the way for major contributions from the community because they can be treated the same way internal development is.

The .NET team does regular API reviews (several times a week if there are enough requests and changes to deal with). Open source contributors can’t go to those meetings, but there’s a process for community discussion first, and video recordings and notes from the API reviews go onto Microsoft’s Channel 9 site and a GitHub wiki, so everyone can see how decisions were made.

That’s part of how the .NET team wants to get Microsoft’s “tribal knowledge” out to the community.

“Internally, there is a site where we have every single discussion they had in [the] C# and .NET [teams] since the very beginning,” Landwerth explains. “There is so much insight in there, into why you did certain things, why you rejected things.” He’s also enthusiastic about changing the process, and, in general, having as little process as possible. So if the current system for doing API reviews doesn’t work for the community, you can expect it to change.

.NET is far from the only serious open source project at Microsoft: there’s the Roslyn compiler, F#, the Azure SDKs, TypeScript, TouchDevelop, Project Orleans, Bond… At the beginning of February there were 840 Microsoft repos on GitHub (there may be even more by now).

And all this is starting to change some other things inside Microsoft. CEO Satya Nadella has repeatedly talked about “internal open source”: getting different divisions to share their code with each other rather than reinventing the wheel every time. That sharing started through internal systems: the source code for Visual Studio Online (VSO) is available internally, and the Office team has contributed the code it uses for building mobile applications to the next version of the VSO build system. But some Microsoft developers are also getting code from Microsoft teams in other divisions by going to their GitHub repos.

Development in the open isn’t for everyone, and it doesn’t work for all projects. But it clearly has benefits, even in as traditional a development organization as Microsoft.

Featured image via Flickr Creative Commons.

The post The Switch to Open Source is Working for .NET and Microsoft appeared first on The New Stack.

Project Orleans, the .NET Framework from Microsoft Research Used for Halo 4


Late in 2014 Microsoft announced that it was planning to open source Project Orleans — a .NET framework built by the eXtreme Computing Group at Microsoft Research (MSR) to simplify building cloud services that can scale to thousands of servers and millions of users, without needing to be an expert in distributed systems.

The promise of cloud is that you can scale out whenever you need to (and scale back when you don’t), and get resiliency and performance without getting bogged down in maintenance and management. But getting those benefits for the service you build, rather than just the cloud platform you run on, means building the architectural principles of cloud right into your service as well, so you can handle hundreds of thousands of requests per second, across thousands of servers, in real time, meeting high demand with low latency and high throughput.

Project Orleans is designed to help with that. First announced as a research project back in 2010, it’s been used for several internal Microsoft systems including two very large player services in Halo 4: one that tracks all players and game sessions, another that handles detailed statistics about players and all their actions in every Halo game ever played, which show up in the Halo Waypoint interface.

If you try building that kind of system with the familiar n-tier development model at scale in real-time — even though cloud lets you scale up when you need to — you quickly run into storage bottlenecks. To avoid information bouncing back and forth between the front-end and the mid-tier storage you add a cache, but that makes for a far more complex development model, because your cache has different semantics from your main storage, and you have to handle moving information from storage that has concurrency to storage that has no concurrency.

The n-tier model doesn’t handle common cloud scenarios well either. If you want to chat with someone on a social network, switch from playing an online match with one group of friends to playing with another, or deal with information streaming in from multiple devices, you need to efficiently pass information back and forth. Even map/reduce, which is ideal for off-line processing of very large data sets, isn’t really efficient for handling interactive requests where you need to work with a small set of related data items.

It’s easier to handle those issues on a smaller scale, but if you’re building a cloud service you want it to scale up when it gets popular, without having to change the architecture. And concurrency and distributed computing are just hard to get right. Reliability, distributed resource management, complicated concurrency logic and scaling patterns are problems almost every developer building a cloud service needs to deal with, and the idea behind Orleans is that these things shouldn’t be difficult to write.

“The goal of Project Orleans,” explains Sergey Bykov, from the eXtreme Computing Group at MSR, “was to make building cloud services much easier than it has been and to make it so you write relatively straightforward code, but it can scale through several orders of magnitude. If it’s successful, you shouldn’t be required to throw away the code you wrote.”

Scaling Cloud Like a Farm, With Grains and Silos

To handle cloud workloads that need high throughput, low latency and efficient crosstalk, the team took their inspiration from the actor model of the 1970s, although you’re more likely to know it because Erlang used it in the 1990s.

Each of the objects you need to work with is treated as an actor — an agent that has its own isolated state. Actors don’t share any memory and they only exchange messages, so they’re easy to distribute. It doesn’t matter if another actor is on the same server or in a cluster on the other side of the country, because you have a way to pass messages built into the system. That’s a much higher level of abstraction than the distributed objects in COM, CORBA and Enterprise JavaBeans. And unlike some cloud app development systems (say, Google App Engine), Orleans is both asynchronous and single-threaded.

That makes it far easier to write. Developers don’t have to handle concurrency, locks, race conditions or the usual problems of distributed programming — just actors, which Orleans calls grains. Grains are .NET objects and they live in runtime execution containers called silos — there’s one on each node. The Orleans framework will automatically create more silos when they’re needed, clean them up when they’re not and restart any that fail. It also handles creating the grains, recreating them in another silo if a node fails, and again, cleaning them up when they’re not needed. Messaging is done through .NET interfaces and Orleans uses the async/await pattern in C# to be asynchronous.

The Twist in Orleans is Making High Level Abstraction Efficient by Virtualizing the Actor

For grains, you have both the object class that defines them and the instance of the grain that does the work, but there’s also the physical activation of the actor that’s in memory. A grain that’s not being used still exists and you can still program against it — but until you call it, the grain is virtual. When you need it, the Orleans runtime creates an in-memory copy to handle your request; and if it’s been idle for a while, it garbage collects it.

If you need it back, the framework can reactivate the grain, because the logical entity is always there. The same grain might get activated and deactivated in memory many times on many different machines in a cluster. It’s there when you need it, it’s not taking up memory when you don’t, and the state is handled by the framework without you having to remember to create or tear down anything. If a grain is stateless, the runtime can create multiple activations of it to improve performance.
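Orleans itself is C#, so the snippet below is not Orleans code and none of the names are Orleans APIs; it is a minimal, invented TypeScript sketch of the virtual actor idea just described: a grain reference is always valid, the in-memory activation behind it is created lazily on first use and collected again after a period of inactivity. Distribution, persistence and fault tolerance, which the real runtime handles, are deliberately left out.

```typescript
// Illustrative only: a single-process stand-in for a virtual actor runtime.
interface Grain {
  deactivate?(): void; // optional cleanup hook called before collection
}

class VirtualActorRegistry<T extends Grain> {
  private activations = new Map<string, { grain: T; lastUsed: number }>();

  constructor(
    private factory: (id: string) => T, // how to (re)activate a grain
    private idleMs: number              // collect after this much idle time
  ) {}

  // Getting a grain never fails: if there is no live activation, create one.
  get(id: string): T {
    let entry = this.activations.get(id);
    if (!entry) {
      entry = { grain: this.factory(id), lastUsed: Date.now() };
      this.activations.set(id, entry);
    }
    entry.lastUsed = Date.now();
    return entry.grain;
  }

  // Drop activations that have been idle too long; the logical grain still
  // "exists" and will simply be re-activated on the next call.
  collectIdle(): void {
    const now = Date.now();
    for (const [id, entry] of this.activations) {
      if (now - entry.lastUsed > this.idleMs) {
        entry.grain.deactivate?.();
        this.activations.delete(id);
      }
    }
  }
}

// An invented example grain: per-player statistics, keyed by player id.
class PlayerStatsGrain implements Grain {
  private gamesPlayed = 0;
  constructor(private playerId: string) {}

  async recordGame(): Promise<number> {
    this.gamesPlayed += 1; // a real grain would load and persist its state
    return this.gamesPlayed;
  }

  deactivate(): void {
    console.log(`collecting idle activation for ${this.playerId}`);
  }
}

const registry = new VirtualActorRegistry(
  (id) => new PlayerStatsGrain(id),
  5 * 60 * 1000 // e.g., collect after five minutes of inactivity
);

registry.get("player-42").recordGame().then((games) => console.log(games));
setInterval(() => registry.collectIdle(), 60 * 1000);
```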

Orleans uses familiar design patterns, like dispatchers, to send a batch of messages in a single call and distribute them to the correct grains; hubs that give clients a single publishing endpoint to channel events from multiple sources; observers that monitor a grain and get notifications when it changes state; hierarchical reduction to efficiently aggregate values that are stored in many different grains; and timers that mean interactions between grains don’t have to be tied to the cadence of inputs from external systems.

Many failures are handled automatically, because the runtime restarts silos and grains as necessary.

Orleans is ideal for building many typical cloud services. One of the proof of concept (POC) systems MSR built was a Twitter-style social network (called Chirper). Another was a linear algebra library to do the kind of vector matrix multiplications used in algorithms like PageRank and machine learning clustering, feature extraction and partitioning.

It’s particularly well suited to handling large graphs, like a social network graph that you need to query efficiently even though it’s distributed over many machines. It’s also good for near-real-time analytics, streaming data from IoT sensors, for use as a mobile backend and as a smart distributed cache that sits in front of your more traditional storage system.

There are plenty of other tools for building these kinds of systems, but very few that let you work at a high level without having to implement your own logic for scaling and concurrency and still get really high performance, which is why Orleans has been getting a lot of interest.

Open Source for Confidence and Community

Orleans is a perfect example of how Microsoft is using open source strategically, to improve key technologies and give developers the confidence to build on them. This isn’t a side project or any kind of abandonware; Orleans is a powerful system that’s actively being worked on and is being used for major projects inside Microsoft.

Take the new Microsoft ‘trillion events per day’ in-memory streaming analytics engine for handling big data, Trill. Trill is a .NET library used for analytics on Bing, is currently speeding up Bing Ads, and powers the query processing in Azure Stream Analytics. MSR vice president Jeannette Wing has called it out as one of MSR’s success stories and it’s being used in a range of Microsoft services. For a number of those, to get scale, Trill is being hosted inside Project Orleans, for example, to handle large numbers of independent, long-standing queries against real-time streaming data and generate alerts.

There aren’t many details about Trill, and it’s still a research project that you can’t get your hands on unless you work at Microsoft. However, you can see Trill hosted inside Orleans in this session from Microsoft’s 2014 Build conference, where Paolo Salvatori from the Azure team uses them together to build a real-time analytics service to handle events from IoT sensors using the Azure Service Bus.

Each instance of Trill is inside an Orleans grain, so the system can scale up and down to handle input from many devices. (The term grain might make you think of small objects, but there aren’t any restrictions on the size of a grain, so they can hold a lot of state.)

The first public preview of Project Orleans was released at the Build conference in 2014, to lots of enthusiasm. Tapping that enthusiasm is why the team decided to take it open source. Developers who were working with Orleans liked the fact that many of their bugs and suggestions made it into the September refresh, but they weren’t as keen on waiting months for fixes. In late January, the Orleans source code was released on GitHub under the MIT license, and it was getting pull requests the same day.

Open sourcing Orleans is part of the pattern we’re seeing from Microsoft on development frameworks for building services. Just as we’ve heard from the .NET team, open source is what developers expect — even from Microsoft. “Services are very different from boxed software,” points out Sergey Bykov.

Open source is the norm for the services stack. Everyone needs the assurance that they can fix an issue fast if necessary, and not have to wait for an official patch. Most people will never make a fix to the stack, but they still need the assurance that they can if necessary. Open source is that assurance.

Plus, Bykov is excited about getting the community to help improve Orleans. “Scaling out development is an equally important reason [to go open source]. There are way more talented and experienced people out there than you can ever hire. If you manage to make them part of the ‘extended family’ team, you can do so much more, and faster.”

As usual, the community contributions range from simple code cleanup to tracking down some complex bugs in the code. That’s exactly what Bykov was hoping for, and it’s why you can see the Orleans team on GitHub, proposing API changes, debating project goals and being very clear about the work they still have to do: for example, allowing developers to write runtime extensions to Orleans as well as using it to develop their own applications.

“You have to treat these people truly as your peers. You have to listen to them, discuss issues and trade-offs, disagree with them. You can’t be above them and give them orders. They are just like you, but collectively smarter and more experienced than you can ever be.”

Feature image via Flickr Creative Commons.

The post Project Orleans, the .NET Framework from Microsoft Research Used for Halo 4 appeared first on The New Stack.

Project Orleans: Different Than Erlang, Designed for a Broad Group of Developers


Project Orleans is a .NET framework built by the eXtreme Computing Group at Microsoft Research that has given the actor model a somewhat different role to play compared to Erlang, the best-known example of the actor model. Erlang, designed for distributed systems, was originally created by Ericsson in 1986 to control circuits in telecoms equipment, as noted by Dr. Natalia Chechina of the University of Glasgow in a talk last year about scalable Erlang.

(Video: Natalia Chechina, “RELEASE: Scalable Erlang,” on Vimeo.)

Erlang, for example, is used in Riak Core, as discussed in an excellent post by Yan Cui titled “A look at Microsoft Orleans through Erlang-tinted glasses.” He writes:

In the Erlang space, Riak Core provides a set of tools to help you build distributed systems, and its approach gives you more control over the behavior of your system. You do, however, have to implement a few more things yourself, such as how to move data around when cluster topology changes (though the vnode behavior gives you the basic template for doing this) and how to deal with collisions, etc.

The Orleans Approach

The Orleans team made some very different decisions when considering the trade-offs between high availability and consistency, and especially between ease of use and a more academic approach.

The idea was to democratize the actor model and put it in the hands of a broad audience of developers who might appreciate a language like Erlang but would never turn to it for daily development. “Actors represent real life better than other paradigms,” maintains Sergey Bykov from the Orleans team. “Object-oriented is close, but you’re not tightly coupled in real life.” That makes it too useful a model to restrict to the small community of developers who were prepared to start from scratch and learn a new language in a small ecosystem.

“We often hear from people saying ‘this is so easy to pick up — I just downloaded the SDK and it just works. I didn’t need to learn anything, I deployed the worker role on a cluster in Azure and it just works,'” Bykov says. “They’re so happy, and that’s the customer we’re after.”

“When we were thinking about how to frame the key APIs and what audience to target, we deliberately chose to go to a more democratized version of message passing. The trade-off we made there is for ease of use and flexibility and acceptance for the majority of developers.”

That means it’s not just the .NET the Orleans framework is written in that will look familiar to developers. “The method names, the argument names; everything will be familiar to them,” says Bykov. Anyone who’s worked with the Windows Communication Foundation will also feel at home.

“You can always squeeze a few extra percentage points of performance out by hand-crafting things, but in the modern world, five, ten even twenty percent performance is nothing compared to developer productivity and time to market and being able to hire people quickly to build a product,” Bykov maintains. “Expensive developers may create a system that’s faster five years later, but you need a system in a year or six months. It’s a complex question of optimization that’s not just about raw performance but the economics of software development and hiring people.”

Design Decisions

Message passing is a key feature of the actor model, and how efficient Orleans can be at message passing depends on whether you’re running it on a single system or distributing it over a network. “If you’re sending messages between two actors on different machines you’ve got no choice, you have to send it over the network and the efficiency comes down to how much data you’re sending,” Bykov points out. “Orleans’ messages come with a bunch of headers — who sent it, who it’s to, some tracking information. Some fields look extraneous, but in reality, when you’re debugging a system because something doesn’t work, you appreciate having the call stack and the debug context so you can see what happened. It saves time, which is critical when you’re in production.”

If you’re running on a single system, Orleans can save the overhead of serializing messages. “You can just isolate messages and do a mem copy instead of real serialization, and we do that if a message is between two grains on the same machine.”

The team also made what might seem an unusual choice about serialization and built its own serialization layer. “We could have picked up Bond or another framework. We looked at the five serialization frameworks that are already used inside Microsoft,” explains Bykov, but they all had drawbacks, often being expensive in terms of computation and message size. “.NET is expensive, binary XML even more. Others require you to define message formats up front and compile them; WCF does that with contracts. Bond is very efficient but is subject to the limitation of having to define message types up front. We wanted something more flexible than that. We wanted it to look almost like any function call. So we ended up building our own serialization layer. It may sound expensive, but it just works. You can send a dictionary with ten references to the same object, and on the receiving end you’ll get exactly that. It will preserve types so you don’t have to know up front what type will be sent. We made a trade-off to invest more [in development] to get flexibility and efficiency.”

One decision developers often ask the Orleans team about is the trade-off between availability and consistency. Bykov disagrees with the suggestion that the actor model demands consistency to the point that you can only ever have a single instance of an actor. “That’s not a feature of the actor model, it’s an implementation detail. We deliberately chose to optimize for availability. If you have partitioning failures, if your cluster is in some funky transition state, it will resolve itself over time, but there is uncertainty, and that allows for the potential creation of duplicate actors with the same identity. You could end up with more than one. It’s very unlikely even in the event of a failure, but it’s possible.”

That’s a trade-off that is common in cloud systems, he says.

“For cloud systems, people choose availability all the time. You don’t want to be consistent but not available, because that means you’ll be consistent five minutes later. We chose this instead of the actor being unavailable until the uncertainty is resolved.”

What Orleans gives you is eventual consistency. “It’s eventually a single instance of an actor. It’s relatively rare that you end up with two instances; you almost never end up with three, and eventually we guarantee a single instance of the actor, so there’s an eventual consistency of state.”

If both instances of the actor have been updated, the updates need to be merged. “It might seem like a big problem but in reality it’s much less so, because in reality your system is usually backed by persistent storage. If it’s an Azure table or a blob, they support detecting conflicts.”

Your system may be able to get the state for the actor without untangling the updates. For the Halo presence service that keeps track of everyone in the game, “the truth is on the console, so even if you become inconsistent in the cloud, you don’t need to resolve anything. Eventually the console will send a new version of the truth to the cloud.”

Even if you need to append to an existing value instead of just updating it, Bykov maintains eventual consistency is enough because of the nature of distributed systems. “In a distributed system you can never guarantee delivery exactly once without two phase commit; in reality it’s at least once. You may lose a message coming in or you may lose the acknowledgement, you know Byzantine failures are possible, so your operations need to be commutative. That means you’ll already be building the storage layer to be eventually consistent and handle duplicates. These are inherent problems in distributed systems.”

Choosing availability is even more important at cloud scale. “It’s easy to say I will take a lock on this thing in storage and nobody can update it concurrently, but this leads to things being unavailable. Suppose you lock it for a minute and during that time the machine dies. Now no one can update it until the failure’s detected and the lock is removed or the lock just expires. It’s expensive and slow, and you need a lot of resources to keep track of locks. If you have a million objects, now you’re keeping track of a million locks.”

For efficiency, the Orleans team made a similar trade-off globally. “We decided that everything would be asynchronous; you cannot define a blocking operation. If you use promises and you block on a promise, instead of using async/await, you’re blocking threads. If you have ten concurrent requests, you’re blocking ten threads; if you have a thousand threads you’re blocking, you’re in trouble. And if you’re trying to process 10,000 requests a second, you just cannot get the throughput. Threads are expensive.”

To deal with that, Orleans has a tightly-controlled thread pool on the server side — the silo the grains live in — with one thread per physical core. “If there’s nothing to process on the thread, if the code makes an IO call that’s going to be 100ms, there’s no point waiting 100ms,” explains Bykov. “So we essentially relinquish the thread to the system. While you’re waiting for that thread to have something to process, we can process hundreds of thousands of messages that require CPU. That’s a design decision that means you can have a huge number of requests in flight without exhausting resources.”

That was the Orleans model even before the async/await pattern was available. Previously it was implemented with a promises library, but Bykov is enthusiastic in his praise for the new pattern, even though he admits it has some extra overhead (because the compiler has to create a state machine to handle it).

“This async/await pattern is the best thing since sliced bread for parallel programing. It’s made it so much easier to write what looks like sequential code without any call-backs, and synchronization works very efficiently. It’s a paradigm shift.”

Using async/await inside Orleans meant the team could remove nearly half of their application code. “Instead of continuations and curly braces and ‘in case of success,’ we just do try catch, await call and catch exceptions.” And again, that helps with making Orleans easier for the broader developer audience to understand, and easier to develop with.
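The difference reads clearly even outside C#, since the same pattern exists in other languages. Below is a small, hedged TypeScript illustration (fetchProfile and fetchScores are invented stand-ins for I/O-bound calls, not anything from Orleans): the first version chains continuations explicitly, while the second expresses the same logic with async/await and an ordinary try/catch.

```typescript
// Invented stand-ins for asynchronous, I/O-bound operations.
async function fetchProfile(id: string): Promise<{ name: string }> {
  return { name: `player-${id}` }; // imagine a network call here
}
async function fetchScores(name: string): Promise<number[]> {
  return name.length > 0 ? [10, 20, 12] : [];
}

// Continuation style: explicit chaining, with success and error paths apart.
function totalScoreWithContinuations(id: string): Promise<number> {
  return fetchProfile(id)
    .then((profile) => fetchScores(profile.name))
    .then((scores) => scores.reduce((sum, s) => sum + s, 0))
    .catch((err) => {
      console.error("lookup failed", err);
      return 0;
    });
}

// async/await style: the same logic reads sequentially, and failures are
// handled with try/catch exactly as they would be for synchronous code.
async function totalScore(id: string): Promise<number> {
  try {
    const profile = await fetchProfile(id);
    const scores = await fetchScores(profile.name);
    return scores.reduce((sum, s) => sum + s, 0);
  } catch (err) {
    console.error("lookup failed", err);
    return 0;
  }
}

totalScoreWithContinuations("42").then((t) => console.log(t)); // 42
totalScore("42").then((t) => console.log(t));                  // 42
```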

“We’re all human. The longer the code is, the more likely it is that we’ll miss something and make a mistake.”

Orleans: An Unusual Way of Getting Scale Without Complexity

Cloud-scale applications aren’t suited to the common MVC, MVVM and other n-tier design patterns, especially when you’re developing microservices for scale-out deployments. That’s where the actor model, at the heart of Orleans, comes into play. Messages are passed between blocks of code, the actors that process the content of a message. It’s a model that allows you to quickly create new instances of an actor, as there are no concurrency concerns to manage; all you need is a new address to send messages to. As each actor is a separate functional element, they’re easy to use as the basis of a parallel computing framework at massive scale. All an actor needs are the addresses of the actors that are the intended recipients of its messages, and there’s no need for an individual actor to do more than process message contents as they arrive and send the results on to the next actor. That next actor might be anything from an API endpoint to a marshaling engine to a piece of business logic.

There’s a lot of similarity between Orleans’ actor constructs and Erlang. Erlang started as a language for developing telephony switch applications, and can best be thought of as a functional programming environment for actor models with message passing. It’s a powerful tool, and a common choice for building large-scale actor services, especially in the financial services industry, which takes advantage of its functional basis to handle complex tasks. It’s also found a role at the heart of distributed NoSQL databases and in configuration and source control management systems. But it’s a complex language that not all developers will be interested in learning, and it doesn’t have the advantages of Orleans’ virtual actors.

Similarly, there’s a lot of support for Scala, which is again popular in financial services and in online gaming. As RedMonk analyst James Governor points out, “It likely won’t be for everyone, and while Scala brings scale and really powerful pattern matching, it has a steep learning curve.” That gives Orleans an opportunity, as Governor notes, “If Microsoft can provide actor-based concurrency with a simpler programming syntax, Orleans could be a useful tool, certainly for Microsoft shops.”

Perhaps the best known actor framework is the open source Akka from Typesafe. It’s a Scala-based framework running on standard JVMs. Like Orleans, it’s built around asynchronous messages with additional tooling to handle clustering and to work with message queuing systems. Akka has developed a large ecosystem, with more than 250 GitHub projects, as well as a port to .NET. As Akka is part of the Typesafe platform, it provides event-driven middleware functions for the Play Java framework. But not only do you have the steep learning curve of Scala to cope with, Akka is intended as a far lower-level solution than Orleans.

Other actor frameworks include Pulsar, which adds actors to Clojure. While it gives you asynchronous IO, it works through a synchronous API, reducing the complexity of your code. Underneath Pulsar is Quasar, a Java queuing framework written using lightweight threads, to which Pulsar gives an Erlang-like API. It’s still very much in development, and currently isn’t designed to handle distributed actors, making it harder to write scale-out microservices because many operations end up being blocking operations. The intent is to deliver a framework where Java handles the heavy lifting, while the Clojure wrapper manages concurrency as part of Parallel Universe’s Galaxy in-memory data store, which handles distributed data structures. Using actors, data can be consistently shared across processing nodes using point-to-point messaging.

Named after a pioneering Sanskrit grammarian, Panini describes itself as a “capsule-oriented” language delivered on top of a JVM as PaniniJ. The aim is to deliver an actor-like programming environment for concurrent programming using asynchronous messages, with sequential code capsules handling messages, avoiding common concurrency errors. Panini is perhaps best thought of as a way of delivering parallel programming on Java, with development using familiar techniques that are abstracted away from the underlying actor-message model (in much the same way garbage collectors manage memory), and with code running inside modules called capsules. Capsules are created by taking a program and breaking it up into simple actors, which are wrapped with definitions of the capsules they need to work with, defining messages and APIs at the same time. Panini is a research language at the moment, but it shows promise as a set of techniques that can be ported to other languages and runtimes, not just the JVM.

While some frameworks aren’t explicitly implementations of the actor model, they’re still using it. Take Seneca, an up-and-coming Node.js application framework. Designed for microservices, it’s focused on building API implementations of user stories, taking them and using them to define the messages used to signal between endpoints and services. When combined with the open MQTT messaging framework, it becomes a scalable actor-message framework, with Node.js-hosted services operating as actors and with JSON messages marshaled by MQTT. Seneca might not explicitly be a way of writing actors, but it (and Node.js’s underlying switching model) offers an interesting way of rolling your own asynchronous front-end abstraction of everything from the Internet of Things’ message handling and processing to highly scalable e-commerce systems, while still using familiar JavaScript constructs.

None of these really works at the same high level of abstraction as Orleans. That, and the successful Xbox services built using Orleans, inspired Electronic Arts’ BioWare division to develop its own virtual actor platform for its cloud gaming properties, Orbit. Electronic Arts credits Orleans as its inspiration for creating a Java version of the virtual actor model, and the intention is to solve “many of the problems that make working with actor frameworks difficult” by being “lightweight and simple to use, providing location transparency and state as a first-class citizen.” It’s a clear validation of the Orleans approach, using virtual actors and favoring simplicity over the supposedly ‘purer’ architectural ideas.

Orbit has now been open-sourced as a JVM actor framework, so it’ll work with any language that can run on a JVM, including Java and Scala. Like Orleans, Orbit will manage your actors for you, simplifying application development. A container can be used to wrap your applications, and handle wiring objects together, as well as starting and stopping your applications. The framework also includes web service interfaces, so you can hook an Orbit application to other services and tools.

The future for the actor model is a promising one, with many different implementations in use and in development. It’s also at the heart of popular construction games, like Minecraft and Project Spark, where programmable objects can easily be thought of as interacting actors, where those interactions are handled by asynchronous messages. That means the next generation of developers will be familiar with event-driven actor frameworks without knowing it, just from playing games. That’s going to make an actor framework that’s easy to work with — like Orleans — particularly appealing.

Open Source Unknowns

Thanks to the open source model and the simplicity of Orleans, there are plenty of Orleans developers the team knows nothing about, or only hears about through the Azure support teams or comments on GitHub or CodePlex. The Halo systems are well known, but there are plenty of other projects using Orleans. “It’s so easy to use that some people go into production without asking us a single question,” Bykov points out.

He didn’t know about the German company that had had an Orleans system in production for six months until they asked a question on CodePlex, or about another European business developing large IoT solutions; “they manage a major energy storage facility for renewable energy.” Their Orleans system is used in pest control: “They manage up to two million mousetraps.” He’s come across a wide range of projects. “They’re controlling devices or processing data coming from devices, and organizing them in hierarchical manner for building control, or vehicle telemetry.”

That’s possible because of the flexibility of the virtual actor model. “It’s a range from large-scale device deployments to handling a small number of high throughput devices. It works for high throughput and for the infrequent message that arrives once a day or once an hour. The resources are managed automatically: I just set the time window and say I want this to be garbage-collected after two hours or five minutes of inactivity. I don’t have to worry about how many of them come and go, how many are activated or deactivated. I can program as if they’re all always there. I don’t have to write code to resource manage them, and this really expands the range of applications.” And with that broad of a community, being an open source project makes sense for Orleans.

Riak is a decentralized datastore from Basho, a sponsor of The New Stack.

Feature image via Flickr Creative Commons.

The post Project Orleans: Different Than Erlang, Designed for a Broad Group of Developers appeared first on The New Stack.

Open Source at Microsoft is Clearly Mainstream Now but Also Very New


The open sourcing of .NET is only the most high-profile example of what is happening at Microsoft. The move last week to bring the Microsoft Open Technologies subsidiary back in-house shows again how much attention Microsoft is putting on open source projects.

Between GitHub and CodePlex, Microsoft engineers are involved in around 2,000 open source projects. From the adapter that lets you use the Chrome Developer Tools in IE to the Windows driver framework to the DirectX toolkit to the new open source Selawik font, GitHub has a lot of Microsoft source code. And Microsoft employees are contributing to open source projects like Mono and Docker, formally as part of the company strategy, or casually because they’re interested in them.

It’s one thing when you find a Microsoft job advertisement for technologists to help work on R and participate in its open source ecosystem. It’s entirely another thing when that advertisement also is on GitHub. It’s just more proof that the open source mentality has permeated deep inside Redmond.

Microsoft isn’t shuttering anything, shutting projects down or in any way backing away from its open source involvement by bringing the Open Tech team back in-house. There are no layoffs and no projects getting canceled. It’s just that there are probably more open source projects at Microsoft than at Microsoft Open Technologies at this point, so it makes sense to have the expertise back in the main company.

There are plenty of other examples to cite. Azure CTO Mark Russinovich says that nothing is off the table, including making Windows open source. Windows architect Jeffrey Snover spoke at the Chef conference. From CEO Satya Nadella down, open source is clearly mainstream at Microsoft now.

That doesn’t mean every team at Microsoft knows how to do open source projects well. But having the Open Tech group in-house will help the company adapt to a new culture. There are basics to learn, such as how to reply to a comment on GitHub without closing the thread by accident. There’s also understanding how to involve external contributors in architectural decisions and navigating the new legal ramifications of an open source approach.

When Microsoft puts its weight behind something, it might start slowly, but once it gathers momentum you get the Redmond juggernaut in full force. That happened with web standards and it’s happening now with open source. It just doesn’t mean that Microsoft will ever stop being a commercial company.

As Gianugo Rabellino, the director of open source communities at Microsoft, put it at Build last year, “People focus on business goals rather than on technology or technology ideology. Technology solutions are becoming more pragmatic and you can see commercial and open source working more readily together. What matters is maximizing investment. Businesses need technology to run faster and cheaper, to run across platforms and devices. The developer attitude is the technology should get out of the way so I can build my solution.”

Yes, Microsoft can still make money while being genuinely involved in open source. In Rabellino’s words, “the end goal of this is building a sustainable and thriving business.”

Feature image via Flickr Creative Commons.

The post Open Source at Microsoft is Clearly Mainstream Now but Also Very New appeared first on The New Stack.

JavaScript 6 Offers Big Changes, and Kicks Off an Expedited Timetable


The language we build the Web with can officially develop as fast as the Web does.

Twenty years after the first version of JavaScript and six years after the last update to the official ECMAScript standard it’s developed into, we’re finally getting a new version of JavaScript/ECMAScript. ES2015 is the largest ever update to JavaScript.

ECMAScript 6 — now called ES2015 because that’s when the committee voted to adopt the draft spec — is important not just for the host of new features, but because it’s the start of annual updates to the JavaScript spec.

“It really comes out feeling like a new language,” said Jonathan Turner, who’s recently moved from running Microsoft’s TypeScript transpiler project to Mozilla. It’s also taken the longest to develop. “While ES5 was released 6 years prior, ES2015 is the culmination of 15 years of effort that started when the committee began working on the unreleased ES4 specification around the year 2000,” explains Brian Terlson, editor of the ECMAScript standard, and a senior program manager on the Microsoft Edge team.

That adds up to a wide range of improvements with more concurrency patterns and more tools for modular design and code organization.

The two biggest new features are probably modules and classes.

Using modules to organize your code is a basic programming feature that you previously needed to use extra tools to take advantage of in JavaScript. “Before you had to decide which module format you wanted to use; now you have a unified syntax,” said Turner. “The new module syntax gives developers granularity and flexibility; it gives you a very fine-grained way of importing because each project is a folder and each file is a module and you can just import the ones you want.” Turner believes that’s powerful enough to make it worth switching.

“The syntax takes maybe a bit of getting used to if you come from other module systems. But once you get used to it you can work with a whole module at a time or you can work with single imports at a time and you can build one module with submodules. You can have one module that flattens that API and multiple modules that are backing it into one flat API. It allows you a lot of flexibility in how you modularize an application, how you modularize a library or how you consume a library,” Turner said.
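To make that granularity concrete, here is a small, hypothetical two-file sketch (written as TypeScript, which uses the same ES2015 import/export syntax; the stats module and its functions are invented for illustration): one module exposes several named exports plus a default, and the consumer imports only the pieces it needs, or the whole module at once.

```typescript
// stats.ts: one file is one module, and each export can be imported individually.
export function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

export function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

export const VERSION = "1.0.0";

// A module can also provide a single default export.
export default function summarize(values: number[]): string {
  return `mean=${mean(values)} median=${median(values)}`;
}
```

```typescript
// app.ts: import just what you need, or everything under one name.
import summarize, { mean, VERSION } from "./stats";
import * as stats from "./stats";

console.log(VERSION, mean([1, 2, 3]), summarize([1, 2, 3]));
console.log(stats.median([3, 1, 2]));
```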


“Modules are the feature I’m personally most proud of,” said David Herman, Mozilla’s representative on the TC39 committee that standardizes JavaScript, “and I’d encourage all JavaScript developers to give them a look. The state of the art today is to use them with transpilers like Babel, but as work on the WHATWG Loader spec proceeds, and as Node.js works out its implementation and interoperability strategy, we will start to see native implementations emerge both in browsers and Node. Regardless, developers can start learning and using modules in real, production apps today, thanks to transpilers.”

“Classes are great for building classical OOP hierarchies that previously took much more code and were less reliable,” added Terlson. There was plenty of debate about how classes should work in JavaScript, which resulted in some changes in the specification before it was finalized, but the result should seem familiar to both JavaScript and OOP developers.

“JavaScript’s object system is famously based on prototypes, and while it’s possible to emulate classes through prototypes, it requires a good deal of boilerplate and clutter,” Herman pointed out. “Even for small, one-off abstractions, I’ve found it gratifying to upgrade the age-old pattern of defining a constructor with prototype methods into a single, simple class declaration. At the same time, ES2015 classes are defined in terms of constructors and prototypes, so they are completely compatible with the existing object system and interoperable with existing code.”
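As a minimal sketch of what that looks like in practice (TypeScript here, using the class syntax ES2015 standardized; the Point and LabeledPoint classes are invented for illustration), a class declaration replaces the hand-written constructor-plus-prototype boilerplate while still producing ordinary prototype-based objects underneath.

```typescript
// An ES2015-style class: under the hood this is still a constructor function
// plus prototype methods, so it interoperates with existing prototype code.
class Point {
  x: number;
  y: number;

  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }

  distanceFromOrigin(): number {
    return Math.hypot(this.x, this.y); // Math.hypot is itself new in ES2015
  }
}

// Classical inheritance without wiring up prototypes by hand.
class LabeledPoint extends Point {
  label: string;

  constructor(x: number, y: number, label: string) {
    super(x, y); // call the base constructor before touching `this`
    this.label = label;
  }

  toString(): string {
    return `${this.label}: (${this.x}, ${this.y})`;
  }
}

const p = new LabeledPoint(3, 4, "home");
console.log(p.distanceFromOrigin());                              // 5
console.log(Object.getPrototypeOf(p) === LabeledPoint.prototype); // true
```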

Terlson highlights a number of other new features. “Built-in promises have paved the way for async APIs in the DOM and elsewhere to begin returning promises, which is a huge boon to asynchronous programs. Generators are also a key capability for certain async programming techniques and an all-around useful feature. WeakMaps give great capabilities to library authors especially.” Then there’s a host of small but significant improvements: Typed Arrays; new methods for String, Array, Number, Object and Math; new options for variables (‘let’ scopes a variable to the block you define it in); and — finally! — support for constants.
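A few of those smaller additions in one place, as a short, hedged TypeScript sketch (the delay and track helpers are invented for the example): block scoping with let and const, a built-in Promise, and a WeakMap used the way a library author might, to attach data to objects it does not own.

```typescript
// `let` is scoped to the block it is declared in; `const` cannot be reassigned.
const RETRIES = 3;
for (let i = 0; i < RETRIES; i++) {
  console.log(`attempt ${i + 1} of ${RETRIES}`); // `i` exists only in this loop
}

// A built-in Promise: no third-party library needed for async composition.
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
delay(100).then(() => console.log("100ms later"));

// A WeakMap associates data with objects the library does not own, without
// preventing those objects from being garbage collected.
const metadata = new WeakMap<object, { created: Date }>();
function track(obj: object): void {
  metadata.set(obj, { created: new Date() });
}

const widget = { name: "chart" };
track(widget);
console.log(metadata.get(widget)?.created);
```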

There’s also a lot of new ‘syntactic sugar’ in ES2015, from defining a function using the shortcut of typing => to the spread operator for auto-expanding arrays. Don’t be too quick to think that the style you know is best. “The various sugary features like destructuring, template literals, arrow functions, binary and octal literals, rest and spread, default parameters, etc. work together to make ES2015 code more terse, readable, and maintainable,” claimed Terlson, a sentiment with which Herman agreed.

“I’d encourage programmers to experiment with the many little conveniences and ergonomic improvements that make everyday programming tasks more pleasant,” Terlson said. “For example, parameter default expressions and rest parameters make the job of defining function signatures less tedious and easier to understand. Object syntax shorthands such as inline methods and default properties are another nice one. Each one of these conveniences may sound small in isolation, but all those ergonomic improvements can really add up, and, in my opinion, make JavaScript overall both a far nicer language to write code in and much clearer for reading.”
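To make those conveniences concrete, here is a short TypeScript sketch that leans on several of them at once (the User type and describe function are invented): default and rest parameters, destructuring with defaults, template literals, arrow functions and spread.

```typescript
interface User {
  name: string;
  city?: string;
}

// Default parameter values and a rest parameter in the signature.
function describe(user: User, greeting = "Hello", ...tags: string[]): string {
  // Destructuring with a default, plus a template literal for the output.
  const { name, city = "somewhere" } = user;
  return `${greeting}, ${name} from ${city} ${tags.join(" ")}`.trim();
}

const users: User[] = [{ name: "Ada", city: "London" }, { name: "Grace" }];

// Arrow functions keep callbacks terse; spread expands an array in place.
const labels = users.map((u) => describe(u, "Hi", "#es2015"));
const everyone = [...users, { name: "Linus" }];

console.log(labels);                 // ["Hi, Ada from London #es2015", ...]
console.log(Math.max(...[3, 1, 4])); // spread an array into a function call
console.log(everyone.length);        // 3
```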

Transpilers: Tools and Testing Grounds

If the major changes in ES2015, and the length of time since we last had a new JavaScript standard, make you expect a disruptive change, remember that it’s really a recognition of the continuing evolution of the language. As Turner pointed out, “you don’t have to buy into everything all at once and you can pick and choose what’s useful to you.” Even classes are really functions with syntactic sugar to make inheritance easier to work with, using familiar prototype inheritance.

Plus, even the new features aren’t entirely new to developers. The browser vendors haven’t waited for the standard to be finished to start work. Although there isn’t a single JavaScript engine that already implements everything, “the vast majority of ES2015 is implemented in the big runtimes (Chakra, SpiderMonkey, V8, and JSCore),” Terlson said (there’s a feature compatibility matrix on GitHub where you can check support). “Developers that only need to target fairly recent browsers can use ES2015 today,” he suggests.

Those that can’t can still use transpilers like TypeScript and Babel, which compile many ES2015 features to ES5 code that works in older browsers; “the vast majority of ES2015 features are available today in Babel, with the exception of a few hard-to-transpile features like proxies,” explains Herman. Unlike transpilers that define their own language, like CoffeeScript, or that let you use languages like C and C++, like Emscripten (which turns LLVM bytecode into asm.js code), these produce standard JavaScript. TypeScript generates modular JavaScript with explicit interfaces and one file per class, where you can export namespaces and get code that looks like it was written by a developer rather than a machine, with a file for ES5 and another for ES2015 — and another for ES2016, when that comes along.

In fact, those transpilers have had ES2015 features for quite some time, and many of the new features have actually been prototyped in transpilers, so they’re already tried and tested.

“Transpilers have dramatically improved the JavaScript standardization process,” Herman claimed. “Historically, it was difficult to get real feedback from developers about candidate features, because of a nasty chicken-and-egg problem: browser vendors were reluctant to ship features that hadn’t been tested, but developers couldn’t afford to spend time building realistic apps using features that weren’t shipping in browsers.”

“With transpilers, developers can try out new features in their real production apps even without native browser support, which means the standards get valuable experience reports, bug reports, and usability feedback before it’s too late to change. And developers don’t have to worry about their code silently breaking because they get to decide when they’re ready to upgrade to a new version of their transpiler.”

“Tooling like transpilers is only going to get more important as the years go on,” added Turner. “If JavaScript is going to keep evolving really quickly, so there’s a new version every year, it’s going to be a challenge for browsers to keep up and it’s going to be a challenge to get users onto new platforms that can support the new JavaScript standard at that rapid cadence.” Transpilers will fill that gap, giving developers early access to new features but allowing them to address browsers that don’t have the latest version of JavaScript from a single code base.

Both Terlson and Herman are confident that JavaScript developers who aren’t ready to move to the new standard won’t run into breaking changes and incompatibilities.

“The ECMAScript standards committee takes backward compatibility as an extremely serious responsibility,” Herman said. “ES2015 was carefully designed to cause no breaking changes to real-world code.” The same goes for the browser vendors, agreed Terlson. “TC39 and engine implementers go to great lengths to ensure new language features do not introduce incompatibilities. Where the language has introduced incompatibilities in the past, it has been in edge-case scenarios where we have data that no one, or very few people, are affected. In such cases, committee members or implementers will reach out to the affected libraries or sites to address the issue. Developers can sleep soundly knowing that TC39 is committed to not breaking their code.”

But with JavaScript now moving to an annual update cycle, and ECMAScript 2016 (with a much smaller set of new features) heading for standardization in June, most developers are going to want to start trying out the new features, and getting ready for a language that stays up to date.

Feature image via Pixabay.

The post JavaScript 6 Offers Big Changes, and Kicks Off an Expedited Timetable appeared first on The New Stack.

JavaScript Standard Moves to Yearly Release Schedule; Here is What’s New for ES16

ECMAScript 6, officially known as ES2015, was the largest and longest-running update to JavaScript ever. The official name included the year rather than the version number because subsequent updates will be annual, with the first due this June.

And the first of those annual updates, ES2016 or ES16 for short, might be one of the smallest official updates to JavaScript; it has just two new features. Though very small, ES16 perfectly illustrates the shift to the new way of evolving the language based on interoperability and what’s ready to use.

“It’s very tiny,” said David Herman, Mozilla’s representative on the TC39 committee that standardizes JavaScript. “It adds the exponentiation operator as a syntactic affordance for Math.pow [for raising a number to the power of an exponent], and it adds Array.prototype.includes (something we wanted to include in ES15 but couldn’t due to web compatibility concerns).”

The Array.prototype.includes method – which checks if a value is included in an array or not — was originally going to be named Array.prototype.contains; it changed because that name was already used by the MooTools library, so reusing it might break some websites. There’s been some discussion as to whether that was the best decision, but it certainly reflects the emphasis the ECMAScript committee puts on not breaking code that’s already on the web.
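
Both ES2016 additions are small enough to show in a few lines; a quick, generic illustration, with the NaN case showing where includes behaves better than indexOf:

```javascript
// The exponentiation operator, shorthand for Math.pow
console.log(2 ** 10);          // 1024
console.log(Math.pow(2, 10));  // 1024

// Array.prototype.includes, which also handles NaN
// in a way indexOf cannot
const values = [1, 2, NaN];
console.log(values.includes(2));          // true
console.log(values.includes(NaN));        // true
console.log(values.indexOf(NaN) !== -1);  // false
```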

“TC39 and the JavaScript engine implementers go to great lengths to ensure that new language features do not introduce incompatibilities,” explained Brian Terlson, editor of the ECMAScript standard, and a senior program manager on the Microsoft Edge team.

Async functions were expected in ES16, but Terlson (who is what TC39 calls the ‘feature champion’ for async) explained that the delay is due to the new requirement that ECMAScript features have already been implemented before they are standardized.

“They’ve been implemented in transpilers for some time; Babel and TypeScript both support them, but today Edge is the only browser implementation. Until a couple of months ago that had a significant piece lacking, and the feeling on the committee was that without that we don’t really know that this is going to work on the web.” That should be addressed when Firefox adds async support and Herman predicts that “Async functions will hopefully get into the ES17 spec.”
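
For readers who haven’t tried them yet, here is a generic sketch of the async function syntax that Babel, TypeScript and Edge already support; fetchJson is a hypothetical promise-returning helper, not part of any spec:

```javascript
// A hypothetical promise-returning helper standing in for a real API call
function fetchJson(url) {
  return Promise.resolve({ url, ok: true });
}

// `await` pauses this function until the promise settles,
// without blocking the rest of the program
async function loadSettings() {
  try {
    const settings = await fetchJson('/api/settings');
    console.log('loaded', settings);
    return settings;
  } catch (err) {
    console.error('failed to load settings', err);
    throw err;
  }
}

loadSettings();
```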

Small and Ready to Use

The fact that ES2016 is “super small” isn’t a problem with the new process, Terlson said. “It’s very small, but that’s actually the process working. After ES2015, we had to get the pipeline filled with new proposals and those proposals had to go through the process and this year we could only get two of those things done in time.”

In fact, he suggests, the process is perhaps the most important thing about the new spec. “While the spec wasn’t that big in terms of new features, there were huge amounts of work done in other areas. We had on the order of a hundred web fixes, perhaps more even, and probably thousands of individual editorial fixes and tweaks. And that was all made possible by our transition to GitHub.”

Previously, although you could see it in PDF and HTML formats, the ECMAScript standard had been “lovingly maintained” in the same Word document, all the way from the original version one to ES2015. “That development method wasn’t scaling,” said Terlson (which sounds like something of an understatement).

“It was hard for the community to participate and hard for implementers to keep track of what was going on. We got out of Word and moved everything to GitHub, using a custom HTML dialect, Ecmarkup,” Terlson said.

That means that both proposals and the specification itself are in the same format; people can send pull requests for both bug fixes and editorial changes. “If you track where most of the work happened, it was building up the tooling and getting on board with the new tooling process, and that’s going to continue paying huge dividends,” Terlson believes.

The ES2016 requirement that features have to have had multiple interoperable implementations before they make it into the standard should save time further down the line.

“ES6 didn’t have the implementation requirement,” explained Terlson, “and as a result after it was ratified, as we got around to implementing certain features like proxies, there were issues implementers came across that were just not reflected in the spec, so we had to make changes after the fact. That really reflects the importance of making sure a feature can be implemented as specified before ratifying the standard.”

There’s some debate about the level of implementation experience that should be required for a feature, and the process doesn’t specify that; “it’s done on a case by case basis,” said Terlson. “For certain features we can be pretty certain that they’re going to work if we get a transpiler implementation plus a browser and having them implemented behind an experimental flag is probably OK.”

Opinions vary but demanding more implementation would involve tradeoffs, he suggests. “We could require an implementation in two browsers shipping it, but that may mean when you discover an issue it’s too late, because it’s already shipped, and the standard would move more slowly because it would be gated by the implementers,” Terlson said.

What the new process does add are tests. “When ES2015 was ratified, there were no tests, so it took huge efforts to backfill them so implementations could test, as they started building out features, that they were conforming to the specification.” For ES2016, there’s an official test suite, and tests are required before a feature can become part of the standard. “We will never again be in a situation where implementers can read the spec but not know what the implications are,” claimed Terlson.

Take the Next Train

Terlson suggests looking at ECMAScript in a different way going forward because of the new process. “Developers shouldn’t be looking at the version of the standard as much; it’s really on a feature by feature basis,” he said. “On the GitHub repo, we have a table of proposals, and those proposals have varying levels of maturity. If something is in stage three, that’s a good indication that it’s going to be a thing, and it’s something to be aware of. If a transpiler supports it, maybe turn that on and try it and give feedback, on a feature by feature basis.”

Herman suggests thinking of the yearly schedules like train timetables; whatever features are ready to use (which means they’ve reached stage four when the scope of the release is finalized, around January each year) go on the next train.

“The principal benefit of the “train schedule” to developers is that features can be finalized more independently and shipped sooner. The new process also provides clearer signals about the relative stability of individual features, allowing developers to test-drive experimental features in their apps via transpilers. This not only means developers get to take advantage of exciting language features sooner, but it also provides a feedback channel for improving experimental designs and amplifies developers’ voices in the standardization process.”

There are other advantages, pointed out Jonathan Turner, now at Mozilla and former program manager of Microsoft’s TypeScript transpiler. “If each update were the size of the ECMAScript 6 update, that would be a gargantuan task to keep up with.”

That fits the way the JavaScript development community is willing to experiment with the plethora of new tools and frameworks and pick out the ones that deliver, he believes.

“There’s a new technology idea almost every day; there’s a new framework that comes out almost every month. You can’t pick up every single one and run with it, but the community as a whole gets used to trying things out. Developers have side projects where they try technologies out, and they say ‘Oh that really did make things better’ or ‘Oh no, that’s got some rough edges,’” Turner said. “It’s really natural for things to grow organically across the community as people listen to each other and try little demos out with technology before they start rocking with it.”

Feature image: Unsplash.

The post JavaScript Standard Moves to Yearly Release Schedule; Here is What’s New for ES16 appeared first on The New Stack.

Microsoft Prepares for Serverless Computing with Azure Functions Preview

Moving to the cloud means more than not having to worry about hardware and system maintenance anymore; it’s a shift in the way you design applications and services. PaaS takes you further towards that than IaaS, but serverless (sometimes called stateless) computing like AWS Lambda, IBM OpenWhisk, Google’s upcoming Cloud Functions and Microsoft’s new Azure Functions offers a different approach.

Unlike many PaaS options, you can write arbitrary code in the language of your choice and have it execute based on triggers and events; unlike IaaS, you don’t have to care about the VM or any other part of the infrastructure that code executes on.

That makes serverless a natural match for processing data from Internet of Things sensors, but it’s more widely applicable than that. IDC analyst Al Hilwa believes the model is “pretty broad — file processing, ETL, IoT data workloads or anything that requires back-end crunching. The flexibility, composability, productivity and cost efficiency of the cloud function model makes it compelling for developers.”

Nir Mashkowski from the Microsoft Azure Functions team agrees the service is relevant for more than IoT, useful as it is for that. “It’s the whole idea of event-based computing, things like bots for Skype and so on, or doing DevOps using Functions to do management chores. We’re seeing some folks using Functions to implement search indexing for a WordPress site; that’s a canonical example of why Functions is so awesome in accelerating cloud development because you can use them for things that would have been a much bigger chore otherwise. You can do batches; you can do one-offs in protected environments.”

He sees this as a new tool to accelerate cloud development. “Serverless computing is another pillar; you get dynamic hosting plans and the ability to truly pay for what you use.”

Is Azure Functions Truly Serverless?

You can trace the service that’s now Functions back to Azure’s Web Sites service. “Even when we launched four years ago, we wanted to have the vision of PaaS implemented,” Mashkowski claimed; “the notion of shared hosting with horizontal scaling where you can purchase an instance that’s a promise of compute and memory power.”

With the addition of previously separate mobile app building, notification and workflow tools, the Web Sites service evolved into the Azure App Service for building mobile, cloud and workflow apps and added the WebJobs SDK for running background tasks based on triggers and events. Even though the SDK is for .NET, you can do that in a variety of languages including PHP, Python, Node and Java because Microsoft uses JSON metadata to create bindings for multiple languages, and for WebJobs Extensions.

The functionality of WebJobs evolved well beyond the initial image processing, log handling and other tasks you need for a website, especially when Microsoft added WebHook support, and abstracted away the WebApp host WebJobs run on, into a serverless dynamic compute layer that does scaling, monitoring, diagnostics and telemetry. The combination of the flexibility of the WebJobs service and the experience the Azure team gained managing servers and allocating capacity proved ideal for building Functions.

“We manage a lot of servers,” Mashkowski pointed out. “What we do to manage all these promises of always having a warm instance for you means we became really good at understanding how much capacity is left to run compute. We took the WebJobs SDK because they had an awesome experience with the bindings, and we took another project, called dynamic hosting plans, where we take all the free capacity that is used as buffers to guarantee reliability, and we are able to slot the atomic units of Functions in on those.”

Being based on the WebJobs infrastructure is also how Azure can make the economics of serverless work, rather like Google Drive slotting in your files on drives that would otherwise be left empty for performance reasons; Functions takes advantage of Microsoft’s expertise at managing images at scale to charge the fractional cost. “On the one hand, we had the innovation we needed to provide serverless; on the other, we figured out a model where essentially putting images on machines is something we do very fast and we don’t need to pass on the cost of getting images and lining them up to customers because we’re so good at it. And then we’re really good at measuring how much time your code really ran.”

So there is infrastructure under Functions, but you shouldn’t have to care about it. How true is that in practice?

“Given that Azure Functions are a specialization of a larger and more complex framework, with more features and options, it does mean that the model for using them is more complex than using AWS Lambda,” Hilwa noted. “On the other hand, the Azure team has done a great job with the user experience.”

So while there is a file system on the host you’re using, where you have a large amount of storage (currently 5TB), you only need to take account of that if you want to use it. There’s a very simple file layout for deployment, so you can use GitHub and CI tools, package managers like npm and NuGet, cloud storage, Visual Studio Team Services, the Functions portal or Azure Resource Manager templates, and there are templates for creating common actions like HTTPTrigger and QueueTrigger, plus a number of extensions, including DocumentDB.

“We have input bindings and output bindings,” explained Mashkowski; “Input bindings can be triggers like queue storage or file storage. We provide many formats, and we give you a strongly typed object you can play with. They’re all pluggable, and you can write your own binding extension for the WebJobs SDK.” Functions already has extensions for many other Microsoft services. “Functions is probably the easiest way to get a CRUD going in DocumentDB,” he suggested. “I can create an Azure Function to type into a DocumentDB connection, or put a file into Azure Storage or process an image with Bing Cognitive Services.” He promises easy ways to create bindings and triggers, both for developers and partners. “Instead of 40 or 50 lines of code to launch integration with Slack, all I do is visually say I want the Slack binding.”
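
As a rough sketch of what that binding model looks like from code, here is a queue-triggered function in the Node.js programming model; the binding names myQueueItem and outputDocument are placeholders that would be declared in an accompanying function.json, and the exact shape of the programming model has evolved since the preview:

```javascript
// index.js: a queue-triggered function in the Node.js programming model.
// The trigger and output bindings (named `myQueueItem` and `outputDocument`
// here purely for illustration) are declared in an accompanying
// function.json file rather than in code.
module.exports = function (context, myQueueItem) {
  context.log('Processing queue message:', myQueueItem);

  // Writing to an output binding (for example, a DocumentDB collection)
  // is just an assignment on context.bindings
  context.bindings.outputDocument = {
    id: String(myQueueItem.id),
    processedAt: new Date().toISOString()
  };

  // Signal that the function has finished
  context.done();
};
```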

In many of these areas, Functions is unusually advanced because it takes advantage of the maturity of the underlying PaaS platform, he suggested. “If I don’t like the web portal experience, I can develop in my IDE; I can download the WebJobs SDK and run it locally. It’s a bit more work than we want it to be but developers can run Functions locally and debug them — or use the features available for Web Sites like cloud debugging; you can debug Azure Functions in C# in real time already.”

“I don’t want to know I’m running LAMP on RedHat; I just want to write my code — and for that, we’ll have the development model of Functions” — Nir Mashkowski, Microsoft.

Mashkowski promises more options for local development; real-time JavaScript debugging is possible already, but he notes that “it doesn’t meet our standard of smooth developer experience so you can expect it to keep improving.”

Currently, you need to use a release manager to do versioning, but Functions will get deployment features from App Services in future, like swapping between test and production environments. “You’ll be able to start managing versions and the more comprehensive things that real development projects need,” he promises.

The runtime is open source, and Mashkowski says that not only will there be a local runtime as well, but he expects Azure Functions to be available beyond Azure, and not just on Microsoft’s Azure Stack private cloud option. “I think you will see the development model available on other platforms, with a bias to Azure Stack. You may also see us detaching the serverless runtime model from Functions and making it available for other services.” He was careful to call that a strategy rather than a commitment, but the strategy is that the potential savings of serverless computing aren’t tied only to Azure Functions.

Overlapping Azure Services

There’s some overlap with another Microsoft service, Flow, which Mashkowski calls a sister product to Logic Apps in Azure App Service (and implemented on the same infrastructure); Flow lets you connect APIs and services into workflows. “The way we differentiate between Logic Apps and Flow and Functions is that Flow and Logic Apps are all about workflow; orchestrating steps together, making sure they’re transactional, making sure if the Flow stops it can be restarted. Functions creates a great way to implement some logic very quickly; we see it as complementary, they have a great natural affinity. You won’t see the Functions team focus on managing workflow between Functions; you can use Logic Apps or in some cases Flow for that.”

Rather than viewing serverless as competing with other cloud models like containers and microservices, Mashkowski suggests they’re part of a continuum of approaches. “I think containers are essentially a packaging format. If I use Docker Gateway to have a local development environment and then push it to the cloud, I know it’s going to behave the same way. If it’s not enough to use high-level services like Functions, you can package in a container.”

He predicts that containers will eventually end up running on serverless infrastructure too, which will mean not passing the cost of loading the container into memory onto customers who are looking for the value PaaS gives them. “I don’t want to know I’m running LAMP on RedHat; I just want to write my code — and for that, we’ll have the development model of Functions. I predict there will be a separation where we see serverless as the runtime and billing model, and then we see all these development paradigms adapting to serverless. The most natural one is PaaS, but we envision having other types of apps running on serverless. We think it’s the next big disruptive thing in the cloud.”

When Would You Pick Functions?

“The key value of Functions and similar capabilities like AWS Lambda, Google Cloud Functions and IBM Bluemix OpenWhisk is to run lightweight background processes. The capabilities vary from one to other in a variety of dimensions like the languages supported, the time-limits, the degree of concurrency, the ease of integration with other developer toolchains, and so on,” Hilwa points out. “I think the space is pretty fluid right now, and we should expect to see these solutions evolve rapidly and considerably over the next 12 months.”

Functions supports Node.js, C#, F#, Java, Python and PHP, as well as PowerShell, Bash and batch files; as RedMonk analyst Fintan Ryan notes, that gives Microsoft “the greatest depth so far” in languages.

You can have persistent environment variables. Functions doesn’t have an execution limit — or a billing cap. That combination is good for ETL and working with large files, and Mashkowski notes that he comes across as many developers complaining about billing caps as are concerned about accidentally running up bills with a bug in their code.

“We really want to end up allowing developers to have all the controls. If you don’t want to run for more than two minutes, we can say ‘here’s the knob to set max runtime.’ If you want to make sure a function runs only one time and is completely isolated, it will cost you a little bit more because you might have taken advantage of multithreading but that’s up to you.”

Functions is free to use in preview, and Microsoft hasn’t announced pricing yet, but the measurement units will match other serverless offerings, although Mashkowski says Microsoft will monitor how developers use the service to see if that’s the right billing model. Initially, the memory size will be the same as AWS Lambda, but he suggests, “it may make sense to have two sizes or dynamic memory for the combination of memory and runtime you consume. We’re starting with the notion that the most expensive resource is memory and time of compute and then we take it from there.”

In preview, you can only have ten concurrent executions per function. “All the limitations we have now come from being in preview and offering it for free,” Mashkowski explains. “We want to strike a balance between letting people play with the service but not be able to DoS us. We decided what we feel comfortable with as far as creating an awesome experience for preview and learning more to get to GA as soon as possible.”

That let Microsoft open the preview immediately rather than having a private preview like Google Cloud Functions. “I’m proud that when we launched we could open the floodgates; we didn’t have a waiting list, we were able to let people play with it from day zero,” Mashkowski boasts, adding that “it may surprise everyone how fast we go GA because it’s critical for us to make sure enterprises trust us with their business.”

IBM is a sponsor of The New Stack.

The post Microsoft Prepares for Serverless Computing with Azure Functions Preview appeared first on The New Stack.


What JavaScript Programmers Need to Know about Transpilers

Want to keep up with ECMAScript without leaving behind the browsers that don’t have the newest JavaScript features yet? Or experiment with upcoming features before they make it into the standard so you can tell the ECMAScript committee what works for you as a developer and what doesn’t? Or just take advantage of tools that make JavaScript more efficient for large projects? There are transpilers that can help you with all that.

Transpilers, which convert code in one language to another, used to be more about alternative programming languages like CoffeeScript, ClojureScript and Elm, or about compiling languages like C and C++ with tools like Emscripten, which turns LLVM bytecode into asm.js code. That’s not about replacing JavaScript, points out Dave Herman, Mozilla’s director of strategy and research; “multiple programming models for the Web can happily coexist and even provide healthy competition and cross-pollination of ideas.”

Extending JavaScript

Similarly, he views transpilers like TypeScript, PureScript, Flow and JSX that add custom extensions to JavaScript as “great for the Web.”

TypeScript is a superset of JavaScript with optional static types, plus tooling to make that efficient to write, with refactoring as well as detecting errors from typos in method names to operations that won’t work because the type is wrong. You can experiment with type safety in JavaScript that stays as human-readable JavaScript, without getting locked into alternative languages like Dart and CoffeeScript.
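
A minimal, generic example of the kind of mistake those optional static types catch at compile time rather than in production:

```typescript
interface Invoice {
  id: string;
  total: number;
}

function formatInvoice(invoice: Invoice): string {
  return `${invoice.id}: ${invoice.total.toFixed(2)}`;
}

formatInvoice({ id: 'INV-1', total: 99.5 });      // compiles and runs fine
// formatInvoice({ id: 'INV-2', total: '99.5' }); // compile-time error:
// Type 'string' is not assignable to type 'number'
```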

When Babylon.js was rewritten in TypeScript, David Catuhe pointed out “developers that use Babylon.js will not be able to see the difference between the previous version developed with JavaScript and the new version developed using TypeScript.” He also noted that porting to TypeScript helped him find many small bugs that had been in the code all along.

“Having a transpiler means that developers are exposed to the newer features and APIs as they are written, which can help the community as a whole”–Henry Zhu.

And for large teams writing a lot of code, those benefits can be a huge productivity boost. That’s what Microsoft was looking for when the TypeScript project started in 2011. The Office Online web apps had more than a million lines of code, “and back then there weren’t a lot of tools you could use to build apps like that,” former program manager for the TypeScript team Jonathan Turner told us. The plan was to get better JavaScript code using static types supported by the powerful developer tools Microsoft developers were used to for other languages.

As well as TypeScript support in VS Code and Visual Studio, there are TypeScript plugins for Sublime, Emacs and Vim, plus support in an increasing number of tools. The transpiler has been picked up by projects like Angular, Asana and Dojo, and Mozilla’s Flash replacement, Shumway, as well as the Babylon.js WebGL framework and the vorlon.js remote JavaScript debugging tools.

Inside Microsoft, TypeScript is used by Bing, Visual Studio and Visual Studio Online, Azure and the Xbox team, but it’s also in use at companies ranging from Adobe, Google, Palantir, Progress (for NativeScript) and SitePen to the eBay Classifieds Group.

As well as extending JavaScript, TypeScript also transpiles your code to match multiple ECMAScript standards, which gives you a way to support multiple browsers with less effort, and to try out proposed ECMAScript standards early on.

This feature is also offered by Babel, another open source JavaScript transpiler.

“Transpilers allow developers to write future facing code, even though the current version of the language isn’t supported in all environments,” explains Henry Zhu from the Babel core team. “For example, if you are supporting Internet Explorer, which doesn’t have any ES2015 features, you would have to transpile to do so because IE doesn’t know about the newer syntax. Babel is that layer that allows you not to have to think about what browser you are using and specify which features you need to be transpiled. Browsers take time to implement the spec, and they do so incrementally. If there isn’t an auto-updating feature, then users may never update their JavaScript version, so the only way to write new versions of JavaScript is to transpile.”
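
As a rough before-and-after (the real output varies by transpiler, version and configuration), this is the sort of translation Zhu is describing:

```javascript
// ES2015 input
const double = n => n * 2;
const label = `result: ${double(21)}`;
console.log(label); // "result: 42"

// Roughly what a transpiler emits for ES5-only browsers
// (actual Babel or TypeScript output differs in detail):
//
//   "use strict";
//   var double = function double(n) {
//     return n * 2;
//   };
//   var label = "result: " + double(21);
```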

Like TypeScript, Babel is useful beyond just that transpilation, Zhu notes. “Babel is a general purpose tool for your own Javascript transformations. It’s not just about transpiling ES6 to ES5.” There are more than a thousand plugins to extend Babel; “People are writing plugins for specific libraries, tools for things like linting, optimizations for browsers, and minification.”

Standards at Scale

Plus, said Zhu, “Having a transpiler means that developers are exposed to the newer features and APIs as they are written, which can help the community as a whole.”

“The spec creators will be able to get feedback on proposals during the TC-39 stage process from stage-0 to stage-4 if someone writes a Babel plugin for it,” Zhu said. “Because there is such a wide user base, it allows a lot of users to try out experimental features which can help mold the feature into a better one than if it was just approved by the language authors without much ‘real world’ testing. Because many of the proposals are on Github, anyone can provide input on the future of the proposal as it is moving along.”

Herman is enthusiastic about what he calls “The adoption at scale of transpiled standards-track technology, in particular with the success of Babel. For developers, the immediate appeal is getting to take advantage of improvements to JavaScript even before engines (in browsers or Node.js) support them natively. And because the features are standards-based, developers can adopt them without fear of major incompatible changes. It’s hard to overstate the value of that to developers in the rapidly evolving JavaScript ecosystem.”

Brian Terlson, an editor of the ECMAScript standard, and a senior program manager on the Microsoft Edge team, agrees. “Transpilers are hugely important. JavaScript programmers generally want to use the latest features. Catering to the lowest common denominator is miserable, and no one wants to do it. What transpilers let you do is write code in that fancy new syntax that you love, that makes you productive, that makes your app maintainable – and compile it down to something that runs on old crusty browsers you wish didn’t exist in the marketplace but unfortunately do. Transpilers have been transformative in how the JavaScript community writes code.”

That early use and feedback from developers leads to a virtuous cycle, Herman says. “Transpilers have unleashed a surge of early adoption and community experimentation of new features. They give Web developers the ability to try features out in real, production apps, and they give them control over how often and when they want to upgrade to latest versions of the features. And that means more Web developers are participating in earlier vetting of standards-track features, which gives them a stronger voice in the standardization process and ultimately leads to better standards.”

“Thanks to transpilers, features from future editions are continuing to get lots of early adoption and experimentation. Decorators make it possible to abstract common patterns in class definitions and are popular with Web frameworks like Angular, Ember, and React,” Herman said. The Ember.js community was an early adopter of Babel and Herman says that led to lots of usability feedback on the module system that went into ECMAScript 2015.

Feedback also helped the standardization of decorators, said Terlson. “Features that get implemented early in transpilers can be really big, compelling features, like decorators; that can be instrumental in iterating on the design of those features.”

“If there’s a feature you know is really going to improve the code you write and the app you’re working on,” he suggested, “just pick it up in a transpiler or a polyfill and use it, and give us feedback on it.”

New Features, Faster

Transpilers are one way to handle the chicken-and-egg problem that new features can’t go into ECMAScript until they’ve been implemented. But browser vendors are reluctant to implement features that haven’t yet been standardized, not least because that can leave developers relying on an early version of a feature that then changes as it goes through the standards process.

ECMAScript 2015 didn’t require previous implementations; “as a result,” explained Terlson, “after we had ratified certain features like proxies, there were issues that the implementers came across that just weren’t reflected in the spec, so we had to make changes after the fact. That highlights the importance of making sure a feature can be implemented as specified before ratifying the standard.” There’s a similar issue with tail-call optimization, and it may not be a coincidence that Zhu notes those are both features that couldn’t be tried out in transpilers.

The maintainers of languages need feedback from programmers, before a new version of a language is fully baked. Transpilers are a big part of that, Terlson believed. “Transpilers are helping us get feedback on syntax.  We’re really fortunate to have tools like Babel and TypeScript that can let us experiment with syntax before we’ve done browser implementations. For certain features, we can be pretty convinced that they’re going to work if we can get a transpiler or polyfill implementation, plus a browser.”

Transpilers can develop new features faster than browsers, Herman pointed out. “Babel is implemented in JavaScript, whereas browsers are implemented in C++, so functionality is much easier to engineer. Some features may have trickier challenges for integrating with an entire browser. JavaScript engines all have sophisticated, multi-tier just-in-time (JIT) compilation architectures, which can often mean a single feature needs to be implemented multiple times, once for each tier. And browser engine development teams have many more responsibilities than just implementing JavaScript features, so they have to balance their priorities.”

Transpilers can’t give you all the new features, Herman pointed out. “Some features, like ECMAScript 2015’s proxies, or the current SharedArrayBuffer proposal, are essentially impossible to implement with a transpiler. Others, like ECMAScript 2015’s symbols, can be partially implemented but with some known limitations. This latter category requires some care on the part of developers, who have to be sure not to depend on the behavior that the transpiler is unable to implement correctly.”
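
Proxies are a good example of why: a proxy can intercept reads of properties that appear nowhere in the source, which no amount of ES5 code rewriting can emulate faithfully. A generic sketch:

```javascript
const settings = new Proxy({}, {
  get(target, property) {
    // This trap intercepts *any* property read, even for properties the
    // source never mentions, which is why ES5 rewriting can't fake it
    return property in target ? target[property] : 'default';
  }
});

console.log(settings.timeout); // "default" (never declared anywhere)
settings.timeout = 30;
console.log(settings.timeout); // 30
```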

Transpilers don’t insulate you from changes in JavaScript as the ECMAScript standard develops, either. “There’s a caveat,” Terlson warned. “We will listen to feedback from developers who are using features in transpilers, and it’s possible the spec will shift because of that. We might make breaking changes to a specification before it ships, so we do recommend caution when you’re using features in advance of the standard.”

But even then, they can help you transition, Herman said. “When it does come time to upgrade to a new version of a transpiler, having it break your code because of incompatible changes to experimental language features can still be frustrating and time-consuming. For that reason, transpilers like Babel allow you to set your level of tolerance for instability. If you choose to use features that are more experimental, you get to enjoy the bleeding edge of new functionality, but you also might have to deal with more churn. Alternatively, you can choose more conservative settings to reduce the risk of incompatible changes, while restricting yourself to the smaller set of more stable language features.”

Feature image of Denys Nevozhai via Unsplash.

The post What JavaScript Programmers Need to Know about Transpilers appeared first on The New Stack.

Microsoft’s JavaScript Engine Comes to Linux, OS X and Node.js

For the last year, Microsoft has been working to make its Chakra JavaScript engine open source and cross platform. At the recent NodeSummit conference, Amanda Silver and Arunesh Chandra from the Microsoft Chakra team showed an “experimental implementation” of the open source ChakraCore interpreter and runtime on x64 Ubuntu Linux and OS X 10.9+.

They also demonstrated ChakraCore running a version of Node.js.

It’s still early days, Silver told us. “In terms of Linux, we really only have one build supported thus far. We have not yet optimized for performance on those OSes. We started with the client workload of IoT-type scenarios, but we have not yet optimized for server-side scenarios; that’s a work in progress. We’re certainly not at the point that we’re saying this is something that should be broadly applied in production environments.”

Early as it is, the Chakra team wants to show the progress they’re making to keep the community in the loop. Silver also notes that there’s increasing participation from developers outside Microsoft on ChakraCore; “it is increasingly becoming a community web project.”

One feature, in particular, proved popular at NodeSummit: time-travel debugging. “That allows you to create a step-back button in the debugger,” Silver explained, “which is a cool capability and also pretty critical if you want to see what’s happening in various production and test environments.”

One Node, Multiple Engines

That’s the kind of innovation she believes having multiple VMs available for Node will foster. “The more we have more industry players working on the Node stack and the developer experience for the Node stack, the more innovation you’re going to see.”

The standardization of Node needed to allow multiple VMs also helps with that, she believes. “Ensuring there’s a broad tooling ecosystem that’s helping move the node community and experience forward also depends on there being multiple engines enabled. If you think about things like debugging, that ends up creating hooks into the core VM that runs the JavaScript portion of Node. So part of this is creating a set of diagnostic capabilities and APIs that work across the different core engines.”

Having multiple VMs in Node could help avoid a development monoculture in much the same way having multiple browsers does.

Tom Dale, the creator of the Ember.js Web application framework, and Salesforce developer evangelist Emily Rose both see it as a positive development. “Making Node engine-agnostic unlocks the kind of competition that benefits developers. I’m very excited to see Microsoft tackling this,” Dale tweeted and Rose pointed out that the work goes beyond Chakra.

As Silver puts it, “the goal of the decision last year to have Node be shepherded by the Node Foundation was to improve the reliability and confidence people have in Node, in that there are multiple industry sponsors shepherding Node’s future. In terms of ensuring we can define a common JavaScript interface that works across everywhere JavaScript runs, that does depend on there being multiple sponsors.”

As Node becomes more widely used on servers as well as clients, interest in multiple VMs generally and in ChakraCore specifically is increasing. “Industry partners who are building their own stack or platform based on Node want to be able to have the option of having alternative engines be supported.”

For Microsoft, the original impetus was for the Internet of Things, she explained, and not only because the Google V8 JavaScript engine Node currently relies on isn’t available for Windows running on ARM chips.

“It’s important there’s a minimal JavaScript engine that’s optimized for Windows. When you’re thinking about the lower end devices in IoT, you really only want to have one JavaScript engine on there,” she said.

Ironically, though, bringing Node to Windows IoT meant bringing Chakra to platforms beyond Windows. “For anybody to look at ChakraCore and take it seriously, it needs to be something that’s not just for Windows,” Silver explained. “It needs to be able to work with the tooling ecosystem that already exists around Node.”

Could that include client-side shells like Electron? Although Microsoft isn’t making any announcements on that, Silver compares that to the way Microsoft treats the Cordova framework. “The way we approached it for Windows was to make it so Cordova apps are essentially native apps on Windows, which implies that they use the Chakra engine to execute. The reason to do this is to make sure that the payload for the app is very small and that the runtime is optimized for the Windows OS. So it’s very similar to our rationale for [Node and] IoT devices and you can extrapolate out what that means for various app frameworks that run on JavaScript on Windows.”

Creating a version of Chakra that can power Node doesn’t mean Microsoft is trying to get developers to stop using V8 with Node, either on Azure or when they’re using Microsoft developer tools like Visual Studio and VS Code.

“With respect to Azure and our tooling experiences, our main focus for those continues to be the Node ecosystem as it exists today, which is based on V8,” Silver reassured developers. “In VS Code, when you build Node apps you’re building against V8 unless you intentionally set it to ChakraCore. We expect many people will continue to use Node against V8. We provide great tooling experiences for that and our commitment is to continue to have the best tooling experiences for Node developers.”

Feature Image: Amanda Silver at NodeSummit.

The post Microsoft’s JavaScript Engine Comes to Linux, OS X and Node.js appeared first on The New Stack.

How to Use HTTP/2 to Speed Up Your Websites and Apps

Now that Chrome has dropped support for Google’s SPDY protocol, if you want your users to get a faster connection to your code than good old HTTP/1.1, it’s time to switch to HTTP/2.

There are plenty of benefits to HTTP/2 according to Mozilla’s Patrick McManus (the author of Firefox’s HTTP/2 implementation and co-chair of the IETF HTTP working group): “Faster page load for high latency environments, better responsiveness, and a higher security and privacy bar.”

With HTTP/2, users will see as much as a 30 percent improvement in page load times. On the server side, HTTP/2 will lower CPU and bandwidth requirements, McManus said.

Web browsers and servers are ready for HTTP/2 (which was based, in part, on the SPDY protocol). But only around 9 percent of sites currently use HTTP/2, because it’s not as simple as changing the configuration on your web server — or even changing the configuration and then monitoring your servers more closely than usual.

Microsoft’s IIS (Internet Information Services web server software) won’t support HTTP/2 until version 10 ships with Windows Server 2016. Oracle has only just announced that Java EE 8 will more fully support HTTP/2, for example, and if you use NGINX, make sure you’re using 1.11.0 to avoid issues. (This list of HTTP/2 debugging tools may be useful, and you should check that your server software deals with HTTP/2 security issues disclosed at Black Hat this year).

You’ll also need to make some changes to your code to take full advantage of it, although you don’t need a significant rewrite to get the immediate benefits. You need to get an SSL certificate and move to HTTPS because Firefox and Chrome only support HTTP/2 over TLS using the new Application Layer Protocol Negotiation (ALPN). You need to think about any third-party content you’re calling because HTTP/2 won’t speed that up.

But more than that, you need to think about how you design sites for performance because, as Matthew Prince, CEO of web acceleration technology provider CloudFlare pointed out to us, HTTP/2 turns the principles we’ve used to improve website performance on their head.

Encrypt and Speed Up

HTTP/1.1 is an inherently synchronous protocol. First, the browser requests the HTML for the page, and once it starts parsing that it requests all the other objects on the page in turn: CSS, JavaScript, then the different media formats and all the other page content. To speed things up, browsers make multiple simultaneous connections per domain. To take advantage of that, websites began splitting resources up onto multiple domains. “With domain sharding, you have www1, www2, www3 and so on, and you say ‘we’ll put all the JavaScript on this one, all the CSS on this one, all the images on this one,’” explained Prince.

HTTP/2 is asynchronous and multiplexes requests on a single TCP connection using binary streams (HTTP/2 is a binary rather than text-based protocol). There’s still an initial request for the HTML, but the requests for all the other resources happen in parallel. That’s faster for the user and more efficient for the server — and it means that domain sharding slows down the site because the browser has to open multiple connections again. (Browsers can optimize which assets they receive first using the stream priority option in HTTP/2.)

Then there’s encryption (which HTTP/2 itself doesn’t require, but every browser that’s implemented HTTP/2 has implemented it with the requirement for SSL). If you want to get a head start on moving to HTTP/2 says Google’s Ilya Grigorik (co-chair of the W3C Web Performance Working Group), “Migrate your site to HTTPS! It’s a prerequisite for HTTP/2.”

“We don’t know what the next great application that server push is going to enable will be, but we do think it is one of those opportunities to have there be a reshaping of how the Internet works”–Matthew Prince, CloudFlare

Web developers have sometimes avoided encryption in the belief that establishing and communicating over an encrypted channel slows down the connection. Modern CPUs and content delivery networks mean that’s rarely true anymore. And as fewer TCP connections per user means fewer server resources per user, the asynchronous requests in HTTP/2 more than make up for any overhead from encryption.

“In the old HTTP1 world, adding encryption to your page was a performance hit. In the new HTTP/2 world if you want to get the advantage of these new protocols, you have to have encryption. Every piece of good advice becomes bad advice,” Prince pointed out. “If you want to be as fast as possible, you don’t use encryption, you shard domains, you never embed anything on pages, you try and concatenate things into as large files as possible, so you have fewer files. In this world it actually makes sense to slice things up into smaller pieces so you can cache smaller pieces and you can download everything in parallel, you have to have encryption on by default and sharding is the worst possible thing you can do.”

Changing that might be the hardest thing about switching to HTTP/2, Ian Fergusson of security software provider Thawte suggested; “A lot of sites have spent a lot of time trying to increase their speed. Any shortcuts and hacks taken will need to be reversed before implementing HTTP/2.”

Switching to Push

The initial benefits of HTTP/2 are very similar to SPDY; CloudFlare supports both although Prince predicted that “SPDY will die off over the next 18 months as people upgrade.” But HTTP/2 has two additional features: header compression and server push.

The headers in every HTTP request are very similar; “there are fields that are repeated thousands of times,” Prince said. HTTP/2 includes a compression dictionary for request and response headers, so instead of sending plain text every time, the headers are compressed (and header fields that have been sent once are referred to rather than duplicated). That’s faster for visitors and saves you bandwidth. Prince said CloudFlare saw a 30 percent bandwidth saving in headers when they turned on header compression, and Dropbox saw ingress traffic bandwidth halve after turning on HTTP/2 (although they’d used SPDY before, they hadn’t used header compression because of security issues).

From the "Can I Use" Website

From the “Can I Use” Website.

Server push is a more significant change. While the main browsers have supported HTTP/2 for some time, server push is a more recent addition. “Traditionally the browser has said ‘give me something’ then the server responds and sends it back. Server push flips the model so the server can say ‘here are the things you’re going to need’ and proactively push those down the line to the browser. The HTML still takes the same amount of time to render, but all the other resources, the CSS and JavaScript and the JPEGs and PNGs and GIFs are all sent proactively while you’re waiting for the HTML to render,” Prince explained.
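
For a feel of what that looks like from server-side code, here is a generic sketch using the http2 module that Node.js later shipped (it wasn’t available when these interviews took place), assuming a key and certificate already exist on disk:

```javascript
const http2 = require('http2');
const fs = require('fs');

// HTTP/2 in browsers effectively requires TLS, so the server needs a
// key and certificate (assumed to be present as these two files)
const server = http2.createSecureServer({
  key: fs.readFileSync('server-key.pem'),
  cert: fs.readFileSync('server-cert.pem')
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] === '/') {
    // Push the stylesheet before the browser asks for it
    stream.pushStream({ ':path': '/style.css' }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
      pushStream.end('body { font-family: sans-serif; }');
    });

    stream.respond({ ':status': 200, 'content-type': 'text/html' });
    stream.end('<link rel="stylesheet" href="/style.css"><h1>Hello, HTTP/2</h1>');
  } else {
    stream.respond({ ':status': 404 });
    stream.end();
  }
});

server.listen(8443);
```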

That particularly helps dynamic sites and it’s a good match for development patterns like web workers, web hooks and async/await that are starting to become important for JavaScript, as well as for real-time communications like WebRTC, and resource-constrained devices like IoT devices.

Prince expects it to have a broad impact. “AJAX-like JavaScript calls can now get transferred into a much simpler more streamlined function; the server can simply say ‘that query you needed. I’m going to send the request down, so you have that’.”

In some ways it’s like the way push email turned email from a desktop tool into something you could do on your phone in real time, but for a broad range of sites and apps.

“When you think about how much bandwidth and battery power and CPU is wasted on a mobile device by having to poll continuously back and forth to the client; if you flip that around and say ‘I’m here, I’m waiting and the minute you’re ready for me just push down what I need’, that flips the entire model,” Prince explained. “We don’t know what the next great application that server push is going to enable will be, but we do think it is one of those opportunities to have there be a reshaping of how the Internet works.”

But Prince warned that the way server push works currently – you have to put a Link header in the response headers from the server – is “kind of clunky and brittle; we’re having conversations with developers to put in place what are more efficient ways to deliver this.”
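
In practice that mechanism amounts to adding a preload Link header to the response and letting an HTTP/2-capable server or CDN in front of the application decide to push the named asset; a hedged sketch using Express, where nothing is actually pushed unless the terminating server supports it:

```javascript
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  // Hint that /style.css should be pushed (or at least preloaded);
  // whether anything is pushed depends on the server or CDN
  // terminating HTTP/2 in front of this app
  res.set('Link', '</style.css>; rel=preload; as=style');
  res.send('<link rel="stylesheet" href="/style.css"><h1>Hello</h1>');
});

app.listen(3000);
```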

Expect to see this show up in applications and platforms beyond web servers and browsers; “people are working on WordPress plugins and are playing with it in Rails and in PHP to see how can we use it to build new tools,” said Prince. And McManus noted, “The WHAT-WG [Web Hypertext Application Technology Working Group] is discussing how push can be exposed directly to JavaScript as another promising push use case.”

You also have to think about the tradeoffs between getting better performance and responsiveness on your site by using Server Push versus the bandwidth impact of pushing resources that might not be consumed, especially for mobile users.

“Server push works reliably in browsers, but it can be tricky determining what a website may want to push to get the best benefit,” McManus noted. “The biggest HTTP/2-related gains for mobile users are due to its improved request parallelism on high latency mobile networks. HTTP/2 is also much better at managing the priorities of different objects than its predecessor, and this is very important on slower networks.”

“Developers should tread carefully when experimenting with different strategies with server push as best practices have not yet emerged,” he said. “Effective strategies will depend greatly on site content. Less is more has been the experience so far; aggressively pushing can waste bandwidth. Real User Monitoring is highly recommended to help developers evaluate their site performance. Targeted approaches rather than site-wide policies make sense at first.”

McManus also pointed out that “the optimal impact of push is limited to a fairly short window of time at the beginning of page load so pushing something small and critical, such as a font or a stylesheet, would be a reasonable place to start.”

Grigorik agreed. “There isn’t a right universal answer here. Some applications will want to trade off latency against risk of redundant bytes, others might want to optimize bytes over latency. It depends on the application.”

You also have to think about how you handle third-party content (especially ad networks), which may not yet be available over HTTPS, let alone HTTP/2.

“With third-party content, there are a number of techniques developers can use to get around this issue such as async, preload and preconnect,” Fergusson suggested.

“One important feature of HTTP/2 that helps with the multiple origin problem is connection coalescing,” McManus explained. “Coalescing allows different host names that share the same hosting provider and secure proof of identity to share the same connection, even though they have different host names. The emergence of HTTP/2 Alternative Services as a recent IETF HTTP/2 extension (RFC 7838) also helps with this transition. Every connection that is eliminated has an incremental benefit in performance.”

You might think about switching more things onto your own host rather than pulling them in from external sites. “There are benefits to keeping more assets on the same host. That’s not a requirement, though,” Grigorik pointed out. “And in the worst case — you’re no worse off.”

Feature image: “The HTTP/2 Server Implementers Club,” Twitter photo by Brad Fitzpatrick.

The post How to Use HTTP/2 to Speed Up Your Websites and Apps appeared first on The New Stack.

Baidu Joins the Crowded Race of Open Source Machine Learning Frameworks

Chinese Web services giant Baidu has joined the other big names in search and social media by releasing its machine learning toolkit as an open source project on GitHub.

PaddlePaddle — which stands for PArallel Distributed Deep Learning — lets you write pseudocode in Python, which the head of the project, Baidu Distinguished Scientist Wei Xu, said allows developers to “focus on the high-level structure of their model without worrying about the low-level details.”

Working at that high a level may mean less flexibility; Baidu Chief Scientist Andrew Ng noted at the announcement that “other deep learning platforms have been a great boon to researchers wanting to invent new deep learning algorithms. But their high degree of flexibility limits their ease of use.”

In contrast, PaddlePaddle is aimed at enthusiasts and mainstream developers rather than machine learning researchers developing their models and algorithms, and Xu told us that makes it more accessible than some current frameworks.

“For many tasks, PaddlePaddle requires significantly less code to be developed. For example, a machine translation model built on the PaddlePaddle engine would need about a quarter the amount of specially written code than the same task built on other AI platforms.”

PaddlePaddle lets you use some common deep network architectures; convolutional and recurrent neural networks (CNNs and RNNs), as well as attention networks and neural Turing machines. “It’s hard to enumerate,” Xu told us, “because it’s like playing with blocks; you can construct many different things with it.”

Some of the most interesting work with deep networks involves extremely deep networks with many more layers; PaddlePaddle doesn’t have specific tools for that, but there’s no fixed limit in the framework on the depth of networks; “it depends on the size of the model and the memory limit,” said Xu, pointing out that the framework scales to many GPUs and CPUs on multiple machines, and uses common Basic Linear Algebra Subprograms (BLAS) libraries to speed up its mathematical operations.

Baidu has been using PaddlePaddle to develop advertising, search ranking, large-scale image classification, optical character recognition and machine translation. According to Charles King, principal analyst at Pund-IT, that includes ad click-through rate (CTR) prediction, virus detection, and user recommendations; all standard scenarios for deep learning that show the framework is worth considering.

Xu claimed that it handles sequence models very well (like tagging parts of speech for recognition and translation), as well as “problems with high-dimensional sparse data, which is very common in industrial applications but lacks support in most other deep learning frameworks.”

Frameworks, Platforms and APIs

PaddlePaddle is competing with Google’s TensorFlow, which quickly became popular (especially after Google allowed the open source version to run on more than one machine), overtaking the previous favorite, the Caffe image recognition framework from the Berkeley Vision and Learning Center. Microsoft’s Computational Network Toolkit (CNTK) — originally developed for speech recognition but now applicable to many other models and available on GitHub for commercial use instead of its original academic limitation — is currently the third most popular machine learning framework on GitHub (by number of stars), followed by veteran open source tools Torch and Theano, and then Amazon’s Deep Scalable Sparse Tensor Network Engine (DSSTNE) framework, released in May.

All these projects want the developer community to use — and contribute to — their machine learning framework, to help improve it. “As vendors work to push new technologies and processes into broad adoption, it’s critical for them to engage as many interested participants as possible; that’s certainly the case in deep learning, and machine learning whose past technical requirements have discouraged non-specialists,” King noted.

But he added that “the news at this year’s fall technology conferences has been filled with vendors and their partners attempting to lower or dismantle those barriers to encourage AI development.” It’s a crowded market.

If PaddlePaddle really is faster and easier to work with, that would give non-expert developers a way to quickly see if machine learning is useful for a project. But for that audience, PaddlePaddle is also competing with services like AzureML, Microsoft’s drag-and-drop visual programming cloud service, which offers multiple machine learning models including deep learning networks.

It is also competing with APIs like those offered by IBM Watson and Microsoft Cognitive Services for image recognition and captioning, emotion and sentiment detection, voice matching and speech recognition, which mean developers don’t have to build a machine learning model themselves at all — just send their input and get the results back.

“For some standard problems, such as speech recognition, it is better to use some other existing services,” Xu admitted. However, he believes, “in many cases developers will encounter situations which do not have prebuilt services because of customized data or problems.”

One advantage of PaddlePaddle may have little to do with the technology, King points out. “Baidu’s framework should catch the attention of any company or developer hoping to do business in China,” he told us. “The sheer size of that market makes it hugely attractive, but history shows that collaborating with a recognized, respected Chinese organization is the only realistic way to proceed.”

He noted that Baidu has recently partnered with both Intel and NVIDIA, making sure that PaddlePaddle can take advantage of both CPU and GPU improvements; many existing frameworks are tuned to rely on GPU power. And despite the sudden dominance of TensorFlow, it is still early days.

“AI and deep networks and machine learning are really still in their infancies, with no single platform having a solid leadership position. That said, things are moving quickly, so it’s good for Baidu to make its move now. Next year might be too late.”

IBM and Intel are sponsors of The New Stack.

Feature image: Lido de Jesolo beachfront by Max Boettinger, via Unsplash.

The post Baidu Joins the Crowded Race of Open Source Machine Learning Frameworks appeared first on The New Stack.

Splunk Incorporates Machine Learning to Aid Security Monitoring and DevOps Workflows


IT analytics company Splunk is doubling down on Machine Learning.

The next versions of Splunk Enterprise, Splunk IT Service Intelligence (ITSI), Splunk Enterprise Security (ES) and Splunk User Behavior Analytics (UBA) will include custom machine learning-based predictive analytics, in both the on-premises and cloud versions.

Splunk Cloud and Enterprise 6.5 get a new interface to help you build your own machine learning (ML) models, along with ML tools to predict maintenance windows and help you forecast demand and react to changes by building models based on your own traffic and customers.

Splunk ES and UBA are predictive analytics tools; they will now learn what the baseline of normal behavior for your systems looks like, so you’re not so swamped by alerts when everything is running smoothly that you miss the warnings for serious problems.

Splunk ITSI is already an ML-driven tool to help you find the root cause of problems and fix them faster; it gets new ML models to spot unusual events that could mean there’s a security or system problem.

“Both ITSI and UBA have machine learning models that are used to surface anomalies”, explains Splunk principal product manager for machine learning Manish Sainani. “ITSI is focused on key performance indicators, while UBA is focused on raw events and their sequences.”

“Machine learning can help detect, predict and prevent what matters most to an organization,” Sainani told The New Stack. “They can use it to help detect IT or security incidents, predict and prevent outages, forecast product inventories, and much more. Unlike human analysis, Splunk’s machine learning is always on – an important addition to their normal monitoring, operations and business analysis.”

“Typically, customers will use machine learning to detect anomalies, events or circumstances that do not fit normal patterns,” he said. “In IT that might be web server response times, or network congestion or many other infrastructure readings. More sophisticated customers might measure complex KPIs and IT services critical to the business. In security, they may look for anomalous user behaviors, systems communications, data transfers, or failed logins.”


Machine Learning for DevOps

Sainani suggests several DevOps workflows that Splunk’s machine learning is a good fit for:

  • Ranked root cause analysis for quicker resolution of issues, using clustering and prediction of categorical fields.
  • Outlier detection, using statistical methods to spot outliers across your key performance indicators (KPIs) (see the sketch after this list).
  • Adaptive thresholds that adjust based on how your data is behaving, so they are automatically updated to reflect changes in your data.
  • Anomaly detection for both univariate (single) and multivariate (multiple) KPIs across your services.
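
To make the statistical outlier detection concrete, here is a minimal TypeScript sketch of a z-score check of the kind described above; it illustrates the general idea only and is not Splunk’s implementation (the threshold and sample data are made up).

// Flag values that sit more than `threshold` standard deviations from the mean.
function findOutliers(values: number[], threshold = 2): number[] {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const stdDev = Math.sqrt(variance);
  return values.filter((v) => stdDev > 0 && Math.abs(v - mean) / stdDev > threshold);
}

// Example: web server response times in milliseconds, with one obvious spike.
console.log(findOutliers([120, 118, 125, 122, 119, 900, 121])); // [ 900 ]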

Splunk uses three machine learning techniques: Clustering, which takes a lot of data and puts it into groups; classification, which produces a prediction; and regression, which uses historical values to come up with predictions about the future.

User Behavior Analytics uses those machine learning techniques for behavior baselining and modeling, anomaly detections (for which it has more than 30 models) and advanced threat detection. For both tools, you can also create your own custom analytics.

IT Service Intelligence uses machine learning for anomaly detection, adaptive thresholding and KPI management. It needs seven days of historical data for that detection to be statistically sound. “The algorithm by itself does not require more than two days’ worth of historical data,” Sainani told us, but Splunk decided on seven days of data for better accuracy.

“Once the baseline has been fed to the anomaly detection model, it can immediately start detecting and alerting on unusual patterns it hasn’t seen before.” And if your systems are already compromised, he claims “the algorithm is robust enough to avoid being affected by it.”

If you want custom machine learning models for working with the data you have in Splunk Enterprise (which already offers more than 20 machine learning commands), the new ML Toolkit also lets you work with open source Python libraries (scikit-learn, statsmodels, pandas, numpy, scipy) that include over 300 algorithms, Sainani told us.

These algorithms can be applied directly to the data for detection, alerting or analysis for specific use cases, whether for IT or security. The ML Toolkit also provides a guided workbench for data scientists to build their own models, Sainani said.

Anomaly detection.

The interface guides you through creating custom machine learning analytics with interactive examples. “With a single click they can deploy models into production to help detect IT or security incidents, predict and prevent outages, forecast product inventories, and much more. The biggest differentiator that the ML Toolkit brings is the ease with which a customer can build a machine learning model and put it into operation leveraging Splunk’s alerting and scheduled search framework.”

ML is becoming an increasingly useful security and analytics tool, and it’s a good fit for Splunk’s existing visualizations, believes Jason Stamper, data platforms and analytics analyst at 451 Research. “With a broad integration of machine learning, Splunk provides a comprehensive answer to one of the biggest challenges facing modern organizations: how to harness diverse, prevalent and increasingly profuse amounts of data to gain valuable business insights.”

Images: Splunk.

The post Splunk Incorporates Machine Learning to Aid Security Monitoring and DevOps Workflows appeared first on The New Stack.

Salesforce Ramps Up Deep Learning Research, Offers Pre-Packaged Machine Learning for Developers


Enterprise software cloud provider Salesforce.com is building out a continuum of artificial intelligence-driven services, from prepackaged solutions to bring intelligence to common workday routines, to a platform that would support companies in building their own deep learning models.

Part of Salesforce’s momentum in this emerging field comes from its acquisition of MetaMind earlier this year. The company appointed MetaMind founder and CEO Richard Socher to head a research lab that will investigate how artificial intelligence (AI) and machine learning (ML) can be used in the enterprise.

The lab will publish its findings, rather than issuing yet another deep learning toolkit, Socher, who is now the Salesforce chief scientist, told us at the Salesforce’s annual Dreamforce user conference last week.

Socher came to Salesforce with a startup he founded to pursue the dynamic memory approach to machine learning. It was a topic he had been researching at Stanford University.

Socher’s approach adds more memory to a neural network to help it store and update details as it parses information, so it can deal with a stream of facts or with information coming from different sources. He’d shown the system detecting the sentiment of complex sentences and answering questions about what was happening in photographs, tasks that machine learning systems often have problems with.

His lab will be working on fundamental research in deep learning, natural language processing and computer vision rather than product features, although the results will show up as part of Einstein, a set of machine learning features that Salesforce recently added to its cloud services.

“We’re covering a broad range, from things that we know how to solve like lead scoring but we could do it better with the latest and greatest techniques, to things we don’t know how to do,” Socher said. “We don’t know how to reason over large amounts of text, we can’t do a perfect translation. We’re doing some basic, fundamental research, improving neural networks, really hard tasks that nobody can do well yet like question answering. Computer vision and multitask learning we’re very excited about, rather than the single model approach.”

Don’t expect Salesforce to come out with its own deep learning framework the way Google, Microsoft, Amazon, Intel and Baidu have, though.

“We will publish academic, peer-reviewed papers; we’ll take them forward and we’ll publish the insights we have,” Socher told us; “We don’t have to reinvent the framework. We think there are enough frameworks already. The question is how you use those frameworks and what do you do with them. In a sense, they’re a commodity like programming languages and operating systems are commodity; we don’t need to go back to that assembly language we can go to much higher level languages.”

The techniques the research lab is working on go beyond the Einstein machine learning features built into Salesforce tools like Sales Cloud and Marketing Cloud which offer predictions, recommendations and alerts, based on machine learning models that are custom to each business using Salesforce.

“We can learn patterns and make predictions about customer data,” explained Salesforce head of data science Shubha Nabar; “How likely is your customer to churn? How should you reach out to them, even what should the text of your communication be?”

That could be scoring leads to send priority opportunities to senior sales staff, suggesting the best time of day to send a message that will get read quickly, spotting that you’re trying to close the deal with the wrong person at a company and suggesting who else should be involved in the conversation, or detecting a competitor mentioned in an email using entity recognition.

“We can identify what company is mentioned in the context of other companies, who is the CEO, what are the personal networks like there,” claimed Socher.

Einstein can segment an audience by their shopping habits, from dormant customers you can win back, to selective subscribers who have specific interests, to window shoppers who open a lot of pages but don’t click on many things. Einstein could be used to change the sort order of products on a shopping site by using previous sales to predict what specific customers will be interested in. It can help route customer support cases to the most qualified agent or escalate unanswered customer questions to cases automatically.

Image recognition capabilities could look at customer photos, allowing the organization to drill into the features customers are interested in. Gender, hair color and length data could be used to offer hats to men with short hair and headbands to brunettes with long hair.

Salesforce is also working on predictive analytics and predictive device scoring for its Internet of Things services.

Those are all drag-and-drop options developers can use when building apps based on Salesforce using its Lightning tools for web and mobile development — the latter is based on the Cordova mobile framework, and Salesforce Lightning-based apps will also work inside Microsoft Outlook.

But two of MetaMind’s more advanced options have also made their way into Salesforce; you can train your own deep learning models for the predictive vision and sentiment services by uploading a training set, which is as straightforward as picking files to upload in a browser.

If none of the pre-trained classifiers suit the images you need to work with, you can train your own model to detect logos, process medical images or tell the difference between flat and pitched roofs. You can use the natural language processing services to look for positive and negative sentiment in text. You can also ask questions about what’s going on in an image in an interactive way, so you could build that into a chatbot for customer service, for example. The predictive services are APIs that you can call from any app.
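
As a rough illustration of calling a predictive service over REST from an app, here is a hypothetical TypeScript sketch; the endpoint URL, field names, model name and auth scheme are placeholders rather than Salesforce’s documented API.

// Hypothetical call to an image-classification REST endpoint (placeholder URL and fields).
async function classifyImage(imageUrl: string, token: string): Promise<unknown> {
  const response = await fetch("https://api.example.com/v1/vision/predict", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,   // placeholder auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      sampleLocation: imageUrl,           // hypothetical field name
      modelId: "MyRoofClassifier",        // hypothetical custom model
    }),
  });
  return response.json(); // e.g. [{ label: "pitched roof", probability: 0.94 }]
}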

You can also go a step further and use the in-development Apache PredictionIO open source machine learning server in the Heroku Private Spaces that Salesforce offers; these give you dedicated instances with your own version of Heroku, so you can limit which IP ranges can access the instance or choose which geography it runs in. PredictionIO lets you use machine learning libraries like Spark MLlib and OpenNLP or build your own custom machine learning models.

In short, Salesforce now offers a continuum of ML services, from the canned Einstein services that add ML to common sales and support processes, to the emerging and more experimental predictive services for images and natural language, to a full machine learning environment that you can set up, customize and program against in Java, PHP, Python and Ruby.

It’s quite a jump to move between those different levels of tools, but if your business is using Salesforce, you can choose how you want to use machine learning with it, depending on the skills and resources you have available.

The post Salesforce Ramps Up Deep Learning Research, Offers Pre-Packaged Machine Learning for Developers appeared first on The New Stack.

Fluentd Offers Comprehensive Log Collection for Microservices and Cloud Monitoring


For those who need to collect logs from a wide range of different data sources and backends — from access and system logs to app and database logs — the open source Fluentd software is becoming an increasingly popular choice.

This framework, created by Treasure Data, is a log collector with similar functions to Elastic’s Logstash, explained Stephen O’Grady of the analyst firm RedMonk. “Fluentd is adept at collecting large volumes of semi- or unstructured data and directing them according to routing rules to other storage backends such as Elasticsearch or PostgreSQL. It’s well regarded by a variety of cloud providers, including Amazon and Google,” he said.

In fact, Google Cloud Platform‘s BigQuery recommends Fluentd as a default real-time data ingestion tool (not least because it lets you log data from AWS into BigQuery). It’s also natively supported in Docker and is being used by Microsoft as the agent for analytics and log monitoring on Linux for the new Microsoft Operations Management Suite (OMS). Why is it proving so popular?

Input, Output, Routing

Like the Unix syslogd utility, Fluentd is a daemon that listens for and routes messages. You can use it as a collector or an aggregator, depending on your logging infrastructure. You can filter the logs coming from a variety of sources and send them to a huge range of outputs via plugins (there are over 300 plugins so far). Treasure Data’s Kiyoto Tamura suggests it’s what you’d get “if syslogd evolved a little more and was a little more modern and easy to hack on. We did [Fluentd] because we couldn’t get syslogd to do what we wanted it to do.”

Microsoft chose Fluentd for OMS partly because it was what people in the Linux community were already using, but Microsoft’s Anurag Gupta also praises the modularity of the input and plugin model, as well as the wide support.


MySQL monitoring is one of the most popular uses, but there are also plug-ins for Kafka, Twitter, Kubernetes, Twilio, as well as for Short Message Service (SMS) notifications, and Simple Network Management Protocol (SNMP) data.

“There’s just this wide variety — and it’s pretty trivial to go out and create one of these things,” Gupta said. “As an IT guy I don’t have time to build a complex thing in native code but I don’t mind writing a couple of lines of scripting so I can use an existing plugin. The flexibility of Fluentd lends itself to a lot of scenarios.”

That was one of the goals of Fluentd, confirmed Tamura. “We started with the idea that the inputs and outputs should be configurable.” You do that by selecting source and output plugins, and giving them parameters. “We also strongly believed that routing should be included, for people coming from an ops background but that it should also be able to handle pretty complex logic, and that’s the idea behind tag-based routing.”

Every event that comes from a source has a tag (which the routing engine uses to direct the event), a timestamp, and a record that’s a JSON object. Match commands in the configuration file tell Fluentd which output plugin to route events with specific tags to, or you can use filter commands to set up routing pipelines that process the events before they’re sent to the output plugin.
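
As a conceptual sketch (in TypeScript rather than Fluentd’s actual configuration syntax), the routing model looks roughly like this: each event carries a tag, a timestamp and a JSON record, and match rules decide which output receives it.

interface FluentEvent {
  tag: string;                      // e.g. "app.web.access"
  time: number;                     // Unix timestamp
  record: Record<string, unknown>;  // the JSON payload
}

type Output = (event: FluentEvent) => void;

// Match rules are checked in order, like <match> directives in a config file.
const routes: Array<{ pattern: RegExp; output: Output }> = [
  { pattern: /^security\./, output: (e) => console.log("-> audit store", e.record) },
  { pattern: /^app\./, output: (e) => console.log("-> elasticsearch", e.record) },
];

function route(event: FluentEvent): void {
  const match = routes.find((r) => r.pattern.test(event.tag));
  if (match) {
    match.output(event);
  } else {
    console.warn("no matching output for tag", event.tag);
  }
}

route({ tag: "app.web.access", time: Date.now() / 1000, record: { status: 200 } });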


More than half of the Fluentd plugins are for output, Tamura said. “Inputs are HTTP, files, TCP, UDP, but output is a big differentiator against many other tools. The most popular output is Tableau, the next is Google spreadsheets, we’re working with a company that’s an SQL Server shop. Fluentd can serve as the connective tissue that connects all these multiple platforms,” Tamura said.


Matches and filters can be sophisticated, Gupta pointed out. “There’s a whole host of things you can do. You can convert the JSON to XML or to an encrypted stream that only the output can recognize. With OMS we enhance some of the data using Fluentd; we take the raw SQL logs and add the computer name and we can tokenize that into a specific field. We have audited data in multiple fields; we have hundreds of thousands of events taking multiline events, and with filters, we can tell you specifically what event is applicable.”

But the overall model remains simple. “The modularity is huge,” Gupta told us. “It helps developers wrap their head around how to build with Fluentd; I need to build a source or an output to a certain endpoint or I need to filter the data. That trifecta of source, filter and output is great for us as we build out more monitoring and functionality and for any developer using Fluentd it gives them a lot of freedom. They’re getting all these sources and filters and transformations, coming in and branching out to external services; that could be OMS or could be another external log analytics services or it could be API endpoints.”

“All these API endpoints just require some data source; you can use Fluentd as the middleman,” Gupta continued. “It’s useful anywhere that you need to stream some data, perform some calculation on it, send it to an endpoint and have all that correlated in a central repo.”

Fluentd is particularly well suited to microservices and containers, where logging is a more complex problem than with a monolithic, n-tier service, where it can be centralized more easily. In fact, this was one of the original inspirations, said Tamura.

“We built it essentially because increasingly, the stack is very modularized.” But that’s not the only way you can use it, Gupta confirmed.

“Fluentd is very applicable to a per-node architecture where I have a very specific server running my relational database and I need to stream logs from just that one machine to a central place without inflicting any pain on the workload. Or if I have 100,000 containers and I need a central spot to take all the logs from stdout and stderr, I can use the Docker Fluentd driver to bring that to a single node set or maybe a cluster and fire that off to a central location. You can have container-based logging across the whole container host.”

Buffer and Queue

Making routing and processing efficient across large systems with high volumes of events was key, Tamura explained. “One of the things we really wanted to do well is be performant but also be reliable without relying on an external queue or buffering. That was the biggest difference early on between us and Logstash; that has a simpler queuing model but it relied on Redis for consistent queuing. We try to do it in our own internal buffer (and you can buffer in memory or in file).”

Fluentd is written in Ruby — “we took a cue from Chef and Puppet,” Tamura said, “and that means it’s hackable by a fairly large number of people.”

That makes it simple to deploy, which helped Microsoft pick it over the also-popular Logstash, said Gupta, but the mix of performance and reliability the queuing provides is also key. “Logstash is using JRuby so you need to spin up a JVM; it’s not as lightweight as Fluentd.”

“One of the big enterprise concerns is that you want to make sure messaging is reliable and one of the big things that Fluentd has that Logstash doesn’t have natively, out of the box, is that buffering mechanism to make sure messages were sent over TCP and validate that the message has transmitted,” Gupta said. “For Logstash you have to set up the Redis Cache monitor and make sure it’s set up correctly.” The license is also simpler, which matters for enterprise customers. “I can bundle Ruby with Fluentd for a customer who just wants OMS; they don’t have to care about the details, they just know I have monitoring.”

In Microsoft’s performance testing, Gupta told us, “Fluentd was able to achieve 2,000 messages per second over TCP with no problems, with one agent on a one-core system with a 1Gb network card.”

Analysis as Well as Ops

While the obvious comparison is to Logstash, especially as part of the common Elasticsearch-Logstash-Kibana (ELK) stack, and monitoring systems like Prometheus, Tamura suggested that “the big competition is Splunk.”

Logging is more important than ever, not just for overloaded ops teams, but because it’s increasingly a source for analysis.

“Logs are increasingly used beyond the first use cases of incident analysis and ad hoc root cause analytics. Now it’s a source of insight and innovation. Often it’s ops and DevOps people with access to logs but their primary responsibility is not analyzing the data; their primary responsibility is to keep the lights on. The data science people told us they want more data but when they go to ops for it, it’s an unwieldy process, and some developers even want to remove logging code to make the system more efficient. The motivation of Fluentd was to remove that friction.”

Docker is a sponsor of The New Stack.

Feature image via Pixabay.

The post Fluentd Offers Comprehensive Log Collection for Microservices and Cloud Monitoring appeared first on The New Stack.


Microsoft Solidifies CNTK Deep Learning Toolkit for Industrial-Grade AI


Thanks to its deep learning toolkit, Microsoft is making huge strides in computer-based speech recognition.

Just this September, a Microsoft research team achieved an error rate of 6.3 percent on the Switchboard speech recognition benchmark, meaning the software interpreted just 6.3 percent of all words it “heard” incorrectly. The researchers used a recurrent neural network architecture, called long short term memory.

Less than a month later, training on a 30,000-word library, they were able to get that down to 5.9 percent — about the same percentage of incorrect words that professional transcribers made on the same phone call recordings. It was the first time a computer had been able to recognize the words in a conversation as well as people can.

It was “a historic moment,” said Microsoft’s chief speech scientist Xuedong Huang, who founded the speech recognition team at Microsoft in 1993.

The deep learning algorithm Microsoft used can be found in the recently released version 2 of Microsoft’s CNTK library, which used to be called the Computational Network Toolkit but, as of version 2, is now called the Cognitive Toolkit.

The beta of CNTK 2 improves performance, lets developers use it with Python as well as C++ to make it more widely relevant and gets a new name to show that Microsoft believes its deep learning framework is ready for a lot more than AI research.

“The acronym stays the same but the name reflects the higher aspiration of what we’re trying to do for cognitive computing and supporting Microsoft Cognitive Services.”

“Many of the AI services Microsoft has are now created using CNTK. Cognitive Toolkit is the secret weapon for Microsoft to create cognitive services like Skype Translator and many other AI breakthroughs like speech recognition that has now reached human parity for conversational speech.”

Cognitive Toolkit started as a framework for speech recognition, using not only the usual GPUs to speed up deep learning, but unusually, letting you take advantage of multiple GPUs on multiple machines to do distributed, massive scale deep learning. That way you don’t lose performance or accuracy when you work with bigger datasets.


Ready for Production

With version 2, Cognitive Toolkit goes from a research tool to something you can use in a production system, Huang said. “Microsoft has been using it for internal workloads. It’s not only Cognitive Services that have been created using CNTK but many other production-ready models. This is a commercially proven tool, it’s been proven in big production systems; it’s not just a tool used to create a toy problem.”

The voice recognition in Cortana is now created using Cognitive Toolkit, and the Cortana team says it’s increased their productivity almost ten-fold. “Before they adopted it, they felt like they were driving a Volkswagen; after they switched it’s like a Ferrari,” Huang said.

Microsoft’s speech services team is using Cognitive Toolkit not just for speech recognition but to create more accurate acoustic models, so they can understand what you’re saying in a noisy environment like a party, a bus or an open-plan office. They’re also using long short term memory, and the improvements will show up in Cortana as well as Skype Translator.

One reason that Microsoft moved CNTK from its original, academic-only release on Codeplex to full open source on GitHub was to expand it to additional workloads beyond speech — starting with image recognition — but without losing the impressive performance. The speech APIs and the Custom Recognition Intelligent Service in Microsoft Cognitive Services (a set of REST APIs you can call to use pre-built machine learning algorithms in your code) were built with the Cognitive Toolkit. CRIS lets you create your own custom acoustic models, by uploading samples from difficult environments along with transcriptions.

Bing uses Cognitive Toolkit to discover “latent connections” in search terms to find better results — if you type “how do you make a pumpkin pie” you’re looking for recipes, even though you didn’t type that in. That kind of natural language understanding is quite different from speech recognition, and it needs a massive dataset to work on.

“No other solution allows us to scale learning to large data sets in GPU clusters as easily,” Clemens Marschner, a principal software development engineer who works on Bing relevance, said.

Natural language understanding is also driving a new customer support system that Microsoft is trying out, under the codename Skyline. The chat bot looks at what the customer says and suggests links to fix the problem; it was good enough to let 25 percent of users in the trial fix their own problem, rather than the usual 12 percent. If a human agent needs to step in to work on a complex problem, the bot summarizes the fault and the conversation so far, so the agent doesn’t need to annoy the customer by asking all the same questions again.

Python and Performance

Most of the commercial production models built with Cognitive Toolkit were done in CNTK 1, but Huang noted, “The guts are identical — but we have new flexibility in CNTK 2.”

One of the advantages of Cognitive Toolkit is the way you describe deep networks — which are usually very complex — as nodes on a directed computational graph with inputs and outputs; once you’ve described a network, all the computation to learn the network parameters is taken care of automatically. Because you don’t need to derive gradients or hand-code the interactions between variables for back-propagation, you can create complex computational networks by composing simple building blocks.
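
As a rough illustration of that idea (and only the idea; this is not CNTK’s API or BrainScript), here is a toy TypeScript computational graph over scalar values, where gradients are derived automatically from the graph structure rather than hand-coded.

class Scalar {
  grad = 0;
  private backward: () => void = () => {};
  constructor(public value: number, private parents: Scalar[] = []) {}

  add(other: Scalar): Scalar {
    const out = new Scalar(this.value + other.value, [this, other]);
    out.backward = () => { this.grad += out.grad; other.grad += out.grad; };
    return out;
  }

  mul(other: Scalar): Scalar {
    const out = new Scalar(this.value * other.value, [this, other]);
    out.backward = () => {
      this.grad += other.value * out.grad;
      other.grad += this.value * out.grad;
    };
    return out;
  }

  // Back-propagate from this node through the graph in reverse topological order.
  backprop(): void {
    const order: Scalar[] = [];
    const seen = new Set<Scalar>();
    const visit = (n: Scalar) => {
      if (seen.has(n)) return;
      seen.add(n);
      n.parents.forEach(visit);
      order.push(n);
    };
    visit(this);
    this.grad = 1;
    for (let i = order.length - 1; i >= 0; i--) order[i].backward();
  }
}

// y = a * b + a, so dy/da = b + 1 and dy/db = a.
const a = new Scalar(2), b = new Scalar(3);
const y = a.mul(b).add(a);
y.backprop();
console.log(y.value, a.grad, b.grad); // 8 4 2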

The BrainScript network description language introduced in CNTK 1.5 lets you express very deep nets, beam decoding and other complex structures using infix operators, nested variables, function definitions, recursive function calls, arrays, and even lambdas. There’s a library of standard components that cover state of the art machine learning models like Deep Residual Nets for Image Recognition and Sequence-to-Sequence with Attention, and readers for easily inputting text and speech for deep learning training.

And now you can call all of that with Python, instead of having to use C++.

“This was a major adoption barrier for CNTK in the past,” he explained. “Using C++ for enterprise AI; that’s not a problem, people are familiar with C++. But for the open source community, we needed Python and this beta offers native Python support. It’s the language they’re familiar with; Python is easier to understand, easier to evaluate, it’s an interpretive language. Often they already have existing code using Python and when they add deep learning, they just want to augment what they have instead of switching from Python to C++. For the first time, we are bringing performance and ease of use in a more balanced way, because it can be integrated into other environments more efficiently.”

Python support will make working with reinforcement learning easier (since the majority of reinforcement learning libraries are written in Python). That’s a style of machine learning where the agent learns the best way to perform a task — anything from playing a game to navigating through a space — using trial and error, and rewards when it gets something right. Often it’s used as part of a more complex machine learning system; the Microsoft customer support agent uses both long short term memory and other supervised deep learning methods, plus reinforcement learning to keep improving its results. The rewards can be explicit feedback from the human agent or the reactions of the customer – leaving the chat if they’re frustrated or thanking the bot if the information is useful.

Passing the chat to a human.

You’ll get the same performance using Python, and you might see a performance boost with CNTK 2.

“Compared to the previous version, it delivers almost two times performance boost in scaling to eight Pascal GPUs in an NVIDIA DGX-1,” said Ian Buck, general manager of the Accelerated Computing Group at NVIDIA.

That depends on which version you’re upgrading from, noted Huang. “CNTK 1 has been updated almost every month.” Version 1.5 introduced a parallel processing technique called Block Momentum that significantly reduced communication costs so you could scale parallel training across a large number of GPUs spanning multiple machines. On a 64-GPU cluster, that improved performance by a factor of more than 50. Version 2 is an improvement over that, although if you’re already using v1.8 the performance increase will be incremental.

Cognitive Toolkit’s performance is already impressive, though. Researchers at Hong Kong Baptist University are running regular benchmarks on the most popular deep learning toolkits (CNTK, TensorFlow, Caffe and Torch), testing popular workloads: fully connected and recurrent neural networks and two convolutional neural network architectures, AlexNet and ResNet.

“CNTK 2 remains the fastest deep learning toolkit for distributed deep learning,” claimed Huang, “and I want to highlight the word distributed. Even on a single GPU, CNTK offers the fastest performance on both fully connected and recurrent networks. On AlexNet, Caffe is, not surprisingly, the fastest; on ResNet, Torch is fastest. But CNTK, even on a single GPU, is the fastest toolkit for two out of the four. If you compare it with TensorFlow, on all four workloads CNTK is faster now — AlexNet, ResNet, recurrent networks and fully connected networks, even on a single GPU. And when you scale up beyond one machine, that’s where Cognitive Toolkit really shines because many other tools can’t even do that; Caffe is only designed for one machine with multiple GPUs. CNTK is the fastest performing distributed deep learning network tool.”


In fact, the latest version of the benchmarks shows that “CNTK is on par with TensorFlow and Torch on ResNet,” according to the researchers. “As for RNNs… CNTK achieves the best performance for all available settings.”

For many developers, the easiest way to get those multi-GPU systems will be the new Azure N-series VMs that use NVIDIA Tesla K80 GPUs; they’re still in preview but you can use Cognitive Toolkit on them already. “In fact with Azure GPU, we support not only CNTK but TensorFlow, Torch and Caffe,” explained Huang. “If you want to run a small task on a single machine with multiple GPUs you can use any of those tools — but if you want to be serious about big data, scaling out to multiple GPUs on multiple machines, CNTK is the only one that offers that performance.”

Hong Kong Baptist University GPU training speed results (lower is better).

When the N-series VMs move into general availability, there will be a gallery image with Cognitive Toolkit already installed, and easier ways to scale out across multiple VMs. “Right now, you have to set up CNTK and run it on one VM; you can manage multiple VMs but it’s tedious, you have to use the command line. As we get the integration finished, it will be much easier to manage the distributed behavior. We’ll rely on Azure Batch to make scheduling much simpler once we are ready to launch the whole service. Azure GPU and CNTK together offer flexibility and ease of use; that will give the whole AI community a powerful toolkit to amplify AI for whatever they do.”

Feature image by Stefan Kunze via Unsplash.

The post Microsoft Solidifies CNTK Deep Learning Toolkit for Industrial-Grade AI appeared first on The New Stack.

How Microsoft Contributes to the Open Source Fluentd Project


Microsoft’s contributions to open source projects keep increasing, and its involvement has already gone far beyond open sourcing its own technologies. Having picked the Fluentd log collection framework for its Operations Management Suite, the OMS team has been adding features that it needs, and submitting those back to the Fluentd project.

“We want to have a strong two-way street where we’re benefiting from Fluentd, and we want to make sure that all the benefits we get from it benefit the community as well,” Microsoft’s Anurag Gupta told The New Stack.

The biggest contribution Microsoft has made so far is what he calls a ‘circular’ buffer. The buffer built into Fluentd is a key part of what makes it reliable without needing an external cache, but if you’re logging a lot of data and, for some reason (like a network problem), Fluentd can’t pass that data on to its final destination, the buffer is going to fill up.

The circular buffer will automatically drop the older data to make room for new information and keep doing that until data routed to the output starts being accepted again. “We can drop data in a rolling way, so it doesn’t just fill up and start spamming the log message file with errors, and that makes sure you get the new data that’s most relevant,” he said.

You can even set different buffer capacities for different tasks; you might want to keep more data for your security logs so you can go back and analyze them, but you might not need as much performance data (especially around the time of an outage when it might not be representative).

“You can say that you want to keep 80Mb of every security log, but set a 20Mb rotating data buffer for your performance data. That way, if you have a service disruption, you can make sure you’re keeping the security log audit data, but the performance data can go if necessary. That’s ephemeral; it’s useful to view at the moment, but it’s not something you need to have guaranteed to be there that’s going to be audited,” explained Gupta.
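
A minimal TypeScript sketch of that rolling-buffer behavior (conceptual only, not the actual Fluentd plugin code) might look like the following: when the buffer hits its capacity, the oldest entry is dropped to make room for the newest data.

class RollingBuffer<T> {
  private items: T[] = [];
  constructor(private capacity: number) {}

  push(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // drop the oldest entry rather than rejecting new data
    }
    this.items.push(item);
  }

  // Flush buffered items once the output starts accepting data again.
  drain(send: (item: T) => void): void {
    while (this.items.length > 0) {
      send(this.items.shift()!);
    }
  }
}

// e.g. keep more security audit data than ephemeral performance data
const securityBuffer = new RollingBuffer<string>(80_000);
const performanceBuffer = new RollingBuffer<string>(20_000);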

Another Microsoft contribution to Fluentd adds a ’heartbeat’ to its native monitoring mechanism. “You can view information about what Fluentd plugins are running on the instance, about the amount of data going through them, and what the configuration is. We added some additional pieces on top to surface some of that information back up as a heartbeat, so the agent knows what the state of the system is,” he told us.

It’s very common for logging agents to need a heartbeat so that you know when an agent has failed, and you need to restart it, or when you need to start routing output to a different logging host because the default one isn’t available. That would be just as useful for monitoring a cluster or a set of orchestrated microservices, so the OMS team turned it into a pull request.

“We needed to make sure we had a heartbeat for the Fluentd agent that’s reporting to OMS but instead of creating a proprietary OMS plugin, we modified Fluentd’s native monitoring capabilities to add the heartbeat,” said Gupta.

The next contribution the team is likely to make to Fluentd is its StatsD metrics server. The code for this aggregator is already available from the OMS GitHub repo, but they’re thinking about packaging it up as a Fluentd plugin, so it’s easier to pick up.

And the OMS repo is also an alternative way to get hold of Fluentd in the first place. If you don’t want to install Ruby and fetch the Fluentd Ruby gem, you can use td-agent, a distribution package from Treasure Data, which created the Fluentd framework. That does a little more of the work for you, retrieving the Fluentd package from the repo and installing it using the Red Hat or Debian package manager, or grabbing the OS X version, depending on what system you install it on; it preconfigures some settings, including sending data to Treasure Data. There are also Chef recipes and Puppet modules that will install td-agent for you to get the process started.

But for OMS, Microsoft wanted to be able to distribute the Fluentd agent as a self-extracting shell script that you can run — and you don’t need to be using OMS to take advantage of that. That’s a slightly easier way of installing Fluentd than using td-agent in some cases, said Gupta. “We understand if you’re running Ubuntu or RedHat, we’ll link the correct OpenSSL version, and we clearly say which Fluentd version it is.”

If that’s useful, you can get it directly from Microsoft’s repo. “Anything we’re building for Fluentd is all available as open source in our GitHub repo, and it’s consumable for everyone,” said Gupta. “You can fork it under the Apache 2 license; you can do whatever you want with it.”

Feature image via Pixabay.

The post How Microsoft Contributes to the Open Source Fluentd Project appeared first on The New Stack.

Microsoft Offers Smarter Database Tools for Developers


Microsoft is trying to close the divide between the worlds of software development and database programming, offering better SQL support in Visual Studio Code, and extending more SQL Server Enterprise features for developers.

There are few applications these days that don’t work with data but traditionally — certainly in the Microsoft world — database developers have been a completely different category from general developers.

In fact, the term “database developer” can be code for a database admin who also writes scripts and services that connect to databases.

That divide between developers who need to access data in applications and database developers who also create apps makes Microsoft’s database technologies look less relevant in today’s mobile and cloud world.

SQL Server 2016 adding JSON support and R Services is one way Microsoft is aiming to bring these two worlds closer together. R Services inside SQL Server is now getting new machine learning and deep neural network functionality. That brings “increased speed, performance and scale, especially for handling a large corpus of text data and high-dimensional categorical data,” according to Microsoft CVP for data products, Joseph Sirosh. There are R examples and machine learning templates for SQL Server on GitHub.

There’s also the Hadoop integration enabled by Polybase — with the 2016 release, that became a feature in SQL Server Standard Edition, so it is no longer something you have to buy the Analytics Platform System to get.

Being able to create external tables in SQL that point to a Hadoop cluster and query that as if it was just another table in a data warehouse makes it much more accessible to a database developer who is familiar with SQL. They can query Hadoop clusters running on multiple platforms (including unstructured Azure Blob storage) directly using T-SQL instead of having to learn Hive. At best Hive is different; at worst, it has a number of restrictions and doesn’t support all the query types you’re used to in SQL.

Microsoft is also approaching this from the other end, with better tools for working with SQL Server for general developers. Sirosh promised “improved developer experiences on Windows, Mac and Linux for Node.js, Java, PHP, Python, Ruby, .NET core and C/C++. Our JDBC Connector is now published and available as open source which gives developers more access to information and flexibility on how to contribute and work with the JDBC driver. Additionally, we’ve made updates to ODBC for PHP driver and launched a new ODBC for Linux connector, making it much easier for developers to work with Microsoft SQL-based technologies.” That’s on top of updating SQL Server Management Studio and the SQL Server Data Tools so they work with SQL Server on Linux.
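
For Node.js developers, connecting to SQL Server already looks something like the following sketch, which uses the community-maintained mssql package for Node.js (the connection details are placeholders, and the specific driver updates Sirosh describes may differ).

import * as sql from "mssql";

async function listTables(): Promise<void> {
  await sql.connect({
    server: "localhost",              // placeholder server
    database: "AdventureWorks",       // placeholder database
    user: "dev",
    password: process.env.SQL_PASSWORD ?? "",
    options: { encrypt: true },       // required for Azure SQL Database
  });

  // Tagged-template queries are parameterized automatically by the package.
  const result = await sql.query`SELECT TOP 5 name FROM sys.tables`;
  console.log(result.recordset);

  await sql.close();
}

listTables().catch(console.error);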

There’s also a new official SQL plugin for Visual Studio Code, to let database developers use the new tools Microsoft is bringing out — which enable SQL development on Linux and Mac as well as on Windows.

“We’re bringing out a VS Code plugin for SQL with updated connectors and tools because we want to make this more seamless,” Microsoft’s Mitra Azizirad told The New Stack. “We want to make it more seamless for developers to build with SQL Server and Azure SQL Database and Azure Data Warehouse. Being able to connect to SQL Server, including SQL Server on Linux, from VS Code will make it easier to work with.” Azizirad is the CVP of what’s now the developer and data division.

“Data has come into my cloud app developer platform team,” she told us, “and that’s representative of a cultural shift internally around applications, and understanding that data is a core part of any application. We recognize that we’re moving from what might be considered data-aware apps to truly data-driven intelligent applications. If you want these immersive experiences, where you’re making reason out of the data, in real time, with predictive analyses and having context awareness, you don’t do that without data. You can’t have that immersive experience without data. So we really need to be reaching out to data developers and all developers to make it easier.”

There are a handful of SQL extensions for Visual Studio Code already, but the most popular only supports MySQL and the others only let you connect to SQL Server. Being able to edit and even execute T-SQL from VS Code makes it a powerful development tool for a whole new audience — which means those database developers can more easily work with all of the other languages and patterns VS Code supports.

You can connect to SQL Server running on-premises on Linux, Windows or Docker on macOS, or in any cloud, to Azure SQL Database, and to Azure SQL Data Warehouse. Connection Profiles help you manage connections to multiple databases and set advanced properties (like the TCP port and connection timeout); you can use the standard pick-list in VS Code to switch the active database connection.

Once you’re connected, you can open existing .sql files or create new T-SQL statements and queries (both DML and DDL) in the T-SQL editor in VS Code. The editor gives you T-SQL colorization, context-aware schema autocomplete, syntax & schema error diagnostics, and T-SQL code snippets. Error diagnostics show up as red squiggles in the editor and as error messages in the error list tool window.

You can run T-SQL queries right from VS Code (either just the selected text or everything you have in the editor) and see results and messages in the result preview window. You can have multiple result sets; they’re shown in a stacked, collapsible results pane, and you can export the results as CSV or JSON files.

“Visual Studio itself has rich integration with SQL Server and so it stands to reason to begin to take some of this integration to VS Code,” IDC research director Al Hilwa told us. “SQL Server has always been a widely deployed relational engine with a fair amount of use from other development environments and languages, so this makes sense given that VS Code is aimed at a broad developer audience.”

Microsoft is also about to announce SQL Server 2016 SP1, which will bring more of the new features in this version to the Standard Edition, so that developers have a single programming model to work with, across all the editions — a long-standing request. “This means ISVs, partners and developers can build to a single app programming surface when they create their applications, and then use the edition that scales to the application needs,” said Azizirad.

The Developer Edition has always had the full set of Enterprise Edition features to develop against, but now you’ll get in-memory OLTP and in-memory columnstore analytics and partitioning for deployment in Standard and even the Express editions.

Adding basic high availability, with a two-node single database failover and a non-readable secondary, means you can create more robust systems. Getting the Always Encrypted option in Standard Edition is a big security improvement; developers can access an encrypted database from an ASP.NET website without needing to decrypt it, and only trusted apps with the column master key can decrypt the data. That makes using the stretch database feature — moving data you don’t access as frequently to Azure — which was already a feature in the Standard Edition, much more appealing.

Microsoft is announcing a slew of other database technology at its Connect event, from the general availability of Azure Data Lake — both the big data processing and the analytics services — R Server for HDInsight and Operational Analytics for Azure SQL Database, to a local emulator for DocumentDB, to the public preview of SQL Server on Linux. All these tools are about making those technologies relevant to a much wider range of developers.

Feature image: Microsoft’s Scott Guthrie, on stage at Connect 2016.

The post Microsoft Offers Smarter Database Tools for Developers appeared first on The New Stack.

TypeScript Expands to Offer Functional Programming, Node.js Integration


In the four years since it went from an internal to a public open source, open design project, the combination of static type checking, transpiling and tooling in TypeScript has become something of a behind-the-scenes success, especially on large projects. You’d expect Microsoft projects like Visual Studio Code (and the Office Online apps) to be written in TypeScript, but so is Angular 2. Slack is currently migrating its Electron-based desktop app to TypeScript.

The recently released TypeScript version 2.0 brings a broadened scope to the language, which was originally designed as a superset of JavaScript that also offers static typing. The release adds new types, more concepts from functional programming (like tagged unions), and several handy features to improve developer productivity.

Many fundamental pieces in TypeScript 2.0 are also building blocks for features that will come in the next version, TypeScript 2.1 (released into preview Wednesday), like using code flow analysis to better assign implicit types to variables and arrays. And with async/await set to become part of the next ECMAScript standard, TypeScript 2.1 will be able to transpile asynchronous code for a wider range of browsers.

The New Stack asked Anders Hejlsberg, the lead architect of C# and now also a core developer for TypeScript, what’s important in TypeScript 2.0 and why Microsoft wanted to bring new features to JavaScript in the first place.

Making JavaScript Scale

Back in 2011, Microsoft could already see that JavaScript was going to be a critical language, and not just for the web, Hejlsberg said.

“One reason being that it’s really the only thing that runs cross-platform. Even Java is no longer truly cross-platform; it doesn’t run on a bunch of mobile devices. And that’s becoming super-important because the world is becoming more heterogeneous, because of the mobile revolution. People were starting to write really large JavaScript apps and they were [finding] it’s hard to write a large app in a dynamic programming language with almost zero tooling that can validate the work you’re doing,” Hejlsberg said. “The only way to find out if the code works was to run it, and you’d better run all of it in all the possible states you can be in, or else you never know. But we know from decades of experience that if you add a static type system and better tooling, you can validate your code before the space shuttle flies rather than while it’s flying!”

The team considered trying to popularize an internal tool for cross-compiling C# to JavaScript called Script#, in the same way Google was pushing GWT as a way to use Java tools and cross-compile to JavaScript.

“But we also realized that you don’t appeal to a community by telling them to write in a different language and cross-compile into this thing we tell you to think of as IL for the web,” noted Hejlsberg. “We realized we had to work with the JavaScript community to figure out what JavaScript was lacking and find creative ways to fix that.”

What stood out most was that JavaScript had none of the methodologies other languages give you for structuring large applications. “There were no modules, no classes, no interfaces. In particular, there was no static type system and because of that you couldn’t do static verification; you also couldn’t do things like Intellisense, go to definition and find all references or safe refactoring. What if we could add that to JavaScript and do it in a way that doesn’t compromise the core value proposition, that’s cross platform and runs everywhere?”

Static types have been in TypeScript since version 1.0; modules and decorators arrived in version 1.5; intermediate releases brought support for ES2015 and more JavaScript patterns and libraries. And the focus on tooling and making TypeScript easy to embed in a wide range of editors has also been key. “Developer productivity doesn’t just come from the type checking but from the editing and authoring experience; getting great Intellisense, great refactoring, great code navigation. Things that used to take a whole day take a second with refactoring and that’s just invaluable.”

Building on Node

He’s enthusiastic about the way TypeScript 2 makes it easier to get the type declarations you need by using the Node Package Manager, taking advantage of the way TypeScript uses node — which as he points out is the standard way to get JavaScript outside a browser.

“TypeScript is very much the Switzerland of transpilers. We have no affinity with any particular development stack; it works with Angular and also with React. It works with Vue and Dojo and Aurelia and Ember, you name it… But to use TypeScript, you have to get the type information from somewhere. If it’s a module that was written in TypeScript, that’s easy, but if it’s written in JavaScript and someone has authored a declaration file you have to get that file; it may not come with that other framework when you install that.”

The DefinitelyTyped site has more than 2,000 type declaration files and the TypeScript community created tools like TSD and Typings to pull type definitions from that and other repos, but Hejlsberg admits that’s become rather ‘messy.’

“We realized we needed to focus on making this a better experience, so for TypeScript we’re automatically scraping Definitely Typed and automatically packaging anything you put on there into node modules and putting those in a private namespace, @types. If you say ‘npm install jQuery’, you get the typing — it’s just a node module. Those node modules can have dependencies on other node modules, so we use the node dependency manager. You just install types now.”

He notes that frameworks like Angular ship with TypeScript types included so you get them automatically; “but if you don’t it’s just another npm install away, and that makes configuration a lot easier.”

Similarly, glob support was a popular request, so you can use wildcards in file paths in config files to include or exclude specific files, which gets very useful as projects get larger and need to pull source code from different locations. TypeScript couldn’t simply use the standard node module globbing support library, he explained.

“The way we’ve written TypeScript, it has very few dependencies. Often what happens with node projects is you have dependencies on this and on that and before you know it you’ve sucked down hundreds of node modules. We want TypeScript to be embeddable. If you want to run TypeScript in a browser, you can’t depend on node, so we had to write our own.”

Functional Programming for JavaScript

Other new features are more fundamental, and let TypeScript adopt some of the programming ideas that developers are starting to become familiar with thanks to the success of functional languages like Swift and F#.

Giving both JavaScript’s ways of marking values as empty — null and undefined — their own types avoids a large number of programming mistakes when you forget to account for null or undefined values being returned from APIs.

He calls the ability to reason about non-null types one of the precursors to that: “It’s the billion dollar mistake, or two billion, since JavaScript has both null and undefined; that’s something we were keen to work on and we’ve got that covered now with a nice backward-compatible story that allows you to gradually get there.”
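As a rough illustration (the function and values here are made up, and this assumes the strictNullChecks compiler flag):

```typescript
// With strictNullChecks enabled, null and undefined become distinct types
// and are no longer assignable to everything else.
let title: string = "TypeScript 2";
// title = null;                       // error under strictNullChecks

// An API that can legitimately come back empty has to say so in its type...
function lookupTitle(id: number): string | undefined {
  return id === 1 ? "TypeScript 2" : undefined;
}

// ...which forces callers to account for the empty case.
// lookupTitle(2).toUpperCase();       // error: object is possibly 'undefined'
```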

That, in turn, depends on control flow analysis. “Control flow analysis is about the compiler reasoning about what is happening in your code. When you write an if statement like ‘if (x)’ or ‘if (x !== null)’, then you know x is not null inside it, and if you return in there then you know x is not null in the remainder of the block you’re in. But it takes the compiler being able to do code flow analysis to figure this out. It turns out non-nullability isn’t meaningful without code flow analysis.”
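A small sketch of the narrowing he describes (the function names are illustrative):

```typescript
// Control flow analysis narrows string | null to string along any branch
// where the compiler can prove null has been ruled out.
function shout(message: string | null): string {
  if (message === null) {
    return "";                  // early return handles the null case
  }
  // From here on the compiler knows message is a string.
  return message.toUpperCase();
}

function logIfPresent(message: string | null): void {
  if (message) {
    // Inside an 'if (x)' style guard, null (and the empty string) are excluded.
    console.log(message.toUpperCase());
  }
}
```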

Once you have both of those, you can use control flow analysis for other patterns. “One thing that’s very common in JavaScript is to write message-style processing applications. Lots of microservices are ‘receive a request that can be one of the following 17 requests, and they all have a header, and all the headers have a request property that’s a string and some variable data beyond that.’”

You could use classes and have a class for each request and a dispatcher; he calls that the object-oriented way of doing it. But that’s not the only option, proving that TypeScript isn’t just “C# for JavaScript.”

“With TypeScript and ECMAScript 2015 you can write OOP-style code with more ease than before, but it’s also a very nice place to write functional style code,” he points out. (TypeScript itself is written entirely in functional style code with no classes at all.)

“The functional way of handling those requests is more that you describe the various request shapes using object type declarations that each have a discriminant property. For example you may have a ‘kind’ property that in one declaration has the literal type ‘get,’ in another has the literal type ‘update,’ and in a third has the literal type ‘delete.’ You then say that a request can be any one of those objects by declaring a union type with each of the possibilities. So now you’ve described, in a closed world way, all the things that are possible and you’ve described the discriminants.”
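A sketch of the shape he’s describing, with made-up request types:

```typescript
// Each request shape declares a 'kind' discriminant with a string literal type.
interface GetRequest    { kind: "get";    id: number; }
interface UpdateRequest { kind: "update"; id: number; payload: string; }
interface DeleteRequest { kind: "delete"; id: number; }

// The closed world of possible requests is the union of those shapes.
type ApiRequest = GetRequest | UpdateRequest | DeleteRequest;
```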

Formalization

“We’re just formalizing patterns that are commonly in use, but this allows you to have the compiler validate that you handled all the possible requests and didn’t access properties off this request that actually belong to a different type of request,” he said.

That should sound familiar to developers who know functional languages. “This is a JavaScript formalization of functional programming’s ADTs, algebraic data types, where you declare all of the different shapes a record can have and you do pattern matching over them.” But interestingly, the TypeScript team didn’t have to change JavaScript to support this. “It’s done here by declaring union types that are unions of interfaces with discriminants in them, and using switch statements to do your pattern matching, but it’s done without adding any new language constructs to JavaScript!”
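Continuing that sketch (restating the union with inline object types this time), the switch statement is the pattern match, and the compiler checks both the property accesses and, via the never type, that every case is handled:

```typescript
type ApiRequest =
  | { kind: "get";    id: number }
  | { kind: "update"; id: number; payload: string }
  | { kind: "delete"; id: number };

function handle(request: ApiRequest): string {
  switch (request.kind) {
    case "get":
      return "get " + request.id;                            // narrowed: no payload here
    case "update":
      return "update " + request.id + ": " + request.payload; // UpdateRequest only
    case "delete":
      return "delete " + request.id;
    default: {
      // If a new request shape is added to the union and not handled above,
      // this assignment stops type-checking, so the omission is caught.
      const unreachable: never = request;
      return unreachable;
    }
  }
}
```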

As with almost everything Hejlsberg has worked on in his career (which now includes 20 years at Microsoft), this isn’t functional programming for the sake of it; these patterns are fundamentally about developer productivity. “The thing that we’re doing… You already have this picture in your head but your tooling doesn’t when you’re sitting in a dumb editor typing JavaScript. And even when you have the picture in your head, there are a lot of minutiae that maybe you get wrong. If you can teach the tool to think about your code more like the way you think about it, then the tool can be much more helpful. That’s what we’re doing by formalizing these common paradigms and idioms that people use when they write JavaScript.”

Feature image by Paul Morris, via Unsplash.

The post TypeScript Expands to Offer Functional Programming, Node.js Integration appeared first on The New Stack.

R Server 9 Adds Machine Learning to Work with Your Data Where It Lives

Built by data scientists, the R programming language has always been a tool for data scientists. But Microsoft’s R Server 9, the first full new version of the commercial package of R since Microsoft bought the company that created this distribution, Revolution Analytics, is also now aimed at a new audience — enterprise customers who have developers and analysts as well as data scientists.

That makes working with data from a wider range of sources key because enterprises have such mixed environments these days.

R Server already supported the Apache Spark 1.6 data processing framework; R Server 9 (which is built on open source R 3.3.2) adds support for Spark 2.0, so you can take advantage of the new options for working with streaming data and the improved memory management subsystem.

“You can intermix calls to massively parallel algorithms in R with calls to native Spark, through the SparkR library,” explained Bill Jacobs, Principal Program Manager on the R Server team.

The popularity of R keeps growing.

R Server 9 can also now connect to Apache Hive for real-time queries, and Apache Parquet, which is quickly becoming  popular for columnar storage, as a way to load data into Spark DataFrames to be analyzed by Microsoft’s ScaleR functions. ScaleR is designed to deal with datasets too large to fit in memory and it’s available in Azure HDInsight and, soon, the Azure Machine Learning service, as well as in R Server (and in the free Microsoft R Client, for smaller datasets).

Develop in Spark, deploy to SQL Server or web services.

R Server 9 now also runs on Ubuntu, as well as SUSE, Red Hat and CentOS, and supports the Cloudera, Hortonworks and MapR Hadoop distributions.

As Bharat Sandhu from Microsoft’s Advanced Analytics team put it, “Data in the enterprise is increasing by leaps and bounds and it is on multiple platforms, so customers need this intelligence closer to the data and on multiple platforms. We want to work with what customers have; we want to work with the skills and knowledge they possess, and the systems they have already invested in.”

Microsoft has multiple options for working with R.

Machine Learning on Your Data Platform

Those advanced analytics now include machine learning algorithms and data transforms, based on Microsoft’s extensive machine learning work. From Skype Translator to Bing to Exchange, Microsoft is using machine learning in a wide range of products and already provides many of its algorithms as Cognitive Services APIs you can call in your own code.

The new MicrosoftML package in R Server 9 includes six multi-threaded algorithms (based on machine learning used by Microsoft teams, but generalized to be useful for a wider range of scenarios):

  • GPU-accelerated deep neural networks, which should deliver significantly better performance than models that use only the CPU; Microsoft says training multi-layer custom networks is up to eight times faster.
  • Fast linear SDCA (Stochastic Dual Coordinate Ascent) learner, with support for L1 and L2 regularization, for binary classification and linear regression; Microsoft says this trains twice as fast as logistic regression.
  • Fast boosted decision tree for binary classification and regression.
  • Fast random forest for binary classification and regression.
  • Logistic regression, with support for L1 and L2 regularization.
  • Binary classification using a OneClass Support Vector Machine, for anomaly detection.

These algorithms are in R Server for Windows, the free R Client for Windows and SQL Server R Services now; they’ll come to Linux and Hadoop in the first quarter of 2017.

They let you do text classification for, say, sentiment analysis or classifying support tickets; create models for churn prediction, spam filtering, fraud and risk analysis, and click-through and demand forecasting; or create neural networks to solve complex machine learning problems like image classification, OCR and handwriting analysis.

Building a six-layer neural network takes just 60 lines of script, and the topology of the network can be arbitrarily deep; the only limitation is the computing power at your disposal, since more layers usually mean longer training times.

MicrosoftML also includes machine learning transform pipelines that let you create a custom set of transformations to featurize your data before you train or test with it, using the following calls:

  • concat() combines multiple columns into a single vector-valued column, speeding up training times.
  • categoricalHash() converts a categorical value into an indicator array using hashing, which is useful when you have large numbers of categories.
  • categorical() converts a categorical value into an indicator array using a dictionary, for a small, fixed number of categories.
  • selectFeatures() selects features from a list using count or mutual information modes.
  • featurizeText() produces a bag of counts of n-gram word sequences from your text, with language detection, tokenization, text normalization, feature generation, and term weighting — it can also remove ‘stop words’ that are too common to be useful.

You can use the new algorithms alongside the RevoScaleR functions for importing, cleaning and visualizing your data, as well as with existing open source CRAN R packages.

A galaxy image classifier built with the FastLinear, FastTree, FastForest, Logistic Regression, Neural Network, and OneClassSVM machine learning algorithms in R Server 9.

Using Your Models in More Places

R Server 9 is designed to integrate well with enterprise systems. “The big challenge for R is how to become an operationalizable, embeddable, integratable component of larger applications,” said Jacobs. “Now you can take R models from Spark and move them to a SQL Server system or a Linux box or a Hadoop cluster or a Windows Server system or Teradata.”

One way of doing that is what used to be called the DeployR server; this feature is now integrated into R Server, and it’s now just called ‘operationalization capabilities.’ It’s already available in standalone server installations on Windows, RHEL and CentOS, and Ubuntu, and is coming to SLES 11, Teradata, Hadoop and SQL Server R Services in 2017. But Microsoft sees the new R support in SQL Server 2016 as the ideal way for enterprises to turn analytics models into solutions you can use at scale in your business, rather than just reports to look at.

A single command from the new sqlrutils package embeds the R script in a T-SQL stored procedure in a SQL Server database, where any app or website that connects to the database can call it. If you create a neural network, what’s stored in the database is a binary blob containing a serialized version of the trained network.

“It runs in the database where it can run massively parallel, with multiple threads and multiple cores,” explained Jacobs. “It also provides all the security because the data never leaves the database engine.”

That will become more widely relevant when the Linux version of SQL Server ships in 2017, but it’s not the only new option. To shorten the time before you can start using your R models, and to make sure models in R Server stay useful as technology platforms change, R Server 9 also makes it easy to expose models, and even arbitrary R scripts, as web services that you can call as APIs from any programming language, using the Swagger API framework.

Again, it’s a simple process to create the swagger.json document; that’s something the data scientist can do themselves, from RStudio, R Tools for Visual Studio or a Jupyter notebook, and then send the file to the developer who’s going to use the model. The app developer runs another command to generate client code from the Swagger document, and can then call the API from their app with a few more lines of code.
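As a rough illustration of what the consuming side can look like (the endpoint URL, payload and field names here are hypothetical, not the actual R Server API; the real paths and shapes come from the swagger.json the data scientist hands over), a TypeScript client might call a published scoring service like this:

```typescript
// A hypothetical call to an R model published as a web service.
// The endpoint and request/response shapes are illustrative only.
interface ScoreRequest {
  age: number;
  lastContactDays: number;
}

interface ScoreResponse {
  conversionProbability: number;
}

async function scoreLead(input: ScoreRequest): Promise<ScoreResponse> {
  const response = await fetch("https://example.com/api/lead-scoring/v1", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  if (!response.ok) {
    throw new Error(`Scoring service returned ${response.status}`);
  }
  return (await response.json()) as ScoreResponse;
}

// Usage:
// scoreLead({ age: 42, lastContactDays: 7 })
//   .then(result => console.log(result.conversionProbability));
```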

That’s much quicker than the traditional deployment process with R. Usually, pointed out R Server program manager Carl Nan, “after the data scientists build the R model, it takes the app developer a long time to convert it to other programming languages so they can integrate it with business apps in production. It’s an error-prone process with slow innovation rates that ends up with stale models.”

That makes machine learning models you create yourself in R Server as easy to work with as the commercial machine learning APIs from providers like Microsoft and Google.

The new options mean that machine learning and analytics models you create in R aren’t tied to the platforms you build them on, or the platform you currently use them on, Jacobs pointed out. You can train in one environment and deploy in many. “No platform lasts forever. Some of what we’re doing here is providing portability that allows you to build hybrid applications. But over time, the ability to build code in one place and run it another also provides a form of future proofing that abstracts a lot of the data scientists’ work away from the peculiarities of platforms in use and makes their work last a long time.”

On the other hand, for businesses that don’t have data scientists, Microsoft is also producing solution templates for specific problems that are ready to use. “At times, the build-it-yourself approach can leave the uninitiated a little bit in the lurch, with a lot of work to do to get their first good results,” Jacobs said.

The first template predicts when leads will convert to customers, creating a dashboard that recommends whether to use email, a text message or a phone call to reach those potential customers, and even what time and day to contact them. Modeling the data to build that yourself would usually take weeks of work. The template is based on the insurance industry, but all the code is on GitHub, so if you do have data scientists, you can load it into R and qualify the models against your own data.

On a smaller scale (or for development purposes), several of the key new features of R Server are also in the 3.3.2 version of the free Microsoft R Client, including the machine learning library and the olapR, mrsdeploy and sqlrutils packages.

The R Server 9 machine learning support also appears in the first community technology preview of the next version of SQL Server; the current version of SQL Server has embedded R support, but not the new packages.

A lead conversion dashboard from the Microsoft solution template.

Feature image by Nathan Anderson, via Unsplash. Other images from Microsoft.

The post R Server 9 Adds Machine Learning to Work with Your Data Where It Lives appeared first on The New Stack.
