Tor Snowflake Technical overview of another feature to access censored websites and applications

Snowflake is a new WebRTC pluggable transport. This document provides a technical overview of Snowflake in terms of the system's components, interactions, and code. The intent is to introduce Snowflake to the less technically savvy reader, as well as those interested in contributing to this project and Internet Freedom in general. Specifically, this document discusses Snowflake's use of WebRTC, the approach to Rendezvous using domain fronting, the method for overcoming NAT using ICE negotiation, and a number of other considerations without assuming significant prior knowledge of these topics. Snowflake and this live document are an ongoing project. As the Snowflake project evolves, this document will be updated and additional documents will be made available to discuss metrics, additional topics, and other relevant findings in light of future work on this project.

Contents:
1 Introduction
2 Overview
3 Snowflake Bypass Process
3.1 Behavior of Pluggable Transport Clients
3.2 WebRTC connection setup
3.3 Rendezvous
3.4 NAT Bypass
3.5 Recovery and multiplexing
4 Contribution

1 Introduction
Snowflake is a new circumvention tool that provides access to the free and open Internet. As a pluggable transport, it provides easy-to-use access to a censorship circumvention system like Tor. It is inspired by and builds on the earlier work of Flashproxy. Snowflake is a hybrid of previous pluggable transports, and this document is intended to serve as a guide for exploring this system.

To illustrate in the context of Tor, Snowflake allows anyone to leave a browser tab open to become an ephemeral Tor bridge. Similar to the Flashproxy design, Snowflake includes a large network of volunteer proxies, with the goal of surpassing the censors' ability to block proxy IP addresses and providing a very easy-to-use, reliable, and difficult-to-filter method of circumventing censorship. Previously, it was difficult for users to manually configure port forwarding, which limited the adoption of older tools such as Flashproxy. Snowflake solves the problem of NAT traversal by making it automatic and not the responsibility of the user, and offers a number of new benefits.

Ease of use and reliability are central to the Snowflake system, both for people in censored regions who need to connect and for volunteers who help them do so. This makes it easier for the circumvention network to grow both the number of volunteers and the number of clients. As the volunteer network grows, the Snowflake system becomes stronger in circumvention capacity, bandwidth, and resilience.

2 Overview
Using Tor as an example, the sequence of interactions in a Snowflake session might look like this:

1. A user in the filtered region wants to access the free and open Internet. They open Tor Browser and select Snowflake as the pluggable transport. This starts the Snowflake client.
2. Volunteers outside the filtered region visit websites that host the Snowflake proxy code. The browsers of these volunteers then become temporary proxies available to a Snowflake client.
3. The filtered user's Snowflake client automatically finds some of these remote, in-browser volunteer proxies using a secure rendezvous strategy that also traverses NAT automatically.
4. The two Snowflake peers establish a peer-to-peer connection via WebRTC.
5. Once WebRTC is ready, the Snowflake client makes the WebRTC transport available for use by Tor.
6. In the meantime, the volunteer's Snowflake proxy connects to a Tor relay and begins routing traffic between the Snowflake client and the Tor relay.
7. Tor establishes a circuit, and the user can now circumvent the filter.

To clarify, it is not the website hosting Snowflake that acts as the Snowflake proxy. Rather, it is the website visitor: their browser tab becomes a volunteer proxy.

Snowflake comprises three components that enable this process:

The Snowflake client, which is a client transport plugin that conforms to the Pluggable Transport Specification (ptspec). Tor uses it like any other pluggable transport, and any other ptspec-compliant system can do the same. This component is written in Golang.
The Snowflake proxy, which is a miniature WebRTC proxy in the browser. It relays data between Snowflake clients and a target; for Tor, the target is a Tor relay. This component is written in CoffeeScript.
The broker, which is responsible for rendezvous. It is similar to the “Facilitator” of Flashproxy, but currently only uses domain fronting. This component is written in Golang.

The Snowflake client and the Snowflake proxy are also referred to as Snowflake peers.

In Snowflake, WebRTC is used only between the Snowflake peers, that is, between a Snowflake client and an in-browser Snowflake proxy, because WebRTC is the transport that crosses the filter boundary. Communication from the proxy to the target currently takes place via a WebSocket. Communication with the broker takes place via HTTPS with domain fronting.

Here is a diagram to further illustrate the Snowflake bypass process:
These processes and components involve many more details, which are explained in the sections below.

3 Snowflake Bypass Process

3.1 Behavior of Pluggable Transport Clients
Snowflake is a pluggable transport that conforms to the Pluggable Transport Specification.

Specifically, Snowflake includes the Client Transport plugin, which provides a localhost SOCKS server as an interface between the client application and the transport. In the context of Snowflake and Tor Browser, the Snowflake client transport plugin creates a localhost SOCKS server that the client application, Tor Browser, sets as a proxy.
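
As a rough illustration of the SOCKS interface described above, the sketch below handles only the first step of a SOCKS5 handshake, the method negotiation defined in RFC 1928. It assumes SOCKS5 without authentication, and the function name `choose_method` is hypothetical; it is not taken from the Snowflake code.

```python
SOCKS_VERSION = 0x05    # SOCKS protocol version 5
NO_AUTH = 0x00          # "no authentication required" method (RFC 1928)
NO_ACCEPTABLE = 0xFF    # reply when none of the offered methods is acceptable


def choose_method(greeting: bytes) -> bytes:
    """Return the two-byte method-selection reply for a SOCKS5 greeting.

    The greeting layout is: VER, NMETHODS, then NMETHODS method bytes.
    """
    if len(greeting) < 2 or greeting[0] != SOCKS_VERSION:
        raise ValueError("not a SOCKS5 greeting")
    nmethods = greeting[1]
    methods = greeting[2:2 + nmethods]
    if len(methods) != nmethods:
        raise ValueError("truncated greeting")
    # Accept only the no-authentication method; refuse everything else.
    chosen = NO_AUTH if NO_AUTH in methods else NO_ACCEPTABLE
    return bytes([SOCKS_VERSION, chosen])
```

A client application that offers the no-authentication method receives the reply bytes 0x05 0x00, after which it would send its connect request to the same local socket.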

The Snowflake client is also responsible for ensuring that connections to remote Snowflake proxies are available so that the SOCKS server can handle requests from Tor Browser by forwarding traffic to a Snowflake proxy. These remote proxies are assumed to forward traffic to a Tor relay so that the entire system fulfills the expected behavior as a “WebRTC Transport” for Tor.

However, before the Snowflake client can use the transport, the local Snowflake client and the remote peer must first establish a connection via WebRTC.

3.2 WebRTC connection setup
WebRTC is a relatively new standard that enables robust peer-to-peer real-time communication that includes streaming video, audio, and arbitrary binary data. For the purposes of Snowflake, only binary data channels are used via WebRTC DataChannels, not media channels. WebRTC DataChannels use SCTP and DTLS to provide a fairly reliable, encrypted transport. Of course, there are many other aspects to consider here, including but not limited to fingerprinting. In the future, it is possible that WebRTC's RTP media channels could be useful as an alternative means of transport.

Originally, WebRTC was only available either through the JavaScript APIs in modern versions of Chrome and Firefox or through the C++ library in native code. For the development of Snowflake, it was necessary to create a Golang library that wraps the C++ WebRTC library using cgo.

3.2.1 Session descriptions
Programs in the web browser cannot passively wait for incoming connections; they must initiate the outgoing connection. Since both the Snowflake client and the proxy are WebRTC peers subject to this restriction, the peers need a way to signal and discover each other before they can initiate their WebRTC peer connection. This signaling over a bidirectional communication channel is required in every WebRTC scenario, although WebRTC itself does not provide it. Rendezvous is outside the scope of WebRTC; all users of the WebRTC API are expected to handle rendezvous for their own use case.

For Snowflake, this process begins when a Snowflake client creates a new WebRTC PeerConnection that is not yet connected to a remote peer. This PeerConnection then creates a single DataChannel that triggers a series of events that prepare a local Session Description Protocol (SDP) offer so that the rendezvous process can begin, which is described in more detail below. This SDP offer primarily describes the peer and its capabilities, along with instructions for a remote peer on how to reach the client over the network.

Through the rendezvous process, the Snowflake client's SDP offer reaches a Snowflake proxy, which generates an SDP answer. The answer contains the same kind of information as the offer, but describes the proxy instead. If rendezvous is successful, the Snowflake client receives this SDP answer, and both WebRTC endpoints then hold each other's SDP messages. At this point, the Snowflake peers can attempt to establish a direct connection.
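
To make the offer/answer exchange more concrete, here is a minimal sketch of how a session description could be packed into a signaling message. The `type` and `sdp` fields mirror the fields of a WebRTC session description; whether Snowflake serializes its messages in exactly this way is an assumption, the helper names are hypothetical, and the SDP text is a shortened placeholder rather than a real offer.

```python
import json
from typing import Tuple


def encode_description(kind: str, sdp: str) -> str:
    """Serialize a session description as a JSON signaling message."""
    if kind not in ("offer", "answer"):
        raise ValueError("kind must be 'offer' or 'answer'")
    return json.dumps({"type": kind, "sdp": sdp})


def decode_description(message: str) -> Tuple[str, str]:
    """Parse a signaling message back into (kind, sdp)."""
    data = json.loads(message)
    kind, sdp = data["type"], data["sdp"]
    if kind not in ("offer", "answer"):
        raise ValueError("unexpected description type: %r" % kind)
    return kind, sdp


# Placeholder SDP body; a real offer also lists ICE candidates and more.
PLACEHOLDER_SDP = "v=0\r\no=- 0 0 IN IP4 0.0.0.0\r\ns=-\r\n"
wire_message = encode_description("offer", PLACEHOLDER_SDP)
```

The client would send `wire_message` through the rendezvous channel, and the proxy would call `decode_description` on it before building its own answer in the same format.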

3.2.2 Completing the connection
If successful, the WebRTC PeerConnection and its DataChannel are opened on both peers and are available for streaming bytes. The Snowflake client then wires its DataChannel to the aforementioned localhost SOCKS proxy to make the transport available to the client application. The Snowflake proxy establishes a simple WebSocket connection to a Tor relay. At this point, the WebRTC transport is ready and can carry arbitrary user traffic, for example the establishment of a Tor circuit.

However, in order to successfully establish a WebRTC PeerConnection, the SDP messages described above must be transmitted correctly and securely between the peers. There are many more considerations here, as the attacker could disrupt various rendezvous strategies.

3.3 Rendezvous
Rendezvous is essentially the process by which clients and proxies find each other. To connect two Snowflake peers, signaling messages consisting of the SDP offers and answers described above must be exchanged. These messages tell the peers where to start negotiating a P2P connection.

3.3.1 The Broker
In Snowflake, the rendezvous is managed by the broker, a server running on a third-party web service. The broker is responsible for securely rendezvousing Snowflake clients with Snowflake proxies by exchanging SDP offers and responses between them while maintaining an accounting of the Snowflake peers. Any number of proxies and clients can be simultaneously involved in this process with the broker, requiring additional control to keep the broker scalable, robust, and resilient to DDoS while keeping it secure.

A single rendezvous process consists of a series of nested HTTP requests.

1. A fresh Snowflake proxy sends a POST request to the broker as a long poll, indicating that it is looking for a client to serve.
2. A Snowflake client sends a POST request to the broker containing its SDP offer. The broker keeps this request open and forwards the SDP offer as the response to one of the Snowflake proxies' long polls from step 1.
3. The Snowflake proxy receives the SDP offer and creates an SDP answer. It then sends another POST request to the broker containing the answer, intended for the client that sent the offer.
4. The broker forwards the SDP answer to the original Snowflake client as the response to its POST request from step 2.
5. Both the Snowflake proxy and the Snowflake client now have each other's SDP messages, which is sufficient to establish a direct WebRTC peer connection.

If any step of this process takes too long, the requests are safely aborted and the Snowflake peers try again.
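
The matching logic behind these steps can be sketched with a small in-memory model. This is not the broker's actual code: it leaves out HTTP, long polling, and timeouts, and the class and method names are hypothetical. It only shows how an offer is handed to the longest-waiting proxy and how the answer finds its way back to the right client.

```python
from collections import deque
from typing import Deque, Dict, Optional


class Broker:
    """In-memory model of the broker's matching logic (no HTTP, no timeouts)."""

    def __init__(self) -> None:
        self.waiting: Deque[str] = deque()    # proxies parked in step 1
        self.offers: Dict[str, str] = {}      # proxy id -> pending client offer
        self.matched: Dict[str, str] = {}     # proxy id -> client id
        self.answers: Dict[str, str] = {}     # client id -> proxy answer

    def proxy_poll(self, proxy_id: str) -> None:
        """Step 1: a proxy announces that it is free to serve a client."""
        self.waiting.append(proxy_id)

    def client_offer(self, client_id: str, offer: str) -> Optional[str]:
        """Step 2: assign the offer to the longest-waiting proxy, if any."""
        if not self.waiting:
            return None
        proxy_id = self.waiting.popleft()
        self.offers[proxy_id] = offer
        self.matched[proxy_id] = client_id
        return proxy_id

    def proxy_receive(self, proxy_id: str) -> Optional[str]:
        """Deliver the pending offer as the response to the proxy's poll."""
        return self.offers.pop(proxy_id, None)

    def proxy_answer(self, proxy_id: str, answer: str) -> None:
        """Step 3: the proxy posts its answer for the matched client."""
        client_id = self.matched.pop(proxy_id)
        self.answers[client_id] = answer

    def client_receive(self, client_id: str) -> Optional[str]:
        """Step 4: deliver the answer as the response to the client's POST."""
        return self.answers.pop(client_id, None)
```

In a real deployment each of these calls would correspond to an HTTP request or response, and entries that wait too long would be dropped so that both peers can retry.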

Furthermore, this exchange of signaling messages requires a path that is itself highly resistant to filtering. In particular, it is assumed that the censor blocks direct connections from clients in the filtered region to the broker, and clients must still be able to reach the broker despite this. Domain fronting makes this possible.

3.3.2 Domain Fronting
Similar to Meek, another pluggable transport, Snowflake uses domain fronting. Domain fronting is a circumvention method based on collateral freedom. It takes advantage of HTTPS and the behavior of large third-party web services. Large Internet companies such as Google, Amazon, and Microsoft deliver web services via their own CDNs (Content Delivery Networks). These CDNs serve not only the companies' own web services, but also services that users host on their platforms, such as App Engine. Snowflake currently hosts the broker on App Engine, and other services may be used as well.

To illustrate using App Engine, assume a Snowflake broker is located at snowflake-123.appspot.com, and assume the censor is already blocking direct connections to it. When a Snowflake client wants to communicate with this broker, it opens a TLS connection not to snowflake-123.appspot.com, but to a valid front domain, namely google.com. The App Engine instance is specified only in the Host header of the HTTP request, so that the HTTP request looks like this:

GET / HTTP/1.1
Host: snowflake-123.appspot.com

When Google's serving infrastructure receives this request, it recognizes that it can serve the requested App Engine instance. (If the Host header names an address that is not available via this domain, an error such as 403 Forbidden is usually returned.) Amazon and Microsoft can do something similar with their respective services. Since the HTTP request is sent over TLS, the censor cannot see the Host header, so the request looks like a harmless request to google.com. This means that the censor cannot block the broker without blocking all of Google or all of Amazon, which is the essence of collateral freedom.
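
The split between the visible and the hidden name can be sketched as follows. The code only builds the two pieces of a fronted request and sends nothing over the network; `FrontedRequest` and `build_fronted_request` are hypothetical names, and the domains are the example values used above.

```python
from typing import NamedTuple


class FrontedRequest(NamedTuple):
    sni: str        # name visible to the censor in the TLS handshake
    payload: bytes  # HTTP request, sent only inside the encrypted TLS stream


def build_fronted_request(front_domain: str, hidden_host: str,
                          path: str = "/") -> FrontedRequest:
    """Pair an innocuous TLS server name with the real Host header."""
    request = (
        "GET {path} HTTP/1.1\r\n"
        "Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(path=path, host=hidden_host)
    return FrontedRequest(sni=front_domain, payload=request.encode("ascii"))


fronted = build_fronted_request("google.com", "snowflake-123.appspot.com")
```

In a real client, `fronted.sni` would be passed as the `server_hostname` argument of `ssl.SSLContext.wrap_socket`, and `fronted.payload` would be written to the resulting TLS socket, so that only the front domain appears in cleartext.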

A more comprehensive explanation of domain fronting can be found here.

Since Snowflake uses domain fronting only for the rendezvous and not for the transport itself, as Meek does, the resource usage is far lower: the short signaling messages to the broker consist of far fewer bytes than the total user traffic. This significantly reduces the fees paid to third-party providers and CDNs, allowing the tool to scale to support far more users. This is one of the two main advantages Snowflake offers over older circumvention tools.

3.4 NAT Bypass
The second main advantage of Snowflake is its approach to NAT traversal. Snowflake assumes, among other things, that both the client and the proxy are behind NAT (Network Address Translation) and that traversing it is Snowflake's responsibility, not the user's.

Most devices are located behind a router that implements NAT. Although NAT offers several advantages, such as working around the address space limitation of IPv4, it also introduces a barrier when establishing peer-to-peer connections, because it makes it more difficult to determine a direct path between peers.

Snowflake finds this path without requiring the user to manually configure port forwarding. Because NAT traversal is now automatic and no longer the responsibility of the user, the usability issues that limited the adoption of previous circumvention tools are resolved.

Automatic NAT traversal in Snowflake is possible thanks to ICE negotiation.

3.4.1 ICE Negotiation
Snowflake uses WebRTC's Interactive Connectivity Establishment (ICE) negotiation for NAT traversal.

When a peer participates in ICE, it first collects ICE candidates through a series of fallbacks. Each ICE candidate is a local or translated public IP address at which other devices might reach the peer: directly, by UDP hole punching via STUN, or as a last resort via a TURN relay. ICE then expects these ICE candidates to somehow reach the remote peer. In WebRTC, the ICE candidates are part of the SDP messages, whose delivery is handled by the rendezvous process via the broker described above. Once both peers have each other's ICE candidates, the peers try the candidates until they can establish a P2P connection.
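
The fallback order among candidate types follows from the candidate priority formula in the ICE specification (RFC 8445). The sketch below computes that priority using the type preference values recommended there. It is a simplification: real ICE ranks pairs of local and remote candidates rather than single candidates, and the function names are hypothetical.

```python
from typing import Iterable, List

# Type preference values recommended by RFC 8445.
TYPE_PREFERENCE = {
    "host": 126,   # address of a local interface (direct connection)
    "prflx": 110,  # peer-reflexive, learned during connectivity checks
    "srflx": 100,  # server-reflexive, the public address learned via STUN
    "relay": 0,    # address allocated on a TURN relay (last resort)
}


def candidate_priority(kind: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component id)."""
    return ((TYPE_PREFERENCE[kind] << 24)
            + (local_preference << 8)
            + (256 - component_id))


def order_candidates(kinds: Iterable[str]) -> List[str]:
    """Sort candidate types from most to least preferred."""
    return sorted(kinds, key=candidate_priority, reverse=True)
```

With these values a host candidate always outranks a STUN-derived candidate, which in turn outranks a TURN relay, matching the series of fallbacks described above.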

3.4.2 Caveats for STUN and TURN
The ICE negotiation process leads to further implications regarding the availability of STUN and TURN servers on the public Internet, as the circumvention process now depends on the availability of at least one of these servers. STUN servers are public and relatively inexpensive, so many are available on the public Internet. TURN servers are less common and are needed for about 10% of peer combinations where STUN does not work, especially in symmetric NAT cases that prevent normal UDP hole-punching.

Currently, Snowflake is configured to use STUN only by default. Including TURN servers in the configuration is trivial if they are available.
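
A STUN-only setup and a setup with an additional TURN server differ only in the list of ICE servers handed to the WebRTC PeerConnection. The sketch below builds such a list in the shape of the WebRTC `iceServers` configuration (entries with `urls`, plus `username` and `credential` for TURN). The helper name is hypothetical, and the server addresses in the usage lines are placeholders, not servers that Snowflake actually uses.

```python
from typing import Dict, List, Optional, Sequence, Union

IceServer = Dict[str, Union[str, List[str]]]


def ice_servers(stun_urls: Sequence[str],
                turn_url: Optional[str] = None,
                username: str = "",
                credential: str = "") -> List[IceServer]:
    """Build an iceServers list: STUN only by default, TURN if provided."""
    servers: List[IceServer] = [{"urls": list(stun_urls)}]
    if turn_url is not None:
        # TURN relays require credentials, unlike public STUN servers.
        servers.append({
            "urls": [turn_url],
            "username": username,
            "credential": credential,
        })
    return servers


# Placeholder addresses for illustration only.
stun_only = ice_servers(["stun:stun.example.org:3478"])
with_turn = ice_servers(["stun:stun.example.org:3478"],
                        turn_url="turn:turn.example.org:3478",
                        username="user", credential="secret")
```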

While it is entirely possible for the censor to block STUN and TURN servers as well, these servers are typically required for any type of peer-to-peer connection setup that is common in various other domains and applications. This means that STUN and TURN already bring a certain degree of collateral freedom, but there are no guarantees. The provision of a highly available population of non-blocked, high-performance STUN and TURN servers for the client remains an issue to be resolved during deployment. The addresses of the STUN and TURN servers can also be provided via a domain fronted channel.

Further extensive future work with TURN and Snowflake is likely to be required to provide a good experience for this remaining 10% of users.

3.5 Recovery and multiplexing
Once Snowflake has prepared WebRTC and enabled it as a pluggable transport, it must also be able to recover quickly and reliably if the connection is lost. This is particularly important because Snowflake proxies are assumed to be short-lived.

For the Snowflake client, there are two primary ways in which the transport can fail:

1. The remote Snowflake proxy goes away: the volunteer leaves the page, closes the tab, loses connectivity, or some other event remotely terminates the WebRTC DataChannel.
2. There is a local error on the SOCKS side. This is much rarer.

In either case, Snowflake attempts to maintain high reliability and connectivity, as well as a good browsing experience for the user in the Tor Browser use case, by having the Snowflake client multiplex across several Snowflake proxies. If a single WebRTC DataChannel fails, the Snowflake client connects to a new WebRTC peer to replace it. If the failed WebRTC DataChannel was actively used as a transport, the Snowflake client triggers a new SOCKS handler that switches the transport to a different WebRTC DataChannel.

The number of multiplexed WebRTC DataChannels to be searched for and maintained on the Snowflake client can be configured with the -max N flag.
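
The recovery behavior can be sketched as a small pool that keeps a fixed number of peers and promotes a surviving peer when the active one fails. This is an illustrative model, not the client's actual code: `PeerPool` and its methods are hypothetical names, `capacity` plays the role of the N in the `-max N` flag, and `connect` stands in for the whole rendezvous and WebRTC setup.

```python
from typing import Callable, List, Optional


class PeerPool:
    """Keep up to `capacity` peers and fail over when the active one dies."""

    def __init__(self, capacity: int, connect: Callable[[], str]) -> None:
        self.capacity = capacity
        self.connect = connect            # obtains one fresh peer
        self.peers: List[str] = []
        self.active: Optional[str] = None
        self.refill()

    def refill(self) -> None:
        """Top the pool back up and make sure some peer is active."""
        while len(self.peers) < self.capacity:
            self.peers.append(self.connect())
        if self.active is None and self.peers:
            self.active = self.peers[0]   # oldest surviving peer takes over

    def on_failure(self, peer: str) -> None:
        """Drop a failed peer, then replace it and fail over if needed."""
        if peer in self.peers:
            self.peers.remove(peer)
        if peer == self.active:
            self.active = None
        self.refill()
```

Because spare peers are already connected when a failure occurs, the transport can switch to one of them immediately while a replacement is collected in the background.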

4 Contribution
Snowflake is under active development, and there is still a lot of work to be done for the foreseeable future. Much of this is described in the updated Snowflake OTF proposal. This includes reproducible builds, audits, usage and adoption metrics, traffic fingerprinting, a headless Snowflake proxy implementation, a browser extension implementation or integration with existing extensions like Cupcake, and other additions or independent implementations for a variety of use cases.

If you are reading this and are excited or curious about advancing Snowflake's approach to Internet freedom, please don't hesitate to contact Serene or send a pull request.