The life cycle of a Tor relay - Guard Relays (Entry Relays) - Middle Relays - Exit Relays
In this post, I discuss the lifecycle of a new, fast Tor non-exit relay. Since bandwidth estimation and load balancing across the Tor network have become much more complicated in the last few years, many relay operators have been asking what to expect.
I hope this summary is useful for relay operators. It also provides background information for understanding some of the anonymity analysis features.
There are different types of relays on the Tor network, each with specific functions. Here are the main types of Tor relays:
Guard Relays (Entry Relays):
Function: These relays are the first nodes that a Tor client connects to. They are responsible for masking the user's original IP address.
Characteristics: Guard relays need to be stable and reliable, because clients keep using the same guards over a long period.
Middle Relays:
Function: These relays pass traffic between the guard relay and the exit relay. They add a hop that helps disguise communication inside the Tor network.
Characteristics: They know neither the origin nor the destination of the traffic, which increases security.
Exit Relays:
Function: These relays are the last nodes in the Tor network before traffic leaves the network and reaches the open Internet. They decrypt the last layer of Tor encryption.
Characteristics: Exit relays are often the subject of misunderstanding and legal challenges because they appear to be the source of traffic on the open Internet.
Bridge Relays:
Function: Bridges are special relays that help circumvent censorship. They are not listed in the public Tor relay directory, which makes them more difficult to block.
Characteristics: They are useful for users in countries with Internet censorship and provide a way to access the Tor network when ordinary Tor connections are blocked.
Directory Authority Relays:
Function: These special relays are responsible for managing the Tor network. They collect and publish information about all other relays in the network.
Characteristics: There are only a small number of directory authorities, and they are critical to the integrity and functionality of the Tor network.
Each of these relay types plays an important role in the Tor network, helping to ensure anonymity, privacy, and censorship resistance.
A new relay, assuming it is reliable and has enough bandwidth, goes through four phases: the unmeasured phase (days 0-3), where it is barely used; the remote-measurement phase (days 3-8), where the load starts to increase; the ramp-up phase as a guard relay (days 8-68), where the load counterintuitively drops and then rises again; and the steady-state phase (days 68+).
Phase one: Unmeasured (days 0-3).
When your relay first starts up, it performs a bandwidth self-test: it builds four circuits into the Tor network and back to itself, then sends 125KB over each circuit. This step kick-starts Tor's passive bandwidth measurement system, which estimates the relay's bandwidth from the largest burst the relay has sustained over a 10-second period.
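The passive estimate described above can be sketched roughly as follows. This is a minimal illustration, not Tor's actual code: the function name and the idea of feeding it a list of per-second byte counts are assumptions made for the example.

```python
from collections import deque


def peak_rate_bytes_per_sec(bytes_per_second, window=10):
    """Return the highest average rate (bytes/s) over any `window`-second span.

    `bytes_per_second` is a hypothetical input: one byte count per second
    of relay operation. The real relay keeps its own internal history.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()  # byte counts for the seconds inside the current window
    running = 0       # sum of the counts currently in `recent`
    best = 0          # largest window total seen so far
    for count in bytes_per_second:
        recent.append(count)
        running += count
        if len(recent) > window:
            running -= recent.popleft()
        best = max(best, running)
    return best / window
```

A relay that has mostly been idle but once pushed a sustained burst for ten seconds would report that burst rate, which is why the estimate only grows once real traffic arrives.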
The directory authorities list the new relay in the network consensus, and clients get good performance, and balance load across the network, by selecting relays in proportion to the bandwidth listed in the consensus.
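Selecting relays in proportion to their listed bandwidth amounts to a weighted random choice. The sketch below shows the idea only; the relay names and numbers are invented, and a real client applies further constraints when building a path.

```python
import random


def pick_relay(listed_bandwidth, rng=random):
    """Pick one relay name with probability proportional to its listed bandwidth.

    `listed_bandwidth` is a hypothetical dict mapping relay name -> number.
    """
    names = list(listed_bandwidth)
    weights = [listed_bandwidth[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Illustrative values only: the fast relay should be picked about 100x as often.
example = {"fast-relay": 1000, "slow-relay": 10}
chosen = pick_relay(example)
```

With this scheme, a relay listed at a very low number is picked so rarely that it sees almost no traffic.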
Originally, the directory authorities simply used the bandwidth estimate that your relay claimed in its descriptor. As you can imagine, this approach was not very robust: a relay could attract far more traffic than it deserved just by claiming a large number. For some time now, "bandwidth authority" scripts have been used instead: a group of fast servers on the Internet (called bwauths) take active measurements of each relay, and the directory authorities adjust the consensus bandwidth up or down depending on how the relay performs compared to other relays of similar speed. (Technically, we call the consensus number a "weight" rather than a bandwidth, because what matters is how your relay's number compares to the other numbers, and once we start adjusting it, it is no longer really a bandwidth.)
The bwauth approach isn't foolproof, but it's a lot better than the old design.
So that's phase one: your new relay will be virtually unused for the first few days of its life, because of the low 20KB cap applied to relays that have not been measured yet, while it waits for a threshold of bwauths to measure it.
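The interaction between the cap and the bwauth adjustment can be sketched like this. It is a simplified model under stated assumptions: the function name and the single `measured_ratio` input are hypothetical, and the real bwauth scripts are more involved.

```python
UNMEASURED_CAP_KB = 20  # the cap mentioned in the text for unmeasured relays


def consensus_weight(advertised_kb, measured_ratio=None):
    """Return an illustrative consensus weight for one relay.

    advertised_kb:  the bandwidth estimate the relay reports about itself.
    measured_ratio: hypothetical bwauth result, i.e. the relay's measured
                    speed divided by the average measured speed of relays
                    with a similar advertised bandwidth. None means that
                    not enough bwauths have measured the relay yet.
    """
    if measured_ratio is None:
        # Unmeasured: whatever the relay claims, it is held at the cap.
        return min(advertised_kb, UNMEASURED_CAP_KB)
    # Measured: scale the claim up or down relative to similar relays.
    return advertised_kb * measured_ratio
```

For example, a relay advertising 5000 gets a weight of 20 while unmeasured, and 7500 once measured at 1.5 times the speed of its peers.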
Phase two: Remote measurement (days 3-8).
At the start of this phase, your relay hasn't seen much traffic yet, so your peers are the other relays that haven't seen much traffic yet. Over time, however, some clients will build paths through your relay and generate some traffic, and the passive bandwidth measurement will provide a new, higher estimate. Now the bwauths will compare you to your new (faster) peers and give you a higher consensus weight, which will lead more clients to use your relay, which will raise your bandwidth estimate, and so on.
Tor clients typically build three-hop paths (i.e., paths that pass through three Tor relays). The first relay in the path, called the guard relay, is special because it helps protect against a particular anonymity-destroying attack. The attack works like this: if you keep choosing new paths at random and your adversary runs several Tor relays, then over time the probability that *every* single path you've created is safe from the adversary drops to zero. The defence is to choose a small number of relays (called guards) and always use one of them for the first hop: either you choose wrong and one of your guards is run by the adversary, or you choose right and all your paths are safe.
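The arithmetic behind this argument can be sketched in a few lines. The model is deliberately crude: it assumes each first hop is chosen independently and that the adversary controls a fixed fraction of the first-hop selection probability. The figures 5%, 1000 paths and 3 guards are invented for illustration.

```python
def p_all_first_hops_safe(adversary_fraction, first_hops_chosen):
    """Chance that none of the independently chosen first hops is adversarial."""
    if not 0.0 <= adversary_fraction <= 1.0:
        raise ValueError("adversary_fraction must be between 0 and 1")
    return (1.0 - adversary_fraction) ** first_hops_chosen


# Without guards: a fresh first hop for each of 1000 paths.
without_guards = p_all_first_hops_safe(0.05, 1000)

# With guards: the same small set (3 here) is reused for every path,
# so only 3 choices are ever exposed to the adversary.
with_guards = p_all_first_hops_safe(0.05, 3)
```

Under these assumptions the chance of staying safe without guards is vanishingly small, while with guards it stays fixed no matter how many paths are built.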
Only stable, reliable relays that have been running for a long time can be used as guard relays, so no client will use your brand-new Tor relay as a first hop in its current state. And because your Tor relay is a non-exit relay (i.e., it will not be the system that actually connects to external services like websites), no client will use it as a third hop either. This means that all of your relay's traffic will come from serving as the second (middle) hop.
So this is phase two: once the bwauths have measured the relay and the directory authorities have lifted the 20KB cap, the relay will start to attract more and more traffic, but it will still be limited because it is used only as a middle hop.
Phase three: Ramping up as a guard relay (days 8-68).
At this point, the guard flag comes into play. The directory authorities assign the guard flag to relays based on characteristics such as "bandwidth" (the relay must have a sufficiently large consensus weight) and "weighted uptime" (the relay must have been online, and working reliably, for long enough). The time requirement is the most relevant one here: in today's Tor network, a new relay first becomes eligible for the guard flag on day eight.
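As a rough sketch, the eligibility check can be thought of as a conjunction of thresholds. Only the eight-day figure comes from the text; the `RelayStatus` type, its field names, and the `min_weight` and `min_uptime` thresholds are hypothetical and are passed in by the caller rather than taken from any real configuration.

```python
from dataclasses import dataclass

MIN_DAYS_KNOWN = 8  # from the text: first eligible on day eight


@dataclass
class RelayStatus:
    consensus_weight: float  # weight assigned in the consensus
    weighted_uptime: float   # fraction of time the relay was working, 0.0-1.0
    days_known: float        # days since the relay first appeared


def eligible_for_guard(relay, min_weight, min_uptime):
    """Return True if the relay meets all three illustrative requirements."""
    return (
        relay.days_known >= MIN_DAYS_KNOWN
        and relay.consensus_weight >= min_weight
        and relay.weighted_uptime >= min_uptime
    )
```

A fast, reliable relay on day seven fails this check only because of its age, which is the situation a new operator should expect.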
Clients are only willing to select your relay for their first hop if it has the guard flag. But here's the catch: once you get the guard flag, all other clients will largely stop using you for their middle hops, because when they see the guard flag, they assume that you already carry a lot of load from clients using you as their first hop. This assumption will be true in steady state (i.e., once enough clients have selected your relay as a guard), but in the short term you will see a drop in traffic right after the relay receives the guard flag.
Why do clients avoid using relays with the guard flag for their middle hop? Clients take into account the scarcity of guard capacity and the scarcity of exit capacity, and proportionally avoid using those relays for positions in the path that are not scarce. This is the best way to allocate the available resources: relays with the exit flag are mainly used for exiting when exit capacity is scarce, and relays with the guard flag are mainly used for entering when guard capacity is scarce.
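One way to picture this down-weighting is as a multiplier applied to a relay's weight when a client considers it for the middle position. The function and the `guard_share_for_middle` parameter below are hypothetical and exist only to illustrate the effect; the values clients actually use depend on the current state of the network.

```python
def middle_hop_weight(consensus_weight, has_guard_flag, guard_share_for_middle):
    """Return the weight a client would use when picking a middle hop.

    guard_share_for_middle: hypothetical fraction (0.0-1.0) of a guard-flagged
    relay's weight that is still offered for the middle position. It is small
    when guard capacity is scarce and larger when guard capacity is plentiful.
    """
    if not 0.0 <= guard_share_for_middle <= 1.0:
        raise ValueError("guard_share_for_middle must be between 0 and 1")
    if has_guard_flag:
        return consensus_weight * guard_share_for_middle
    return consensus_weight
```

With a share of 0.25, a relay of weight 1000 counts as only 250 for middle-hop selection the moment it gains the guard flag, which matches the dip in traffic described above.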
It's not optimal to have this temporary dip in traffic (since we don't get the full benefit of the resources you're trying to contribute), but it's a short period overall: clients rotate their guard relays every 4-8 weeks, so some of them will switch to your relay pretty soon.
To be clear, there are two reasons why we have clients rotate their guard relays, and these reasons are two sides of the same coin: first, without rotation new guards would hardly be used (since only new clients selecting their guards for the first time would use a new guard), and second, old, established guards would accumulate an ever-increasing load, since they would carry traffic from every client that has ever selected them as a guard.
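The ramp-up that rotation produces can be approximated with a very simple model. The assumptions are strong and are mine, not the network's: every client keeps a guard for exactly `rotation_period_days`, and the clients' rotation dates are spread evenly over time. Under those assumptions, the share of clients that have had a chance to pick the new guard grows linearly until one full period has passed.

```python
def fraction_of_steady_state_load(days_since_guard_flag, rotation_period_days):
    """Return the fraction (0.0-1.0) of its eventual guard load a new guard carries.

    Linear model: after d days, d / rotation_period_days of all clients have
    rotated at least once and could have selected the new guard.
    """
    if rotation_period_days <= 0:
        raise ValueError("rotation_period_days must be positive")
    elapsed = max(0.0, days_since_guard_flag)
    return min(1.0, elapsed / rotation_period_days)


# Illustrative: 60 days matches the day 8 to day 68 span used in this post.
halfway = fraction_of_steady_state_load(30, 60)
```

In this model a relay that received the guard flag 30 days ago carries about half of its eventual guard load.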
One of the reasons for this blog post is to give you some background so that when I explain later why we need to extend the guard rotation period to several months, you'll understand why we can't just raise that number without changing some other parts of the system to keep up. Stay tuned for more details. If you can't wait, you can also read the original research questions and the follow-up research by Elahi et al. and Johnson et al.
Phase four: Steady-state guard relay (days 68+).
Once your relay has been a guard for the full guard rotation period (up to 12 weeks), it should reach a steady state. The number of clients using the relay as an entry point and as a middle hop has levelled out, and the relay now serves in both roles.