How Cloudflare’s systems dynamically route traffic across the globe

Post Syndicated from David Tuber original http://blog.cloudflare.com/meet-traffic-manager/

Picture this: you’re at an airport, and you’re going through an airport security checkpoint. There are a bunch of agents who are scanning your boarding pass and your passport and sending you through to your gate. All of a sudden, some of the agents go on break. Maybe there’s a leak in the ceiling above the checkpoint. Or perhaps a bunch of flights are leaving at 6pm, and a number of passengers turn up at once. Either way, this imbalance between localized supply and demand can cause huge lines and unhappy travelers — who just want to get through the line to get on their flight. How do airports handle this?

Some airports may not do anything and just let you suffer in a longer line. Some airports may offer fast-lanes through the checkpoints for a fee. But most airports will tell you to go to another security checkpoint a little farther away to ensure that you can get through to your gate as fast as possible. They may even have signs up telling you how long each line is, so you can make an easier decision when trying to get through.

At Cloudflare, we have the same problem. We are located in 300 cities around the world that are built to receive end-user traffic for all of our product suites. And in an ideal world, we always have enough computers and bandwidth to handle everyone at their closest possible location. But the world is not always ideal; sometimes we take a data center offline for maintenance, or a connection to a data center goes down, or some equipment fails, and so on. When that happens, we may not have enough attendants to serve every person going through security in every location. It’s not because we haven’t built enough kiosks, but because something has happened in our data center that prevents us from serving everyone.

So, we built Traffic Manager: a tool that balances supply and demand across our entire global network. This blog is about Traffic Manager: how it came to be, how we built it, and what it does now.

The world before Traffic Manager

The job now done by Traffic Manager used to be a manual process carried out by network engineers: our network would operate as normal until something happened that caused user traffic to be impacted at a particular data center.

When such events happened, user requests would start to fail with 499 or 500 errors because there weren’t enough machines to handle the request load of our users. This would trigger a page to our network engineers, who would then remove some Anycast routes for that data center. The end result: by no longer advertising those prefixes in the impacted data center, user traffic would divert to a different data center. This is how Anycast fundamentally works: user traffic is drawn to the closest data center advertising the prefix the user is trying to connect to, as determined by Border Gateway Protocol. For a primer on what Anycast is, check out this reference article.

Depending on how bad the problem was, engineers would remove some or even all the routes in a data center. When the data center was again able to absorb all the traffic, the engineers would put the routes back and the traffic would return naturally to the data center.

As you might guess, this was a challenging task for our network engineers to do every single time any piece of hardware on our network had an issue. It didn’t scale.

Never send a human to do a machine’s job

But doing it manually wasn’t just a burden on our Network Operations team. It also resulted in a sub-par experience for our customers; our engineers would need to take time to diagnose and re-route traffic. To solve both these problems, we wanted to build a service that would immediately and automatically detect if users were unable to reach a Cloudflare data center, and withdraw routes from the data center until users were no longer seeing issues. Once the service received notifications that the impacted data center could absorb the traffic, it could put the routes back and reconnect that data center. This service is called Traffic Manager, because its job (as you might guess) is to manage traffic coming into the Cloudflare network.

Accounting for second order consequences

When a network engineer removes a route from a router, they can make the best guess at where the user requests will move to, and try to ensure that the failover data center has enough resources to handle the requests — if it doesn’t, they can adjust the routes there accordingly prior to removing the route in the initial data center. To be able to automate this process, we needed to move from a world of intuition to a world of data — accurately predicting where traffic would go when a route was removed, and feeding this information to Traffic Manager, so it could ensure it doesn’t make the situation worse.

Meet Traffic Predictor

Although we can adjust which data centers advertise a route, we are unable to influence what proportion of traffic each data center receives. Each time we add a new data center or a new peering session, the distribution of traffic changes, and as we are in over 300 cities with over 12,500 peering sessions, it has become quite difficult for a human to keep track of, or predict, the way traffic will move around our network. Traffic Manager needed a buddy: Traffic Predictor.

In order to do its job, Traffic Predictor carries out an ongoing series of real-world tests to see where traffic actually moves. Traffic Predictor relies on a testing system that simulates removing a data center from service and measures where traffic would go if that data center weren’t serving traffic. To help understand how this system works, let’s simulate the removal of a subset of a data center in Christchurch, New Zealand:

  • First, Traffic Predictor gets a list of all the IP addresses that normally connect to Christchurch. Traffic Predictor will send a ping request to hundreds of thousands of IPs that have recently made a request there.
  • Traffic Predictor records if the IP responds, and whether the response returns to Christchurch using a special Anycast IP range specifically configured for Traffic Predictor.
  • Once Traffic Predictor has a list of IPs that respond to Christchurch, it withdraws the route containing that special range from Christchurch, waits a few minutes for the Internet routing table to be updated, and runs the test again.
  • Instead of being routed to Christchurch, the responses now go to the data centers around Christchurch. Traffic Predictor then records which data centers receive those responses, and stores the results as the failover for Christchurch.

This allows us to simulate Christchurch going offline without actually taking Christchurch offline!
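
To make the procedure concrete, here is a minimal sketch of one such test round in Python. The helper functions are stubs standing in for real pings and BGP route changes, and none of the names come from the actual Traffic Predictor code; the point is only to show the shape of the loop: baseline ping, withdraw the test prefix, ping again, record where the replies land.

```python
import time
from collections import Counter

def get_recent_client_ips(dc: str) -> list[str]:
    # Stub: a sample of IPs that recently made requests to this data center.
    return ["192.0.2.1", "192.0.2.2", "198.51.100.7"]

def ping_from_anycast_range(ip: str, prefix: str) -> str:
    # Stub: ping the IP from the special Anycast test range and return the
    # data center that receives the reply.
    return "CHC"

def set_route(dc: str, prefix: str, advertised: bool) -> None:
    # Stub: advertise or withdraw the test prefix from one data center.
    pass

def simulate_failover(dc: str, prefix: str) -> Counter:
    # 1. IPs that normally connect to this data center.
    clients = get_recent_client_ips(dc)
    # 2. Baseline: keep only the IPs whose replies come back to this data center.
    baseline = [ip for ip in clients if ping_from_anycast_range(ip, prefix) == dc]
    # 3. Withdraw the special test prefix here and wait a few minutes for the
    #    Internet routing table to converge.
    set_route(dc, prefix, advertised=False)
    time.sleep(300)
    # 4. Ping again: the data centers that now receive the replies form the
    #    failover distribution for this data center.
    failover = Counter(ping_from_anycast_range(ip, prefix) for ip in baseline)
    set_route(dc, prefix, advertised=True)
    return failover
```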

But Traffic Predictor doesn’t just do this for any one data center. To add additional layers of resiliency, it also calculates a second layer of indirection: for each data center failure scenario, Traffic Predictor calculates failure scenarios and creates policies for when the surrounding data centers fail too.

Using our example from before, when Traffic Predictor tests Christchurch, it will run a series of tests that remove several surrounding data centers from service, including Christchurch, to calculate different failure scenarios. This ensures that even if something catastrophic happens which impacts multiple data centers in a region, we still have the ability to serve user traffic. If you think this data model is complicated, you’re right: it takes several days to calculate all of these failure paths and policies.

Here’s what those failure paths and failover scenarios look like for all of our data centers around the world when they’re visualized:

This can be a bit complicated for humans to parse, so let’s dig into that above scenario for Christchurch, New Zealand to make this a bit more clear. When we take a look at failover paths specifically for Christchurch, we see they look like this:

In this scenario we predict that 99.8% of Christchurch’s traffic would shift to Auckland, which is able to absorb all Christchurch traffic in the event of a catastrophic outage.

Traffic Predictor not only allows us to see where traffic will move if something happens, it also allows us to preconfigure Traffic Manager policies to move requests out of failover data centers to prevent a thundering herd scenario, where a sudden influx of requests can cause failures in a second data center if the first one has issues. With Traffic Predictor, Traffic Manager doesn’t just move traffic out of one data center when that one fails, but it also proactively moves traffic out of other data centers to ensure a seamless continuation of service.

From a sledgehammer to a scalpel

With Traffic Predictor, Traffic Manager can dynamically advertise and withdraw prefixes while ensuring that every data center can handle all the traffic. But withdrawing prefixes as a means of traffic management can be a bit heavy-handed at times. One of the reasons for this is that the only way we had to add or remove traffic to a data center was through advertising routes from our Internet-facing routers. Each one of our routes covers thousands of IP addresses, so withdrawing even one still moves a large portion of traffic.

Specifically, Internet applications will advertise prefixes to the Internet that are a /24 subnet at an absolute minimum, but many will advertise prefixes larger than that. This is generally done to prevent things like route leaks or route hijacks: many providers will actually filter out routes that are more specific than a /24 (for more information on that, check out this blog here). If we assume that Cloudflare maps protected properties to IP addresses at a 1:1 ratio, then each /24 subnet would be able to service 256 customers, which is the number of IP addresses in a /24 subnet. If every IP address sent one request per second, we’d have to move 4 /24 subnets out of a data center if we needed to move 1,000 requests per second (RPS).

But in reality, Cloudflare maps a single IP address to hundreds of thousands of protected properties. So for Cloudflare, a /24 might take 3,000 requests per second, but if we needed to move 1,000 RPS out, we would have no choice but to move a single /24 out. And that’s just assuming we advertise at a /24 level. If we used /20s to advertise, the amount we can withdraw gets less granular: at a 1:1 website to IP address mapping, that’s 4,096 requests per second for each prefix, and even more if the website to IP address mapping is many to one.
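
As a rough back-of-the-envelope check on that granularity problem, using the same assumptions as above (one request per second per IP at a 1:1 property-to-IP mapping):

```python
# Back-of-the-envelope granularity of prefix withdrawals, using the assumed
# 1 request/second per IP at a 1:1 property-to-IP mapping from above.
def ips_in_prefix(prefix_len: int) -> int:
    return 2 ** (32 - prefix_len)

rps_per_ip = 1
for prefix_len in (24, 20):
    rps_per_prefix = ips_in_prefix(prefix_len) * rps_per_ip
    print(f"/{prefix_len}: {ips_in_prefix(prefix_len)} IPs, "
          f"{rps_per_prefix} RPS per withdrawn prefix")

# /24: 256 IPs, 256 RPS per withdrawn prefix
# /20: 4096 IPs, 4096 RPS per withdrawn prefix
# To shed 1,000 RPS you must withdraw roughly four /24s or a single /20; and
# in practice, where a single Cloudflare /24 may carry ~3,000 RPS, even one
# /24 withdraws far more traffic than intended.
```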

While withdrawing prefix advertisements improved the customer experience for those users who would have seen a 499 or 500 error, a significant portion of users who wouldn’t have been impacted by the issue were still moved away from the data center they should have gone to, probably slowing them down, even if only a little bit. This concept of moving more traffic out than is necessary is called “stranding capacity”: the data center is theoretically able to service more users in a region but cannot because of how Traffic Manager was built.

We wanted to improve Traffic Manager so that it only moved the absolute minimum of users out of a data center that was seeing a problem and not strand any more capacity. To do so, we needed to shift percentages of prefixes, so we could be extra fine-grained and only move the things that absolutely need to be moved. To solve this, we built an extension of our Layer 4 load balancer Unimog, which we call Plurimog.

A quick refresher on Unimog and layer 4 load balancing: every single one of our machines contains a service that determines whether that machine can take a user request. If the machine can take a user request, then it sends the request to our HTTP stack, which processes the request before returning the response to the user. If the machine can’t take the request, the machine sends the request to another machine in the data center that can. The machines can do this because they are constantly talking to each other to understand whether they can serve requests for users.

Plurimog does the same thing, but instead of talking between machines, Plurimog talks between data centers and points of presence. If a request goes into Philadelphia and Philadelphia is unable to take the request, Plurimog will forward the request to another data center that can take it, like Ashburn, where the request is decrypted and processed. Because Plurimog operates at layer 4, it can send individual TCP or UDP requests to other places, which allows it to be very fine-grained: it can send percentages of traffic to other data centers very easily, meaning that we only need to send away enough traffic to ensure that everyone can be served as fast as possible. Check out how that works in our Frankfurt data center, where we were able to shift progressively more and more traffic away to handle issues. This chart shows the number of actions taken on free traffic that cause it to be sent out of Frankfurt over time:
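
Conceptually, the layer 4 forwarding decision Plurimog makes looks something like the sketch below. The names and the forwarding table are made up for illustration; this is not the real Unimog/Plurimog code, just the idea of handing a configurable fraction of new connections to another data center before any local processing happens.

```python
import random

# Illustrative forwarding table: what fraction of new connections a data
# center should hand off, and where to send them.
FORWARD_TABLE = {
    "PHL": {"fraction": 0.30, "target": "IAD"},
}

def handle_new_connection(local_dc: str) -> str:
    rule = FORWARD_TABLE.get(local_dc)
    if rule and random.random() < rule["fraction"]:
        # Layer 4 forward: the raw TCP/UDP flow is relayed to the target data
        # center, which decrypts and processes the request there.
        return f"forwarded to {rule['target']}"
    # Otherwise hand the connection to the local HTTP stack.
    return f"served locally in {local_dc}"

print(handle_new_connection("PHL"))
```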

But even within a data center, we can route traffic around to prevent it from leaving the data center at all. Our large data centers, called Multi-Colo Points of Presence (MCPs), contain logical sections of compute that are distinct from one another. These MCP data centers are enabled with another version of Unimog called Duomog, which allows traffic to be shifted between logical sections of compute automatically. This makes MCP data centers fault-tolerant without sacrificing performance for our customers, and allows Traffic Manager to work within a data center as well as between data centers.

When evaluating portions of requests to move, Traffic Manager does the following:

  • Traffic Manager identifies the proportion of requests that need to be removed from a data center or subsection of a data center so that all requests can be served.
  • Traffic Manager then calculates the aggregated space metrics for each target to see how many requests each failover data center can take.
  • Traffic Manager then identifies how much traffic in each plan we need to move, and moves either a proportion of the plan, or all of the plan through Plurimog/Duomog, until we've moved enough traffic. We move Free customers first, and if there are no more Free customers in a data center, we'll move Pro, and then Business customers if needed.

For example, let’s look at Ashburn, Virginia: one of our MCPs. Ashburn has nine different subsections of capacity that can each take traffic. On August 28, one of those subsections, IAD02, had an issue that reduced the amount of traffic it could handle.

During this time period, Duomog sent more traffic from IAD02 to other subsections within Ashburn, ensuring that Ashburn was always online, and that performance was not impacted during this issue. Then, once IAD02 was able to take traffic again, Duomog shifted traffic back automatically. You can see these actions visualized in the time series graph below, which tracks the percentage of traffic moved over time between subsections of capacity within IAD02 (shown in green):

How does Traffic Manager know how much to move?

Although we used requests per second in the example above, using requests per second as a metric isn’t accurate enough when actually moving traffic. The reason for this is that different customers have different resource costs to our service; a website served mainly from cache with the WAF deactivated is much cheaper CPU-wise than a site with all WAF rules enabled and caching disabled. So we record the CPU time that each request takes. We can then aggregate the CPU time across each plan to find the CPU time usage per plan. We record the CPU time in milliseconds and take a per-second value, resulting in a unit of milliseconds per second (ms/s).
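
A tiny sketch of that aggregation, with made-up request samples (the plan names match the ones used later in this post; everything else is illustrative):

```python
from collections import defaultdict

def cpu_time_per_plan(requests, window_seconds: float) -> dict[str, float]:
    # requests: iterable of (plan, cpu_time_ms) pairs observed in the window.
    totals = defaultdict(float)
    for plan, cpu_ms in requests:
        totals[plan] += cpu_ms
    # Divide by the window length to get a rate in milliseconds per second.
    return {plan: total / window_seconds for plan, total in totals.items()}

sample = [("Free", 1.2), ("Free", 0.8), ("Pro", 6.5), ("Business", 3.0)]
print(cpu_time_per_plan(sample, window_seconds=1.0))
# {'Free': 2.0, 'Pro': 6.5, 'Business': 3.0}  -> ms/s per plan
```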

CPU time is an important metric because of the impact it can have on latency and customer performance. As an example, consider the time it takes for an eyeball request to make it entirely through the Cloudflare front line servers: we call this the cfcheck latency. If this number goes too high, then our customers will start to notice, and they will have a bad experience. When cfcheck latency gets high, it’s usually because CPU utilization is high. The graph below shows 95th percentile cfcheck latency plotted against CPU utilization across all the machines in the same data center, and you can see the strong correlation:

So having Traffic Manager look at CPU time in a data center is a very good way to ensure that we’re giving customers the best experience and not causing problems.

After getting the CPU time per plan, we need to figure out how much of that CPU time to move to other data centers. To do this, we aggregate the CPU utilization across all servers to give a single CPU utilization across the data center. If a proportion of servers in the data center fail, due to network device failure, power failure, etc., then the requests that were hitting those servers are automatically routed elsewhere within the data center by Duomog. As the number of servers decreases, the overall CPU utilization of the data center increases. Traffic Manager has three thresholds for each data center, the maximum threshold, the target threshold, and the acceptable threshold:

  • Maximum: the CPU level at which performance starts to degrade, where Traffic Manager will take action
  • Target: the level to which Traffic Manager will try to reduce the CPU utilization to restore optimal service to users
  • Acceptable: the level below which a data center can receive requests forwarded from another data center, or revert active moves

When a data center goes above the maximum threshold, Traffic Manager takes the ratio of total CPU time across all plans to current CPU utilization, then applies that to the target CPU utilization to find the target CPU time. Doing it this way means we can compare a data center with 100 servers to a data center with 10 servers, without having to worry about the number of servers in each data center. This assumes that load increases linearly, which is close enough to true for the assumption to be valid for our purposes.

Target ratio equals current ratio:

target CPU time / target CPU utilization = current CPU time / current CPU utilization

Therefore:

target CPU time = (target CPU utilization / current CPU utilization) × current CPU time

Subtracting the target CPU time from the current CPU time gives us the CPU time to move:

CPU time to move = current CPU time - target CPU time

For example, if the current CPU utilization was at 90% across the data center, the target was 85%, and the CPU time across all plans was 18,000, we would have:

target CPU time = (85% / 90%) × 18,000 = 17,000

This would mean Traffic Manager would need to move 1,000 CPU time:

CPU time to move = 18,000 - 17,000 = 1,000

Now that we know the total CPU time we need to move, we can go through the plans, moving traffic plan by plan until the required amount has been met.
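
Written as code, the calculation above is just a couple of lines (a direct transcription of the formula with the example numbers, not the production implementation):

```python
def cpu_time_to_move(current_utilization: float,
                     target_utilization: float,
                     current_cpu_time: float) -> float:
    # target CPU time = (target utilization / current utilization) x current CPU time
    target_cpu_time = (target_utilization / current_utilization) * current_cpu_time
    # The difference is what Traffic Manager has to move elsewhere.
    return current_cpu_time - target_cpu_time

print(cpu_time_to_move(0.90, 0.85, 18_000))  # roughly 1000.0 ms/s of CPU time to move
```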

What is the maximum threshold?

A frequent problem that we faced was determining at which point Traffic Manager should start taking action in a data center – what metric should it watch, and what is an acceptable level?

As said before, different services have different requirements in terms of CPU utilization, and there are many cases of data centers that have very different utilization patterns.

To solve this problem, we turned to machine learning. We created a service that will automatically adjust the maximum thresholds for each data center according to customer-facing indicators. For our main service-level indicator (SLI), we decided to use the cfcheck latency metric we described earlier.

But we also need to define a service-level objective (SLO) in order for our machine learning application to be able to adjust the threshold. We set the SLO to 20 ms. Comparing our SLI to our SLO, our 95th percentile cfcheck latency should never go above 20 ms, and if it does, we need to do something. The graph below shows 95th percentile cfcheck latency over time, and customers start to get unhappy when cfcheck latency goes into the red zone:

If customers have a bad experience when CPU gets too high, then the goal of Traffic Manager’s maximum threshold is to ensure that customer performance isn’t impacted, and to start redirecting traffic away before performance starts to degrade. At a scheduled interval, the Traffic Manager service fetches a number of metrics for each data center and applies a series of machine learning algorithms. After cleaning the data for outliers, we apply a simple quadratic curve fit, and we are currently testing a linear regression algorithm.

After fitting the models, we can use them to predict the CPU usage at which the SLI is equal to our SLO, and then use that value as our maximum threshold. If we plot the CPU values against the SLI, we can clearly see why these methods work so well for our data centers, as you can see for Barcelona in the graphs below, which show the curve fit and the linear regression fit respectively.

In these charts the vertical line is the SLO, and the intersection of this line with the fitted model represents the value that will be used as the maximum threshold. This model has proved to be very accurate, and we are able to significantly reduce the SLO breaches. Let’s take a look at when we started deploying this service in Lisbon:

Before the change, cfcheck latency was constantly spiking, but Traffic Manager wasn’t taking action because the maximum threshold was static. After July 29, we see that cfcheck latency has never hit the SLO because we are constantly measuring to make sure that customers are never impacted by CPU increases.
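
To make the fitting step concrete, here is a minimal sketch of the approach with numpy. The quadratic fit and the 20 ms SLO come from the description above; the sample data and the root selection are illustrative only.

```python
import numpy as np

# Made-up samples: CPU utilization (%) and p95 cfcheck latency (ms).
cpu = np.array([40, 50, 60, 70, 80, 90])
sli = np.array([6, 7, 9, 12, 17, 26])
SLO_MS = 20

# Quadratic fit of latency as a function of CPU utilization.
a, b, c = np.polyfit(cpu, sli, deg=2)

# Find where the fitted curve crosses the SLO: solve a*x^2 + b*x + (c - SLO) = 0
# and keep the largest real root in a sensible CPU range.
roots = np.roots([a, b, c - SLO_MS])
candidates = [r.real for r in roots if abs(r.imag) < 1e-9 and 0 < r.real <= 100]
max_threshold = max(candidates)
print(f"maximum threshold ~ {max_threshold:.1f}% CPU")  # roughly 83% for this made-up data
```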

Where to send the traffic?

So now that we have a maximum threshold, we need to find the third CPU utilization threshold which isn’t used when calculating how much traffic to move – the acceptable threshold. When a data center is below this threshold, it has unused capacity which, as long as it isn’t forwarding traffic itself, is made available for other data centers to use when required. To work out how much each data center is able to receive, we use the same methodology as above, substituting target for acceptable:

acceptable CPU time / acceptable CPU utilization = current CPU time / current CPU utilization

Therefore:

acceptable CPU time = (acceptable CPU utilization / current CPU utilization) × current CPU time

Subtracting the current CPU time from the acceptable CPU time gives us the amount of CPU time that a data center could accept:

available CPU time = acceptable CPU time - current CPU time

To find where to send traffic, Traffic Manager finds the available CPU time in all data centers, then orders them by latency from the data center that needs to move traffic. It moves through each of the data centers, using all available capacity based on the maximum thresholds, before moving on to the next. When finding which plans to move, we move from the lowest priority plan to the highest, but when finding where to send them, we move in the opposite direction, so that the highest priority plans land in the lowest latency data centers.

To make this clearer let's use an example:

We need to move 1,000 CPU time from data center A, and we have the following usage per plan: Free: 500ms/s, Pro: 400ms/s, Business: 200ms/s, Enterprise: 1000ms/s.

We would move 100% of Free (500ms/s), 100% of Pro (400ms/s), 50% of Business (100ms/s), and 0% of Enterprise.

Nearby data centers have the following available CPU time: B: 300ms/s, C: 300ms/s, D: 1,000ms/s.

With latencies: A-B: 100ms, A-C: 110ms, A-D: 120ms.

Starting with the lowest latency data center and the highest priority plan that requires action, we would be able to move all the Business CPU time to data center B and half of Pro. Next we would move on to data center C, and be able to move the rest of Pro and 20% of Free. The rest of Free could then be forwarded to data center D. This results in the following actions: Business: 50% → B; Pro: 50% → B, 50% → C; Free: 20% → C, 80% → D.
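
The same example, worked through as a simplified greedy allocation (the plan priorities and numbers match the example above; everything else, including the function names, is illustrative rather than the production Traffic Manager code):

```python
# Units are ms/s of CPU time. Plans are listed lowest priority first.
PLAN_PRIORITY = ["Free", "Pro", "Business", "Enterprise"]

def plan_amounts_to_move(need, usage):
    """Decide how much of each plan to move, lowest priority plans first."""
    moves = {}
    for plan in PLAN_PRIORITY:
        take = min(need, usage.get(plan, 0))
        if take > 0:
            moves[plan] = take
            need -= take
    return moves

def assign_destinations(moves, destinations):
    """Fill the lowest latency destinations first, placing the highest
    priority moved plans closest."""
    actions = []
    remaining = dict(moves)
    for dc, capacity in destinations:                 # already sorted by latency
        for plan in reversed(PLAN_PRIORITY):          # highest priority first
            take = min(capacity, remaining.get(plan, 0))
            if take > 0:
                actions.append((plan, take, dc))
                remaining[plan] -= take
                capacity -= take
    return actions

usage = {"Free": 500, "Pro": 400, "Business": 200, "Enterprise": 1000}
destinations = [("B", 300), ("C", 300), ("D", 1000)]  # ordered by latency from A
print(assign_destinations(plan_amounts_to_move(1000, usage), destinations))
# [('Business', 100, 'B'), ('Pro', 200, 'B'), ('Pro', 200, 'C'),
#  ('Free', 100, 'C'), ('Free', 400, 'D')]
```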

Reverting actions

In the same way that Traffic Manager is constantly looking to keep data centers from going above the threshold, it is also looking to bring any forwarded traffic back into a data center that is actively forwarding traffic.

Above we saw how Traffic Manager works out how much traffic a data center is able to receive from another data center — it calls this the available CPU time. When there is an active move we use this available CPU time to bring back traffic to the data center — we always prioritize reverting an active move over accepting traffic from another data center.

When you put this all together, you get a system that is constantly measuring system and customer health metrics for every data center and spreading traffic around to make sure that each request can be served given the current state of our network. When we put all of these moves between data centers on a map, it looks something like this, a map of all Traffic Manager moves for a period of one hour. This map doesn’t show our full data center deployment, but it does show the data centers that are sending or receiving moved traffic during this period:

Data centers in red or yellow are under load and shifting traffic to other data centers until they become green, which means that all metrics are showing as healthy. The size of the circles represents how many requests are shifted from or to those data centers. The direction of the lines shows where the traffic is going. This is difficult to see at a world scale, so let’s zoom into the United States to see this in action for the same time period:

Here you can see Toronto, Detroit, New York, and Kansas City are unable to serve some requests due to hardware issues, so they will send those requests to Dallas, Chicago, and Ashburn until equilibrium is restored for users and data centers. Once data centers like Detroit are able to service all the requests they are receiving without needing to send traffic away, Detroit will gradually stop forwarding requests to Chicago until any issues in the data center are completely resolved, at which point it will no longer be forwarding anything. Throughout all of this, end users are online and are not impacted by any physical issues that may be happening in Detroit or any of the other locations sending traffic.

Happy network, happy products

Because Traffic Manager is plugged into the user experience, it is a fundamental component of the Cloudflare network: it keeps our products online and ensures that they’re as fast and reliable as they can be. It’s our real time load balancer, helping to keep our products fast by only shifting necessary traffic away from data centers that are having issues. Because less traffic gets moved, our products and services stay fast.

But Traffic Manager can also help keep our products online and reliable because it allows our products to predict where reliability issues may occur and preemptively move workloads elsewhere. For example, Browser Isolation works directly with Traffic Manager to help ensure the uptime of the product. When you connect to a Cloudflare data center to create a hosted browser instance, Browser Isolation first asks Traffic Manager if the data center has enough capacity to run the instance locally, and if so, the instance is created right then and there. If there isn’t sufficient capacity available, Traffic Manager tells Browser Isolation which is the closest data center with sufficient available capacity, thereby helping Browser Isolation to provide the best possible experience for the user.

Happy network, happy users

At Cloudflare, we operate this huge network to service all of our different products and customer scenarios. We’ve built this network for resiliency: in addition to our MCP locations designed to reduce impact from a single failure, we are constantly shifting traffic around on our network in response to internal and external issues.

But that is our problem — not yours.

In the past, when human beings had to fix those issues, it was customers and end users who would be impacted. To ensure that you’re always online, we’ve built a smart system that detects our hardware failures and preemptively balances traffic across our network to ensure it’s online and as fast as possible. This system works faster than any person, not only allowing our network engineers to sleep at night, but also providing a better, faster experience for all of our customers.

And finally: if these kinds of engineering challenges sound exciting to you, then please consider checking out the Traffic Engineering team's open position on Cloudflare’s Careers page!

Cloudflare’s global network grows to 300 cities and ever closer to end users with connections to 12,000 networks

Post Syndicated from Damian Matacz original http://blog.cloudflare.com/cloudflare-connected-in-over-300-cities/

We make no secret about how passionate we are about building a world-class global network to deliver the best possible experience for our customers. This means an unwavering and continual dedication to always improving the breadth (number of cities) and depth (number of interconnects) of our network.

This is why we are pleased to announce that Cloudflare is now connected to over 12,000 Internet networks in over 300 cities around the world!

The Cloudflare global network runs every service in every data center so your users have a consistent experience everywhere—whether you are in Reykjavík, Guam or in the vicinity of any of the 300 cities where Cloudflare lives. This means all customer traffic is processed at the data center closest to its source, with no backhauling or performance tradeoffs.

Having Cloudflare’s network present in hundreds of cities globally is critical to providing new and more convenient ways to serve our customers and their customers. However, the breadth of our infrastructure serves other critical purposes as well. Let’s take a closer look at the reasons we build, and the real-world impact we’ve seen on customer experience:

Reduce latency

Our network allows us to sit approximately 50 ms from 95% of the Internet-connected population globally. Nevertheless, we are constantly reviewing network performance metrics and working with local regional Internet service providers to ensure we focus on growing underserved markets where we can add value and improve performance. So far, in 2023 we’ve already added 12 new cities to bring our network to over 300 cities spanning 122 unique countries!

  • Albuquerque, New Mexico, US
  • Austin, Texas, US
  • Bangor, Maine, US
  • Campos dos Goytacazes, Brazil
  • Fukuoka, Japan
  • Kingston, Jamaica
  • Kinshasa, Democratic Republic of the Congo
  • Lyon, France
  • Oran, Algeria
  • São José dos Campos, Brazil
  • Stuttgart, Germany
  • Vitoria, Brazil

In May, we activated a new data center in Campos dos Goytacazes, Brazil, where we interconnected with a regional network provider serving 100+ local ISPs. While it’s not too far from Rio de Janeiro (270 km), it still cut in half our 50th and 75th percentile latency, measured from the TCP handshake between Cloudflare’s servers and the user’s device, and provided a noticeable performance improvement!

Improve interconnections

A larger number of local interconnections facilitates direct connections between network providers, content delivery networks, and regional Internet Service Providers. These interconnections enable faster and more efficient data exchange, content delivery, and collaboration between networks.

Currently there are approximately 74,000¹ AS numbers routed globally. An Autonomous System (AS) number is a unique number allocated per ISP, enterprise, cloud, or similar network that maintains Internet routing capabilities using BGP. Of these approximately 74,000 ASNs, 43,000² are stub ASNs, meaning they are only connected to one other network. These are often enterprise or internal-use ASNs that only connect to their own ISP or internal network, but not with other networks.

It’s mind-blowing to consider that Cloudflare is directly connected to 12,372 unique Internet networks, or approximately one third of the networks it’s possible to connect to globally! This direct connectivity builds resilience and enables performance, making sure there are multiple places to connect between networks, ISPs, and enterprises, but also making sure those connections are as fast as possible.

We saw an earlier example of this as we started connecting more locally. As seen in this blog post, the local connections even increased how much our network was being used: better performance drives further usage!

At Cloudflare, we ensure that infrastructure expansion strategically aligns with building in markets where we can interconnect more deeply, because increasing our network breadth is only as valuable as the number of local interconnections that it enables. For example, we recently connected to a local ISP (representing a new ASN connection) in Pakistan, where the 50th percentile latency improved from ~90 ms to 5 ms!

Build resilience

Network expansion may be driven by reducing latency and improving interconnections, but it’s equally valuable to our existing network infrastructure. Increasing our geographic reach strengthens our redundancy, localizes failover and helps further distribute compute workload resulting in more effective capacity management. This improved resilience reduces the risk of service disruptions and ensures network availability even in the event of hardware failures, natural disasters, or other unforeseen circumstances. It enhances reliability and prevents single points of failure in the network architecture.

Ultimately, our commitment to strategically expanding the breadth and depth of our network delivers improved latency, stronger interconnections and a more resilient architecture – all critical components of a better Internet! If you’re a network operator, and are interested in how, together, we can deliver an improved user experience, we’re here to help! Please check out our Edge Partner Program and let’s get connected.

¹CIDR Report
²Origin ASs announced via a single AS path

Making peering easy with the new Cloudflare Peering Portal

Post Syndicated from David Tuber original https://blog.cloudflare.com/making-peering-easy-with-the-new-cloudflare-peering-portal/

In 2018, we launched the Cloudflare Peering Portal, which allows network operators to see where their traffic is coming from and to identify the best possible places to interconnect with Cloudflare. We’re excited to announce that we’ve made it even easier to interconnect with Cloudflare through this portal by removing Cloudflare-specific logins and allowing users to request sessions in the portal itself!

We’re going to walk through the changes we’ve made to make peering easier, but before we do that, let’s talk a little about peering: what it is, why it’s important, and how Cloudflare is making peering easier.

What is peering and why is it important?

Put succinctly, peering is the act of connecting two networks together. If networks are like towns, peering is the bridges, highways, and streets that connect the networks together. There are lots of different ways to connect networks together, but when networks connect, traffic between them flows to their destination faster. The reason for this is that peering reduces the number of Border Gateway Protocol (BGP) hops between networks.

What is BGP?

For a quick refresher, Border Gateway Protocol (or BGP for short) is a protocol that propagates instructions on how networks should forward packets so that traffic can get from its origin to its destination. BGP gives packets instructions on how to get from one network to another by indicating which networks the packets need to go through to reach the destination, prioritizing the paths with the smallest number of hops between origin and destination. BGP sees networks as Autonomous Systems (AS), and each AS has its own number. For example, Cloudflare’s ASN is 13335.

In the example below, AS 1 is trying to send packets to AS 3, but there are two possible paths the packets can go:

The BGP decision algorithm will select the path with the least number of hops, meaning that the path the packets will take is AS 1 → AS 2 → AS 3.
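
A toy illustration of that path-length preference (real BGP has many more tie-breakers such as local preference and MED; this only shows the hop-count step, and the longer second path is made up since the original diagram only labels two unnamed options):

```python
# Candidate AS paths from AS 1 to AS 3; the second path is hypothetical.
paths_to_as3 = [
    [1, 2, 3],       # AS 1 -> AS 2 -> AS 3
    [1, 4, 5, 3],    # AS 1 -> AS 4 -> AS 5 -> AS 3 (made-up longer path)
]

def best_path(paths):
    # BGP's shortest-AS-path step: prefer the path with the fewest hops.
    return min(paths, key=len)

print(" -> ".join(f"AS {asn}" for asn in best_path(paths_to_as3)))
# AS 1 -> AS 2 -> AS 3

# If AS 1 and AS 3 peer directly, the direct path wins:
paths_to_as3.append([1, 3])
print(" -> ".join(f"AS {asn}" for asn in best_path(paths_to_as3)))
# AS 1 -> AS 3
```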

When two networks peer with each other, the number of networks needed to connect AS 1 and AS 3 is reduced to one, because AS 1 and AS 3 are directly connected with each other. But “connecting with” another network can be kind of vague, so let’s be more specific. In general, there are three ways that networks can connect with other networks: directly through Private Network Interconnects (PNI), at Internet Exchanges (IX), and through transit networks that connect with many networks.

Private Network Interconnect

A private network interconnect (PNI) sounds complicated, but it’s really simple at its core: it’s a cable connecting two networks to each other. If two networks are in the same data center facility, it’s often easy for them to connect, and by doing so over a private connection they get dedicated bandwidth to each other as well as reliable uptime. Cloudflare has a product called Cloudflare Network Interconnect (CNI) that allows other networks to directly connect their networks to Cloudflare in this way.

Internet Exchanges

An Internet exchange (IX) is a building that specifically houses many networks in the same place. Each network gets one or more ports, and plugs into what is essentially a shared switch so that every network has the potential to interconnect. These switches are massive and house hundreds, even thousands of networks. This is similar to a PNI, but instead of two networks directly connecting to each other, thousands of networks connect to the same device and establish BGP sessions through that device.

An optical patch panel of the AMS-IX Internet exchange point in Amsterdam, Netherlands. (Photo: Fabienne Serriere, CC BY-SA 3.0)

At Internet Exchanges, traffic is generally exchanged between networks free of charge, and it’s a great way to interconnect a network with other networks and save money on bandwidth between networks.

Transit networks

Transit networks are networks that are specifically designed to carry packets between other networks. These networks peer at Internet Exchanges and directly with many other networks to provide connectivity for your network without having to get PNIs or IX presence with networks. This service comes at a price, and may impact network performance as the transit network is an intermediary hop between your network and the place your traffic is trying to reach. Transit networks aren’t peering, but they do peering on your behalf.

No matter how you may decide to connect your network to Cloudflare, we have an open peering policy, and strongly encourage you to connect your networks directly to Cloudflare. If you’re interested, you can get started by going through the Cloudflare Peering Portal, which has now been made even easier. But let’s take a second to talk about why peering is so important.

Why is peering important?

Peering is important on the Internet for three reasons: it distributes traffic across many networks, reducing single points of failure on the Internet; it often reduces bandwidth prices, making overall costs lower; and it improves performance by removing network hops. Let’s talk about each of those benefits.

Peering improves the overall uptime of the Internet by distributing traffic across multiple networks, meaning that if one network goes down, traffic from your network will still be able to reach your users. Compare that to connecting to a transit network: if the transit network has an issue, your network will be unreachable because that network was the only thing connecting your network to the rest of the Internet (unless you decide to pay multiple transit providers). With peering, any individual network failure will not completely impact the ability for your users to reach your network.

Peering helps reduce your network bandwidth costs because it distributes the cost you pay an Internet Exchange for a port across all the networks you interconnect with at the IX. If you’re paying $1,000/month for a port at an IX, and you’re peered with 100 networks there, you’re effectively paying $10/network, as opposed to paying $1,000/month to connect to one transit network. Furthermore, many networks, including Cloudflare, have open peering policies and settlement-free peering, which means we don’t charge you to send traffic to us or the other way round, making peering even more economical.

Peering also improves performance for Internet traffic by bringing networks closer together, reducing the time it takes for a packet to go from one network to another. The more two networks peer with each other, the more physical places on the planet they can exchange traffic directly, meaning that users everywhere see better performance.

Here’s an example. Janine is trying to order food from Acme Food Services, a site protected by Cloudflare. She lives in Boston and connects to Cloudflare via her ISP. Acme Food Services has their origin in Boston as well, so for Janine to see the fastest performance, her ISP should connect to Cloudflare in Boston and then Cloudflare should route her traffic directly to the Acme origin in Boston. Unfortunately for Janine, her ISP doesn’t peer with Cloudflare in Boston, but instead peers with Cloudflare in New York: meaning that when Janine connects to Acme, her traffic is going through her ISP to New York before it reaches Cloudflare, and then all the way back to Boston to the Acme origins!

But with proper peering, we can ensure that traffic is routed over the fastest possible path to ensure Janine connects to Cloudflare in Boston and everything stays local:

Fortunately for Janine, Cloudflare peers with over 10,000 networks in the world in over 275 locations, so high latency on the network is rare. And every time a new network peers with us, we help make user traffic even faster. So now let’s talk about how we’ve made peering even easier.

Cloudflare Peering Portal supports PeeringDB login

Cloudflare, along with many other networks, relies on PeeringDB as a source of truth for which networks are present on the Internet. PeeringDB is a community-maintained database of all the networks that are present on the Internet, which data center facilities and IXs they are present at, and which IPs are used for peering at each public location. Many networks, including Cloudflare, require you to have an account on PeeringDB before you can initiate a peering session with their network.

You can now use that same PeeringDB account to log into the Cloudflare Peering Portal directly, saving you the need to make a specific Cloudflare Peering Portal account.

When you log into the Cloudflare Peering Portal, simply click on the PeeringDB login button and enter your PeeringDB credentials. Cloudflare will then use this login information to determine what networks you are responsible for and automatically load data for those networks.

From here you can see all the places your network exchanges traffic with Cloudflare. You can see all the places you currently have direct peering with us, as well as locations for potential peering: places you could peer with us but currently don’t. Wouldn’t it be great if you could just click a button and configure a peering session with Cloudflare directly from that view? Well now you can!

Requesting sessions in the Peering Portal

Starting today, you can now request peering sessions with Cloudflare at Internet Exchanges right from the peering portal, making it even easier to get connected with Cloudflare. When you’re looking at potential peering sessions in the portal, you’ll now see a button that allows you to verify that your peering information is correct and, if it is, to proceed with a peering request:

Once you click that button, a ticket will go immediately to our network team to configure a peering session with you using the details already provided in PeeringDB. Our network team looks at whether we already have existing connections with your network at that location, and also what the impact to your Internet traffic will be if we peer with you there. Once we’ve evaluated these variables, we’ll proceed to establish a BGP session with you at that location and inform you via the email address you’ve already provided in PeeringDB. Then all you have to do is accept the BGP sessions, and you’ll be exchanging traffic with Cloudflare!

Peer with Cloudflare today!

It has never been easier to peer with Cloudflare, and our simplified peering portal will make it even easier to get connected. Visit our peering portal today and get started on the path of faster, cheaper connectivity to Cloudflare!