
What is a reverse proxy and how does it work?

Reverse proxies are a useful tool in any system administrator's toolbox. They have many uses, including load balancing, caching, and protection against DDoS attacks.

What are reverse proxies?

A standard proxy, called a forward proxy, is a server through which a user's connection is routed. In many ways it resembles a simple VPN, which sits in front of your internet connection. VPNs are a common example of forward proxies, but the category also includes things like school firewalls, which can block access to certain content.

A reverse proxy works a little differently. It is a backend tool used by system administrators. Instead of users connecting directly to the server that hosts a website's content, a reverse proxy like NGINX can sit in the middle. When it receives a request from a user, it forwards, or "proxies", that request to the end server. This server is called the "origin server" because it is the one that actually responds to requests.

While a user will likely know if they are being routed through a forward proxy like a VPN or firewall, reverse proxies are backend tools. As far as the user knows, they are simply connecting to a website. Everything behind the reverse proxy is hidden, which also has many advantages.

This effect also works in the opposite direction. The origin server has no direct connection to the user and will only see a stream of requests coming from the reverse proxy's IP address. This can be a problem, but most proxy software, such as NGINX, can be configured to add headers like X-Forwarded-For. These headers inform the origin server of the client's real IP address.
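As a rough sketch of how the X-Forwarded-For header is handled on both sides, the two helpers below show a proxy appending the client's address and an origin reading it back. The function names are hypothetical, and headers are modeled as a plain dictionary with exact-case keys, which is a simplification.

```python
def add_forwarded_for(headers: dict, client_ip: str) -> dict:
    """Proxy side: return a copy of the headers with the client's IP appended."""
    forwarded = dict(headers)
    existing = forwarded.get("X-Forwarded-For")
    # Each proxy in a chain appends the address it received the request from.
    forwarded["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
    return forwarded


def client_ip_from(headers: dict, peer_ip: str) -> str:
    """Origin side: the left-most entry is the original client.

    Only do this when peer_ip is a proxy you trust, because clients
    can send a forged X-Forwarded-For header themselves.
    """
    value = headers.get("X-Forwarded-For")
    if not value:
        return peer_ip  # no proxy involved, use the socket's peer address
    return value.split(",")[0].strip()
```

For example, add_forwarded_for({}, "203.0.113.7") produces a header the origin can pass to client_ip_from to recover "203.0.113.7", even though the TCP connection itself comes from the proxy.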

What are reverse proxies used for?

Reverse proxies are quite simple in concept but turn out to be a surprisingly useful tool with many unexpected use cases.

Load balancing

One of the main advantages of a reverse proxy is how lightweight it is. Since reverse proxies just forward requests, they don't have to do much processing, especially compared with an origin server that needs to query a database.

This means the bottleneck is often the origin server, but with a reverse proxy in front you can easily run multiple origin servers. For example, the proxy could send 50% of requests to one server and 50% to another, roughly doubling the capacity of the website. Tools like HAProxy are designed to handle this well.
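The 50/50 split described above is usually achieved with round-robin selection: each new request goes to the next server in the list. The sketch below assumes two made-up origin addresses and a hypothetical RoundRobinBalancer class; real load balancers such as HAProxy offer this strategy alongside several others.

```python
import itertools


class RoundRobinBalancer:
    """Hand out origin servers in turn so each gets an equal share of requests."""

    def __init__(self, origins):
        origins = list(origins)
        if not origins:
            raise ValueError("at least one origin server is required")
        self._cycle = itertools.cycle(origins)

    def pick(self):
        # Return the origin that should handle the next request.
        return next(self._cycle)


# Hypothetical origin addresses, for illustration only.
balancer = RoundRobinBalancer(["http://10.0.0.11:8080", "http://10.0.0.12:8080"])
targets = [balancer.pick() for _ in range(4)]
```

With two origins, targets alternates between them, so each server ends up with half of the traffic.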

This is a very common use case, and most cloud providers like Amazon Web Services (AWS) offer load balancing as a service, so you don't have to configure it yourself. With cloud automation, you can even automatically increase the number of origin servers in response to traffic, a feature called auto-scaling.

Load balancers like AWS's Elastic Load Balancer can be configured to automatically reconfigure as your origin servers go up and down, all made possible by a reverse proxy under the hood.
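To illustrate what "reconfiguring as origin servers go up and down" means, here is a small sketch of a server pool that origins can join and leave while requests keep flowing. This is not how Elastic Load Balancer works internally; the OriginPool class and the server names are assumptions invented for this example.

```python
class OriginPool:
    """Round-robin pool whose members can be added or removed at runtime."""

    def __init__(self):
        self._origins = []
        self._next = 0

    def register(self, origin):
        # Called when a new origin server comes up (for example, after scaling out).
        if origin not in self._origins:
            self._origins.append(origin)

    def deregister(self, origin):
        # Called when an origin server goes down or is scaled in.
        if origin in self._origins:
            self._origins.remove(origin)

    def pick(self):
        if not self._origins:
            raise RuntimeError("no origin servers available")
        origin = self._origins[self._next % len(self._origins)]
        self._next += 1
        return origin
```

Requests are only ever sent to servers that are currently registered, so removing a failed server or adding a fresh one takes effect on the very next request.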

Caching

Since a reverse proxy often responds much faster than the origin server, a technique called caching is commonly used to speed up requests on common routes. With caching, page data is stored on the reverse proxy and is only requested from the origin server once every few seconds or minutes. This greatly reduces the pressure on the origin server.

For example, this article you are currently reading was served by WordPress, which needs to communicate with an SQL database to retrieve the article's content and metadata. Doing this for every page refresh is wasteful since the page doesn't really change. So this route can be cached and the reverse proxy will just send the last response back to the next user, rather than messing with WordPress again.
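The caching behavior described here can be sketched as a small time-to-live (TTL) cache: a stored response is reused until it is older than a chosen number of seconds, and only then is the origin asked again. The TTLCache class and its fetch callback are hypothetical names for this illustration; real proxies also take cache-control headers and other details into account.

```python
import time


class TTLCache:
    """Cache responses per path and refresh them from the origin after ttl_seconds."""

    def __init__(self, ttl_seconds, fetch, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._fetch = fetch      # function that asks the origin server for a path
        self._clock = clock      # injectable so the behavior can be tested
        self._entries = {}       # path -> (time stored, response body)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: answer without touching the origin
        body = self._fetch(path)  # missing or stale: go back to the origin
        self._entries[path] = (now, body)
        return body
```

With a TTL of 30 seconds, a popular article would hit WordPress and its database at most once every 30 seconds, no matter how many users request it.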

A dedicated network of reverse proxies that cache your content is called a content delivery network, or CDN. CDNs like Cloudflare or Fastly are very commonly used by large websites to speed up global delivery. Servers around the world that cache content are called “edge nodes,” and having a lot of them can make your website feel very fast.

Network protection and privacy

Since the user does not know what is behind the reverse proxy, they cannot easily attack your origin servers directly. In fact, reverse proxies are commonly used with origin servers in private subnets, meaning the origin servers cannot be reached directly from the outside internet.

This keeps your network configuration private, and while security through obscurity is never foolproof, it's better than leaving it open to attack.

This inherent trust can also be useful when planning your network. For example, an API server that communicates with a database is similar to a reverse proxy. The database knows it can trust the API server in the private subnet, and the API server acts as a firewall for the database, allowing only legitimate connections through to it.

Configurable interface

One of the advantages of reverse proxies such as NGINX is how configurable they are. Often it is useful to have them in front of other services just to control how users access those services.

For example, NGINX is able to rate-limit requests to certain routes, which can prevent abusers from making thousands of requests to origin servers from a single IP address. It doesn't stop DDoS attacks, but it's nice to have.
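To show the idea behind per-IP rate limiting, here is a sketch of a sliding-window limiter. This is not NGINX's algorithm (its limit_req module uses a leaky bucket method); it is a simpler stand-in, and the RateLimiter class and its parameters are assumptions for this example.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per client IP within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock               # injectable so the behavior can be tested
        self._hits = defaultdict(deque)   # client IP -> timestamps of recent requests

    def allow(self, client_ip):
        now = self._clock()
        recent = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max:
            return False                  # over the limit: reject this request
        recent.append(now)
        return True
```

A proxy would call allow() before forwarding each request and answer with an error status such as 429 or 503 when it returns False, so the origin never sees the excess traffic.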

NGINX is also capable of forwarding traffic from multiple domain names with configurable "server" blocks. For example, it could send requests to example.com to your origin server, but send api.example.com to your special API server, or files.example.com to your file storage, and so on. Each server can have its own configuration and its own rules.
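The routing that "server" blocks perform boils down to a lookup on the request's Host header. The sketch below mirrors the three domains from the paragraph above; the backend addresses and the route function are hypothetical, and in NGINX itself this would be expressed in its configuration file rather than in Python.

```python
# Hypothetical backend addresses for each domain, for illustration only.
ROUTES = {
    "example.com": "http://10.0.0.10:8080",        # main origin server
    "api.example.com": "http://10.0.0.20:5000",    # API server
    "files.example.com": "http://10.0.0.30:9000",  # file storage
}


def route(host_header, routes=ROUTES, default=None):
    """Pick a backend based on the Host header of the incoming request."""
    # Drop an optional ":port" suffix and normalize case.
    # (Simplification: bracketed IPv6 literals are not handled.)
    host = host_header.split(":")[0].strip().lower()
    return routes.get(host, default)
```

For instance, route("api.example.com:443") returns the API server's address, while an unknown domain falls back to the default.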

NGINX is also capable of adding functionality to existing origin servers, such as centralized HTTPS certificates and header configuration.

Sometimes it's just useful to have NGINX on the same machine as another local service, just to serve that service's content. For example, ASP.NET Web APIs use an internal web server called Kestrel, which responds well to requests, but not much else. It's very common to run Kestrel on a private port and use NGINX as a configurable reverse proxy.

Centralized logging

This one is pretty straightforward, but having most of your traffic go through a single service makes it easy to check the logs. NGINX's access log contains a lot of useful information about your traffic, and while it doesn't surpass the capabilities of a service like Google Analytics, it's great information to have.
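As an example of what you can pull out of those logs, the sketch below parses lines in NGINX's default "combined" access log format and counts requests per client IP and per status code. The sample lines are made up for illustration, and the helper names are hypothetical; a custom log_format would need a different pattern.

```python
import re
from collections import Counter

# Matches NGINX's default "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
LINE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one access log line as a dict, or None if it doesn't match."""
    match = LINE.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count requests per client IP and per HTTP status code."""
    per_ip, per_status = Counter(), Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue  # skip lines in an unexpected format
        per_ip[fields["ip"]] += 1
        per_status[fields["status"]] += 1
    return per_ip, per_status


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Oct/2023:13:56:01 +0000] "GET /api/items HTTP/1.1" 200 2048 "-" "curl/8.0"',
]
per_ip, per_status = summarize(sample)
```

Because every request passes through the proxy, a summary like this covers all of your origin servers at once.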
