Modern Red Team Infrastructure
There’s been a lot of talk recently about modern strategies for red team infrastructure. Implementations vary greatly, but hopefully we can provide some insight into how Silent Break, in particular, tackles the challenge of Command and Control. This post reflects many mistakes made, late nights spent in documentation, active research, and the real-world solutions we use every day.
A Touch of Theory
Before chasing any optimal solution, we first need to clearly state our goals. The particulars will always be unique to each operation, but we feel like the following is a good generalization for red teaming.
- Provide reliable communications to tooling
- Obfuscate and masquerade to sufficiently avoid detection
- Secure operations data against tampering and extraction
- Maximize action throughput
Some might be familiar with a popular adage from car culture: “Fast, Reliable, Cheap. Pick two.” Our goals conflict with each other in similar ways, making our problem… well… a problem. For instance:
- Maximum throughput might rely on unreliable protocols or systems
- Securing data might involve latent traffic paths
- Sufficiently masquerading might reduce protocol options
You’ll also note that the priority of these goals varies depending on the phase of an operation. At some points we might prioritize stealth, at others throughput or reliability. It is with this insight that we reach the first conclusion: “Break your infrastructure into categories based on purpose.” We preach this heavily in our Dark Side Ops courses and it might be old news to some, but it continues to ring true. Our particular categories look something like this:
- Stage 0 (Staging) – Phishing and initial code execution
- Stage 1 (Persistence) – Maintaining access to an environment
- Stage 2 (Interactive) – Active exploitation, enumeration, and escalation
- Stage 3 (Exfiltration) – Data extraction and impact
This segmentation can be used to optimize everything about infrastructure (traffic strategies, code design, protocol selection, etc.), the particulars of which are outside the scope of this post.
Segmentation also provides inherent resiliency, allowing for the isolated compromise of infrastructure without affecting the system as a whole. We don’t want the discovery of any single C2 domain, payload, or toolkit to compromise the entire operation.
Like any good systems engineer, we must assume that disruption will occur during every stage. Your domains will get burned, your servers will go down, your callbacks will get blocked. The infrastructure design for each phase should be robust, including fallbacks, checks, and recoveries. Generally having “more” of everything (with sophistication) is a good start: multiple domains, multiple protocols, multiple tools, etc. All of this comes with the real challenge of actually designing the system, but always let your core goals be the guide.
Bastions
Nowadays most offensive infrastructure involves routing traffic from a remote endpoint under (or partially under) your control. This could be a VPS, serverless endpoint, or compromised web host. We like to call these assets bastions.
Best practice is to use a bastion solely for capturing traffic and never for storing data directly; however, each team will need to make its own security assessment of external infrastructure endpoints. We have an internal network boundary that we prefer all data to reach before it is decrypted, handled, or stored. Here we trade the complexities of traffic routing for better security.
We also have many bastions spread across providers and regions. This diversification is a requirement for the stage segmentation we mentioned earlier, but also helps provide the resiliency we need. This means multiple providers, servers, CDNs, domains, etc. We like to keep everything tracked in a centralized wiki for easy reference.
DevOps
We started looking at DevOps solutions, as anyone would, once our asset count became pretty unwieldy. Initially, we traveled the Ansible road, creating a centralized management server for spinning up all of our different infrastructures. We preferred Ansible primarily for its agentless capability, requiring only SSH access for management. It performs tooling installations, routing configurations, certificate generation, hardening, dependency installations, etc. We’ve seen a lot of success there, taking our VPS spin up time from a couple of hours to 10-15 minutes. This success is echoed by others in our industry who perform similar management with their solution of choice (Chef, Puppet, Terraform, etc.).
However, we have never been Ansible experts and constantly pay small prices for our sub-optimal implementation. We’ve also seen less need for a full DevOps tool as we’ve transitioned to simplified bastions with fewer requirements. With that said, DevOps in some form is an absolute must for sufficiently diverse infrastructure. In fact, if you don’t feel like you need automation to manage your infrastructure, you probably need more.
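To give a flavor of the workflow (the inventory and playbook names below are purely hypothetical, not our actual layout), provisioning a fresh bastion boils down to a single command:

# Hypothetical inventory/playbook names -- illustration only
ansible-playbook -i inventories/bastions.ini provision-bastion.yml --limit bastion-new-01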
Tunneling
With your army of bastions configured, the traffic routing itself is largely a matter of preference. The more popular options include:
- Socat – Easy to use, but sits higher in the OS stack than we’d prefer. It can also get complicated when trying to support many ports simultaneously.
- IPTables – Tricky to configure correctly, but very powerful with almost no dependencies.
- Reverse Proxy – Provides advanced selective routing in layer 7, but requires software, certificates, and configuration on the endpoint.
For our bastions, we like to use IPTables. This is partly because we like to consider our bastions the simplest extension of our internal operations network. Anything higher than layer 4 tends to complicate configurations and introduce new dependencies. With that said, we also use a reverse proxy on our front plane in something like a two-phase tunnel setup. Our setup currently looks something like this:
We have a traffic collector in our operations network which creates OpenVPN connections out to the publicly accessible bastions. Using these tunnels plus IPTables, traffic arriving at public domains/IPs is routed down to unique tunnel interfaces on the traffic collector. This creates a mapping between each public IP address (1.2.3.4) and its tunnel IP equivalent (10.8.1.42) on the collector. An alternative here would be to use NAT and port mappings to reduce static complexity, but then it’s difficult to generically handle traffic for every possible destination port (which we like to do).
During this process, we also perform route manipulation to avoid the source translation (SNAT) which IPTables would typically apply in this scenario. For protocols which don’t support options like X-Forwarded-For, we need to preserve the client IPs for filtering, tracking, etc. The industry has an alternative solution for this problem called the PROXY protocol, which allows you to wrap layer 4 traffic and preserve the original packet information during manipulation and routing. We discovered this solution after we had begun construction, and ultimately decided against a redesign.
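Although we didn’t adopt it, a minimal sketch of that approach, assuming Nginx’s stream module on both ends (other proxies such as HAProxy support PROXY protocol as well; the addresses reuse our earlier examples), would look something like this:

# On the bastion (sender): wrap forwarded connections in a PROXY protocol header
stream {
    server {
        listen 443;
        proxy_pass 10.8.1.42:443;   # down the tunnel to the traffic collector
        proxy_protocol on;
    }
}

# On the collector (receiver): strip the header; $proxy_protocol_addr now holds the real client IP
stream {
    server {
        listen 443 proxy_protocol;
        proxy_pass 172.16.1.20:443; # tooling server
    }
}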
Here is an example rule to DNAT inbound traffic down a VPN tunnel:
iptables -t nat -A PREROUTING -d 1.2.3.4 -j DNAT --to-destination 10.8.1.42
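For that DNAT rule to actually move packets, the bastion also needs IP forwarding enabled and FORWARD rules that permit the flow. A minimal (and intentionally permissive) sketch, assuming eth0 is the public interface and tun0 is the OpenVPN tunnel:

# Enable kernel IP forwarding
sysctl -w net.ipv4.ip_forward=1

# Permit the DNAT'd traffic to be forwarded down the tunnel, plus the return traffic
iptables -A FORWARD -i eth0 -o tun0 -d 10.8.1.42 -j ACCEPT
iptables -A FORWARD -i tun0 -o eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT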
In order to get the return traffic back out the correct interface, we might also need to do some policy-based routing (assuming you have multiple public interfaces):
ip route add default via 1.2.3.4 dev eth1 table 100
ip route add 10.8.1.0/24 dev tun0 table 100
ip rule add from 10.8.1.42 lookup 100
This causes any traffic returning from our traffic collector to use a new routing table (100). In this new table, we add a default route for the preferred public IP and knowledge of our VPN network. We would have an individual table for every public interface handling traffic (like an EC2 instance with multiple ENIs). This creates a mapping between every external IP on our bastion and an internal IP at the other end of the OpenVPN tunnel.
Nginx Front Plane
With our tunneling finished, we can now get all of our traffic headed into a single server behind our perimeter. This server could be a “global tooling server”, but given the diversity of our toolkit and the complexity of configuration, this is a bit infeasible. Instead, we will implement the second phase in our traffic routing, the traffic collector. This isn’t much different than any other front plane you’d see and we settled on Nginx to do this work for a few different reasons:
- Nginx is well supported and provides many plugins, features, options, and projects.
- Nginx is one of the only layer 7 proxies which can also perform layer 4 (TCP/UDP) streaming. This is super important for tooling using things like DNS tunneling or obscure layer 7 protocols.
- Nginx supports a “transparent proxy” mode allowing you to forward traffic without SNAT, which we are still trying to avoid (a quick sketch of this follows the list).
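Here is a rough sketch of that third point using the stream module (simplified; transparent binding also requires the Nginx workers to run with sufficient privileges and the backend’s return traffic to route back through the proxy host):

stream {
    server {
        listen 10.8.1.42:53 udp;
        # Bind outbound connections to the original client address instead of our own
        proxy_bind $remote_addr transparent;
        proxy_pass 172.16.1.20:53;
    }
}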
For anyone curious, we also considered Envoy for this job, but the project was still fairly new at the time. Before we dive further, let’s introduce our particular goals for configuring an Nginx traffic collector:
- Callbacks must be routed appropriately.
- Domains must have legitimate SSL certificates.
- Each domain must host legitimate HTML content.
- We can’t lose the true client IP information in transit.
There are two primary methods for rerouting traffic in Nginx: reverse proxying and the stream module. We will first cover the general advantages and disadvantages of both before moving into our particular implementation.
Reverse Proxy
A reverse proxy takes an HTTP request, optionally modifies or inspects various aspects of the data, and sends it on to the destined web server. We can use this layer 7 intelligence to:
- Inspect HTTP traffic on the fly and dynamically route traffic to different listening posts (LPs) or content servers.
- Simplify SSL management and use a single server for all of our certificates.
- Modify/Add HTTP headers and content on the fly to provide context to our backend servers.
The primary disadvantage of using a reverse proxy is the reliance on HTTP/S for function.
Stream Module
In simple terms, the Nginx stream module is basically port redirection, like socat, on steroids. Similar to reverse proxying, you can specify load balancing and failover hosts. We lose the sweet layer 7 intelligence and can’t qualify callbacks prior to forwarding. However, we can route traffic with arbitrary protocols provided they support TCP/UDP. This is a somewhat unique feature of Nginx with limited public discussion, but simple to configure and appears to work fine for our purposes.
Implementation
In our production setup, we use a little of both routing options. Let’s start with a basic configuration, and step through the feature set until we have everything we need. First, a basic reverse proxy configuration that routes to our tooling server:
http {
    server {
        listen 10.8.1.42:80; # Tunnel bind IP

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_pass http://172.16.1.20; # Tooling server
        }
    }
}
Next, we need to add SSL support. We like to break SSL at the Nginx server and only forward HTTP traffic. This helps simplify the configuration of our tooling servers.
http {
    server {
        listen 10.8.1.42:80; # Tunnel bind IP
        listen 10.8.1.42:443 ssl;

        ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_pass http://172.16.1.20; # Tooling server
        }
    }
}
We typically collect our certificates using certbot:
certbot certonly -d mydomain.com --standalone
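Renewals can be automated the same way; one option (assuming systemd manages the Nginx service) is to let certbot reload the proxy after each successful renewal:

# Renew any certificates nearing expiry and reload Nginx to pick them up
certbot renew --deploy-hook "systemctl reload nginx"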
All traffic to a public IP is now routed to our tooling server, but we’d also like to route unnecessary web traffic to a separate content server. This helps stabilize our LPs and reduce passive load. To accomplish this, we identified two strategies:
- Solution A: Temporarily forward traffic to our LP only when it is in use for interactive operations, data exfiltration, etc.
- Solution B: Only forward traffic which is relevant to our tools (e.g. droppers, persistence, etc.)
For solution A, we can use the upstream pool mechanic in Nginx. Traffic will be forwarded to the primary upstream address (tooling server) when available, otherwise, it is routed to the backup address (content server). Nginx will automatically detect when the primary upstream is available again, and forward requests accordingly.
http {
    upstream pool-domain-com {
        server 172.16.1.10:80 backup; # Content server
        server 172.16.1.20:80;        # Tooling server
    }

    server {
        listen 10.8.1.42:80; # Tunnel bind IP
        listen 10.8.1.42:443 ssl;

        ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_pass http://pool-domain-com;
        }
    }
}
To implement solution B, Nginx can be configured to route requests based on GET or POST parameters, HTTP headers, or the requested URI. As an example, the configuration below will reverse proxy requests to /api to the IP address 172.16.1.30. All other requests will be forwarded to the pool-domain-com upstream pool.
http {
    upstream pool-domain-com {
        server 172.16.1.10:80 backup; # Content server
        server 172.16.1.20:80;        # Tooling server
    }

    server {
        listen 10.8.1.42:80; # Tunnel bind IP
        listen 10.8.1.42:443 ssl;

        ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;

        location /api { # Additional tooling endpoint
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_pass http://172.16.1.30/;
        }

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_pass http://pool-domain-com;
        }
    }
}
Routing based on HTTP headers and URL parameters is a bit trickier. We won’t cover this in detail, but to point you in the right direction, check out the ‘if’ directive documentation within the rewrite module. Here is an example of a custom routing rule based on a specific user agent string. $http_user_agent is a standard Nginx configuration variable, and the same logic can be applied to other variables. The references at the end of this post provide additional detail.
location / {
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;

    if ($http_user_agent ~ MSIE) {
        proxy_pass http://pool-domain-com;
    }
}
Now that our reverse proxying is configured, we need to add support for things like DNS callbacks. We simply place a stream block above our http section with details about our bind ports and destinations. It’s worth noting that stream blocks still support upstream pools and transparent binding just like the HTTP configuration. The following will redirect UDP 53 globally to 172.16.1.20.
stream {
    server {
        listen 53 udp;
        proxy_pass 172.16.1.20:53; # DNS LP
    }
}

http {
    ...
}
Lessons Learned
We ran into the following error shortly after deploying the setup in production.
2099/01/02 21:08:47 [crit]: accept4() failed (24: Too many open files)
This is because Linux sets soft and hard limits on the number of open files a user can have. We simply needed to increase the limit at the OS level and within Nginx by modifying the configs as shown below.
Added to /etc/security/limits.conf
nginx soft nofile 200000
nginx hard nofile 200000
Added to /etc/nginx/nginx.conf
worker_rlimit_nofile 200000;
In addition to this, there are some other configuration items worth mentioning (a combined example follows the list):
- Max request body size (client_max_body_size)
- GZip support (gzip)
- Keep alive timeouts (keepalive_timeout)
- SSL configuration (ssl_protocols)
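None of these have one-size-fits-all values, but as a rough sketch of where they live (the numbers below are illustrative, not recommendations):

http {
    client_max_body_size 500M;       # don't reject large POSTs from tooling
    gzip on;                         # compress responses
    keepalive_timeout 75s;           # how long idle client connections are held open
    ssl_protocols TLSv1.2 TLSv1.3;   # drop legacy protocol versions
    ...
}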
Tooling Integration (Check In/Out)
With our front plane configured, we like to put some final work into tooling integrations. Ideally, we want a mechanism for temporarily capturing and releasing traffic flows. This would allow us to write tools which can load, take over an available traffic flow, use it for operations, then release it back to the pool. One such integration is what we call an “Op Station”. Generally speaking, this is just an internal VM used by an operator for interactive operations. Of course, our tool of choice on these op stations is Slingshot.
To build the solution above, we combine some basic DNS services with secondary IP addressing. First, let’s imagine we have the following active traffic flows:
banking.com -> Bastion A [1.2.3.4] -> Nginx [10.8.1.2] -> Content Server [172.16.1.10]
health.org  -> Bastion B [5.6.7.8] -> Nginx [10.8.1.3] -> Content Server [172.16.1.10]
support.com -> Bastion C [9.5.3.1] -> Nginx [10.8.1.4] -> Content Server [172.16.1.10]
With the following general Nginx upstream pools:
upstream pool-banking-com {
    server 172.16.1.10:80 backup; # Content server
    server 172.16.1.20:80;        # Operations
}
...
upstream pool-health-org {
    server 172.16.1.10:80 backup; # Content server
    server 172.16.1.30:80;        # Operations
}
...
upstream pool-support-com {
    server 172.16.1.10:80 backup; # Content server
    server 172.16.1.40:80;        # Operations
}
In this state, Nginx has detected that none of the operations IP addresses are in use (they are unreachable) and is therefore routing all traffic for the three domains to our content server. These domains are currently in “passive mode”, simply serving content and looking normal. If we wanted a new VM/tool to begin capturing traffic, we would simply assign the host the “Operations” address for a particular upstream pool.
# Start collecting traffic for banking.com
ifconfig eth0:0 172.16.1.20 netmask 255.255.255.0
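Releasing the flow back to the pool is just the inverse: drop the alias, Nginx marks the operations upstream as unreachable again, and traffic fails back to the content server.

# Stop collecting traffic for banking.com
ifconfig eth0:0 down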
Simple and easy, but how would a particular tool know the current list of available IPs and which domains they map to? We spent considerable time exploring different solutions for this. Nginx could expose some sort of API our tools could use to query the current configuration (redx or NGINX Plus). We could have a centralized SQL database which held all of the active domains. We could use some kind of DHCP service to release/renew IPs. All of these are generally expensive, technically and/or operationally.
Ultimately we settled on DNS instead, the OG hostname database. Specifically, we opted for dnsmasq paired with a local host file. It’s easy for the collection server to update/modify records, and all of the domains can be retrieved with a simple zone transfer. We can also add textual markers to segment domains by purpose and context.
The setup is relatively simple, and most installation guides will get you 90% of the way there. We just have to make a few small tweaks to link the service to a static hosts file and allow zone transfers from any IP.
# Our host file
addn-hosts=/root/records
auth-sec-servers=0.0.0.0
auth-zone=operations.local
domain=operations.local
log-queries
Our /root/records file looks something like this:
172.16.1.10 content
172.16.1.20 stage2-banking-com
172.16.1.30 stage2-health-org
172.16.1.40 stage1-support-com
We’ve prefixed the domains with a stage identifier to allow different tools to query only the domains which apply to them. The lookup is a simple AXFR, followed by some validation and selection.
domains=$(dig axfr @ns1.operations.local operations.local | grep stage2)
# Ping/Nmap to see which are in use
# Prompt the operator for a selection
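Fleshing that pseudocode out a bit, a rough bash sketch of the selection helper might look like the following (the zone name and record layout are the illustrative values from above):

#!/bin/bash
# List stage 2 records from the operations zone and flag the IPs nothing is using yet
dig axfr @ns1.operations.local operations.local | awk '/stage2/ {print $1, $5}' |
while read -r name ip; do
    # An address that doesn't answer isn't currently claimed by a tool
    if ! ping -c 1 -W 1 "$ip" > /dev/null 2>&1; then
        echo "available: $name -> $ip"
    fi
done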
Wrap Up
In summary, we’ve designed a flexible, secure, and scalable way to manage multiple callbacks from a variety of resources in one centrally located front plane. This approach simplifies SSL certificate management (a huge headache for 30+ domains), reduces the need to spin up and configure additional external infrastructure, and keeps all raw C2 data within the internal network.
Ultimately, infrastructure provides a means to an end. It supports and enhances operations, and is foundational to any good red team. Like any building, if the foundation fails it all ends up in a pile of rubble.
We hope to have provided enough information here to pique your interest and motivate you to take your infrastructure to the next level. We welcome feedback and new ideas @silentbreaksec! Additional references are provided below.
References
- https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/
- https://cheat.readthedocs.io/en/latest/nginx.html
- https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/
- https://docs.nginx.com/nginx/admin-guide/security-controls/securing-http-traffic-upstream/
- https://gist.github.com/v0lkan/90fcb83c86918732b894