Bug Bounty – Wya.pl

Virtual Hosting – A Well Forgotten Enumeration Technique

By Wyatt | June 16, 2022

I have been a pentester for several years and have gotten to see my fair share of other pentesters’ and consultants’ work. As with most people in the security community, I’ve learned a tremendous amount from others. This blog post was sparked by a gap I commonly see during network pentests: pentesters forget to fuzz for virtual hosts after observing that many domains resolve to one or just a few IPs.

As an example, let’s say you were given the range 10.15.1.0/24 to test. In that range you find the following hosts are online:

10.15.1.1
10.15.1.2
10.15.1.4
10.15.1.50
10.15.1.211

At this point you should be running a reverse DNS lookup on each of those IPs to see what domain names are correlated to an IP. You may also perform certificate scraping on TLS enabled ports to grab a few more domain names. Let’s say you gather that data and come back with the following:

10.15.1.1:router.wya.pl
10.15.1.2:win7.wya.pl
10.15.1.4:server.wya.pl
10.15.1.50:app1.wya.pl,app2.wya.pl,app3.wya.pl,*.wya.pl
10.15.1.211:device.wya.pl

The 10.15.1.50 host has 3 domains associated with it along with a wildcard certificate. This doesn’t guarantee there are additional host names that the server will respond to, but it might indicate there are subdomains with that suffix. Normally you’d want to take all these domain names and run them through Aquatone or maybe httpx to see if there is a difference in response.

To continue with the example, let’s say you noticed that https://app1.wya.pl looked very different than https://app2.wya.pl.

How does that work? Aren’t they on the same IP address?

Virtual hosting is a concept that allows individual servers to differentiate between different hostnames. This means that a single IP could respond to many domain names and serve different content depending on what was requested. This can be used in Apache, Nginx, load balancers, and more.

Essentially the server administrator configures a default route for unknown hostnames along with the primary one. They would then configure additional routes for other hostnames and serve that content when requested properly. The DNS server would typically be configured with entries for all hostnames the server accepts corresponding to the server’s IP.
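To make the routing idea concrete, here is a minimal Python sketch of the lookup a virtual-hosting server performs. It is a toy model, not how Apache or Nginx is implemented: the ROUTES table, the DEFAULT_ROUTE value, and the route function are hypothetical names made up for the example network.

```python
from typing import Dict

# Hypothetical routing table for the example server at 10.15.1.50.
ROUTES: Dict[str, str] = {
    "app1.wya.pl": "app1 content",
    "app2.wya.pl": "app2 content",
    "app3.wya.pl": "app3 content",
}

# Served whenever the requested hostname is unknown.
DEFAULT_ROUTE = "default site"


def route(host_header: str) -> str:
    """Return the content for the requested hostname, or the default route."""
    # Drop an optional ":port" and normalise case (assumes no IPv6 literal).
    hostname = host_header.split(":", 1)[0].strip().lower()
    return ROUTES.get(hostname, DEFAULT_ROUTE)
```

The important part is the fallback: any hostname that isn't in the table gets the default route, which is what makes guessing hostnames against a single IP possible.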

In practice what does this look like? When everything is set up properly, as a user on the example network I should be able to go to https://app1.wya.pl and https://app2.wya.pl in a browser. The DNS lookup would succeed and resolve to 10.15.1.50. The browser would send the HTTPS request with the Host header set to app1.wya.pl or app2.wya.pl. The server would respond with the content for the requested hostname.

Cool, so why is this a big deal? Why do pentesters miss this?

The answer is always DNS.

In large organizations there are many teams that come together to architect a network and deploy applications. Rarely does any individual own the whole process required to get a domain name and certificate, manage the deployment server, deploy the code, and serve the content.

Oftentimes a wildcard certificate will be deployed to a server to allow for dynamic sets of applications. The server owner doesn’t need to reconfigure the server every time a new app is created or an old one is removed. The server will attempt to route any HTTP request to the requested hostname. If it can’t find a matching route, then it will return the default route.

Network admins will usually choose not to publish the DNS records for internal applications to their external DNS server. Internal apps can still resolve the servers they need to talk to just fine. In most cases, though, an externally facing server is still able to communicate with internal services.

Real World Example

So far I’ve done a lot of talking about what virtual hosts are. My hope is that through a real example you will see why this is important to do and why it can pay off. For this example, I’ll demonstrate virtual host enumeration on Ford, which has a VDP listed here: https://hackerone.com/ford. I’ll note that this is not a vulnerability in itself, but a technique that can be used to find additional hosts that may have not yet been tested.

Pivotal Cloud Foundry (PCF) (now part of Tanzu) is a common way to deploy apps at large organizations. The route handler used by PCF is the gorouter. PCF can be a high reward target for virtual host enumeration because of the way gorouter works.

Within Ford’s ASN exists the IP 19.12.96.10. If you had sent an HTTPS request to this IP, you’d see the following:

PCF has several clear indicators, such as the “404 Not Found: Requested route (‘…’) does not exist.” and the X-Cf-Routererror header. If you happen to do an nslookup on this IP you can see that it is indeed associated with Ford:

Let’s say you have a list of domains related to this company. In an internal pentest, you may have access to this data or you can scrape a source version control service like GitHub for domains. The Chaos dataset from ProjectDiscovery is a great starting point for public programs:

I’ll go ahead and throw all ford.com domains through dnsx to see what resolves to this IP we are testing.

Only a single result! Isn’t that a bit weird? We saw earlier that an nslookup resolved pcf3-vip-chiadc01-rprxy1-19.chi.ford.com to that IP as well. Let’s see what results we get if we send an HTTPS request to that IP with each hostname:

Nothing on both domains! Bummer.

Unfortunately our DNS results were a bust. Maybe certificate scraping will work. If I switch to the -v flag in curl for verbosity, I can see the certificates (I’ll note there are plenty of tools to automatically do this).

In the subject line, the CN of *.apps.edcpd01.cf.ford.com can be seen. A wildcard is interesting. Let’s see if any domains with that suffix exist in the Ford dataset from ProjectDiscovery:

Unfortunately running through the same exercise doesn’t give us any different results:

At this point, I’ve seen a lot of pentesters move on.

Let’s have some creativity here. We weren’t able to find any additional websites through DNS or certificate scraping. What else can we try?

We have a large list of domain names related to the company. That could be a good start. Another trick could be to mangle the subdomains and test out various permutations. Ripgen is a good example of what this would look like and I encourage you to try it out. SecLists also has a nice set of subdomain wordlists that you could prefix to the target company’s domain (or even the wildcard suffix).
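As a rough illustration of the prefixing idea, the Python sketch below combines a wordlist with one or more suffixes. It is not how Ripgen works; candidate_hostnames is a hypothetical helper, and the words you feed it would come from whatever list you choose.

```python
from typing import Iterable, List


def candidate_hostnames(words: Iterable[str], suffixes: Iterable[str]) -> List[str]:
    """Prefix each word to each suffix, de-duplicated and in a stable order."""
    # "*.apps.example.com" and ".example.com" both reduce to a bare suffix.
    cleaned = [s.strip().lstrip("*.").lower() for s in suffixes]
    seen = set()
    candidates: List[str] = []
    for word in words:
        label = word.strip().lower()
        if not label:
            continue  # skip blank wordlist lines
        for suffix in cleaned:
            host = f"{label}.{suffix}"
            if host not in seen:
                seen.add(host)
                candidates.append(host)
    return candidates
```

Passing both the company domain and the wildcard suffix from the certificate gives two guesses per word, for example admin.ford.com and admin.apps.edcpd01.cf.ford.com.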

Once you are ready to give it another go, you can test your list of domains out against the IP to see if there is any significant variation in response. If there is, you may have found a virtual host.

I wrote a tool named VhostFinder, which tests for exactly this. Virtual host fuzzing isn’t a new technique and there are already good tools out there that do it. The public tools didn’t work quite the way I wanted, so I made my own. It starts by testing for a random hostname to determine the default route. It then compares the response for each guess to that baseline. If there is a significant difference it considers that host to be a virtual host. As an addition, I added a -verify flag to check to see if the guessed response is different than requesting that domain over DNS. This can be used to ensure the results are only virtual hosts and not something you can already publicly talk to.
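The baseline-and-compare idea can be sketched in a few lines of Python. To be clear, this is not VhostFinder's code: the find_vhosts function, the injected fetch callable, the 0.8 similarity threshold, and the step that strips the hostname out of the body are all assumptions made for illustration.

```python
import difflib
import uuid
from typing import Callable, Iterable, List

# fetch(hostname) must return the body the target IP serves when the Host
# header is set to `hostname`. It is injected so the sketch stays
# independent of any particular HTTP library.
Fetch = Callable[[str], str]


def _normalised(body: str, host: str) -> str:
    # Error pages often echo the requested hostname; remove it so that the
    # echo alone doesn't count as a difference.
    return body.replace(host, "")


def find_vhosts(fetch: Fetch, guesses: Iterable[str], domain: str,
                threshold: float = 0.8) -> List[str]:
    """Return the guesses whose response differs from the default route."""
    # A random label is very unlikely to be configured, so its response
    # approximates the default route.
    random_host = f"{uuid.uuid4().hex}.{domain}"
    baseline = _normalised(fetch(random_host), random_host)

    found: List[str] = []
    for host in guesses:
        body = _normalised(fetch(host), host)
        similarity = difflib.SequenceMatcher(None, baseline, body).ratio()
        if similarity < threshold:  # assumed cut-off for "significant"
            found.append(host)
    return found
```

A real tool would likely also compare status codes and headers, and the right threshold depends on how noisy the default route is.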

Eventually the results continue and we see the following:

This indicates that fcsbusinessadmin.ford.com is a virtual host for this IP. If we test this out manually we can see that this is indeed correct:

That’s really strange because if you perform an nslookup on the domain there is no A record associated with it:

Why is this the case? Well Ford’s DNS team didn’t intend for fcsbusinessadmin.ford.com to be publicly facing. Due to the fact that it’s accessible in this PCF server, anyone with the knowledge of the hostname can manually set the Host header to the correct value to visit it. Alternatively, you can add an entry to your /etc/hosts file to set this mapping going forward:
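Both options are easy to picture in code. The following is a minimal Python sketch that only builds the text involved and doesn't send anything; raw_request and hosts_entry are hypothetical helper names, not part of any tool mentioned here.

```python
def raw_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request whose Host header names the vhost."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def hosts_entry(ip: str, hostname: str) -> str:
    """Format one /etc/hosts line that maps hostname to ip."""
    return f"{ip}\t{hostname}"


# The mapping discussed above: the hostname has no public A record, so the
# IP has to be supplied by hand.
print(hosts_entry("19.12.96.10", "fcsbusinessadmin.ford.com"))
```

With the hosts entry in place, a browser or any other client will connect to that IP and send the matching Host header on its own.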

From here you can go ahead and start testing the site normally for bugs.

Let’s look at the total results from VhostFinder:

I ended up with 384 unique virtual hosts associated with this IP that VhostFinder discovered using the Chaos dataset. That’s a nice list of additional targets to test considering DNS and certificate scraping didn’t work.

Wrapping Up

Virtual host enumeration is a great technique to have in your skillset. It’s often forgotten because it’s not as intuitive that virtual hosts exist compared to something like directory enumeration. In a network pentest this is crucial not to miss. If a company asks you to test a range of IPs, it’s possible there could be thousands of websites and APIs behind a single IP. If you forget to check for this you could be missing significant coverage.

PCF is a technology that is easily susceptible to virtual host enumeration. Not all deployment software works this way or responds as nicely. Load balancers can often be vulnerable to the same issue.

Try out different servers to see what works and what doesn’t. Ask bold questions, such as: would a cloud provider or CDN route domains in the same way? Perhaps you can find additional services where others have not.

On the defensive side, it would be a healthy checkup to ensure your routable domains match up with your DNS names. If not, figure out if a host really needs to be exposed. Don’t let DNS be a lie (security by obscurity isn’t a good operating model). In terms of mitigation, you can rate limit by IP to slow an attack. Most WAFs do provide protection for directory enumeration, but they typically do not provide protection against virtual host enumeration.

I hope you enjoyed this blog and learned a bit about virtual host enumeration and PCF. I’d love to hear if you have any cool stories (like tens to hundreds of findings at once) from testing this out.

Year End Review: Automation with a Bug Bounty Pipeline

By Wyatt | January 5, 2021

Bug Bounty and Vulnerability Disclosure Programs are growing at an alarming rate. At the end of 2020, I was monitoring over 800 companies across 3+ million domains on approximately half a million IPs. All of this data continues to be frequently updated as companies change their scope and assets. A pipeline provides passive income, while allowing for me to spend time working on other interesting projects and bugs.

Bug Bounty programs (BBPs) are companies that agree to pay researchers/testers for disclosed vulnerabilities. On the other hand, Vulnerability Disclosure Programs (VDPs) publicly state that they will accept bugs through a communication channel, but do not provide compensation. VDPs will sometimes give out swag or place researchers on a hall-of-fame list. In the bug bounty community, there are strong feelings on which types of programs researchers should spend their time on. In general, VDPs will have a less-hardened attack surface than BBPs because they offer no compensation and so attract less testing. VDPs will generally be more secure than companies not accepting vulnerabilities from security researchers.

The first step in aggregating bug bounty data is determining what programs to hack on. From there, the program scopes need to be frequently retrieved in a reliable fashion. Researchers need to determine if they will test on BBPs or VDPs and if there are certain industries they want to opt out of, such as blockchain-contracts.

Where Do I Find Companies Accepting Vulnerabilities?

Various companies that are looking for vulnerabilities can be found on platforms like HackerOne, Bugcrowd, Intigriti, YesWeHack, and through sources such as disclose.io. Invite-only platforms exist as well, but have various requirements that may or may not play well with an automation pipeline.

An example of Spotify’s Bug Bounty scope can be seen with items such as *.spotify.com and *.spotifyforbrands.com.
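A pipeline needs to decide whether a given hostname falls under scope items like these. The Python sketch below shows one way to do that check; in_scope is a hypothetical helper, and treating a wildcard as "subdomains only" (so the bare apex does not match) is an assumption, since programs word their scopes differently.

```python
from typing import Iterable


def in_scope(hostname: str, scope_items: Iterable[str]) -> bool:
    """Return True if hostname matches an exact or wildcard scope item."""
    host = hostname.strip().lower().rstrip(".")
    for item in scope_items:
        pattern = item.strip().lower()
        if pattern.startswith("*."):
            # "*.spotify.com" covers any subdomain of spotify.com. Matching
            # on ".spotify.com" keeps lookalikes such as evilspotify.com out.
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False
```

Matching on the dot-prefixed suffix rather than the bare domain is the detail that matters; a plain substring check would pull out-of-scope lookalike domains into the pipeline.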

Scraping Scopes

Bug bounty platforms provide a central repository for researchers to identify what companies are accepting vulnerabilities. They require companies to fill out their profile page with rules and scope in a semi-consistent fashion. These profiles on a common platform allow for scraping. Some of the platforms offer unauthenticated APIs, but there isn’t a great way to pull private program information without better APIs from the platforms.

One approach is to use a tool such as https://github.com/sw33tLie/bbscope, which requires the cookies for each of your HackerOne, Bugcrowd, and Intigriti sessions and will then try to parse out the scope on each program.

I wrote my own solution a few years ago that grabs all programs on each platform and tries to parse out the scope from what each company wrote. My solution is very ugly and requires consistent refinement, but it works.

A lazy solution could be to just download a list of subdomains from public sources such as ProjectDiscovery’s Chaos project.

It’s important to pull this data on a recurring basis. This will allow you to obtain new companies that can be tested on. It will ensure that you have coverage for new domains that companies add to their platform profile and can be used to remove items from scope when they are no longer applicable to a company. As a side note, it’s a good idea to grab each company’s status. If they are not currently accepting vulnerabilities, then there is no reason to spend the compute time or energy gathering data.
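The comparison between two pulls boils down to a set difference. Here is a minimal Python sketch, assuming each pull has already been reduced to a collection of scope strings; diff_scope is a hypothetical helper name.

```python
from typing import Dict, Iterable, Set


def diff_scope(previous: Iterable[str], current: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare two scope snapshots for one company."""
    old, new = set(previous), set(current)
    return {
        "added": new - old,    # new assets to send to recon
        "removed": old - new,  # assets that should no longer be tested
    }
```

The "added" set is what gets queued for recon, while the "removed" set tells the pipeline what to stop scanning.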

Once you get the scope and any other bits of metadata you wish to store you can start to filter and perform recon on a company.

Recon

Automated recon has boomed over the last few years. There are new scripts and tools being added every month that are worth testing out to see if they fit into your bug bounty pipeline. It’s overwhelming to look at complex flow charts that have been built out by some researchers and determine where to get started. Test out some tools and find what works for you. Those tools and components can always be changed as your methodology matures.

I start my reconnaissance by performing subdomain enumeration. This means I take companies with wildcard scopes and try to find all related subdomains.

I have found good results from using tools such as Amass, Subfinder, Sublist3r, and ProjectDiscovery’s Chaos. Many of these tools aggregate public and commercial APIs that pull out subdomains for a given domain.

Some researchers will perform DNS bruteforcing to identify additional subdomains using a list like Jason Haddix’s list in SecLists. I personally don’t perform DNS bruteforcing, but it’s a good candidate for improving a pipeline.

After identifying a large list of subdomains to test, that data should be filtered to only what is relevant. Any filtering that can be done upfront will save hours of time in the future. Running large lists of domains through scanners and tooling will greatly slow down and break pipelines. A good start is to check what is online. This status may be defined by DNS resolution or by the availability of some network service such as HTTP.
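As a small-scale illustration of the DNS-based check, the Python sketch below splits a list into hostnames that resolve and hostnames that don't. The function names are hypothetical, and the standard library resolver used here is far too slow for millions of domains; it only shows the shape of the step.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def system_resolve(hostname: str) -> Optional[str]:
    """Resolve via the system resolver; return None if there is no answer."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):  # lookup failure or invalid name
        return None


def split_by_resolution(
    hostnames: Iterable[str],
    resolve: Callable[[str], Optional[str]] = system_resolve,
) -> Tuple[Dict[str, str], List[str]]:
    """Return ({hostname: ip} for resolving names, [names that did not resolve])."""
    online: Dict[str, str] = {}
    offline: List[str] = []
    for host in hostnames:
        ip = resolve(host)
        if ip is None:
            offline.append(host)
        else:
            online[host] = ip
    return online, offline
```

The resolver is passed in as a parameter so it can be swapped for a faster bulk resolver, and the unresolved names are returned rather than thrown away in case they are useful later.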

It’s worth determining if the metadata (IPs and relevant ports) is valuable to keep in your inventory or if your pipeline should retrieve fresh data consistently. My preference is to store that data and periodically check stale records to see if they are still accurate. It’s excellent to be able to automatically query network data and associations when writing test cases.

If maintaining an inventory of IPs and ports is of interest, then DNS resolutions and network scans are a large portion of ongoing recon. Massdns is a frequent suggestion for checking to see if many domains are online. It requires a list of DNS resolvers to be updated regularly. Nmap is the most famous network scanner, however masscan, naabu, and rustscan offer faster results with reduced coverage/detection. It also depends what is of interest. There are 65535 TCP ports that can be scanned, which can take a significant amount of time. It may be valuable to scan some UDP ports as well. Network scanning can provide valuable information such as what type of software is running on a given port and can even be configured to run vulnerability scans against that service.

The gathered IPs can be analyzed with services such as Shodan to perform passive network scanning on your behalf. Additional metadata can be grabbed from these services such as the ISP and if it’s hosted on the cloud. The downside is that the rate limit for many of these services is slow and checking hundreds of thousands of IPs at a time can be a bottleneck.

You may decide to filter out subdomains and domains that are offline. This will certainly save space and time as you recheck this data; however, the offline entries can be useful to keep around. Unresolved subdomains can be used for virtual host fuzzing and as easy proofs of concept for Server-Side Request Forgery (SSRF) vulnerabilities that allow you to request a company’s internal content.

At minimum, common web services should be identified and tested in a bug bounty pipeline. Port 80 is commonly used for insecure traffic (http), while port 443 is used for TLS traffic (https). An extended number of ports such as 3000, 3001, 3002, 8000, 8080, and 8443 may be commonly seen as well. I highly recommend using httprobe to identify what domains are online and if they are accessible through https, http, or both.
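To show what a probe list for those ports might look like, here is a minimal Python sketch that expands a hostname into candidate URLs. It is not how httprobe works internally; probe_urls and DEFAULT_PORTS are hypothetical names, and trying both schemes on the non-standard ports is an assumption.

```python
from typing import Iterable, List

# The ports mentioned above.
DEFAULT_PORTS = (80, 443, 3000, 3001, 3002, 8000, 8080, 8443)


def probe_urls(hostname: str, ports: Iterable[int] = DEFAULT_PORTS) -> List[str]:
    """Expand one hostname into the URLs worth requesting."""
    urls: List[str] = []
    for port in ports:
        if port == 80:
            urls.append(f"http://{hostname}")
        elif port == 443:
            urls.append(f"https://{hostname}")
        else:
            # Non-standard ports can carry either plain or TLS traffic,
            # so try both schemes.
            urls.append(f"http://{hostname}:{port}")
            urls.append(f"https://{hostname}:{port}")
    return urls
```

Each URL can then be requested with a short timeout, and only the ones that answer need to be stored.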

Some other items that may be interesting or relevant to grab:

Screenshots
Wappalyzer Tags
DNS CNAMEs

Storing and Managing the Data

The scope from the platforms and the reconnaissance data can become quite large after some time. In an automated system, the data needs to be stored and processed automatically. It needs to be frequently queried and updated. The ideal scenario would be to use an API to manage this entirely or certain components.

There are a few main tables that need to store the appropriate data. I have a Company, Site, IP, and Vulnerability table in my database. I created a join table to map IPs to sites and vice versa, which allows me to be very efficient in translating this data. Some companies list out their public IP range as part of their scope, so another possibility would be to link IPs to companies. At the core of it, a simple database is required with a lot of data.
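For readers who want something concrete, here is a minimal sketch of such a layout using Python's built-in sqlite3 module. The column names, the choice of SQLite, and the open_inventory and sites_on_ip helpers are assumptions for illustration; my actual schema has more metadata on each table.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE company (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE site (
    id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES company(id),
    hostname TEXT UNIQUE NOT NULL
);
CREATE TABLE ip (
    id INTEGER PRIMARY KEY,
    address TEXT UNIQUE NOT NULL
);
-- Join table: one site can sit on many IPs and one IP can host many sites.
CREATE TABLE site_ip (
    site_id INTEGER NOT NULL REFERENCES site(id),
    ip_id INTEGER NOT NULL REFERENCES ip(id),
    PRIMARY KEY (site_id, ip_id)
);
CREATE TABLE vulnerability (
    id INTEGER PRIMARY KEY,
    site_id INTEGER NOT NULL REFERENCES site(id),
    title TEXT NOT NULL,
    severity TEXT
);
"""


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables in a new database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def sites_on_ip(conn: sqlite3.Connection, address: str) -> List[str]:
    """List every hostname mapped to the given IP via the join table."""
    rows = conn.execute(
        "SELECT site.hostname FROM site "
        "JOIN site_ip ON site_ip.site_id = site.id "
        "JOIN ip ON ip.id = site_ip.ip_id "
        "WHERE ip.address = ? ORDER BY site.hostname",
        (address,),
    )
    return [row[0] for row in rows]
```

The join table is what makes questions like "which sites share this IP?" a single query, in either direction.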

Any framework or language could be used to create this central database. The bulk of the effort is in the endpoints that process the data and requests. Some questions to ask are:

How do I want to interact with a company’s data?
Do I need aggregate or individual results?
How do I handle large HTTP responses?
How much metadata do I intend to store on each table?

A secondary consideration is how to trigger events and queue jobs. Cron works great to schedule time-based tasks. An example would be to fetch the scopes of all bug bounty companies at 5 PM daily and send any new data to your reconnaissance suite. Certain jobs, such as importing hundreds of thousands of records from sites like Yahoo, may take up all of your API’s CPU. You may want to consider storing those and processing them in batches.
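Batching itself is simple to implement. The generator below is a minimal Python sketch, with batches as a hypothetical helper name; the batch size you pick would depend on how much load your API can take.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # input exhausted
        yield chunk
```

Because it works on any iterable, the records can be streamed from a file or a database cursor without loading everything into memory first.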

Some bug bounty hunters will store this data in folders on a filesystem and stitch everything together with bash scripts. I prefer using an API as it provides more granularity on how I want to shape the data, it allows me to stay organized and consistent across companies, and it can easily be deployed to different systems.

The Fun Stuff – Finding Vulnerabilities

At this point in the journey there are some systems set up to continuously grab data and start working on it. That data is stored and can now be queried based on the attributes you have stored. This opens up a lot of exciting possibilities.

As part of an MVS (minimum viable scanner), the bug bounty pipeline needs to be able to pull a subset of the data it has collected, start to scan or fuzz it for vulnerabilities, and then report back positive results. It would be possible to auto-report these issues to companies; however, I discourage doing this as scanners can have false positives. Results should always be manually reviewed/exploited.

A strong baseline would be to implement functionality to run ProjectDiscovery’s Nuclei scanner on all of your domains on a rolling basis. This means that once it runs through your list, it starts over again. The scanner and templates are continuously updated by the community, which takes the work out of writing test cases for CVEs and common misconfigurations.

If you have read my Metasploit’s RPC API article, then another option could be to attempt to automate the community version of Metasploit against your targets. Metasploit provides the check command on a large number of modules that have a default port associated with them. Metasploit is regularly updated by Rapid7 and is another great way of attempting to automate without recreating the vulnerability signatures manually. A successful exploit will likely give you a shell, which will likely be a critical severity vulnerability.

Some other options would be to write your own modules on a regular basis or run other people’s scripts that can be incorporated into the pipeline. They can be efficiently tested by querying applicable network services or web application technologies instead of scanning all assets for a specific vulnerability.

Once a scanner has identified an issue it needs to report back to the central database. It’s great to aggregate the data in one place, but with fast-paced 0-days you need to know within seconds of identifying the vulnerability if you want to be first to report a bug. A notification system is a good idea to have in your pipeline, and it can be configured to get your attention depending on a variety of factors such as severity and confidence. Slack and Telegram provide free methods of sending notifications. AWS and Twilio can be used to send SMS messages. There are a lot of free and paid products that can be used to send notifications for a variety of events in your pipeline.
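One way to picture routing on severity and confidence is the Python sketch below. Everything in it is an assumption for illustration: the Finding fields, the severity names, the 0.8 confidence cut-off, and the three channel labels are made up, and the code only picks a channel rather than calling any real notification API.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    target: str
    title: str
    severity: str      # one of the SEVERITY_RANK keys
    confidence: float  # 0.0 (pure guess) to 1.0 (confirmed)


def choose_channel(finding: Finding) -> str:
    """Decide how loudly to announce a finding."""
    rank = SEVERITY_RANK.get(finding.severity.lower(), 0)
    if rank >= SEVERITY_RANK["high"] and finding.confidence >= 0.8:
        return "sms"     # interrupt me right away
    if rank >= SEVERITY_RANK["medium"]:
        return "chat"    # e.g. a chat message I will see soon
    return "digest"      # batch it into a periodic summary
```

Keeping the decision separate from the delivery code makes it easy to swap notification providers later without touching the rules.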

Building the Infrastructure

A large part of bug bounty hunting is to bootstrap a bunch of technologies together to achieve automation. Scripts have to be modular enough for you to be able to swap out tools and components. Some pieces in the pipeline are essential and are unlikely to be disrupted, however the code that glues it all together should allow for an easy upgrade.

Most of what I discussed in this article can be run for $5-10 in the cloud each month, which is $60-120 a year. That is cheaper than most security tools and it can be used to fund itself through earned bounties.

I’m a huge fan of Axiom, which allows you to create a bug hunting image on DigitalOcean, AWS, etc and spin up a new instance via command line in a matter of seconds. The base instances that cloud providers release are generally sufficient for any type of scanning and are fairly cheap. Axiom wraps the infrastructure code into a bundle of command line tools that allow for IP rotation, distributed scanning, and most importantly pay-for-what-you-use tooling. Customizing the base Axiom images is fairly easy and provides a great starting point.

Automated tooling like this allows for a researcher to spin up an instance or several for a few hours to run through a test suite and then delete all of the instances to prevent additional costs. It ensures that the servers are using the latest copy and that there isn’t any remnant data that might cause problems.

I like to use a queue to track the state of my scanners. As I said at the beginning of this article, I have a few million domains that I’m tracking. On a single instance, I likely can get through a few thousand scans in a couple of hours. Using software such as Redis, I can load all of my data into a job-specific queue and parse it with any programming language. I can pop the appropriate jobs from the queue for a given time-frame and then execute my tests. When the queue is empty, I can move on to the next test or decide to replenish the queue with fresh data from my database.
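The pop-and-scan loop looks roughly like the Python sketch below. To keep it self-contained, an in-memory collections.deque stands in for the Redis list, and run_jobs and the scan callable are hypothetical names; with Redis, the pop would be a call to the server instead.

```python
from collections import deque
from typing import Callable, Deque, List


def run_jobs(queue: Deque[str], scan: Callable[[str], bool], limit: int) -> List[str]:
    """Pop up to `limit` targets from the queue, scan each, return the positives."""
    positives: List[str] = []
    for _ in range(limit):
        if not queue:
            break  # queue drained: move on or replenish from the database
        target = queue.popleft()
        if scan(target):
            positives.append(target)
    return positives
```

The limit caps how much work a single run takes on, so the same loop can be started on several instances that each pull from the shared queue.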

When deciding on an infrastructure, spend the time to play around with the technology until you feel comfortable bootstrapping with it. Ensure that there is enough community support to incorporate software into your stack because you will run into problems.

A Million Forks in the Road

Bug Bounty pipelines are necessary for bug hunters who are looking to test against a breadth of companies. They can range from simple bash scripts to entire networks of bots and micro-services. Pipelines allow for regression and excellent methods of staying organized. They can easily surpass what any person can manually accomplish, yet they will struggle on certain types of bug classes that can’t be easily automated. In its first year, my pipeline has managed to pay for itself for the next 20+ years.

There are a handful of improvements that can be made to cover various technical domains and techniques in my pipeline. Each person gets to choose how they want to build their pipeline and what they want it to focus on. It’s easy to extend tables and increase data sources. Automatically ingesting new CVEs and vulnerabilities from the community is powerful and requires minimal effort. When you have started building or planning your pipeline, I encourage you to ensure that the code you write is modular, reinvent the wheel as little as possible, and iterate consistently.

Hit me up on Twitter @wdahlenb with stories about your Bug Bounty pipeline.