Patterns of the Internet: Probes from the IPv6 nebula

Introduction

I'm going to break down some of the recent work I'm doing on IPv6 and the unsolicited scans that arrive from the internet. I've identified a number of patterns and followed the data to a number of conclusions. Although this is a work in progress, I'm starting to unravel what appears to be a sophisticated network of IPv6 scanner and harvester hosts that work together to probe internet-connected devices by connecting to their temporary, randomly assigned IPv6 addresses, which are often regarded as private or hidden.

The Landscape

The IPv6 address space is vast. That fact has been pounded into our heads for years now, and is pointed out in almost every IPv6 document in existence. A home with a /64 prefix has quintillions of addresses (2^64, roughly 18 quintillion) available for it to use at will. However, fewer than 0.000000000000002% of those addresses ever get used. It’s a vast nebula of address space.

It’s so vast, in fact, that performing a traditional network scan, address by address, would take many lifetimes to complete. That option is basically off the table for any sort of network scanning or discovery effort.
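To put a rough number on "many lifetimes", here is a back-of-the-envelope sketch in Python. The probe rate of one million packets per second is purely an illustrative assumption, not a measurement from my logs.

```python
# Rough arithmetic for a brute-force sweep of a single /64.
# ASSUMPTION: one million probes per second is an illustrative rate only.
ADDRESSES_IN_SLASH_64 = 2 ** 64
PROBES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_sweep(addresses: int = ADDRESSES_IN_SLASH_64,
                   rate: int = PROBES_PER_SECOND) -> float:
    """Return how many years a sequential sweep would take at the given rate."""
    return addresses / rate / SECONDS_PER_YEAR


print(f"{ADDRESSES_IN_SLASH_64:,} addresses in a /64")
print(f"about {years_to_sweep():,.0f} years at {PROBES_PER_SECOND:,} probes/s")
```

Even at that generous rate, a single /64 works out to several hundred thousand years of scanning.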

Some assume that because the address space can’t be scanned, the 10-20 IP addresses that are used within your /64 are neatly tucked away, safe and secure from outside predators.

But that couldn’t be further from reality…

Observations of a New Trend

I’ve observed a steady increase in rather intelligent IPv6 probes/scans targeting individual random / privacy extension (PE) IPv6 addresses within my home network. There’s nothing special about my particular network – I would assume these scans target many home and corporate networks. The method of these scans is particularly clever though. In every case, the address being scanned is a randomized IPv6 address – a privacy address – allocated and used by a device for a short period of time, usually less than two days. These addresses should be private and hidden from attackers… but it seems they aren’t.

I was first alerted to the existence of these scans as I was parsing through my firewall logs in Splunk. My firewall logs go back several years, and I can account for every inbound and outbound packet going in and out of the home network. This lends itself well to digital forensics and packet sleuthing. And it is infinitely interesting to me to analyze the latest techniques used by network attackers.

Each week I see new forms of attacks and clever new ways of scanning. And any time there is a new vulnerability in the wild, I can see the spike in traffic to the affected port(s). Oftentimes I can see a significant spike before the disclosure is even made public, which is interesting.

In the case of the IPv6 scans against the PE addresses within my network, I can see the scans targeting specific hosts. Clearly the entity (probably a script) doing the scans has been provided with the exact 128-bit randomized IPv6 address of each host that it scans (it's certainly not guessing). The question is, who or what provides this information to the scanner? A quick search for any outbound packets to the actual scanner host from my network comes up empty – the scanner address is never contacted by my hosts directly – it must be getting the private addresses indirectly, from another host that I am connecting to.

Dissecting the Attacker's Methods

It’s reasonable to assume that a privacy-extension (PE) IPv6 address isn’t known until it comes into existence. Some time after it is used for the first time, one of the numerous hosts that it connects to is being a little evil and passing the address to the scanner.

This is the most reasonable explanation, I assume, although I can think of others as well (malware on a PC passing its IPv6 addresses to the scanner via IPv4, for example). Those are less likely, so I’ve been focusing on the former hypothesis.

So if the scanner is somehow provided with each private IPv6 address by one of the IPv6 sites that is accessed by the good host, then we can start to narrow down the culprit or culprits with the use of Splunk and that vast repository of packet logs that I mentioned.

Testing the Hypothesis

The first thing that came to mind is to build a set of all hosts which I connected to using each private IPv6 address that was scanned, and then look for intersections between those sets. In fact, I think this approach is promising, although it makes a number of assumptions if optimal results are to be expected. First and foremost, it assumes that only one remote host triggers the scanning, not multiple different remote hosts.

I think in order to proceed with this hypothesis we have to accept this assumption. To minimize its effect on the result we can work with smallish sample sets – say 30 days of data at a time.
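As a minimal sketch of that set-intersection idea (this is not how my logs are actually stored), the Python below assumes the outbound contacts have already been grouped per scanned privacy address. The function name and the sample addresses, all drawn from the 2001:db8::/32 documentation prefix, are hypothetical.

```python
def common_destinations(contacts_by_address: dict[str, set[str]]) -> set[str]:
    """Return the remote hosts contacted by every scanned privacy address."""
    contact_sets = list(contacts_by_address.values())
    if not contact_sets:
        return set()
    # A strict intersection only works under the single-harvester assumption.
    return set.intersection(*contact_sets)


# Hypothetical sample data: scanned PE address -> remote hosts it contacted.
contacts = {
    "2001:db8:1::a1": {"2001:db8:ffff::10", "2001:db8:ffff::20"},
    "2001:db8:1::b2": {"2001:db8:ffff::20", "2001:db8:ffff::30"},
    "2001:db8:1::c3": {"2001:db8:ffff::20", "2001:db8:ffff::40"},
}

print(common_destinations(contacts))
```

If more than one remote host can trigger scanning, the strict intersection can come back empty, which is why the single-harvester assumption matters here.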

So over the last 30 days, I’ve seen the following scans against privacy extension addresses on my network (all were of course blocked at the firewall):

171 unique sources sent ICMPv6 ping requests. Of those:
Periodic bursts of ICMPv6 ping requests, with a gradually increasing hop-limit field. Purpose: unknown!? Perhaps some sort of ranging or network depth probe?
12 unique sources (about 5,000 packets) sent straight-up TCP scans toward my network, with the intention of probing specific services.
Shodan – port scanning. This isn’t particularly surprising except that I don’t always know where they are harvesting my IPv6 PE addresses from.

Splunk Time!

To expose the hosts which are harvesting my private IPv6 addresses I run variations of the following query in Splunk:

[ search IN=cm DST=*:* SYN | top DST | eval SRC=DST | table SRC ] | chart limit=100 dc(SRC) by DST

Which does a lot of things… It first builds a list of my internal IPv6 addresses that have been targeted by probes. It then feeds that into the outer query as a search filter, but first it renames the DST field to SRC so that the outer query finds all outbound connections made by those private addresses.

Then, as a sort of intersection of sets, it performs a distinct count of the privacy addresses that visited each destination. The distinct count represents the number of sets that each destination appears in.

In essence, the query builds those sets of externally contacted sites that I mentioned earlier and approximates the intersection of those sets by showing the top matches first. In other words, sort the results in descending order by distinct count and you can uncover the hosts that were contacted by every privacy address that was probed.

Stated even more simply, it identifies the most likely suspects that may have triggered the scans.
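For readers without Splunk, the same ranking can be sketched in Python. This assumes the firewall log has already been reduced to (source, destination) pairs for outbound connections. The function name and sample addresses are hypothetical, and this is an approximation of the idea rather than a translation of the query.

```python
from collections import defaultdict


def rank_suspects(outbound: list[tuple[str, str]],
                  probed: set[str]) -> list[tuple[str, int]]:
    """Count, per destination, how many distinct probed addresses contacted it."""
    sources_by_destination = defaultdict(set)
    for src, dst in outbound:
        if src in probed:  # only look at addresses that were later scanned
            sources_by_destination[dst].add(src)
    counts = [(dst, len(srcs)) for dst, srcs in sources_by_destination.items()]
    # Highest distinct count first; ties broken by address for stable output.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Hypothetical sample data from the 2001:db8::/32 documentation prefix.
probed_addresses = {"2001:db8:1::a1", "2001:db8:1::b2"}
outbound_log = [
    ("2001:db8:1::a1", "2001:db8:ffff::10"),
    ("2001:db8:1::a1", "2001:db8:ffff::20"),
    ("2001:db8:1::b2", "2001:db8:ffff::20"),
    ("2001:db8:1::b2", "2001:db8:ffff::20"),  # repeat contact, counted once
    ("2001:db8:1::zz", "2001:db8:ffff::10"),  # never probed, ignored
]

for destination, count in rank_suspects(outbound_log, probed_addresses):
    print(destination, count)
```

Destinations whose count equals the number of probed addresses are the strongest candidates, although popular sites that every device visits will land there too.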

It isn’t a perfect query yet. I’m working on that. It has a high probability of false positives, especially for sites that are always contacted by each address.

Observations

I performed a few honeypot probes against those addresses, each using a unique IPv6 source address, to see if I could trigger a response scan. So far I’ve tried about 50 and none resulted in a response scan. I have a hunch that I need to do more than a simple TCP connect to the host in order for it to harvest my IP and pass it to the scanner.
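Picking a unique source address for each probe can be sketched as follows. This only generates candidate addresses inside a /64. It does not assign them to an interface or send any traffic, and the prefix shown is the documentation prefix rather than my own. The function name is hypothetical.

```python
import ipaddress
import secrets


def random_source_addresses(prefix: str, count: int) -> list[ipaddress.IPv6Address]:
    """Generate `count` distinct random addresses inside the given /64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    chosen = set()
    while len(chosen) < count:
        interface_id = secrets.randbits(64)  # random 64-bit interface identifier
        if interface_id == 0:
            continue  # skip the all-zeros identifier
        chosen.add(network.network_address + interface_id)
    return sorted(chosen)


# ASSUMPTION: 2001:db8:1:2::/64 is a placeholder documentation prefix.
for address in random_source_addresses("2001:db8:1:2::/64", 5):
    print(address)
```

Each generated address would still have to be configured on the probing host before it could be used as the source of a connection.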

This is where I’m at now. Brainstorming a new hypothesis that I can test.

Conclusions

Scans are not launched by the same IP addresses that harvest the IPv6 addresses, so the scanner network consists of numerous nodes.
Scans are triggered once and not repeated for days to weeks - the scanner or its controller has a persistent storage database of what has been scanned.
Not every PE address gets scanned. Perhaps because a very specific malware site or script has to run in the browser to contact the harvester host.
Triggering a scan doesn't seem to be as simple as just opening an HTTP connection to the harvester. This conclusion is supported only by circumstantial evidence: the 50 probes that I sent to the suspected harvesters didn't result in any response scans.
Many of the suspected harvesters are owned by Google / Facebook, so perhaps this is actually some sort of pet project to collect internet statistics of connecting devices, or a way of detecting bots.
In the case of scans originating from Shodan scanner hosts, at least one set of harvesters resides on the public torrent networks (downloading a CentOS ISO from a torrent results in reverse scans from Shodan).

Also Noteworthy

These scans are probably yielding good results for the attackers, for many reasons, including:

Devices with IPv6 addresses often don't have firewalls!!! Android phones for example.
Home gateway routers (eRouters) don't typically block inbound IPv6 traffic by default.

Thursday, October 22, 2015
