Five Critical Data Source Considerations for External Threat Hunting

Aug 5, 2020 | Blog

Strong intelligence starts with good sources, and when it comes to gaining the most context around suspicious events or adversaries of interest, nothing beats external hunting.

Most current threat hunting is rightfully focused on hunting inside the firewalls of an enterprise, but security teams often cannot reach definitive conclusions because of large-scale visibility gaps and a lack of effective log aggregation.

Many enterprises take years to mature a threat hunting team within a security operations center. Policy and business considerations between human resources, legal, IT, and engineering have to evolve and the business has to make the budget work.

Even then, it’s often difficult to prove that a malicious actor isn’t present on a network due to visibility and coverage gaps. As a result, tremendous effort goes into processes and procedures that document these gaps so that the security engineering and IT teams can remediate them over time.

While many enterprises are constantly weighing the costs and benefits of storing, aggregating, and analyzing their own data to conduct internal threat hunting, they should understand that external threat hunting can drastically increase the context available to internal threat hunting, allowing faster detection and response.

Network logs (firewall, proxy, DNS, mail) and endpoint logs (system events and applications like web servers and VPN hosts) are the most important data to start with when examining what’s happening outside of the network. With that basic foundation, five data sources are critical for conducting effective external threat hunting:

Passive DNS

Passive DNS is a system of record that stores DNS resolution data for a given location, record, and time period. This historical resolution data set allows analysts to view which domains resolved to an IP address and vice versa. This data set allows for time-based correlation based on domain or IP overlap.

Many of the IPs and hosts that surface this way are controllers managed by malicious actors. Host names and IP addresses that appear in internally collected logs can be run through passive DNS to identify additional host names and IP addresses that a network defender might not have seen during initial analysis of those logs.
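To make the pivot concrete, here is a minimal sketch in Python. It assumes passive DNS observations are already available as simple in-memory records; the Resolution type, the pivot function, and the sample domains and addresses are all hypothetical and do not reflect any particular provider's API.

```python
from datetime import date
from typing import NamedTuple, Optional


class Resolution(NamedTuple):
    """One passive DNS observation: a domain resolved to an IP during a period."""
    domain: str
    ip: str
    first_seen: date
    last_seen: date


def pivot(seeds, records, hops=1, start: Optional[date] = None,
          end: Optional[date] = None):
    """Return indicators linked to the seeds through shared resolutions.

    Each hop moves from domains to the IPs they resolved to, and from IPs
    to the domains that resolved to them. The optional start/end window
    keeps only observations that overlap that time period.
    """
    known = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        found = set()
        for r in records:
            # Skip observations that fall entirely outside the time window.
            if start is not None and r.last_seen < start:
                continue
            if end is not None and r.first_seen > end:
                continue
            if r.domain in frontier:
                found.add(r.ip)
            if r.ip in frontier:
                found.add(r.domain)
        frontier = found - known
        known |= frontier
    return known - set(seeds)


# Made-up sample data using documentation-reserved names and addresses.
RECORDS = [
    Resolution("c2.bad.example", "203.0.113.10", date(2020, 1, 1), date(2020, 3, 1)),
    Resolution("update.bad.example", "203.0.113.10", date(2020, 2, 1), date(2020, 4, 1)),
    Resolution("update.bad.example", "198.51.100.7", date(2020, 4, 2), date(2020, 6, 1)),
    Resolution("benign.example", "192.0.2.50", date(2020, 1, 1), date(2020, 6, 1)),
]

# An IP seen in internal logs becomes the seed for the pivot.
related = pivot({"203.0.113.10"}, RECORDS, hops=2)
```

With two hops, the seed IP leads to both domains that resolved to it and then to the second IP one of those domains later moved to, which is the kind of infrastructure a defender would not find from internal logs alone.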

Global Netflow

Internally, IT professionals use the netflow protocol to analyze network traffic and determine its point of origin, destination, volume, and paths on the network. Imagine being able to cross-reference internally collected data, such as application and firewall logs, against data of a similar type collected outside the enterprise.

External netflow is important because it allows for storing large amounts of traffic data over time without the large storage requirement of full-packet capture. Building on the passive DNS example above, a network defender might next want to know who is managing the controllers to gain more context on the “who” and the “why.”

They should want to know what kind of administrative traffic is flowing to those controllers, such as SSH or RDP connections. This type of analysis is how defenders can identify consistent APT or crime activity, such as persistent beaconing over common or odd ports at consistent time intervals. It can also be used to root out mistakes and can lead to detecting an actor’s exploitation, survey/implant, sustained collection, and analysis infrastructure.
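As a rough illustration of spotting beaconing at consistent intervals, the sketch below groups flow records by source, destination, and port, then flags groups whose gaps between connections are nearly constant. The record layout, the find_beacons name, and the thresholds are assumptions chosen for the example; real netflow exports carry more fields and usually need tuning.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_beacons(flows, min_events=5, max_jitter=0.1):
    """Flag (src, dst, port) groups that connect at near-constant intervals.

    flows is an iterable of (src, dst, dst_port, timestamp_seconds) tuples.
    A group is flagged when the standard deviation of its inter-arrival
    gaps, divided by the mean gap, is at or below max_jitter.
    """
    by_group = defaultdict(list)
    for src, dst, port, ts in flows:
        by_group[(src, dst, port)].append(ts)

    beacons = []
    for key, times in by_group.items():
        if len(times) < min_events:
            continue  # too few connections to judge regularity
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        avg = mean(gaps)
        if avg > 0 and pstdev(gaps) / avg <= max_jitter:
            beacons.append((key, avg))
    return beacons


# Made-up flows: one host checks in roughly every 300 seconds over 443,
# another makes irregular SSH connections.
FLOWS = (
    [("192.0.2.5", "203.0.113.10", 443, t) for t in (0, 300, 601, 899, 1200, 1500)]
    + [("192.0.2.8", "198.51.100.7", 22, t) for t in (0, 10, 500, 520, 3000, 3100)]
)

suspects = find_beacons(FLOWS)
```

Only the first group is flagged here, because its gaps vary by a second or two around a 300 second mean, while the SSH connections are spread irregularly. Actors often add deliberate jitter, so a tight threshold like this one will miss some real beacons.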

Mobile Data

Mobile data and adtech data collection is used to target advertisements to users through mobile apps and browser data. This data can sometimes contain personal information, but more often than not it contains a unique advertising identifier that identifies an individual not by name but by characteristics and history.

Some of the characteristics associated with your advertising ID include WiFi networks that you have connected to, IP addresses the device has been assigned, geographical location, model of phone or computer, browser version, and in some cases deeper historical data centered around purchasing interests. Using this data, a hunter can identify a single device by IP or location and follow that device chronologically to determine what it did from different addresses and networks.
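The "identify by IP, then follow chronologically" step can be sketched in a few lines of Python. The Sighting record, its fields, and the sample values are hypothetical; actual adtech data sets vary widely in schema and are subject to legal and privacy constraints.

```python
from typing import NamedTuple


class Sighting(NamedTuple):
    """One observation of an advertising ID (hypothetical schema)."""
    ad_id: str
    ts: int        # observation time, seconds since some epoch
    ip: str
    network: str   # WiFi network name reported with the observation


def devices_at_ip(sightings, ip):
    """Return the advertising IDs observed at a given IP address."""
    return {s.ad_id for s in sightings if s.ip == ip}


def timeline(sightings, ad_id):
    """Return one device's sightings in chronological order."""
    return sorted((s for s in sightings if s.ad_id == ad_id), key=lambda s: s.ts)


# Made-up sample data using documentation-reserved addresses.
SIGHTINGS = [
    Sighting("ad-1111", 100, "203.0.113.10", "cafe-wifi"),
    Sighting("ad-1111", 50, "198.51.100.7", "home-wifi"),
    Sighting("ad-2222", 120, "192.0.2.50", "office-wifi"),
    Sighting("ad-1111", 200, "192.0.2.99", "hotel-wifi"),
]

# Start from an IP of interest, then follow each device found there.
for device in devices_at_ip(SIGHTINGS, "203.0.113.10"):
    for s in timeline(SIGHTINGS, device):
        print(device, s.ts, s.ip, s.network)
```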

Aggregation of Scanning Traffic

One of the main issues with monitoring traffic hitting external applications and devices is the sheer number of systems on the internet that are constantly scanning for open services and crawling applications for indexing. A cursory look at any firewall or application log without any sort of filtering can be overwhelming and time-consuming.

This is where services that filter the noise from known scanning hosts and highlight more focused probing of devices and applications are very useful. These services monitor scanning activity using a number of listening posts on the internet as well as aggregated threat intelligence.

They then use data from these listening posts and threat intelligence to help identify hosts that are of little interest and can be filtered from logs when looking for targeted probing and attack infrastructure.
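A minimal sketch of that filtering step, assuming the list of known scanning hosts has already been exported from such a service as plain IP addresses or CIDR ranges. The split_noise function, the log entry fields, and the sample values are illustrative assumptions, not any vendor's format.

```python
import ipaddress


def split_noise(log_entries, scanner_networks):
    """Separate log entries from known scanners from everything else.

    log_entries is a list of dicts with a "src" IP address string.
    scanner_networks is a list of IP addresses or CIDR ranges attributed
    to known internet-wide scanners.
    """
    networks = [ipaddress.ip_network(n) for n in scanner_networks]
    noise, interesting = [], []
    for entry in log_entries:
        addr = ipaddress.ip_address(entry["src"])
        if any(addr in net for net in networks):
            noise.append(entry)
        else:
            interesting.append(entry)
    return noise, interesting


# Made-up sample data using documentation-reserved addresses.
KNOWN_SCANNERS = ["198.51.100.0/24", "192.0.2.5"]
LOG = [
    {"src": "198.51.100.20", "path": "/"},
    {"src": "192.0.2.5", "path": "/robots.txt"},
    {"src": "203.0.113.77", "path": "/admin/login"},
]

noise, interesting = split_noise(LOG, KNOWN_SCANNERS)
```

What remains in interesting is the much smaller set of sources worth a closer look for targeted probing. The filtered entries are kept in noise rather than discarded, since a host on a scanner list is of little interest by default but not guaranteed to be harmless.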

Open Source Media

Open source financial, geopolitical, and technical social media data often add valuable context to external threat hunting. Depending on the scenario, we’ve used weather patterns, geopolitical world events, foreign holidays and rituals, and technical and financial terms around bitcoin to provide additional details on the “why” of certain attacks or to help make sense of the timing of certain technical observations.

While darknet data can be useful in this regard, scraping this type of data adds a lot of noise unless it’s targeting a very specific requirement. We typically find darknet data useful only after an incident has taken place, when it can surface leads on a particular operator.