Episode 32 — Run link analysis that reveals hidden clusters

This episode focuses on learning how to connect dots that do not initially look related and on turning those connections into insight. Many investigations stall at the point where individual indicators are understood but their relationships remain unclear. You may know that a domain is suspicious and that an address appears elsewhere, yet the broader structure stays hidden until you deliberately look for links. Link analysis is the discipline that helps you surface those relationships in a way that can be examined, explained, and defended. The aim is not to build the biggest possible map, but to reveal meaningful clusters that show coordination, shared control, or repeated behavior over time. When used carefully, link analysis transforms scattered observations into an intelligible network.

At its core, link analysis connects seemingly unrelated data points through shared attributes such as email addresses, phone numbers, infrastructure elements, or identifiers embedded in activity. A single data point often tells you very little by itself, but when that same attribute appears across multiple artifacts, its significance grows. The process begins by identifying attributes that recur and asking whether their recurrence is likely accidental or intentional. This approach is powerful because attackers reuse components, either out of convenience or necessity, and those reuse patterns leave traces. By focusing on shared attributes, you move from asking what happened in one case to asking who or what is behind multiple cases. This shift in perspective is what allows link analysis to reveal clusters rather than isolated events.

One practical example is using a shared phone number found in registration data to locate other malicious domains. A phone number is not commonly reused at scale by unrelated actors, especially when it appears alongside suspicious behavior. When you find the same number tied to multiple registrations, you gain a lead that suggests common control or at least common origin. From there, you can examine whether those domains share hosting providers, naming conventions, or activity patterns that reinforce the connection. The phone number alone is not proof, but it becomes a strong pivot point when supported by additional evidence. This is how link analysis grows incrementally, with each confirmed link adding weight to the emerging picture.
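The pivot described above can be sketched in a few lines. This is a minimal illustration with hypothetical registration records and field names, not a real registration-data API: indexing artifacts by a shared attribute turns any recurring phone number into a pivot point.

```python
from collections import defaultdict

# Hypothetical registration records; the field names are illustrative.
records = [
    {"domain": "login-verify[.]com", "phone": "+1-555-0101", "host": "AS111"},
    {"domain": "secure-update[.]net", "phone": "+1-555-0101", "host": "AS111"},
    {"domain": "example-blog[.]org",  "phone": "+1-555-0199", "host": "AS222"},
]

# Index domains by registrant phone so any number becomes a pivot point.
by_phone = defaultdict(list)
for r in records:
    by_phone[r["phone"]].append(r["domain"])

# Pivoting on the suspicious number surfaces every domain tied to it.
cluster = by_phone["+1-555-0101"]
```

From the resulting cluster, the next step is exactly what the text describes: check whether those domains also share hosting, naming conventions, or activity patterns before treating the link as confirmed.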

One risk to manage carefully is the temptation to build a map so complex that it becomes impossible to explain clearly. Link analysis tools can generate sprawling graphs that look impressive but communicate very little. A dense web of nodes and edges may capture everything you found, but if you cannot explain why specific links matter, the analysis loses value. Complexity should serve understanding, not replace it. A good practice is to start with a small set of high-confidence links and expand only when new connections add explanatory power. If a node or edge does not contribute to a clear narrative about behavior or control, it may belong in notes rather than in the primary graph. Clarity is a feature, not a limitation.

To preserve that clarity, focus on the most unique attributes that are unlikely to be shared by chance. Attributes such as default hosting providers or widely used services are common and therefore weak signals. Unique registration details, uncommon configuration choices, or rare behavioral markers are much stronger. The rarity of an attribute increases its value as a linking element because coincidence becomes less plausible. When you select attributes deliberately, you reduce the risk of drawing false connections and increase confidence in the clusters you identify. This selectivity also makes your work easier to review, because others can see why a particular link deserves attention. Link analysis is strongest when it is selective rather than exhaustive.
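One way to make this selectivity concrete is to score each attribute by how rare it is across your corpus, so common attributes are automatically down-weighted as linking evidence. The data below is hypothetical and the scoring is a simple sketch, not a standard formula:

```python
from collections import Counter

# Illustrative attribute observations across a corpus of artifacts.
observations = [
    "ns1.bigdns.example",          # popular name server, seen everywhere
    "ns1.bigdns.example",
    "ns1.bigdns.example",
    "ns1.bigdns.example",
    "registrant-x@example.test",   # rare registrant email
]

counts = Counter(observations)
total = len(observations)

def rarity(attr: str) -> float:
    """Score in (0, 1]: rarer attributes score higher as linking evidence."""
    return 1.0 - counts[attr] / total
```

Here the popular name server scores low and the rare registrant email scores high, which matches the intuition that coincidence is less plausible for rare attributes.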

Visualization plays an important role in this process, because it helps you and others see relationships that are hard to grasp in text. Imagine a web where every line represents a confirmed connection between two entities and every node represents something you can describe clearly. In this web, thickness or direction can indicate strength or type of relationship, but only if those meanings are consistent. Visualization should make patterns stand out, such as dense areas where many links converge or long chains that suggest progression over time. If the visual becomes cluttered, it is a signal that the scope may need to be reduced. The purpose of visualization is to support reasoning, not to replace it.
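One lightweight way to keep visual meanings consistent is to generate the graph description programmatically, so an edge's weight always maps to the same visual property. This sketch emits Graphviz DOT text with hypothetical entities, where thicker lines consistently mean stronger links:

```python
# Each edge is (entity_a, entity_b, strength); data is hypothetical.
edges = [
    ("phone:+1-555-0101", "domain:login-verify[.]com", 3),
    ("phone:+1-555-0101", "domain:secure-update[.]net", 3),
    ("domain:login-verify[.]com", "ip:203.0.113.7", 1),
]

# Emit Graphviz DOT; penwidth encodes link strength the same way every time.
lines = ["graph linkmap {"]
for a, b, weight in edges:
    lines.append(f'  "{a}" -- "{b}" [penwidth={weight}];')
lines.append("}")
dot = "\n".join(lines)
```

Because the mapping from strength to thickness lives in one place, every rendered graph uses it identically, which is exactly the consistency the visualization needs.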

A helpful metaphor is to think of link analysis as a spider web connecting different parts of an investigation. The web is not built all at once, and it is not perfectly symmetrical. It grows where there is structure to support it and remains sparse where evidence is thin. When something moves in one part of the web, the vibration can be felt elsewhere, revealing relationships you might not have noticed otherwise. This metaphor emphasizes sensitivity and balance. Too many weak threads make the web meaningless, while too few threads leave important connections undiscovered. The analyst’s role is to decide which threads are strong enough to include.

As clusters emerge, pay attention to hubs where many indicators converge on a single entity or attribute. These hubs often suggest primary actors, shared services, or central points of control. A hub might be an email address used across multiple registrations, a name server that appears repeatedly, or an infrastructure element that ties many domains together. Identifying hubs helps you understand scale and coordination, because they often represent choices that are harder for an attacker to change quickly. At the same time, hubs require careful validation, because some hubs exist simply because a service is popular. The distinction between popularity and control is critical and must be supported by evidence beyond mere frequency.
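A simple way to surface hubs is to count each node's degree, the number of links touching it, and flag nodes above a threshold. The entities and the threshold below are illustrative:

```python
from collections import Counter

# Undirected links between entities; a hub is a node with unusually many edges.
links = [
    ("mail:ops@example.test", "domain:a[.]com"),
    ("mail:ops@example.test", "domain:b[.]net"),
    ("mail:ops@example.test", "domain:c[.]org"),
    ("domain:a[.]com", "ip:198.51.100.9"),
]

degree = Counter()
for a, b in links:
    degree[a] += 1
    degree[b] += 1

# Flag nodes whose degree crosses a tunable, illustrative threshold.
hubs = [node for node, d in degree.items() if d >= 3]
```

A high degree alone does not distinguish control from popularity, so each flagged hub still needs the validation step the text describes.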

One of the strengths of link analysis is its ability to show scale over time, not just in a single snapshot. By examining links across weeks or months, you can see whether an attacker’s infrastructure expands, contracts, or shifts direction. This temporal dimension adds depth to your understanding, because it reveals persistence and adaptation. An infrastructure cluster that appears repeatedly over many months suggests sustained effort, while a short-lived cluster may indicate opportunistic activity. Tracking these patterns helps you move from incident response to threat understanding. It also supports forecasting, because past expansion patterns can hint at future behavior.
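The persistence check described above can be sketched by tracking which months each cluster was observed and flagging clusters seen across several of them. Cluster names, dates, and the three-month threshold are all hypothetical:

```python
from collections import defaultdict

# (cluster_id, "YYYY-MM") sightings; data is hypothetical.
sightings = [
    ("cluster-A", "2024-01"), ("cluster-A", "2024-02"), ("cluster-A", "2024-05"),
    ("cluster-B", "2024-03"),
]

# Collect the distinct months in which each cluster was observed.
months_seen = defaultdict(set)
for cluster, month in sightings:
    months_seen[cluster].add(month)

# Sustained clusters recur across several months; short-lived ones do not.
persistent = {c for c, months in months_seen.items() if len(months) >= 3}
```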

Link analysis is also useful for uncovering common infrastructure shared by different malware families. When distinct tools or payloads point back to overlapping domains, servers, or registration details, it suggests shared operators or shared support services. This does not mean the malware families are identical or part of the same campaign, but it does suggest coordination or reuse at some level. These insights can challenge assumptions and prompt deeper questions about how operations are structured. They also help defenders prioritize monitoring and response, because shared infrastructure represents a common dependency that can affect multiple threats. As always, these conclusions must be framed carefully and supported by multiple links.
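The overlap check behind this kind of finding is essentially set intersection: collect the infrastructure observed per family and look at what they share. Family names and identifiers below are hypothetical:

```python
# Infrastructure observed per malware family (hypothetical identifiers).
family_infra = {
    "family-x": {"ip:203.0.113.7", "domain:cdn-sync[.]net", "ns:ns9.example"},
    "family-y": {"ip:203.0.113.7", "domain:other[.]org", "ns:ns9.example"},
}

# Shared elements hint at common operators or support services, not identity.
shared = family_infra["family-x"] & family_infra["family-y"]
```

As the text cautions, overlap like this suggests coordination or reuse at some level; it does not by itself show the families belong to one campaign.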

Maintaining a high standard for what constitutes a link is essential to avoid making false connections. A link should represent a relationship that is meaningful and defensible, not just convenient. This means confirming that the shared attribute is not widely used and that its presence is not easily explained by chance. It also means being willing to discard links that do not hold up under scrutiny, even if they initially seemed promising. High standards protect the integrity of the analysis and the credibility of the analyst. They also make peer review more productive, because discussions focus on evidence rather than on speculation.

One specific check that helps maintain those standards is examining whether a link is based on a common service such as a public proxy or widely used platform. Many attackers route activity through shared services that thousands of unrelated users also use. Treating those services as direct links between activities can inflate clusters artificially and mislead analysis. When you encounter such services, the appropriate response is usually to note them as context rather than as connectors. The question to ask is whether the service use shows coordination beyond simple access. If not, the link may be too weak to include in the main graph.
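This check can be automated with a simple denylist of widely shared services: candidate links that run through them are recorded as context, while only the rest enter the main graph. The service names and link data are illustrative:

```python
# Widely shared services should be kept as context, not drawn as connectors.
COMMON_SERVICES = {"ip:proxy.popular.example", "ns:ns1.bigdns.example"}

candidate_links = [
    ("domain:a[.]com", "ip:proxy.popular.example"),   # shared public proxy
    ("domain:a[.]com", "mail:ops@example.test"),      # rare registrant email
]

# Only links not explained by a common service go into the main graph.
graph_links = [(a, b) for a, b in candidate_links if b not in COMMON_SERVICES]
context_notes = [(a, b) for a, b in candidate_links if b in COMMON_SERVICES]
```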

Combining technical indicators with registration data often produces a more robust link analysis graph. Technical indicators such as infrastructure and behavior show how activity unfolds, while registration data can suggest who controls or provisions the resources. When these two perspectives align, confidence increases. For example, domains that share registration details and exhibit similar network behavior present a stronger case for common control than either signal alone. This combination also helps balance strengths and weaknesses, because technical data can be ephemeral while registration data can persist longer. Together, they provide a more stable foundation for analysis.
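The corroboration idea can be expressed as a small scoring rule: a link earns higher confidence when independent signal types agree. The tiers and their names are an illustrative convention, not a standard scale:

```python
# Corroboration sketch: independent signal types raise link confidence.
def link_confidence(shared_registration: bool, shared_behavior: bool) -> str:
    if shared_registration and shared_behavior:
        return "high"    # two independent perspectives align
    if shared_registration or shared_behavior:
        return "medium"  # single signal: keep, but validate further
    return "low"         # no support: leave out of the main graph
```

Keeping the rule explicit like this makes review easier, because colleagues can challenge the thresholds rather than guess at them.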

Documentation remains critical throughout this process, because link analysis often involves many small decisions that are easy to forget later. Recording why a link was included, what evidence supports it, and what alternatives were considered allows others to retrace your reasoning. This transparency is especially important when graphs are shared with stakeholders who were not involved in the investigation. Documentation also supports iteration, because you can update or remove links as new information emerges without losing context. In this way, link analysis becomes a living artifact rather than a static picture.
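One lightweight way to make this documentation habitual is to store the justification alongside each link itself, so the graph carries its own provenance. The record structure and example values below are hypothetical:

```python
from dataclasses import dataclass, field

# A link record that keeps its own justification, so graphs stay reviewable.
@dataclass
class Link:
    source: str
    target: str
    evidence: str                                      # why it was included
    alternatives: list = field(default_factory=list)   # explanations considered

link = Link(
    source="domain:login-verify[.]com",
    target="phone:+1-555-0101",
    evidence="registrant phone shared across 2 registrations; rare number",
    alternatives=["registrar data-entry reuse (rejected: different registrars)"],
)
```

Because the evidence and rejected alternatives travel with the link, removing or revising it later loses no context.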

As you practice, you will notice that link analysis changes how you approach new indicators. Instead of asking only whether an indicator is malicious, you begin asking what it might connect to. This mindset encourages curiosity while maintaining discipline, because every new connection must earn its place in the graph. Over time, you develop an intuition for which attributes are likely to yield meaningful links and which are not. That intuition speeds up analysis and improves quality, but it remains grounded in evidence and review.

The ultimate value of link analysis is not the graph itself, but the understanding it supports. A well built graph helps you explain coordination, scale, and persistence in ways that lists of indicators cannot. It also helps teams align, because everyone can see the same structure and discuss it using shared reference points. When disagreements arise, they can be resolved by examining specific links rather than debating impressions. This shared view strengthens both analysis and communication.

Conclusion: Link analysis reveals connections, so pick two indicators and find their common link. By deliberately selecting attributes that are unique, validating links carefully, and keeping your graphs clear and explainable, you uncover clusters that would otherwise remain hidden. Start small, document each step, and expand only when new connections add real insight. When you combine technical indicators with registration data and maintain high standards for inclusion, link analysis becomes a powerful tool rather than a confusing picture. Take two indicators from a recent case, look for a defensible common attribute, and trace what else connects to it, because that simple exercise is how hidden structure begins to surface.
