In Defense of the Dark Arts

It can't rain all the time.

Information becomes intelligence when it is relevant, timely, and useful. While the art of collecting intelligence has evolved dramatically since Sun Tzu advised 'know thy enemy' over two millennia ago, one constant has remained—counterintelligence. Whether your goal is to identify and counter terrorist threats, defend against hackers, or gather information as a journalist or academic, it's crucial to understand the variety of ways you may expose yourself and your research while online. A valuable technical access can quickly evolve into a well of poison and deceit when an adversary attributes your activity. In a worst-case scenario, this can even manifest as real-world threats to your personal safety. So, how do we carry on safely, ensuring our collection stays relevant, timely, and useful, while avoiding the many pitfalls that exist online?

First, everyone makes mistakes. From the most skilled hacker to the newest undergrad, this is a universal truth. I've recently noticed an increasing number of security vendors offering managed attribution platforms, but I can't help wondering whether they're helping or hurting. There is intangible value to be gained through the thoughtful design of a managed attribution platform, value that is less likely to materialize when a platform is bought, not built. It's impossible to count the number of times an organization has blown its budget on a sophisticated managed attribution platform, only to have a well-intentioned employee, in the midst of a professional crisis, log in to a research account from a personal device or network the very next day. Some context is in order.

The road to hell is paved with good intentions...

Attribution is the process of assigning an observed behavior or activity to a group or individual based on an analysis of digital signatures and forensic artifacts. Over the last ten years, the discussion surrounding attribution has shifted considerably. In the early days, organizations aimed for the lofty goal of non-attributable online activity; however, this goal was neither practical nor realistic.

Non-attribution means an activity or event cannot be attributed because no digital signatures, forensic artifacts, or other evidence of it persist (think self-destructing messages). This goal was likely a holdover from the Cold War school of clandestine activity, when intelligence services and special operations teams sought to infiltrate an adversary's physical borders, carry out some activity, and escape without detection. While this type of activity may still be possible in the physical world to some degree, it is virtually impossible in the digital world, where every click, keystroke, and scroll is logged and monetized. Then came misattribution.

When it becomes impossible or impractical to carry out clandestine activity without detection, some turn to misattribution, or the act of causing an activity or event to be attributed to a specific group or individual other than the one who carried it out. In practice, this most commonly manifests as deception. Online, it could be a persona masquerading as a member of a specific organization, or the use of compromised infrastructure to obfuscate the origin of malicious traffic.

Misattribution can exist as a component of a managed attribution system, but it's impractical as a holistic approach to online research and operations. Success requires an intense level of effort, and the approach raises legal and ethical concerns: it can misdirect investigative efforts onto another party, or draw investigative interest to benign activity.

[Enter 'Managed Attribution']

Managed attribution should manifest as the holistic and deliberate act of curating digital signatures and forensic artifacts to project a desired online image. Managed attribution leverages a broad range of techniques and resources to diversify behavioral indicators, obfuscate digital signatures, inject noise or randomness where appropriate, and minimize or craft forensic artifacts. This approach addresses numerous goals and outcomes through a variety of tailored solutions based on individual circumstances. The key to a successful managed attribution strategy lies not in software or hardware, but in flexibility, adaptability, and a positive user experience. If the system isn't easy to use, it will fail.

The goal of curating your digital footprint online should apply whether you're a leading threat intelligence provider or an independent researcher. While managed attribution platforms can be expensive, managed attribution doesn't have to be. I recently published a project on GitHub that attempts to demonstrate a reasonably secure, low-cost approach to managing the digital footprints your online research leaves behind. The project doesn't offer any unique capability or insight. In fact, the software it uses is made freely available by the same companies that serve some of the largest organizations in the world. Instead, the project attempts to ease the path to execution. My only expense in implementing it was a VPN subscription and a couple of hours of my free time.

Nonetheless, any tool is only as useful as the craftsman (or caveman) who wields it. When setting up a managed attribution platform, it's important to consider the various use cases it needs to address. A few broad categories might include:

Account Management
Data Processing and Analysis
Passive Browsing
Subscription-based Accounts
Fictitious Personas
Online Operations
Scanning and Automation

Before getting into the various use cases, let's set some ground rules. Some of these rules will be firmer than others, but as a general practice, you should establish a set of rules for yourself before adopting any new managed attribution strategy and reassess them regularly, including any time you add a new use case or piece of hardware. Do this even if you are the only user of your platform, to ensure you don't make compromises that end up hurting you in the long run.

Don't mix business with pleasure: Do not under any circumstances log in to an operational account (i.e., one used to access resources on the dark web or engage in other high-risk activity) from a personal device, and do not log in to a personal account from within your managed attribution platform.
Don't mix business with business: Similarly, you should not access operational accounts from your company-issued devices that are also used to access your company's sensitive internal resources. Nor should you log in to a corporate account associated with your identity or the identity of your employer from within your managed attribution platform.
Never grant a container privileges in excess of the host: For example, do not log in to a password manager in a VM if the password manager also manages secrets for the VM's host (or other VMs for that matter). Do not grant a container SSH access to a resource that has administrative privileges for the container's host.
Compartmentalization is key: Storage is cheap. Spills last forever. Create as many containers, virtual machines, and other compartments as necessary to ensure that no two functions overlap. Don't let the loss of one container bring down your entire house of cards. Better yet, don't build your house out of cards. Contrary to popular opinion, unless you have a legitimate concern about others gaining physical access to your living space, consider recording passwords in hard copy (i.e., writing them down in a notebook like the monks did) rather than storing them digitally.
Cakes > Onions: Don't rely on one layer where three will do, but not all layers are made equal. There are numerous reasons why The Onion Router may not be appropriate for certain use cases, but everybody loves cake! And there's a cake for everyone. Your use case may have specific requirements based on the individual layers of your defense strategy. Think of this customization more like baking a cake than picking an onion. It takes time and precision, but the end result is far more appetizing to discerning palates.

[Image: a GIF from Portal's Aperture Laboratories showing an appetizing cake glowing on a desktop. Caption: "The cake is a lie."]

Account Management

Regardless of your other use cases, you're nearly guaranteed to have some administrative overhead. This may be keeping track of virtual wallets, managing cloud services, receiving one-time passcodes, or generating API keys for automation tools. Whatever the situation, these types of services frequently rely on specialized software and/or persistent cookies. Consistency is the theme for account management. If you've gone through the trouble of setting up a privacy LLC to establish commercial accounts and maintain your personal anonymity (see Mike Bazzell's Extreme Privacy: What it Takes to Disappear), you probably don't want to get locked out for "suspicious activity" because you logged in using Tor or your VPN was flagged.

In an ideal scenario, you would use a static VPN endpoint (see Tailscale or Outline) for this type of activity, so that your logins originate from a consistent address, much like a typical business accessing its accounts from the office. You may be able to get by with a commercial VPN provider that offers static IP addresses or SOCKS5 proxies as an alternative to self-hosting a static VPN. Additionally, use a common operating system and web browser, with rules in place that allow the service provider to persist cookies. This decreases the likelihood of unintentionally triggering fraud prevention measures and losing access to critical resources.
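Before opening an account-management session, it can help to confirm that traffic really leaves from the expected static address. A minimal sketch, assuming a hypothetical `EXPECTED_EGRESS` value (a reserved documentation address stands in here) and using api.ipify.org as one example of a plain-text IP echo service; any comparable service, ideally one you host yourself, would do:

```python
import ipaddress
import urllib.request

# Hypothetical value: replace with your own static VPN egress address.
# 203.0.113.10 is a reserved documentation address (RFC 5737).
EXPECTED_EGRESS = "203.0.113.10"


def fetch_public_ip(url: str = "https://api.ipify.org", timeout: float = 10.0) -> str:
    """Ask an IP echo service which address this machine's traffic exits from."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def egress_matches(observed: str, expected: str = EXPECTED_EGRESS) -> bool:
    """True only if the observed egress IP equals the expected static IP.

    Parsing with ipaddress rejects malformed input and normalizes notation
    (for example, zero-compressed IPv6) before the comparison.
    """
    try:
        return ipaddress.ip_address(observed) == ipaddress.ip_address(expected)
    except ValueError:
        return False


def preflight() -> bool:
    """Fetch the current egress IP and report whether it is safe to log in."""
    observed = fetch_public_ip()
    ok = egress_matches(observed)
    status = "OK" if ok else "STOP: check the VPN before logging in"
    print(f"egress={observed} expected={EXPECTED_EGRESS} -> {status}")
    return ok
```

Calling `preflight()` at the start of a session, and refusing to continue when it returns False, keeps a dropped VPN tunnel from quietly exposing a different address to the provider.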

For consistency and high availability, these types of instances are easiest to maintain in virtual machines stored on your host workstation, but you may also choose to store them on a removable hard drive for further protection when not in use. Either way, the images should be encrypted at rest and backed up offline on a regular basis. Most type-2 hypervisors (see VirtualBox or VMware) have straightforward options for creating snapshots, which can be combined with common file backup services for offline storage.
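For VirtualBox specifically, snapshots can be scripted with the `VBoxManage snapshot <vm> take <name>` command. The sketch below assumes `VBoxManage` is on your PATH and uses a hypothetical VM name. Keep in mind that snapshots live alongside the VM on the same disk, so they complement, rather than replace, an encrypted offline copy.

```python
import datetime
import shutil
import subprocess
from typing import List, Optional


def snapshot_command(vm_name: str, note: str,
                     now: Optional[datetime.datetime] = None) -> List[str]:
    """Build the VBoxManage argv for a timestamped snapshot of vm_name."""
    # Default to the current UTC time so snapshot names sort chronologically.
    now = now or datetime.datetime.now(datetime.timezone.utc)
    snap_name = f"{vm_name}-{now:%Y%m%dT%H%M%SZ}"
    return ["VBoxManage", "snapshot", vm_name, "take", snap_name,
            "--description", note]


def take_snapshot(vm_name: str, note: str) -> None:
    """Run the snapshot command, failing loudly if VirtualBox is missing."""
    if shutil.which("VBoxManage") is None:
        raise RuntimeError("VBoxManage not found on PATH")
    # Passing a list avoids shell quoting issues; check=True raises
    # CalledProcessError if VBoxManage exits with an error.
    subprocess.run(snapshot_command(vm_name, note), check=True)
```

Usage might look like `take_snapshot("acct-admin", "before updating wallet software")`, where `acct-admin` is a hypothetical VM name.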

A note of caution: you're allowing a considerable amount of leeway to the service provider by persisting identifiers in this type of instance. So, it's important to avoid exceeding the instance's singular administrative scope. Do not cross-contaminate a research account with an administrative account, or you could end up losing both.

Data Processing and Analysis

Another common use case involves various data processing, enrichment, and analytic tasks. A few examples include digital forensics, log analysis, reverse engineering, and password cracking, all of which have vastly different requirements. Some of these tasks (such as dynamic malware analysis) inherently carry higher risk and require a more thoughtful approach, while others, such as password cracking, are low-risk but demand greater computing resources than a container may be able to provide.

If you're engaged in these types of use cases, chances are you're familiar with the risks they involve. If not, now would be a good time to start learning. In general, it's best to avoid external connections from an analytic environment, but there are numerous exceptions, such as the need to query vendor APIs for enrichment data. Another common requirement is documenting the results of your analysis to prove your work. Tools like OBS Studio, OpenShot, and/or DaVinci Resolve are useful in this context, while many virtualization platforms can also be configured to record sessions natively.
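Where an analytic environment does need to reach out, an explicit allowlist keeps the exceptions deliberate. The sketch below is an application-level guard only, with a hypothetical enrichment host; the firewall or VM network settings should enforce the same policy, since malware under analysis will not respect your Python checks.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this analytic VM should contact.
ALLOWED_HOSTS = frozenset({"api.example-enrichment.test"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Permit only HTTPS requests to an explicitly allowlisted hostname."""
    parts = urlsplit(url)
    # .hostname is lowercased and excludes any user:password@ prefix,
    # so "https://allowed@evil.test/" is judged on "evil.test".
    host = (parts.hostname or "").rstrip(".")
    return parts.scheme == "https" and host in allowed


def require_allowed(url: str) -> str:
    """Raise instead of silently connecting to an unapproved destination."""
    if not is_allowed(url):
        raise PermissionError(f"egress to {url!r} is not on the allowlist")
    return url
```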

Passive Browsing

Passive browsing involves short browsing sessions on the surface web that do not require accounts or login credentials. This type of browsing is best kept brief and narrowly focused on widely accessible resources (e.g., news outlets, search engines, and social media sites that don't require a login). One-time-use containers, such as Kasm Workspaces, should be used exclusively for this type of activity, taking care not to commingle research on more than one topic of interest within a given session.

When performing passive browsing, it is often preferable to take a circuitous route to the information you seek and include irrelevant searches in the process. This can be thought of as a traditional surveillance detection route, except with the more limited goal of making it harder for a potential collector to discern your intent or correlate the activity to the rest of your research. This is considerably harder in practice than one might expect, which is why it's always good to have a plan before you begin.
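One way to make that plan concrete is to decide, before the session starts, which decoy searches will surround each real one. A minimal sketch, assuming you supply plausible, topic-appropriate decoys yourself (the hard part, which no code solves); all names here are illustrative:

```python
import random
from typing import List, Optional, Sequence


def plan_route(targets: Sequence[str], decoys: Sequence[str],
               decoys_per_target: int = 2,
               rng: Optional[random.Random] = None) -> List[str]:
    """Return an ordered search plan that buries each target among decoys.

    Each target is grouped with `decoys_per_target` decoys drawn without
    replacement, and its position inside the group is shuffled so the
    relevant query is not predictably first or last.
    """
    rng = rng or random.Random()
    needed = len(targets) * decoys_per_target
    if needed > len(decoys):
        raise ValueError("not enough distinct decoys for the requested ratio")
    pool = rng.sample(list(decoys), needed)
    plan: List[str] = []
    for i, target in enumerate(targets):
        group = pool[i * decoys_per_target:(i + 1) * decoys_per_target]
        group.append(target)
        rng.shuffle(group)
        plan.extend(group)
    return plan
```

Shuffling within small groups, rather than across the whole session, keeps each real query near its cover. Whether the decoys are believable matters far more than how they are ordered.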

Because correlation across activities is one of the most prevalent threats during passive browsing, exposure to advertising technology is the greatest vulnerability. An entire ecosystem exists to correlate devices, users, and activity that ought to remain unlinked. To the greatest extent possible, use DNS filtering, fingerprint-resistant browsers, and similar tracking protections to avoid unnecessary correlation. The Electronic Frontier Foundation (EFF) is an excellent resource for this type of knowledge, while the Mullvad Browser goes to great lengths to provide protections similar to the Tor Browser's, without requiring the tears of the onion.
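Many DNS filters treat a listed domain as covering its subdomains too. A minimal sketch of that suffix-matching logic, with hypothetical blocklist entries (real deployments load curated lists rather than hard-coding them):

```python
from typing import Iterable, Set


def load_blocklist(entries: Iterable[str]) -> Set[str]:
    """Normalize blocklist entries: lowercase, no trailing dot, no blanks."""
    return {e.strip().lower().rstrip(".") for e in entries if e.strip()}


def is_blocked(domain: str, blocklist: Set[str]) -> bool:
    """True if the domain, or any parent domain of it, is on the blocklist.

    For "pixel.tracker.example" this checks "pixel.tracker.example",
    "tracker.example", and "example", so listing a tracker's domain
    also blocks every subdomain beneath it.
    """
    labels = domain.strip().lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

Matching on whole labels, rather than on a raw string suffix, is what keeps `nottracker.example` from being caught by an entry for `tracker.example`.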

Note: If using the Mullvad browser for passive browsing, do not configure the SOCKS5 proxy. Each Mullvad server has only one SOCKS5 proxy IP, which diminishes your overall entropy if used. Do consider using Mullvad's Defense Against AI-guided Traffic Analysis (DAITA), or similar features available through other providers. Ignore the AI marketing reference. The real value comes from Mullvad's padding of packets and generation of randomized network traffic which further complicates a sophisticated adversary's ability to discern your intent.
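To make the entropy point concrete: if you record the egress IP of each session, the Shannon entropy of those observations (in bits) is one rough measure of how much variety an observer sees. The sketch below uses hypothetical data and says nothing about Mullvad's infrastructure specifically, only about the measure itself. A single fixed egress address scores zero bits no matter how many sessions you run.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy_bits(observations: Iterable[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of observations.

    H = sum over distinct values v of p(v) * log2(1 / p(v)), where p(v) is
    the fraction of observations equal to v. One repeated value gives 0 bits;
    n equally frequent values give log2(n) bits.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```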

Subscription-based Accounts

There are a variety of use cases involving subscription-based access to information not otherwise publicly available. Many subscriptions have free-tier access and employ minimal verification, allowing easy access to valuable data in moderation. Occasionally, a provider may strongly prefer its subscribers be vetted prior to granting access to potentially sensitive data. Often, service providers have complex terms of service limiting who may use their service and under what circumstances. All of these types of services are a double-edged sword.

Subscription service providers have often done the hard work of collecting valuable information in bulk, shielding your intent from adversaries by sparing you from interacting directly with the adversary's infrastructure. However, by necessity, a subscription provider will be able to correlate every search you ever run and every record you ever see. Further, not all service providers are created equal. Some may offer a free tier hoping to attract paid subscribers, but others may privately aggregate data about their users to sell to third-party vendors. Lastly, many noteworthy subscription services may share your data publicly as a stipulation of the service in furtherance of broader community research objectives.

Read the Terms of Service and Privacy Policy! Remember when 22k people agreed to perform community service in exchange for free WiFi? Also, why does Activision collect information about the way people smell? And who are they sharing that with? Is there some sort of Big Deodorant conspiracy I don't know about in the gaming community? (Dear Reader—If you work at Activision and have the scoop on this please reach out. I'm dying to know.)

Activision may collect about you the following categories of personal information [...] Audio, electronic, visual, thermal, olfactory, or similar information, collected from you, may be shared with our service providers...

All jokes aside, you need to understand how a provider will use your data before you can decide whether use of their service is an acceptable risk. This cost-benefit analysis will help determine whether you can approach them openly as a trusted provider (possibly even incorporating API access into your analytic workspaces), whether you can mitigate the risk through compartmentalization, or whether they should be written off altogether. Also, make sure your use case won't get you sued. While a large company may be willing to go to court over ToS, an unemployed security researcher probably isn't.

Fictitious Personas

Speaking of Terms of Service... Nearly every social media platform in existence provides some clause in its Terms of Service regulating or outright prohibiting the use of pseudonyms on its platform. A brief Google search will return a host of results detailing instances of civil and criminal proceedings involving the use of fictitious online accounts or 'sock puppets.' Some cases have succeeded where others have failed, but for many, the mere potential for legal jeopardy is enough to swear off this level of effort.

To the extent a DOJ FAQ from 2020 still carries weight, the Computer Crime & Intellectual Property Section's Cybersecurity Unit previously highlighted some of these issues in "Legal Considerations when Gathering Online Cyber Threat Intelligence and Purchasing Data from Illicit Sources." The document doesn't address the use of sock puppets on mainstream social media platforms but does (in my personal opinion, which does not reflect the views or opinions of any element of the United States government) lean favorably toward the use of sock puppets in general. The document also goes on to discuss more intrusive measures and is worth memorizing and sharing with in-house counsel.

Prior to resorting to the use of fictitious personas, you should establish rules of engagement (RoEs) for yourself, including clear boundaries that shall not be crossed. Personas should be broken down into at least two different categories, with the potential for further segmentation in the future. In this section, we'll focus on personas with minimal engagement. These accounts exist primarily to gain access to forums and websites that require a valid account for consumption.

In most cases, these types of accounts are not actively approved or denied by an administrator. The limitation often exists solely to deter responsible web crawlers and search engines from indexing private data. In addition to the legal risks discussed above, a researcher should also consider whether site owners, administrators, or moderators pose a threat. If you are establishing an account on a forum owned or operated by criminals, terrorists, or even just really petty trolls (you know the one), then the answer is yes. Accessing the site poses a threat.

In addition to the security measures necessary to manage your attribution and the legal considerations outlined by DOJ, you must also consider how the site's administrators may scrutinize your activity on the platform. Do they store logs? Do they implement ad tech? Can they read your messages? Do shared files retain metadata? All of these considerations and more should factor into your plan for how to operate a fictitious persona on the site, and what might be necessary to avoid correlating your account with other signatures that may allow the threat actor to perform attribution against you.
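On the metadata question, JPEG photos are a common culprit: an EXIF block can carry GPS coordinates, timestamps, and camera serial numbers. The standard-library sketch below detects and removes EXIF APP1 segments by walking the JPEG marker structure. It deliberately ignores other metadata containers such as XMP and IPTC, as well as unusual or malformed files, so for real use a dedicated tool such as ExifTool is the safer choice; treat this as an illustration of where the data lives.

```python
import struct
from typing import Iterator, List, Tuple

SOI = b"\xff\xd8"       # start-of-image marker
SOS = 0xDA              # start-of-scan: compressed pixel data follows
APP1 = 0xE1             # application segment used by EXIF (and XMP)
EXIF_HEADER = b"Exif\x00\x00"


def _segments(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (marker, start, end) for each header segment up to and including SOS."""
    if not data.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"unexpected byte at offset {pos}; not a marker")
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        yield marker, pos, end
        if marker == SOS:
            return
        pos = end


def exif_ranges(data: bytes) -> List[Tuple[int, int]]:
    """Byte ranges of APP1 segments whose payload begins with the EXIF header."""
    return [(start, end) for marker, start, end in _segments(data)
            if marker == APP1 and data[start + 4:start + 10] == EXIF_HEADER]


def has_exif(data: bytes) -> bool:
    return bool(exif_ranges(data))


def strip_exif(data: bytes) -> bytes:
    """Return a copy of the JPEG with EXIF APP1 segments removed."""
    out = bytearray()
    pos = 0
    for start, end in exif_ranges(data):
        out += data[pos:start]  # keep everything before the EXIF segment
        pos = end               # skip the segment itself
    out += data[pos:]
    return bytes(out)
```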

Online Operations

In the above category, we focused on the use of fictitious accounts solely for access to otherwise publicly accessible sources of information. Maybe you DM'd a bot to request a channel link or followed some really outrageous micro-blogger to keep up with their latest conspiracy, but the bar for further engagement was otherwise set pretty high. If you read the DOJ paper closely (You did, right? If not, go do it now. I'll wait...), there are potentially (I'm not your lawyer. I'm not even a lawyer.) legally permissible ways to operate a fictitious persona on the dark web in pursuit of sensitive, high-value intelligence relevant to your specific business need.

Asking a paranoid supervillain to share their mass murder plans with you is not something to be taken lightly. Nor is doing it on a platform operated by hackers whose livelihood depends on protecting the privacy of supervillains. I'm embellishing a bit here, but you get the point. In addition to the RoEs you've already established, this type of activity requires baseline knowledge, skills, and abilities without which you should not proceed. You should also devise an operational plan before beginning and consider red-teaming the plan with one or more trusted colleagues.

As potential risk and reward mount, so does complexity. At this stage, your risk matrix should include more than just how to tailor your virtual environment to mitigate the physical and digital threats of the adversary. You need to consider how your actions may influence an adversary, as well as what effect your activity may have on unseen efforts (i.e., military, law enforcement, and/or intelligence services also interested in the threat actor). Your presence and activity may unwittingly land you in the spotlight of a criminal investigation. If your tradecraft is bad, and you annoy a threat actor enough that they change their behavior, you could even potentially disrupt a sensitive access you never knew existed.

In reality, if you're on such a tight budget that you're recycling used hardware and operating over your home internet connection, you should forgo this use case. When properly deployed, the tools discussed here may be able to provide the necessary level of protection, but the support network this use case requires is often much larger than any one person can provide.

Scanning and Automation

This is by far the hardest use case to defend, and the one many want to jump to first. When it becomes impractical to manually browse and collect information, or infeasible to use an existing service provider's API, you may find it necessary to implement your own scanning and/or automation techniques to identify and collect potential intelligence information. Once again, know your legal limits before proceeding.

Assuming you've determined it's legally permissible, this type of activity requires significantly greater technical understanding, as well as additional tooling, before proceeding. Yes, anyone can open up a Kali Linux VM and turn on one of dozens of tools designed to bang away at the internet. However, these tools are often not designed (or at least not configured by default) with stealth in mind. Attempting to scan, crawl, and index a security-minded organization's digital infrastructure is sure to set off more than a few alerts.

There are two general approaches to consider. If stealth is deemed unnecessary, you may simply opt for a distributed approach. There are several existing tools that provide distributed forward proxy capabilities, allowing you to leverage hundreds of cloud microservices simultaneously instead of a single VPN provider. These tools are relatively cheap to deploy and may provide some level of deniability, but they will not mask the fact that the activity occurred. In fact, an attentive defender should very easily key in on a high volume of singleton requests from different cloud IPs. Some throttling may help, but ultimately, if scrutinized, the events are unlikely to reflect normal browsing patterns.
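To see why, look at it from the defender's side. A human visitor usually requests a page and then its images, scripts, and stylesheets from the same address, so a distributed crawl tends to show an unusually high share of source IPs that appear exactly once. A minimal sketch of that check over access-log lines, assuming the common log format (client address as the first field); any alerting threshold a defender would pick is an assumption that depends entirely on the site:

```python
from collections import Counter
from typing import Iterable, List


def source_ips(log_lines: Iterable[str]) -> List[str]:
    """Extract the client address, the first field of a common-log-format line."""
    return [line.split()[0] for line in log_lines if line.strip()]


def singleton_ratio(ips: Iterable[str]) -> float:
    """Fraction of distinct source IPs that made exactly one request."""
    counts = Counter(ips)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)
```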

Alternatively, you may prefer stealth. Such a scenario is likely far outside the scope of this already lengthy post's original goal of providing a cheap, easy, and secure research platform, primarily because stealth is neither easy nor cheap. Custom tooling takes time to build, talent to operate, and more time to maintain. It would also likely necessitate separate infrastructure from what was initially covered. In this scenario, also consider that if a custom tool is built, deployed, and compromised, it may ultimately lead to correlation and discovery of other efforts using the same methodology. This is... undesirable. If you're leaning in this direction, it's probably best not to rely on internet blogs for guidance. Instead, consult your friendly neighborhood hacker. My DMs are open.

Conclusion

This has been a lot. Probably too much for a single blog post. If it seems overwhelming, that's a good sign. It means you're probably approaching the matter with healthy doses of caution and respect. If you're just getting started but want to learn, start small. Practice configuring some VMs, and maybe set up a Kasm instance to play with. You don't have to immediately jump to scraping dark web hacker forums. Maybe just get comfortable tweaking browser configurations and watch how different changes impact your fingerprint. When you're ready to kick it up a notch, try hosting a local webserver and inspecting the logs to see what it looks like when you hit it with different tools. That's a post for another day, though.
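For that last exercise, Python's standard library is enough to stand up a throwaway server that prints what each client sends. A minimal sketch, assuming Python 3.7+ (for `ThreadingHTTPServer`) and binding only to localhost; the recorded fields are an illustrative choice:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, Mapping


def describe_request(client_ip: str, method: str, path: str,
                     headers: Mapping[str, str]) -> Dict[str, Any]:
    """Summarize what a client revealed. Header order is kept on purpose,
    since the order itself can help fingerprint a tool."""
    return {
        "client": client_ip,
        "method": method,
        "path": path,
        "user_agent": headers.get("User-Agent", "<none>"),
        "header_order": list(headers.keys()),
    }


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        record = describe_request(self.client_address[0], self.command,
                                  self.path, self.headers)
        print(json.dumps(record), flush=True)
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence the default stderr line; the JSON record replaces it


def serve(port: int = 8000) -> None:
    """Serve on 127.0.0.1 only, so nothing outside this machine can connect."""
    with ThreadingHTTPServer(("127.0.0.1", port), LoggingHandler) as httpd:
        httpd.serve_forever()
```

Run `serve()`, then point curl, a browser, or a scanner at http://127.0.0.1:8000/ and compare the records each one produces.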

If this post touched on topics your organization is struggling with, or you know someone hiring for a role that sounds like this, please reach out on LinkedIn or email philip@hax4libre.com. I'm open to work!

If you found this post helpful, or have constructive feedback or suggestions, I'd love to hear from you. I will attempt to make edits over time, as necessary.