T1005 — Data from Local System sits within the Collection tactic (TA0009) of the MITRE ATT&CK Enterprise matrix. Unlike network-based collection techniques, T1005 involves pulling data directly from the compromised endpoint's storage: its drives, databases, configuration files, and in some implementations, its live process memory. The technique has no sub-techniques, applying equally to Windows, Linux, macOS, ESXi, and network device platforms.
Its prevalence is a function of simplicity. An attacker who has already gained a foothold can collect local data using nothing more than built-in operating system utilities, leaving a smaller forensic footprint than custom tooling would. The collected data is almost always staged locally before being transferred out through a separate exfiltration technique, which makes T1005 a precursor step that sits just before exfiltration in most attack chains.
What Attackers Target on Local Systems
The files adversaries prioritize reflect their objectives. Espionage-focused operators tend to target documents, email archives, and configuration files containing credentials or network topology details. Financially motivated actors lean toward anything that enables lateral movement, ransomware staging, or immediate sale: credential stores, browser databases, PII, and financial records. Several consistent categories emerge across threat actor reports.
Office documents and productivity files are among the most frequently targeted file types. Malware families such as BADNEWS, BadPatch, and AuTo Stealer are specifically coded to recursively walk the file system and collect files with extensions including .doc, .docx, .xls, .xlsx, .pdf, .ppt, .pptx, .txt, and .mdb. These formats reliably contain internal communications, financial records, project data, and intellectual property.
Credential and authentication material ranks equally high. Targets include SSH private keys from ~/.ssh/ directories, browser credential databases, saved passwords in application configuration files (FileZilla, WinSCP, remote desktop tools), Windows Credential Manager stores, and DPAPI-protected blobs. Collecting these locally avoids triggering network authentication alerts.
Network device configurations are a specialized but high-value target. Running configurations on routers, switches, and firewalls contain interface addresses, static routes and routing protocol settings, access control lists, VPN pre-shared keys, and SNMP community strings: data that enables further network penetration with minimal effort. Nation-state actors targeting telecommunications and critical infrastructure consistently prioritize this material.
Database files and application data, including SQLite databases, registry hive exports, and application-specific stores, appear frequently in infostealer and banking trojan activity. Malware such as QakBot specifically seeks browser databases containing autofill credentials and session cookies.
Logs and event data are collected both for operational intelligence (understanding what the victim has detected) and to stage for later anti-forensic destruction. Aquatic Panda was documented using the Windows wevtutil utility to extract security event logs to an .evtx file before exfiltrating them.
T1005 covers data collected from the local host's storage and memory. It is distinct from T1039 (Data from Network Shared Drive), which covers collection from mapped network resources, and T1213 (Data from Information Repositories), which covers platforms like SharePoint, Confluence, or code repositories. The distinction matters for detection engineering because the data sources, monitoring controls, and alert signatures differ significantly between local and network collection.
Native Tools and Commands Abused
A recurring theme in T1005 usage is the reliance on living-off-the-land (LotL) tools — operating system utilities that are pre-installed, trusted by defenders, and rarely flagged by signature-based controls. Adversaries leverage these to avoid deploying custom binaries that may trigger endpoint detection. The following tools appear consistently across incident reports and malware analyses.
Windows: cmd.exe and dir
The dir command is a fundamental reconnaissance and staging tool. Attackers use the /s switch for recursive subdirectory searches and wildcards to filter by extension. The Voldemort backdoor malware, reported in August 2024, used dir to enumerate folders and files on compromised systems as a first-stage collection step before staging data for exfiltration.
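REM Recursive search for common document types
dir /s /b C:\Users\*.docx C:\Users\*.xlsx C:\Users\*.pdf
REM Coarse recency filter: keep listing lines whose timestamp or name contains a year from 2024 to 2029
REM (/b is omitted here because the bare format prints no timestamps; dir has no native "last 7 days" option)
dir /s /a-d C:\Users\ | findstr /r "202[4-9]"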
Windows: findstr
The findstr utility searches file contents for patterns or keywords, making it effective for identifying credential material without manually reviewing each file. In November 2024, CISA's updated advisory on the BianLian ransomware group documented the following command used to extract passwords from files across the compromised system:
REM BianLian, documented by CISA, November 2024
findstr /spin "password" *.* >C:\Users\training\Music\output.txt
The flags in this command: /s searches subdirectories recursively, /p skips files with unprintable characters, /i makes the search case-insensitive, and /n prints the line number of each match. The output is redirected to a staging file in an innocuous directory.
Windows: PowerShell (Get-ChildItem, Select-String)
Get-ChildItem provides the PowerShell equivalent of dir with substantially more filtering capability. Mustang Panda (a Chinese APT group) uses Get-ChildItem within a script named getdata.ps1 to enumerate Desktop contents and stage documents for exfiltration. The Twelve hacktivist group, documented in May 2024, combined Get-ChildItem with Select-String to identify and collect sensitive files matching specific patterns:
# Mustang Panda — getdata.ps1 component
Get-ChildItem ([environment]::getfolderpath("desktop"))
# Twelve hacktivist group — May 2024
Get-ChildItem -Path C:\ -Recurse -Include *.doc,*.docx,*.xls,*.xlsx,*.pdf `
| Select-String -Pattern "confidential|secret|password" `
| Select-Object Path | Export-Csv C:\staging\hits.csv
Windows: wevtutil
wevtutil is the Windows Event Log utility, used legitimately by administrators to query, export, and clear event logs. Aquatic Panda was documented using it to export security event logs to .evtx files before exfiltration. The export gives the attacker intelligence about what the victim's monitoring systems had captured, and the same utility can later clear those logs (wevtutil cl) to erase traces of the intrusion.
REM Export Security event log (used by Aquatic Panda)
wevtutil epl Security C:\Windows\Temp\sec.evtx
Linux and macOS: ls, find, grep, tar
On Linux and macOS systems, the core collection toolkit shifts to native shell utilities. The find command provides recursive file system traversal with extensive filtering options. grep (and its recursive variant grep -r) serves the same content-search function as findstr/Select-String. Collected files are frequently staged with tar or zip for compression before exfiltration.
# Common Linux collection pattern
find /home /root /etc -type f \( -name "*.pem" -o -name "*.key" -o -name "id_rsa" \) 2>/dev/null
# Search for credentials in configuration files
grep -r "password\|passwd\|secret\|api_key" /etc /home --include="*.conf" --include="*.cfg" -l 2>/dev/null
# Stage findings for exfiltration
tar czf /tmp/.cache_data.tgz /tmp/staging/
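Defenders can run the same kind of sweep against their own hosts to see what a collection pass would find. The following is a minimal Python sketch of the find example above, not a reconstruction of any actor's tooling; the function name find_key_material and the constant KEY_PATTERNS are hypothetical, and the name patterns are simply the ones used in that example.

```python
import fnmatch
import os

# Name patterns copied from the find example above (assumed, not exhaustive).
KEY_PATTERNS = ("*.pem", "*.key", "id_rsa")

def find_key_material(roots):
    """Return sorted paths of regular files under roots whose names match KEY_PATTERNS."""
    hits = []
    for root in roots:
        # os.walk skips unreadable directories by default, similar to 2>/dev/null.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if any(fnmatch.fnmatch(name, pattern) for pattern in KEY_PATTERNS):
                    path = os.path.join(dirpath, name)
                    if os.path.isfile(path):
                        hits.append(path)
    return sorted(hits)
```

Calling find_key_material(["/home", "/root", "/etc"]) mirrors the shell command. Any hit that is world-readable or sits outside an expected location is a candidate for cleanup before an attacker finds it.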
Network Devices: CLI Commands
MITRE added network device support to T1005, reflecting observed behavior by state-sponsored actors who gain access to network infrastructure and use native CLI commands to export device configurations. On Cisco IOS, show running-config and show startup-config dump the full device configuration, including interface addresses, static routes and routing protocol settings, access lists, VPN settings, and authentication credentials. The live routing table itself comes from a separate command, show ip route.
Living-off-the-land execution of built-in tools is notoriously difficult to distinguish from legitimate administrative activity. A 2025 CardinalOps analysis found that enterprise SIEMs cover only 21% of MITRE ATT&CK techniques on average, and many collection-phase techniques remain in that uncovered majority. Behavioral baselining — understanding what is normal for a specific account, host, or process tree — is necessary to surface LotL abuse reliably.
Threat Actors Known to Use T1005
MITRE ATT&CK documents over 100 threat groups and software families that leverage T1005. The table below covers groups of particular current relevance, drawing from official ATT&CK procedure examples, CISA advisories, and vendor intelligence reports through early 2026.
| Actor | Origin | Sector Targets | T1005 Usage |
|---|---|---|---|
| Silk Typhoon (HAFNIUM) | PRC | Government, healthcare, law firms, NGOs, IT supply chain | Post-compromise data collection from on-premises environments including Active Directory dumps, key vault contents, and internal document stores. March 2025 Microsoft report documented collection via stolen API keys to access downstream customer environments. |
| Volt Typhoon | PRC | Critical infrastructure (energy, water, comms, transportation) | Collects local data using native Windows tools (PowerShell, WMIC, netsh) to avoid custom malware detection. Targets network device configurations and credential stores to enable long-term pre-positioning in critical infrastructure networks. |
| APT28 (Fancy Bear) | RU | Government, military, political organizations, defense contractors | Retrieves internal documents from victim environments using Forfiles to stage specific file types before exfiltration. Documented collection from German Bundestag, DNC, and multiple NATO-aligned government networks. |
| APT29 (Cozy Bear) | RU | Government, intelligence services, think tanks | Steals data from compromised hosts; UEFI-level implants and sophisticated RATs provide persistent local access for ongoing collection. Documented in the SolarWinds supply chain campaign and Microsoft Exchange intrusions. |
| Sandworm | RU | Energy, industrial control systems, government | Collects data as part of destructive campaigns; often combines T1005 with data wiping (T1485) to extract then destroy evidence. ANSSI documented collection from Centreon-based monitoring systems in 2021. |
| Kimsuky | DPRK | South Korean government, military, research, diplomacy | The AppleSeed backdoor includes local data collection modules targeting documents and configuration files (RokRAT, sometimes listed alongside it, is more commonly attributed to APT37). Troll Stealer, documented in early 2024, specifically targets SSH keys from .ssh/ directories in addition to standard document collection. |
| Andariel (Lazarus subgroup) | DPRK | Defense, financial, energy, healthcare | Collects large numbers of files from compromised network systems for later extraction; particularly active against South Korean defense contractors and financial institutions. |
| APT41 (Winnti) | PRC | Technology, healthcare, gaming, government | Dual-purpose actor (espionage and financial crime). Uploads files and PII from compromised hosts; documented collecting data from U.S. state government networks in the C0017 campaign. Uses Brute Ratel C4 and Cobalt Strike for staged collection. |
| APT39 (Chafer) | IR | Telecommunications, travel, IT services | Uses various remote access tools to steal files from compromised hosts. Documented collecting personnel information from telecommunications providers — consistent with an intelligence-collection mission focused on tracking individuals. |
| BRONZE BUTLER (Tick) | PRC | Japanese enterprise, defense, research | Exfiltrates files stolen from local systems; campaign targeting Japanese enterprises documented by Secureworks in 2017 included sustained collection of technical documents and research data. |
| BianLian | Cybercrime | Healthcare, manufacturing, critical infrastructure | Shifted to exfiltration-only extortion by January 2024. Uses native Windows utilities and PowerShell cmdlets, including findstr, Get-ChildItem, and Select-String, to locate and stage sensitive files. CISA's November 2024 advisory contains specific command-line artifacts. |
| LockBit (various versions) | Cybercrime (RaaS) | Healthcare, finance, manufacturing, government | Double-extortion model requires local collection before encryption. StealBit exfiltration tool automates collection and staging. Affiliates use standard Windows and PowerShell collection tools. LockBit 5.0 (ChuongDong), observed September 2025, adds cross-platform collection for Windows, Linux, and ESXi. |
| Mustang Panda (TA416) | PRC | Government, NGOs, research institutes, religious organizations | Custom getdata.ps1 script uses Get-ChildItem to enumerate and collect documents. PlugX-based campaigns include file collection modules targeting documents matching reconnaissance-identified patterns. |
| APT1 (Comment Crew) | PRC | Defense, aerospace, energy, telecommunications | Foundational documentation from Mandiant's 2013 report established T1005 patterns still observed today. Collected files from victim machines at scale, systematically targeting IP, technical specifications, and business strategies. |
| Agrius | IR | Israeli technology companies, higher education | Collects data from database and critical servers before deploying wiping mechanisms. Combines T1005 with T1485 (Data Destruction) — stealing then destroying — making recovery and attribution more difficult. |
Malware and Tool Families That Implement T1005
Beyond direct operator execution of native tools, a substantial number of implants, RATs, and infostealer families include dedicated T1005 functionality as a core module. The following represent some of the more widely documented examples from ATT&CK's procedure catalog and recent threat intelligence reporting.
BADNEWS (associated with the Patchwork APT) crawls local drives on first execution and collects files with extensions including .doc, .docx, .pdf, .ppt, .pptx, and .txt. This automated, extension-based sweep is characteristic of espionage-oriented implants built to quickly identify productive targets.
BeaverTail, attributed to North Korean threat actors and documented by Unit 42 in October 2024 as part of the Contagious Interview campaign targeting tech sector job seekers, exfiltrates data collected from local systems. The campaign used fake technical interviews and malicious npm packages to deliver the implant to software developers — giving attackers access to development workstations with potentially valuable source code and credentials.
Bumblebee captures and compresses credentials from the Windows Registry and volume shadow copies, staging them for upload. Its credential-focused collection reflects its role as a loader in ransomware and access broker operations.
QakBot specifically targets browser databases containing stored credentials and session data in addition to collecting broader file system data. Its ability to harvest autofill data and session cookies enables immediate account takeover at scale.
RedLine is a commercially available infostealer that collects browser credentials, cryptocurrency wallets, FTP credentials, and application data from the local system. A November 2024 analysis documented its backend infrastructure and the scope of collection activity across compromised endpoints.
Troll Stealer, attributed to Kimsuky and analyzed in early 2024, specifically seeks SSH keys from the .ssh/ directory — reflecting an interest in pivoting to additional systems via key-based authentication in addition to standard document and credential collection.
Cobalt Strike, the widely abused penetration testing platform, includes built-in functionality for collecting local system data through its Beacon payload. Its download and file operations capabilities are used by both legitimate red teams and malicious operators across the T1005 use case spectrum.
In modern double-extortion ransomware operations, T1005 is a mandatory precursor step. Data must be collected and staged — often days or weeks before encryption begins — to enable the extortion threat. BianLian's full pivot to exfiltration-only extortion by January 2024 illustrates how T1005 has become the primary lever in these attacks, with encryption no longer required. Understanding the collection phase is therefore as important as understanding the ransomware payload itself.
Notable Real-World Incidents
Silk Typhoon — U.S. Treasury and IT Supply Chain (2024–2025)
Silk Typhoon (formerly tracked as HAFNIUM) conducted a high-profile intrusion into U.S. Treasury Department systems in December 2024 using a stolen BeyondTrust API key. Once inside, the group accessed Treasury employee workstations and collected internal documents from the unclassified network. The Office of Foreign Assets Control (OFAC), which administers economic sanctions, was among the compromised units. Microsoft's March 2025 report detailed how the group abused stolen API keys from privileged access management providers to reach downstream customer environments at scale, collecting data across multiple victim organizations through a single initial compromise.
Volt Typhoon — U.S. Critical Infrastructure Pre-Positioning
Volt Typhoon's sustained campaign against U.S. critical infrastructure, documented across multiple CISA advisories and confirmed by the FBI's January 2024 botnet disruption action, demonstrates T1005 in the context of long-term strategic pre-positioning. The group collects local data — including network device configurations and credential stores — not necessarily for immediate exfiltration, but to establish a persistent understanding of target environments that could enable disruption of water, power, and communications infrastructure in a future conflict scenario. By January 2025, the U.S. had identified over 100 Volt Typhoon intrusions across the country and its territories.
BianLian — Healthcare and Manufacturing Sector Targeting
CISA's November 2024 advisory update on BianLian documented a mature local collection methodology built around PowerShell and native Windows tools. The group uses findstr /spin "password" *.* to sweep entire directory trees for credential material, redirecting output to innocuously named staging files in user directories. Their shift to exfiltration-only extortion makes T1005 execution the defining step of their attack: without successful local collection, there is no leverage for ransom demands.
Contagious Interview Campaign (DPRK) — Tech Sector Job Seekers
Unit 42 documented this North Korean campaign in October 2024. Threat actors posed as technical recruiters to lure software developers into installing malicious npm packages and BeaverTail implants under the guise of completing a coding challenge. Once installed, BeaverTail collected local system data from developer workstations — environments that typically contain source code repositories, API keys, cloud credentials, and access tokens for internal development infrastructure. The campaign specifically targeted the tech industry, where developer machines represent high-value collection targets.
LockBit — Ongoing Operations Through 2025
LockBit's data collection phase, executed before encryption in traditional double-extortion operations, relies on the StealBit exfiltration tool combined with affiliate-executed PowerShell collection scripts. Following Operation Cronos in February 2024, which seized LockBit infrastructure, the group demonstrated resilience: by September 2025 it had returned with LockBit 5.0 (ChuongDong), adding dedicated ESXi and Linux collection and encryption payloads to its Windows-focused toolkit. A May 2025 breach of LockBit's own admin panel leaked database records covering December 2024 through April 2025, providing insight into the scale of collection operations across hundreds of victim organizations during that period.
Detection Guidance
Detecting T1005 reliably requires combining multiple data sources. No single indicator is sufficient because the technique uses legitimate tools with legitimate purposes — the difference between an administrator reviewing files and an attacker collecting them often lies in behavioral context rather than the specific commands executed.
Process and command-line monitoring is the primary detection surface. Windows Event ID 4688 (process creation with command-line logging enabled) and Sysmon Event ID 1 capture the process execution chain. Detections should focus on unusual parent-child process relationships (e.g., Office applications spawning PowerShell), recursive file system enumeration initiated from non-standard parent processes, and command lines containing collection-indicative flags like findstr /spin, Get-ChildItem -Recurse with output redirection, or dir /s /b targeting document-heavy directories.
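As a rough illustration of the command-line side of this, the sketch below tags a process command line (for example, the command-line field of a 4688 or Sysmon 1 event) with the collection-style patterns named above. The rule names and regular expressions are assumptions written for this example, not rules from any vendor or advisory, and they will match legitimate administrative use as well, so they are a starting point for correlation rather than alerts in their own right.

```python
import re

# Illustrative regexes only: rule names and patterns are assumptions for this sketch.
COLLECTION_PATTERNS = {
    # findstr with /s anywhere in a switch group, e.g. /spin or /s
    "findstr_recursive": re.compile(
        r"\bfindstr(?:\.exe)?\b.*?\s/[a-z]*s[a-z]*(?=\s|$)", re.IGNORECASE),
    # dir with both /s (recursive) and /b (bare paths)
    "dir_recursive_bare": re.compile(
        r"\bdir\b(?=.*\s/s\b)(?=.*\s/b\b)", re.IGNORECASE),
    # Get-ChildItem -Recurse whose output is piped or redirected
    "gci_recurse_piped": re.compile(
        r"\b(?:get-childitem|gci)\b.*-recurse\b.*[|>]", re.IGNORECASE),
    # wevtutil exporting a log (epl is the short form of export-log)
    "wevtutil_export": re.compile(
        r"\bwevtutil(?:\.exe)?\s+(?:epl|export-log)\b", re.IGNORECASE),
}

def match_collection_indicators(command_line: str) -> list[str]:
    """Return the sorted names of every pattern that matches the command line."""
    return sorted(
        name for name, rx in COLLECTION_PATTERNS.items() if rx.search(command_line)
    )
```

A match is most useful when joined with the parent process and the account: findstr /spin launched by an interactive admin shell is routine, while the same command under a service account or an Office child process is not.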
File system activity monitoring draws on two sources. Sysmon Event ID 11 (FileCreate) records newly created files, which surfaces staged copies and archives, while Windows Object Access auditing (Event ID 4663, on sensitive paths with audit entries configured) records reads of existing files. Anomalies to watch for include processes accessing large numbers of files in a short window, file access patterns that span directories unrelated to the process's normal function, and compression or archiving of collected files into staging locations.
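The "large numbers of files in a short window" anomaly can be sketched as a per-process sliding window. This is a minimal example under stated assumptions: events arrive as (timestamp_seconds, process_name, path) tuples already sorted by time, and the function name, window length, and threshold are hypothetical values that would need tuning against a real baseline.

```python
from collections import defaultdict, deque

def find_access_bursts(events, window_seconds=60, threshold=100):
    """Return the set of processes that touched at least `threshold` distinct
    paths within any span of `window_seconds`.

    events: iterable of (timestamp_seconds, process_name, path), sorted by timestamp.
    """
    recent = defaultdict(deque)  # process -> deque of (timestamp, path) inside the window
    flagged = set()
    for ts, process, path in events:
        window = recent[process]
        window.append((ts, path))
        # Drop events that have aged out of the window.
        while window and ts - window[0][0] > window_seconds:
            window.popleft()
        # Count distinct paths so repeated reads of one file do not inflate the score.
        if len({p for _, p in window}) >= threshold:
            flagged.add(process)
    return flagged
```

Rebuilding the set on every event is simple rather than fast; a production pipeline would keep running counts instead.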
PowerShell logging — specifically Script Block Logging (Event ID 4104) — captures the actual content of PowerShell commands, not just the fact that PowerShell ran. This catches obfuscated collection scripts that may evade command-line argument monitoring. Module logging (Event ID 4103) additionally captures the output of collection cmdlets in verbose configurations.
User and Entity Behavior Analytics (UEBA) provides the behavioral baseline needed to contextualize native tool usage. A system administrator who regularly runs PowerShell against file servers looks identical in raw logs to an attacker doing the same thing — UEBA layers the question of whether this behavior is normal for this account, at this time, from this endpoint.
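Commercial UEBA products use far richer models, but the core idea of "is this normal for this account" can be illustrated with a simple z-score over an account's own history. The sketch below assumes a hypothetical per-account list of daily file-access counts; the function name, the seven-day minimum, and the threshold of 3.0 are illustrative choices, not recommendations.

```python
from statistics import mean, pstdev

def is_anomalous(history, today, min_days=7, z_threshold=3.0):
    """Compare today's count against the account's own history.

    history: past daily file-access counts for one account.
    today: the count observed today.
    Returns False when there is too little history to judge.
    """
    if len(history) < min_days:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any increase stands out.
        return today > mu
    return (today - mu) / sigma > z_threshold
```

Because each account is scored against itself, an administrator who touches thousands of files every day does not trip the check, while a finance user who suddenly does so will.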
Data Loss Prevention (DLP) monitoring can detect bulk file staging operations, particularly when files are compressed or written to unusual directories. DLP tools positioned on egress points may also catch the subsequent exfiltration phase, which can be traced back to a T1005 collection event if logs are correlated across the attack timeline.
Network device monitoring requires different detection approaches. Network Device CLI monitoring — capturing commands issued through management interfaces — can surface show running-config or show startup-config executed outside of expected maintenance windows or from unexpected source addresses.
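The two conditions mentioned here, maintenance window and source address, are easy to express once command accounting logs are parsed. The following sketch assumes each record has already been reduced to a command string, a source IP, and a timestamp; the function name and parameters are hypothetical. It only recognizes the full command spellings, so abbreviated forms would need extra handling, and it assumes the window does not cross midnight.

```python
from datetime import datetime, time
import ipaddress

CONFIG_DUMP_COMMANDS = ("show running-config", "show startup-config")

def is_suspicious_config_dump(command, source_ip, when, allowed_networks,
                              window_start, window_end):
    """Flag a configuration dump issued outside the maintenance window
    or from a source address outside the allowed management networks."""
    normalized = " ".join(command.lower().split())
    if not normalized.startswith(CONFIG_DUMP_COMMANDS):
        return False  # not a configuration dump at all
    source = ipaddress.ip_address(source_ip)
    from_known_source = any(
        source in ipaddress.ip_network(network) for network in allowed_networks
    )
    in_window = window_start <= when.time() <= window_end
    return not (from_known_source and in_window)
```

For example, with allowed_networks=["10.0.0.0/24"] and a 01:00 to 03:00 window, a dump from 10.0.0.5 at 02:00 passes, while the same command at 14:00 or from any other address is flagged.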
A June 2025 CardinalOps analysis found that enterprise SIEMs have detection coverage for only 21% of MITRE ATT&CK techniques on average, and an estimated 13% of existing detection rules are non-functional due to misconfigured data sources or missing log fields. Organizations should validate that Process Creation auditing and PowerShell Script Block Logging are actually enabled and ingesting correctly — not assumed to be active — before relying on T1005 detection rules.
Mitigations
MITRE ATT&CK offers little preventive mitigation for T1005: the main entry is Data Loss Prevention (M1057), because the technique relies on legitimate system functionality that cannot be blocked without impacting operations. The defensive approach therefore focuses on reducing attacker access to sensitive data, increasing detection fidelity, and limiting the blast radius when collection does occur.
Privileged access restrictions limit what an attacker can collect even after achieving initial access. Standard user accounts should not have read access to sensitive file shares, credential stores, or configuration archives beyond what their role requires. Separating administrative credentials from user accounts reduces the likelihood that a compromised workstation provides access to high-value collection targets.
Data classification and sensitive path monitoring lets organizations define where high-value files live and apply tighter monitoring to access events in those locations. Applying DLP controls, enhanced audit logging, and alerting thresholds specifically to sensitive data stores makes collection from those locations more likely to surface in detection pipelines.
Endpoint Detection and Response (EDR) with behavioral analysis is the most effective technical control. EDR platforms that build behavioral baselines can identify anomalous file access patterns — large volumes of file reads by an unusual process, enumeration of sensitive directories by a process with no legitimate business reason — even when the specific tools used are not inherently malicious.
Credential hygiene directly reduces the yield of T1005 operations targeting authentication material. Eliminating plaintext credential storage in configuration files, rotating credentials regularly, using certificate-based authentication where possible, and deploying secrets management platforms (rather than storing API keys in files) reduces what an attacker can extract from a local collection sweep.
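One way to measure progress on plaintext credential storage is to run the attacker's keyword search defensively. The sketch below scans the text of a configuration file for key/value lines whose key contains one of the keywords from the grep example earlier; the function name and regular expression are assumptions for illustration, and the matched values are deliberately not returned so the report does not become a credential store itself.

```python
import re

# Key names containing these words are treated as likely secrets (illustrative list).
SECRET_ASSIGNMENT = re.compile(
    r"^\s*([\w.-]*(?:password|passwd|secret|api_key)[\w.-]*)\s*[:=]\s*(\S+)",
    re.IGNORECASE,
)

def find_plaintext_secrets(text):
    """Return (line_number, key) pairs for lines that assign a non-empty value
    to a secret-looking key. Comment lines starting with # or ; are skipped."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip().startswith(("#", ";")):
            continue
        match = SECRET_ASSIGNMENT.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits
```

Each hit is a candidate for migration to a secrets manager; tracking the count over time gives a simple measure of how much a local collection sweep would still yield.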
PowerShell Constrained Language Mode and AppLocker/WDAC policies limit the capabilities available to PowerShell and restrict which binaries can execute on endpoints. These controls increase the cost of collection for attackers who rely on PowerShell cmdlets, though sophisticated actors have workarounds including direct API access that bypasses these restrictions.
Logging completeness validation is an underappreciated mitigation. Ensuring that command-line auditing, PowerShell Script Block Logging, and file access auditing are actually enabled, correctly configured, and properly ingested by the SIEM is foundational to any detection strategy for T1005. A detection rule that depends on a misconfigured or absent data source provides false confidence.
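A basic version of this validation is a coverage check: for every host that should be reporting, confirm that the event IDs the detection rules depend on have actually arrived. The sketch below assumes the SIEM can export (host, event_id) pairs for a recent period; the function name and the default set of required IDs (4688 and 4104, both mentioned above) are illustrative.

```python
def missing_telemetry(expected_hosts, ingested_events, required=frozenset({4688, 4104})):
    """Return {host: [missing event IDs]} for hosts lacking any required event ID.

    expected_hosts: hosts that should be sending logs.
    ingested_events: iterable of (host, event_id) pairs seen in the SIEM.
    """
    seen = {host: set() for host in expected_hosts}
    for host, event_id in ingested_events:
        if host in seen:
            seen[host].add(event_id)
    return {
        host: sorted(required - ids)
        for host, ids in seen.items()
        if required - ids
    }
```

A host that appears in the result either has the audit policy disabled or is not forwarding logs; either way, detection rules for that host are not actually running against data.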
Key Takeaways
- T1005 is nearly universal in multi-stage attacks: Whether the end goal is espionage, ransomware extortion, or financial fraud, local data collection appears in the chain. Its presence should be treated as an indicator that an attacker has already achieved substantial access and is preparing for exfiltration or further lateral movement.
- Living-off-the-land is the dominant execution method: The consistent use of cmd.exe, PowerShell, findstr, wevtutil, and similar native utilities means signature-based detection has limited value. Behavioral anomaly detection, UEBA, and process tree analysis are required for reliable coverage.
- Ransomware operators have made T1005 non-optional: BianLian's full shift to exfiltration-only extortion by 2024 illustrates that encryption is no longer the primary leverage mechanism for many ransomware groups. The collection phase is the attack.
- Nation-state actors increasingly target network device configurations and cloud credentials: Groups like Volt Typhoon and Silk Typhoon have expanded the T1005 target set beyond traditional documents to include router configs, API keys, PAM vault contents, and AAD/Entra Connect synchronization data — material that enables persistence and lateral movement at infrastructure scale.
- Detection coverage gaps are widespread: The CardinalOps figure cited above puts average enterprise SIEM coverage at 21% of ATT&CK techniques. Validating that process creation auditing and PowerShell logging are correctly enabled and ingested should be a prerequisite for any T1005 detection engineering effort.
T1005 is not a sophisticated technique in isolation — it is the logical consequence of having compromised a system. An attacker on a box will look for files. The challenge for defenders is that this behavior is indistinguishable from legitimate administration without sufficient behavioral context. Investment in baselining normal file access patterns, enabling comprehensive logging, and deploying EDR with behavioral analytics addresses the detection gap more effectively than attempting to block the technique at the tool level.