MITRE ATT&CK documents 64 threat groups and malware families that employ T1119, making it one of the most widely adopted collection techniques. In March 2025, Microsoft documented Silk Typhoon using automated cloud API calls to systematically exfiltrate email, OneDrive, and SharePoint data from compromised Microsoft 365 tenants via Microsoft Graph and Exchange Web Services. In May 2025, the FBI and international partners disrupted the LummaC2 infostealer infrastructure after it had infected approximately 10 million devices globally, automatically harvesting credentials, cryptocurrency wallets, and session tokens. Gamaredon, the Russian FSB-linked group targeting Ukraine since 2013, continuously deploys scripts that automatically scan for and collect documents matching intelligence requirements. The technique spans espionage, cybercrime, and financial fraud, covering Windows, Linux, macOS, IaaS, SaaS, and Office Suite platforms.
T1119 falls under the Collection tactic (TA0009). The technique covers any use of automated methods to collect internal data after establishing access. Methods include scripting interpreters that search for and copy files matching defined criteria (file type, location, name, content), malware that automatically harvests specific data types (credentials, emails, payment cards), cloud APIs that programmatically extract data from SaaS platforms, and scheduled tasks or persistent malware that repeatedly collects data at defined intervals. The key distinction from manual collection techniques like T1005 (Data from Local System) or T1114 (Email Collection) is that T1119 is programmatic — the adversary sets criteria and the collection runs automatically, often continuously.
How Adversaries Automate Collection
Scripted File System Sweeps
The foundational form of automated collection is a script that searches the filesystem for files matching defined criteria and copies them to a staging directory. On Windows, adversaries use PowerShell (Get-ChildItem -Recurse -Include *.doc,*.pdf,*.xlsx | Copy-Item -Destination $stagingDir), cmd batch scripts (for /R C:\ %%f in (*.docx) do copy /Y "%%f" "%TEMP%\collection"; a single %f is used when the loop is typed at an interactive prompt), and WMI queries to enumerate and collect files. On Linux, find with -name or -mtime filters piped to cp or tar achieves the same result. These scripts can run once or be deployed as scheduled tasks that repeatedly collect new or modified files. Gamaredon deploys scripts on compromised Ukrainian systems that automatically scan for documents of intelligence interest, running repeatedly to catch newly created files. APT1 (Comment Crew) used batch scripts to automate discovery and collection, saving output to text files for later exfiltration. Ke3chang (APT15) performed frequent and scheduled data collection from victim networks.
Information-Stealing Malware (Infostealers)
Infostealers represent the most operationally impactful form of T1119 in 2024–2025. These malware families are designed to automatically harvest specific data categories from compromised systems with no operator interaction required. LummaC2 (Lumma Stealer), operational since at least 2022, is a MaaS platform that automatically collects browser credentials, cryptocurrency wallet data, two-factor authentication browser extensions, cookies, and system information. LummaC2 has been distributed through fake CAPTCHA pages, cracked software downloads, and GitHub-based delivery, infecting approximately 10 million devices before its infrastructure was partially disrupted by the FBI and Microsoft in May 2025. StrelaStealer, first identified in late 2022, specifically targets email credentials from Outlook and Thunderbird, using large-scale campaigns that delivered the malware to hundreds of organizations across Europe and the Americas in early 2024. Raccoon Stealer, Redline Stealer, and Vidar operate similarly, each automatically collecting defined data categories and exfiltrating them to C2 infrastructure. The stolen credentials are then sold to initial access brokers who provide entry points for ransomware operations.
Cloud API and SaaS Collection
In cloud environments, automated collection has evolved beyond filesystem searches to leverage cloud-native APIs. Silk Typhoon (Hafnium), documented by Microsoft in March 2025, demonstrates the state of the art: after compromising cloud service providers and leveraging supply chain relationships to access downstream customers, the group uses Microsoft Graph API and Exchange Web Services (EWS) to programmatically exfiltrate email, OneDrive files, and SharePoint data from Microsoft 365 tenants. By manipulating service principals and OAuth applications with administrative consents, Silk Typhoon can automate collection across dozens of customer tenants without triggering per-user authentication alerts. In August 2025, CrowdStrike documented Silk Typhoon using a custom malware family called CloudedHope for automated email theft from cloud environments. Cloud APIs, Extract-Transform-Load (ETL) services, and cloud CLI tools (aws s3 sync, az storage blob download-batch, gsutil cp) all provide mechanisms for automated bulk collection from cloud storage.
Web Skimming (Magecart)
Magecart-style web skimmers represent automated collection in the financial fraud domain. Adversaries inject JavaScript into e-commerce checkout pages that automatically captures payment card data (card number, expiration, CVV, cardholder name) and billing information as customers enter it. The captured data is exfiltrated in real time to attacker-controlled servers. This collection is entirely automated: once the skimmer script is injected, it runs on every transaction without operator interaction. Magecart attacks have affected major e-commerce platforms and payment processors, with campaigns documented by multiple security vendors throughout 2024–2025. The automation extends to server-side skimmers that intercept payment data at the server level, making them invisible to client-side security scanning.
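From the defender's side, one way to catch an injected client-side skimmer is to audit which hosts a checkout page loads scripts from. The following is a minimal sketch, assuming the page HTML has already been fetched; the class name, function name, and example hostnames are hypothetical, and inline scripts are deliberately left out of scope.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSourceAuditor(HTMLParser):
    """Collects external <script src=...> URLs whose host is not on an allowlist."""

    def __init__(self, allowed_hosts):
        super().__init__()
        self.allowed_hosts = {host.lower() for host in allowed_hosts}
        self.unexpected = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # inline script: needs separate review, not covered here
        host = urlparse(src).hostname
        # Relative URLs have no hostname; they load from the page's own origin.
        if host is not None and host not in self.allowed_hosts:
            self.unexpected.append(src)


def unexpected_script_sources(html, allowed_hosts):
    """Return script URLs in `html` that point at hosts outside `allowed_hosts`."""
    auditor = ScriptSourceAuditor(allowed_hosts)
    auditor.feed(html)
    auditor.close()
    return auditor.unexpected
```

Run against periodic snapshots of the checkout page, a non-empty result is a prompt for review rather than proof of compromise. This check does not see server-side skimmers, which never appear in the delivered HTML.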
Telecommunications Interception (MESSAGETAP)
APT41's MESSAGETAP malware represents automated collection deployed at telecommunications infrastructure level. Installed on Short Message Service Center (SMSC) servers within telecom networks, MESSAGETAP uses the libpcap library to automatically monitor all SMS traffic flowing through the server. The malware filters messages based on target phone numbers, IMSI numbers, and keywords defined in configuration files, saving matching messages to CSV files for later exfiltration. This operates continuously and autonomously once deployed — intercepting SMS messages from targeted individuals across an entire carrier network without any per-message operator action. FireEye Mandiant identified MESSAGETAP deployed against at least four telecommunications companies in 2019, demonstrating Chinese state-sponsored interest in mass communications interception.
Removable Media and USB Collection
Adversaries targeting air-gapped or isolated environments deploy malware that automatically collects data from removable media. T9000 searches removable storage devices for files matching a predefined list of extensions. Tropic Trooper's USBferry targets air-gapped environments by automatically collecting and staging data on USB devices as they are inserted into compromised systems. Sednit (APT28/Fancy Bear) has used automated collection techniques targeting air-gapped networks, with malware that copies files from USB drives when they are connected. These techniques are especially relevant for military, intelligence, and critical infrastructure targets that maintain network isolation.
Why Automated Collection Matters
Scale and Speed
Automated collection enables adversaries to operate at a scale impossible through manual methods. An infostealer can harvest credentials from 10 million devices. A cloud API script can exfiltrate email from hundreds of tenants. A Magecart skimmer can capture payment data from every transaction on a compromised e-commerce site. A MESSAGETAP deployment can intercept every targeted SMS message across an entire carrier network. The automation means that the cost per stolen record approaches zero once the infrastructure is deployed, fundamentally changing the economics of data theft.
Persistence of Collection
Unlike manual collection, which stops when the operator disconnects, automated collection continues independently. Scripts run as scheduled tasks, infostealers persist through registry keys or services, web skimmers run on every page load, and telecommunications malware operates continuously in the background. This persistence means that data theft continues even while incident responders are investigating other aspects of the intrusion — new files are being collected, new credentials are being harvested, and new messages are being intercepted in real time.
The Infostealer-to-Ransomware Pipeline
Automated credential collection by infostealers directly feeds the ransomware ecosystem. Credentials harvested by LummaC2, Raccoon, and Redline are sold on criminal markets to initial access brokers, who in turn provide network access to ransomware affiliates. This pipeline has been documented extensively: a LummaC2 infection on a single employee workstation can yield VPN credentials, RDP passwords, and cloud authentication tokens that enable a full ransomware deployment weeks later. The automation of the initial credential theft — which requires no operator skill beyond deploying the malware — has dramatically lowered the barrier to entry for the ransomware supply chain.
Real-World Case Studies
Case 1: Silk Typhoon — Automated Cloud Email Exfiltration (2025)
Microsoft's March 2025 advisory documented Silk Typhoon targeting IT supply chain providers to gain downstream access to cloud environments. After compromising cloud service providers, Silk Typhoon used leaked API keys, stolen credentials from public repositories like GitHub, and zero-day vulnerabilities (CVE-2025-0282 in Ivanti, CVE-2025-3928 in Commvault) to access customer Microsoft 365 tenants. The automated collection workflow leveraged Microsoft Graph API and Exchange Web Services to programmatically extract email, OneDrive files, and SharePoint data. CrowdStrike's August 2025 report confirmed the group deployed CloudedHope, a custom malware family designed for automated cloud email theft. The campaign demonstrated how cloud-native APIs enable collection at scale: once authenticated to a tenant, the adversary can automate the extraction of every email and file across the organization without touching individual endpoints.
Case 2: APT41 / MESSAGETAP — Automated SMS Interception
APT41's deployment of MESSAGETAP on SMSC servers within telecommunications networks represents one of the most sophisticated automated collection operations documented. The malware, discovered by FireEye Mandiant in 2019, uses libpcap to monitor all network traffic on SMSC servers, parsing protocol layers including SCTP, SCCP, and TCAP to extract SMS message data. Collection is filtered through configuration files: parm.txt contains target phone numbers and IMSI numbers, while keyword_parm.txt contains keywords of geopolitical interest. The configuration files are XOR-encoded and deleted from disk after being loaded into memory, leaving minimal forensic artifacts. Matching messages are saved to CSV files. In addition to SMS interception, APT41 queried call detail record (CDR) databases to steal records corresponding to high-ranking foreign individuals. FireEye identified at least four telecommunications companies compromised with MESSAGETAP, confirming industrial-scale automated collection of communications data.
Case 3: LummaC2 — Industrial-Scale Credential Harvesting (2022–2025)
LummaC2 (Lumma Stealer) exemplifies the Malware-as-a-Service model for automated credential collection. Operating since at least 2022, the malware automatically collects browser-stored passwords, cookies, cryptocurrency wallet files, 2FA extension data, and system information from infected endpoints. LummaC2 was distributed through multiple vectors including fake CAPTCHA pages (ClickFix), YouTube malware distribution, cracked software on torrent sites, and GitHub-based delivery chains documented by Trend Micro in January 2025. The stolen data was sold on criminal markets, feeding the initial access broker ecosystem. In May 2025, the FBI, Microsoft, and international partners seized infrastructure associated with LummaC2 after the malware had infected approximately 10 million devices globally. Despite the disruption, the MaaS model means that operators can reconstitute infrastructure and resume operations — the malware's automated collection logic survives infrastructure takedowns.
Case 4: Gamaredon — Persistent Document Collection Against Ukraine (2013–2025)
Gamaredon (Primitive Bear / Shuckworm), the Russian FSB-linked group, has deployed automated document collection scripts against Ukrainian government and military targets since at least 2013. The group's approach is distinctive for its persistence and volume: scripts are deployed on compromised systems that continuously scan for documents matching intelligence criteria (file types, modification dates, directory locations) and stage them for exfiltration. Gamaredon is known for rapidly re-infecting systems after remediation, ensuring that automated collection resumes quickly even after incident response. ESET's June 2020 research documented the group's growing automation capabilities, and the campaign has continued through the Russia-Ukraine conflict, with CERT-UA issuing multiple advisories on Gamaredon's ongoing automated collection activities through 2024–2025.
Case 5: StrelaStealer — Targeted Email Credential Harvesting (2024)
StrelaStealer, first documented by DCSO CyTec in November 2022, evolved into a large-scale automated email credential harvester by early 2024. Palo Alto Unit 42 and IBM X-Force documented massive campaigns targeting organizations across Europe and the Americas. The malware specifically targets email client credentials from Microsoft Outlook and Mozilla Thunderbird, automatically extracting stored login data, IMAP/SMTP server configurations, and cached authentication tokens. Unlike broader infostealers that collect everything, StrelaStealer's narrow focus on email credentials makes it particularly dangerous for Business Email Compromise (BEC) operations: stolen email credentials provide direct access to corporate email accounts for fraud, intelligence collection, and further phishing distribution.
Detection Strategies
| Data Source | Detection Focus | Key Indicators |
|---|---|---|
| Process Creation (Sysmon EID 1) | Script-based collection | PowerShell with Get-ChildItem/Copy-Item patterns, cmd batch loops copying files by extension, Python/bash scripts with find/copy operations, scheduled tasks executing collection scripts |
| Command Execution (EID 4104) | PowerShell collection scripts | Script Block Logging showing file enumeration with -Include, -Recurse filters followed by Copy-Item or Compress-Archive; cloud API calls via Invoke-RestMethod to Graph API endpoints |
| File Access (EID 4663) | Bulk file access patterns | Single process accessing hundreds of files across multiple directories in rapid succession, especially targeting document and credential storage locations |
| Cloud API Logs | Automated cloud extraction | Microsoft Graph API calls for mail/messages, OneDrive file enumeration, SharePoint downloads; high-volume API calls from service principals; OAuth consent grants to new applications |
| Network Traffic | Exfiltration of collected data | Outbound HTTP/S connections carrying large data volumes from processes not typically generating network traffic; connections to known infostealer C2 domains |
| Scheduled Task Creation (EID 4698) | Persistent collection | New scheduled tasks executing PowerShell, Python, or batch scripts that perform file enumeration or data collection; tasks running at unusual intervals or during off-hours |
| Web Application Logs | Skimmer injection | Unauthorized JavaScript additions to checkout pages, iframe injections, external script loads from suspicious domains, POST requests to non-payment-processor endpoints during checkout |
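
The bulk file access indicator in the table (one process touching hundreds of files in rapid succession) can be sketched as a sliding-window count. This is an illustrative sketch only: the event fields, the 10-second window, and the 100-file threshold are assumptions, not values taken from any product or log schema, and real thresholds need tuning against a baseline.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAccessEvent:
    """One file-access record; field names are illustrative, not a real log schema."""
    timestamp: float  # seconds since epoch
    process: str      # image name or path of the accessing process
    path: str         # file that was accessed


def find_bulk_access(events, window_seconds=10.0, min_files=100):
    """Return {process: peak distinct-file count} for every process that touched
    at least `min_files` distinct files within any `window_seconds` span."""
    recent = defaultdict(deque)  # process -> (timestamp, path) pairs inside the window
    flagged = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        window = recent[event.process]
        window.append((event.timestamp, event.path))
        # Drop entries that have fallen out of the sliding window.
        while window and event.timestamp - window[0][0] > window_seconds:
            window.popleft()
        distinct = len({path for _, path in window})
        if distinct >= min_files:
            flagged[event.process] = max(flagged.get(event.process, 0), distinct)
    return flagged
```

Counting distinct paths rather than raw events keeps a process that repeatedly reads the same file from being flagged.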
Splunk / SIEM Detection Queries
Detect scripted file collection patterns on Windows:
index=sysmon EventCode=1
((Image="*\\powershell.exe" AND CommandLine IN (
"*Get-ChildItem*-Include*.doc*Copy-Item*",
"*Get-ChildItem*-Include*.pdf*-Recurse*",
"*Get-ChildItem*-Include*.xlsx*Copy-Item*",
"*gci*-fi*.doc*|*cp*", "*dir*findstr*.docx*"))
OR (Image="*\\cmd.exe" AND CommandLine IN (
"*for /R*in (*.doc*)*do copy*",
"*for /R*in (*.pdf*)*do copy*",
"*dir*findstr*.doc*"))
OR (Image="*\\python*" AND CommandLine IN (
"*shutil.copy*", "*glob.glob*")))
| stats count by Computer, User, Image, CommandLine, ParentImage
| sort -count
Detect automated cloud API collection (Microsoft 365):
(index=azure OR index=o365) (sourcetype="ms:aad:audit" OR sourcetype="o365:management:activity")
(Operation IN ("MailItemsAccessed", "FileDownloaded", "FileSyncDownloadedFull",
"SearchQueryPerformed", "FileAccessed")
AND (AppId!="known_legitimate_app_id"))
| bin _time span=1h
| stats count dc(ObjectId) as unique_objects values(Operation) as operations
by UserId ClientAppId AppDisplayName _time
| where count > 100 OR unique_objects > 50
| sort -count
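For teams that process exported audit records outside a SIEM, the same hourly bucketing and thresholds can be sketched in Python. This is a rough equivalent under assumptions: each record is taken to be a dict with a timezone-aware `time` value plus `UserId`, `ClientAppId`, and `ObjectId` keys mirroring the query's field names, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def flag_bulk_cloud_access(records, max_events=100, max_objects=50):
    """Group audit records into (user, app, hour) buckets and return the buckets
    whose event count or distinct-object count exceeds the given thresholds."""
    buckets = defaultdict(lambda: {"count": 0, "objects": set()})
    for record in records:
        # Truncate the timestamp to the hour, like `bin _time span=1h`.
        hour = record["time"].replace(minute=0, second=0, microsecond=0)
        key = (record["UserId"], record["ClientAppId"], hour)
        buckets[key]["count"] += 1
        buckets[key]["objects"].add(record["ObjectId"])

    hits = []
    for (user, app, hour), bucket in buckets.items():
        unique_objects = len(bucket["objects"])
        if bucket["count"] > max_events or unique_objects > max_objects:
            hits.append({
                "UserId": user,
                "ClientAppId": app,
                "hour": hour,
                "count": bucket["count"],
                "unique_objects": unique_objects,
            })
    # Highest-volume buckets first, like `sort -count`.
    return sorted(hits, key=lambda hit: hit["count"], reverse=True)
```

The default thresholds simply mirror the query above; per-application baselines are likely to work better than fixed numbers.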
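Detect non-browser processes writing to browser and mail-client credential stores (Sysmon EID 11 records file creation only; catching reads of these files requires Windows Security EID 4663 with object-access auditing enabled on the paths):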
index=sysmon EventCode=11
(TargetFilename IN (
"*\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Login Data*",
"*\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\*\\logins.json*",
"*\\AppData\\Local\\Microsoft\\Edge\\User Data\\Default\\Login Data*",
"*\\AppData\\Roaming\\Thunderbird\\Profiles\\*\\*",
"*\\AppData\\Local\\Microsoft\\Outlook\\*"))
AND Image!="*\\chrome.exe" AND Image!="*\\firefox.exe"
AND Image!="*\\msedge.exe" AND Image!="*\\OUTLOOK.EXE"
AND Image!="*\\thunderbird.exe"
| stats count values(TargetFilename) as accessed_files by Computer, User, Image
| where count > 2
| sort -count
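The matching logic behind this kind of rule (a credential-store path touched by a process other than its owning application) can be sketched as follows. The pattern list is a shortened, illustrative subset, the function name is hypothetical, and the inputs are assumed to be the accessing image path and the target file path from whichever telemetry source is available.

```python
import fnmatch
import ntpath

# Illustrative subset of credential-store locations; extend for your environment.
CREDENTIAL_STORE_PATTERNS = [
    r"*\AppData\Local\Google\Chrome\User Data\Default\Login Data*",
    r"*\AppData\Roaming\Mozilla\Firefox\Profiles\*\logins.json*",
    r"*\AppData\Local\Microsoft\Edge\User Data\Default\Login Data*",
]

# Processes expected to touch those stores during normal use.
EXPECTED_PROCESSES = {"chrome.exe", "firefox.exe", "msedge.exe"}


def is_suspicious_access(image, target_filename):
    """Return True when a process outside EXPECTED_PROCESSES touches a path
    matching one of the credential-store patterns (case-insensitive)."""
    process_name = ntpath.basename(image).lower()
    if process_name in EXPECTED_PROCESSES:
        return False
    target = target_filename.lower()
    return any(
        fnmatch.fnmatchcase(target, pattern.lower())
        for pattern in CREDENTIAL_STORE_PATTERNS
    )
```

Matching on the process name alone is easy to evade by renaming a binary, so a production rule would more plausibly also check the image path or its signature.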
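Detect scheduled task creation for persistent collection (EID 4698 is written to the Windows Security log, not Sysmon; the wineventlog index name below is a placeholder for wherever that log is ingested):
(index=sysmon EventCode=1
AND Image="*\\schtasks.exe" AND CommandLine="*/Create*"
AND CommandLine IN ("*powershell*Get-ChildItem*",
"*powershell*Copy-Item*", "*cmd*/c*copy*",
"*python*", "*.ps1*", "*.bat*"))
OR (index=wineventlog EventCode=4698
AND TaskContent IN ("*Get-ChildItem*", "*Copy-Item*",
"*Compress-Archive*", "*find*-name*"))
| fillnull value="-" Computer User CommandLine ParentImage TaskContent
| stats count by Computer, User, EventCode, CommandLine, ParentImage, TaskContent
| sort -count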
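The TaskContent field of a 4698 event carries the task definition as XML, so the keyword match can also be applied to the parsed Exec actions instead of the raw string. The following is a minimal sketch, assuming the XML uses the standard Task Scheduler schema namespace; the marker list and function name are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Task Scheduler task definitions.
TASK_NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}

# Illustrative markers of file-collection commands; tune for your environment.
COLLECTION_MARKERS = ("get-childitem", "copy-item", "compress-archive")


def collection_actions(task_xml):
    """Return 'command arguments' strings for Exec actions in a task definition
    that contain any of the collection markers (case-insensitive)."""
    # Strip the XML declaration so a stated encoding cannot conflict with the
    # fact that the content is already a decoded Python string.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", task_xml)
    root = ET.fromstring(body)
    hits = []
    for exec_node in root.findall(".//t:Exec", TASK_NS):
        command = exec_node.findtext("t:Command", default="", namespaces=TASK_NS)
        arguments = exec_node.findtext("t:Arguments", default="", namespaces=TASK_NS)
        full_command = f"{command} {arguments}".strip()
        if any(marker in full_command.lower() for marker in COLLECTION_MARKERS):
            hits.append(full_command)
    return hits
```

Parsing the actions avoids false matches on keywords that appear only in a task's description or author fields.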
Threat Actors and Malware
State-Sponsored Groups
| Actor | Collection Methods | Notable Context |
|---|---|---|
| Silk Typhoon / Hafnium (PRC) | Microsoft Graph API, EWS, CloudedHope malware | Automated email/file theft from M365 via supply chain compromise (2025) |
| APT41 (PRC) | MESSAGETAP SMS interception, CDR database queries | Automated SMS/call metadata collection from 4+ telcos |
| Gamaredon / Shuckworm (Russia/FSB) | Automated document scanning scripts | Persistent collection against Ukraine since 2013; rapid re-infection |
| Turla (Russia/FSB) | LightNeuron email interception | Automated email collection via Exchange Transport Agent |
| Ember Bear (Russia) | Mass collection from compromised systems | Bulk automated collection during intrusions |
| InvisiMole (linked to Gamaredon) | Automated screenshot capture, file collection | Espionage platform with scheduled collection modules |
| Patchwork / Dropping Elephant (India) | Automated file collection by extension | Targeting Pakistani government and military entities |
| SideWinder (India) | Automated document collection scripts | Targeting government and military across South Asia |
| Tropic Trooper (PRC) | USBferry automated USB collection | Air-gapped environment targeting |
Infostealers and Cybercrime
| Malware / Group | Collection Target | Notable Context |
|---|---|---|
| LummaC2 / Lumma Stealer | Browser creds, crypto wallets, 2FA tokens, cookies | ~10M devices infected; MaaS model; FBI disruption May 2025 |
| StrelaStealer | Outlook and Thunderbird email credentials | Large-scale EU/Americas campaigns (2024); narrow email focus |
| Raccoon Stealer | Browser creds, crypto wallets, system info | Operator arrested 2022; v2 resumed; IAB pipeline |
| Redline Stealer | Browser creds, crypto, VPN configs, FTP creds | Infrastructure seized by Dutch police (Oct 2024); prevalent IAB source |
| Magecart (various groups) | Payment card data (PAN, CVV, expiry, name) | JavaScript skimmers on e-commerce checkout pages; server-side variants |
| Valak | Enterprise credentials, email data | Modular loader with automated credential collection capabilities |
| Netwire RAT | Keylogging, credentials, screenshots | Automated keylogging and credential harvesting from infected systems |
| FIN6 | POS data, payment card credentials | Automated POS scraping in hospitality and retail environments |
Defensive Recommendations
- Monitor for bulk file access patterns. Automated collection generates distinctive filesystem activity: a single process accessing hundreds of files across multiple directories within seconds. Use Sysmon EID 11 (file creation), Windows Security EID 4663 (object access, which requires auditing to be enabled on the target paths), or EDR telemetry to identify processes that enumerate and copy files at rates inconsistent with normal user behavior. Alert on processes accessing browser credential stores, email client data files, or cryptocurrency wallet directories when the accessing process is not the legitimate application.
- Implement cloud API monitoring and rate limiting. For Microsoft 365 environments, monitor Unified Audit Log for high-volume MailItemsAccessed, FileDownloaded, and SearchQueryPerformed operations. Establish baselines for normal API call volumes per application and per user. Alert on service principals making bulk data access calls, especially those associated with recently created or modified OAuth applications. Implement Conditional Access policies that restrict API access to trusted locations and compliant devices.
- Deploy endpoint detection for infostealer behavior. Infostealers access a predictable set of filesystem locations: browser credential databases (Login Data, logins.json), cookie stores, cryptocurrency wallet directories (Electrum, Exodus, MetaMask), and email client profiles. Create detection rules that alert when non-browser processes access these locations. EDR solutions with behavioral detection can identify the characteristic access pattern of reading multiple credential stores within a single execution.
- Restrict and audit scheduled task creation. Automated collection often relies on scheduled tasks for persistence. Monitor Event ID 4698 (scheduled task creation) and Sysmon for schtasks.exe execution with script parameters. Restrict scheduled task creation to authorized administrators and alert on tasks created by non-standard processes or during unusual hours.
- Implement Content Security Policy for web applications. To defend against Magecart-style skimmers, deploy strict Content Security Policies (CSP) that restrict which scripts can execute on checkout pages. Monitor for unauthorized script injections, new external script references, and POST requests to non-approved endpoints during payment processing. Consider Subresource Integrity (SRI) for all third-party scripts.
- Encrypt sensitive data at rest and restrict access. Encrypting files that contain sensitive information limits the value of automated file collection. Use filesystem-level encryption (BitLocker, LUKS) and application-level encryption for databases. Implement strict file access controls so that automated scripts running under compromised user contexts cannot access files outside the user's authorized scope.
- Monitor for credential harvesting indicators. Track access to LSASS process memory, SAM registry hive reads, and NTDS.dit access attempts. These are precursors to automated credential collection. On Linux, monitor access to /etc/shadow, SSH key directories, and browser profile directories from unexpected processes.
- Implement network segmentation for collection-critical systems. Systems that process or store high-value data (mail servers, file servers, payment systems, telecommunications infrastructure) should be segmented from general user networks. This limits the scope of automated collection by preventing collection scripts on compromised workstations from reaching servers containing targeted data.
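
For the Subresource Integrity recommendation above, the integrity value is the hash algorithm name, a hyphen, and the base64-encoded digest of the exact script bytes. A minimal sketch of computing it follows; the function name is hypothetical, and the script content is assumed to have been retrieved already from a trusted copy.

```python
import base64
import hashlib

# Hash algorithms permitted for SRI integrity values.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_hash(script_bytes, algorithm="sha384"):
    """Return an integrity attribute value such as 'sha384-<base64 digest>'
    for the given script content."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, script_bytes).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The resulting string goes into the integrity attribute of the script tag. Because any change to the hosted file then blocks the script from loading, this approach suits pinned, versioned third-party scripts better than ones the vendor updates in place.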
MITRE ATT&CK Mapping
| Field | Value |
|---|---|
| Technique ID | T1119 |
| Technique Name | Automated Collection |
| Tactic | Collection (TA0009) |
| Platforms | Windows, Linux, macOS, IaaS, SaaS, Office Suite |
| Sub-techniques | None |
| Data Sources | Process: Process Creation, Command: Command Execution, File: File Access, File: File Creation, Script: Script Execution, Cloud Service: Cloud Service Enumeration |
| Mitigations | M1041 (Encrypt Sensitive Information), M1029 (Remote Data Storage) |
| Related Techniques | T1005 (Data from Local System), T1039 (Data from Network Shared Drive), T1114 (Email Collection), T1074 (Data Staged), T1083 (File and Directory Discovery), T1560 (Archive Collected Data) |
| MITRE ATT&CK Reference | attack.mitre.org/techniques/T1119 |
Sources and References
The following references were used in compiling this technique briefing. Where possible, primary sources (vendor advisories, government alerts, original research) were prioritized over secondary reporting.
- MITRE ATT&CK — T1119 Automated Collection (updated October 2025): attack.mitre.org
- Microsoft Threat Intelligence — Silk Typhoon Targeting IT Supply Chain (March 2025): microsoft.com
- CrowdStrike — Silk Typhoon Attacks North American Orgs in the Cloud (August 2025): darkreading.com
- FireEye Mandiant — MESSAGETAP: Who's Reading Your Text Messages? (October 2019): darkreading.com
- MITRE ATT&CK — MESSAGETAP (S0443): attack.mitre.org
- MITRE ATT&CK — Lumma Stealer (S1213): attack.mitre.org
- Palo Alto Unit 42 — Large-Scale StrelaStealer Campaign in Early 2024 (March 2024): unit42.paloaltonetworks.com
- ESET — Gamaredon Group Grows Its Game (June 2020): welivesecurity.com
- Invictus IR — Silk Typhoon Threat Profile: Tactics and Defenses (2025): invictus-ir.com
- Atomic Red Team — T1119 Automated Collection Tests: github.com/redcanaryco