During our recent engagement targeting Indian SME infrastructure, I observed a recurring pattern: standard vulnerability scanners often fail when hitting assets behind Tier-2 ISP gateways like Alliance or BSNL. These gateways frequently utilize aggressive NATing and stateful packet inspection that triggers TCP resets when concurrent connection counts spike. To build an effective web security research automation pipeline, we moved away from monolithic scanners toward a modular approach using Go and Nuclei.
Defining Web Security Research Automation
Automation in this context isn't just about running a tool; it is about programmatically chaining discovery, fingerprinting, and targeted exploitation. We define it as the orchestration of disparate tools into a cohesive engine that can ingest a single domain and output verified, triaged vulnerabilities with minimal human intervention.
In our current workflows, I prioritize Go-based tools because of their native concurrency model. While Python is excellent for quick scripts, Go's single-binary distribution and low memory footprint let us deploy scanners on low-cost VPS instances across different geographic regions to bypass regional IP blacklisting. Managing these distributed nodes is significantly easier with a browser-based SSH client that centralizes access without the overhead of local configuration.
Modern automation relies on Domain Specific Languages (DSLs) like the one used by ProjectDiscovery's Nuclei. A DSL lets researchers write "templates" that describe a vulnerability's signature rather than writing complex regex or socket logic for every new CVE. This shift allows us to react to zero-day disclosures within minutes of a PoC being released or the CVE being published in the NIST NVD.
The Evolution from Manual Pentesting to Automated Workflows
Ten years ago, a standard penetration test involved manually crawling a site with Burp Suite and testing individual parameters. Today, the attack surface for a typical Indian enterprise has expanded to include thousands of subdomains, cloud buckets, and exposed API endpoints. Manual testing at this scale is not feasible within any realistic engagement window.
We have transitioned to "continuous recon." Instead of a point-in-time assessment, we maintain a persistent state of the target's infrastructure. When a new subdomain is registered, our pipeline automatically triggers a port scan, identifies the technology stack, and runs relevant Nuclei templates.
This evolution is driven by the speed of exploitation. For instance, when CVE-2024-21887 (Ivanti Connect Secure Command Injection) was disclosed, we saw exploitation attempts on Indian corporate VPN gateways within 24 hours. Automated workflows are the only way to identify these vulnerable assets before malicious actors do.
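The trigger for continuous recon, deciding that a subdomain is new relative to the stored state, can be sketched as a simple set difference. This is a minimal illustration, not our production code: the function name newAssets and the in-memory map standing in for the persistent state are assumptions, and the keys of that map are assumed to be lowercase already.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// newAssets returns the hosts in current that are absent from known.
// Hostnames are lowercased and stripped of a trailing dot so that
// "API.target.in." and "api.target.in" count as the same asset.
// The keys of known are assumed to be normalized the same way.
func newAssets(known map[string]bool, current []string) []string {
	seen := make(map[string]bool)
	var fresh []string
	for _, h := range current {
		h = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(h)), ".")
		if h == "" || known[h] || seen[h] {
			continue
		}
		seen[h] = true
		fresh = append(fresh, h)
	}
	sort.Strings(fresh)
	return fresh
}

func main() {
	known := map[string]bool{"www.target.in": true, "mail.target.in": true}
	current := []string{"www.target.in", "API.target.in.", "staging.target.in", "api.target.in"}
	for _, h := range newAssets(known, current) {
		fmt.Println("queue for port scan and fingerprinting:", h)
	}
}
```

Only the hosts returned here need to go through the port scan and template stages, which keeps each recon cycle cheap.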
Why Automation is Essential for Modern Cybersecurity
The complexity of modern web stacks, involving microservices and K8s clusters, creates a "blind spot" for traditional security teams. Automation ensures that every asset, including that forgotten staging server on an old IP range, is subjected to the same rigorous checks as the main production environment.
In the context of the Digital Personal Data Protection (DPDP) Act 2023, Indian organizations now face significant financial penalties for data breaches. Automation provides a defensible audit trail of security testing, often integrated with a SIEM for real-time threat detection, proving that "reasonable security practices" were followed. It moves security from a reactive "fix after breach" model to a proactive "detect before exploit" model.
Scaling Vulnerability Discovery Across Global Attack Surfaces
To scale, we use a distributed architecture. We don't run one massive scan from one machine; we distribute the load. I've found that using interactsh for out-of-band (OOB) interaction testing is critical. Many vulnerabilities, like Blind SSRF or Log4Shell, don't provide an immediate response in the HTTP body.
When scanning global surfaces, latency becomes a factor. A scanner running from a Mumbai data center will have different timeout characteristics when hitting a US-based CDN than a scanner running from AWS US-East. Our automation framework accounts for this by adjusting the -timeout and -retries flags dynamically based on initial ping results.
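One way to derive those two flag values from a measured round-trip time is sketched below. The multiplier, the bounds, and the retry thresholds are illustrative assumptions rather than tuned values, and scanParams is a hypothetical helper name. The timeout is returned in whole seconds because that is the unit the -timeout flag takes.

```go
package main

import (
	"fmt"
	"time"
)

// scanParams maps a measured round-trip time to a per-request timeout
// (whole seconds) and a retry count. All constants are assumptions
// chosen for illustration.
func scanParams(rtt time.Duration) (timeoutSec int, retries int) {
	// Budget roughly 20 round trips for TLS setup, the request, and a
	// slow response, rounding up to the next whole second.
	timeoutSec = int((20*rtt + time.Second - 1) / time.Second)
	if timeoutSec < 5 {
		timeoutSec = 5
	}
	if timeoutSec > 30 {
		timeoutSec = 30
	}
	retries = 1
	if rtt > 250*time.Millisecond {
		retries = 2
	}
	if rtt > 600*time.Millisecond {
		retries = 3
	}
	return timeoutSec, retries
}

func main() {
	for _, rtt := range []time.Duration{40 * time.Millisecond, 300 * time.Millisecond, 900 * time.Millisecond} {
		t, r := scanParams(rtt)
		fmt.Printf("rtt=%v -> -timeout %d -retries %d\n", rtt, t, r)
	}
}
```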
Scaling also means intelligent filtering. We use httpx to probe for live web servers before passing them to Nuclei. This prevents wasting resources on dead hosts or non-web ports. The goal is to maximize "signal-to-noise" ratio during the initial discovery phase.
# Probing for live hosts and recording status code, detected technologies, and page title
cat subdomains.txt | httpx -sc -td -title -o live_targets.txt
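If httpx is run with its -json flag instead, filtering by status code can happen in a small Go step before the targets reach Nuclei. The sketch below is a minimal version under assumptions: the JSON key names (url, status_code, title, tech) reflect what I expect from recent httpx releases and should be checked against the version in use, and liveTargets is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// probeResult holds the subset of fields this sketch reads. The JSON key
// names are an assumption about httpx's JSON Lines output.
type probeResult struct {
	URL        string   `json:"url"`
	StatusCode int      `json:"status_code"`
	Title      string   `json:"title"`
	Tech       []string `json:"tech"`
}

// liveTargets parses JSON Lines text and returns the URLs whose status
// code is in keep. Malformed lines are skipped rather than aborting.
func liveTargets(jsonl string, keep map[int]bool) []string {
	var urls []string
	for _, line := range strings.Split(jsonl, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		var p probeResult
		if err := json.Unmarshal([]byte(line), &p); err != nil {
			continue
		}
		if p.URL != "" && keep[p.StatusCode] {
			urls = append(urls, p.URL)
		}
	}
	return urls
}

func main() {
	sample := `{"url":"https://a.target.in","status_code":200,"title":"Login"}
{"url":"https://b.target.in","status_code":503}
not json
{"url":"https://c.target.in","status_code":403}`
	fmt.Println(liveTargets(sample, map[int]bool{200: true, 401: true, 403: true}))
}
```

Keeping 401 and 403 in the allow-list is a deliberate choice here: a host that answers with an auth challenge is still a live web service worth scanning.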
Improving Consistency and Reducing Human Error
Manual testing is prone to fatigue. A researcher might miss a subtle header change or a specific path like /.git/config on the 500th subdomain. Automated templates do not get tired. They execute the same logic with 100% consistency across every target.
By version-controlling our Nuclei templates in a Git repository, we ensure that every team member is using the latest signatures. When I identify a new bypass for a WAF, I update the template, and the entire automated pipeline across all active projects is instantly upgraded.
Consistency also applies to reporting. Automated tools can output results in JSON or Markdown, which we then pipe into custom scripts to generate internal tickets. This reduces the time spent on administrative tasks and allows researchers to focus on complex manual exploitation that tools cannot handle.
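As an illustration of that reporting step, the sketch below turns one line of Nuclei's JSON Lines output into a Markdown ticket body. The field names (template-id, host, matched-at, info.name, info.severity) are what I expect in Nuclei's JSON output and are treated as assumptions here; the ticket layout and the ticketMarkdown helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// finding mirrors the few fields this sketch needs from one JSONL record.
type finding struct {
	TemplateID string `json:"template-id"`
	Host       string `json:"host"`
	MatchedAt  string `json:"matched-at"`
	Info       struct {
		Name     string `json:"name"`
		Severity string `json:"severity"`
	} `json:"info"`
}

// ticketMarkdown converts one JSONL line into a Markdown ticket body.
func ticketMarkdown(line string) (string, error) {
	var f finding
	if err := json.Unmarshal([]byte(line), &f); err != nil {
		return "", fmt.Errorf("parse finding: %w", err)
	}
	var b strings.Builder
	fmt.Fprintf(&b, "## [%s] %s\n\n", strings.ToUpper(f.Info.Severity), f.Info.Name)
	fmt.Fprintf(&b, "- Template: `%s`\n", f.TemplateID)
	fmt.Fprintf(&b, "- Host: %s\n", f.Host)
	fmt.Fprintf(&b, "- Matched at: %s\n", f.MatchedAt)
	return b.String(), nil
}

func main() {
	line := `{"template-id":"exposed-env","host":"https://erp.target.in","matched-at":"https://erp.target.in/.env","info":{"name":"Exposed .env","severity":"critical"}}`
	md, err := ticketMarkdown(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(md)
}
```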
Accelerating the Bug Bounty and Research Lifecycle
In the competitive landscape of bug bounties, speed is everything. The "First-to-Report" rule means that being an hour late can result in a duplicate report. Automation allows us to monitor CIDR ranges for new assets and automatically submit findings for low-hanging fruit like exposed .env files or misconfigured S3 buckets.
We use custom Go wrappers to prioritize targets. If a target is identified as running an outdated version of Zoho ManageEngine, it is moved to the top of the scanning queue. This "priority-based automation" ensures that the most critical vulnerabilities are discovered first.
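A priority queue of this kind can be built on the standard library's container/heap. The sketch below is a simplified stand-in for our wrappers: the target type, the priorityFor scoring rule, and its weights are all hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
)

type target struct {
	Host     string
	Priority int // higher value is scanned sooner
}

// targetQueue implements heap.Interface as a max-heap on Priority.
type targetQueue []target

func (q targetQueue) Len() int           { return len(q) }
func (q targetQueue) Less(i, j int) bool { return q[i].Priority > q[j].Priority }
func (q targetQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *targetQueue) Push(x any)        { *q = append(*q, x.(target)) }
func (q *targetQueue) Pop() any {
	old := *q
	n := len(old)
	t := old[n-1]
	*q = old[:n-1]
	return t
}

// priorityFor is a hypothetical scoring rule keyed on the fingerprinted
// technology and whether the detected version is outdated.
func priorityFor(tech string, outdated bool) int {
	p := 1
	if tech == "ManageEngine" {
		p = 5
	}
	if outdated {
		p += 5
	}
	return p
}

// scanOrder drains the queue and returns hosts, highest priority first.
func scanOrder(ts []target) []string {
	q := &targetQueue{}
	for _, t := range ts {
		heap.Push(q, t)
	}
	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(target).Host)
	}
	return order
}

func main() {
	ts := []target{
		{"blog.target.in", priorityFor("WordPress", false)},
		{"desk.target.in", priorityFor("ManageEngine", true)},
		{"shop.target.in", priorityFor("Magento", true)},
	}
	fmt.Println(scanOrder(ts))
}
```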
Automated Reconnaissance and Asset Discovery
The foundation of any scanner is a solid recon phase. We start with passive discovery using tools like subfinder and assetfinder, then move to active techniques like DNS brute-forcing. For Indian targets, I've found that many assets are hidden behind non-standard TLDs or regional domain names.
Once subdomains are gathered, we perform port scanning to identify services beyond ports 80 and 443. Many Indian SMEs expose database management tools like phpMyAdmin or internal ERP systems on high ports (e.g., 8080, 8443, 9000). Identifying these is the first step toward finding high-impact vulnerabilities.
# Comprehensive recon pipeline
subfinder -d target.in -all -silent | httpx -p 80,443,8080,8443,9000 -silent -o targets.txt
Dynamic Analysis (DAST) and Fuzzing Techniques
After discovery, we initiate dynamic analysis. This involves sending payloads to the target and observing the response. Nuclei's fuzzing capabilities allow us to test for common injection points like those defined in the OWASP Top 10, including XSS, SQLi, and Open Redirects, by defining "payloads" and "matchers" in YAML.
I often use the -rl (rate-limit) flag to avoid crashing fragile legacy systems common in local Indian manufacturing sectors. Setting a rate limit of 20-30 requests per second is usually the sweet spot for these environments to avoid triggering ISP-level blocks while maintaining decent speed.
# Running Nuclei with controlled rate limiting for sensitive infrastructure
nuclei -u https://target.in -t custom-templates/ -rl 20 -bs 10 -c 5
Static Code Analysis (SAST) Integration
While this article focuses on web scanning, true automation integrates SAST when source code is available (e.g., via exposed .git directories or public GitHub repos of the target's employees). We use tools like semgrep to scan for hardcoded credentials or insecure API calls.
When our web scanner finds a /.git directory, it triggers a sub-process that downloads the repository and runs a SAST scan. This "chained automation" often leads to the discovery of internal API keys or database passwords that would be missed by a pure black-box scan.
Automated Exploitation and Proof-of-Concept Generation
The final stage of the pipeline is generating a PoC. Nuclei handles this by providing the exact request and response that triggered a match. For high-severity issues, we use custom Go scripts to take the Nuclei output and attempt to perform a non-destructive action, such as running id or whoami in an RCE scenario.
This automated verification is crucial for reducing false positives. If the tool claims there is an RCE, I want to see the output of a command. This saves hours of manual triage time and ensures that the reports we provide to clients or bug bounty programs are high-quality and actionable.
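The check on command output can be as small as a regular expression over the response body. The sketch below assumes the verification command was id on a Unix-like target; the confirmsRCE helper is hypothetical, and hosts where id prints numeric IDs without names would need a looser pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// idOutput matches the leading fields printed by the Unix id command,
// e.g. "uid=33(www-data) gid=33(www-data) groups=33(www-data)".
var idOutput = regexp.MustCompile(`uid=\d+\([^)]*\) gid=\d+\([^)]*\)`)

// confirmsRCE reports whether a response body contains genuine id output.
// A body that merely reflects the injected string "id" does not match.
func confirmsRCE(body string) bool {
	return idOutput.MatchString(body)
}

func main() {
	fmt.Println(confirmsRCE("<pre>uid=33(www-data) gid=33(www-data) groups=33(www-data)</pre>"))
	fmt.Println(confirmsRCE("error: unknown parameter cmd=id"))
}
```

Matching on the structure of the output rather than on the payload string is what separates a reflected parameter from real command execution.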
Leveraging Open-Source Tools for Custom Workflows
The power of modern security research lies in the interoperability of open-source tools. We follow the Unix philosophy: "Do one thing and do it well." By piping the output of one tool into another, we build complex workflows without writing thousands of lines of code.
For example, we use naabu for fast port scanning, httpx for web service identification, and nuclei for vulnerability scanning. These tools are all written in Go and can all emit JSON Lines output, which makes it straightforward to chain them together.
# Installing the core toolkit
go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install -v github.com/projectdiscovery/httpx/cmd/httpx@latest
go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
Scripting with Python and Go for Security Tasks
While Go is our choice for high-speed scanning, Python remains the king of data manipulation and rapid prototyping. We use Python scripts to clean our target lists, remove duplicates, and perform complex logic that doesn't fit into a Nuclei template.
I've observed that Go's net/http package is significantly more robust than Python's requests for handling thousands of concurrent connections. In one test, our Go-based scanner handled 10,000 requests per minute with 50MB of RAM, whereas a similar Python script consumed over 500MB and suffered from frequent socket hangs.
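The concurrency pattern behind that kind of scanner is a bounded worker pool. The sketch below is a minimal version, not our scanner: probeAll is a hypothetical helper, and the probe is passed in as a function so the example runs offline. A real probe would issue the request through an http.Client with a Timeout set.

```go
package main

import (
	"fmt"
	"sync"
)

// probeAll runs probe over every URL with at most `workers` goroutines
// in flight and returns the result for each URL.
func probeAll(urls []string, workers int, probe func(string) int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]int, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				code := probe(u)
				mu.Lock() // the map is shared, so writes are serialized
				results[u] = code
				mu.Unlock()
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Stand-in probe so the sketch runs without network access.
	fake := func(u string) int {
		if len(u)%2 == 0 {
			return 200
		}
		return 404
	}
	urls := []string{"https://a.target.in", "https://bb.target.in", "https://ccc.target.in"}
	for u, code := range probeAll(urls, 2, fake) {
		fmt.Println(u, code)
	}
}
```

Capping the worker count matters for the ISP gateways described earlier, since it bounds the number of concurrent connections regardless of how long the target list is.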
Integrating Security Scanners into CI/CD Pipelines
For corporate environments, we integrate Nuclei into the GitHub Actions or GitLab CI pipeline. Every time a developer pushes code, the automated scanner runs a subset of "light" templates to check for common misconfigurations before the code reaches production.
This "Shift Left" approach is particularly relevant for Indian software export houses that must comply with international standards like SOC2 or ISO 27001. Automating security at the commit level ensures that vulnerabilities are caught when they are cheapest to fix.
Managing False Positives and Data Noise
One of the biggest challenges in automation is the sheer volume of data. A single scan can generate thousands of "info" level findings. To manage this, I use strict "matchers" in my custom Nuclei templates. Instead of just looking for a 200 OK status code, I look for specific strings in the response body or headers.
For instance, when scanning for exposed .env files in Indian SME ERPs, I don't just check if the file exists. I check if it contains sensitive keys like DB_PASSWORD or APP_KEY. This drastically reduces the noise and ensures that every alert is worth investigating.
id: indian-erp-exposed-env

info:
  name: Exposed .env in Common Indian SME ERPs
  author: warnhack-research
  severity: critical

http:
  - method: GET
    path:
      - "{{BaseURL}}/.env"
      - "{{BaseURL}}/api/.env"
      - "{{BaseURL}}/config/db.php.bak"

    matchers-condition: and
    matchers:
      - type: word
        words:
          - "DB_PASSWORD="
          - "APP_KEY="
        part: body

      - type: status
        status:
          - 200
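To make the matcher logic in the template above explicit, here is the same decision written as a small Go function. It assumes Nuclei's default behaviour that words inside a single word matcher are combined with OR, while matchers-condition: and requires both the word matcher and the status matcher to pass. The isExposedEnv name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isExposedEnv mirrors the template: the status must be 200 AND the body
// must contain at least one of the sensitive key names.
func isExposedEnv(status int, body string) bool {
	if status != 200 {
		return false
	}
	for _, w := range []string{"DB_PASSWORD=", "APP_KEY="} {
		if strings.Contains(body, w) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isExposedEnv(200, "APP_ENV=production\nDB_PASSWORD=hunter2\n")) // real leak
	fmt.Println(isExposedEnv(200, "<html><title>Not Found</title></html>"))    // soft 404
	fmt.Println(isExposedEnv(403, "DB_PASSWORD=hunter2"))                      // blocked
}
```

The second case is the one that matters for noise: a custom error page returned with 200 OK would pass a status-only check but fails the body check.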
Handling Complex Authentication and Session State
Automating scans for authenticated areas remains a hurdle. Most scanners struggle with MFA or complex OAuth flows. We solve this by using "authenticated templates" where we provide a session cookie or a JWT directly to Nuclei. We use a separate Go script to refresh these tokens periodically during long-running scans.
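The decision of when to refresh can be driven by the token's own expiry. The sketch below reads the exp claim from a JWT payload and reports whether it falls within a safety margin. It is a simplified assumption of how such a refresher might decide, not our script: needsRefresh and demoToken are hypothetical names, the signature is deliberately not verified because the token is only inspected for scheduling, and exp is assumed to be an integer number of seconds.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// needsRefresh reports whether the token's exp claim (Unix seconds) is
// within marginSec of nowUnix. The signature is NOT verified.
func needsRefresh(token string, nowUnix, marginSec int64) (bool, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false, errors.New("token is not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return false, fmt.Errorf("decode payload: %w", err)
	}
	var claims struct {
		Exp int64 `json:"exp"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return false, fmt.Errorf("parse claims: %w", err)
	}
	if claims.Exp == 0 {
		return false, errors.New("token has no exp claim")
	}
	return nowUnix+marginSec >= claims.Exp, nil
}

// demoToken builds an unsigned, JWT-shaped string for the example only.
func demoToken(exp int64) string {
	enc := base64.RawURLEncoding.EncodeToString
	header := enc([]byte(`{"alg":"none","typ":"JWT"}`))
	payload := enc([]byte(fmt.Sprintf(`{"sub":"scanner","exp":%d}`, exp)))
	return header + "." + payload + ".sig"
}

func main() {
	tok := demoToken(1700000600)
	refresh, err := needsRefresh(tok, 1700000000, 300)
	fmt.Println(refresh, err) // 600s of validity left, 300s margin
	refresh, err = needsRefresh(tok, 1700000400, 300)
	fmt.Println(refresh, err) // 200s left, inside the margin
}
```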
For targets using local Indian payment gateways or custom auth modules, we often have to write custom "headless" templates. These use a browser engine to interact with the page, fill in credentials, and then pass the authenticated state back to the scanner. This is slower but necessary for deep-surface scanning.
Bypassing Rate Limiting and WAF Detection
WAFs like Cloudflare or Akamai are common on high-value Indian targets. To bypass them, we use techniques like header rotation (e.g., X-Forwarded-For) and jittered request timing. However, the most effective method is identifying the "Origin IP." Many Indian sites only protect the main domain, leaving the direct server IP exposed and unprotected.
We automate the search for origin IPs by scanning historical DNS records and SSL certificate transparency logs. Once the origin IP is found, we point our automated scanner directly at it, completely bypassing the WAF's protection layers.
Using LLMs for Vulnerability Identification and Code Review
We are currently experimenting with using Large Language Models (LLMs) to write Nuclei templates. By feeding the LLM a CVE description or a snippet of vulnerable code, we can generate a draft template in seconds. This is particularly useful for complex logic where a simple regex isn't enough.
However, I've observed that LLMs often hallucinate matchers. Every AI-generated template must be manually verified against a known vulnerable target (using a Dockerized lab environment) before being added to our production pipeline. AI is a co-pilot, not the captain.
Predictive Analysis for Emerging Threat Patterns
By analyzing the results of thousands of scans across different Indian sectors (FinTech, EdTech, Govt), we can start to predict where the next vulnerabilities will appear. For example, we've noticed a trend of insecure API implementations in the "Micro-lending" sector in India, specifically around IDOR vulnerabilities in KYC modules.
We use this data to proactively build templates for these specific patterns, often finding vulnerabilities in new apps before they are even officially launched. This predictive approach is the future of high-end security research.
Balancing Automated Scanning with Manual Verification
Automation finds the "known unknowns," but manual testing is required for "unknown unknowns." Business logic flaws, such as the ability to change a price in a shopping cart or bypass a specific workflow, are still very difficult to automate. I recommend a 70/30 split: 70% of the effort on automated wide-scale scanning and 30% on deep-dive manual analysis of the most interesting targets.
In the Indian context, manual verification is also necessary to understand the impact. A "Critical" vulnerability on a dev server with no data might be less important than a "Medium" vulnerability on a production server containing Aadhaar data. Context is king.
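That contextual weighting can be approximated with a simple score, which is useful for ordering the manual review queue even though it does not replace judgment. The base scores and multipliers below are invented for illustration, and contextScore and asset are hypothetical names.

```go
package main

import "fmt"

type asset struct {
	Env               string // "prod", "staging", or "dev"
	HoldsPersonalData bool   // e.g. Aadhaar numbers or KYC documents
}

// contextScore scales a severity label by environment and data sensitivity.
// All weights are illustrative assumptions.
func contextScore(severity string, a asset) float64 {
	base := map[string]float64{"critical": 9, "high": 7, "medium": 5, "low": 2}[severity]
	if base == 0 {
		base = 1 // unknown or informational severity
	}
	envWeight := 0.4
	if a.Env == "prod" {
		envWeight = 1.0
	}
	dataWeight := 1.0
	if a.HoldsPersonalData {
		dataWeight = 2.0
	}
	return base * envWeight * dataWeight
}

func main() {
	devCritical := contextScore("critical", asset{Env: "dev"})
	prodMedium := contextScore("medium", asset{Env: "prod", HoldsPersonalData: true})
	fmt.Printf("critical on empty dev server: %.1f\n", devCritical)
	fmt.Printf("medium on prod with personal data: %.1f\n", prodMedium)
}
```

With these weights the medium finding on the production server outranks the critical one on the empty dev server, matching the example in the paragraph above.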
Maintaining Tooling and Keeping Signatures Up-to-Date
The threat landscape moves fast. A scanner that hasn't been updated in a month is nearly useless. We automate our tool updates using a simple cron job that pulls the latest versions of ProjectDiscovery tools and syncs the Nuclei template library daily.
# Daily update routine for the scanning engine
nuclei -update-templates
go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
Ethical Considerations in Automated Security Research
Automated scanning can be disruptive. High-intensity scans can cause Denial of Service (DoS) on older infrastructure. We always ensure that our scanning intensity is calibrated to the target. In India, where many government and SME sites run on limited resources, "polite scanning" is not just ethical—it's necessary to avoid legal trouble.
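One way to calibrate intensity automatically is an additive-increase, multiplicative-decrease rule: creep the request rate up while the target looks healthy, and halve it at the first sign of distress. The sketch below is an assumption about how such a controller could look, with hypothetical thresholds and a hypothetical nextRate helper; its output would feed the scanner's rate-limit setting.

```go
package main

import "fmt"

// nextRate returns the requests-per-second to use for the next interval.
// errorRatio is the share of failed requests in the last interval, and
// latencyRatio is the current latency divided by the baseline latency.
// The 5% and 2x thresholds are illustrative assumptions.
func nextRate(current, floor, ceiling int, errorRatio, latencyRatio float64) int {
	next := current + 1 // additive increase while the target is healthy
	if errorRatio > 0.05 || latencyRatio > 2.0 {
		next = current / 2 // multiplicative decrease on distress
	}
	if next < floor {
		next = floor
	}
	if next > ceiling {
		next = ceiling
	}
	return next
}

func main() {
	rate := 20
	// Simulated intervals: healthy, healthy, error spike, slow responses.
	samples := [][2]float64{{0.0, 1.0}, {0.01, 1.1}, {0.20, 1.2}, {0.0, 3.0}}
	for _, s := range samples {
		rate = nextRate(rate, 2, 30, s[0], s[1])
		fmt.Println("next rate:", rate)
	}
}
```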
Unauthorized access to data is a serious offense under Indian law, notably the Information Technology Act 2000, and the DPDP Act 2023 adds further obligations wherever personal data is involved. When our automated tools find exposed data, we stop immediately, document the finding, and report it through the proper channels (like CERT-In) without attempting to download or "explore" the data further.
Summary of the Impact of Automation
Automation has fundamentally changed the economics of security research. It allows a single researcher to monitor an entire country's IP space for specific vulnerabilities. For Indian enterprises, it means the window of opportunity to patch a vulnerability before it is exploited has shrunk from weeks to hours.
The recent surge in attacks against Indian critical infrastructure highlights the need for this level of automation. We've seen high exploitation rates for CVE-2023-35078 (Ivanti EPMM) and CVE-2024-27198 (JetBrains TeamCity) within the Indian corporate sector. Without automated discovery, most organizations wouldn't even know they were running these vulnerable services.
Final Thoughts on Staying Ahead of Evolving Threats
The future of web security research is not in better tools, but in better orchestration. The ability to quickly write a template, deploy it across a distributed cluster, and triage the results in real-time is what separates elite researchers from the rest. As Indian infrastructure continues to digitize at a rapid pace, the demand for sophisticated, automated security testing will only grow.
To stay ahead, I focus on the "custom" part of custom web scanners. Don't just run the default templates; understand the underlying protocol, identify the unique quirks of the local environment, and build signatures that find what everyone else is missing.
# Final command for the day: Scanning for high-severity CVEs with JSON Lines output for triage
nuclei -l targets.txt -type http -tags cve -severity critical,high -jsonl -o discovery_results.txt
