In this article I’ll demonstrate, step by step, how to set up Splunk analytics to detect successful known and unknown malware attacks on web hosting systems in real time.

In addition the same solution will include instructions to deploy fully automated investigative analytics to discover the origins of attackers (IP addresses) as well as any modifications within the file system.

This information is essential to discover and immediately eliminate all possible backdoors and exploits that the attacker tried to plant.

Real-time alerts will be delivered via email to the system administrator as soon as an attack occurs. The same information will be available via the Splunk web interface for further analysis.

The information presented will help system administrators as well as hosting service providers to devise measures to detect close to 100% of possibly successful cyber attacks and take immediate action before malware propagates further and causes significant damage or loss to the business.

Deploying such a system can not only help prevent significant monetary and business losses for the enterprise but also help avoid the embarrassment and negative publicity that follow when customers and news outlets learn about a successful hack before the business itself does.

While the steps described are somewhat specific to CentOS Linux hosting based on WHM/cPanel, the same approach can be adjusted to any type of hosting and operating environment thanks to Splunk’s multi-platform support.

If you’ve ever administered web hosting servers or been the webmaster of a self-hosted website, the issues with trojans, backdoors, exploits, viruses, defacement and all kinds of web malware should be very familiar to you.

Dealing with malware becomes critical when sites are built with popular content management systems like WordPress. Self-hosted WordPress sites are among the most attractive and easiest targets for hackers and spammers.

Scanning for outdated plugins and buggy themes and exploiting 0-day vulnerabilities, hackers try to penetrate web hosting defences 24/7/365. Once they succeed, the hacked website and quite often the hosting server itself become zombies in the hands of malicious attackers.

To help with the hosting needs of many clients I manage and administer dedicated web hosting services based on CentOS + WHM/cPanel, and most of the clients’ sites are built with WordPress.

This means I have to deal with many cases of malware infection, defacement, sudden outgoing spam activity, injection of malicious content, planted phishing pages, redirects to spammy sites as well as uploads of malware and trojans.

Using anti-malware and security scanning software helps to detect the presence of malware and to receive alerts on detected infections, but only up to a point.

The problem is that it only detects about 75-85% of malware occurrences and provides no information about how the malware got in, and that is the big problem.

Just like a living virus, malware ensures its own survival by planting copies of itself and installing hidden backdoors all over the hosting file system space.

Deleting 9 out of 10 malware occurrences means that one hidden backdoor still remains somewhere, and it will be reused to reinfect and take control of the system again. Detecting all occurrences of malware is essential to protect the file system space.

In addition, it is no less important to detect the attacker’s origins in order to set firewall blocks, and especially to trace the steps that led to a successful break-in. This allows system administrators to discover weak spots and previously unknown vulnerabilities and quickly close the loopholes.

Reminder Note: Some folder paths and settings mentioned here are specific to cPanel / WHM based hosting on CentOS 6 Linux, although the concepts described can be ported to any configuration, even Windows environments, thanks to the flexibility of Splunk.

The problem I encountered with other malware detection systems is their delayed, signature-based scans, which not only miss unknown infections but also do not provide any insight into the who, how and when of successful cyber attacks.

To solve this problem I looked into setting up file monitoring tools such as inotify, auditd and similar. The idea is that the monitoring daemon generates a log of alerts which is subsequently imported into Splunk.

The problem with these tools is that they either require complex configuration, produce tons of useless extra event information, or fail to monitor subtrees.

I also dislike adding more and more moving parts to existing solutions, as this complicates the maintenance work of overseeing all of them.

To monitor for the presence of malware we need to:

  1. Be able to monitor the file system (user account web home directories) in real time, recursively, starting from a given directory. In my case it’s /home/* and its subtrees.
  2. Be able to monitor for all essential events of interest: adds, updates, renames.
  3. Be able to scan modified file contents for suspicious fragments in real time.
  4. Be able to discover the Web origins of suspicious modifications, such as the IP addresses of the users who caused the modification to occur.
    This will allow us to trace back malicious activity and discover the root cause of a security breach.
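
To picture what requirement 3 means in practice, here is a minimal Python sketch of a content scan. In this setup the scanning is meant to happen inside Splunk, so the code only illustrates the idea. The two patterns and the function name `find_suspicious_fragments` are hypothetical examples chosen for illustration, not a list taken from this article.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; a real list would be longer
# and tuned to the malware actually seen on the server.
SUSPICIOUS_PATTERNS = [
    re.compile(r"eval\s*\(\s*base64_decode\s*\(", re.IGNORECASE),
    re.compile(r"gzinflate\s*\(\s*base64_decode\s*\(", re.IGNORECASE),
]


def find_suspicious_fragments(content: str) -> List[str]:
    """Return every fragment of `content` that matches one of the patterns."""
    hits: List[str] = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(content))
    return hits
```

An empty result means none of the sample patterns matched; it says nothing about whether the file is actually clean.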

I decided to solve tasks 1, 2 and 3 in a non-traditional way: instead of installing and configuring monitoring daemons and twisting their arms and legs to make them do what I need, why not use Splunk for the same purpose? Splunk is perfectly suited to monitoring changes within subtrees, so why not use this ability to monitor user file spaces?

So I decided to throw all the garbage of user file system content into Splunk and see how it feels.

Splunk loves garbage! 🙂 And that’s the beauty of it: throw any type of data, regardless of format, at Splunk and it will do its best to make sense of it, index it and make it available for searches. Even with all defaults, no data will be confusing to Splunk.

Essentially this approach tells Splunk to treat user file system content as data inputs. All user scripts (*.php, *.js, *.py, *.pl and similar) are considered by Splunk as a source of “eventing” data.

In our case the actual event of interest is the ‘source’ field (the actual file name) as well as its _raw content.

To set this up I installed the Splunk forwarder on the actual hosting server and used a second server (it could be a VPS or a dedicated machine) to install the Splunk Enterprise receiving side and configure all alerting logic.

This way I put minimal load on the hosting server, whose only task is to send data in real time to the indexer for further analysis. Technically it is possible to set up everything on a single machine if desired.

The Splunk forwarder will send all data about web traffic (Apache access_combined events), all events about modifications of user script files, as well as the contents of the Apache configuration file, httpd.conf.

We need the httpd.conf information to be able to map user accounts and source files, such as /home/johnsmith/public_html/bad-script.php, to actual domain names.

httpd.conf contains configuration entries such as this:

    DocumentRoot /home/johnsmith/public_html    

When we detect that bad-script.php is suspicious, we want to find the IP address of the malicious attacker.

The way the WHM/cPanel setup works is that each cPanel user’s space is separate from every other user’s. In other words, it’s impossible (without root access to the server itself) for a Web attacker to plant anything from the ‘johnsmith’ user space into any other user’s space. However, if the cPanel user ‘johnsmith’ hosts a number of websites within the same cPanel account, the attacker can easily copy or modify script files within each of these sites.

When we know the ‘source’ field of a suspicious file, which is nothing other than the actual filename, /home/johnsmith/public_html/bad-script.php, we can derive the account name, which is ‘johnsmith’. Then, using configuration data from httpd.conf and the time of the attack, we can “triangulate” the list of IP addresses that visited any site belonging to the ‘johnsmith’ account within this timeframe. From this point we will have enough data to recognize the attacker and to trace his steps backwards to see where and how the break-in occurred.
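
The “triangulation” just described can be sketched in a few lines of Python. This is only an illustration of the logic under stated assumptions, not the Splunk searches themselves: the function names, the `Hit` tuple layout and the symmetric time window around the attack are all hypothetical choices, and the domain-to-DocumentRoot mapping is assumed to have been extracted from httpd.conf already.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One web hit: (timestamp, domain that served the request, client IP).
# This layout is an assumption made for the sketch.
Hit = Tuple[datetime, str, str]


def account_from_source(source: str) -> Optional[str]:
    """Extract the account name from a path like /home/<account>/..."""
    match = re.match(r"^/home/([^/]+)/", source)
    return match.group(1) if match else None


def domains_for_account(account: str, docroots: Dict[str, str]) -> Set[str]:
    """Return the domains whose DocumentRoot lives under /home/<account>/."""
    prefix = f"/home/{account}/"
    return {domain for domain, root in docroots.items() if root.startswith(prefix)}


def candidate_ips(hits: Iterable[Hit], domains: Set[str],
                  attack_time: datetime, window: timedelta) -> List[str]:
    """List client IPs that hit any of the domains within +/- window of the attack."""
    start, end = attack_time - window, attack_time + window
    return sorted({ip for ts, domain, ip in hits
                   if domain in domains and start <= ts <= end})
```

The result is a list of candidates rather than a verdict: legitimate visitors who happened to browse the same sites in that window will show up too, which is why the next step is to trace each candidate’s requests backwards.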

We will set up the Splunk forwarder to monitor historical logs for investigative purposes as well as real-time Web traffic logs.

Here’s what the architecture of this solution looks like:


The Splunk indexer will be configured to deliver customized email alerts about file modifications as well as about detections of suspicious patterns.

Installing Splunk on indexer (destination machine)

  • Follow Splunk installation instructions. My usual sequence of steps is:
    • [Register and] Login to
    • Get ‘wget’ download link from here:
      This link or complete ‘wget’ command will look like this (pick your architecture):

      > wget -O splunk-6.2.4-271043-linux-2.6-x86_64.rpm ''
    • > rpm -Uvh splunk-6.2.4-271043-linux-2.6-x86_64.rpm
    • > $SPLUNK_HOME/bin/splunk start --accept-license
    • > $SPLUNK_HOME/bin/splunk enable boot-start
    • > $SPLUNK_HOME/bin/splunk edit user admin -password NewPassword -role admin -auth admin:changeme
  • Find available free ports on this machine. We need to pick an unused port for the forwarder to send data to:
    • > netstat -tln | tail -n +3 | awk '{ print $4 }'
    • Pick a port that is not listed there
    • Login to Splunk WEB UI
    • Go to: Settings -> Forwarding and Receiving -> Receive Data -> Add new
    • Input unused port number, [Save]
  • Select the “Search” app.
  • Create 4 indexes to receive data, with the following names:
    apache_domlogs – to receive real-time Web hit events
    apache_acc_logs – to receive historical logs stored in compressed form in user account spaces
    fs – to receive the complete content of user account files and scripts.
    httpdconf – to receive the content of a single file: /usr/local/apache/conf/httpd.conf
    I like creating a separate index for each type of data: it makes life much easier later on when configuring entitlements per user group, if necessary, and when cleaning up indexes to free disk space should the need arise.
  • Edit (or create) the file:
    > $SPLUNK_HOME/etc/apps/search/local/props.conf
    • Add to it the following contents:
      [httpdconf]
      LINE_BREAKER = (<VirtualHost.*)
      SHOULD_LINEMERGE = false
    • This instructs the receiving Splunk how to break the httpdconf sourcetype into separate events. This way one event is created per domain/subdomain, containing the domain/subdomain name as well as its target location within the client’s file system space.
  • Restart Splunk
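
To show what the per-domain events are good for, here is a small Python sketch that imitates the effect of the event breaking: it cuts an httpd.conf text into one chunk per VirtualHost block and pulls out the domain name and its DocumentRoot. It is not how Splunk implements LINE_BREAKER, and the sample configuration (addresses, domain names) and the function name `docroots_by_domain` are made up for illustration. `ServerName` and `DocumentRoot` are standard Apache directives.

```python
import re
from typing import Dict

# Hypothetical sample; real cPanel VirtualHost blocks contain many more directives.
SAMPLE_HTTPD_CONF = """\
<VirtualHost 192.0.2.10:80>
    ServerName example.com
    DocumentRoot /home/johnsmith/public_html
</VirtualHost>
<VirtualHost 192.0.2.10:80>
    ServerName blog.example.com
    DocumentRoot /home/johnsmith/public_html/blog
</VirtualHost>
"""


def docroots_by_domain(conf_text: str) -> Dict[str, str]:
    """Map each ServerName to the DocumentRoot of its VirtualHost block."""
    mapping: Dict[str, str] = {}
    # Split right before every opening <VirtualHost ...> tag so that each
    # chunk holds exactly one block, similar to one event per domain.
    for block in re.split(r"(?=<VirtualHost\b)", conf_text):
        name = re.search(r"^\s*ServerName\s+(\S+)", block, re.MULTILINE)
        root = re.search(r"^\s*DocumentRoot\s+(\S+)", block, re.MULTILINE)
        if name and root:
            mapping[name.group(1)] = root.group(1)
    return mapping
```

With one event per domain, mapping a suspicious file path back to the domains of the same account becomes a simple lookup.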

Installing Splunk forwarder on Web hosting server

  • These are good places to refer to regarding installation instructions:
  • Usual sequence of steps:
    • Download forwarder from here:
      > wget -O splunkforwarder-6.2.4-271043-linux-2.6-x86_64.rpm ''
    • > rpm -Uvh splunkforwarder-6.2.4-271043-linux-2.6-x86_64.rpm
    • > rm -f splunkforwarder-6.2.4-271043-linux-2.6-x86_64.rpm
    • > /opt/splunkforwarder/bin/splunk start --accept-license
    • > /opt/splunkforwarder/bin/splunk enable boot-start
    • > /opt/splunkforwarder/bin/splunk edit user admin -password NewPassWoRd -role admin -auth admin:changeme
    • > /opt/splunkforwarder/bin/splunk add forward-server <indexer-ip>:8900 -auth admin:NewPassWoRd
      (Where <indexer-ip> is the IP address of the destination indexer server and 8900 is the free port number found in the previous steps)
    • Check the list of forward servers:
    • > /opt/splunkforwarder/bin/splunk list forward-server
    • Stop Forwarder for now:
      > /opt/splunkforwarder/bin/splunk stop
  • Add the following contents to /opt/splunkforwarder/etc/apps/search/local/inputs.conf :
 blacklist = (ftpxferlog|log\.offset|bytes_log|\-ftp_log|\-(200[0-9]|201[0-4]))
 index = apache_domlogs
 sourcetype = access_combined
 disabled = false

 blacklist = (ftpxferlog|log\.offset|bytes_log|\-ftp_log|\-(200[0-9]|201[0-4]))
 index = apache_acc_logs
 sourcetype = access_combined
 disabled = false

 whitelist = \.(php|php3|php4|php5|aspx|cgi|rb|py|pyc|pl|jsp|tpl|cfm|js|vbs|vbe|jse|pm|bat|cmd|exe|scr)$
 blacklist = /(virtfs|home/[a-z0-9\_]+/www)/
 index = fs
 sourcetype = scripts
 disabled = false
 crcSalt = <SOURCE>

 disabled = false
 index = httpdconf
 sourcetype = httpdconf

These instructions tell the Splunk forwarder which locations to monitor, which index to assign to each location and which source type belongs to each monitored space.

The whitelist for the /home/* path restricts monitoring to scripts that are Web-executable.

The blacklists exclude very old historical logs to save space, as well as some other unnecessary files.
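
A quick way to sanity-check these filters before restarting the forwarder is to try the two regular expressions from the /home/* input against a few paths. The Python sketch below does that. It assumes that a file is picked up only when it matches the whitelist and does not match the blacklist, and it uses Python’s `re` module as a stand-in for Splunk’s own regex engine. The function name and the sample paths are hypothetical.

```python
import re

# Both expressions are copied from the inputs.conf settings shown earlier.
WHITELIST = re.compile(
    r"\.(php|php3|php4|php5|aspx|cgi|rb|py|pyc|pl|jsp|tpl|cfm|js|vbs|vbe|jse|pm|bat|cmd|exe|scr)$"
)
BLACKLIST = re.compile(r"/(virtfs|home/[a-z0-9\_]+/www)/")


def is_monitored(path: str) -> bool:
    """True if the path passes the whitelist and is not caught by the blacklist."""
    return bool(WHITELIST.search(path)) and not BLACKLIST.search(path)


if __name__ == "__main__":
    for sample in (
        "/home/johnsmith/public_html/bad-script.php",  # script file: monitored
        "/home/johnsmith/public_html/logo.png",        # not a script: skipped
        "/home/johnsmith/www/index.php",               # under www/: blacklisted
    ):
        print(sample, is_monitored(sample))
```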

The crcSalt = <SOURCE> setting tells Splunk to add the full pathname to the checksum it uses to decide whether it has already seen a file. This way any copied or renamed file is treated as new by Splunk and gets re-indexed. We need this functionality to detect potential malware activity in the moving and renaming of files and scripts.
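
The effect of salting the checksum with the path can be illustrated with a deliberately simplified Python model. It is not Splunk’s actual implementation (Splunk computes its checksum over the beginning of the file and keeps its own bookkeeping). The function name `file_signature` and the sample paths are hypothetical.

```python
import zlib


def file_signature(path: str, content: bytes, salt_with_source: bool) -> int:
    """Simplified 'have I seen this file?' signature.

    Without the salt only the content counts, so a copy of a known file
    looks identical. With the salt the full path is mixed in, so the same
    content under a different name yields a different signature.
    """
    data = path.encode("utf-8") + content if salt_with_source else content
    return zlib.crc32(data)


if __name__ == "__main__":
    payload = b"<?php /* same bytes in both files */ ?>"
    original = "/home/johnsmith/public_html/a.php"
    copy = "/home/johnsmith/public_html/b.php"

    # Content-only signatures are equal: the copy would be ignored.
    print(file_signature(original, payload, False) == file_signature(copy, payload, False))
    # Path-salted signatures differ: the copy is treated as a new file.
    print(file_signature(original, payload, True) == file_signature(copy, payload, True))
```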

  • Add the following contents to: /opt/splunkforwarder/etc/apps/search/local/props.conf :
CHECK_METHOD = entire_md5

This instructs the Splunk forwarder to reindex the whole Apache configuration file whenever any change within it is detected. So any time a new site/domain is added by any web hosting user, Splunk will immediately pick up the updated configuration.

  • Restart splunk forwarder:
    > /opt/splunkforwarder/bin/splunk restart

At this point the Splunk forwarder will begin sending data to the Splunk indexer. Depending on the amount of logs and user scripts on the web server, it may take a few hours for all the data to arrive at the Splunk indexer.

Continue to Part 2…

Gleb Esman is currently working as Senior Product Manager for Security/Anti-fraud solutions at Splunk leading efforts to build next generation security products covering advanced fraud cases across multiple industry verticals.