
In this series of posts I’ll cover a complete strategy for using Splunk Enterprise to detect customer account takeover fraud, as well as setting up automated alerts when such activity is detected.

While I helped implement these measures for a large financial firm, the same approach can be applied to any online enterprise that needs to protect online customer accounts, detect suspicious activity quickly, and act to prevent monetary and business losses.

The techniques I am going to describe generate a fairly low level of false positives and offer efficient ways to adjust trigger thresholds across multiple metrics for specific business needs. In addition, the approach has been tested and works really well for portals that generate up to 3,000,000-5,000,000 hits a day.

The specific use case covered in these posts is the situation where the credentials of multiple clients (sometimes thousands or more) have fallen into the hands of attackers who will try to exploit them for monetary, competitive or political gain. With the help of Splunk, an enterprise can quickly and automatically detect such a situation and take the necessary measures to protect the business and its clients.

Account takeover fraud comes into play when a fraudster gains access to customer account credentials by any means: phishing campaigns, malware, spyware, or buying sets of stolen customer credentials on darknets, online black markets and forums.

I won’t get into the details of the many ways customer credentials may be compromised, but the end result is that unauthorized persons are able to access multiple customer accounts and cause significant damage to customers and to the business, including large monetary losses.

The worst way for an enterprise to learn about a cyberattack on its own customers is from CNN.

Splunk gives us all the necessary tools to quickly detect such attacks and stay on top of the game.

The data source for this task can be either web server access logs or another facility or system (such as IBM Tealeaf) that can supply a raw stream of access log events into log files.

Iteration 1:

The general logic for detecting unauthorized access via access logs is to raise an alert whenever someone tries to log in from an IP address that the given account has never used before *and* from a browser (USER_AGENT string) that it has never used before.
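As a rough illustration of this rule outside of Splunk, here is a minimal Python sketch. The `LoginHistory` class and its method names are hypothetical, and it assumes the login events have already been parsed into account, IP and USER_AGENT values.

```python
from collections import defaultdict


class LoginHistory:
    """Remembers which IPs and user agents each account has logged in with."""

    def __init__(self):
        self.ips = defaultdict(set)     # account -> set of IP addresses seen
        self.agents = defaultdict(set)  # account -> set of USER_AGENT strings seen

    def is_suspicious(self, account, ip, user_agent):
        # Flag the login only when BOTH the IP and the user agent are new
        # for this account. Note that the very first login of any account
        # is flagged too, since nothing is known about it yet.
        return ip not in self.ips[account] and user_agent not in self.agents[account]

    def record(self, account, ip, user_agent):
        self.ips[account].add(ip)
        self.agents[account].add(user_agent)


history = LoginHistory()
history.record("alice", "203.0.113.7", "Mozilla/5.0 (Windows NT 10.0)")

# New IP and new browser: flagged.
print(history.is_suspicious("alice", "198.51.100.9", "curl/8.0"))
# New IP but a known browser (same laptop in a hotel): not flagged.
print(history.is_suspicious("alice", "198.51.100.9", "Mozilla/5.0 (Windows NT 10.0)"))
```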

In reality, the above logic will generate huge numbers of false positives, because people change their devices (new tablets, mobile devices, laptops) and locations (travel, hotels, Wi-Fi hotspots, coffee shops) quite often.

When portal traffic is high, such as millions of hits a day with several new logins *per second*, implementing the above logic while preventing false positives becomes an impossible task.

Iteration 2:

Let’s see how we can improve on that. Detecting the takeover of a single account would require behavior analysis in addition to IP and User Agent matching. I’ll write about that in one of the later posts.

Here I will focus on the mass account takeover attack: the case where an attacker has obtained credentials for multiple accounts and wants to try them. This often causes far more damage.

So now we actually want to detect logins to multiple accounts from an IP address never used before *and* a browser USER_AGENT never used before.

This definitely shouldn’t produce nearly as many false positives.
However, we should note the following concerns/conditions:

  1. A legitimate user might have a dynamic IP address that changes often.
  2. The attacker might use the same browser as some legitimate users.
  3. We need to define the time period over which to gather the most recent access statistics.
  4. We need to define the reference time period: how far back do we want to scan each account’s login history?
  5. We need to whitelist legitimate sources of traffic that might look “suspicious” (such as financial aggregators, testing services, etc.).

Iteration 3:

To compensate for condition #1, we will adjust the task as follows:

Detect when a class C IP subnet (X.Y.Z.0/24) tries to access multiple accounts for the very first time (meaning none of the accounts were previously accessed from this subnet) *and* from an unknown device (browser USER_AGENT).
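Collapsing an address to its X.Y.Z.0/24 subnet is a simple operation. The sketch below uses Python's standard `ipaddress` module; the helper name `subnet_24` is my own, and it assumes IPv4 addresses only.

```python
import ipaddress


def subnet_24(ip: str) -> str:
    """Return the /24 network that contains an IPv4 address."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return str(ipaddress.IPv4Network(f"{ip}/24", strict=False))


print(subnet_24("203.0.113.57"))  # 203.0.113.0/24
# Two addresses handed out from the same dynamic pool map to the same key.
print(subnet_24("203.0.113.57") == subnet_24("203.0.113.201"))  # True
```

Grouping by this key instead of the full address means a legitimate user whose dynamic IP changes within the same /24 still looks "known".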

To compensate for condition #2 above, we need to add another rule that considers the total number of accounts accessed from the same subnet and calculates the percentage of those accounts that were accessed for the very first time. The trigger will make sure that the percentage of unknowns is significant enough. This compensates for accidental matches between the attacker’s USER_AGENT and legitimate users’ USER_AGENTs.
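The percentage rule boils down to a small calculation per subnet. In this sketch the function names are hypothetical and the threshold is left as a parameter; the value passed in the example is only a placeholder to be tuned for the business.

```python
def unknown_ratio(total_accounts: int, first_time_accounts: int) -> float:
    """Fraction of accounts touched by a subnet that it had never touched before."""
    if total_accounts == 0:
        return 0.0
    return first_time_accounts / total_accounts


def ratio_is_significant(total_accounts: int, first_time_accounts: int,
                         min_ratio: float) -> bool:
    return unknown_ratio(total_accounts, first_time_accounts) >= min_ratio


# An office subnet with 8 accounts, one of them a brand-new employee: 12.5% unknown.
print(ratio_is_significant(8, 1, min_ratio=0.75))  # False
# A subnet touching 8 accounts, 7 of them for the very first time: 87.5% unknown.
print(ratio_is_significant(8, 7, min_ratio=0.75))  # True
```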

Iteration 4, the final challenge defined:

Here’s how our final task reads now:
Detect and alert when a class C IP subnet tries to access at least 5 different accounts within an hour and at least 75% of the accounts touched have never been accessed from this subnet *and* from this USER_AGENT within the last 45 days.

The *and* rule implies that the alert condition is not triggered when either the IP or the USER_AGENT has been used by the given account before. This is the normal case of a user either accessing the portal from a new location with the same device, or using a newly purchased device (or upgraded browser) from the same place.

To implement the above logic we need a query that does approximately this:

  1. Scan the last hour of access log data and find the subnets that tried to access multiple accounts within that hour.
  2. For each of the accounts they touched, take the username, IP and USER_AGENT and scan the previous 45 days of traffic history to find usernames that have ever been touched by this IP/USER_AGENT combo.
  3. Alert if the number of accounts found is above the threshold.
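The three steps above can be sketched in plain Python to make the logic concrete. This is not the Splunk query itself, just an illustration under some assumptions: events are `(username, ip, user_agent)` tuples, the caller has already split them into the last hour (`recent`) and the previous 45 days (`history`), and an account counts as known if either the subnet or any USER_AGENT it used matches its history. All names are hypothetical.

```python
import ipaddress
from collections import defaultdict


def subnet_24(ip):
    return str(ipaddress.IPv4Network(f"{ip}/24", strict=False))


def find_suspicious_subnets(recent, history, min_accounts=5, min_ratio=0.75):
    """Return {subnet: [accounts never seen from it]} for subnets that trigger."""
    # Index the reference period: what each account has used before.
    known_subnets = defaultdict(set)
    known_agents = defaultdict(set)
    for user, ip, agent in history:
        known_subnets[user].add(subnet_24(ip))
        known_agents[user].add(agent)

    # Step 1: group the last hour of logins by subnet, then by account.
    by_subnet = defaultdict(lambda: defaultdict(set))
    for user, ip, agent in recent:
        by_subnet[subnet_24(ip)][user].add(agent)

    alerts = {}
    for subnet, accounts in by_subnet.items():
        if len(accounts) < min_accounts:
            continue
        # Step 2: an account is "new" only if neither this subnet nor any of
        # the user agents it used appears in that account's history.
        new_accounts = sorted(
            user for user, agents in accounts.items()
            if subnet not in known_subnets[user]
            and not (agents & known_agents[user])
        )
        # Step 3: alert when enough of the touched accounts are new.
        if len(new_accounts) / len(accounts) >= min_ratio:
            alerts[subnet] = new_accounts
    return alerts


history = [(f"user{i}", f"198.51.100.{i}", "Firefox") for i in range(1, 7)]
attack = [(f"user{i}", "203.0.113.50", "Scripted-Client") for i in range(1, 6)]
print(find_suspicious_subnets(attack, history))
```

Building the two lookup tables from `history` is the expensive part, which is why the size of the reference data matters so much.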

One client’s portal generates 3-6 million hits per day. That adds up to 100,000,000+ events per month. It would take a prohibitively long time to execute part #2: scanning each account for a matching or missing IP/USER_AGENT combo across 45 days’ worth of history.

To keep this part of the query manageable and as fast as possible, we need to build a reference summary index: a summary index containing only information about login events. The query will then run much faster, because the amount of data that has to be searched for back references is much smaller.
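The idea behind the summary can be shown with a few lines of Python. This is only a conceptual sketch, not Splunk's summary indexing mechanism: the event field names and the `LOGIN_PATH` value are assumptions made up for the example.

```python
LOGIN_PATH = "/login"  # hypothetical; use whatever identifies a login on your portal


def to_login_summary(raw_events):
    """Yield compact login records from raw access-log events (dicts)."""
    for event in raw_events:
        # Drop everything that is not a login hit with a username attached.
        if event.get("path") != LOGIN_PATH or not event.get("username"):
            continue
        # Keep only the fields the detection logic needs.
        yield {
            "time": event["time"],
            "username": event["username"],
            "ip": event["ip"],
            "user_agent": event["user_agent"],
        }


raw = [
    {"time": 1, "path": "/login", "username": "alice", "ip": "203.0.113.7",
     "user_agent": "Firefox", "bytes": 5120, "referer": "/"},
    {"time": 2, "path": "/images/logo.png", "ip": "203.0.113.7",
     "user_agent": "Firefox", "bytes": 20480},
    {"time": 3, "path": "/account/balance", "username": "alice",
     "ip": "203.0.113.7", "user_agent": "Firefox", "bytes": 900},
]
summary = list(to_login_summary(raw))
print(len(raw), "raw events ->", len(summary), "summary record(s)")
```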

Continue to:

Detecting Bank Accounts Takeover Fraud Cyberattacks with Splunk. Part 2: Building Reference Summary Index of Logins Data.

Gleb Esman is currently working as Senior Product Manager for Security/Anti-fraud solutions at Splunk leading efforts to build next generation security products covering advanced fraud cases across multiple industry verticals.