Uncovering Human Emotions Within Haystack of Machine Data
In another independent analysis, I decided to isolate and investigate actual, mostly negative user feedback, client complaints and customer support requests.
The purpose of this research was to correlate each person’s actual emotional state (clearly seen through the person’s own words) with their detailed web session activity and online behavior within the same time frame.
The following emotional states were observed and extracted from actual user submissions written in a negative to very negative manner:
- Aggressiveness, Hostility
Most negative emotions were strongly associated with the client’s negative financial situation, a perceived inability to control the results of their actions, real-world events or other people’s actions not matching the client’s expectations, and frustration with the features, abilities, response times or complexity of certain systems.
I found that, with enough information, it is possible to find fingerprint matches between emotional states and the digital patterns contained in raw logging events and user session data.
One particular metric I found worth paying attention to is the “interrupted hit” condition described above.
What quite often happened is that, depending on the person’s current emotional state and other personal inner factors, the person’s perceived “normal” response time to a request (the patience limit) could stretch from 20-30-60 seconds (calm, patience, contentment) all the way down to 0.3-0.5-1 second (agitation, anxiety, anger, frustration, despair, fear).
For example, if a person perceives an urgent need to accomplish certain actions, especially ones related to financial transactions such as placing a trade, getting a real-time account balance or initiating a bill payment, they become impatient if the request is served “slowly” or if intermediary steps or pages are presented (such as additional verification requests the client deems annoying and unnecessary).
In the latter case I observed not only a higher number of interrupted hits but also interrupted hits arriving in adjacent sequences of 3-5 or more, exposing the person’s inner state of anxiety and frantic, agitated attempts to take forceful control over external resources, events and conditions.
I found that in cases of bank account takeover attacks and other fraudulent activities, the typical behavior of an attacker is an anxious attempt to quickly move money out of the victim’s account. Such a session could very well be sprinkled with digital signs of hurry: attempts to speed up the desired outcome, to quickly get the victim’s financial account information and to complete the fraudulent action as soon as possible, before the attacker is caught and the fraudulent session is detected and interrupted.
It’s highly likely that the same behavioral metrics could be applied to uncover malicious corporate or government insiders trying to access and move information in an unauthorized or criminal manner.
The relative timing of adjacent requests and the patterns of interrupted hits are the key metrics to pay attention to.
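One way to look for such adjacent sequences is to scan a session's hits for runs of consecutive interrupted requests. The Python sketch below is only an illustration: the function name `interrupted_bursts`, the 0/1 flag encoding and the `min_run=3` default (taken from the lower end of the 3-5 range mentioned above) are assumptions, not part of the original analysis.

```python
from itertools import groupby


def interrupted_bursts(flags, min_run=3):
    """Return (start_index, length) for each run of consecutive interrupted hits.

    flags   -- per-hit values for one session in time order (1 = interrupted)
    min_run -- assumed minimum run length that counts as a burst
    """
    bursts = []
    index = 0
    for is_interrupted, group in groupby(flags):
        length = len(list(group))
        if is_interrupted and length >= min_run:
            bursts.append((index, length))
        index += length
    return bursts


# Hypothetical session: 1 = interrupted hit, 0 = completed hit
session_flags = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
bursts = interrupted_bursts(session_flags)
print(bursts)
```

A session with several such bursts would be a candidate for closer review; the right threshold will likely differ per application.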
Finding the Missing Data about Interrupted Requests
The problem is that, as configured, most web servers and web applications today do not report on interrupted hits at all.
That’s why we see little information or research on this subject and have no visibility into these dimensions.
When a user (or attacker) navigates to some page of an online portal but quickly interrupts the loading process, the web server will actually report HTTP status = 200 (success). The same happens when the user quickly clicks from one page to another without waiting for the first page to finish loading.
If you search for HTTP status 0 across all your logs, you likely won’t find anything.
There is no way to detect interrupted hits from the standard Apache or IIS server logs.
Why is this happening? The cause lies in the way modern browsers, and computers in general, operate. All systems and software are heavily multitasked, and each web request from browser to server originates in a different thread. When the user clicks a button to interrupt the transfer or switches from one page to another, the request does not actually get cancelled. It is still processed in the background, in a different thread, usually until full and successful completion. Additional caching happens within the network interfaces of the computer’s operating system. Each TCP connection (over which the HTTP request is made) is processed gracefully and cached in some cases, and no hard stop is involved: no processing threads are forcefully killed just because the user clicked on something.
So how is it possible to detect valuable metadata related to interrupted hits?
One way of doing that (and my favorite) is to use a passive capture appliance or data analytics software such as Splunk. In the case of Splunk, the whole setup can be achieved with Splunk Enterprise (free with <500MB/day of data input) plus the free Splunk App for Stream that can be found here.
In the real-time data analytics world, Splunk + Splunk App for Stream is the second best thing after sliced bread. This software can passively capture web HTTP traffic (along with 25 other low-level network protocols) and gives access to every tiny detail and statistic about all aspects of network traffic in real time.
Plus, of course, Splunk Enterprise comes with a whole layer of tools for building custom analytic dashboards, reports and alerts.
In the case of captured HTTP web traffic data (sourcetype=stream:http), this is what the raw captured data looks like:
The critical pieces of information here are the true values (down to the microsecond) of the times when each request started (field: timestamp) and completed (field: endtime).
Normal web server logs usually contain only an event time field (the start time) rounded to the nearest second. Data this coarse quite often does not even allow determining the correct order of events that happened within the same second. The endtime field is usually not present in standard web logs.
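To illustrate the ordering problem, the short Python sketch below takes three hypothetical requests that all start within the same second. The URIs, the timestamps and the `TS_FORMAT` layout are made up for the example and are not taken from real captured data.

```python
from datetime import datetime

# Assumed timestamp layout with microsecond precision
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Hypothetical events that all begin within the same wall-clock second
events = [
    ("/transfer", "2015-03-01T10:00:05.731204"),
    ("/balance", "2015-03-01T10:00:05.120455"),
    ("/verify", "2015-03-01T10:00:05.480019"),
]


def parse(ts):
    return datetime.strptime(ts, TS_FORMAT)


# What a typical web log keeps: the start time at one-second resolution
coarse_keys = {parse(ts).replace(microsecond=0) for _, ts in events}

# With microsecond precision the true order of the clicks is recoverable
precise_order = [path for path, ts in sorted(events, key=lambda e: parse(e[1]))]

print(len(coarse_keys), precise_order)
```

At one-second resolution all three events share a single sort key, so their order is ambiguous; the full-precision values separate them.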
The data capture precision offered by Splunk App for Stream allows placing events in perfectly correct order as well as detecting interrupted hits.
Interrupted hits are detected using the following logic:
If the timestamp of the next clickable event within the same session is less than the endtime of the previous clickable event within the same session, then the previous event is considered interrupted.
This logic, of course, needs to be applied carefully and only to clickable pages (not to static resources or URL endpoints loaded in the background via AJAX requests, etc.).
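The rule above can be sketched in Python as follows. This is a minimal illustration rather than the query used in this research: it assumes the events already belong to one session, have been filtered down to clickable pages, and carry `timestamp` and `endtime` strings in an assumed microsecond format. The function name and the sample URIs are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout with microsecond precision
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse(ts):
    return datetime.strptime(ts, TS_FORMAT)


def flag_interrupted(events):
    """Mark each event as interrupted if the next click started before it ended.

    events -- dicts with 'timestamp' and 'endtime' for ONE session,
              already restricted to clickable pages (assumption)
    """
    ordered = sorted(events, key=lambda e: parse(e["timestamp"]))
    flagged = []
    for i, event in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        interrupted = nxt is not None and parse(nxt["timestamp"]) < parse(event["endtime"])
        # timeseen: seconds between this click and the next one (None for the last event)
        timeseen = (
            (parse(nxt["timestamp"]) - parse(event["timestamp"])).total_seconds()
            if nxt is not None
            else None
        )
        flagged.append({**event, "interrupted": interrupted, "timeseen": timeseen})
    return flagged


# Hypothetical session: the user leaves /balance 0.4 s after requesting it
session = [
    {"uri_path": "/login", "timestamp": "2015-03-01T10:00:00.100000", "endtime": "2015-03-01T10:00:00.900000"},
    {"uri_path": "/balance", "timestamp": "2015-03-01T10:00:05.000000", "endtime": "2015-03-01T10:00:07.500000"},
    {"uri_path": "/transfer", "timestamp": "2015-03-01T10:00:05.400000", "endtime": "2015-03-01T10:00:06.000000"},
]
result = flag_interrupted(session)
for row in result:
    print(row["uri_path"], row["interrupted"], row["timeseen"])
```

The last event of a session has no successor, so this sketch leaves it unflagged.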
Here’s a sample Splunk query that can be used to detect interrupted hits:
sourcetype=stream:http
| streamstats current=0 window=1 global=0 last(timestamp) as next_starttime
| eval timeseen=strptime(next_starttime, "%Y-%m-%dT%H:%M:%S.%9N")-strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%9N")
| eval interrupted=if(next_starttime<endtime, 1, 0)
| iplocation src_ip
| table _time, timeseen, interrupted, src_ip, Country, Region, City, http_method, src_port, status, bytes_out, site, uri_path
And here is the sample output of such a query:
With this information extracted from raw traffic logs, we can get back to matching the discovered patterns with users’ emotional states.
It is important to note that each industry sector and each web application will have its own specific digital fingerprint matching its users’ emotional states.
What works for the retail banking sector might not work for a large e-commerce portal, and vice versa.
This is one of the first attempts to visualize the emotional components detected within web sessions, based on the presence of sequences of interrupted hits and the timing between events over a 24-hour period:
While my attention was mostly focused on fraud detection and account takeover attacks, a positive side effect of this research was the discovered ability to detect problems within web applications early.
For example, if a certain web app function starts to slow down or malfunction, this is quite often followed by interrupted hits from irritated users. So monitoring for these occurrences will benefit enterprise monitoring departments as well as application support groups.
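As a rough illustration of such monitoring, the Python sketch below computes the share of interrupted hits per page and lists the pages above a threshold. The function names, the sample data and the 0.5 threshold are assumptions for the example; a real deployment would need a baseline measured for each application.

```python
from collections import defaultdict


def interrupted_rate_by_page(events):
    """Return {uri_path: share of hits flagged as interrupted}."""
    totals = defaultdict(int)
    interrupted = defaultdict(int)
    for event in events:
        totals[event["uri_path"]] += 1
        interrupted[event["uri_path"]] += 1 if event["interrupted"] else 0
    return {path: interrupted[path] / totals[path] for path in totals}


def pages_over_threshold(rates, threshold=0.5):
    """Pages whose interrupted share exceeds the (assumed) threshold."""
    return sorted(path for path, rate in rates.items() if rate > threshold)


# Hypothetical flagged hits collected over some time window
hits = (
    [{"uri_path": "/balance", "interrupted": True}] * 3
    + [{"uri_path": "/balance", "interrupted": False}]
    + [{"uri_path": "/login", "interrupted": False}] * 3
    + [{"uri_path": "/transfer", "interrupted": True},
       {"uri_path": "/transfer", "interrupted": False}]
)

rates = interrupted_rate_by_page(hits)
suspect_pages = pages_over_threshold(rates)
print(rates, suspect_pages)
```

A page that suddenly appears in this list may be slowing down or malfunctioning and is worth checking before users start to complain.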
Here are a few extra observations on human emotions and matching behaviors in real-world sessions with signs of interrupted hits:
- Agitation/Irritation (mostly legitimate users):
Users have a tendency to end the session early when page content or response times do not match their expectations. This means loss of profits for e-commerce portals.
- Moderate irritation/Impatience (mostly legitimate users):
Interrupted hits come in adjacent batches. The timeseen values of interrupted hits (how long the user was looking at the page that was subsequently interrupted) cycle from very short (0.3-1 second) to long (30-60 seconds) and then back. The user is making a fair attempt to wait but gets progressively impatient.
- Short bursts of irritation (case for fraud detection):
The user quickly cancels out of a page he deems unimportant, uninteresting or useless.
In the case of account takeover fraud, this may indicate a first-time reconnaissance effort by an attacker looking for a quick action.
- Urgency/Anxiety (different patterns between legitimate users and attackers):
Actions related to money movement: bill payments, securities trading, account balance checks.
A higher ratio of interrupted hits was observed where the user is eager for quick action, yet the external system behaves in a perceived slow or uncooperative manner.
The timing and pattern of interrupted hits and behaviors are likely to differ between legitimate account holders and attackers.
The ability to detect and interpret the emotional state of the user behind the screen can play a key role in detecting and preventing all kinds of cyber attacks. As an extra bonus, this approach may help legitimate users and clients who otherwise might be quietly experiencing problems while interacting with your enterprise services.
It is important to note that this research is at an early stage, and more data and time will be required to deploy a production-quality solution able to detect and act upon the underlying emotional state of the user.
Based on my experience, no matter what industry you are in, a good starting point for using such advanced user behavior analysis techniques is to gather actual user and client feedback and submissions and correlate this data with the related digital fingerprints.
This has great potential to open a new and powerful dimension for detecting and preventing previously unknown malicious activities, cyber attacks and fraud attempts.
Gleb Esman is currently working as Senior Product Manager for Security/Anti-fraud solutions at Splunk, leading efforts to build next-generation security products covering advanced fraud cases across multiple industry verticals.