Summary indexing is a great way to speed up Splunk searches by pre-computing a smaller data set that contains only the data needed for a specific purpose. In our case we need to filter only the login events out of all available web traffic data. This gives us a very fast, much smaller data subset with all the information we need to match new, suspicious login events against.
To proceed with building the summary index we need to make a set of assumptions. These assumptions drive the query and all other elements of the solution. You'll be able to substitute your own names later if needed.
- Let's assume your web logs, with all event data, are already indexed in Splunk.
All web events are located within an index named: logs.
- Field names (or aliases):
- HTTP request method (GET, POST, HEAD, etc.): method
- URL of page accessed: page
- Username field: username
- IP address of visitor: ip
- USER_AGENT value: ua
To generate a summary index of login data, we first need to create the index itself.
Creating summary index:
- Navigate to your Splunk instance.
- Click on: Settings -> Indexes, then click the [New] button.
- Type the name of your summary index, such as “summary_logins”.
- Press [Save] button.
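If you prefer the command line, the same index can be created with the Splunk CLI. This is a sketch, assuming the splunk binary is on your PATH and that you kept the index name summary_logins from the steps above:

```shell
# Create the summary index from the CLI instead of the UI.
# Assumes the splunk binary is on PATH; prints a hint otherwise.
if command -v splunk >/dev/null 2>&1; then
  splunk add index summary_logins
else
  echo "splunk CLI not found; create the index via Settings -> Indexes instead"
fi
```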
Now that we have the summary index created, we need to create a scheduled search to feed data into it. This task has two parts:
- Design and schedule a summarizing search that sends login summary data into the summary index we created (summary_logins) on an hourly basis.
- Run the fill_summary_index.py script to backfill the summary index with historical data.
Design and schedule summarizing search:
To create a scheduled search, navigate to: Settings -> Searches, Reports and Alerts.
You'll be presented with a dialog. Fill it in according to this image:
Then press [Save].
This will make your search run once an hour, feeding data into the logins summary index.
The actual Splunk search query looks like this (slightly corrected from the one used in the image):
index=logs method=POST page=/Login.aspx
| eval username_lower=lower(username)
| dedup username_lower, ip, ua
| eval ip_subnet=ip
| rex mode=sed field=ip_subnet "s/^(\d+\.\d+\.\d+\.).*/\1x/g"
| fields _time, ip, ip_subnet, username, username_lower, ua
| fields - _raw
What it does is this:
- uses index=logs to pull all web traffic data. This assumes that the indexed data already contains these fields or aliases:
username, ip, ua, page and method.
- considers only login-specific events via this query: method=POST page=/Login.aspx
This, of course, needs to be modified to the specifics of your application.
- a lowercased username is created, because usernames are usually not case sensitive and users may type them differently:
| eval username_lower=lower(username)
- all hourly login events are deduplicated: dedup username_lower, ip, ua
- an ip_subnet field is created. If the input IP address is 184.108.40.206, then ip_subnet becomes 184.108.40.x.
- Then we specify which fields we want to send to the summary index and exclude the original _raw field (which is large and unnecessary to keep).
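The sed-style substitution used in the rex command can be sanity-checked with plain sed. Note that POSIX sed does not understand \d, so the equivalent character class [0-9] is used below; the sample IP address is just an illustration:

```shell
# Keep the first three octets of an IPv4 address and replace the rest with "x",
# mirroring what rex mode=sed does to the ip_subnet field.
echo "192.168.1.23" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+\.).*/\1x/'
# prints: 192.168.1.x
```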
Please note that in this demo I used destination App = "Search & Reporting", although it is recommended to create a separate app dedicated to security and security-related alerting needs.
Run fill_summary_index.py to backfill the summary index with previous data:
The last part to keep in mind is that we need to backfill the summary index.
According to the definition of our main challenge, the main alerting query will need to reference login history data 'within the last 45 days'. So we had better populate our summary index with these events, assuming your web traffic log index keeps data going back that far.
The summary index backfill is performed by the Python script: %SPLUNK_HOME%/bin/fill_summary_index.py
The general syntax is as follows (execute it from the %SPLUNK_HOME%/bin/ folder):
splunk cmd python fill_summary_index.py -app your-app-name -name "Summary: logins" -et -45d -lt -1h@h -dedup true -j 4 -owner admin -auth admin:YouRAdminPasSw0rD
Please note that:
- You need to adjust the syntax above according to your setup.
- Depending on the volume of data, it is wise to execute the summary index backfill script in small chunks.
The script sample above says '-et -45d', which may be too much to handle in one shot. Try executing it for one day, see how fast it runs, and build from there.
- Adjust the scheduled search name ('-name …' parameter) if you named your search differently.
- Adjust the access credentials.
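As an illustration of chunking, here is a hypothetical driver loop that walks the 45-day window one day at a time. It only prints the commands so you can review them first; the -app, -name, and -auth values are the placeholders from the example above:

```shell
# Print one fill_summary_index.py invocation per day, oldest window first.
# Remove the leading "echo" wrapper (keeping quoting intact) to actually run them.
for day in $(seq 45 -1 2); do
  prev=$((day - 1))
  echo "splunk cmd python fill_summary_index.py -app your-app-name -name \"Summary: logins\" -et -${day}d -lt -${prev}d -dedup true -j 4 -owner admin -auth admin:YouRAdminPasSw0rD"
done
```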
It is wise to execute the backfill script from a cron job on a regular basis to fill in missed hourly summaries, which can happen if your machine goes offline for temporary maintenance or upgrade tasks.
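For example, a crontab entry along these lines (a sketch; the install path, app name, and credentials are placeholders) would re-run the backfill for the last day every night at 02:15, and -dedup true prevents duplicate summaries from being written:

```shell
# Hypothetical crontab entry: nightly catch-up backfill of the last 24 hours.
15 2 * * * cd /opt/splunk/bin && ./splunk cmd python fill_summary_index.py -app your-app-name -name "Summary: logins" -et -24h@h -lt -1h@h -dedup true -owner admin -auth admin:YouRAdminPasSw0rD
```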