Type the name of your summary index, such as “summary_logins”.
Press the [Save] button.
Now that the summary index is created, we need a scheduled search to generate data into it. This task has two parts:
- Design and schedule a summarizing search that sends login summary data into the summary index we created (summary_logins) on an hourly basis.
- Run the fill_summary_index.py script to backfill the summary index with previous data.
Design and schedule the summarizing search:
To create the scheduled search, navigate to: Settings -> Searches, Reports and Alerts.
You’ll be presented with a dialog. Fill it in according to this image:
Then press [Save].
This makes your search run once an hour to feed data into the logins summary index.
The actual Splunk search query looks like this (it is slightly corrected from the one shown in the image):
index=logs method=POST page=/Login.aspx
| eval username_lower=lower(username)
| dedup username_lower, ip, ua
| eval ip_subnet=ip
| rex mode=sed field=ip_subnet "s/^(\d+\.\d+\.\d+\.).*/\1x/g"
| fields _time, ip, ip_subnet, username, username_lower, ua
| fields - _raw
What it does is this:
- It uses index=logs to pull all web traffic data. This assumes that the indexed data already contains these fields or aliases:
username, ip, ua, page and method.
- It considers only login-specific events, using this filter: method=POST page=/Login.aspx
This of course needs to be modified to match the specifics of your application.
- A lowercased username is created because usernames are usually not case-sensitive and users may type them differently:
| eval username_lower=lower(username)
- All hourly login events are deduplicated: dedup username_lower, ip, ua
- The ip_subnet field is created by replacing the last octet of the IP address with "x". If the input IP address is 126.96.36.199, then ip_subnet will be 126.96.36.x
- Then we specify which fields to send to the summary index and exclude the original _raw field (which is large and unnecessary to keep).
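To make the steps above concrete, here is a minimal Python sketch of the same transformation applied to a list of already-parsed events. It is only an illustration of the logic, not how Splunk executes the search. The function name summarize_logins and the dictionary-based event format are assumptions made for this example; the field names come from the search itself.

```python
import re

def summarize_logins(events):
    """Mimic the summarizing search on a list of event dicts.

    Each event is assumed to be a dict with 'username', 'ip' and 'ua' keys
    (a hypothetical in-memory format used only for this illustration).
    """
    seen = set()
    rows = []
    for ev in events:
        # | eval username_lower=lower(username)
        username_lower = ev["username"].lower()

        # | dedup username_lower, ip, ua  (keep the first event per combination)
        key = (username_lower, ev["ip"], ev["ua"])
        if key in seen:
            continue
        seen.add(key)

        # | eval ip_subnet=ip | rex mode=sed ... : replace the last octet with "x"
        ip_subnet = re.sub(r"^(\d+\.\d+\.\d+\.).*", r"\1x", ev["ip"])

        # | fields ... : keep only the fields that go into the summary index
        rows.append({
            "ip": ev["ip"],
            "ip_subnet": ip_subnet,
            "username": ev["username"],
            "username_lower": username_lower,
            "ua": ev["ua"],
        })
    return rows

if __name__ == "__main__":
    sample = [
        {"username": "Alice", "ip": "126.96.36.199", "ua": "Mozilla/5.0"},
        {"username": "alice", "ip": "126.96.36.199", "ua": "Mozilla/5.0"},
    ]
    for row in summarize_logins(sample):
        print(row)
```

The two sample events differ only in the case of the username, so they collapse into a single summary row whose ip_subnet is 126.96.36.x.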
Please note that in this demo I used destination App = “Search & Reporting”, although it is recommended to create a separate App dedicated to security and security-related alerting needs.
Run fill_summary_index.py to backfill summary index with previous data:
The last thing to keep in mind is that we need to backfill the summary index.
According to the definition of our main challenge, the main alerting query will need to reference login history data ‘within the last 45 days‘. So we had better populate our summary index with these events, assuming your web traffic log index keeps data going back that far.
The backfill is performed by the Python script: %SPLUNK_HOME%/bin/fill_summary_index.py
The general syntax is as follows (execute it from the %SPLUNK_HOME%/bin/ folder):
splunk cmd python fill_summary_index.py -app your-app-name -name "Summary: logins" -et -45d -lt -1h@h -dedup true -j 4 -owner admin -auth admin:YouRAdminPasSw0rD
Please note that:
- You need to adjust the syntax above according to your setup.
- Depending on the volume of data, it is wise to run the summary index backfill script in small chunks.
The sample above uses ‘-et -45d’, which may be too much to handle in one shot. Try running it for one day, see how fast it works, and build from there.
- Adjust the scheduled search name (the ‘-name …’ parameter) if you named it differently.
- Adjust access credentials
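One way to follow the "small chunks" advice is to generate one backfill command per day instead of a single 45-day run. The Python sketch below only builds the command lines; it does not run them. The function name backfill_commands is hypothetical, the flags are the ones from the sample command above, and the day-aligned time modifiers (such as -3d@d) plus the final -1h@h boundary are my assumption about how you might split the range.

```python
import shlex

def backfill_commands(days, app, search_name, owner, auth, jobs=4):
    """Build one fill_summary_index.py command per day, oldest day first.

    Returns a list of argument lists. Nothing is executed here, so the
    commands can be reviewed (or timed one at a time) before running.
    """
    commands = []
    for n in range(days, 0, -1):
        earliest = f"-{n}d@d"
        # The most recent chunk ends at the last full hour, as in the
        # sample command; older chunks end at the next day boundary.
        latest = f"-{n - 1}d@d" if n > 1 else "-1h@h"
        commands.append([
            "splunk", "cmd", "python", "fill_summary_index.py",
            "-app", app,
            "-name", search_name,
            "-et", earliest,
            "-lt", latest,
            "-dedup", "true",
            "-j", str(jobs),
            "-owner", owner,
            "-auth", auth,
        ])
    return commands

if __name__ == "__main__":
    # Placeholder values: replace with your own app, search name and credentials.
    for cmd in backfill_commands(3, "your-app-name", "Summary: logins",
                                 "admin", "admin:YouRAdminPasSw0rD"):
        print(shlex.join(cmd))
```

Printing the commands first lets you run a single day by hand, check how long it takes, and then decide how many days to process per batch.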
It is wise to run the backfill script from a cron job on a regular basis to fill in missed hourly summaries. Gaps can appear if your computer went offline for temporary maintenance or upgrade tasks.
Gleb Esman is currently working as Senior Product Manager for Security/Anti-fraud solutions at Splunk leading efforts to build next generation security products covering advanced fraud cases across multiple industry verticals.