DEV Community

Gabe


Splunk - Buttercup Enterprise Dashboard

The Scenario

• Buttercup Enterprises is a large national online retailer operating in the US, selling books, clothing and other gifts through its online webstore.
• The company recently invested in Splunk and now wants to start using it across the business.

My Role

My responsibility is to provide insights to the following teams throughout the company:
• IT Operations
• Dev Ops
• Business Analytics
• Security and Fraud


IT Operations Team:

Investigate successful vs. unsuccessful web server requests over time.

Query 1: index=main sourcetype=access_combined | timechart count by status limit=10

Visualization: Column Chart
Format: Stacked Mode
Panel Title: IT Ops - Web Server Status Codes Over Time


Let's break down this specific SPL query:
index=main sourcetype=access_combined | timechart count by status limit=10

index=main: specifies that we want to search within a specific index called "main". Think of an index like a database or a collection of data.

sourcetype=access_combined: Filters the results to only include events (data points) with a source type of "access_combined". Source types are categories that describe the kind of data being collected, such as network logs, system logs, or application logs. In our case, access_combined refers to web server access logs in the NCSA combined log format, as written by Apache and similar HTTP servers.
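For readers new to this format, the minimal Python sketch below shows how one combined-format log line breaks into fields. The sample line, the regex and the parse_line helper are illustrative assumptions, not Splunk code: Splunk extracts fields for access_combined automatically at search time, under names such as clientip, status and useragent, so the sketch only shows where the fields used in the later queries come from.

```python
import re

# Illustrative regex for the NCSA combined log format:
# clientip ident user [timestamp] "method uri protocol" status bytes "referer" "useragent"
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    """Split one combined-format access log line into named fields."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError(f"not a combined log line: {line!r}")
    fields = match.groupdict()
    # Store status as an int so comparisons like status >= 400 work numerically.
    fields["status"] = int(fields["status"])
    return fields


# Hypothetical sample event (documentation IP range, made-up URL and user agent).
sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 -0700] '
          '"GET /cart.do?action=purchase HTTP/1.1" 503 720 '
          '"http://www.example.com/product.screen" "Mozilla/5.0 (Windows NT 10.0)"')

event = parse_line(sample)
print(event["clientip"], event["status"], event["useragent"])
```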

| timechart count by status limit=10: The pipe symbol (|) passes the results of the search on its left into the command on its right.

Here's what this clause does:
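timechart: Splits the events into time buckets (Splunk picks the bucket size from the search's time range unless you set span=) and computes a statistic for each bucket. The result is a table with _time on the x-axis, which Splunk renders as a time-series chart.
count: Counts the events (data points) in each time bucket.
by status: Splits the count into a separate series for each value of the "status" field, i.e. each HTTP status code such as 200, 404 or 503. Think of it like sorting the data into different buckets based on the values in that field.
limit=10: Keeps only the 10 status codes with the highest total counts as separate series. By default, the remaining status codes are combined into a single OTHER series; add useother=f to drop them instead.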
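To make these semantics concrete, the following Python sketch mimics what timechart count by status limit=10 computes, under simplifying assumptions: events are plain dicts with a _time datetime, the span is fixed rather than chosen automatically, and empty buckets are not filled in. The timechart_count_by helper and the sample events are hypothetical; this illustrates the behaviour, not how Splunk implements the command.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def timechart_count_by(events, field, span, limit=10, useother=True):
    """Rough Python analogue of `timechart span=<span> count by <field> limit=<limit>`.

    events: list of dicts, each with a datetime under "_time" and a value under `field`.
    Returns {bucket_start: {series_value: count}} in time order.
    """
    # Rank series by their total count over the whole time range; keep the top `limit`.
    totals = Counter(e[field] for e in events)
    kept = {value for value, _ in totals.most_common(limit)}

    epoch = datetime(1970, 1, 1)
    table = defaultdict(Counter)
    for e in events:
        # Floor the timestamp to the start of its span-sized bucket.
        bucket = epoch + ((e["_time"] - epoch) // span) * span
        if e[field] in kept:
            table[bucket][e[field]] += 1
        elif useother:
            # Series outside the top `limit` are merged into a single OTHER series.
            table[bucket]["OTHER"] += 1
        # With useother=False, those events are simply dropped.
    # Splunk's timechart also fills in empty buckets; this sketch leaves them out.
    return {bucket: dict(counts) for bucket, counts in sorted(table.items())}


start = datetime(2024, 10, 10, 13, 0)
events = [
    {"_time": start + timedelta(minutes=m), "status": s}
    for m, s in [(1, 200), (5, 200), (20, 404), (40, 503), (65, 200), (70, 503)]
]
for bucket, counts in timechart_count_by(events, "status", timedelta(hours=1), limit=2).items():
    print(bucket, counts)
```

With limit=2, status 404 is the least frequent code overall, so its single event lands in OTHER rather than getting its own series.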


DevOps Team:

Show the most common customer operating systems and which web browsers are experiencing the most failures.

index=main sourcetype=access_combined | top limit=20 platform showperc=f

This query lists the 20 most common values of the platform field, which represents the operating systems or device types our customers use, together with a count for each.

Breaking down the SPL query:

sourcetype=access_combined: Filters to only include HTTP web server logs.
| : The | symbol pipes the output into this part of the query.
top: This command shows the most common values for a specified field. In this case, we're looking at the "platform" field, which represents the types of devices or operating systems used by customers.
limit=20: Limits the output to show only the top 20 most frequent platforms.
showperc=f: Hides the percent column that top includes by default, so only the count for each platform is shown.
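As a rough illustration, this Python sketch reproduces the shape of top's output on hypothetical data: one row per value, most frequent first, with a count and, unless showperc is false, a percent. The top helper, the sample events and the choice to compute the percent over events that have the field are assumptions made for the example.

```python
from collections import Counter


def top(events, field, limit=10, showperc=True):
    """Approximate `top limit=<limit> <field> showperc=<bool>` on a list of dicts."""
    # Count only events that actually carry the field.
    counts = Counter(e[field] for e in events if field in e)
    total = sum(counts.values())
    rows = []
    for value, count in counts.most_common(limit):
        row = {field: value, "count": count}
        if showperc:
            # Share of the counted events, rounded to two decimals.
            row["percent"] = round(100 * count / total, 2)
        rows.append(row)
    return rows


platforms = ["Windows", "Windows", "macOS", "iOS", "Windows", "Android", "iOS"]
events = [{"platform": p} for p in platforms] + [{"status": 200}]  # last event has no platform
for row in top(events, "platform", limit=3, showperc=False):
    print(row)
```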

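To see which browsers are experiencing the most failures, filter for error responses and chart them by user agent:

index=main sourcetype=access_combined status>=400
| timechart count by useragent limit=5 useother=f

status>=400 keeps only client errors (4xx) and server errors (5xx). timechart count by useragent limit=5 plots the five user agents with the most failed requests over time, and useother=f drops the OTHER series that would otherwise combine all remaining user agents.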
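Collapsing the time axis to a single total for brevity, the following Python sketch shows what the status filter and the limit=5 useother=f options do together. The failing_useragents helper and the sample events are hypothetical illustrations, not Splunk code.

```python
from collections import Counter


def failing_useragents(events, limit=5):
    """Count failed requests (status >= 400) per user agent, keeping the top `limit`.

    Lower-ranked user agents are discarded rather than merged into an OTHER bucket,
    which is the effect of useother=f. The time axis is collapsed to one total.
    """
    failures = Counter(e["useragent"] for e in events if e["status"] >= 400)
    return dict(failures.most_common(limit))


events = [
    {"useragent": "Chrome", "status": 500},
    {"useragent": "Chrome", "status": 404},
    {"useragent": "Safari", "status": 200},   # success, ignored
    {"useragent": "Safari", "status": 503},
    {"useragent": "Firefox", "status": 403},
    {"useragent": "Edge", "status": 302},     # redirect, ignored
]
print(failing_useragents(events, limit=5))
```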


Business Analytics Team: Assessing Lost Revenue

index=main sourcetype=access_combined action=purchase status>=400 | lookup product_codes.csv product_id | timechart sum(product_price)

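This query helps the business analytics team quantify the financial impact of failed purchases by estimating the revenue lost over time. The lookup command matches each event's product_id against product_codes.csv and adds that product's fields, including product_price, to the event; timechart sum(product_price) then totals those prices in each time bucket. Because only purchase events with an error status (400 or higher) are kept, the result approximates the revenue lost to failed checkouts. For example, if there is a spike in failed purchases during peak shopping hours, it might indicate server overload or payment gateway issues.
By identifying these trends, the team can work with IT and marketing to address bottlenecks and improve the checkout process.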
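Here is a minimal Python sketch of the same calculation, assuming events arrive as dicts and that product_codes.csv maps product_id to product_price. The PRODUCT_PRICES dictionary (its IDs and prices), the sample events and the lost_revenue_by_hour helper are all hypothetical, and the one-hour span is fixed for simplicity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical contents of product_codes.csv: product_id -> product_price
PRODUCT_PRICES = {"BK-101": 12.99, "CL-205": 24.50, "GF-310": 8.00}


def lost_revenue_by_hour(events, prices):
    """Sum product_price over failed purchases, bucketed by hour.

    Loosely mirrors: action=purchase status>=400
      | lookup product_codes.csv product_id | timechart span=1h sum(product_price)
    """
    totals = defaultdict(float)
    for e in events:
        if e.get("action") != "purchase" or e["status"] < 400:
            continue  # keep only failed purchase attempts
        price = prices.get(e["product_id"])
        if price is None:
            continue  # no lookup match, so there is no price to add
        hour = e["_time"].replace(minute=0, second=0, microsecond=0)
        totals[hour] += price
    return dict(sorted(totals.items()))


events = [
    {"_time": datetime(2024, 11, 29, 9, 15), "action": "purchase", "status": 503, "product_id": "BK-101"},
    {"_time": datetime(2024, 11, 29, 9, 40), "action": "purchase", "status": 500, "product_id": "CL-205"},
    {"_time": datetime(2024, 11, 29, 9, 45), "action": "purchase", "status": 200, "product_id": "GF-310"},
    {"_time": datetime(2024, 11, 29, 10, 5), "action": "view", "status": 404, "product_id": "GF-310"},
    {"_time": datetime(2024, 11, 29, 10, 20), "action": "purchase", "status": 502, "product_id": "GF-310"},
]
for hour, lost in lost_revenue_by_hour(events, PRODUCT_PRICES).items():
    print(hour, round(lost, 2))
```

The successful purchase at 9:45 and the failed page view at 10:05 are both excluded, just as the action=purchase status>=400 filter would exclude them.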
Visualization Options:

Chart Type: Line chart or area chart.
Panel Title: "Lost Revenue from Failed Purchases."
[Screenshot: Lost Revenue]


Security & Fraud Team: Monitoring Geographic Activity

index=main sourcetype=access_combined | iplocation clientip | geostats count by City

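iplocation clientip looks up each client IP address and adds location fields such as City, Country, Region, lat and lon to the event. geostats count by City then groups the events into geographic bins based on lat and lon and counts them per city, producing results that can be plotted on a map.
This helps the security and fraud team identify unusual geographic activity, such as spikes in traffic from regions where the business does not operate. For example, a sudden increase in requests from Eastern Europe might indicate a potential DDoS attack or fraudulent activity.
By monitoring these trends, the team can implement geolocation-based security measures to block suspicious traffic.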
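The Python sketch below illustrates the idea behind this panel: resolve each client IP to a location, count requests per city, and flag cities outside the US market that receive more than a chosen number of requests. The GEO dictionary stands in for the GeoIP database that iplocation uses; its IP addresses (from documentation ranges), the threshold and both helper names are hypothetical. The sketch also counts per city directly rather than reproducing geostats' latitude/longitude binning.

```python
from collections import Counter

# Hypothetical stand-in for iplocation's GeoIP lookup: clientip -> (Country, City).
GEO = {
    "198.51.100.4": ("United States", "Chicago"),
    "198.51.100.9": ("United States", "Dallas"),
    "203.0.113.20": ("Romania", "Bucharest"),
}


def requests_by_city(client_ips, geo):
    """Count requests per (country, city); IPs with no known location are skipped."""
    return Counter(geo[ip] for ip in client_ips if ip in geo)


def unexpected_locations(counts, home_country="United States", threshold=100):
    """Return locations outside home_country whose request count exceeds threshold."""
    return {loc: n for loc, n in counts.items()
            if loc[0] != home_country and n > threshold}


client_ips = (["198.51.100.4"] * 3 + ["198.51.100.9"]
              + ["203.0.113.20"] * 5 + ["192.0.2.1"])
counts = requests_by_city(client_ips, GEO)
print(unexpected_locations(counts, threshold=2))
```

A fixed threshold keeps the example short; in practice you would more likely compare each city's count against its own historical baseline.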

Visualization Options:
Chart Type: Cluster map (the map visualization that geostats output is designed for), showing activity at city-level granularity.
Panel Title: "Geographic Activity Heat Map."



Final Notes
Interactivity: Users can interact with the dashboard by hovering over data points to see tooltips with exact values.
Updates: The dashboard should update in real time or at regular intervals to reflect the latest data.
Permissions: Ensure that only authorized users have access to sensitive data, such as geographic activity or revenue metrics.
By organizing the queries and visualizations in this way, teams can collaborate effectively and address issues proactively.
