
Will Drygla
Using K6 to check the performance of our system

Today I want to talk about a project I have been working on for the past few weeks. I have been doing performance testing, and I want to share how I extract metrics from my tests. My challenge was to identify endpoints that, when the system was loaded above its normal level, showed problems or excessive slowness.

I use Azure to run my tests and export the results to Datadog (I also export them as XML to Azure, just to see basic info; a sketch of that export follows the setup list below). The setup to run the tests is simple:

  • Install K6, Docker, and other libs (I use a custom AWS agent)
  • Configure the project and bundle it
  • Start the Datadog Docker container (which receives the data)
  • Run the tests
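
For reference, a minimal sketch of how such an XML summary can be produced, using k6's handleSummary hook and the k6-summary jslib; the endpoint and file name here are placeholders, not my real project:

```javascript
import http from 'k6/http';
import { jUnit, textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

export default function () {
  // Placeholder request; the real script hits the endpoints under test.
  http.get('https://test.k6.io');
}

// Runs once at the end of the test and lets us write report files.
export function handleSummary(data) {
  return {
    'junit.xml': jUnit(data), // XML report that Azure can publish as test results
    stdout: textSummary(data, { indent: ' ', enableColors: true }), // regular console summary
  };
}
```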

In my K6 scenario configuration, I tag everything so that I can later build custom dashboards. To extract the data, I use three tags:

  • Test run ID: identifies the test run as a whole
  • Scenario ID: one per scenario; this could be the action we are simulating or the screen whose load we are simulating
  • Request ID: the name of the individual request
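
A minimal sketch of how these tags can be wired into a k6 script: global tags via options.tags, per-scenario tags, and per-request tags. The tag names, scenario, and URL below are placeholders, not my real project. Depending on your k6 version, the custom tags are forwarded to Datadog through the StatsD output (e.g. K6_STATSD_ENABLE_TAGS=true k6 run --out statsd script.js) or the xk6-output-statsd extension.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  // Test run ID: attached to every metric of this run (value can come from the CI pipeline).
  tags: { test_run_id: __ENV.TEST_RUN_ID || 'local-run' },
  scenarios: {
    login_screen: {
      executor: 'constant-vus',
      vus: 10,
      duration: '1m',
      exec: 'loginScreen',
      // Scenario ID: groups everything this simulated screen/action does.
      tags: { scenario_id: 'login_screen' },
    },
  },
};

export function loginScreen() {
  // Request ID: names the individual request so it gets its own row in the dashboard tables.
  http.get('https://example.com/api/session', { tags: { request_id: 'get_session' } });
  sleep(1);
}
```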

With these three tags, I created a custom dashboard with three tables, each one filtered by one of the tags, plus general graphs (timing, status code). Using these tables, we can analyze the results of each request, but also the results per group/action/screen. After that, it was just a matter of running the tests, increasing the number of users or iterations, and extracting metrics from the dashboard.
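
As an illustration of pushing the load above normal between runs, a ramping executor is one way to do it; the scenario name, stages, and numbers below are placeholders:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    checkout_screen: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },  // ramp up to a "normal" load
        { duration: '5m', target: 50 },  // hold it
        { duration: '2m', target: 150 }, // push above normal
        { duration: '5m', target: 150 }, // hold the stress level
        { duration: '1m', target: 0 },   // ramp down
      ],
      tags: { scenario_id: 'checkout_screen' },
    },
  },
};

export default function () {
  http.get('https://example.com/api/checkout', { tags: { request_id: 'get_checkout' } });
  sleep(1);
}
```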

Reading the graphs, we can extract some data:

  • Requests per second + HTTP request duration: we can see request durations grow as the request rate increases
  • Requests per second + status code: we can determine the request rate at which the system starts returning errors
  • REQ_AVG_DURATION and REQ_MAX_DURATION: analyzing these two metrics together lets us assess both the typical performance and the stability of response times. A large gap between them can indicate sporadic spikes or outliers, suggesting potential instability under certain conditions. Together, they help pinpoint whether performance issues are consistent or isolated, guiding us in optimizing the average experience while managing worst-case scenarios.

All of this must be analyzed alongside test duration, stage configuration, user load, and other settings to get a deeper understanding of system performance. That analysis lets us identify areas for improvement or develop new hypotheses for further testing.
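
As a complement to reading the dashboard, k6 thresholds can encode the same expectations (typical latency, worst case, error rate) directly in the script, so a run fails automatically when they degrade. The limits and endpoint below are placeholders:

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    // Typical and worst-case latency limits (placeholder values, in milliseconds).
    http_req_duration: ['avg<500', 'p(95)<800', 'max<3000'],
    // Error budget: fail the run if more than 1% of requests fail.
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  http.get('https://example.com/api/health'); // placeholder endpoint
}
```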

With these metrics, I was able to talk with our team and define a plan to make these endpoints more performant. I will leave some screenshots of the dashboard below (each screenshot shows an individual graph). With three or four runs, you should be able to collect enough data to put together a good presentation or map out some improvements.
[Image: Datadog dashboard of the test results, showing number of users and requests per second]
