DEV Community

Cover image for Analyzing billing information using BigQuery
Oluwatobiloba Akinbobola
Oluwatobiloba Akinbobola

Posted on

Analyzing billing information using BigQuery

INTRODUCTION

Google BigQuery is a powerful tool for studying big datasets effectively. In this tutorial, we'll go over how to import billing data into BigQuery, analyze the dataset, and perform SQL queries to glean insightful information.

PROCEDURE

Task 1: Import Data into BigQuery

  1. Sign in to BigQuery and Create a Dataset
    • Navigate to the Cloud Console and select BigQuery from the Navigation menu.
    • Click the View actions icon next to your project ID (e.g., qwiklabs-gcp) and select Create dataset. Create dataset
    • Specify the basic details basic details
    • Click Create Dataset. 2.Create a Table and Import Data
    • Click the View actions icon next to billing_dataset and select Create Table.
    • For the source and destination, specify: source
    • Click Create Table. Once the job completes, the table will appear under the dataset in the left pane.

Task 2: Examine the Table

  1. Click on sampleinfotable to view its schema. BigQuery automatically generates this schema based on the imported data, which includes strings, integers, timestamps, and floating values. 2.Navigate to the Details tab to check the number of rows. In this case, the table contains 415,602 rows. Details
  2. Use the Preview tab to explore a sample of the data.

Task 3: Compose a Simple Query
To analyze the data, construct a query to filter rows where the Cost field is greater than 0:

SELECT * FROM `billing_dataset.sampleinfotable`
WHERE Cost > 0
Enter fullscreen mode Exit fullscreen mode

Run the query to find that 70765 rows have a cost greater than 0.
Cost

Task 4: Analyze the Billing Dataset with SQL
1.Retrieve All Billing Data
Run the following query to extract key billing details:

SELECT
  billing_account_id,
  project.id,
  project.name,
  service.description,
  currency,
  currency_conversion_rate,
  cost,
  usage.amount,
  usage.pricing_unit
FROM
  `billing_dataset.sampleinfotable`
Enter fullscreen mode Exit fullscreen mode

This query returns 415,602 rows of billing data.
billing_dataset
2.Find the Latest 100 Records with Charges
To identify the most recent charges, use:

SELECT
  service.description,
  sku.description,
  location.country,
  cost,
  project.id,
  project.name,
  currency,
  currency_conversion_rate,
  usage.amount,
  usage.unit
FROM
  `billing_dataset.sampleinfotable`
WHERE
  Cost > 0
ORDER BY usage_end_time DESC
LIMIT 100
Enter fullscreen mode Exit fullscreen mode

LIMIT 100
3.To find all charges that were more than 10 dollars, for Compose New Query, paste the following in Query Editor:

SELECT
  service.description,
  sku.description,
  location.country,
  cost,
  project.id,
  project.name,
  currency,
  currency_conversion_rate,
  usage.amount,
  usage.unit
FROM
  `billing_dataset.sampleinfotable`
WHERE
  cost > 10
Enter fullscreen mode Exit fullscreen mode

cost > 10
4.Determine the Product with the Most Billing Records
To find the product with the highest number of records, use:

SELECT
  service.description,
  COUNT(*) AS billing_records
FROM
  `billing_dataset.sampleinfotable`
GROUP BY
  service.description
ORDER BY billing_records DESC
Enter fullscreen mode Exit fullscreen mode

The result shows that Compute Engine has the most billing records (281,136).
Compute Engine
5.To find the most frequently used product costing more than 1 dollar, for New Query, paste the following in Query Editor:

SELECT
  service.description,
  COUNT(*) AS billing_records
FROM
  `billing_dataset.sampleinfotable`
WHERE
  cost > 1
GROUP BY
  service.description
ORDER BY
  billing_records DESC
Enter fullscreen mode Exit fullscreen mode

The query reveals that Cloud SQL has the most billing records (15) costing over $1.
6.Identify the Most Commonly Charged Unit of Measure
To determine the most frequently charged unit, use:

SELECT
  usage.unit,
  COUNT(*) AS billing_records
FROM
  `billing_dataset.sampleinfotable`
WHERE cost > 0
GROUP BY
  usage.unit
ORDER BY
  billing_records DESC
Enter fullscreen mode Exit fullscreen mode

The result shows that Byte-seconds is the most commonly charged unit.
usage.unit
7.Find the Product with the Highest Aggregate Cost
To calculate the total cost per product, run:

SELECT
  service.description,
  ROUND(SUM(cost),2) AS total_cost
FROM
  `billing_dataset.sampleinfotable`
GROUP BY
  service.description
ORDER BY
  total_cost DESC
Enter fullscreen mode Exit fullscreen mode

This query identifies the product with the highest total cost.

CONCLUSION

BigQuery may be used to import, analyze, and effectively extract insights from massive datasets. Whether you are looking through billing records or running intricate SQL queries, BigQuery gives you the resources you need to make informed decisions.

Top comments (0)