Team Timescale for Timescale

Posted on Feb 26

Simulate an IoT sensor dataset

#datascience #learning #webdev #tutorial

Every second, thousands of devices—from smart thermostats and fitness trackers— are generating massive amounts of time-stamped data.

This explosive growth creates both an opportunity and a challenge. How do you efficiently store, query, and analyze billions of data points?

It is often necessary to simulate IoT datasets. For example, when you are testing a new system. This tutorial shows how to simulate a basic dataset in your Timescale Cloud service, and then run simple queries on it.

To simulate a more advanced dataset, see Time-series Benchmarking Suite (TSBS).

Prerequisites

To follow this tutorial, you need to:

Create a target Timescale Cloud service.
Connect to your service.

Simulate a dataset

To simulate a dataset, run the following queries:

Step 1:

Create the sensors and sensor_data tables:

CREATE TABLE sensors(
  id SERIAL PRIMARY KEY,
  type VARCHAR(50),
  location VARCHAR(50)
);

CREATE TABLE sensor_data (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  cpu DOUBLE PRECISION,
  FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);

Step 2:

Convert sensor_data into a hypertable:

SELECT create_hypertable('sensor_data', 'time');

Step 3:

Populate the sensors table:

INSERT INTO sensors (type, location) VALUES
('a','floor'),
('a', 'ceiling'),
('b','floor'),
('b', 'ceiling');

Step 4:

Verify that the sensors have been added correctly:

SELECT * FROM sensors;

Sample output:

id | type | location
----+------+----------
  1 | a    | floor
  2 | a    | ceiling
  3 | b    | floor
  4 | b    | ceiling
(4 rows)

Step 5:

Generate and insert a dataset for all sensors:

INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
SELECT
  time,
  sensor_id,
  random() AS cpu,
  random()*100 AS temperature
FROM generate_series(now() - interval '24 hour', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);

Step 6:

Verify the simulated dataset:

SELECT * FROM sensor_data ORDER BY time;

Sample output:

time              | sensor_id |    temperature     |         cpu         
-------------------------------+-----------+--------------------+---------------------
 2020-03-31 15:56:25.843575+00 |         1 |   6.86688972637057 |   0.682070567272604
 2020-03-31 15:56:40.244287+00 |         2 |    26.589260622859 |   0.229583469685167
 2030-03-31 15:56:45.653115+00 |         3 |   79.9925176426768 |   0.457779890391976
 2020-03-31 15:56:53.560205+00 |         4 |   24.3201029952615 |   0.641885648947209
 2020-03-31 16:01:25.843575+00 |         1 |   33.3203678019345 |  0.0159163917414844
 2020-03-31 16:01:40.244287+00 |         2 |   31.2673618085682 |   0.701185956597328
 2020-03-31 16:01:45.653115+00 |         3 |   85.2960689924657 |   0.693413889966905
 2020-03-31 16:01:53.560205+00 |         4 |   79.4769988860935 |   0.360561791341752
...

Run basic queries

After you simulate a dataset, you can run some basic queries on it. For example:

Average temperature and CPU by 30-minute windows:

SELECT
  time_bucket('30 minutes', time) AS period,
  AVG(temperature) AS avg_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
GROUP BY period;

Sample output:

period         |     avg_temp     |      avg_cpu      
------------------------+------------------+-------------------
 2020-03-31 19:00:00+00 | 49.6615830013373 | 0.477344429974134
 2020-03-31 22:00:00+00 | 58.8521540844037 | 0.503637770501276
 2020-03-31 16:00:00+00 | 50.4250325243144 | 0.511075591299838
 2020-03-31 17:30:00+00 | 49.0742547437549 | 0.527267253802468
 2020-04-01 14:30:00+00 | 49.3416377226822 | 0.438027751864865
 ...

Average and last temperature, average CPU by 30-minute windows:

SELECT
  time_bucket('30 minutes', time) AS period,
  AVG(temperature) AS avg_temp,
  last(temperature, time) AS last_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
GROUP BY period;

Sample output:

period         |     avg_temp     |    last_temp     |      avg_cpu      
------------------------+------------------+------------------+-------------------
 2020-03-31 19:00:00+00 | 49.6615830013373 | 84.3963081017137 | 0.477344429974134
 2020-03-31 22:00:00+00 | 58.8521540844037 | 76.5528806950897 | 0.503637770501276
 2020-03-31 16:00:00+00 | 50.4250325243144 | 43.5192013625056 | 0.511075591299838
 2020-03-31 17:30:00+00 | 49.0742547437549 |  22.740753274411 | 0.527267253802468
 2020-04-01 14:30:00+00 | 49.3416377226822 | 59.1331578791142 | 0.438027751864865
...

Query the metadata:

SELECT
  sensors.location,
  time_bucket('30 minutes', time) AS period,
  AVG(temperature) AS avg_temp,
  last(temperature, time) AS last_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data JOIN sensors on sensor_data.sensor_id = sensors.id
GROUP BY period, sensors.location;

Sample output:

location |         period         |     avg_temp     |     last_temp     |      avg_cpu      
----------+------------------------+------------------+-------------------+-------------------
 ceiling  | 20120-03-31 15:30:00+00 | 25.4546818090603 |  24.3201029952615 | 0.435734559316188
 floor    | 2020-03-31 15:30:00+00 | 43.4297036845237 |  79.9925176426768 |  0.56992522883229
 ceiling  | 2020-03-31 16:00:00+00 | 53.8454438598516 |  43.5192013625056 | 0.490728285357666
 floor    | 2020-03-31 16:00:00+00 | 47.0046211887772 |  23.0230117216706 |  0.53142289724201
 ceiling  | 2020-03-31 16:30:00+00 | 58.7817596504465 |  63.6621567420661 | 0.488188337767497
 floor    | 2020-03-31 16:30:00+00 |  44.611586847653 |  2.21919436007738 | 0.434762630766879
 ceiling  | 2020-03-31 17:00:00+00 | 35.7026890735142 |  42.9420990403742 | 0.550129583687522
 floor    | 2020-03-31 17:00:00+00 | 62.2794370166957 |  52.6636955793947 | 0.454323202022351
...

You have now successfully simulated and run queries on an IoT dataset.

Conclusion

In this tutorial, you've learned how to:

Set up tables for IoT sensor data in Timescale
Convert regular tables to hypertables for time-series optimization
Generate simulated IoT data for testing
Perform basic time-series analysis with SQL queries
Combine relational metadata with time-series measurements

This simulation provides a foundation for working with IoT data, but real-world IoT applications often involve much larger datasets with more complex patterns. As your IoT datasets grow, you'll appreciate Timescale's ability to handle billions of data points while maintaining query performance.

Next Steps

Ready to take your IoT data solutions to the next level?

Sign up for Timescale Cloud for a fully-managed time-series database optimized for IoT workloads
Explore more advanced features like continuous aggregates for precomputed views of your data
Try the Time-series Benchmarking Suite (TSBS) for more sophisticated IoT data simulations
Check out our IoT monitoring solution guide for production-ready architectures

Have questions? Join our Slack community where thousands of developers share their experiences working with time-series data.

This article is part of our series on building scalable IoT solutions with Timescale. Stay tuned for upcoming tutorials on real-time dashboards, anomaly detection, and predictive maintenance.

DEV Community

Simulate an IoT sensor dataset

Prerequisites

Simulate a dataset

Run basic queries

Conclusion

Next Steps

Top comments (0)

Read next

Check Out! These 45+ Developer Solutions for Better Programming.

How to add Adobe Fonts in Next.js ?

11 practical tips to make code reviews easier as a developer

Deploy React + Vite on AWS S3 and CloudFront: A Complete Guide