DEV Community

Cover image for Simulate an IoT sensor dataset
Team Timescale for Timescale

Posted on

Simulate an IoT sensor dataset

Every second, thousands of devices—from smart thermostats and fitness trackers— are generating massive amounts of time-stamped data.

This explosive growth creates both an opportunity and a challenge. How do you efficiently store, query, and analyze billions of data points?

It is often necessary to simulate IoT datasets. For example, when you are testing a new system. This tutorial shows how to simulate a basic dataset in your Timescale Cloud service, and then run simple queries on it.

To simulate a more advanced dataset, see Time-series Benchmarking Suite (TSBS).

Prerequisites

To follow this tutorial, you need to:

Simulate a dataset

To simulate a dataset, run the following queries:

Step 1:

Create the sensors and sensor_data tables:

CREATE TABLE sensors(
  id SERIAL PRIMARY KEY,
  type VARCHAR(50),
  location VARCHAR(50)
);
Enter fullscreen mode Exit fullscreen mode
CREATE TABLE sensor_data (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  cpu DOUBLE PRECISION,
  FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);
Enter fullscreen mode Exit fullscreen mode

Step 2:

Convert sensor_data into a hypertable:

SELECT create_hypertable('sensor_data', 'time');
Enter fullscreen mode Exit fullscreen mode

Step 3:

Populate the sensors table:

INSERT INTO sensors (type, location) VALUES
('a','floor'),
('a', 'ceiling'),
('b','floor'),
('b', 'ceiling');
Enter fullscreen mode Exit fullscreen mode

Step 4:

Verify that the sensors have been added correctly:

SELECT * FROM sensors;
Enter fullscreen mode Exit fullscreen mode

Sample output:

id | type | location
----+------+----------
  1 | a    | floor
  2 | a    | ceiling
  3 | b    | floor
  4 | b    | ceiling
(4 rows)
Enter fullscreen mode Exit fullscreen mode

Step 5:

Generate and insert a dataset for all sensors:

INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
SELECT
  time,
  sensor_id,
  random() AS cpu,
  random()*100 AS temperature
FROM generate_series(now() - interval '24 hour', now(), interval '5 minute') AS g1(time), generate_series(1,4,1) AS g2(sensor_id);
Enter fullscreen mode Exit fullscreen mode

Step 6:

Verify the simulated dataset:

SELECT * FROM sensor_data ORDER BY time;
Enter fullscreen mode Exit fullscreen mode

Sample output:

time              | sensor_id |    temperature     |         cpu         
-------------------------------+-----------+--------------------+---------------------
 2020-03-31 15:56:25.843575+00 |         1 |   6.86688972637057 |   0.682070567272604
 2020-03-31 15:56:40.244287+00 |         2 |    26.589260622859 |   0.229583469685167
 2030-03-31 15:56:45.653115+00 |         3 |   79.9925176426768 |   0.457779890391976
 2020-03-31 15:56:53.560205+00 |         4 |   24.3201029952615 |   0.641885648947209
 2020-03-31 16:01:25.843575+00 |         1 |   33.3203678019345 |  0.0159163917414844
 2020-03-31 16:01:40.244287+00 |         2 |   31.2673618085682 |   0.701185956597328
 2020-03-31 16:01:45.653115+00 |         3 |   85.2960689924657 |   0.693413889966905
 2020-03-31 16:01:53.560205+00 |         4 |   79.4769988860935 |   0.360561791341752
...
Enter fullscreen mode Exit fullscreen mode

Run basic queries

After you simulate a dataset, you can run some basic queries on it. For example:

  • Average temperature and CPU by 30-minute windows:
SELECT
  time_bucket('30 minutes', time) AS period,
  AVG(temperature) AS avg_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
GROUP BY period;
Enter fullscreen mode Exit fullscreen mode

Sample output:

period         |     avg_temp     |      avg_cpu      
------------------------+------------------+-------------------
 2020-03-31 19:00:00+00 | 49.6615830013373 | 0.477344429974134
 2020-03-31 22:00:00+00 | 58.8521540844037 | 0.503637770501276
 2020-03-31 16:00:00+00 | 50.4250325243144 | 0.511075591299838
 2020-03-31 17:30:00+00 | 49.0742547437549 | 0.527267253802468
 2020-04-01 14:30:00+00 | 49.3416377226822 | 0.438027751864865
 ...
Enter fullscreen mode Exit fullscreen mode
  • Average and last temperature, average CPU by 30-minute windows:
SELECT
  time_bucket('30 minutes', time) AS period,
  AVG(temperature) AS avg_temp,
  last(temperature, time) AS last_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data
GROUP BY period;
Enter fullscreen mode Exit fullscreen mode

Sample output:

period         |     avg_temp     |    last_temp     |      avg_cpu      
------------------------+------------------+------------------+-------------------
 2020-03-31 19:00:00+00 | 49.6615830013373 | 84.3963081017137 | 0.477344429974134
 2020-03-31 22:00:00+00 | 58.8521540844037 | 76.5528806950897 | 0.503637770501276
 2020-03-31 16:00:00+00 | 50.4250325243144 | 43.5192013625056 | 0.511075591299838
 2020-03-31 17:30:00+00 | 49.0742547437549 |  22.740753274411 | 0.527267253802468
 2020-04-01 14:30:00+00 | 49.3416377226822 | 59.1331578791142 | 0.438027751864865
...
Enter fullscreen mode Exit fullscreen mode
  • Query the metadata:
SELECT
  sensors.location,
  time_bucket('30 minutes', time) AS period,
  AVG(temperature) AS avg_temp,
  last(temperature, time) AS last_temp,
  AVG(cpu) AS avg_cpu
FROM sensor_data JOIN sensors on sensor_data.sensor_id = sensors.id
GROUP BY period, sensors.location;
Enter fullscreen mode Exit fullscreen mode

Sample output:

location |         period         |     avg_temp     |     last_temp     |      avg_cpu      
----------+------------------------+------------------+-------------------+-------------------
 ceiling  | 20120-03-31 15:30:00+00 | 25.4546818090603 |  24.3201029952615 | 0.435734559316188
 floor    | 2020-03-31 15:30:00+00 | 43.4297036845237 |  79.9925176426768 |  0.56992522883229
 ceiling  | 2020-03-31 16:00:00+00 | 53.8454438598516 |  43.5192013625056 | 0.490728285357666
 floor    | 2020-03-31 16:00:00+00 | 47.0046211887772 |  23.0230117216706 |  0.53142289724201
 ceiling  | 2020-03-31 16:30:00+00 | 58.7817596504465 |  63.6621567420661 | 0.488188337767497
 floor    | 2020-03-31 16:30:00+00 |  44.611586847653 |  2.21919436007738 | 0.434762630766879
 ceiling  | 2020-03-31 17:00:00+00 | 35.7026890735142 |  42.9420990403742 | 0.550129583687522
 floor    | 2020-03-31 17:00:00+00 | 62.2794370166957 |  52.6636955793947 | 0.454323202022351
...
Enter fullscreen mode Exit fullscreen mode

You have now successfully simulated and run queries on an IoT dataset.


Conclusion

In this tutorial, you've learned how to:

  • Set up tables for IoT sensor data in Timescale
  • Convert regular tables to hypertables for time-series optimization
  • Generate simulated IoT data for testing
  • Perform basic time-series analysis with SQL queries
  • Combine relational metadata with time-series measurements

This simulation provides a foundation for working with IoT data, but real-world IoT applications often involve much larger datasets with more complex patterns. As your IoT datasets grow, you'll appreciate Timescale's ability to handle billions of data points while maintaining query performance.

Next Steps

Ready to take your IoT data solutions to the next level?

Have questions? Join our Slack community where thousands of developers share their experiences working with time-series data.

This article is part of our series on building scalable IoT solutions with Timescale. Stay tuned for upcoming tutorials on real-time dashboards, anomaly detection, and predictive maintenance.

Top comments (0)