DEV Community

Hana Wang
Hana Wang

Posted on

OpenMLDB SQL Emulator — a Development and Debugging Tool for OpenMLDB SQL

In this blog, we would like to introduce an excellent standalone tool from the OpenMLDB community — OpenMLDB SQL Emulator (https://github.com/vagetablechicken/OpenMLDBSQLEmulator). This tool allows users to develop and debug OpenMLDB SQL more efficiently and conveniently.

To efficiently implement time-series feature calculations, OpenMLDB SQL has improved and extended standard SQL. In practical use, beginners often encounter problems such as unfamiliar syntax and confusing execution modes when using OpenMLDB SQL. Suppose one directly develops and debugs on OpenMLDB itself, due to issues including deployment, index building, handling large data volume, and so on. In that case, he/she will end up wasting a lot of time on irrelevant tasks and might not be able to pinpoint the root cause easily.

The OpenMLDB SQL Emulator is a lightweight simulation development and debugging tool for OpenMLDB SQL. It allows for SQL validation and debugging operations without the need for deploying an OpenMLDB cluster. We strongly recommend this tool to our application developers. They can use this tool to quickly verify the correctness and deployability of SQL, before switching to the actual OpenMLDB environment for deployment.

Installation

Download the runtime package emulator-1.0.jar from the project page at https://github.com/vagetablechicken/OpenMLDBSQLEmulator/releases. Use the following method to start (Note that the current release version 1.0 corresponds to SQL syntax for OpenMLDB 0.8.3):

java -jar emulator-1.0.jar
Enter fullscreen mode Exit fullscreen mode

Note: If you want to execute SQL using the run command to validate results, you will also need to download toydb_run_engine from the same page and store it in the system's /tmp directory.

Creation of Virtual Databases and Tables

Once started, it will directly enter the default database emudb, and no additional database creation is required.

  • Databases don’t need to be explicitly created. Just use use or specify the database name when creating tables, and the database will be created automatically.

  • Use the addtable command or t to create a virtual table. Repeatedly creating a table with the same name is considered an update operation and will use the latest table schema. We use a simplified SQL-like syntax to manage tables. For example, the following example creates a table with two columns:

addtable t1 a int, b int64
Enter fullscreen mode Exit fullscreen mode
  • Use the showtables command or st to view all current databases and tables.

Validation of OpenMLDB SQL

To verify whether OpenMLDB SQL can be deployed in the cluster setting, you can use DEPLOY. Note that DEPLOYMENT and index need to be managed. For instance, if a certain DEPLOYMENT is not needed, it needs to be manually deleted. Similarly, if unnecessary indexes are created, they must be cleaned up.

Hence, it is suggested to test and verify in the Emulator instead. You can use val and valreq to perform online batch mode and online request mode (i.e., service deployment) OpenMLDB SQL validation. For example, testing whether a SQL can be DEPLOYed online using the valreq command:

addtable t1 a int, b int64
valreq select count(*) over w1 from t1 window w1 as (partition by a order by b rows between unbounded preceding and current row);
Enter fullscreen mode Exit fullscreen mode

If the test fails, it will print SQL compilation errors. If successful, it will print validate * success. The entire process happens in a virtual environment, without any concerns about resource utilization after table creation, and without any side effects. Any SQL that passes the valreq validation will definitely be able to be deployed in a real cluster.

Testing of OpenMLDB SQL

OpenMLDB Emulator can also return computation results to verify if the implemented SQL gives correct calculation results. You can continuously perform computations and online validations until the implemented SQL meets the expectation. This functionality can be achieved through the run command in the Emulator. Note that using the run command requires additional support from toydb_run_engine. You can use the built-in toydb from the emulator package, or download it from this page (https://github.com/vagetablechicken/OpenMLDBSQLEmulator/releases) and place it directly in /tmp.

Assuming Emulator already has toydb installed, here are the steps for testing SQL:

# step 1, generate a yaml template
gencase
# step 2, modify the yaml file to add table and data
# ...
# step 3, load yaml to get table catalog, 
# then using val/valreq sql to validate the sql in emulator
loadcase
valreq <sql>
# step 4, dump the sql, this will rewrite the yaml file
dumpcase <sql>

# step 5, run sql using toydb
run
Enter fullscreen mode Exit fullscreen mode

The command gencase generates a YAML template file, defaulting to the directory /tmp/emu-case.yaml. You'll need to edit this YAML file as shown below. When editing, consider the following:

  • Modify table names, table schema, and their data, which can not be changed in the Emulator.

  • Modify the run mode to accept batch or request mode.

  • You may leave the SQL section blank. SQL can be written to the file in the Emulator using dumpcase . A common usage is to validate SQL first, then dump it to the case, and finally, use the run command to confirm if the SQL calculation meets expectations.

  • The table’s indexes don’t need manual filling. When using dumpcase, indexes can be automatically generated based on the table schema (indexes are not specific to SQL and are only needed to create at least one index when creating the table). If you are not using dumpcase, please manually specify at least one index.

# call toydb_run_engine to run this yaml file
# you can generate yaml cases for reproduction by emulator dump or by yourself

# you can set the global default db
db: emudb
cases:
  - id: 0
    desc: describe this case
    # you can set batch mode
    mode: request
    db: emudb # you can set default db for case, if not set, use the global default db
    inputs:
      - name: t1
        db: emudb # you can set db for each table, if not set, use the default db(table db > case db > global db)
        # must set table schema, emulator can't do this
        columns: ["id int", "pk1 string","col1 int32", "std_ts timestamp"]
        # gen by emulator, just to init table, not the deployment index
        indexs: []
        # must set the data, emulator can't do this
        data: |
          1, A, 1, 1590115420000
          2, B, 1, 1590115420000
    # query: only support single query, to check the result by `expect`
    sql: |

    # optional, you can just check the output, or add your expect
    # expect:
    #   schema: id:int, pk1:string, col1:int, std_ts:timestamp, w1_col1_sum:int, w2_col1_sum:int, w3_col1_sum:int
    #   order: id
    #   data: |
    #     1, A, 1, 1590115420000, 1, 1, 1
    #     2, B, 1, 1590115420000, 1, 1, 1
Enter fullscreen mode Exit fullscreen mode

For simplicity, let’s not make any modifications and directly use this template to demonstrate how to modify the running case. In the Emulator, executing loadcase will load the table information from this case into the Emulator. You can confirm the successful loading of the case's tables by using st/showtables.

emudb> st
emudb={t1=id:int32,pk1:string,col1:int32,std_ts:timestamp}
Enter fullscreen mode Exit fullscreen mode

You can see that the table information has been successfully loaded. Now, we can use valreq to confirm if the SQL we've written is syntactically correct and deployable. Subsequently, you can perform a computation test on this SQL using the dumpcase and run commands. For example:

valreq select count(*) over w1 from t1 window w1 as (partition by id order by std_ts rows between unbounded preceding and current row);
dumpcase select count(*) over w1 from t1 window w1 as (partition by id order by std_ts rows between unbounded preceding and current row);
run
Enter fullscreen mode Exit fullscreen mode

dumpcase command actually writes the SQL and default indexes into the case file, and the run command executes this case file. Therefore, if you are skilled enough, you can directly modify this case file and then run it in the Emulator using run, or alternatively, use toydb_run_engine --yaml_path=... to run it. After execution, you will obtain the computed results for debugging and inspection purposes.

More

The OpenMLDB SQL Emulator also features a genddl function that helps users generate optimal index creation statements directly from SQL. This feature aids in avoiding redundant indexes and currently supports only single database. In future, there will be improvements in index handling, providing simpler and more convenient operations to guide users in index management.

Additionally, for usage of the Emulator, it’s recommended to utilize the ?help and ?list-all prompts. Commands are in lowercase, but SQL parameter inputs are case-insensitive and do not require additional double quotes, aligning with CLI conventions. Functionalities such as command history and exporting the current environment will be added to facilitate user operations and integration with real OpenMLDB clusters in future updates.


For more information on OpenMLDB:

This post is a re-post from OpenMLDB Blogs.

Top comments (0)