DEV Community

GP Web Dev

Data Validation and Sanitization in WordPress

When developing for WordPress, proper data validation and sanitization is critical to the security of our themes and plugins.

What's the difference?

  • Validation – Check that the data we have received is what it should be. Example: check that an e-mail looks like an e-mail address, that a date is in date format and that a number is (or is cast as) an integer or decimal.
  • Sanitization / Escaping – Apply filters to data to make it ‘safe’ in a specific context. Example: to display HTML code in a text area, all HTML tags must be replaced by their entity equivalents; otherwise the browser will render the HTML.
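To make the distinction concrete, here is a minimal sketch in plain PHP. It deliberately avoids WordPress functions so it runs on its own, and the function names are hypothetical. Validation checks a date string with DateTime::createFromFormat and answers yes or no; escaping converts HTML into entities with htmlspecialchars so it can sit inside a text area.

```php
<?php
// Validation: is this value what we expect? Here: a real date in YYYY-MM-DD form.
function is_valid_iso_date( string $value ): bool {
    $date = DateTime::createFromFormat( '!Y-m-d', $value );
    // Reject impossible dates such as 2024-02-30, which PHP would silently roll over.
    return $date !== false && $date->format( 'Y-m-d' ) === $value;
}

// Escaping: make a value safe for one specific context (text area content).
function escape_for_textarea( string $value ): string {
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

var_dump( is_valid_iso_date( '2024-02-29' ) ); // bool(true)
var_dump( is_valid_iso_date( '2024-02-30' ) ); // bool(false)
echo escape_for_textarea( '<b>bold</b>' ), "\n"; // &lt;b&gt;bold&lt;/b&gt;
```

Validation leaves the value untouched and only decides whether to accept it, while escaping transforms the value for one particular output context.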

Rule No. 1: Do not trust users

Never assume that any data entered by the user is safe. Also never assume that data coming from the database is safe, even if you made it ‘safe’ before inserting it there. This applies to both front-end forms and the WP admin dashboard.

Rule No. 2: Validate Early, Escape Late

Validate input early, as soon as you receive it from the user. Escape (or sanitize) output late, just before you use, display or return the data. What form this sanitization takes depends entirely on the context you are using it in.
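A minimal sketch of this rule, assuming it runs inside WordPress. The option name myplugin_contact_email and both helper functions are hypothetical. The input is checked with is_email() as soon as it arrives, and the stored value is still escaped with esc_html() just before it is printed.

```php
<?php
/*
 * Hypothetical plugin helpers; assumes WordPress is loaded so that
 * wp_unslash(), is_email(), update_option(), get_option() and esc_html() exist.
 */

// Validate early: check the input as soon as it arrives and reject it if invalid.
function myplugin_save_contact_email( array $request ) {
    if ( ! isset( $request['contact_email'] ) || ! is_string( $request['contact_email'] ) ) {
        return false;
    }
    // WordPress adds slashes to request data, so remove them before checking.
    $email = is_email( trim( wp_unslash( $request['contact_email'] ) ) );
    if ( false === $email ) {
        return false; // Not an e-mail address: store nothing.
    }
    return update_option( 'myplugin_contact_email', $email );
}

// Escape late: even our own stored value is escaped right before output.
function myplugin_print_contact_email() {
    $email = get_option( 'myplugin_contact_email', '' );
    echo '<p>' . esc_html( $email ) . '</p>';
}
```

A form handler would call something like myplugin_save_contact_email( $_POST ), and a template would call myplugin_print_contact_email() wherever the address is shown.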

Rule No. 3: Do trust WordPress

When you use native WordPress functions rather than direct SQL queries, WP core sanitizes the data for the appropriate context. That is one more reason to prefer native functions over custom code wherever possible.
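As an illustration, assuming WordPress is loaded, the sketch below stores a value with update_post_meta() instead of writing an INSERT or UPDATE query by hand, so core builds the query and escapes the value. The meta key myplugin_rating and the 1 to 5 range are hypothetical; deciding what counts as valid is still our job.

```php
<?php
/*
 * Hypothetical helper; assumes WordPress is loaded. The meta key
 * "myplugin_rating" and the 1-5 range are illustrative only.
 */
function myplugin_store_rating( $post_id, $rating ) {
    // Validity is still our job: accept only whole numbers from 1 to 5.
    $rating = intval( $rating );
    if ( $rating < 1 || $rating > 5 ) {
        return false;
    }
    // Core builds and escapes the SQL for this call; no hand-written
    // "UPDATE wp_postmeta ..." query is needed.
    return update_post_meta( intval( $post_id ), 'myplugin_rating', $rating );
}
```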


Data Validation

The first concern when receiving data from a user is not safety but validity. What ‘valid’ means is up to you, and if you’re doing it right, WordPress will take care of safely adding the data to the database. Validity could mean a valid email address, a positive integer, text of a limited length, or one of a set of specified options. WordPress offers many functions that help with these checks.
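The sketch below shows what a few of these validity checks might look like in plain PHP, with no WordPress functions involved. The allowed options and the default length limit are hypothetical examples.

```php
<?php
// Hypothetical list of accepted values for a "color" field.
const ALLOWED_COLORS = [ 'red', 'green', 'blue' ];

// One of a fixed set of options; strict comparison avoids type juggling.
function is_allowed_color( $value ): bool {
    return in_array( $value, ALLOWED_COLORS, true );
}

// A positive integer sent as a form string, e.g. "42" (but not "-3", "4.2" or "0").
function is_positive_int_string( $value ): bool {
    return is_string( $value ) && ctype_digit( $value ) && (int) $value > 0;
}

// Text of limited length, counted in characters rather than bytes.
function is_short_text( $value, int $max = 50 ): bool {
    return is_string( $value ) && mb_strlen( $value, 'UTF-8' ) <= $max;
}
```

The third argument to in_array() matters: without strict comparison, PHP's loose comparison can accept values (such as numeric strings in a different notation) that are not literally in the list.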

  • Numbers – When expecting numeric data, it’s possible to check it with is_int or is_float, but values from $_GET and $_POST arrive as strings, so these checks return false for form input. It is usually sufficient to simply cast the data as numeric with intval or floatval. For zero padding, WordPress provides the function zeroise().
  • E-mails – To check the validity of e-mails, WordPress has the is_email() function.
  • HTML – WordPress provides a family of functions of the form wp_kses_*. wp_kses() is a very flexible function, allowing you to remove unwanted tags, or just unwanted attributes from tags. Specifying every allowed tag and attribute can be a laborious task, so WordPress also provides wrappers around wp_kses with pre-set allowed tags and protocols, such as wp_kses_post() (the HTML allowed in post content) and wp_kses_data() (a smaller default set of basic tags).
  • Filenames – sanitize_file_name( $filename ) removes characters that are illegal in filenames, replaces spaces with dashes and consecutive dashes with a single dash, and removes periods, dashes and underscores from the beginning and end of the filename. wp_unique_filename( $dir, $filename ) returns a unique (for directory $dir), sanitized filename (it uses sanitize_file_name).
  • Text Fields – WordPress provides sanitize_text_field() to strip out extra whitespace, tabs, line breaks and any tags when receiving data from a text field.
  • Keys – WordPress also provides sanitize_key that ensures the returned variable contains only lower-case alpha-numerics, dashes, and underscores.
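Putting several of these functions together, a hypothetical profile form handler might look like the sketch below. It assumes WordPress is loaded and that each field arrives as a string; the function and field names are illustrative only.

```php
<?php
/*
 * Hypothetical profile form handler; assumes WordPress is loaded.
 * Field names ("email", "member_no", ...) are illustrative only.
 */
function myplugin_clean_profile( array $raw ) {
    $raw   = wp_unslash( $raw ); // Remove the slashes WordPress adds to request data.
    $clean = array();

    // E-mail: is_email() returns the address, or false if it is not valid.
    $clean['email'] = is_email( $raw['email'] ?? '' );

    // Number: cast to an integer, then zero-pad it to three digits (7 -> "007").
    $clean['member_no'] = zeroise( intval( $raw['member_no'] ?? 0 ), 3 );

    // Single-line text: strips tags, line breaks, tabs and extra whitespace.
    $clean['display_name'] = sanitize_text_field( $raw['display_name'] ?? '' );

    // Key: lower-case alphanumerics, dashes and underscores only.
    $clean['plan'] = sanitize_key( $raw['plan'] ?? '' );

    // File name: drops illegal characters and turns spaces into dashes.
    $clean['avatar'] = sanitize_file_name( $raw['avatar'] ?? '' );

    // HTML: keep only the tags and attributes allowed in post content.
    $clean['bio'] = wp_kses_post( $raw['bio'] ?? '' );

    return $clean;
}
```

The caller still has to decide what to do when $clean['email'] is false, for example by showing an error instead of saving the profile.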

Further information about Input validation can be found here.


Data Sanitization

While validation is about making sure data is valid, data sanitization is about making it safe. Some validation functions help make data safe, but in general they are not sufficient: even ‘valid’ data might be unsafe in certain contexts, and what is safe in one context is not necessarily safe in another. This is why WordPress often provides several functions for the same content, for instance:

  • the_title – for using the title in standard HTML (inside header tags, for example)
  • the_title_attribute – for using the title as an attribute value (usually the title attribute in <a> tags)
  • the_title_rss – for using the title in RSS feeds
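For example, a theme template might print the same title in two of these contexts at once. This sketch assumes it is called inside The Loop; the function name is hypothetical, while the_permalink(), the_title_attribute() and the_title() are core template tags that echo their output.

```php
<?php
/*
 * Hypothetical theme helper; assumes it is called inside The Loop.
 * the_title_attribute() prepares the title for an attribute value,
 * the_title() prints it as regular HTML content.
 */
function mytheme_post_heading() {
    ?>
    <h2>
        <a href="<?php the_permalink(); ?>" title="<?php the_title_attribute(); ?>">
            <?php the_title(); ?>
        </a>
    </h2>
    <?php
}
```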

Sometimes though, we’ll need to perform our own sanitization – often because we have custom input beyond the standard post title, permalink, content etc. that WordPress handles for us.

  • Escape HTML – to prevent cross-site scripting (XSS) through injected scripts, WordPress provides the well-known esc_html function. This should always be used when printing variables into HTML: <h1> <?php echo esc_html($title); ?> </h1>
  • Escape Attributes – to escape unsafe characters in attribute values (such as single and double quotes), WordPress provides the function esc_attr. Like esc_html, it replaces ‘unsafe’ characters with their entity equivalents. Example: <input type="text" name="myInput" value="<?php echo esc_attr($value);?>"/>
  • Translated Escapes – Both esc_html and esc_attr also come with __, _e, and _x variants (esc_html__, esc_html_e, esc_html_x, esc_attr__, esc_attr_e, esc_attr_x), which translate and escape a string in one call.
  • HTML Class Names – WordPress provides sanitize_html_class – this escapes variables for use in class names, simply by restricting the returned value to alpha-numerics, hyphens and underscores.
  • Escape URLs – When printing variables into the href attribute you should use:
    • esc_url – for escaping URLs that will be printed to the page.
    • esc_url_raw – for escaping URLs to save to the database or use in URL redirecting.
  • Escape JS – When you want to pass PHP variables to JavaScript, you should use wp_localize_script(), which handles sanitization for you. If you do need to print a variable directly, you can use the esc_js function to make it safe: <script> var myVar = '<?php echo esc_js($variable); ?>'; </script>
  • Escape Textarea – For this, WordPress provides esc_textarea, which is almost identical to esc_html but does double-encode entities. Essentially, it is little more than a wrapper for htmlspecialchars.
  • Antispambot – WordPress provides antispambot, which encodes random parts of an e-mail address into HTML entities to help protect it from e-mail harvesters.
  • Query Strings – the safest and easiest way is to use add_query_arg and remove_query_arg, which build and modify the query string for you. Note that they do not escape their output: pass the returned URL through esc_url before printing it, and URL-encode argument values (for example with rawurlencode) when they are not already encoded.
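The sketch below combines several of these functions in one hypothetical render function, assuming WordPress is loaded. The $item array, its keys, the admin page slug and the myplugin text domain are illustrative assumptions; the point is that each value is escaped with the function that matches where it is printed.

```php
<?php
/*
 * Hypothetical render function; assumes WordPress is loaded and that
 * $item has 'id', 'name', 'status', 'url', 'notes' and 'email' keys.
 */
function myplugin_render_item( array $item ) {
    // add_query_arg() builds the URL but does not escape it; esc_url() does that below.
    $edit_url = add_query_arg(
        array(
            'page' => 'myplugin',
            'item' => rawurlencode( $item['id'] ),
        ),
        admin_url( 'admin.php' )
    );
    ?>
    <div class="item item-<?php echo sanitize_html_class( $item['status'] ); ?>">
        <h3><?php echo esc_html( $item['name'] ); ?></h3>
        <a href="<?php echo esc_url( $item['url'] ); ?>"><?php esc_html_e( 'Visit site', 'myplugin' ); ?></a>
        <a href="<?php echo esc_url( $edit_url ); ?>"><?php echo esc_html__( 'Edit', 'myplugin' ); ?></a>
        <input type="text" name="item_name" value="<?php echo esc_attr( $item['name'] ); ?>" />
        <textarea name="item_notes"><?php echo esc_textarea( $item['notes'] ); ?></textarea>
        <?php
        // antispambot() returns entity-encoded text, so it is printed as-is;
        // this assumes $item['email'] was validated with is_email() on input.
        echo antispambot( $item['email'] );

        // Printing straight into JavaScript; wp_localize_script() is the preferred route.
        ?>
    </div>
    <script>
        var mypluginItemName = '<?php echo esc_js( $item['name'] ); ?>';
    </script>
    <?php
}
```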

More info about Output Sanitization can be found here.


Database Escaping

When using functions such as get_posts or classes such as WP_Query and WP_User_Query, WordPress takes care of the necessary sanitization when querying the database. However, when retrieving data from a custom table, or otherwise performing a direct SQL query on the database, proper sanitization is up to you. WordPress does provide a helpful tool for this: the wpdb class (available as the global $wpdb object), whose prepare() method escapes values for SQL queries.
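A minimal sketch, assuming WordPress is loaded and that a hypothetical custom table named {prefix}myplugin_scores exists. $wpdb->prepare() escapes the values substituted for the %d and %s placeholders, and $wpdb->insert() escapes the values it writes according to the format array.

```php
<?php
/*
 * Hypothetical helpers for a custom table "{prefix}myplugin_scores";
 * assumes WordPress is loaded so the global $wpdb object exists.
 */
function myplugin_get_scores( $user_id, $status ) {
    global $wpdb;
    // Table names cannot be passed as %s placeholders; this one comes from
    // $wpdb->prefix and a fixed string, never from user input.
    $table = $wpdb->prefix . 'myplugin_scores';

    // prepare() escapes and quotes each value: %d for integers, %s for strings.
    $sql = $wpdb->prepare(
        "SELECT score, created_at FROM {$table} WHERE user_id = %d AND status = %s",
        $user_id,
        $status
    );
    return $wpdb->get_results( $sql );
}

function myplugin_add_score( $user_id, $score ) {
    global $wpdb;
    // insert() escapes the values itself; the format array sets their types.
    return $wpdb->insert(
        $wpdb->prefix . 'myplugin_scores',
        array( 'user_id' => $user_id, 'score' => $score ),
        array( '%d', '%d' )
    );
}
```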

For a more complete overview of SQL escaping in WordPress, see database Data Validation.

