Modern websites, like complex machinery, have lots of moving parts. For a typical WordPress website, that would include WordPress core, your theme (and child theme in some cases), and any number of 3rd-party plugins.
When you also consider the hundreds of device types, browsers, and personal settings for each visitor, it quickly becomes apparent that manually verifying your website works correctly for every user - whether that's administrators, editors, members, customers, etc. - is an impossible task.
How can we catch and fix errors on our site before our customers email us about them? We take a layered approach - to borrow a phrase from the cybersecurity field - "defence in depth"!
In this article we're focusing on Visual Regression Testing (a.k.a. Change Monitoring or Snapshot/Screenshot Testing).
I'll soon be publishing articles about other monitoring approaches you can take to keep an eye on any issues that may affect your website.
Firstly, let's define what a regression is:
Definition: Regression - A return to a previous, less advanced, and/or worse state.
Example: After a plugin update, a contact form stops working. The submit button disappears, and visitors are no longer able to submit the contact form. Note: If this form is the only way visitors can reach you, they'll never be able to report this issue to you!
Example: An update to the theme restyles product pages, but causes layout issues on the cart and checkout pages, preventing customers from checking out.
Regressions can come about in many different ways. To handle every case, the development community has invested thousands of hours building test suites to identify regressions - ideally before they make it to production!
In this article, our focus will be Visual Regression Testing. As the name implies, we are looking for regressions visually.
Visual Regression Testing is also known as change detection or screenshot comparison.
A visual regression is an unexpected change on a web page or component that you can identify by eye. That might be a broken widget, a layout shift, unexpected error messages, etc.
Not all changes are regressions, though. If you are adding a new component to a page, or updating some copy, you would expect the page layout to change to accommodate it.
An expected change will still get flagged for manual review by a visual regression tool and will need to be marked as a false positive. This has the benefit of letting the QA tester validate that expected changes are being displayed correctly.
We are essentially doing an image comparison (of web page screenshots) over time, using a correct reference (baseline) image, and looking for unexpected changes in subsequent screenshots.
In order to identify regressions, software testing professionals need to write reliable and consistent tests that work across different browsers, browser versions, and devices.
The alternative to writing tests is a more general approach, which we'll cover later in this article.
Unit testing examines individual modules of an application in isolation.
If we're talking about visual regression testing, then we'd say that a unit test would focus on one particular low-level element - like a button - isolated from the rest of the website.
While developers can write unit tests to detect visual regressions for individual elements, this is often too low-level. Most tests are written for more complex components that combine these low-level elements, e.g. web forms, shopping carts, etc.
Integration testing checks whether a group of individual modules work correctly when grouped together.
We could focus on a web form as an example. Forms combine form fields, buttons, labels, validation states, etc. into a larger component, which we can write an integration test for.
Developers often write integration tests for important User Interface (UI) components. These tests run in isolation from the rest of a web page, as the UI component can be tested independently.
End-to-End testing attempts to identify issues that occur as if real users were interacting with your website.
Tests are written to represent "user flows". A user logging into their account, making a purchase, downloading a file, and logging out, is an example of a flow.
Writing end-to-end tests allows us to identify regressions that may not be apparent when a web page is loaded for the first time. Perhaps there's only a regression for logged-in users, or for customers who have added an item to their shopping cart.
An end-to-end test allows a testing professional to write one script that can automate a standard set of actions across any number of browsers (and browser versions) and devices. Any variance from the reference screenshots will result in a failed test, which can be flagged for manual inspection.
Playwright is my go-to tool for end-to-end visual regression testing.
Unfortunately, while the concept is straightforward enough, actually writing test suites is quite a bit more involved - usually requiring in-depth knowledge of a system and its edge cases.
Tests also require maintenance. As a website evolves old tests may need to be rewritten, and new tests may need to be developed. Unless you have a dedicated QA team for the entire lifecycle of your website, testing is a process that often gets left by the wayside.
For companies heavily invested in building their own platforms at scale - SaaS platforms, news sites, banks, eCommerce marketplaces, etc. - hiring and maintaining a QA team to build a test suite is a no-brainer, considering the impact a regression could have on their bottom lines and consumer trust.
For smaller businesses, hiring even a single QA tester or maintaining a test suite is too much of an ask. Even though the tools are often open source and free to use, the effort of creating and maintaining a test suite is rarely justified.
There are a few things that you can do to help identify regressions before your website visitors do. They may require a bit of up-front time investment to set up, and some technical knowledge to diagnose any errors they report, but they are mostly "hands off" once they've been set up.
Tools like Playwright allow non-developers to generate end-to-end tests semi-automatically using their Codegen tool. If you're comfortable with the command line and don't mind working your way through a bit of technical documentation, then it's straightforward enough to set up and run.
There aren't many visual regression tools built for non-developers. Most require integrations with build and/or continuous integration/delivery tools, or writing configuration files manually.
Visual regression tools that are available for non-technical website managers do exist, but they have limitations. Usually they will only be able to screenshot the page as it appears once it has loaded, but before the user has interacted with it. If a visual regression occurs after a user interaction - like a form submission - these tools won't detect that change.
It is important to take a "defence in depth" approach though. By having layers of monitoring and tests - not just screenshot comparisons - you're more likely to identify regressions and flag them before your customers do.
We can analyse the HTTP response headers, contents, and timings to monitor our site and set up an early warning system for issues like server errors, slow response times, and unexpected content changes.
I like to use UptimeRobot for monitoring server responses.
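To illustrate the kinds of checks these monitors automate, here's a minimal pure-Python sketch that evaluates already-fetched response data against some simple health rules. The thresholds and the expected keyword are assumptions for the example, not anything a particular service prescribes.

```python
# Illustrative sketch: evaluate an HTTP response against simple health
# rules - status code, latency, and a content check. The 2-second
# latency budget and the "Checkout" keyword are assumed examples.

def check_response(status, elapsed_seconds, body, expected_keyword="Checkout"):
    """Return a list of problems found; an empty list means healthy."""
    problems = []
    if status >= 500:
        problems.append(f"server error: HTTP {status}")
    elif status >= 400:
        problems.append(f"client error: HTTP {status}")
    if elapsed_seconds > 2.0:                 # assumed latency budget
        problems.append(f"slow response: {elapsed_seconds:.2f}s")
    if expected_keyword not in body:          # basic content check
        problems.append(f"missing expected text: {expected_keyword!r}")
    return problems

print(check_response(200, 0.4, "<h1>Checkout</h1>"))  # healthy: []
print(check_response(500, 3.1, "Internal Server Error"))  # three problems
```

A hosted monitor runs checks like these on a schedule from multiple locations and alerts you when any rule fails.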
Web servers keep access and error logs to record requests from visitors. The error logs in particular are helpful for identifying issues as - you guessed it - they will log errors!
With some hosting providers you'll be able to search and filter the logs via a control panel, but others may only have SSH access, requiring a bit of command line knowledge.
You can set up your own scripts and alerts to notify you when errors occur, but this is usually something you'd need a DevOps professional to set up properly.
Server logs can also be sent to external services that will allow for real time reporting and analytics. This is particularly useful if you're managing lots of services or your site has a lot of functionality and/or traffic. I use New Relic.
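As a sketch of what such a script might do, here's a small pure-Python log scanner. The log format and alert patterns are assumptions for the example - real logs vary by server (Apache, nginx) and hosting provider.

```python
import re

# Hypothetical sketch: scan web-server log lines for entries worth
# alerting on. Patterns and the sample log format are assumed examples.
ALERT_PATTERNS = [
    re.compile(r"\[error\]", re.IGNORECASE),   # nginx-style error marker
    re.compile(r"PHP Fatal error"),            # fatal PHP errors
]

def find_alerts(log_lines):
    """Return the lines that match any alert pattern."""
    return [line for line in log_lines
            if any(p.search(line) for p in ALERT_PATTERNS)]

sample = [
    '127.0.0.1 - - "GET / HTTP/1.1" 200 5120',
    '2024/01/05 [error] upstream timed out while reading response',
    'PHP Fatal error: Uncaught Error: Call to undefined function',
]
for line in find_alerts(sample):
    print(line)   # prints the two error lines, not the healthy request
```

In practice you'd run something like this on a schedule (e.g. via cron) and send matches to email or a chat webhook.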
A typical marketing site is likely to have only minimal JS for functionality - like form validation and submission, analytics/tracking, and perhaps a carousel or two. Most of the JS code will usually come from 3rd-party libraries.
Pixel by Pixel comparison is the simplest way to identify changes on a web page. Simply put, a reference (baseline) screenshot is compared with test screenshots to create "visual diffs". Any differences - at a pixel level - are highlighted.
Most Visual Regression Testing software uses the Pixel Comparison method.
While pixel comparison is the most common method for identifying changes and regressions, it has its drawbacks:
While a layout shift is important to identify as a possible regression, when we're using a pixel comparison we run into a couple of issues.
For example, let's take an extreme case - the site header changes height, from 80px tall, to 70px tall, because the styling for a Call-to-Action (CTA) button changes, removing its padding. In this case the layout shift will cause the header, and everything below it to move up by 10px.
When using a pixel comparison for the above example, the entire web page will be flagged as having changed! While this makes sense, any other changes/regressions on the page could be missed. We could fix the CTA button regression and rerun the tests to identify other issues, but that requires new test screenshots and is resource intensive.
To do a pixel perfect comparison, everything needs to be set up the same way. The same testing environment, device, browser settings, screen resolution, etc. Any changes to these external variables may result in false positives, as even slight changes to them can result in a very slight shift.
It is possible to address this by adding additional processing to allow for thresholds. If only a very small change is detected across the screenshot (something like 0.1%), then the tests can still pass. Most modern tools support thresholding.
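As a minimal illustration of pixel comparison with a threshold, here the "screenshots" are tiny 2D lists of grayscale values rather than real image files, and the 0.1% threshold is just the example figure mentioned above:

```python
# Simplified pixel-by-pixel comparison over small grayscale "images"
# represented as 2D lists. Real tools work on full screenshot files,
# but the diff-and-threshold logic is the same idea.

def diff_ratio(baseline, test):
    """Fraction of pixels that differ between two same-sized images."""
    total = len(baseline) * len(baseline[0])
    changed = sum(
        1
        for row_a, row_b in zip(baseline, test)
        for a, b in zip(row_a, row_b)
        if a != b
    )
    return changed / total

def passes(baseline, test, threshold=0.001):
    """Pass the check if at most 0.1% of pixels changed (assumed value)."""
    return diff_ratio(baseline, test) <= threshold

baseline = [[0, 0, 0, 0] for _ in range(4)]      # 4x4 all-black image
identical = [row[:] for row in baseline]
modified = [[0, 0, 0, 255]] + [[0, 0, 0, 0] for _ in range(3)]

print(diff_ratio(baseline, identical))  # 0.0
print(diff_ratio(baseline, modified))   # 0.0625 (1 of 16 pixels changed)
print(passes(baseline, modified))       # False at a 0.1% threshold
```

Note how even a single changed pixel in a tiny image blows well past a 0.1% threshold - which is exactly why a 10px layout shift flags nearly everything on a real page.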
The Document Object Model (DOM) is a representation of a website's HTML document as a tree structure, with nodes representing each HTML element.
By saving a copy of the DOM tree as a baseline reference, future versions of the DOM tree can be compared against it to identify changes and regressions.
It's important to note that the DOM tree can only be used for part of the comparison. Each HTML node will most likely have associated CSS rules, which will affect the style of the HTML element. This must be taken into consideration when using this technique.
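As a simplified sketch of the idea (ignoring CSS, as noted above), we can reduce an HTML document to a flat list of node signatures using Python's standard-library parser and diff two versions. The structure and signature format here are illustrative assumptions, not a production algorithm.

```python
from html.parser import HTMLParser

# Simplified sketch: flatten an HTML document into (depth, tag, class)
# signatures and diff two versions. A real DOM comparison would also
# need to account for computed CSS styles.

VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}

class DomSignature(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.signature = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self.signature.append((self.depth, tag, cls))
        if tag not in VOID_TAGS:      # void tags never emit an end tag
            self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

def signature_of(html):
    parser = DomSignature()
    parser.feed(html)
    return parser.signature

baseline = "<div class='form'><label>Email</label><button>Send</button></div>"
changed = "<div class='form'><label>Email</label></div>"   # button removed

missing = set(signature_of(baseline)) - set(signature_of(changed))
print(missing)   # the <button> node that disappeared
```

A baseline-vs-test diff like this instantly pinpoints a removed submit button - the exact regression from the contact-form example earlier.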
One of the issues with parsing the DOM is that HTML is almost too flexible. Not every website will be compliant with W3C HTML specifications, and as such - while they may display correctly to a human visitor - they can cause parsing issues, which can lead to unexpected results.
Microsoft published a paper - VIPS: a Vision-based Page Segmentation Algorithm - back in 2003 which parses the DOM and segments leaf nodes into visually related groups. This method could be used as an alternative to Image Processing Comparison, and it addresses the layout shift issue associated with Pixel Comparison. I have yet to see a real-world implementation of this method.
Image processing is my favourite option for regression testing. It doesn't require the complex analysis of the DOM tree that you have with DOM Comparison, and it addresses the layout shift issues that you have with Pixel Comparison.
Some companies market this as "Visual AI", but I have yet to see any that actually use artificial intelligence. It's just a buzzword for image processing and computer vision.
Firstly, we capture a screenshot of the reference (baseline) web page. This image is then processed to extract unique visual elements from the screenshot, grouped together by proximity. We do this using OpenCV, a powerful image processing library.
This processing results in hundreds of smaller images - basically we've extracted all of the key elements and created mini screenshots of each one.
Now that we have our reference, we capture screenshots of the web page under test, perform the image processing, and extract the mini screenshots. Now we have something to compare!
We know the size and coordinates for each mini screenshot, so the first thing we do is rule out any identical mini screenshots between the reference and test. These will be any elements which haven't shifted position or changed style/content.
Now, we're left with a small group of screenshots that we can't find an exact match for. This could be for a few reasons: the element has shifted position, its style or content has changed, it's a new element, or it has been removed entirely.
Using OpenCV and our own algorithm - described above - we're able to identify changes to individual elements between the reference and test web pages and flag any discrepancies to human reviewers.
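To give a feel for the extraction step, here's a pure-Python stand-in for what the OpenCV stage does: flood-fill a binary pixel grid into connected regions and return a bounding box per region - the "mini screenshots". This is a toy version; a real implementation would work on processed screenshot images with OpenCV's contour detection.

```python
# Toy stand-in for the element-extraction step: group adjacent
# "non-background" pixels (1s) in a binary grid into connected regions
# and return each region's bounding box (top, left, bottom, right).

def extract_elements(grid):
    rows, cols = len(grid), len(grid[0])
    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and (r, c) not in seen:
                seen.add((r, c))
                stack = [(r, c)]
                top, left, bottom, right = r, c, r, c
                while stack:                      # iterative flood fill
                    y, x = stack.pop()
                    top, left = min(top, y), min(left, x)
                    bottom, right = max(bottom, y), max(right, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(extract_elements(page))  # two "elements", each with its own box
```

With boxes like these for both the baseline and the test screenshot, matching identical crops and flagging the leftovers is exactly the comparison described above.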
Using image processing is more resource intensive than a direct Pixel Comparison, as we need to do a lot of image processing and calculations to identify visual changes. Technically it's also harder for developers to implement and tweak, requiring specialist knowledge.
Image processing provides more accurate and useful results for human reviewers. We are able to highlight exactly which visual elements have changed on a web page, not just because of another element that has caused a layout shift.
Manual comparison involves rapidly flicking between browser tabs or screenshots. If you're good at spot the difference, there's a good skill transfer.
Unfortunately manually comparing web pages doesn't scale very well.
If you've only got a brochure-ware website with 5 pages then it's possible to simply go through each page on your laptop and phone, which may be enough. Ideally you'd want to check each page on a variety of devices, browsers, etc.
Just because a web page looks right for you, doesn't mean there isn't a regression for other people using other devices, browsers, and configurations.
The main use case for visual regression testing is to identify unexpected changes. If something breaks, you want to be the first to know about it!
Visual regression testing won't fix a bug, but it will highlight the effect it has on your website and bring it to your attention.
You can get really sophisticated with visual regression testing, by examining specific parts of a web page for changes, or even performing and validating a series of steps - like a product purchase flow, or a multi-step contact form.
As businesses grow their online presence, 3rd-party integrations are becoming more prevalent. It's not just your own code and UI that you need to verify, but all of the embedded 3rd-party scripts and widgets too.
It's likely to be too time intensive for you to audit all of the 3rd-party code yourself, but you can at least test it on a staging server and validate that it is - at least visually - functioning as expected.
If an attacker is able to gain access to your website, they may attempt a defacement attack. A website defacement replaces the content on your website.
The new content could be anything the attacker chooses, usually a public message of some sort. The result for the website owner is usually public embarrassment and costly security audits.
Setting up visual regression scans won't prevent a defacement, but it'll alert you to the defacement much earlier, reducing the likelihood that visitors will see the defacement, and giving you time to address the security breach.
If you store personal data on your site, you must comply with privacy regulations - such as GDPR - to keep that data secure.
Examples of Personally Identifiable Information (PII) include addresses and contact information for eCommerce customers, data submitted via contact forms, membership information, cookies, etc.
You may also be storing sensitive business data on your website, e.g. access keys, internal reports, passwords, banking details, etc.
Having data leaked could result in security breaches, targeted phishing attacks, loss of revenue, loss of consumer confidence, and heavy fines from regulators.
Visual regression tools can be configured to scan and parse text, looking for strings that may contain sensitive information.
Identifying a PII leak in a production environment due to a visual regression obviously isn't good, but knowing you have leaked PII is the first step. Ideally, any potential sources of leaks would have been caught earlier on during testing.
Visual regression testing allows for more granular monitoring than traditional uptime monitoring, which just monitors server responses and text.
We can define more specific rules for what counts as "uptime". We want to ensure that the site is up for real users, not just technically up. Perhaps we want to verify that an event has fired, a 3rd-party script has executed successfully, or that there are no errors or warnings in the console logs.
The great thing about visual regression testing, is that you can test almost any website regardless of the technology it uses. This is known as black box testing.
There are benefits to having an underlying knowledge of the system though (white box testing), as it allows you to create tests for specific components and user flows, as well as helping to add automations.
There isn't a visual regression tool that integrates specifically with WordPress, but I am building one - WP Diff!
As a Content Management System (CMS), WordPress has a market share of ~64%, the biggest by a huge margin. Because of this, it makes sense to build a white box visual regression tool that works natively with WordPress.
With access to the underlying WordPress CMS a visual regression tool like WP Diff can do a lot of things that a generic black box visual regression tool wouldn't be able to do without requiring a lot of configuration.
WP Diff automatically detects changes, sends you alerts, and helps keep your WordPress website up-to-date and performant.
Visit WPDiff.com to register your interest and apply to be a beta tester.
While writing this article, I've personally reviewed as many of the visual regression testing services and tools as I could get my hands on, to find out how they operate and what features they offer.
Most tools are targeted towards professional development and QA teams. If you're reading this article, it's unlikely that you fall into either of those categories.
I'd like to focus on the visual regression testing services that non-technical website managers can use without writing a lot of tests, setting up config files, or interacting with the command line/Docker. There aren't many, so this is a short list!
| Service | Visual Regression Methodology | My Notes |
| --- | --- | --- |
| WP Diff | Image Processing Comparison | WP Diff is still under development. My goal is to make it the best visual regression testing platform exclusively for WordPress websites. |
| Diffy | Image Processing Comparison | Diffy is one of the best tools I have used. It's one of the handful of tools that does more than just pixel comparisons. Its image processing algorithm is excellent and helps eliminate false positives. It is quite expensive though! |
| Fluxguard | Pixel by Pixel Comparison | Fluxguard offers a lot out of the box: it identifies pages on your site automatically and will notify you by email when changes are detected. Its free tier is quite generous. |
| TestingBot | Manual Comparison | TestingBot will let you test one URL at a time across multiple browser types/versions. I found it a bit limited, as it won't let you run bulk tests or automate visual comparisons. |
| VisualPing | Proprietary | Visualping is a great tool that's very easy to use: you add the URLs or elements on a web page you want to monitor for changes and it handles the rest. You can upload URLs in bulk, but it doesn't automatically detect new URLs for a given website, so you'd need to update that list manually. |
Mojoaxel on GitHub has compiled a comprehensive list of visual regression tools and online services, if you're looking for more!