Why Selenium Sucks for End-To-End Testing in 2021

selenium Feb 21, 2021

Let's get to the basics, shall we?

What is an end-to-end test? We define it as a test that can potentially span multiple UIs and perform testing from an end-user's perspective.

Well, Selenium is a good fit neither for cross-system testing nor for emulating a user’s real-world experience.

Let me explain...

How does Selenium work?

Selenium was created in 2004, way before single-page apps came into favor, back when pages looked like this:

and HTML at the time looked like this:

Let's compare this with the modern version of a similar part of Amazon's page:

Notice how the complexity grew exponentially? Where it used to be just one simple table, there are now 10+ levels of nested div elements!

Selenium by design encourages its users to stick to XPath locators. This approach might have worked in 2005, when pages had simple, stable structures. But in 2021 pages have insanely complex, barely human-readable structures that are constantly changing. HTML was NOT designed for the fancy UI we use it for now. In an actively developed application, it is impossible to rely on technical information like XPaths to keep element references stable. And things like ids and data-test-ids do not really work for list and table elements. I'm not even talking about the lack of ids in React.

Let's look at the XPath from the example above for an Amazon a-tag: /html/body/div[4]/div[2]/div/div[1]/div/div[2]/div/div[1]/div/div[1]/div[2]/div/div[2]/a

And this is the best Google Chrome could come up with:

//*[@id="zg_left_col1"]/div[1]/div[2]/div/div[2]/a

Even the fancy SelectorsHub extension could only come up with this:

//div[@id='8mNf9lO2-mC1H7sJJMcE_g']//a[@class='a-link-normal']

This is absolutely unreadable and creates a maintenance nightmare of technical debt!
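To make the brittleness concrete, here is a minimal sketch using only Python's standard library rather than Selenium itself. The markup is hypothetical and heavily simplified (it is not Amazon's real HTML), and the names `find_by_path` and `find_link_by_text` are made up for illustration. It shows a positional path breaking after a redesign adds a single wrapper element, while a lookup by visible text keeps working:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; not Amazon's real HTML.
OLD_PAGE = """
<html><body>
  <div><div>
    <a href="/dp/1">Echo Dot</a>
  </div></div>
</body></html>
"""

# Same visible content after a redesign added one wrapper element.
NEW_PAGE = """
<html><body>
  <div><section><div>
    <a href="/dp/1">Echo Dot</a>
  </div></section></div>
</body></html>
"""

# Structure-based locator, in the spirit of the XPaths above.
POSITIONAL_PATH = "./body/div[1]/div[1]/a"


def find_by_path(page, path):
    """Locate an element purely by its position in the tree."""
    return ET.fromstring(page.strip()).find(path)


def find_link_by_text(page, text):
    """Locate a link by what the user actually sees."""
    for link in ET.fromstring(page.strip()).iter("a"):
        if "".join(link.itertext()).strip() == text:
            return link
    return None


print(find_by_path(OLD_PAGE, POSITIONAL_PATH) is not None)  # path works on the old page
print(find_by_path(NEW_PAGE, POSITIONAL_PATH) is None)      # same path finds nothing now
print(find_link_by_text(NEW_PAGE, "Echo Dot").get("href"))  # text lookup still resolves
```

Nothing changed for the user between the two pages, yet the positional locator stopped resolving.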

Basically, the current way of working with the page has the following issues:

  1. It is nearly impossible to understand which element is being referenced unless your Selenium code is heavily documented and that documentation is not out of sync with the code;
  2. Only developers can understand test failures since the error descriptions are cryptic;
  3. The structure was not designed to properly handle modern apps with forms and tables - it lacks a stable, reliable way to refer to elements.

The end result? Instead of creating more tests, you have to spend an increasingly large amount of time on test maintenance! We have often seen that after 1 or 2 years of developing tests, people spend 50%+ of their time on test maintenance instead of doing something productive.

Now compound that with cross-system testing, where you don't control the HTML of the systems under test. No amount of BDD/Shift-left will help you reduce the maintenance required to constantly catch up with someone else's changes in 3rd party apps (think Salesforce).

How should end-to-end testing work?

Think about it. What are end-to-end tests supposed to do? They are supposed to help you validate that your functionality works from the end user's perspective.

Therefore, the way you refer to elements should be from the end user's perspective. There should be an easy, stable way to work with forms and tables, emulating a user interacting with a browser or device.

Users only care that they can enter data into the same field or click on a link in a table row that contains their unique reference.

Forms. Let's see an example on Amazon again:

with HTML:

Notice that both the id and the name of the element are clear and descriptive! Great! But is it?

The moment you change your UI framework to React, your fancy ids are gone! When you migrate to some rigid, back-end-hooked framework (or a new version of it), your names will probably have to change as well (think ASP.NET). And this is EXACTLY when you want your end-to-end tests to work! Because you just migrated to a new framework!

Therefore, a proper end-to-end framework should never hook into the internals of your application, but rather rely on how it looks from the end user's perspective! Look at the City input. I'd argue that it will always have either a placeholder saying "City" or whatever an end user perceives as a "label".

Again, based on our experience (don't trust us, check for yourself), not everyone has a proper HTML structure like Amazon's, with the label for association in place. So, unfortunately, you can't rely on that either.

Therefore, there should be a way to describe an input from the end user's perspective, relying on what is considered a "label" or a placeholder.

And it should look something like this: enter "San Francisco" into "City"
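As a rough illustration of what could sit behind such a step, here is a minimal standard-library Python sketch. It is not any real framework's implementation: the form markup is hypothetical, and `find_input` and `enter` are names invented for this example. It resolves a field by its placeholder first, then by a label pointing at it, then by a label wrapping it:

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup showing three common ways a field gets its label.
FORM = """
<form>
  <label for="f1">Full name</label><input id="f1" name="n"/>
  <label>State <input name="s"/></label>
  <input name="c" placeholder="City"/>
</form>
"""


def _text(element):
    """Visible text of an element with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def find_input(root, visible_label):
    """Resolve an input by the label or placeholder an end user would read."""
    inputs = list(root.iter("input"))
    # 1. Placeholder text shown inside the field.
    for field in inputs:
        if field.get("placeholder", "").strip() == visible_label:
            return field
    for label in root.iter("label"):
        if _text(label) != visible_label:
            continue
        # 2. <label for="..."> pointing at the field's id.
        target = label.get("for")
        if target:
            for field in inputs:
                if field.get("id") == target:
                    return field
        # 3. <label> wrapping the field.
        nested = label.find(".//input")
        if nested is not None:
            return nested
    return None


def enter(root, value, visible_label):
    """Sketch of the step: enter "<value>" into "<visible_label>"."""
    field = find_input(root, visible_label)
    if field is None:
        raise LookupError(f'no field labelled "{visible_label}"')
    field.set("value", value)


form = ET.fromstring(FORM.strip())
enter(form, "San Francisco", "City")
```

The test step never mentions an id or a name, so renaming either would not break it. A real tool would also need to handle labels that are merely positioned next to a field, which this sketch does not attempt.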

Right?

Next, let's talk about tables, shall we?

Here is one of the most widely used examples from Salesforce:

What the user cares about is validating that the row containing ProperUniqueCompany has a certain status. Or that the down icon in the last column of that row can be clicked.

So, ideally, it should look something like:

Validate that table at row containing "ProperUniqueCompany" and column "Lead Status" contains "Open - Not Contacted"

or

Click on the table at the row containing "ProperUniqueCompany" and the last column

which should work regardless of how the table is rendered: as an HTML <table> like in Salesforce, or using <div>-based rendering like in Amazon.
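Here is a minimal sketch of that idea in standard-library Python. The markup is hypothetical (it is neither Salesforce's nor Amazon's real HTML), the `role="row"` attribute on the div version is an assumption, and `to_grid` and `cell` are names invented for this example. Both renderings are flattened into rows of visible text, and the lookup only ever talks about what the user sees:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same data as a <table> and as a <div> grid.
TABLE_HTML = """
<table>
  <tr><th>Company</th><th>Lead Status</th><th>Actions</th></tr>
  <tr><td>Acme</td><td>Working</td><td>v</td></tr>
  <tr><td>ProperUniqueCompany</td><td>Open - Not Contacted</td><td>v</td></tr>
</table>
"""
DIV_HTML = """
<div role="table">
  <div role="row"><div>Company</div><div>Lead Status</div><div>Actions</div></div>
  <div role="row"><div>Acme</div><div>Working</div><div>v</div></div>
  <div role="row"><div>ProperUniqueCompany</div><div>Open - Not Contacted</div><div>v</div></div>
</div>
"""


def to_grid(markup):
    """Flatten either rendering into rows of visible cell text."""
    root = ET.fromstring(markup.strip())
    rows = root.findall(".//tr") or root.findall(".//div[@role='row']")
    return [["".join(c.itertext()).strip() for c in row] for row in rows]


def cell(grid, row_containing, column=None):
    """Cell in the row containing some text; column=None means the last column."""
    header, *body = grid
    index = len(header) - 1 if column is None else header.index(column)
    for row in body:
        if row_containing in row:
            return row[index]
    raise LookupError(f'no row containing "{row_containing}"')


for markup in (TABLE_HTML, DIV_HTML):
    grid = to_grid(markup)
    print(cell(grid, "ProperUniqueCompany", "Lead Status"))
```

The same two-argument lookup works on both renderings because it depends on the header text and the unique row value, not on tag names or ids.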

So using XPath to refer to elements is similar to this:

What users certainly don't care about are the ids, names, or data-test-ids of those elements. Moreover, relying on them often leads to situations where the ids/names/etc. change and the test fails even though, from the end user's perspective, everything is perfectly fine. And this is what determines test stability! Think about it: if you only needed to maintain your tests when the application actually changes, as opposed to whenever the HTML code changes, wouldn't it be wonderful?
