Selenium WebDriver Architecture

Before you start automating with any tool, it is important to know how that tool works and how it is architected. This helps you take full advantage of the tool and build the right automation framework.

At a very high level, Selenium is a suite of three tools.

Selenium IDE is a browser extension, originally for Firefox, that allows users to record and play back tests. The record/playback paradigm can be limiting and isn’t suitable for many users.

Selenium WebDriver provides APIs in a variety of languages to allow for more control and the application of standard software development practices.

Note: An application programming interface (API) is a particular set of rules (‘code’) and specifications that software programs can follow to communicate with each other. It serves as an interface between different software programs and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers.

So Selenium WebDriver is a well-designed, object-oriented API that handles communication between programming languages and browsers.

Selenium Grid makes it possible to use the Selenium APIs to control browser instances distributed over a grid of machines, allowing more tests to run in parallel.

Here I have tried to simplify the architecture of Selenium as much as possible so that it is easy to understand.

The WebDriver architecture is easier to understand if you break it into parts and learn each one. So here we will divide the architecture into three parts:

  1. Language level bindings
  2. Selenium WebDriver API
  3. Browser Drivers

Language Level Bindings
To support multiple languages, the Selenium community has developed language bindings. If you want to use the browser driver from Java, use the Java bindings for Selenium WebDriver. If you want to use it from C#, Ruby, or Python, use the bindings for that language.

Selenium WebDriver API
It is the API that makes communication between programming languages and browsers possible. It follows object-oriented concepts and is made up of multiple classes and interfaces.
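To make the "classes and interfaces" idea concrete, here is a minimal sketch in Python. It is not the real Selenium API: `BrowserSession`, `InMemorySession`, the two fake browser classes, and `smoke_test` are all hypothetical names invented for illustration. The point is only that test logic written against an interface can run unchanged against any class that implements it.

```python
from abc import ABC, abstractmethod


class BrowserSession(ABC):
    """Hypothetical interface: the operations a test may ask of any browser."""

    @abstractmethod
    def open(self, url: str) -> None:
        """Navigate to the given URL."""

    @abstractmethod
    def current_url(self) -> str:
        """Return the URL the browser is currently showing."""


class InMemorySession(BrowserSession):
    """Stand-in that only records state; a real class would talk to a browser driver."""

    browser = "generic"

    def __init__(self) -> None:
        self._url = "about:blank"

    def open(self, url: str) -> None:
        self._url = url

    def current_url(self) -> str:
        return self._url


class FakeFirefoxSession(InMemorySession):
    browser = "firefox"


class FakeChromeSession(InMemorySession):
    browser = "chrome"


def smoke_test(session: BrowserSession) -> str:
    """Test logic written once, against the interface rather than a specific browser."""
    session.open("http://example.test/")
    return session.current_url()


for session in (FakeFirefoxSession(), FakeChromeSession()):
    print(session.browser, smoke_test(session))
```

Because `smoke_test` depends only on the interface, adding another browser means adding another class, not rewriting the test.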

Browser Drivers

A browser driver handles communication with the browser without revealing the internal logic of the browser’s functionality. The browser driver is the same regardless of the language used for automation.

Now let’s understand the whole process:

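Let’s say you have written a test in Java (the binding code) against the Selenium API. That binding code issues commands over the JSON Wire Protocol: each command is sent as an HTTP request, with any parameters encoded as JSON, to a REST-based web service that is able to interpret it. (Selenium 4 replaced the JSON Wire Protocol with the W3C WebDriver protocol, which works on the same HTTP-and-JSON principle.)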
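As a rough illustration of what "issuing commands" means, the sketch below builds the HTTP method, path, and JSON body for three commands without sending them anywhere. The paths follow the JSON Wire Protocol's layout as I understand it (`/session`, `/session/:sessionId/url`, `/session/:sessionId/title`). The helper `encode_command` and the session id `abc123` are made up for this example; a real driver assigns the session id when the session is created.

```python
import json


def encode_command(method, path, body=None):
    """Return the pieces of the HTTP request a binding would send for one command."""
    payload = json.dumps(body) if body is not None else None
    return {"method": method, "path": path, "body": payload}


session_id = "abc123"  # hypothetical; normally returned by the driver

new_session = encode_command(
    "POST", "/session", {"desiredCapabilities": {"browserName": "firefox"}}
)
navigate = encode_command(
    "POST", f"/session/{session_id}/url", {"url": "http://example.test/"}
)
get_title = encode_command("GET", f"/session/{session_id}/title")

for request in (new_session, navigate, get_title):
    print(request["method"], request["path"], request["body"])
```

Each call in your test script becomes one such request. That is why the same browser driver can serve Java, C#, Ruby, or Python bindings: it only ever sees HTTP and JSON.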

The browser driver receives the HTTP request (the command), executes it in the actual browser, and sends the result back as an HTTP response. The WebDriver API hands that result to your test code, where you can see it.
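The round trip described above can be imitated with nothing but the Python standard library. In this sketch a tiny local HTTP server plays the part of the browser driver, and the `send` function plays the part of the language binding. Everything here is a stand-in: `FakeDriverHandler`, `send`, and the session id `demo` are hypothetical, no real browser is involved, and the response shape (`status` plus `value`) is a simplified version of what the JSON Wire Protocol returns.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Stand-in for a browser driver: accepts a command and answers with JSON."""

    current_url = "about:blank"  # the pretend browser's state

    def _reply(self, value):
        data = json.dumps({"status": 0, "value": value}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        # "Navigate" command: read the JSON body and update the pretend browser.
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length))
        FakeDriverHandler.current_url = command["url"]
        self._reply(None)

    def do_GET(self):
        # "Get current URL" command: report the pretend browser's state.
        self._reply(FakeDriverHandler.current_url)

    def log_message(self, *args):
        pass  # keep the example's output quiet


# Talk to localhost directly, ignoring any system proxy settings.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def send(base, method, path, body=None):
    """What a language binding does: one HTTP request per command."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with opener.open(request) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

send(base, "POST", "/session/demo/url", {"url": "http://example.test/"})
result = send(base, "GET", "/session/demo/url")
print(result["value"])

server.shutdown()
server.server_close()
```

The test code never touches the "browser" directly. It only sends requests and reads responses, which is the separation the browser driver provides.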

Note: JavaScript Object Notation (JSON) is used to represent objects with complex data structures. It is used primarily to transfer data between a server and a client on the web. It has become an industry standard for REST web services and a strong alternative to XML.

A wire protocol is a way of getting data from point to point.

WebDriver uses the same approach to communicate between the client libraries (language bindings) and the drivers, such as the Firefox driver, IE driver, Chrome driver, and so on.
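To see why JSON suits this job, the short sketch below turns a nested object into text and back again. The `settings` dictionary is a hypothetical example of the kind of structured data a test might send to a driver. Its keys are illustrative and not taken from any particular driver's documentation.

```python
import json

# Hypothetical nested object, shaped like settings a test might send to a driver.
settings = {
    "browserName": "chrome",
    "timeouts": {"pageLoad": 30000, "script": 5000},
    "arguments": ["--headless", "--window-size=1280,800"],
}

text = json.dumps(settings, sort_keys=True)  # object -> text that can travel over HTTP
restored = json.loads(text)                  # text -> object on the receiving side

print(text)
print(restored == settings)
```

The nested dictionary and list survive the trip unchanged, so client libraries and drivers written in different languages can exchange structured data as plain text.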
