Dive into the browser rendering

As a front-end developer, the tool I use most is the browser, but I only had a basic understanding of how code is turned into a web page and lacked the details of what happens inside.

Before diving into the browser architecture, it helps to understand two basic concepts: processes and threads. A process is a running instance of an application's program. A thread lives inside a process and executes some part of that process's program. When an application starts, the operating system creates a process and gives it a block of memory to work with. The program may create threads to help it work, although this is not required. A process can also ask the operating system to start another process, and the two communicate through IPC (inter-process communication). When the application is closed, the process goes away and its memory is released.

Chrome

Different browsers may use processes and threads in completely different ways, so I will focus on Chrome. Chrome creates multiple processes, each with its own role.

Browser Process: controls the browser's own interface (address bar, tab bar, bookmarks, back and forward buttons), handles user interaction, and manages the other sub-processes.
Renderer Process: converts HTML, CSS and JavaScript into an interactive web page. The Blink rendering engine and the V8 JavaScript engine both run in this process, and Chrome typically creates a renderer process for each tab.
GPU Process: handles GPU tasks in isolation from other processes. Chrome did not have a GPU process when it was first released; it was originally introduced to support 3D CSS effects, and later both web pages and the Chrome UI came to be drawn with the GPU.
Network Process: responsible for loading network resources and making network requests.
Plugin Process: controls all plugins used by websites, such as Flash. Note that this is not related to Chrome Extensions.

With this multi-process architecture, one tab can become unresponsive without affecting the other tabs. Generally speaking, when Chrome runs on powerful hardware, it may split each service into a separate process for more stability; on devices with limited resources, it consolidates services into a single process to save memory.

Site Isolation

In the one-renderer-process-per-tab model, cross-site iframes run in the same renderer process as the page that embeds them, sharing memory space between different sites, so a.com and b.com can end up in the same renderer process. Starting from Chrome 67, Site Isolation is enabled by default on desktop, ensuring that each cross-site iframe gets its own independent renderer process.

Opening DevTools on a page whose iframes run in different processes means DevTools has to do work behind the scenes to make it appear seamless. Even a simple Ctrl+F to find a word on the page means searching across different renderer processes.

Navigation Flow

When you type a URL into the address bar, your input is handled by the browser process's UI thread.

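As the user starts typing into the address bar, the first question the UI thread asks is "Is this a search query or a URL?" In Chrome the address bar is also a search box, so the UI thread has to decide whether to send you to a search engine or to the site you requested.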
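To make that question concrete, here is a toy JavaScript sketch of such a decision. It is not Chrome's real omnibox logic, which is considerably more involved; the function name `classifyInput` and the search endpoint `SEARCH_URL` are hypothetical, and the "bare host" check is a deliberately naive heuristic. It only relies on the standard WHATWG `URL` class, available in browsers and Node.

```javascript
// Toy address-bar classifier; NOT Chrome's real omnibox logic.
const SEARCH_URL = "https://www.google.com/search?q="; // assumed default search engine

function classifyInput(rawInput) {
  const input = rawInput.trim();

  // 1. A complete http(s) URL, e.g. "https://example.com/docs".
  try {
    const url = new URL(input);
    if (url.protocol === "http:" || url.protocol === "https:") {
      return { kind: "url", href: url.href };
    }
  } catch {
    // Not an absolute URL; keep checking.
  }

  // 2. Naive heuristic: no spaces and something like "host.tld" at the start.
  if (/^[^\s\/]+\.[a-z]{2,}(\/|$)/i.test(input)) {
    return { kind: "url", href: new URL(`https://${input}`).href };
  }

  // 3. Everything else is treated as a search query.
  return { kind: "search", href: SEARCH_URL + encodeURIComponent(input) };
}

console.log(classifyInput("example.com").href); // https://example.com/
console.log(classifyInput("what is dns").href); // https://www.google.com/search?q=what%20is%20dns
```

Inputs such as `localhost:3000` fall through to a search in this sketch, which shows why real browsers need far more rules than a single regular expression.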

When the user hits Enter, the UI thread initiates a network call to get the site content. A loading spinner is displayed on the corner of the tab, and the network thread goes through the appropriate protocols, such as DNS lookup and establishing a TLS connection, for the request.

Once the response body (payload) starts to come in, the network thread checks the Content-Type header and, if necessary (for example when the header is missing or wrong), looks at the first few bytes of the stream to work out what kind of data it is. If the response is an HTML file, the next step is to pass the data to the renderer process; if it is a zip file or some other file, it is a download request, so the data is passed to the download manager. Security checks also happen at this point: Safe Browsing, Google's site-security service, checks whether the domain and response data match a known malicious site, and Cross Origin Read Blocking (CORB) makes sure sensitive cross-site data does not make it to the renderer process.
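The render-or-download decision can be sketched as follows. This is a heavily simplified, hypothetical model: the real rules live in the WHATWG MIME Sniffing Standard and in Chrome's own code, the set of types treated as renderable here (`RENDERABLE_TYPES`) is an assumption, and the sketch only sniffs when the Content-Type header is missing.

```javascript
// Heavily simplified render-or-download decision; NOT the full MIME Sniffing Standard.
const RENDERABLE_TYPES = new Set(["text/html", "text/plain", "image/png", "image/jpeg"]); // assumed set
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04", the zip magic number

function hasSignature(bytes, signature) {
  return signature.every((byte, i) => bytes[i] === byte);
}

// Guess a MIME type from the first bytes of the body.
function sniff(bytes) {
  if (hasSignature(bytes, ZIP_SIGNATURE)) return "application/zip";
  const head = String.fromCharCode(...bytes.slice(0, 512)).trimStart().toLowerCase();
  if (head.startsWith("<!doctype html") || head.startsWith("<html")) return "text/html";
  return "application/octet-stream";
}

function decideHandling(contentType, firstBytes) {
  // Drop parameters such as "; charset=utf-8"; sniff only if the header is missing.
  const declared = (contentType ?? "").split(";")[0].trim().toLowerCase();
  const mimeType = declared || sniff(firstBytes);
  return RENDERABLE_TYPES.has(mimeType) ? "renderer" : "download manager";
}

const bytesOf = (text) => Uint8Array.from(text, (ch) => ch.charCodeAt(0));
console.log(decideHandling("text/html; charset=utf-8", bytesOf("<!DOCTYPE html>"))); // renderer
console.log(decideHandling(undefined, new Uint8Array([0x50, 0x4b, 0x03, 0x04]))); // download manager
```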

Once all the checks are done and the network thread is confident that the browser should navigate to the requested site, it tells the UI thread that the data is ready. The UI thread then finds a renderer process to carry on with rendering the web page.

The browser process then sends an IPC message to the renderer process to commit the navigation and passes along the data stream; from here on, the work happens in the renderer process. Once the renderer process "finishes" rendering, it sends an IPC message back to the browser process, and the UI thread stops the loading spinner on the tab.

"Finishes" is in quotes because rendering is never completely finished: client-side JavaScript can still load additional resources and render new views after this point.

Inner working of a Renderer Process

The renderer process’s core job is to turn HTML, CSS, and JavaScript into a web page that the user can interact with.

When the renderer process receives a commit message for a navigation and starts to receive HTML data, the main thread begins to parse the text (HTML) and turn it into a DOM (Document Object Model).

In addition, a website usually uses external resources such as images, CSS, and JavaScript, which need to be loaded from the network or from the cache. The main thread could request them one by one as it finds them while building the DOM, but to speed things up, a "preload scanner" runs concurrently. If there are tags like <img> or <link> in the HTML document, the preload scanner peeks at the tokens generated by the HTML parser and sends requests to the network thread in the browser process.
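A toy version of the idea is to scan the raw HTML for resource URLs so they can be requested before the main parser reaches them. This is only an illustration: the real preload scanner works on tokens from the HTML tokenizer rather than regular expressions, and the function name `preloadScan` is hypothetical.

```javascript
// Toy "preload scanner": finds resource URLs in raw HTML so they can be requested early.
// Uses regular expressions for illustration only; a real scanner works on parser tokens.
function preloadScan(html) {
  const urls = [];
  for (const [tag, tagName] of html.matchAll(/<(img|link|script)\b[^>]*>/gi)) {
    // <link> points at its resource with href; <img> and <script> use src.
    const attr = tagName.toLowerCase() === "link" ? "href" : "src";
    const match = tag.match(new RegExp(`\\s${attr}\\s*=\\s*["']([^"']+)["']`, "i"));
    if (match) urls.push(match[1]);
  }
  return urls;
}

const html = `
  <html><head>
    <link rel="stylesheet" href="/style.css">
    <script src="/app.js"></script>
  </head><body>
    <img data-src="/lazy.png" src="/cat.png">
  </body></html>`;

console.log(preloadScan(html)); // [ '/style.css', '/app.js', '/cat.png' ]
```

In a browser, each URL found this way would be handed to the network thread while the main thread is still busy building the DOM.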

But when a <script> tag is found during HTML parsing, the parser pauses while the JavaScript is loaded and executed. It cannot simply carry on and run the script later, because the browser does not know whether the script will change the structure of the DOM tree: if the script calls document.write(), it can change the very markup that is being parsed. That is why the position of the script tag in the document matters. If your JavaScript does not use document.write(), you can add the async or defer attribute to the script tag. The browser then downloads the script in parallel without blocking parsing; an async script runs as soon as it has downloaded, while a defer script runs after the document has been parsed, in document order.
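The differences between the ways of loading a classic (non-module) script can be summarized in a small model. `scriptTiming` is a hypothetical helper written for this article, not a browser API. Note that, except for defer, the parser still pauses while the script is actually executing.

```javascript
// Simplified model of classic (non-module) <script> loading during HTML parsing.
function scriptTiming({ src = null, async: isAsync = false, defer = false } = {}) {
  if (src === null) {
    // Inline classic scripts ignore async/defer: the parser pauses and runs them right away.
    return { fetchBlocksParser: false, executes: "immediately" };
  }
  if (isAsync) {
    // Downloaded in parallel; parsing pauses only while the script runs. async wins over defer.
    return { fetchBlocksParser: false, executes: "as soon as downloaded" };
  }
  if (defer) {
    // Downloaded in parallel; runs after parsing, in the order the tags appear.
    return { fetchBlocksParser: false, executes: "after parsing, in document order" };
  }
  // Plain <script src>: parsing waits for both the download and the execution.
  return { fetchBlocksParser: true, executes: "immediately after download" };
}

console.log(scriptTiming({ src: "app.js", defer: true }).executes); // after parsing, in document order
```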

After HTML parsing is completed, the DOM tree is generated. However, the DOM alone does not say how each node should look, so the main thread parses the CSS and determines the computed style of each DOM node. Even without any custom CSS, every node has a computed style, because the browser applies its own default style sheet. At this point the renderer process knows the structure of the document and the style of each node, but it still needs to determine each node's coordinates and how much space it occupies. This process of finding the geometry of elements is called Layout.

The main thread generates the Layout Tree by walking through the DOM and the computed styles. Each node in the Layout Tree records information such as its x and y coordinates and its bounding box size.

The DOM tree and the Layout Tree do not correspond one-to-one. If a node is set to display: none, it does not appear in the Layout Tree (however, an element with visibility: hidden does). Conversely, content generated by a pseudo-element such as p::before { content: "Hi!" } appears in the Layout Tree even though it is not in the DOM. In short, the Layout Tree is built from the DOM tree and the computed styles, and corresponds to what is actually displayed on the page.
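The following toy sketch shows which DOM nodes end up in the Layout Tree. It assumes each node already carries its computed style in a `style` object, and `beforeContent` is a hypothetical field standing in for `::before { content: ... }`. Real layout also computes geometry, which is left out here.

```javascript
// Toy layout-tree builder. Each node is assumed to carry its computed style already.
function buildLayoutTree(node) {
  // display: none removes the node and its whole subtree from the Layout Tree.
  if (node.style.display === "none") return null;

  // visibility: hidden keeps the box (it still takes up space); it is just not painted.
  const hidden = node.style.visibility === "hidden";
  const children = [];

  // Hypothetical field standing in for ::before { content: ... }: a box with no DOM node.
  if (node.style.beforeContent) {
    children.push({ name: "::before", text: node.style.beforeContent, hidden, children: [] });
  }
  for (const child of node.children) {
    const box = buildLayoutTree(child);
    if (box !== null) children.push(box);
  }
  return { name: node.name, hidden, children };
}

const dom = {
  name: "body", style: {}, children: [
    { name: "p", style: { beforeContent: "Hi!" }, children: [] },
    { name: "div", style: { display: "none" }, children: [{ name: "span", style: {}, children: [] }] },
    { name: "aside", style: { visibility: "hidden" }, children: [] },
  ],
};

const layoutTree = buildLayoutTree(dom);
console.log(layoutTree.children.map((box) => box.name)); // [ 'p', 'aside' ]
```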

Painting

Knowing the structure, style and geometry is still not enough to draw the page; the main thread also needs to know the order in which to paint. For example, z-index may be set on some elements, so painting simply in the order the elements appear in the DOM could produce an incorrect result. To get the layering right on screen, the main thread walks the Layout Tree to create paint records, notes describing the painting process such as "background first, then text, then rectangle". This stage is called painting. Turning this information into pixels on the screen is called rasterizing.
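Here is a toy sketch of producing paint records in z-index order for a flat list of sibling boxes. Real stacking order follows the CSS stacking context rules (z-index only takes effect on positioned elements and flex or grid items, and stacking contexts nest), so treat `paintRecords` as a hypothetical illustration only.

```javascript
// Toy paint order for sibling boxes in one stacking context.
// Boxes without zIndex are treated as 0; ties keep DOM order.
function paintRecords(boxes) {
  return boxes
    .map((box, domIndex) => ({ ...box, domIndex }))
    .sort((a, b) => (a.zIndex ?? 0) - (b.zIndex ?? 0) || a.domIndex - b.domIndex)
    .map((box) => `draw ${box.name}`);
}

const boxes = [
  { name: "header", zIndex: 2 },
  { name: "main" },
  { name: "modal", zIndex: 10 },
  { name: "footer" },
];
console.log(paintRecords(boxes));
// [ 'draw main', 'draw footer', 'draw header', 'draw modal' ]
```

Painting in plain DOM order would draw the footer last and on top of the modal, which is exactly the kind of incorrect result the paint records avoid.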

Compositing

Chrome currently uses a more sophisticated rasterizing process called compositing: parts of a page are separated into layers, each layer is rasterized separately, and the layers are composited into a page in a separate thread called the compositor thread. In simple terms, the elements of the page are divided into layers according to certain rules and the layers are rasterized; when the page scrolls, the browser only needs to composite the content inside the viewport into a new frame to show the user.

Note: Quote from the original.

To find out which elements need to be in which layers, the main thread walks through the Layout Tree to create the Layer Tree. Once the Layer Tree is created and the paint order is determined, the main thread commits that information to the compositor thread. The compositor thread then rasterizes each layer. A layer can be as large as the entire length of a page, so the compositor thread divides it into tiles and sends each tile off to the raster threads. The raster threads rasterize each tile and store it in GPU memory. Once the tiles are rasterized, the compositor thread gathers tile information, called draw quads, to create a compositor frame.
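Dividing a layer into tiles is simple arithmetic. In this sketch the 256x256 tile size is an assumption chosen for illustration, since actual tile sizes depend on the device and the layer, and `splitIntoTiles` is a hypothetical helper.

```javascript
const TILE_SIZE = 256; // assumed tile edge in CSS pixels, for illustration only

// Cut a layer into a grid of tiles; edge tiles are clipped to the layer's size.
function splitIntoTiles(layerWidth, layerHeight, tileSize = TILE_SIZE) {
  const tiles = [];
  for (let y = 0; y < layerHeight; y += tileSize) {
    for (let x = 0; x < layerWidth; x += tileSize) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, layerWidth - x),
        height: Math.min(tileSize, layerHeight - y),
      });
    }
  }
  return tiles;
}

// A 1000px-wide, 3000px-long layer, e.g. a long scrolling page.
const tiles = splitIntoTiles(1000, 3000);
console.log(tiles.length); // 48 (4 columns x 12 rows)
```

Because each tile is independent, the 48 tiles can be spread across several raster threads instead of rasterizing the whole layer in one piece.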

Draw quads: contain information such as the tile's location in memory and where on the page to draw the tile, taking page compositing into consideration.

Compositor frame: A collection of draw quads that represents a frame of a page.

A compositor frame is then submitted to the browser process via IPC. At this point, other compositor frames may be added: from the UI thread for browser UI changes, or from other renderer processes for extensions. These compositor frames are sent to the GPU to be displayed on screen. If a scroll event comes in, the compositor thread creates another compositor frame to send to the GPU.
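The following toy model shows why scrolling can be cheap: once tiles are rasterized, a new compositor frame is just a new list of draw quads for the tiles inside the viewport. The data shapes (`tileId`, `screenX`, `screenY`) and the helper names are hypothetical; real draw quads carry much more information.

```javascript
// Toy compositor: tiles are assumed to be rasterized already.
function intersects(tile, viewport) {
  return (
    tile.x < viewport.x + viewport.width &&
    tile.x + tile.width > viewport.x &&
    tile.y < viewport.y + viewport.height &&
    tile.y + tile.height > viewport.y
  );
}

// A "compositor frame" here is just the list of draw quads for visible tiles.
function makeCompositorFrame(tiles, viewport) {
  const drawQuads = tiles
    .filter((tile) => intersects(tile, viewport))
    .map((tile) => ({
      tileId: tile.id, // stands in for "where the rasterized tile lives in GPU memory"
      screenX: tile.x - viewport.x, // where on screen to draw it
      screenY: tile.y - viewport.y,
    }));
  return { drawQuads };
}

// A page-long layer split into eight 256px-tall tiles.
const tiles = Array.from({ length: 8 }, (_, i) => ({ id: i, x: 0, y: i * 256, width: 256, height: 256 }));

const firstFrame = makeCompositorFrame(tiles, { x: 0, y: 0, width: 256, height: 600 });
// After a scroll, only a new frame is built from the same tiles: no layout, no paint.
const scrolledFrame = makeCompositorFrame(tiles, { x: 0, y: 300, width: 256, height: 600 });

console.log(firstFrame.drawQuads.map((q) => q.tileId)); // [ 0, 1, 2 ]
console.log(scrolledFrame.drawQuads.map((q) => q.tileId)); // [ 1, 2, 3 ]
```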

There is also an article about how the compositor enables smooth interaction when user input comes in (Inside look at modern web browser (part 4)).

Wrap-up

After the network thread in the browser process receives the HTML data, it is sent to the main thread of the renderer process through IPC. The main thread parses the HTML to construct the DOM tree, performs style calculation and layout to generate the Layout Tree, and generates paint records by walking the Layout Tree.

Next, the main thread walks the Layout Tree to generate the Layer Tree, and commits the Layer Tree and the paint information to the compositor thread.

The compositor thread divides each layer into smaller tiles and passes them to the raster threads for rasterization. Once the tiles are rasterized, the compositor thread gathers the tile information, called draw quads.

From these draw quads, the compositor thread builds a compositor frame, which is passed via IPC to the browser process and then sent on to the GPU.

And then it goes to the screen.

Why avoid frequent `Repaint` and `Reflow`

When the size or position of an element changes, the browser has to redo style calculation, layout, paint, and every later stage. This is called a Reflow.

When only a property like color changes, layout does not need to be recalculated, but style calculation and painting still run. This is called a Repaint.

A reflow always causes a repaint, but a repaint does not necessarily involve a reflow.
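The relationship can be sketched as a lookup from a changed property to the pipeline stages it triggers. The property lists below are a small, hypothetical sample rather than a complete or engine-accurate table, and the composite-only case for `transform` and `opacity` assumes the element is already on its own compositor layer.

```javascript
// Hypothetical sample of properties; not a complete or engine-accurate table.
const LAYOUT_PROPERTIES = new Set(["width", "height", "margin-top", "top", "left", "font-size"]);
const COMPOSITE_ONLY_PROPERTIES = new Set(["transform", "opacity"]);

// Which rendering stages run again after a change to `property`.
function stagesFor(property) {
  if (LAYOUT_PROPERTIES.has(property)) {
    return ["style", "layout", "paint", "composite"]; // reflow, which also means repaint
  }
  if (COMPOSITE_ONLY_PROPERTIES.has(property)) {
    return ["style", "composite"]; // assumes the element already has its own layer
  }
  return ["style", "paint", "composite"]; // e.g. color: repaint without reflow
}

console.log(stagesFor("margin-top")); // [ 'style', 'layout', 'paint', 'composite' ]
console.log(stagesFor("color")); // [ 'style', 'paint', 'composite' ]
```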

Both of these run on the main thread, which is also where JavaScript runs.

Because they all share the main thread, they compete for its time. If an animation constantly causes repaints and reflows, the browser has to run style calculation, layout and painting in every frame; if JavaScript is busy at the same time, frames can be missed and the animation stutters.

As mentioned earlier, both DOM changes and style changes trigger re-rendering. So the browser collects changes in a queue and applies them together, avoiding multiple re-renders. For example:

div.style.color = 'blue';
div.style.marginTop = '30px';

Although there are two style changes above, the browser batches them and performs only one reflow and repaint. Code like the following, however, triggers the work twice:

div.style.color = 'blue';
var margin = parseInt(getComputedStyle(div).marginTop); // reading a computed, layout-dependent value forces pending work now
div.style.marginTop = (margin + 10) + 'px';

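After the color is set in the first line, the second line asks for the element's computed margin, a value that depends on layout. To return an accurate value, the browser must bring style (and, if anything affecting geometry is pending, layout) up to date right away instead of batching it at the end of the frame. The write in the third line then invalidates layout again, so the work is done twice.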

How to improve performance

Group multiple reads (and multiple writes) of the DOM together. Do not put a write between two reads.

// bad
div.style.left = div.offsetLeft + 10 + "px";
div.style.top = div.offsetTop + 10 + "px";

// good
var left = div.offsetLeft;
var top  = div.offsetTop;
div.style.left = left + 10 + "px";
div.style.top = top + 10 + "px";
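To see why interleaving reads and writes is costly, here is a toy simulation of forced synchronous layout. `FakeRenderer` and `FakeElement` are hypothetical classes, not real DOM objects: writing to `style` marks layout as dirty, reading `offsetLeft` or `offsetTop` flushes any pending layout, and the browser flushes once more at the end of the frame.

```javascript
// Toy model of forced synchronous layout. Not a real DOM or rendering engine.
class FakeRenderer {
  constructor() {
    this.layoutDirty = false;
    this.layoutCount = 0;
  }

  // Run layout only if some write has invalidated it since the last layout.
  flushLayout() {
    if (this.layoutDirty) {
      this.layoutCount++;
      this.layoutDirty = false;
    }
  }
}

class FakeElement {
  constructor(renderer) {
    const box = { left: 0, top: 0 };
    this.renderer = renderer;
    this.box = box;
    // Style writes only record the change and mark layout as dirty.
    this.style = {
      set left(value) { box.left = parseInt(value, 10); renderer.layoutDirty = true; },
      set top(value) { box.top = parseInt(value, 10); renderer.layoutDirty = true; },
    };
  }

  // Geometry reads must be up to date, so they force any pending layout first.
  get offsetLeft() { this.renderer.flushLayout(); return this.box.left; }
  get offsetTop() { this.renderer.flushLayout(); return this.box.top; }
}

// Runs one "frame": the script, then the browser's own batched layout.
function countLayouts(script) {
  const renderer = new FakeRenderer();
  const div = new FakeElement(renderer);
  script(div);
  renderer.flushLayout();
  return renderer.layoutCount;
}

const interleaved = countLayouts((div) => {
  div.style.left = div.offsetLeft + 10 + "px";
  div.style.top = div.offsetTop + 10 + "px"; // this read forces layout for the write above
});

const batched = countLayouts((div) => {
  const left = div.offsetLeft;
  const top = div.offsetTop;
  div.style.left = left + 10 + "px";
  div.style.top = top + 10 + "px";
});

console.log({ interleaved, batched }); // { interleaved: 2, batched: 1 }
```

The interleaved version lays out twice because reading `offsetTop` forces the pending `left` change to be laid out immediately; the batched version lets both writes share a single layout at the end of the frame.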

If you need to perform many operations on an element, first set it to display: none, then make your changes, and finally restore its display. This costs two reflows and repaints (one when hiding, one when showing) instead of potentially one for every change, because changes to an element that is not rendered do not cause layout or paint. visibility: hidden does not work the same way: a hidden element still occupies its space in the layout, so toggling visibility only triggers a repaint, but changing a hidden element's geometry still triggers a reflow.

The requestAnimationFrame() API helps with the problem of JavaScript competing with rendering. The browser calls its callback once per frame, right before rendering that frame, so a long JavaScript task can be divided into smaller chunks that run one per frame. Each chunk stops before the frame's time is up and returns the main thread, so that the main thread can run style, layout and painting on time.
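A minimal sketch of chunking a long job across frames. It assumes an 8 ms per-frame budget for JavaScript (a guess that leaves part of a roughly 16.7 ms frame for style, layout and paint), and because `requestAnimationFrame` only exists in browsers, it falls back to a 16 ms timer elsewhere so the example also runs in Node. `processInChunks` is a hypothetical helper name.

```javascript
const FRAME_BUDGET_MS = 8; // assumption: leave the rest of a ~16.7 ms frame to the browser

// requestAnimationFrame exists only in browsers; elsewhere fall back to a ~16 ms timer.
const nextFrame =
  typeof requestAnimationFrame === "function"
    ? (callback) => requestAnimationFrame(callback)
    : (callback) => setTimeout(() => callback(performance.now()), 16);

// Processes `items` a few at a time, yielding the main thread between frames.
// Resolves with the number of frames the work was spread across.
function processInChunks(items, handleItem) {
  return new Promise((resolve) => {
    let index = 0;
    let frames = 0;
    function runChunk() {
      frames++;
      const start = performance.now();
      while (index < items.length && performance.now() - start < FRAME_BUDGET_MS) {
        handleItem(items[index++]);
      }
      if (index < items.length) {
        nextFrame(runChunk); // give style, layout and paint a chance before continuing
      } else {
        resolve(frames);
      }
    }
    nextFrame(runChunk);
  });
}

// Usage: sum 0..999 without monopolizing the main thread.
let sum = 0;
processInChunks(Array.from({ length: 1000 }, (_, i) => i), (n) => { sum += n; })
  .then((frames) => console.log(`sum=${sum} in ${frames} frame(s)`));
```

The budget check runs before each item, so a single very slow item can still overrun a frame; splitting that item further, or moving it to a Web Worker, would be the next step.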

Animating the CSS transform property is cheaper still: when the element is on its own compositor layer, the animation can run on the compositor thread without going through layout and painting, so it is not blocked by JavaScript running on the main thread.
