A browser story

I find it very cool how browsers make the whole World Wide Web possible! So I couldn't help but peek behind the scenes to see how they do it.

Photo: "into the night" by Zhen Hu on Unsplash

Ever since I saw Addy Osmani talk about building a browser as a way to understand how browsers work, I've wondered: how much do I really know about the software that runs my code every day?! So, after a lot of research and experimentation, I now have a pretty good idea, which I'll share in this article.

Oh, one more thing: I'm not involved in any way with any of the browsers or the people who contribute to them. So this information doesn't come from the "inside" but from the "outside": from developing for the Web for quite some time, plus some tinkering on the side to better understand what the heck is going on.

# HTML: not so boring after all

We might easily dismiss HTML as the least interesting of all three. After all, we all know about the DOM, so that should be pretty much it, right? Well, not really...

Did you know that the browser processes HTML progressively? It can make use of the markup as it comes off the network. Think of sending a cake piece by piece: I don't need the whole cake to start eating it, just like the browser doesn't need the whole HTML document to start doing something with it.
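To make the cake analogy a bit more concrete, here's a toy model (not how a real parser works), assuming the HTML arrives as an array of string chunks. After each chunk it reports the text a user could already read, holding back any tag that has only partially arrived.

```javascript
// Toy model of progressive HTML processing. `chunks` stands in for the
// pieces of the document as they come off the network (an assumption;
// a real browser reads from a byte stream).
function progressiveSnapshots(chunks) {
  let received = "";
  const snapshots = [];
  for (const chunk of chunks) {
    received += chunk;
    // A "<" without a matching ">" yet means a tag is still in flight,
    // so only the part before it is safe to use right now.
    const lastOpen = received.lastIndexOf("<");
    const lastClose = received.lastIndexOf(">");
    const usable = lastOpen > lastClose ? received.slice(0, lastOpen) : received;
    // Strip the tags to get the text that could already be shown.
    snapshots.push(usable.replace(/<[^>]*>/g, ""));
  }
  return snapshots;
}

console.log(progressiveSnapshots(["<p>Hello", " world</p><p", ">Bye</p>"]));
// [ 'Hello', 'Hello world', 'Hello worldBye' ]
```

Note how the second chunk ends in the middle of a `<p` tag: the model simply waits for the rest of it before using that part.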

But don't take my word for it, try it yourself by running the demo below.

Progressive HTML
  1. git clone https://github.com/iampava/a-browser-story
  2. npm install
  3. npm start
  4. Navigate to: localhost:8080/progressive-html

Great! Now, for every piece it receives, the browser does 3 things:

Tokenization

Before constructing the DOM, the browser splits the HTML code into so-called tokens, each being either a start tag, an end tag or the text between tags. Here's a visual representation, inspired by Yoav Weiss's article in Smashing Book 6. [PS: to say this is a great book is an understatement. Buy it, enjoy it and share the news!]

Tokenization step
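Here's a toy version of that step in JavaScript. It only knows the three token kinds mentioned above and skips doctypes and comments; the real HTML tokenizer is a much larger state machine, so treat this as an illustration rather than a parser.

```javascript
// Toy HTML tokenizer: splits markup into StartTag, EndTag and Text tokens.
// It skips doctypes and comments and ignores many real-world cases
// (unquoted attributes, <script> contents, character references, ...).
function tokenize(html) {
  const tokens = [];
  const re = /<[^>]*>|[^<]+/g; // either a whole tag or a run of text
  let match;
  while ((match = re.exec(html)) !== null) {
    const raw = match[0];
    if (raw.startsWith("<!")) continue; // doctype or comment
    if (raw.startsWith("</")) {
      tokens.push({ type: "EndTag", name: raw.slice(2, -1).trim().toLowerCase() });
    } else if (raw.startsWith("<")) {
      const name = (raw.match(/^<([a-zA-Z][a-zA-Z0-9-]*)/) || [])[1];
      if (!name) continue;
      // Collect double-quoted attributes, e.g. src="bob.png".
      const attrs = {};
      for (const [, key, value] of raw.matchAll(/([a-zA-Z-]+)="([^"]*)"/g)) {
        attrs[key] = value;
      }
      tokens.push({ type: "StartTag", name: name.toLowerCase(), attrs });
    } else if (raw.trim() !== "") {
      tokens.push({ type: "Text", value: raw.trim() });
    }
  }
  return tokens;
}

console.log(tokenize('<p class="intro">Hi <b>there</b></p>').map((t) => t.type).join(" "));
// StartTag Text StartTag Text EndTag EndTag
```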

Preload scanner

Those tokens are needed for step 2, the preloading of assets. This optimization aims to keep both the main thread and the network busy at all times. That might sound a little abstract, so let's look at a concrete example of how things used to work.

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <script src="main.js"></script>
</head>
<body>
    <img src="bob.png">
    <img src="alice.png">

    <script src="analytics.js"></script>

    <link rel="stylesheet" href="style.css">
</body>
</html>
```

As we know, classic scripts (those without async or defer) are blocking: they stop HTML parsing until they've been fetched and executed. So, while main.js is being downloaded and executed, nothing else is being transferred over the network.

After that's done, parsing continues and the browser downloads bob.png and alice.png in parallel. Then it encounters another script, analytics.js, and again stops everything while it fetches and executes it. Only then does it finally move on, find the last resource (style.css) and complete the DOM.
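To put rough numbers on that waste, here's a toy timeline for the page above. The durations (100 ms per download, 50 ms per script run) are made up, bandwidth is assumed to be unlimited, and `preload: true` models a browser that starts every download as soon as it sees the markup.

```javascript
// Toy timeline for the page above. All durations are hypothetical (ms).
const resources = [
  { url: "main.js", kind: "script", download: 100, run: 50 },
  { url: "bob.png", kind: "image", download: 100 },
  { url: "alice.png", kind: "image", download: 100 },
  { url: "analytics.js", kind: "script", download: 100, run: 50 },
  { url: "style.css", kind: "stylesheet", download: 100 },
];

// Time (ms) at which everything is downloaded and every script has run.
// preload = false: a download starts only when the parser reaches the tag.
// preload = true: every download starts at t = 0.
function finishTime(list, { preload }) {
  let parserTime = 0; // how far along the parser is
  let lastDone = 0;
  for (const r of list) {
    const start = preload ? 0 : parserTime;
    const downloaded = start + r.download;
    if (r.kind === "script") {
      // Classic scripts block the parser until downloaded and executed.
      parserTime = Math.max(parserTime, downloaded) + r.run;
      lastDone = Math.max(lastDone, parserTime);
    } else {
      // Images and stylesheets download without stopping the parser here.
      lastDone = Math.max(lastDone, downloaded);
    }
  }
  return lastDone;
}

console.log(finishTime(resources, { preload: false })); // 400
console.log(finishTime(resources, { preload: true })); // 200
```

With these made-up numbers, starting downloads early halves the total time; the real gain depends on the page and the network.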

Bleah, a lot of wasted time that could have been used to download things...

Correct! This is where this optimization comes into play. Before kicking off DOM creation, the browser looks at the tokens from the previous step and, based on some internal rules, starts downloading the assets it's pretty sure it will need later. This way, assets can be used as soon as the blocking JS finishes running, instead of only starting to download at that point.
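A very rough sketch of the idea, assuming a regex over the raw markup is good enough (real preload scanners work on the tokenizer's output and handle many more cases): list the URLs of scripts, images and stylesheets in document order so their downloads can start before the parser gets to them.

```javascript
// Rough preload-scanner sketch: find the URLs of scripts, images and
// stylesheets in document order so they can be requested early.
// Regex-based for brevity; a real scanner works on tokens.
function preloadScan(html) {
  const urls = [];
  const tagRe = /<(script|img|link)\b([^>]*)>/gi;
  let match;
  while ((match = tagRe.exec(html)) !== null) {
    const tag = match[1].toLowerCase();
    const attrs = match[2];
    // Only stylesheet links are interesting here (not icons, etc.).
    if (tag === "link" && !/\brel="stylesheet"/i.test(attrs)) continue;
    const attrName = tag === "link" ? "href" : "src";
    const url = new RegExp(`\\b${attrName}="([^"]+)"`, "i").exec(attrs);
    if (url) urls.push(url[1]);
  }
  return urls;
}

const page = `
  <head><script src="main.js"></script></head>
  <body>
    <img src="bob.png">
    <img src="alice.png">
    <script src="analytics.js"></script>
    <link rel="stylesheet" href="style.css">
  </body>`;

console.log(preloadScan(page).join(", "));
// main.js, bob.png, alice.png, analytics.js, style.css
```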

I thought long and hard about how I could test this myself. All modern browsers come with this optimization and I didn't really want the headache of installing very old IE versions. Still, I think I found a solution: I added a while(true) { } inside a script placed in <head> and inspected the network. All the assets after the script were downloaded, even though the parser was stuck forever on that loop, which wouldn't have happened without this optimization. Hooray!

DOM

And finally the DOM is created and voilà, we're ready for the next step: CSS.

# CSS: can't skip styling

Having the DOM is unfortunately not enough. We need some styling to actually see the content.

So the browser takes its default rules (divs are block elements, body text is black, etc.), combines them with our own custom styles and, after resolving all the priority rules, creates what is called the CSSOM (CSS Object Model). This is a data structure as well as an API, containing the styles of all the elements and pseudo-elements in our page.

CSSOM representation
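To illustrate what "resolving all the priority rules" means, here's a toy cascade. It's a deliberately simplified sketch under stated assumptions: only single simple selectors (tag, `.class`, `#id`), no `!important`, no inheritance, and author styles always beating the browser's defaults. Among the remaining rules, higher specificity wins, and on a tie the later rule wins.

```javascript
// Toy cascade for one element. Simplifications (assumptions): single simple
// selectors only (tag, .class, #id), no !important, no inheritance.
function specificity(selector) {
  if (selector.startsWith("#")) return [1, 0, 0];
  if (selector.startsWith(".")) return [0, 1, 0];
  return [0, 0, 1];
}

function matches(selector, el) {
  if (selector.startsWith("#")) return el.id === selector.slice(1);
  if (selector.startsWith(".")) return el.classes.includes(selector.slice(1));
  return el.tag === selector;
}

function compareSpecificity(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function computeStyle(el, rules) {
  const originRank = { "user-agent": 0, author: 1 };
  const ordered = rules
    .map((rule, order) => ({ ...rule, order }))
    .filter((rule) => matches(rule.selector, el))
    // Lowest priority first, so later assignments override earlier ones.
    .sort(
      (a, b) =>
        originRank[a.origin] - originRank[b.origin] ||
        compareSpecificity(specificity(a.selector), specificity(b.selector)) ||
        a.order - b.order
    );
  const style = {};
  for (const rule of ordered) Object.assign(style, rule.declarations);
  return style;
}

const rules = [
  { origin: "user-agent", selector: "div", declarations: { display: "block", color: "black" } },
  { origin: "author", selector: ".box", declarations: { color: "red" } },
  { origin: "author", selector: "div", declarations: { color: "blue", padding: "8px" } },
];

console.log(computeStyle({ tag: "div", id: "main", classes: ["box"] }, rules));
// { display: 'block', color: 'red', padding: '8px' }
```

The `.box` rule wins over the later `div` rule because a class selector is more specific than a tag selector.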

Being an API as well as a data structure means we can access those styles from JavaScript via the getComputedStyle API.
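In a page, reading those resolved values might look like the sketch below. The `.title` selector is hypothetical, and the window is passed in as a parameter (defaulting to the global object, which is `window` in a browser) so the function can also be tried outside one.

```javascript
// Read a few resolved values for an element via getComputedStyle().
// `win` defaults to the global object, which is `window` in a browser.
function readStyles(element, properties, win = globalThis) {
  const computed = win.getComputedStyle(element);
  const result = {};
  for (const property of properties) {
    // getPropertyValue expects CSS property names, e.g. "font-size".
    result[property] = computed.getPropertyValue(property);
  }
  return result;
}

// In a browser (the ".title" selector is hypothetical):
// readStyles(document.querySelector(".title"), ["color", "display", "font-size"]);
```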

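Since JavaScript can read styles this way, the browser has to make sure those styles are correct before running a script. This is why, no matter how fast our external script arrives, it won't run until every piece of CSS before it has been downloaded (in the case of external sheets), parsed and transformed into CSSOM.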
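Here's that rule as a toy model, with made-up times in milliseconds: a classic script starts only once it has arrived, every stylesheet before it is ready, and the previous script has finished. Stylesheets that come after the script don't hold it back in this model.

```javascript
// Toy model: when does each classic (non-async, non-defer) script start?
// Resources are listed in document order; all times are hypothetical (ms).
function scriptStartTimes(resources) {
  let cssReady = 0; // when every stylesheet seen so far is in the CSSOM
  let lastScriptEnd = 0; // scripts run one after another
  const starts = {};
  for (const r of resources) {
    if (r.type === "css") {
      cssReady = Math.max(cssReady, r.readyAt);
    } else if (r.type === "script") {
      const start = Math.max(r.arrivesAt, cssReady, lastScriptEnd);
      starts[r.url] = start;
      lastScriptEnd = start + r.runFor;
    }
  }
  return starts;
}

console.log(
  scriptStartTimes([
    { type: "css", url: "big.css", readyAt: 3000 },
    { type: "script", url: "fast.js", arrivesAt: 100, runFor: 20 },
    { type: "css", url: "late.css", readyAt: 9000 },
  ])
); // { 'fast.js': 3000 }
```

Even though `fast.js` arrives after 100 ms, it waits for `big.css`; `late.css` comes after it, so it doesn't matter here.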

But, don't take my word for it! Here's a demo:

Waiting for CSS demo
  1. git clone https://github.com/iampava/a-browser-story
  2. npm install
  3. npm start
  4. Navigate to: localhost:8080/blocking-scripts

Ok, back to the styling flow. Now that we've figured out all the priorities, it's time to create the render tree. This is a 1-to-1 matching between the DOM and the CSSOM, but only for the visible elements. By visible I mean everything except elements with display: none (and their descendants), which are definitely not in the page. An element with visibility: hidden, on the other hand, still takes up space, so it stays in the render tree.
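A minimal sketch of that filtering step, assuming a plain-object DOM and a hypothetical `styleOf` function standing in for a lookup into the computed styles:

```javascript
// Toy render-tree construction: walk the DOM and keep only nodes that
// produce boxes. `styleOf` is a hypothetical lookup into computed styles.
function buildRenderTree(node, styleOf) {
  const style = styleOf(node);
  // display: none removes the element *and* its whole subtree.
  if (style.display === "none") return null;
  const children = node.children
    .map((child) => buildRenderTree(child, styleOf))
    .filter((child) => child !== null);
  return { tag: node.tag, style, children };
}

const dom = {
  tag: "body",
  children: [
    { tag: "p", children: [] },
    { tag: "div", className: "modal", children: [{ tag: "span", children: [] }] },
    { tag: "em", className: "ghost", children: [] },
  ],
};

// Hypothetical styles: .modal uses display: none,
// .ghost uses visibility: hidden (which keeps its box).
const styleOf = (node) =>
  node.className === "modal" ? { display: "none" }
  : node.className === "ghost" ? { display: "inline", visibility: "hidden" }
  : { display: "block" };

const tree = buildRenderTree(dom, styleOf);
console.log(tree.children.map((child) => child.tag).join(", ")); // p, em
```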

Next is layout. This is where the browser calculates the size and position of all elements. There's also an API allowing us to access these results: getBoundingClientRect().

Layout representation
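To get a feel for what layout computes, here's a toy block layout: every block is as wide as its container and sits below the previous one. It's a big simplification (for one, real CSS collapses adjacent vertical margins), but it produces the same kind of numbers that getBoundingClientRect() exposes, and it shows why changing one element's size or margin can move everything after it.

```javascript
// Toy block layout: each block child is as wide as its container and sits
// below the previous one. Vertical margins are simply added here; real CSS
// would collapse adjacent margins. Sizes are in CSS pixels.
function layoutBlocks(children, containerWidth) {
  let y = 0;
  return children.map((child) => {
    const margin = child.margin ?? 0;
    y += margin;
    const box = { tag: child.tag, x: 0, y, width: containerWidth, height: child.height };
    y += child.height + margin;
    return box;
  });
}

const boxes = layoutBlocks(
  [
    { tag: "h1", height: 40, margin: 10 },
    { tag: "p", height: 100 },
  ],
  800
);
console.log(boxes.map((b) => `${b.tag} at y=${b.y}`).join(", "));
// h1 at y=10, p at y=60
```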

Only 2 steps left now. Second to last is paint, where the browser paints each element onto the layer it's part of.

Wait, what! Layers?

Yes. Elements can overlap one another, so the browser may paint some of them onto separate layers instead of a single surface.

Painting on different layers

Finally, those layers are combined in the right order to compose the final page.

Compositing the different layers
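Why does the order matter? A toy compositor makes it visible. Each layer here is just a map of painted pixels, and layers are drawn from the lowest z-index to the highest, so the top one wins where they overlap. Real compositors also blend semi-transparent pixels and typically run on the GPU; this sketch only models opaque overwrites.

```javascript
// Toy compositor: each layer holds the pixels painted for it
// ("x,y" -> color). Compositing draws layers from the lowest z-index to
// the highest, so where layers overlap the top one wins.
function composite(layers) {
  const frame = new Map();
  const bottomToTop = [...layers].sort((a, b) => a.zIndex - b.zIndex);
  for (const layer of bottomToTop) {
    for (const [pixel, color] of Object.entries(layer.pixels)) {
      frame.set(pixel, color);
    }
  }
  return frame;
}

const frame = composite([
  { name: "popup", zIndex: 2, pixels: { "1,1": "red" } },
  { name: "page", zIndex: 0, pixels: { "0,0": "white", "1,1": "white" } },
]);
console.log(frame.get("0,0"), frame.get("1,1")); // white red
```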

# Fonts: why u not simple?

There are a lot of easy things in life...
...none of which is called "font loading"!

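Knowing how external fonts are loaded goes a long way toward optimizing our apps. When text uses a font that hasn't been downloaded yet, the browser delays painting that text until either the font arrives or the browser gives up waiting (after about 3 seconds), whichever comes first.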

Font loading demo
  1. git clone https://github.com/iampava/a-browser-story
  2. npm install
  3. npm start
  4. Navigate to: localhost:8080/font-loading

Have a look at the demo above. On the left side, the text shows up after 3 seconds, the time it takes for the font to arrive. On the right, however, the font takes even longer, so the browser gives up waiting after about 3 seconds. You can imagine it going something like:

Hey dude, it's taking too long! Better show some text to this user of yours!

The text will then be shown in what is called a fallback font, one already available on the device. But, and here's the interesting part, when the font does finally arrive, the browser swaps it in for the fallback one, creating a rather unpleasant experience if you're in the middle of reading.

# Cool, but now what?

Good question. Knowing this stuff definitely helps you build more performant apps, but only if you're working on an app that needs performance. I know I wasn't in this situation at my first workplace so this article wouldn't have helped much...

Still, you can maybe benefit from it by being cool AF at your next interview. We all know the classic CSS question:

How do you center an element horizontally as well as vertically?

To which you can now respond...

To hell with this! How about YOU tell me how CSS works behind the scenes, cool guy! 😎

...

But maybe you don't want to try that; it's your job after all, and I don't blame you. Thankfully, there are also some pragmatic takeaways here:

In terms of HTML, it means that if you have a truly gigantic document, you might want to move your most critical assets (CSS, JS, fonts) towards the top, so they'll be part of the first round trip. This way, they will get picked up by the preload scanner and downloaded as soon as possible.
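If you want to check this on your own pages, here's a small helper that lists the stylesheets and scripts referenced within the first part of the HTML. The 14 KB default is an assumed rule of thumb for "what fits in the first round trip", not an exact figure, and the regex approach is only a sketch.

```javascript
// Which assets (<link href> / <script src>) are referenced within the first
// `chunkBytes` of the HTML? 14 KB is an assumed rule of thumb for "the first
// round trip", not an exact figure. Assumes ASCII-only markup, so string
// length equals byte count.
function assetsInFirstChunk(html, chunkBytes = 14 * 1024) {
  const firstChunk = html.slice(0, chunkBytes);
  const found = [];
  const re = /<(?:link|script)\b[^>]*?\b(?:href|src)="([^"]+)"[^>]*>/gi;
  let match;
  while ((match = re.exec(firstChunk)) !== null) found.push(match[1]);
  return found;
}

const html =
  '<head><link rel="stylesheet" href="critical.css"></head><body>' +
  "<p>lots of content</p>".repeat(1000) +
  '<script src="late.js"></script></body>';

console.log(assetsInFirstChunk(html)); // [ 'critical.css' ]
```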

Knowing the styling flow will help you create extremely performant CSS animations. Maybe you've heard advice like:

Never animate width or height!

OR

Only animate transform and opacity!

If you have, did you ever wonder what it's based on?

Changing CSS properties on the fly means redoing some of the steps discussed before. Some properties require the browser to start all the way from layout, then paint and composite. An example of this is the margin property: modifying it can mean recalculating the positions of many other elements. Expensive!

Other properties, like color, only need to re-run paint and composite because they don't affect the layout. But some are even cheaper, requiring just the composite step. In Chrome and Firefox these are, you guessed it, transform and opacity! Here's a useful website listing every CSS property and the steps it triggers in each browser: CSS Triggers
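The idea fits in a tiny lookup: find the first step a property affects and re-run everything from there. The mapping below only covers the examples from this article (plus background-color as another paint-only property); real engines differ per property and browser, and unknown properties are conservatively assumed to need the whole pipeline.

```javascript
// Which rendering steps does changing a property trigger? The mapping
// covers only a few example properties; real browsers differ per property
// and engine. Unknown properties are assumed to need the whole pipeline.
const PIPELINE = ["layout", "paint", "composite"];

const FIRST_STEP = {
  width: "layout",
  height: "layout",
  margin: "layout",
  color: "paint",
  "background-color": "paint",
  transform: "composite",
  opacity: "composite",
};

function stepsToRerun(property) {
  const first = FIRST_STEP[property] ?? "layout";
  return PIPELINE.slice(PIPELINE.indexOf(first));
}

for (const property of ["margin", "color", "opacity"]) {
  console.log(property, "->", stepsToRerun(property).join(" + "));
}
// margin -> layout + paint + composite
// color -> paint + composite
// opacity -> composite
```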

And finally, knowing how fonts and text work means we can make sure our users have the best reading experience possible. Step one is the font-display CSS property, which gives us a little control over the wait times. Although its support is not 100%, it's polyfillable! ❤

font-display CanIUse table
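To see what the different font-display values change, here's a toy model built on the block and swap periods the CSS Fonts spec recommends (roughly 3 s for a "short" period and 100 ms for an "extremely small" one). Actual browser timings may differ, and `auto` is left out because its behaviour is up to the browser.

```javascript
// Toy model of font-display, using the block/swap periods the CSS Fonts
// spec recommends; actual browser timings may differ. Times are in ms.
const PERIODS = {
  block: { block: 3000, swap: Infinity },
  swap: { block: 100, swap: Infinity },
  fallback: { block: 100, swap: 3000 },
  optional: { block: 100, swap: 0 },
};

// What does the text look like `t` ms after the font started loading,
// if the font file arrives at `fontArrivesAt` ms?
function textAt(t, fontArrivesAt, display) {
  const { block, swap } = PERIODS[display];
  // The web font is used only if it arrives before the swap period ends.
  if (fontArrivesAt < block + swap && t >= fontArrivesAt) return "web font";
  // During the block period the text is hidden; afterwards the fallback shows.
  if (t < block) return "invisible";
  return "fallback font";
}

for (const display of ["block", "swap", "fallback", "optional"]) {
  console.log(display, [0, 1000, 5000].map((t) => textAt(t, 4000, display)).join(" | "));
}
// block invisible | invisible | web font
// swap invisible | fallback font | web font
// fallback invisible | fallback font | fallback font
// optional invisible | fallback font | fallback font
```

With a font that takes 4 seconds, `swap` shows text almost immediately but still swaps later, while `fallback` and `optional` stick with the fallback font and avoid the late swap.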

Then there's also font preloading. By adding a <link> like this in <head>

```html
<link rel="preload" as="font" href="/dist/assets/fonts/comfortaa-v12-latin-regular.woff2" type="font/woff2" crossorigin="anonymous" />
```

we tell the browser:

Listen to me: I know what I'm doing so start downloading this font NOW!
hey there!

I am Pava, a front end developer, speaker and trainer located in Iasi, Romania. If you enjoyed this, maybe we can work together?