How a Browser Works: A Beginner-Friendly Guide to Browser Internals


Have you ever typed a URL into your browser, pressed Enter, and wondered: “What actually happens behind the scenes to show me this page?”

It might seem instant, but your browser goes through a series of steps to fetch, interpret, and display the website:

  1. Fetches resources like HTML, CSS, JS, and images from the server.

  2. Parses HTML into the DOM and CSS into the CSSOM.

  3. Combines DOM and CSSOM into a render tree, calculates layout, paints pixels, and finally displays the page.

Think of it like ordering food at a restaurant:

  • You place your order (type a URL).

  • The kitchen prepares each ingredient (fetching HTML, CSS, JS).

  • The chef assembles the dish (render tree, layout, paint).

  • The waiter serves it to you (pixels on the screen).

In this guide, we’ll explore each step in a visual, story-driven way, so you can understand how browsers turn code into the websites you see.


What Is a Browser?

Most people think a browser just “opens websites”. But behind the scenes, it’s a complex application made of multiple components working together to fetch, interpret, and display web pages.

Think of a browser as a restaurant kitchen:

  • You give it an order (type a URL).

  • It fetches ingredients (HTML, CSS, JS).

  • Prepares the dish (renders the page).

  • Serves it to you (pixels on the screen).


Main Parts of a Browser

At a high level, a browser has:

  • User Interface (UI):

    This is everything you see except the webpage itself. It includes the address bar, back/forward buttons, bookmarks menu, and the refresh button.

  • Browser Engine:

    Coordinates actions between the UI and the rendering engine. When you type a URL and hit Enter, the Browser Engine tells the other components to start their jobs.

  • Rendering Engine:

    The most critical part for developers. Its job is to display the content.

    • It parses HTML and CSS to create a visual representation on your screen.

    • Different browsers use different engines: Blink (Google Chrome and Microsoft Edge), WebKit (Apple Safari), and Gecko (Mozilla Firefox).

  • Networking:

    This component handles all the TCP/IP and HTTP/HTTPS communication. It is responsible for fetching a website's resources (HTML, CSS, JS) from a server over the internet and handing them to the Rendering Engine.

  • JavaScript Engine:

    Modern websites are interactive, so browsers include a dedicated engine to execute JavaScript code. Chrome uses the famous V8 engine, while Firefox uses SpiderMonkey.

  • UI Backend:

    This is used for drawing basic "widgets" like combo boxes and windows. It uses the operating system's (Windows, macOS, Linux) native methods to render these basic interface elements.

  • Data Storage:

    Browsers need to remember things locally so they don't have to ask the server every time. This layer manages:

    • Cookies

    • LocalStorage and SessionStorage.

    • IndexedDB (a small database in your browser).

    • Cache (storing images and files so pages load faster the second time).


User Interface

The UI is what you see and interact with:

  • Address bar → Enter URLs

  • Tabs → Multiple pages

  • Buttons → Back, forward, reload

Think of it as the front desk of a restaurant, taking your orders.


Browser Engine vs Rendering Engine

  • Browser Engine: The coordinator — connects the UI to the rendering engine.

  • Rendering Engine: The chef — takes HTML/CSS and produces the visual page.

Example:

  • You click “reload” → Browser engine tells rendering engine to rebuild the page.

Networking: Fetching Resources

When you type a URL:

  • Browser checks cache.

  • Browser sends an HTTP request to the server.

  • Server responds with HTML, CSS, JS, images, etc.

  • Browser starts parsing the HTML immediately, even before all resources are fully loaded.

Analogy: The kitchen starts chopping vegetables while the meat is still being delivered.
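
The "check the cache, then ask the server" steps above can be sketched in a few lines of Python. This is only an illustration: fetch_with_cache and fake_server are made-up names, the "server" is a plain function rather than a real HTTP request, and a real browser cache also respects expiry rules sent by the server, which this sketch ignores.

```python
from typing import Callable, Dict, List


def fetch_with_cache(
    url: str,
    cache: Dict[str, str],
    fetch_from_server: Callable[[str], str],
) -> str:
    """Return the body for url, asking the server only on a cache miss."""
    if url in cache:
        return cache[url]          # cache hit: no network round trip
    body = fetch_from_server(url)  # cache miss: go to the server
    cache[url] = body              # remember the response for next time
    return body


# Hypothetical stand-in for a real HTTP request; it records each URL it is asked for.
requests_made: List[str] = []


def fake_server(url: str) -> str:
    requests_made.append(url)
    return "<html><body>Hello</body></html>"


cache: Dict[str, str] = {}
first = fetch_with_cache("https://example.com/", cache, fake_server)
second = fetch_with_cache("https://example.com/", cache, fake_server)
print(len(requests_made))  # 1: the second call was served from the cache
```

The second call never reaches fake_server, which is why a page you have already visited can load faster.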


HTML Parsing and DOM Creation

When the Rendering Engine receives a chunk of HTML from the network, it doesn't just display it. It has to translate that "string of text" into a structured map that the browser can actually manipulate. This process is called Parsing, and the resulting map is the DOM (Document Object Model).

Step 1: HTML Parsing

  • Browser reads HTML top-to-bottom

  • Breaks it into tokens (tags, text, attributes)
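
To make "tokens" concrete, here is a small sketch using HTMLParser from Python's standard library. It is not how a browser's tokenizer is written; it only shows the kind of output this step produces. TokenLogger is a name chosen for this example.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Collect start tags, end tags, and text as a flat list of tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self.tokens.append(("text", data.strip()))


logger = TokenLogger()
logger.feed('<p class="intro">Hello <b>world</b></p>')
logger.close()
for token in logger.tokens:
    print(token)
```

At this point the tokens are still a flat sequence; nothing says yet that the b element lives inside the p element. That is the job of the next step.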

Step 2: DOM Creation

  • Tokens are converted into a tree structure

  • This structure is called the DOM (Document Object Model)

What is the DOM?

The DOM is a tree representation of the HTML document.

Analogy: Family Tree

  • <html> is the root

  • <body> is a child

  • Elements become parents, children, and siblings
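
The family-tree idea can be sketched by keeping track of the "current parent" while reading tags. The Node and TreeBuilder classes below are illustrative names, not browser APIs, and the sketch assumes well-formed HTML where every tag is closed. Real parsers also recover from errors and handle elements such as br that have no closing tag.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a tree of Node objects from start and end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # the element we are currently inside

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node       # step down into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent  # step back up to the parent


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<html><body><h1>Title</h1><p>Text</p></body></html>")
builder.close()
print("\n".join(outline(builder.root)))
```

In the printed outline, h1 and p appear at the same indentation under body: they are siblings, and body is their parent.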

The DOM allows the browser and JavaScript to:

  • Traverse elements

  • Modify structure

  • Apply styles


CSS Parsing and CSSOM Creation

CSS is parsed separately.

CSS Parsing

  • Browser reads selectors and rules

  • Determines which styles apply to which elements

CSSOM (CSS Object Model)

CSSOM is a tree structure representing all CSS rules.
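
As a rough illustration of turning CSS text into structured rules, here is a deliberately naive parser. It assumes plain "selector { property: value; }" rules with no comments, nesting, or at-rules, and parse_css is a name invented for this sketch. A real CSSOM is much richer, but the idea of going from text to structure is the same.

```python
def parse_css(css: str):
    """Parse simple CSS text into a list of (selector, declarations) pairs."""
    rules = []
    for block in css.split("}"):
        if "{" not in block:
            continue  # trailing text after the last rule
        selector, body = block.split("{", 1)
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules.append((selector.strip(), declarations))
    return rules


css = "p { color: red; margin: 8px } .hidden { display: none; }"
rules = parse_css(css)
for selector, declarations in rules:
    print(selector, declarations)
```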

Why CSS blocks rendering

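The browser holds off rendering until the CSS is parsed because:

  • Styles determine each element's size, position, and visibility.

  • Painting without them would show unstyled content that then jumps once the styles arrive.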

Analogy: Dress Rehearsal

You don’t position actors on stage before knowing:

  • Their costumes.

  • Their sizes.


DOM + CSSOM = Render Tree

Once both trees are ready:

  • DOM provides structure

  • CSSOM provides styling

They combine to form the Render Tree.


Render Tree characteristics:

  • Contains only visible elements

  • Excludes elements hidden with display: none

  • Includes computed styles

This tree represents what will actually appear on the screen.
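
Here is a minimal sketch of that combination step. It makes two big simplifications that are assumptions of this example: the DOM is a plain nested dictionary, and styles are matched by tag name only (real browsers match full selectors and resolve conflicts between rules). build_render_tree is a hypothetical helper name.

```python
def build_render_tree(node, rules):
    """Return a styled copy of node, or None if the node is hidden."""
    style = dict(rules.get(node["tag"], {}))  # match by tag name only
    if style.get("display") == "none":
        return None  # hidden: this node and everything inside it is skipped
    children = []
    for child in node.get("children", []):
        rendered = build_render_tree(child, rules)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node["tag"], "style": style, "children": children}


dom = {"tag": "body", "children": [
    {"tag": "h1", "children": []},
    {"tag": "aside", "children": [{"tag": "p", "children": []}]},
    {"tag": "p", "children": []},
]}
styles = {"h1": {"color": "navy"}, "aside": {"display": "none"}}

tree = build_render_tree(dom, styles)
print([child["tag"] for child in tree["children"]])  # ['h1', 'p']
```

The aside element is still in the DOM, so JavaScript could find it, but it never makes it into the render tree, and neither does the p inside it.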


Layout (Reflow), Painting, and Display

Now the browser calculates:

  • Width

  • Height

  • Position

This step is called Layout or Reflow.

  1. Render Tree: Combines DOM + CSSOM, ignoring hidden elements.

  2. Layout (Reflow): Calculates the exact position and size of each element.

  3. Paint: Fills pixels for text, colors, borders, and images.

  4. Display: Browser shows the final result on the screen.

Analogy:

  • Layout → Arranging tables in the restaurant

  • Paint → Plating the food

  • Display → Serving it to the customer
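
To get a feel for what "calculating position and size" means, here is a toy layout pass. It assumes every element is a block that fills the viewport width and that each height is already known, which is not true in a real browser, where heights come from content, fonts, margins, and much more. layout_blocks is a name made up for this sketch.

```python
def layout_blocks(blocks, viewport_width):
    """Stack blocks vertically; each block fills the viewport width."""
    boxes = []
    y = 0
    for name, height in blocks:
        boxes.append({
            "name": name,
            "x": 0,
            "y": y,
            "width": viewport_width,
            "height": height,
        })
        y += height  # the next block starts where this one ends
    return boxes


boxes = layout_blocks(
    [("header", 60), ("article", 400), ("footer", 40)],
    viewport_width=800,
)
for box in boxes:
    print(box)
```

Notice that changing the height of the header would shift every box below it. That knock-on effect is why layout changes can be expensive: one change can force the browser to recalculate many elements.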


Very Basic Idea of Parsing (Simple Example)

Parsing means breaking input into meaningful structure.

Example:

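2 + 3 * 4

A parser turns this expression into a tree:

   +
  / \
 2   *
    / \
   3   4

Multiplication binds tighter than addition, so 3 * 4 sits lower in the tree and is evaluated first. This is how parsing converts a linear string into a structured tree, just like the DOM.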

Same idea:

  • HTML → DOM tree

  • CSS → CSSOM tree

Parsing is how browsers understand meaning, not just text.
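
For the curious, the arithmetic example can be turned into a tiny working parser. This sketch only understands whole numbers, + and *, and the function names are chosen for illustration. It is not how a browser parses HTML, but it shows the same text-to-tree idea.

```python
import re


def tokenize(text):
    """Split the input into number and operator tokens."""
    return re.findall(r"\d+|[+*]", text)


def parse_expression(tokens):
    # expression := term ('+' term)*
    node = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        node = ("+", node, parse_term(tokens))
    return node


def parse_term(tokens):
    # term := number ('*' number)*
    node = int(tokens.pop(0))
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        node = ("*", node, int(tokens.pop(0)))
    return node


def evaluate(node):
    """Walk the tree and compute its value."""
    if isinstance(node, int):
        return node
    op, left, right = node
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)


tree = parse_expression(tokenize("2 + 3 * 4"))
print(tree)            # ('+', 2, ('*', 3, 4))
print(evaluate(tree))  # 14
```

The nested tuple is the same tree as the diagram: + at the top, with 2 on the left and the 3 * 4 subtree on the right.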


Full Browser Flow: From URL to Pixels

  1. User enters URL

  2. Network request is made

  3. HTML → DOM

  4. CSS → CSSOM

  5. DOM + CSSOM → Render Tree

  6. Layout (Reflow)

  7. Paint

  8. Display

For a simple page on a fast connection, this entire process can finish in a fraction of a second.


Conclusion

A browser turns code into visuals through a clear flow: request → parse → build → render → display.

You don’t need to memorize every internal part — just understand the journey from URL to pixels. Once you see that flow, how the web works becomes much easier to grasp.