What happens when you type holbertonschool.com

Miguel Pacheco
7 min read · Sep 11, 2021


Hello and welcome to another blog post. This time I want to talk about what happens when you type https://holbertonschool.com/ in your browser and press Enter.

We use the Internet every single day: for our daily tasks, for fun, and for all kinds of other things. So much so that most of the time we are content with just knowing that our browser works and does what it is asked. But it’s not magic (or is it?), and the web pages we see on that rectangular machine must come from somewhere. So how does it all happen? What goes on behind the scenes between the moment we enter a URL (Uniform Resource Locator) in the address bar and the moment we receive the content of the desired page?

The Client-Server model

Before diving into the details of the web infrastructure, it’s important to understand the Client-Server model.

Client-Server Model | Image from TechTerms.com

On the World Wide Web (WWW), the network is organized between clients, which request data, and servers, which store the data and handle most of its processing. For example, a browser is a client, and a server is the computer program serving data to that client. “Server” is also the term for the physical machine on which the server program runs. Each website, application, or service can have multiple servers working behind the scenes to do the work its clients need. In general, physical servers are grouped together in data centers.

There are many other layers between the client and the server in a client-server model, so let’s break it down.

The DNS request

When we type the URL https://www.holbertonschool.com (or any other URL) into our browser (Chrome, Edge, Firefox, Safari, etc.) and press ‘Enter’, the first thing the browser does is break the URL down into pieces. The browser considers the www.holbertonschool.com part first, which is a domain name. If the browser doesn’t know that domain name (it’s not stored in its cache), it asks the Domain Name System (DNS) for the IP address corresponding to this particular domain name. This IP (Internet Protocol) address identifies the machine on the network that answers for www.holbertonschool.com and gives access to the website (the data, the text files, the code, the services…). An IPv4 address is a series of four numbers ranging from 0 to 255, separated by dots (for example, 54.172.4.191). The reason we have domain names in the first place is that humans remember words better than numbers. Thankfully, the DNS is here to remember the IP of each domain for us.
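To make the "break the URL down into pieces" step more concrete, here is a minimal sketch using Python’s standard `urllib.parse` module. The helper name `split_url` and the default-port logic are illustrative assumptions; a real browser does considerably more than this.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Break a URL into the pieces discussed above (illustrative helper)."""
    parts = urlsplit(url)
    # Assumption: fall back to the conventional ports when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "scheme": parts.scheme,           # the protocol, e.g. "https"
        "host": parts.hostname,           # the domain name handed to DNS
        "port": parts.port or default_port,
        "path": parts.path or "/",        # the resource requested from the server
    }


pieces = split_url("https://www.holbertonschool.com")
print(pieces)
```

The `host` value is the part that gets sent to the DNS, while the `scheme` decides which protocol and port are used afterwards.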

The DNS request first goes to the resolver. The resolver is usually run by our Internet Service Provider, and if it doesn’t find the IP in its cache, it queries a root server. The root server knows where the TLD (Top-Level Domain) servers are. In our case, the top-level domain is .com. Other TLDs are .net, .fr, etc. The TLD server generally doesn’t hold the IP itself; instead, it points the resolver to the authoritative name servers for the domain name. Usually, there is more than one name server attached to a domain name, and any of them can give the IP for the domain names it is responsible for. Now the resolver has the IP address (for example, 54.172.4.191) and can send it back to the browser, which will send its request to the corresponding server.
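The lookup chain described above (cache, root, TLD, authoritative name server) can be sketched as a toy simulation. Everything here is hypothetical: the dictionaries stand in for real DNS servers, the server names are invented, and no network traffic is involved.

```python
# Toy stand-ins for real DNS servers (all names are hypothetical).
ROOT = {"com": "tld-com"}                                   # root: TLD -> TLD server
TLD = {"tld-com": {"holbertonschool.com": "ns-holberton"}}  # TLD server: domain -> name server
AUTHORITATIVE = {"ns-holberton": {"www.holbertonschool.com": "54.172.4.191"}}

cache = {}  # the resolver's cache


def resolve(name: str) -> str:
    """Walk root -> TLD -> authoritative server, caching the answer."""
    if name in cache:                       # 1. answer straight from the cache
        return cache[name]
    labels = name.split(".")
    tld_server = ROOT[labels[-1]]           # 2. root points to the .com servers
    domain = ".".join(labels[-2:])
    name_server = TLD[tld_server][domain]   # 3. TLD points to the name server
    ip = AUTHORITATIVE[name_server][name]   # 4. name server gives the IP
    cache[name] = ip
    return ip


print(resolve("www.holbertonschool.com"))
```

A second call for the same name returns immediately from `cache`, which is why repeat visits to a site skip most of these steps.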

Simple diagram showing the web stack without the DNS part

Protocols: TCP/IP

As we mentioned earlier, a domain name actually maps to an IP address, but IP is not the only protocol used by the Internet. The Internet Protocol Suite is often referred to as TCP/IP (TCP stands for Transmission Control Protocol and IP stands for Internet Protocol), and it also contains other protocols. It’s a set of rules that define how servers and clients interact over the network, and how data should be transferred, broken into packets, etc.
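As a rough illustration of what "broken into packets" means, the sketch below splits a message into numbered chunks, shuffles them to mimic out-of-order arrival, and puts them back together. This is a simplification and not real TCP, which also handles acknowledgments, retransmission, and more; the function names are made up for the example.

```python
import random


def to_packets(data: bytes, size: int) -> list:
    """Split data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(packets: list) -> bytes:
    """Sort packets by sequence number and join the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"GET / HTTP/1.1 is on its way to the server"
packets = to_packets(message, size=8)
random.shuffle(packets)          # packets may arrive in any order
restored = reassemble(packets)
print(restored == message)
```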

The Firewall

To protect themselves from attacks and hackers, servers are often equipped with a firewall. A firewall is a piece of software or hardware that enforces rules about what can and can’t enter or leave a part of a network. In our example, when the browser asks for the website at the address 54.172.4.191, that request has to pass through a firewall, which decides whether it’s safe or a threat to the server’s security.
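Here is a minimal sketch of how such rules might be evaluated, using Python’s standard `ipaddress` module. The rule list, the addresses (taken from ranges reserved for documentation), and the "first match wins, deny by default" policy are assumptions made for the example, not a description of any particular firewall.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules: (action, source network, destination port or None for any).
RULES = [
    ("deny", ip_network("203.0.113.0/24"), None),  # block a suspicious range
    ("allow", ip_network("0.0.0.0/0"), 443),       # HTTPS from anywhere
    ("allow", ip_network("0.0.0.0/0"), 80),        # HTTP from anywhere
]


def decide(source_ip: str, dest_port: int) -> str:
    """Return the action of the first matching rule, denying by default."""
    source = ip_address(source_ip)
    for action, network, port in RULES:
        if source in network and (port is None or port == dest_port):
            return action
    return "deny"


print(decide("198.51.100.7", 443))  # ordinary visitor asking for HTTPS
print(decide("203.0.113.9", 443))   # visitor from the blocked range
print(decide("198.51.100.7", 22))   # port that no rule allows
```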

Security : HTTPS/SSL

Now that the browser has the IP address, it’s time to take care of the other part of the URL, the https:// prefix. HTTPS stands for HyperText Transfer Protocol Secure, and is the secure version of regular HTTP (HyperText Transfer Protocol). This transfer protocol defines the types of requests and responses exchanged between clients and servers over the network. In other words, it’s the main way to transfer data between a browser and a website. HTTP and HTTPS request methods include GET, POST, PUT, and others. HTTPS requests and responses are encrypted in transit, which assures users that their data can’t be read or altered by third parties along the way. For example, if we enter our credit card information on a website that uses HTTPS, that information can’t be read in plain text by someone listening on the network between us and the server. Another key component in securing websites is the SSL certificate. SSL stands for Secure Sockets Layer; its modern successor is TLS (Transport Layer Security), although the certificates are still commonly called SSL certificates. The certificate needs to be issued by a trusted Certificate Authority. When a website has this certificate, we’re able to see a little lock icon next to the website name, and on some browsers (mostly older versions) and with certain types of SSL certificates, the bar turns green.
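To see the certificate check from the client side, here is a small sketch using Python’s standard `ssl` and `socket` modules. The helper `describe_connection` is a made-up name, and it is only defined here, not called, since running it needs network access.

```python
import socket
import ssl

# The default context verifies the server certificate against trusted
# Certificate Authorities and checks that it matches the host name.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)


def describe_connection(host: str, port: int = 443):
    """Open a TLS connection and return (protocol version, certificate info).

    Illustrative helper: raises an ssl.SSLError subclass if the certificate
    is not trusted or does not match `host`.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.getpeercert()
```

With a working connection, calling `describe_connection("www.holbertonschool.com")` would return the negotiated protocol version together with details of the certificate the server presented.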

How a website with an HTTPS/SSL certificate is displayed in a browser

Load Balancer

As we mentioned earlier, websites live on servers. For most websites with significant traffic, it would be impossible to host everything on a single server. Plus, it would create a Single Point of Failure (SPOF), because a single failure of, or attack on, that server would be enough to take the whole site down.

As the need for higher availability and security grew, websites started increasing the number of servers they have, organizing them in clusters (a cluster is a set of computers that work together so that they can be viewed as a single system), and using load balancers. A load balancer is a software program that distributes network requests between several servers, following a load-balancing algorithm. HAProxy is a very well-known load balancer. Examples of algorithms we can use are round-robin, which distributes requests evenly by cycling through the servers one after another, and least-connection, which sends each request to the server with the fewest active connections.
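The two algorithms mentioned above can be sketched in a few lines of Python. The server names and connection counts are hypothetical, and this is only an illustration of the idea, not how HAProxy implements it.

```python
from itertools import cycle

SERVERS = ["web-01", "web-02", "web-03"]  # hypothetical server names

_rotation = cycle(SERVERS)


def pick_round_robin() -> str:
    """Return the next server in turn, wrapping around at the end."""
    return next(_rotation)


def pick_least_connection(active_connections: dict) -> str:
    """Return the server that currently has the fewest active connections."""
    return min(active_connections, key=active_connections.get)


first_four = [pick_round_robin() for _ in range(4)]
print(first_four)  # the fourth request goes back to the first server

busiest_excluded = pick_least_connection({"web-01": 12, "web-02": 3, "web-03": 7})
print(busiest_excluded)
```

Round-robin needs no knowledge of the servers’ state, while least-connection needs the load balancer to keep track of how many connections each server is handling.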

The Web Server

Once the requests have been distributed among the servers, each one is processed by a web server. A web server is a software program that serves static content, like simple HTML pages, images, or plain text files. Examples of web servers are Nginx and Apache. The web server is responsible for finding where the static content corresponding to the requested address lives, and for serving it as an HTTP or HTTPS response.
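The core job of a web server, mapping a requested path to a piece of static content, can be sketched as follows. The in-memory `DOCUMENT_ROOT` dictionary is a stand-in for files on disk, and the function name is invented for the example; Nginx and Apache are of course far more capable.

```python
import posixpath

# Hypothetical "files" standing in for a directory of static content.
DOCUMENT_ROOT = {
    "/index.html": b"<h1>Welcome</h1>",
    "/about.html": b"<h1>About</h1>",
}


def serve_static(url_path: str):
    """Return (status code, body) for the requested path."""
    path = posixpath.normpath(url_path)   # collapse things like "/a/../b"
    if path in ("/", "."):
        path = "/index.html"              # assumption: "/" means the index page
    body = DOCUMENT_ROOT.get(path)
    if body is None:
        return 404, b"Not Found"
    return 200, body


print(serve_static("/"))
print(serve_static("/missing.html"))
```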

The Application server

A web server is the foundation of any website. But most sites don’t want just static pages where no interaction happens; most websites are dynamic. That means it’s possible to interact with the site, save information in it, log in with a user name and a password, etc.

This is made possible by one or more application servers. These are software programs responsible for running applications, communicating with databases, and managing user information, among other things. They work behind the web servers and are able to serve a dynamic application alongside the static content from the web server.
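As a minimal sketch of dynamic content, here is a tiny application written against WSGI, the standard Python interface between web servers and applications. The greeting logic and the `name` query parameter are assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qs


def application(environ, start_response):
    """Build a page that depends on the request instead of a fixed file."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]
    # Escape user input before putting it into HTML.
    body = f"<h1>Hello, {escape(name)}!</h1>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library’s `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` would serve it on port 8000; in production, a dedicated application server would usually sit behind a web server such as Nginx.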

The Database

The last step in our web infrastructure is the Database Management System (DBMS). A database is a collection of data; broadly speaking, almost anything that can store data, from a piece of paper to a piece of software, could be considered a database. The DBMS is the program that interacts with the database to retrieve, add, and modify data in it.

There are several types of database models. The two main ones are relational and non-relational databases. A relational database can be seen as a collection of tables representing objects, where each column is an attribute and each row is an instance of that object. We can run Structured Query Language (SQL) queries on those databases. MySQL and PostgreSQL are two of the most popular relational database systems. A non-relational database can take many forms, as the data inserted in it doesn’t have to follow a particular schema. These are also called NoSQL databases. MongoDB is one of the most popular NoSQL databases.
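To illustrate the difference, the sketch below stores the same record in a relational table and as a schema-free document. It uses SQLite (bundled with Python) as a stand-in for MySQL or PostgreSQL, and a plain dictionary as a stand-in for a document in a NoSQL database; the table, columns, and data are invented for the example.

```python
import sqlite3

# Relational: a table where columns are attributes and rows are instances.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, track TEXT)")
conn.executemany(
    "INSERT INTO students (name, track) VALUES (?, ?)",
    [("Ada", "front-end"), ("Linus", "back-end")],
)
rows = conn.execute(
    "SELECT name FROM students WHERE track = ?", ("front-end",)
).fetchall()
print(rows)

# Non-relational: a document that doesn't have to follow a fixed schema.
document = {"name": "Ada", "track": "front-end", "skills": ["HTML", "CSS"]}
print(document["skills"])
```

Adding the `skills` list to the relational version would require changing the table definition or adding a second table, while the document simply carries the extra field.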

As we can see, a web stack has many layers, and we have only scratched the surface. When we type a URL in a browser, it usually takes somewhere between a fraction of a second and a few seconds for all the agents we talked about to form a response and serve it to the client. Even knowing what is happening behind the scenes, it is still pretty magical to see it happen before our eyes.

I wish you a very nice day, and see you next time!

Happy Coding!


Miguel Pacheco

Student at Holberton School | Aspiring to be a Front-End Web developer