What happens when you type holbertonschool.com in your browser and press Enter.
Nowadays all the information we need is one click away in our browsers, but have you ever wondered what happens behind the scenes? If you want to know, this is the right post for you.
First of all, let’s define what the internet is.
We can define the internet as a sophisticated set of applications built on top of a complex network of interconnections, serving millions of users without interruption using the TCP/IP protocols.
TCP (Transmission Control Protocol) is a communications protocol that provides reliable end-to-end transmission of data. The sender waits for the receiver to acknowledge what it has received and retransmits anything that goes missing, so no information is lost along the way.
IP (Internet Protocol) defines a numeric standard for identifying devices on the network by assigning each one a unique numeric address.
So, as we can see, the internet is based on the reliable transmission of data and an efficient way to identify each member of the network.
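To make the idea of a numeric address concrete, here is a small sketch using Python's standard ipaddress module. The address 192.0.2.1 is only a placeholder taken from a range reserved for documentation, and the helper name describe is made up for this example.

```python
import ipaddress


def describe(address: str) -> dict:
    """Show that a dotted IPv4 address is just a readable form of one number."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,   # 4 for IPv4, 6 for IPv6
        "as_integer": int(ip),   # the same address as a single integer
    }


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address, not a real server.
    print(describe("192.0.2.1"))
```

The four dotted parts are only a human-friendly way to write one 32-bit number, which is what the network actually works with.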
The journey starts with a user sitting at their desk, typing www.holbertonschool.com into the browser and pressing Enter. The text written in the address bar is called the URL (Uniform Resource Locator), and it contains several pieces of information, represented in the following image:
- Protocol scheme: this section of the URL tells which protocol is used to transfer the information. HTTPS stands for Hypertext Transfer Protocol Secure, but there are also other protocols such as HTTP, FTP and more. The most widely used is HTTPS, an encrypted version of HTTP that protects the information between the browser and the server using SSL (today its successor, TLS), which relies on a pair of public and private keys to set up the encryption.
- Domain: this section is the name of our site. The name is unique across the entire web and points to our server. The domain maps to an IP (Internet Protocol) address, which allows us to send the request to our server.
- Subdomain: this section tells which part of the domain we are accessing. For example, a domain can contain several services such as mail.holbertonschool.com, doc.holbertonschool.com, etc., so the subdomain is used to denote several instances within a domain.
- TLD (top-level domain): this is the most generic level of the hierarchical DNS system of the internet, such as .com. The top-level domains are coordinated by the Internet Corporation for Assigned Names and Numbers (ICANN), which operates the Internet Assigned Numbers Authority (IANA).
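The parts listed above can be pulled out of a URL with Python's standard urllib.parse module. This is only a sketch: the function name split_url is invented here, and it assumes a simple one-label TLD such as .com, so it would give wrong results for names like example.co.uk.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into scheme, subdomain, domain and TLD.

    Assumption: the TLD is a single label (".com"), so the domain is
    always the last two labels of the host name.
    """
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,
        "subdomain": ".".join(labels[:-2]),  # empty string if there is none
        "domain": ".".join(labels[-2:]),
        "tld": labels[-1],
    }


if __name__ == "__main__":
    print(split_url("https://www.holbertonschool.com"))
```

For https://www.holbertonschool.com this gives the scheme https, the subdomain www, the domain holbertonschool.com and the TLD com.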
Now that we know how a URL is composed, the next step is to get the IP address of our server so we can send the request. This is done with a DNS (Domain Name System) query. DNS works like a big directory that maps each domain name to an IP address. Its main purpose is to make sites easier to remember, because IP addresses are difficult for us to handle.
The lookup works in a hierarchical way. The first place searched is the cache of the computer. If nothing is found there, the request is sent over the internet to a DNS server called the resolver, usually run by the internet provider. If the resolver has no answer in its own cache, it asks a root server, which points it to the server for the TLD (for example .com). The TLD server in turn points to the authoritative name server for the domain, which is the final authority on its records. If the record exists, the IP is sent back to the resolver and then to the user’s computer. If not, the resolver returns an NXDOMAIN response, meaning that the domain name doesn’t exist (this is a DNS answer, not an HTTP 404, which can only come from a web server).
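The hierarchical lookup can be sketched with a few dictionaries standing in for the real servers. Everything here is made up for illustration: the server names, the tables, and the address 203.0.113.10, which comes from a range reserved for documentation. A real resolver speaks the DNS protocol over the network; this toy version only mirrors the order of the steps, and it reports a missing name with the DNS-level answer NXDOMAIN.

```python
# Toy stand-ins for the real DNS hierarchy (all names and data are invented).
ROOT_SERVER = {"com": "tld-com"}
TLD_SERVERS = {"tld-com": {"holbertonschool.com": "ns-holberton"}}
AUTHORITATIVE_SERVERS = {
    "ns-holberton": {"www.holbertonschool.com": "203.0.113.10"},
}


def resolve(name: str, cache: dict) -> tuple:
    """Return (ip, source) for a host name, or (None, "NXDOMAIN")."""
    # 1. Local cache: the cheapest place to look.
    if name in cache:
        return cache[name], "cache"

    labels = name.split(".")

    # 2. The root server only knows which server handles each TLD.
    tld_server = ROOT_SERVER.get(labels[-1])
    if tld_server is None:
        return None, "NXDOMAIN"

    # 3. The TLD server only knows which name server is authoritative.
    authoritative = TLD_SERVERS[tld_server].get(".".join(labels[-2:]))
    if authoritative is None:
        return None, "NXDOMAIN"

    # 4. The authoritative server holds the actual record.
    ip = AUTHORITATIVE_SERVERS[authoritative].get(name)
    if ip is None:
        return None, "NXDOMAIN"

    cache[name] = ip  # remember the answer for next time
    return ip, "authoritative"


if __name__ == "__main__":
    cache = {}
    print(resolve("www.holbertonschool.com", cache))  # walks the hierarchy
    print(resolve("www.holbertonschool.com", cache))  # answered from cache
```

The second call never leaves the cache, which is why repeated visits to the same site resolve faster than the first one.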
Now we have the IP of our server, which can be seen as the postal address where all the information is going to be sent using the TCP/IP protocols. We have to take into account that the packets we send can be read by other people while they travel to our server, so the communication generally has to be secured. This is done with SSL (Secure Sockets Layer), today replaced by its successor TLS (Transport Layer Security).
SSL is the base protocol for HTTPS communication between our browser and the server. It is based on the use of public and private keys. The public key can be shared with anyone, so they can encrypt the information they want to send you; the private key should be held only by you, so only you can decrypt the messages that are sent to you. These keys are based on mathematical principles that make it practically impossible to decrypt a message without the private key in a reasonable time with the computational power we have today. Now that we know the base protocol for HTTPS, one piece is still missing: for your site to show the lock icon in the browser’s address bar, you have to prove your identity with an SSL certificate. Certificates are issued by certificate authorities such as Let’s Encrypt (tools like certbot can request and install them for you) and are placed on your server. So now we understand how a secured connection between client and server is established.
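The public and private key idea can be illustrated with a toy version of RSA, one of the classic public-key algorithms. This is a sketch for intuition only: the primes are tiny, there is no padding, and real SSL/TLS libraries do far more than this. The numbers and function names are chosen for the example, and the three-argument pow with a negative exponent needs Python 3.8 or later.

```python
# Toy RSA with tiny primes. Never use numbers this small in practice.
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

PUBLIC_KEY = (e, n)        # safe to share with anyone
PRIVATE_KEY = (d, n)       # must stay secret


def encrypt(message: int, public_key: tuple) -> int:
    """Anyone holding the public key can encrypt."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    """Only the holder of the private key can decrypt."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


if __name__ == "__main__":
    secret = 65  # the message must be a number smaller than n
    ciphertext = encrypt(secret, PUBLIC_KEY)
    print(ciphertext, decrypt(ciphertext, PRIVATE_KEY))
```

With real key sizes, recovering d from the public key would require factoring n, which is what makes the scheme impractical to break with current computing power.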
So let’s review our path so far: the user types the domain name they want to access into the browser; this domain is then translated into an IP address that identifies the server, and with this address the messages we want to send can find a path over the physical network. The messages are sent using the TCP/IP protocols and secured using SSL, which together give us HTTPS. So now we are at the gates of our server. This path can be represented with the following diagram:
Let’s now define what a server is:
“In computing, a server is a piece of computer hardware or software (computer program) that provides functionality for other programs or devices, called “clients”. This architecture is called the client–server model. Servers can provide various functionalities, often called “services”, such as sharing data or resources among multiple clients, or performing computation for a client.”
In our particular case, the server contains the following elements, represented in the image:
Let’s start by explaining what a firewall is:
The firewall is a software program that controls the incoming and outgoing traffic of our server. It blocks the unused ports of the server so nobody can get in through them. A port is a specific gate through which an application can be reached: for example, the default port for HTTPS is 443, the default port for HTTP is 80, and there are ports for applications like SQL databases or SSH. So our server has port 443 open so it can receive and send messages, and it also has other ports open, such as the SSH port, which is useful for sending commands to the server and configuring it.
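A firewall rule set can be pictured as an allow-list of ports. The sketch below is a simplification with invented names (ALLOWED_PORTS, filter_packet); real firewalls also look at source addresses, protocols and connection state. Port 22 is the usual default for SSH.

```python
# Ports our hypothetical server keeps open; everything else is blocked.
ALLOWED_PORTS = {
    22: "ssh",     # remote administration
    80: "http",    # unencrypted web traffic
    443: "https",  # encrypted web traffic
}


def filter_packet(destination_port: int) -> str:
    """Return the firewall decision for a packet aimed at the given port."""
    if destination_port in ALLOWED_PORTS:
        return "ACCEPT"
    return "DROP"


if __name__ == "__main__":
    for port in (443, 22, 3306):
        print(port, filter_packet(port))
```

Here a packet for port 3306 (the usual MySQL port) is dropped, because the database in this sketch is not meant to be reachable from outside.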
The web server is our main software. It is in charge of receiving the request from the client (all the steps we covered before) and processing it. One application that can play this role is NGINX, which serves the static content of the web page that we usually see. So our request for holbertonschool.com is received by this program, and the static content is sent back to the client so it can be shown in your browser.
The app server is software used to create dynamic content. Dynamic content is useful when we have many pages with different content but the same layout, as on Wikipedia. It would be inefficient to have millions of HTML files, one for every Wikipedia article. Instead, we have a database where we store the content of every article, and depending on the request, this information is loaded dynamically into the HTML page. Application servers need dedicated programs to perform this task, and they also need a database to store all the objects.
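The idea of one page layout filled from a database can be sketched with Python's built-in sqlite3 and string.Template modules. The table name, the sample article and the function names are all invented for this example; a real application server would use a web framework and a full database server.

```python
import sqlite3
from html import escape
from string import Template

# One HTML layout shared by every article.
PAGE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one sample article."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE articles (slug TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    db.execute(
        "INSERT INTO articles VALUES (?, ?, ?)",
        ("dns", "DNS", "The Domain Name System maps names to IP addresses."),
    )
    db.commit()
    return db


def render(db: sqlite3.Connection, slug: str) -> str:
    """Build the HTML page for one article at request time."""
    row = db.execute(
        "SELECT title, body FROM articles WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return "<html><body><h1>404 Not Found</h1></body></html>"
    title, body = row
    # Escape the stored text so it cannot inject markup into the page.
    return PAGE.substitute(title=escape(title), body=escape(body))


if __name__ == "__main__":
    print(render(make_db(), "dns"))
```

Adding a new article means inserting one row, not writing a new HTML file, which is the whole point of dynamic content.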
Sometimes a web page has a lot of traffic and one server doesn’t have enough resources to handle all the requests. To solve this problem we can create a group of servers managed by a load balancer. The function of the load balancer is to distribute the requests among the different servers and ensure that every request is handled.
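One simple way to distribute requests is round robin, where each new request goes to the next server in the list. This is a minimal sketch with made-up server names (web-01 and so on); real load balancers also support other strategies and check that servers are healthy before sending them traffic.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in turn."""

    def __init__(self, servers: list):
        # cycle() repeats the list of servers endlessly.
        self._pool = cycle(servers)

    def route(self, request: str) -> str:
        """Return the name of the server that should handle the request."""
        return next(self._pool)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
    for number in range(4):
        print(f"request {number} ->", balancer.route(f"request {number}"))
```

After the last server has taken a request, the rotation starts again from the first one, so the load is spread evenly when requests cost roughly the same.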
A database is a software program that stores information according to different models: SQL (Structured Query Language) databases, NoSQL databases and many others. The goal of the database is to manage large quantities of information in an ordered and efficient manner, so it can be accessed at any time by our app server.
So this is the entire journey of a request for a web page. Let’s review the whole process. First the user types the domain name they want to access, and DNS translates this domain into an IP address. Then the browser and the server set up an encrypted connection (the server’s public key is used during this setup), and the request is sent over the internet using HTTPS, which is built on the TCP/IP protocols and the SSL security protocol. The packets travel across the internet from the client to the server’s IP. The server receives the request, and the web server processes it and returns the right content to the client over the internet. Finally the web browser receives the messages, decrypts them with the session key agreed during the connection setup, and displays the HTML and CSS files as the normal web page you are used to seeing, something nice and understandable. This is an amazing process, don’t you think?