Monday, 16 January 2017

What is the series of steps that happens when a URL is requested from the address field of a browser?

This is a question whose answer could grow into an entire course on networking, so here's a version that only details some of the cases. There could probably be followup questions.

  1. The browser extracts the domain name from the URL.
  2. The browser queries DNS for the IP address of that domain. Generally, the browser will have cached domains previously visited, and the operating system will have cached queries from any number of applications. If neither the browser nor the OS has a cached copy of the IP address, then a request is sent off to the system's configured DNS server. The client machine already knows the IP address of the DNS server, so no lookup is necessary for that.
  3. The request sent to the DNS server is almost always smaller than the maximum packet size, and is thus sent off as a single packet. In addition to the content of the request, the packet includes the IP address it is destined for in its header. Except in the simplest of cases (network hubs), as the packet reaches each piece of network equipment between the client and server, that equipment uses a routing table to figure out what node it is connected to that is most likely to be part of the fastest route to the destination. The process of determining which path is the best choice differs between equipment and can be very complicated.
  4. The packet is either lost (in which case the request fails or is retried), or makes it to its destination, the DNS server.
  5. If that DNS server has the address for that domain, it will return it. Otherwise, it will forward the query along to the DNS server it is configured to defer to. This happens recursively until the request is fulfilled or it reaches an authoritative name server and can go no further. (If the authoritative name server doesn't recognize the domain, the response indicates failure and the browser generally gives an error like "Can't find the server at www.lkliejafadh.com".) The response makes its way back to the client machine much like the request traveled to the DNS server.
  6. Assuming the DNS request is successful, the client machine now has an IP address that uniquely identifies a machine on the Internet. The web browser then assembles an HTTP request, which consists of a header and optional content. The header includes things like the specific path being requested from the web server, the HTTP version, any relevant browser cookies, etc. In the case in question (hitting Enter in the address bar), the content will be empty. In other cases, it may include form data like a username and password (or the content of an image file being uploaded, etc.)
  7. This HTTP request is sent off to the web server host as some number of packets, each of which is routed in the same way as the earlier DNS query. (The packets have sequence numbers that allow them to be reassembled in order even if they take different paths.) Once the request arrives at the web server, it generates a response (this may be a static page, served as-is, or a more dynamic response, generated in any number of ways). The web server software sends the generated page back to the client.
  8. Assuming the response is HTML and not an image or data file, the browser parses the HTML to render the page. Part of this parsing and rendering process may be the discovery that the web page includes images or other embedded content that is not part of the HTML document. The browser will then send off further requests (either to the original web server or different ones, as appropriate) to fetch the embedded content, which will then be rendered into the document as well.
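
Steps 1, 2 and 6 above can be sketched in a few lines of Python. This is only a rough illustration, not how any real browser is written: the function names `resolve` and `build_get_request` are made up for this example, the caches are plain dictionaries with no expiry, and the actual DNS query is replaced by whatever function the caller passes in.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit


def resolve(host: str,
            caches: List[Dict[str, str]],
            query_dns_server: Callable[[str], str]) -> str:
    """Check each cache in order (browser first, then OS); fall back to DNS."""
    for cache in caches:
        if host in cache:
            return cache[host]
    # Hypothetical stand-in for sending a real query to the configured DNS server.
    address = query_dns_server(host)
    for cache in caches:
        cache[host] = address  # remember the answer for next time
    return address


def build_get_request(url: str) -> Tuple[str, int, bytes]:
    """Return (host, port, raw request bytes) for a plain GET with no content."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if parts.port is None else f"{host}:{parts.port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"  # the blank line ends the header; the content is empty
    )
    return host, port, request.encode("ascii")
```

For example, `build_get_request("http://www.example.com/")` gives the host `www.example.com`, port 80, and the request bytes, and `resolve(host, [browser_cache, os_cache], some_lookup_function)` only calls the lookup function the first time a host is seen.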

Second

  As far as I know...

When you enter google.com in the address bar of the browser, the following series of things happens:

1. The browser needs to know the numerical IP address, so it first looks in its own cache, followed by the OS cache, the router cache, and the ISP's DNS cache. If none of them has it, the ISP's DNS server begins a recursive search, starting at the root nameserver and going through the TLD nameserver, until it finds the required IP address.

There is also the concept of a load balancer, which comes into play here. It is just a piece of hardware that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers.
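
To make the load balancer idea a bit more concrete, here is a tiny sketch in Python. It assumes a simple round-robin policy, which is only one of several possible policies and is my own choice for the example. The class name `RoundRobinBalancer` and the backend addresses are hypothetical, and a real load balancer would also do things like health checks.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out backend servers in turn (assumed policy: plain round robin)."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._next_backend: Iterator[str] = cycle(backends)

    def pick_backend(self) -> str:
        """Return the server that should handle the next incoming request."""
        return next(self._next_backend)


# Hypothetical private addresses of the servers behind the balancer.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
chosen = [balancer.pick_backend() for _ in range(4)]
```

After the fourth request the balancer has wrapped around, so the first server is chosen again.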


2. After obtaining the IP address, the browser sends an HTTP request to the web server.

3. The Google server then responds with a permanent redirect (301). It tells the browser to go to "http://www.google.com/" instead of "http://google.com/".

4. The browser follows the redirect and sends another GET request.

5. The server sends an HTML response back to the client. The Content-Type header instructs the browser to render the response content as HTML instead of, say, downloading it as a file.

6. The browser begins rendering the HTML and sends requests for the objects embedded in it. Many sites deliver their CSS, image/sprite files, and script files from a content delivery network (CDN). The browser sends a GET request for each embedded URL, and each of those goes through the same lookup procedure and the other steps mentioned above.

7. After this, the browser may send further AJAX requests to communicate with the web server, even after the page is rendered.
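
The redirect, Content-Type check, and embedded-object steps above can be tried out offline with a small Python sketch. Everything here is an assumption made for illustration: `FAKE_WEB` is a made-up in-memory stand-in for real servers, the page body and the `cdn.example.com` URL are invented, and `fetch` and `embedded_urls` are hypothetical helper names, not part of any browser.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> (status, headers, body).
Response = Tuple[int, Dict[str, str], str]
FAKE_WEB: Dict[str, Response] = {
    "http://google.com/": (301, {"Location": "http://www.google.com/"}, ""),
    "http://www.google.com/": (
        200,
        {"Content-Type": "text/html; charset=UTF-8"},
        '<html><head><link rel="stylesheet" href="/main.css">'
        '<script src="http://cdn.example.com/app.js"></script></head>'
        '<body><img src="logo.png"></body></html>',
    ),
}


def fetch(url: str, max_redirects: int = 5) -> Tuple[str, Response]:
    """Follow 301/302 redirects and return (final URL, response)."""
    for _ in range(max_redirects + 1):
        status, headers, body = FAKE_WEB[url]
        if status in (301, 302):
            url = urljoin(url, headers["Location"])
            continue
        return url, (status, headers, body)
    raise RuntimeError("too many redirects")


class EmbeddedUrlCollector(HTMLParser):
    """Collect URLs referenced by <img>, <script> and <link> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script"):
            target = attributes.get("src")
        elif tag == "link":
            target = attributes.get("href")
        else:
            target = None
        if target:
            # Relative URLs are resolved against the page's own URL.
            self.urls.append(urljoin(self.base_url, target))


def embedded_urls(url: str) -> List[str]:
    """Return the URLs a browser would request next for this page."""
    final_url, (status, headers, body) = fetch(url)
    if not headers.get("Content-Type", "").startswith("text/html"):
        return []  # not HTML, so there is nothing to parse for embedded objects
    collector = EmbeddedUrlCollector(final_url)
    collector.feed(body)
    return collector.urls
```

Calling `embedded_urls("http://google.com/")` first follows the 301 to the www address, then lists the stylesheet, script, and image URLs that would each need their own GET request.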


So this is the bigger picture of how it works. There are many low-level details which I left out intentionally (because I don't know about them :p).
