The HTTP protocol is used to send HTML documents through the Internet. The HTTP protocol sends the HTML documents using TCP/IP, which carries them in packets. With each request, the HTTP protocol attaches a header, which contains information such as the name and location of the page being requested, the name and IP address of the remote server that contains the Web page, the IP address of the local client, the HTTP version number, and the URL of the referring page. This information is referred to as the server variables. Internet programmers are able to retrieve the values in the header.

It is important to know that HTTP version 1.0 is a stateless protocol. This means that when a client requests a document from the Web server, the server will return the Web page to the client and end all communications with the client. If the client requests another page, the Web server normally has no way of knowing that the client has previously visited the Web site. However, by using methods such as cookies, session variables, text files, and databases, the server can maintain state; that is, it can recognize the client over multiple transactions and thereby remember information from each transaction and link it with the specific client.

HTTP 1.0 is documented in the informational RFC 1945; it is not an official Internet standard because it was primarily developed outside the IETF by early browser and server vendors. HTTP 1.1 is a proposed standard being developed by the W3C and the HTTP working group of the IETF. It provides for much more flexible and powerful communication between the client and the server. It's also a lot more scalable.

The primary improvement in HTTP 1.1 is connection state. HTTP 1.0 opens a new connection for every request.
In practice, the time taken to open and close all the connections opened in a typical web session can outweigh the time taken to transmit the data, especially for sessions with many small documents. HTTP 1.1 allows a browser to send many different requests over a single connection; the connection remains open until it is explicitly closed. The requests and responses are all asynchronous. A browser doesn't need to wait for the response to one request before sending the next. Each response consists of a series of headers, followed by a blank line, followed by MIME-encoded data.

There are a lot of smaller improvements in HTTP 1.1:

- Requests include a Host MIME header so that one web server can easily serve different sites at different URLs.
- Servers and browsers can exchange compressed files and particular byte ranges of a document, both of which can decrease network traffic.
- HTTP 1.1 is designed to work much better with proxy servers.

HTTP 1.1 is a strict superset of HTTP 1.0, so HTTP 1.1 web servers have no trouble interacting with older browsers that speak only HTTP 1.0.

Java Network Programming (O'Reilly)

In HTTP 1.1, the Web server and the client can maintain their connection across Web pages. The NT Web server known as Internet Information Server can be configured to support this keep-alive HTTP 1.1 feature.

Internet Programming with VBScript and JavaScript by Kathleen Kalata

The TCP/IP protocols are used to establish connections between machines, but Berners-Lee also had to develop a set of procedures for identifying the page being requested and returning that page to the user. These procedures are called the Hypertext Transfer Protocol (HTTP), and this is the protocol whose name appears at the beginning of most URLs.

Simple example. Imagine that you are browsing a Web page and have just clicked on a link whose URL is http://www.cob.mnsu.edu/faculty.html.
The following sequence of events will take place to let you access that page:

1. Your Web browser will determine the URL associated with the link and will extract the name of the machine to which it must connect, in this case www.cob.mnsu.edu.
2. The browser will use the TCP/IP protocols to establish a connection across the Internet between your computer and www.cob.mnsu.edu.
3. When the connection between these two machines has been established, your browser will send a special HTTP message called GET, which indicates that it wants the destination machine to retrieve a page. The GET command contains the name of the desired page, in this case faculty.html.
4. The remote machine www.cob.mnsu.edu locates the file named in the GET message, reads it, copies it, and returns the copy to your browser, again using TCP/IP and the Internet.
5. Your browser receives the page and displays its contents on your screen.

[Figure: (a) Using TCP/IP to Establish a Connection to the Destination Machine; (b) Sending an HTTP GET Message to the Destination to Fetch the Desired Page; (c) Returning a Copy of the Page to the Requesting Node and Displaying It Using the Web Browser]

An Invitation to Computer Science, Second Edition, by G. Michael Schneider & Judith L. Gersting

HTTP, the Hypertext Transfer Protocol, is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and server establish a connection, how the client requests data from the server, how the server responds to that request, and finally how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer. HTTP 1.0 is the currently accepted version of the protocol.
It uses MIME to encode data. The basic protocol defines a sequence of four steps for each request from a client to the server:

Making the connection. The client establishes a TCP connection to the server, on port 80 by default; other ports may be specified in the URL.

Making a request. The client sends a message to the server requesting the page at a specified URL. The format of this request is typically something like:

    GET /index.html HTTP/1.0

GET is a keyword. /index.html is a relative URL to a file on the server. The file is assumed to be on the machine that receives the request, so there is no need to prefix it with http://www.thismachine.com/. HTTP/1.0 is the version of the protocol that the client understands. The request is terminated with two carriage return/linefeed pairs (\r\n\r\n in Java parlance) regardless of how lines are terminated on the client or server platform.

Although the GET line is all that is required, a client request can include other information as well. This takes the following form:

    Keyword: Value

The most common such keyword is Accept, which tells the server what kinds of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME types, corresponding to HTML documents, plain text, and JPEG and GIF images.

    Accept: text/html, text/plain, image/gif, image/jpeg

User-Agent is another common keyword that lets the server know what browser is being used. This allows the server to send files optimized for the particular browser type. The line below says that the request comes from Version 2.4 of the Lynx browser:

    User-Agent: Lynx/2.4 libwww/2.1.4

Finally, the request is terminated with a blank line; that is, two carriage return/linefeed pairs, \r\n\r\n.
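As a rough illustration of the request format just described, the following Python sketch assembles a GET request line, appends optional Keyword: Value lines, and terminates the message with \r\n\r\n. The function name build_get_request and its parameters are hypothetical, introduced only for this example; the version token is written HTTP/1.0, with a slash, as RFC 1945 specifies.

```python
def build_get_request(path, headers=None, version="1.0"):
    """Return an HTTP GET request as text with CRLF line endings.

    Hypothetical helper for illustration only; `headers` is a list of
    (keyword, value) pairs such as ("Accept", "text/html").
    """
    # Request line: method keyword, relative URL, protocol version.
    lines = [f"GET {path} HTTP/{version}"]
    # Optional "Keyword: Value" lines, one per header.
    for keyword, value in (headers or []):
        lines.append(f"{keyword}: {value}")
    # Lines are joined with CRLF; the trailing CRLF CRLF ends the last
    # line and adds the blank line that terminates the request.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request(
    "/index.html",
    [
        ("Accept", "text/html, text/plain, image/gif, image/jpeg"),
        ("User-Agent", "Lynx/2.4 libwww/2.1.4"),
    ],
)
print(repr(request))
```

Printing with repr makes the \r\n line endings visible, which helps confirm that the request ends with two carriage return/linefeed pairs whatever the local platform's line terminator is.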
A complete request might look like:

    GET /index.html HTTP/1.0
    Accept: text/html
    Accept: text/plain
    User-Agent: Lynx/2.4 libwww/2.1.4

In addition to GET, there are several other request types. HEAD retrieves only the header for the file, not the actual data. This is commonly used to check the modification date of a file, to see whether a copy stored in the local cache is still valid. POST sends form data to the server, and PUT uploads a file to the server.

The response. The server sends a response to the client. The response begins with a response code, followed by MIME header information, then a blank line, then the requested document or an error message. Assuming the requested file is found, a typical response looks like this:

    HTTP/1.0 200 OK
    Server: NCSA/1.4.2
    MIME-version: 1.0
    Content-type: text/html
    Content-length: 107