How does HTTP file upload work?

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 20:57

Reposted from: http://stackoverflow.com/questions/8659808/how-does-http-file-upload-work


When I submit a simple form like this with a file attached:

    <form enctype="multipart/form-data" action="http://localhost:3000/upload?upload_progress_id=12344" method="POST">
      <input type="hidden" name="MAX_FILE_SIZE" value="100000" />
      Choose a file to upload: <input name="uploadedfile" type="file" /><br />
      <input type="submit" value="Upload File" />
    </form>

How does it send the file internally? Is the file sent as part of the HTTP body as data? In the headers of this request, I don't see anything related to the name of the file. 

I would just like to know the internal workings of HTTP when sending a file.

I have not used a sniffer in a while but if you want to see what is being sent in your request (since it is to the server it is a request) sniff it. This question is too broad. SO is more for specific programming questions. – Blam Dec 28 '11 at 18:39
 
...as sniffers go, fiddler is my weapon of choice. You can even build up your own test requests to see how they post. –  Phil Cooper Jan 31 '14 at 12:04

4 Answers

Accepted answer (75 votes)

Let's take a look at what happens when you select a file and submit your form (I've truncated the headers for brevity):

    POST /upload?upload_progress_id=12344 HTTP/1.1
    Host: localhost:3000
    Content-Length: 1325
    Origin: http://localhost:3000
    Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L
    <other headers>

    ------WebKitFormBoundaryePkpFF7tjBAqx29L
    Content-Disposition: form-data; name="MAX_FILE_SIZE"

    100000
    ------WebKitFormBoundaryePkpFF7tjBAqx29L
    Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
    Content-Type: application/x-object

    <file data>
    ------WebKitFormBoundaryePkpFF7tjBAqx29L--

Instead of URL encoding the form parameters, the form parameters (including the file data) are sent as sections in a multipart document in the body of the request.

In the example above, you can see the input MAX_FILE_SIZE with the value set in the form, as well as a section containing the file data. The file name is part of the Content-Disposition header.

The full details are here.
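
To make the layout concrete, here is a minimal Python sketch that assembles a body of the same shape by hand. The function name build_multipart and the default boundary prefix are hypothetical, chosen for illustration. It does not escape quotes in field names or filenames, so treat it as a sketch of the format rather than of what any browser actually does.

```python
import uuid


def build_multipart(fields, files, boundary=None):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict mapping field name -> str value
    files:  dict mapping field name -> (filename, content_type, data_bytes)
    Returns (Content-Type header value, body bytes).
    """
    # The boundary must not occur inside any part; a random token makes a
    # clash unlikely. The prefix here is an arbitrary illustrative choice.
    boundary = boundary or "----SketchBoundary" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")

    chunks = []
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # blank line separates the part headers from the part data
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, data) in files.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,  # raw file bytes, not URL-encoded
        ]
    # The closing delimiter carries two extra trailing dashes.
    chunks += [delimiter + b"--", b""]
    body = b"\r\n".join(chunks)
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart(
    {"MAX_FILE_SIZE": "100000"},
    {"uploadedfile": ("hello.o", "application/x-object", b"<file data>")},
    boundary="----WebKitFormBoundaryePkpFF7tjBAqx29L",
)
print("Content-Type:", content_type)
print("Content-Length:", len(body))
print(body.decode("latin-1"))
```

Note that each delimiter line in the body is the boundary from the Content-Type header prefixed with two dashes, which is why the body lines in the request above show more leading dashes than the header does.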

Does this mean that port 80 (or whichever port is serving HTTP requests) is unusable for the duration of the file transfer? For example, if a huge file (about a GB) is being uploaded, will the web server be unable to respond to any other requests during that time? –  source.rar Apr 23 '14 at 16:39
@source.rar: No. Webservers are (almost?) always threaded so that they can handle concurrent connections. Essentially, the daemon process that's listening on port 80 immediately hands off the task of serving to another thread/process in order that it can return to listening for another connection; even if two incoming connections arrive at exactly the same moment, they'll just sit in the network buffer until the daemon is ready to read them. –  eggyal Apr 30 '14 at 8:56
 
The threading explanation is a bit incorrect since there are high performance servers that are designed as single threaded and use a state machine to quickly take turns downloading packets of data from connections. Rather, in TCP/IP, port 80 is a listening port, not the port the data is transferred on. – slebetman Oct 13 '14 at 17:08
When an IP listening socket (port 80) receives a connection another socket is created on another port, usually with a random number above 1000. This socket is then connected to the remote socket leaving port 80 free to listen for new connections. –  slebetman Oct 13 '14 at 17:10
@slebetman First of all, this is about HTTP. FTP active mode doesn't apply here. Second, a listening socket doesn't get blocked by each connection. You can have as many connections to one port as the other side has ports to bind its own end to. –  Slotos Nov 12 '14 at 20:58
Answer (7 votes)

How does it send the file internally?

The format is called multipart/form-data, as asked at: What does enctype='multipart/form-data' mean?

Once you see some examples of it, it will be really easy to understand how it works.

You can produce examples using nc -l or an echo server, and a user agent like a browser or cURL.

Save the form to an .html file:

    <form action="http://localhost:8000" method="post" enctype="multipart/form-data">
      <p><input type="text" name="text" value="text default">
      <p><input type="file" name="file1">
      <p><input type="file" name="file2">
      <p><button type="submit">Submit</button>
    </form>

Create files to upload:

    echo 'Content of a.txt.' > a.txt
    echo '<!DOCTYPE html><title>Content of a.html.</title>' > a.html

Run:

while true; do printf '' | nc -l localhost 8000; done

Open the HTML file in your browser, select the files, click Submit, and check the terminal.
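
If nc is not available, a rough stand-in can be sketched in Python with the standard socket module. The names capture_one_request and serve are hypothetical. Like the nc loop, this sketch never sends a response, so the browser will eventually report an error, and it simply assumes the request is complete once the client closes the connection or goes quiet for a second.

```python
import socket


def capture_one_request(server, idle_timeout=1.0):
    """Accept one connection on a listening socket and return the raw bytes sent."""
    conn, _addr = server.accept()
    with conn:
        # Assumption: the request is over once the client closes the
        # connection or stays silent for idle_timeout seconds.
        conn.settimeout(idle_timeout)
        chunks = []
        try:
            while True:
                data = conn.recv(65536)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            pass
    return b"".join(chunks)


def serve(host="localhost", port=8000):
    """Print every request sent to host:port until interrupted with Ctrl+C."""
    with socket.create_server((host, port)) as server:
        while True:
            raw = capture_one_request(server)
            # latin-1 maps every byte to a character, so binary uploads
            # print as garbage instead of raising a decode error.
            print(raw.decode("latin-1"))
            print("-" * 40)
```

Calling serve() from a Python prompt listens on localhost:8000, the same address the form above posts to.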

nc prints the request received. Firefox sent:

    POST / HTTP/1.1
    Host: localhost:8000
    User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate
    Cookie: __atuvc=34%7C7; permanent=0; _gitlab_session=226ad8a0be43681acf38c2fab9497240; __profilin=p%3Dt; request_method=GET
    Connection: keep-alive
    Content-Type: multipart/form-data; boundary=---------------------------9051914041544843365972754266
    Content-Length: 554

    -----------------------------9051914041544843365972754266
    Content-Disposition: form-data; name="text"

    text default
    -----------------------------9051914041544843365972754266
    Content-Disposition: form-data; name="file1"; filename="a.txt"
    Content-Type: text/plain

    Content of a.txt.

    -----------------------------9051914041544843365972754266
    Content-Disposition: form-data; name="file2"; filename="a.html"
    Content-Type: text/html

    <!DOCTYPE html><title>Content of a.html.</title>

    -----------------------------9051914041544843365972754266--

Therefore it is clear that:

  • Content-Type: multipart/form-data; boundary=---------------------------9051914041544843365972754266 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.

  • Every field gets some sub-headers before its data: Content-Disposition: form-data;, the field name, and for files the filename and a Content-Type, followed by a blank line and then the data.

    The server reads the data until the next boundary string. The browser must choose a boundary that will not appear in any of the fields, so this is why the boundary may vary between requests.
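
The server side of this can be sketched in a few lines of Python. The function name parse_multipart is hypothetical, and the sketch makes simplifying assumptions: the whole body is already in memory, the boundary parameter is well formed, and there are no nested multipart parts. A real server would normally rely on a tested library and stream large uploads instead.

```python
def parse_multipart(content_type, body):
    """Split a multipart/form-data body into (headers, data) pairs.

    content_type: value of the request's Content-Type header
    body:         raw request body as bytes
    """
    # Pull the boundary out of the header value (simplified parsing).
    boundary = content_type.partition("boundary=")[2].split(";")[0].strip().strip('"')
    delimiter = b"\r\n--" + boundary.encode("ascii")

    parts = []
    # Prepend CRLF so the first delimiter matches like all the others;
    # sections[0] is the (normally empty) preamble and is skipped.
    sections = (b"\r\n" + body).split(delimiter)
    for section in sections[1:]:
        if section.startswith(b"--"):
            break  # closing delimiter: boundary followed by two dashes
        if section.startswith(b"\r\n"):
            section = section[2:]  # drop the end of the delimiter line
        # A blank line separates the sub-headers from the field data.
        head, _, data = section.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            key, _, value = line.decode("utf-8").partition(":")
            headers[key.strip().lower()] = value.strip()
        parts.append((headers, data))
    return parts


example = (
    b'--XYZ\r\n'
    b'Content-Disposition: form-data; name="text"\r\n'
    b'\r\n'
    b'text default\r\n'
    b'--XYZ\r\n'
    b'Content-Disposition: form-data; name="file1"; filename="a.txt"\r\n'
    b'Content-Type: text/plain\r\n'
    b'\r\n'
    b'Content of a.txt.\n\r\n'
    b'--XYZ--\r\n'
)
for headers, data in parse_multipart("multipart/form-data; boundary=XYZ", example):
    print(headers["content-disposition"], "->", data)
```

The short boundary XYZ is used here only to keep the example readable; the parsing is the same for the long boundary Firefox generated.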

Answer (3 votes)

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

http://www.tutorialspoint.com/http/http_messages.htm
