cgi module

Synopsis

The cgi module provides an object-oriented interface for writing CGI and CGI-style programs. It provides an abstraction layer so that the same code can be used with either standard CGI or replacement technologies such as FastCGI.

Usage

Code to handle a request must subclass the abstract class cgi.Handler. This class defines a single method, process, which receives a single parameter of type cgi.Request. The handler uses this parameter to retrieve information about the request and to send the response. Instances of the subclass are created to handle requests.

When the standard CGI protocol is used, a new process is created to handle each request, but with more complicated protocols such as FastCGI, a process may handle more than one request simultaneously in multiple threads. However, even in this situation, each instance of the cgi.Handler subclass will only be used to process one request at a time. This means that the instance can use self to store per-request data.

A subclass of cgi.Request is used to call the handler. Which subclass is used depends on the protocol used to communicate with the web server. This module provides cgi.CGIRequest, which implements the standard CGI protocol, and cgi.GZipCGIRequest, which is the same but uses zlib to compress the response when the user's browser indicates that it can accept this.

Example:

import jon.cgi as cgi
class Handler(cgi.Handler):
  def process(self, req):
    req.set_header("Content-Type", "text/plain")
    req.write("Hello, %s!\n" % req.params.get("greet", "world"))
cgi.CGIRequest(Handler).process()

Note: by default, output from the handler is buffered. If the output from the script is going to be large (for example, if the output is not an HTML file), then buffering should be disabled using set_buffering.

class: Error(Exception)

The base class for all exceptions defined by the cgi module.

class: SequencingError(Error)

An exception class which is raised when cgi object methods are called out of order.

class: Request

Request objects provide information about a CGI request, as well as methods to return a response. This class is not used directly, but is subclassed depending on what protocol is being used to talk to the web server.

Public Instance Variables

params

The params map contains the CGI form variables recovered from the QUERY_STRING and, in the case of POST requests, from stdin. Each key is a string, and the type of its value depends on whether or not the key ends with one of a number of special suffixes.

If the key has no special suffix, then the value is a string, or None. (None occurs when a URL-encoded string contains a name without a corresponding equals sign and value. If the string contains a name and an equals sign but no value then this is represented as an empty string.) If the key ends with the string "*" then the value is a sequence containing one or more values, each of which is either a string or None (to support multiple values with the same name, e.g. HTML <select> input fields). If the key ends with the string "!" then the value is a mime.Entity object (to support file uploads). If the key ends with the string "!*" then the value is a sequence of one or more mime.Entity objects.

If a form variable is found with a name ending in "!" or "!*" but it did not arrive in the form of a MIME section then it is ignored and is not placed into the map. If more than one value with the same name is found and the name does not end in "*" or "!*" then only one of the values will be entered into the map, and the others will be discarded. This means that, even in the face of malicious input, the types of the values are guaranteed to match that indicated by their key's suffix.

Note that the suffixes must be present in the names of the CGI variables themselves. The programmer does not tell the cgi module which CGI variables to expect. Example:

<select multiple name="types*">
<option>gif</option><option>jpg</option><option>png</option>
</select>
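
The suffix rules above can be modelled roughly as follows. This is a sketch in modern Python, not the module's own code: build_params is a hypothetical helper, it handles only URL-encoded name/value pairs (so names ending in "!" or "!*" are always dropped), and keeping the first of several duplicate values is an assumption, since the text only says that one of them is kept.

```python
def build_params(pairs):
    """Build a params-style map from (name, value) pairs.

    Hypothetical helper: models only URL-encoded input, so no value is
    ever a MIME entity.
    """
    params = {}
    for name, value in pairs:
        if name.endswith("!") or name.endswith("!*"):
            # Did not arrive as a MIME section, so it is ignored.
            continue
        if name.endswith("*"):
            # Multi-valued: collect every value in a list.
            params.setdefault(name, []).append(value)
        elif name not in params:
            # Single-valued: keep one value (assumption: the first).
            params[name] = value
    return params


pairs = [("types*", "gif"), ("types*", "png"), ("q", "a"), ("q", "b"),
         ("flag", None), ("upload!", "not-a-file")]
print(build_params(pairs))
```

Whatever the input, a key ending in "*" always maps to a list and any other key maps to a single value, which is the guarantee the suffix scheme is designed to give.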

stdin

A file or file-like object which represents the "standard input stream" for the request. For example, for a genuine CGI request this will be a reference to sys.stdin.

Note: the first time you access the params variable, this stream may be read to retrieve the form variables. Therefore, you must not access both params and stdin during the same request.

cookies

The cookies variable is a Cookie.SimpleCookie object which contains cookies passed to the server by the client.

environ

The environ map contains the environment variables associated with the request. All keys and values in the map are strings.

aborted

If the aborted variable references a true value then the request has been aborted (usually because the client has gone away). If the request is aborted then all further output using the write method will be discarded. The programmer may inspect the aborted variable occasionally and exit if the request has been aborted, but it is not necessary to do so.
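
As an illustration of the optional check described above, the sketch below stops generating output once the request is marked as aborted. StubRequest and send_rows are hypothetical names invented for this example; the stub only imitates the aborted variable and the write method and is not the module's Request class.

```python
class StubRequest:
    """Hypothetical stand-in exposing only aborted and write."""

    def __init__(self, abort_after):
        self.aborted = False
        self.written = []
        self._abort_after = abort_after

    def write(self, s):
        if self.aborted:
            return  # output after an abort is discarded
        self.written.append(s)
        if len(self.written) >= self._abort_after:
            self.aborted = True  # simulate the client going away


def send_rows(req, rows):
    """Write rows until they run out or the request is aborted."""
    sent = 0
    for row in rows:
        if req.aborted:
            break  # stop doing work that nobody will see
        req.write(row)
        sent += 1
    return sent


req = StubRequest(abort_after=2)
print(send_rows(req, ["a", "b", "c", "d"]))
```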

Public Methods

__init__(self, handler_type)

handler_type: cgi.Handler subclass

Create new Request instance. Instances of handler_type will be created to handle requests.

The array of HTTP headers will be initialised to contain a Content-Type header with the value text/html; charset=iso-8859-1. If this is not appropriate then the content type should be overridden by specifying a new one with the set_header method.

output_headers(self)

Output the accumulated array of HTTP headers. If the headers have already been output then a cgi.SequencingError exception is raised.

clear_headers(self)

Clear the accumulated array of HTTP headers. If the headers have already been output then a cgi.SequencingError exception is raised.

add_header(self, hdr, val)

hdr: string
val: string

Add a header to the array of HTTP headers. If the headers have already been output then a cgi.SequencingError exception is raised.

Example:

req.add_header("Set-Cookie", "foo=bar; path=/")

get_header(self, hdr, index=0)

hdr: string
index: integer

Retrieves a header from the array of HTTP headers (this is the array of output headers the handler will be returning to the user agent, not the input headers from the user agent). If there is more than one header with the same name, the index parameter is used to specify which one is required. If the named header was not found, or there were not enough occurrences of it to satisfy the index requirement, None is returned. Header names are matched case-insensitively.

set_header(self, hdr, val)

hdr: string
val: string

Add a header to the array of HTTP headers. If a header or headers of the same name already exist in the array, then they are deleted before the new header is added. If the headers have already been output then a cgi.SequencingError exception is raised. Header names are matched case-insensitively.

Example:

req.set_header("Content-Type", "image/jpeg")
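
The difference between add_header, set_header and get_header can be sketched with a small stand-alone model of the header array. HeaderList and its method names are hypothetical and exist only for this illustration; the sketch also leaves out the cgi.SequencingError check that the real methods perform once headers have been output.

```python
class HeaderList:
    """Hypothetical stand-in for the array of output headers."""

    def __init__(self):
        self._headers = []  # list of (name, value) tuples, in order

    def add(self, hdr, val):
        # Like add_header: always appends, duplicates are allowed.
        self._headers.append((hdr, val))

    def delete(self, hdr):
        # Like del_header: names compare case-insensitively.
        self._headers = [(h, v) for h, v in self._headers
                         if h.lower() != hdr.lower()]

    def set(self, hdr, val):
        # Like set_header: drop same-named headers, then append.
        self.delete(hdr)
        self._headers.append((hdr, val))

    def get(self, hdr, index=0):
        # Like get_header: the index-th match, or None if there is none.
        matches = [v for h, v in self._headers if h.lower() == hdr.lower()]
        return matches[index] if index < len(matches) else None


headers = HeaderList()
headers.add("Set-Cookie", "foo=bar; path=/")
headers.add("Set-Cookie", "baz=qux; path=/")
headers.set("content-type", "text/html")
headers.set("Content-Type", "image/jpeg")
print(headers.get("set-cookie", 1))
print(headers.get("Content-Type"))
print(headers.get("Content-Type", 1))
```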

del_header(self, hdr)

hdr: string

Remove all headers with the name hdr from the array of HTTP headers. If the headers have already been output then a cgi.SequencingError exception is raised. Header names are matched case-insensitively.

append_header_value(self, hdr, val)

hdr: string
val: string

Add a value to a header that contains a comma-separated list of values (e.g. Content-Encoding, Vary, etc). If the header does not already exist, it is set to val. If the header does exist, and val is not already in the list of values, it is added to the list. If the headers have already been output then a cgi.SequencingError exception is raised. Header names and values are matched case-insensitively.

Example:

req.append_header_value("Vary", "Accept-Language")
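
The merging rule for list-valued headers might look something like the sketch below. append_value is a hypothetical helper that works on a plain string rather than on a Request object, and joining with ", " is an assumption about formatting that the text does not specify.

```python
def append_value(existing, val):
    """Return the header value after appending val.

    existing is the current header value, or None if the header is unset.
    """
    if existing is None:
        return val
    values = [v.strip() for v in existing.split(",")]
    if val.lower() in (v.lower() for v in values):
        return existing  # already listed, compared case-insensitively
    return existing + ", " + val


print(append_value(None, "Accept-Language"))
print(append_value("Accept-Encoding", "Accept-Language"))
print(append_value("Accept-Encoding, accept-language", "Accept-Language"))
```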

set_buffering(self, f)

f: true or false value

Specify whether or not client output sent using write will be buffered. If buffering is disabled when output has already been buffered then the existing buffer will be flushed immediately. At the start of a new request, buffering defaults to 'on'.

flush(self)

Flushes any buffered output to the client. If the HTTP headers array has not already been sent then it will be sent before any other output. Generally speaking, you do not need to call flush, even if buffering is enabled, because it is automatically called when the Handler.process method exits.

close(self)

Calls flush and then closes the output stream. It is essential that this method is called when the request is complete. In general, however, you do not need to call it manually because it is automatically called when the Handler.process method exits.

clear_output(self)

Discards any output that has been buffered. If output buffering is not enabled then a cgi.SequencingError exception is raised.
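
The interaction between set_buffering, write, flush and clear_output can be pictured with the following stand-alone model. BufferedOutput is a hypothetical class written for this illustration, the local SequencingError merely stands in for cgi.SequencingError, and header handling is omitted entirely.

```python
class SequencingError(Exception):
    """Stand-in for cgi.SequencingError."""


class BufferedOutput:
    """Hypothetical model of write/flush/clear_output buffering."""

    def __init__(self):
        self.sent = []          # what has reached the "client"
        self._buffer = []       # output held back while buffering
        self._buffering = True  # buffering defaults to on

    def set_buffering(self, f):
        if self._buffering and not f:
            self.flush()        # disabling flushes what is pending
        self._buffering = bool(f)

    def write(self, s):
        if self._buffering:
            self._buffer.append(s)
        else:
            self.sent.append(s)

    def flush(self):
        self.sent.extend(self._buffer)
        self._buffer = []

    def clear_output(self):
        if not self._buffering:
            raise SequencingError("output is not buffered")
        self._buffer = []


out = BufferedOutput()
out.write("draft")
out.clear_output()        # the draft never reaches the client
out.write("final")
out.set_buffering(False)  # flushes "final" immediately
out.write("!")
print(out.sent)
```

Buffering makes clear_output possible, which is useful for replacing a half-written page with an error page, at the cost of holding the whole response in memory.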

error(self, s)

s: string

This is a placeholder method that must be overridden by a subclass of the Request class. It should log the string parameter s somewhere on the server (e.g. in the error_log). The string must not be output to the client.

set_encoding(self, encoding, [inputencoding])

encoding: string or None
inputencoding: string or None

Sets the character encoding used for the response. The default encoding is None, which means that no encoding is performed (in which case you cannot send unicode objects to write and normal strings are output unchanged). If you specify an encoding other than None then you can send unicode objects to write and they will be encoded correctly. Remember in this case you will probably want to call set_header to update the Content-Type header to indicate the character encoding you are using.

inputencoding is only used if encoding is not None. Normally if you pass a non-unicode object to write then it will be assumed to be in Python's default character encoding. If you specify a non-None inputencoding then it will be assumed to be in that character encoding instead.

Example:

req.set_encoding("utf-8", "iso-8859-1")
req.set_header("Content-Type", "text/plain; charset=utf-8")
req.write("hello \xa1\n") # iso-8859-1 assumed due to inputencoding specified above
req.write(unicode("hello \xa1\n", "cp850"))
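
To see which bytes the example above would be expected to produce, the two write calls can be reproduced with plain codec calls. This is a sketch in modern Python, where byte strings and text strings are distinct types; transcode is a hypothetical helper, not part of the module.

```python
def transcode(data, encoding, inputencoding):
    """Return the bytes sent to the client for one write call.

    Hypothetical helper: byte strings are first decoded using
    inputencoding, then the text is encoded with the response encoding.
    """
    if isinstance(data, bytes):
        data = data.decode(inputencoding)
    return data.encode(encoding)


# A byte string, interpreted as iso-8859-1 (0xA1 is an inverted
# exclamation mark there).
print(transcode(b"hello \xa1\n", "utf-8", "iso-8859-1"))

# Text that was decoded from cp850 beforehand (0xA1 is i-acute there),
# so inputencoding plays no part.
print(transcode(b"hello \xa1\n".decode("cp850"), "utf-8", "iso-8859-1"))
```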

set_form_encoding(self, encoding)

encoding: string or None

Sets the character encoding used when reading form data from the browser. It defaults to None, but if set to the name of a character encoding, the keys and values in the params mapping will be Unicode strings instead of normal strings. Note that browsers will generally send form data using the encoding used by the HTML of the submitting page.

get_encoding(self)

Returns the character encoding being used for the response, or None if no encoding is being used.

get_form_encoding(self)

Returns the character encoding being used for form data, or None if no encoding is being used.

write(self, s)

s: string

Sends the string parameter s to the client. If buffering is enabled (the default, see set_buffering) then the string will not be sent to the client immediately but will be buffered in memory. If buffering is disabled and the HTTP headers array has not already been sent then it will be sent before any other output.

If you wish to be able to output unicode objects using this function, then you should first call set_encoding to specify the output character encoding.

traceback(self)

Calls the module-level traceback function to send a traceback to the error log, and outputs a generic error page to the browser.

Protected Instance Variables

_handler_type

The _handler_type variable is initialised by the handler_type parameter to the __init__ method.

Protected Methods

_init(self)

Initialises the instance ready for a new request.

_write(self, s)

s: string

This is a placeholder method that must be overridden by a subclass of the Request class. It should output the string parameter s to the client as part of the response.

_flush(self)

This is a placeholder method that may be overridden by a subclass of the Request class. If the mechanism used by the subclass's implementation of _write can result in data being buffered, then this method should ensure that the data is flushed to the client.

_mergevars(self, encoded)

encoded: string

This is a utility method for the use of subclasses of the Request class. It parses the URL-encoded string parameter encoded and merges the key/value pairs found into the self.params mapping.

_mergemime(self, contenttype, encoded)

contenttype: string
encoded: file-like object

This is a utility method for the use of subclasses of the Request class. The parameter encoded must provide a file-like read method which is then used to parse a MIME-encoded input stream. contenttype should contain the value of the Content-Type header for the stream (which should presumably always indicate the multipart/form-data type). MIME sections found with Content-Disposition: form-data are merged into the self.params mapping.

_read_cgi_data(self, environ, inf)

environ: map
inf: file-like object

This is a utility method for the use of subclasses of the Request class. It examines the environment strings contained in the map parameter environ as per the standard CGI protocol. If the environ variable QUERY_STRING is available then it is parsed using the _mergevars method. If the environ variable REQUEST_METHOD is POST then the inf parameter (which must provide a file-like read method) is used to read an input stream which is passed to either the _mergevars method or the _mergemime method, depending on the environ variable CONTENT_TYPE. Finally, if the environ variable HTTP_COOKIE is available then it is parsed into the self.cookies instance variable.
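
The sequence of steps described above can be approximated with the modern Python standard library. read_cgi_data below is a hypothetical stand-alone function, not the module's method: it handles URL-encoded bodies only, returns a list of pairs instead of merging into self.params, and does not reproduce the None value for a name without an equals sign.

```python
import io
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl


def read_cgi_data(environ, inf):
    """Collect form pairs and cookies from a CGI-style environment."""
    pairs = []
    if "QUERY_STRING" in environ:
        pairs += parse_qsl(environ["QUERY_STRING"], keep_blank_values=True)
    if environ.get("REQUEST_METHOD") == "POST":
        ctype = environ.get("CONTENT_TYPE", "")
        if ctype.startswith("application/x-www-form-urlencoded"):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            pairs += parse_qsl(inf.read(length), keep_blank_values=True)
    cookies = SimpleCookie()
    if "HTTP_COOKIE" in environ:
        cookies.load(environ["HTTP_COOKIE"])
    return pairs, cookies


environ = {
    "QUERY_STRING": "greet=world",
    "REQUEST_METHOD": "POST",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": "15",
    "HTTP_COOKIE": "sid=abc123",
}
pairs, cookies = read_cgi_data(environ, io.StringIO("name=Ada&empty="))
print(pairs)
print(cookies["sid"].value)
```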

class: GZipMixIn

GZipMixIn is a class that can be mixed in to a subclass of the Request class to enable gzip compression of responses to user agents that indicate they can accept it. Make sure you specify the GZipMixIn class before the transport class in the list of base classes.

Example:

class GZipCGIRequest(cgi.GZipMixIn, cgi.CGIRequest):
  pass
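
Why the order of base classes matters can be shown with Python's method resolution order in general. The classes below are hypothetical and do not reflect which methods GZipMixIn actually overrides; they only show that a mix-in listed after the transport class is never reached.

```python
class Transport:
    """Hypothetical transport class standing in for cgi.CGIRequest."""

    def _write(self, s):
        return "raw:" + s


class UpperMixIn:
    """Hypothetical mix-in that post-processes output before sending."""

    def _write(self, s):
        return super()._write(s.upper())


class MixInFirst(UpperMixIn, Transport):
    pass


class TransportFirst(Transport, UpperMixIn):
    pass


print(MixInFirst()._write("hi"))      # mix-in runs, then the transport
print(TransportFirst()._write("hi"))  # mix-in is never reached
```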

Public Methods

gzip_level(self, level=6)

level: integer

Specifies the compression level used by gzip for this request. The default level if you do not call this method is 6. This method can also be used to disable compression for a particular request by setting level to 0 - for example if the handler is returning an image file to the user then compression should be disabled as images are already compressed. If the headers have already been output then a cgi.SequencingError exception is raised.
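
The effect of the level setting can be seen using the standard gzip module directly. This sketch does not use GZipMixIn at all, and whether level 0 in this module means "stored, uncompressed gzip data" or "no gzip encoding at all" is not stated here; the example only shows what the levels mean to gzip itself.

```python
import gzip

body = b"Hello, world!\n" * 200

stored = gzip.compress(body, compresslevel=0)   # level 0: stored, not compressed
default = gzip.compress(body, compresslevel=6)  # the documented default level

print(len(body), len(stored), len(default))
```

Already-compressed data such as JPEG images behaves much like the level 0 case whatever level is chosen, which is why spending CPU time on it is wasteful.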

class: CGIRequest(Request)

CGIRequest subclasses the Request class to implement the standard CGI protocol. Environment variables are read from os.environ, input is read from sys.stdin, output goes to sys.stdout and errors go to sys.stderr.

Public Methods

process(self)

Initialises the instance ready for a new request by calling the _init method, then reads the CGI input and sets up the various instance variables. A Handler object of the type passed to the CGIRequest.__init__ method is then instantiated and its process method is called. If an exception is thrown by this method then the traceback method is called to display it.

Example:

cgi.CGIRequest(Handler).process()

class: GZipCGIRequest(GZipMixIn, CGIRequest)

For convenience, this class provides the standard CGIRequest class with the GZipMixIn class already mixed in.

Example:

cgi.GZipCGIRequest(Handler).process()

class: Handler

This is an abstract class which should be subclassed by the programmer to provide the code which handles a request.

Public Methods

process(self, req)

req: object of type cgi.Request

This method must be overridden by subclasses. It is called to process a request. The req parameter references the Request object (actually, an instance of a subclass of Request) which should be used to inspect the request and to send the response.

Note that even in multithreaded situations such as FastCGI, any individual instance of a Handler subclass will only have one process method executing at once.

traceback(self, req)

req: object of type cgi.Request

This method may be overridden by subclasses. It is called to handle an exception thrown by the process method. The default implementation calls Request.traceback to send a traceback to the error log and output a generic error page to the browser.

class: DebugHandlerMixIn

This mix-in class provides a traceback method that outputs debug information to the browser as well as to the error log. This class may be used during development to aid debugging but should never be used in a production environment since it will leak private information to the browser.

Example:

class Handler(cgi.DebugHandlerMixIn, wt.Handler):
  pass

Public Methods

traceback(self, req)

req: object of type cgi.Request

This method may be overridden by subclasses. It is called to handle an exception thrown by the process method. The default implementation calls the module-level traceback function to send a traceback to both the error log and the browser.

class: DebugHandler(DebugHandlerMixIn, Handler)

For convenience, this class provides the standard Handler class with the DebugHandlerMixIn class already mixed in.

Example:

class Handler(cgi.DebugHandler):
  def process(self, req):
    req.set_header("Content-Type", "text/plain")
    req.write("Hello, world!\n")

Globals

Functions

html_encode(raw)

raw: any
Returns: string

HTML-encodes (using entities) characters that are special in HTML. At least & < > " and ' are guaranteed to be encoded. raw is passed to str(), so almost any type can be passed as this parameter.

Example:

>>> cgi.html_encode("<foo>")
'&lt;foo&gt;'
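
For comparison, similar behaviour is available from the modern Python standard library. html_encode_sketch is a hypothetical name, and the exact entities chosen by cgi.html_encode (for the apostrophe in particular) may differ from those produced by html.escape.

```python
import html


def html_encode_sketch(raw):
    # str() first, so that non-string values are accepted too.
    return html.escape(str(raw), quote=True)


print(html_encode_sketch("<foo>"))
print(html_encode_sketch(42))
```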

url_encode(raw)

raw: any
Returns: string

URL-encodes (using %-escapes) characters that are special in URLs. Characters that are special in HTML are guaranteed to be escaped, so the output of this function is safe to embed directly in HTML without the need for a further call to html_encode. raw is passed to str(), so almost any type can be passed in to this parameter.

Example:

>>> cgi.url_encode("<foo>")
'%3Cfoo%3E'
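
A rough equivalent using the modern Python standard library is sketched below. url_encode_sketch is a hypothetical name, and the set of characters left unescaped by cgi.url_encode may not match urllib.parse.quote exactly.

```python
from urllib.parse import quote


def url_encode_sketch(raw):
    # safe="" escapes "/" as well; str() accepts non-string values.
    return quote(str(raw), safe="")


print(url_encode_sketch("<foo>"))
print(url_encode_sketch("a&b=\"c\""))
```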

url_decode(enc)

enc: string
Returns: string

Converts + to space characters in the enc string and then decodes URL %-escapes.

Example:

>>> cgi.url_decode("%3Cfoo%3E")
'<foo>'
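
The same two steps, plus signs to spaces and then %-escapes, are what urllib.parse.unquote_plus performs in the modern Python standard library. The wrapper name url_decode_sketch is hypothetical.

```python
from urllib.parse import unquote_plus


def url_decode_sketch(enc):
    # "+" becomes a space first, then %-escapes are decoded, so an
    # escaped plus sign (%2B) survives as a literal "+".
    return unquote_plus(enc)


print(url_decode_sketch("%3Cfoo%3E"))
print(url_decode_sketch("a+b%2Bc"))
```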

traceback(req, html=0)

req: object of type cgi.Request
html: true or false value

This function should only be called while an exception is being handled (i.e. in an except section). It emits a detailed traceback about the exception to the server's error log. If html references a true value then the traceback is also sent as HTML to the browser. If html is false then the browser output is not altered in any way, so it is up to the caller to arrange for suitable output to be sent.
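
The requirement to call the function from inside an except section follows the same pattern as the standard traceback module, as the sketch below shows. log_current_exception is a hypothetical helper that writes to an in-memory stream; it illustrates the calling pattern only and is not how cgi.traceback is implemented.

```python
import io
import traceback as tb


def log_current_exception(log):
    """Write the exception currently being handled to a log stream."""
    log.write(tb.format_exc())


error_log = io.StringIO()  # stands in for the server's error log
try:
    1 / 0
except ZeroDivisionError:
    # Inside the except section the active exception is available.
    log_current_exception(error_log)

print(error_log.getvalue())
```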

$Id: cgi.html,v 0416d65875b7 2014/03/05 17:37:06 jon $