An OpenStack ATC Discovery



wsgi application debug (for chunked PUT)

import os
import sys
import cStringIO
from swift.common.ext_web import init
from swift.common.ext_web import application as swift_application

def application(environ, start_response):
    err = environ['wsgi.errors']
    print >> err, "PID: %s" % os.getpid()
    print >> err, "UID: %s" % os.getuid()
    print >> err, "GID: %s" % os.getgid()
    for key in environ.keys():
        print >> err, '%s: %s' % (key, repr(environ[key]))
    return respond_with_debug(environ, start_response)

def respond_with_debug(environ, start_response):
    err = environ['wsgi.errors']
    input = environ['wsgi.input']
    headers = [('Content-Type', 'text/plain')]
    write = start_response('200 OK', headers)
    output = cStringIO.StringIO()
    print >> output, "PID: %s" % os.getpid()
    print >> output, "UID: %s" % os.getuid()
    print >> output, "GID: %s" % os.getgid()
    print >> output
    for key in environ.keys():
        print >> output, '%s: %s' % (key, repr(environ[key]))
    print >> output
    # drain the (possibly chunked) request body; the 64KB read size is an arbitrary choice
    try:
        buf = input.read(65536)
        while buf:
            print >> err, 'Body: {0}'.format(len(buf))
            buf = input.read(65536)
    except Exception:
        print >> err, "Exception while reading {0}".format(sys.exc_info()[0])
    print >> err, 'Body:'
    write(output.getvalue())
    return []
*based on

wsgi middleware debug

import os
import sys
# the wrapped Swift application (same import path as in the application debug example above)
from swift.common.ext_web import application as swift_application

def application(environ, start_response):
    err = environ['wsgi.errors']
    print >> err, "PID: %s" % os.getpid()
    print >> err, "UID: %s" % os.getuid()
    print >> err, "GID: %s" % os.getgid()
    for key in environ.keys():
        print >> err, '%s: %s' % (key, repr(environ[key]))
    return swift_application(environ, start_response)

*based on

Client Chunked PUT

import httplib
import time

url = '/v1/myaccount/mycontainer/myobject'
chunk = "abcdefghijklmnop"
auth = 'AUTH_whatevertokenreceivedduringauthentication'
# placeholder host/port - point these at your proxy server
conn = httplib.HTTPConnection('127.0.0.1', 8080)
# send headers
conn.putrequest('PUT', url)
conn.putheader('Transfer-Encoding', 'chunked')
conn.putheader('Content-Type', 'application/octet-stream')
conn.putheader('X-Auth-Token', auth)
conn.endheaders()
# send body in chunks; each chunk is "<hex length>\r\n<data>\r\n"
conn.send('%x\r\n%s\r\n' % (len(chunk), chunk))
# a zero-length chunk terminates the chunked body
conn.send('0\r\n\r\n')
# process the response
resp = conn.getresponse()
print(resp.status, resp.reason)

Does Asynchronous I/O make sense in OpenStack?

OpenStack uses the Eventlet library. Eventlet offers a ‘standard’ threading interface to the Python programmer while being implemented with asynchronous I/O underneath. The use of asynchronous I/O, rather than standard Python threading or a multi-process approach, is aimed at reducing the memory footprint of network servers and can help an application reach high levels of concurrency.

In this post the use of asynchronous I/O is discussed in the context of OpenStack in general and Swift in particular. In many ways this discussion is broader than Eventlet and OpenStack; it may apply to other asynchronous I/O frameworks and applications.

The main claim here is that, like any technology, asynchronous I/O is ideal for certain applications but not suitable for others. If your application is memory bound, asynchronous I/O will probably help it scale. But if the application is CPU bound or I/O bound it will not – it may even scale the application down, as discussed below.

Why Eventlet
Eventlet’s use of asynchronous I/O may help solve the C10K problem presented by Dan Kegel, by reducing the memory footprint used to serve each request (see here a well presented counter view advocating the use of threading). At the same time, Eventlet offers a semi-standard pythonic threading interface which hides the involved complexity under the hood – making asynchronous I/O programming bearable and accessible to all.
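
To make the programming model concrete, here is a minimal sketch of the Eventlet idiom; the URLs and the pool size are arbitrary placeholders, not taken from any OpenStack code:

import eventlet
eventlet.monkey_patch()   # make blocking stdlib calls cooperative

import urllib2

# placeholder URLs - any set of slow network calls would do
urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']

def fetch(url):
    # reads like a blocking call, but yields to the Eventlet hub on network I/O
    return urllib2.urlopen(url).read()

pool = eventlet.GreenPool(1000)   # thousands of green threads cost little memory
for body in pool.imap(fetch, urls):
    print len(body)

Each green thread carries only a small Python-level stack, which is where the memory saving relative to OS threads comes from.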

So one good reason to use Eventlet is to reduce the memory footprint. If an application is memory bound, a reduced memory footprint would help scale it. As a prime example, a web server may serve many open sessions, each with an associated state. Most sessions may be idle at any given time(1). Some of the sessions are active, processing outstanding requests. Processing an outstanding request may require a certain amount of I/O and CPU prior to sending a response. If, at a certain concurrency level, the server is neither I/O bound nor CPU bound, increasing the level of concurrency will yield a performance improvement – i.e. more outstanding requests can be processed at any given time.

Bottom line, if an application is memory bound, Eventlet can help scale it up. But what if the application is I/O bound or CPU bound? Will asynchronous I/O scheme help? Can the use of Eventlet become a problem?

Is there a Right Concurrency Level for I/O bound Applications?
Let’s consider Swift’s Object Server as an example. It is likely that this server would be disk I/O bound. At a certain level of concurrency, the application will reach 100% I/O utilization. Increasing the concurrency level would not make our disks spin or seek faster. Instead, increasing the concurrency level further would thrash our cache and page table and, overall, spread our resources thin, reducing performance rather than increasing it.

Let’s assume an I/O bound application that can fully utilize its I/O with just 100 threads. What happens if 10,000 threads open and read from 10,000 file descriptors of 10,000 files? Since we are I/O bound, the 10,000 read requests will be queued for execution. Eventlet’s deterministic scheduler will run any of the 10,000 threads as soon as a chunk is ready to be read. So although most of the application threads are idle, by the time a thread is served its related information has probably been evicted from the cache. Moreover, the memory pages used by the thread may have been paged out – so extra CPU and I/O will be needed just to get the thread running again. This means our overall performance could severely degrade.

A similar problem occurs with the underlying file system driver in the kernel and with the underlying disk driver. Both perform poorly as we increase the application’s ‘indecisiveness’. File systems and disk drivers use their own caching algorithms and perform best with somewhat predictable users. At high concurrency levels, an application cannot take advantage of such algorithms.

The best approach for an I/O bound application is to limit the concurrency of the served requests while queuing the remaining requests. The queue lets the application absorb bursts of demand while ensuring that the system is not thrashed by trying to serve everyone at the same time. The application serializes the request handling with the help of the queue.

An application bound by I/O should restrict its concurrency level to avoid thrashing the system. Queuing can be used to absorb burstiness and serialize the request handling.

Bottom line: An I/O bound application performs best when the right level of concurrency is used – no more, no less. Finding this ‘right level of concurrency’ is system and application dependent and may be left to later tuning. Yet the design should include a reasonable default. ‘The more the merrier’ does not apply here.
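
To sketch what “the right level, no more” could look like with Eventlet itself; the pool size of 100 and the read_object helper are placeholders for illustration:

import eventlet

# 100 is an assumed default; the right value is system and application dependent
io_pool = eventlet.GreenPool(size=100)

def read_object(path):
    # placeholder for the I/O bound work, e.g. reading an object off disk
    with open(path, 'rb') as f:
        return f.read()

def handle_request(path):
    # spawn() blocks once the pool is full, so excess requests queue
    # here instead of thrashing the disks
    return io_pool.spawn(read_object, path)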

What about CPU bound applications?
A CPU bound application using Eventlet will always have multiple threads waiting to be executed. Eventlet’s deterministic scheduler serves each thread that is ready for execution in turn and without preemption. As a result:

  • Long requests are favored while short requests take longer to complete. A system serving many short requests may delay the processing of all of them until a request that takes longer to compute has completed and yielded.
  • No thread is guaranteed the CPU within any given amount of time, so timeouts cannot be enforced with any reasonable level of accuracy. As the concurrency level increases, the time to serve a given thread increases. As an example, consider the thread that waits on a socket for an incoming request (or listens for an incoming connection). This thread must read the request (or accept the connection) within a given amount of time to avoid losing data (the kernel queue is only that long). In a CPU bound system without preemption, no guarantee can be made that the thread would indeed be served on time. As we increase the concurrency level, the problem becomes more severe.

Bottom line: A CPU bound application must either avoid asynchronous I/O or be structured such that any group of green threads processes requests of similar length, placing any time-sensitive calculation in a separate thread/process so that it can enjoy the benefits of preemption.
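
One possible mitigation – a sketch, not how OpenStack is structured today – is to yield explicitly inside long computations so that short requests are not starved; expensive_step and the yield interval are placeholders:

import eventlet

def expensive_step(item):
    # placeholder for a CPU-heavy step
    return item * item

def long_computation(items):
    total = 0
    for i, item in enumerate(items):
        total += expensive_step(item)
        if i % 1000 == 0:
            # cooperatively hand the CPU back to the hub so other
            # green threads (e.g. short requests) get a turn
            eventlet.sleep(0)
    return total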

What about OpenStack Servers?
Each OpenStack server may have different bounds. Eventlet can help memory bound OpenStack servers but may reduce the scale and performance of others. Further, the use of Eventlet may be beneficial when running on one system and counterproductive on another.

A server using threading requires tuning to determine its right default concurrency level. The concurrency level should also be controllable by the admin, as different systems would gain from different concurrency levels of the same server. Note that such tuning is missing from OpenStack today.

Q: What are the applicable default concurrency-levels for Nova, Swift and other OpenStack servers, such that they would become either CPU bound or I/O bound? Are OpenStack servers really memory bound? In other words, will using CPython threading really scale OpenStack down? Is the C10K problem applicable to OpenStack?

Multicore and Multiprocessing aspects
The standard CPython threading is implemented using the underlying Linux threading services, such that every Python thread is a Linux thread. As a result, a single-process, multi-threaded Python executable can take advantage of multiple cores. That said, Python’s Global Interpreter Lock (GIL) limits the process’s ability to scale with the number of cores. Q: What is a realistic parallelism level one can expect from Python threading considering the GIL?

Eventlet implements green threads (i.e. user-land threads). Hence, an Eventlet enabled executable runs on a single core and cannot take advantage of other cores. OpenStack can be configured to run multiple workers, each forked as a separate process. Hence, when using Eventlet on an N-core machine, running N workers of any executable would help fully utilize the node’s CPUs. Note that CPython threading may suffice with fewer workers.
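
In Swift, for example, the number of worker processes is already a per-server configuration option; the values below are purely illustrative:

# /etc/swift/object-server.conf (illustrative values)
[DEFAULT]
bind_port = 6000
# roughly one worker process per core on a 4-core storage node
workers = 4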

Should OpenStack use multi-function executables?
What if an executable performs several functions – for example, one function that is memory bound, one that is I/O bound and one that is CPU bound? The memory bound function scales with increasing concurrency levels. The I/O bound function performs best at a given level of concurrency. The CPU bound function may cause the other two functions to fail to meet timeouts or to respond on time to events. What a mess.

Bottom line: Separation of concerns is needed – avoid placing functions with potentially different bounds in the same asynchronous I/O domain. It seems that the existing coupling of every OpenStack request processor with a web front end may therefore be a source of trouble.

Asynchronous I/O can scale a memory bound application but does not help if the bound is elsewhere – specifically, if the application is I/O bound or CPU bound. Further, separation of concerns needs to be maintained to ensure a separate asynchronous I/O domain per function. I/O bound functions need to be limited to the right level of concurrency, which may differ from one function to another and from one system to another.

The next steps…

  1. Collect initial inputs from the OpenStack community
  2. Collect information about the real bounds of different OpenStack request processors and the default concurrency level that should be used in the case of a CPU or I/O bound request processor.
  3. Add support to tune concurrency levels

(1)  Thanks to Nadav Harel for pointing this out.

OpenStack’s integral web front-end: pros and cons

OpenStack servers are designed with an integral web front-end – a pythonic wsgi mini “Web Server” which opens up its own socket and serves http requests directly. The incoming requests accepted by the integral web front-end are then forwarded to a wsgi application (the request processor) for further handling, possibly via wsgi middleware sub-components. See [A] in the diagram below.

A. Integral web front-end as used by OpenStack today. B. Using a full-fledged external Web Server as a Proxy Server in front of OpenStack. C. Stripped-down OpenStack request processor served directly by a full-fledged Web Server.

In view of the limited support offered by the integral web front-end, some installations may also take advantage of a full-fledged Proxy Server in front of the installation (See [B] above).

An alternative design would be to strip the integral web front-end out of the OpenStack code and use a standard full-fledged Web Server instead. Under such an alternative design, OpenStack servers can be reincarnated as a set of request processors extending any standard Web Server that includes wsgi support (e.g. Apache2), as shown in [C] above.

This post aims to collect the pros and cons of the current design – or, put differently, to help answer the question: will OpenStack benefit from adding support for working as a request processor attached to a standard WSGI server? The discussion examples will mainly concern Swift, as this is becoming a main focus for me these days, but can be generalized to other OpenStack packages.

Adam Young started this thread of thought as related to Keystone. Joshua Harlow responded by suggesting that: “…it should be possible for all WS endpoints to be hosted in apache (or other server)… This might be connected to extracting/abstractig out eventlet…“. This is in line with the subject of this post. Views from other good people were collected and are summarized below.

The OpenStack integral Web Front-End
The integral web front-end used by OpenStack is the Eventlet wsgi one. Eventlet offers a ‘standard’ threading interface to the Python programmer while being implemented with asynchronous I/O underneath. See here a post discussing whether asynchronous I/O makes sense in OpenStack.

A “Web Server” per Request Processor

Each OpenStack server has its own integral web front-end, each working on a single port. For example, an integrated Swift node may have four web front-ends: one for the Swift Proxy Server, one for the Swift Account Server, one for the Swift Container Server and one for the Swift Object Server, each working on a dedicated port. One may decide to place these servers on separate nodes (e.g. the Swift Proxy Server on an Interface Node and a Storage Node running the remaining servers, each with its integral Web Server).

A single OpenStack Node may include multiple servers, each with its own integral web front-end.

A Web Server per Node
It is important to note that this design of plural web front-ends is a design choice and may be replaced by a single standard full-fledged Web Server (e.g. Apache2). In such a case, Swift’s code would be stripped of the eventlet wsgi web front-end and would run as request processors off the full-fledged Web Server (e.g. using mod_wsgi) – a single Web Server for all request processors running on the Node. If anyone decides to run a non-Swift OpenStack request processor (e.g. Keystone) or any non-OpenStack application, he/she may re-use the same Web Server for all of them. There is nothing new in the alternative design presented here – it is a standard way of building web applications. Hence a comparison to the design choice made in OpenStack is in order.

An alternative Swift node design using a full-fledged Web Server acting as a WSGI Server for the node request processors
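
As an illustration of [C], the WSGI script that Apache2/mod_wsgi serves might look roughly like the sketch below; the file location and the use of paste.deploy to load the application are my assumptions, not an existing Swift recipe:

# /var/www/swift/proxy-server.wsgi (hypothetical location)
from paste.deploy import loadapp

# Swift servers are configured through paste.deploy ini files, so the
# request processor can be loaded from the existing configuration file
application = loadapp('config:/etc/swift/proxy-server.conf')

Apache2 would then map a URL prefix to this script with a WSGIScriptAlias directive and control the number of processes and threads through WSGIDaemonProcess.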

One outcome of this alternative design is that it is up to the server whether asynchronous I/O is used or not. Apache2 would use a combination of multiprocessing and multithreading, but no asynchronous I/O. As discussed in the post about asynchronous I/O, some OpenStack request processors may benefit from asynchronous I/O while others may best avoid it.

Note that when using a server like Apache2, which uses CPython threads, Eventlet’s green threads can still be used if multiple threads are needed to process a request. In this case asynchronous I/O is used within the scope of a single request (there is no cooperation between threads of different requests).
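
A sketch of what that might look like – the backend URLs are placeholders and this is not existing Swift code:

import eventlet
from eventlet.green import urllib2   # green (non-blocking) flavor of urllib2

# placeholder backend URLs queried while handling a single request
BACKENDS = ['http://127.0.0.1:6002/healthcheck', 'http://127.0.0.1:6001/healthcheck']

def application(environ, start_response):
    pool = eventlet.GreenPool()
    # these green threads cooperate only within this one request;
    # concurrency across requests remains the Web Server's business
    bodies = list(pool.imap(lambda url: urllib2.urlopen(url).read(), BACKENDS))
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['\n'.join(bodies)]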

Separation of Concerns
Each OpenStack server includes a dedicated request processor with a given function. Coupling the request processor function with that of a web front end forces the design to become a compromise. Take for example the concurrency level required by each server:

  • The web front end needs to be highly concurrent so that any incoming TCP connection is answered before the underlying socket queue becomes full. While the web front end maintains all open client sessions (each as a thread), many of these sessions may be idle. After a response is sent back to the client, it is customary to keep the TCP session open, allowing the client to send its next request(1).
  • The request processor, on the other hand, is session-less (as it is REST based) and should not maintain data about any request that has completed. This is important to help scale the request processor. Further, if the request processor is I/O bound, it is important to avoid thrashing it with more requests than it can process (see this post for more details). An adequate design would therefore queue the remaining requests rather than process them all concurrently. As shown in the next diagram, to help the request processor scale, it should concurrently process only a subset of the outstanding requests, which is in turn a subset of the open TCP sessions.

Ideally, the requests being processed by the request processor are a subset of the outstanding requests, where the remaining requests are queued to avoid thrashing. The outstanding requests are a subset of the open TCP connections.

Today, OpenStack servers put everything into one big asynchronous I/O pool. This means that the web front end, which is coupled in the same process with the request processor, shares the same asynchronous I/O domain. As a result, the two functions share the same concurrency level and there are no measures to ensure that each one has the right level of concurrency for best performance. The problem is even more severe if a server becomes CPU bound.

Q: How can one ensure that while the web front end runs as many concurrent requests as it can, the concurrency level of the request processor is controlled? Q: Should a queue be added to OpenStack’s common/ implementation?
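
One possible shape for such a control, sketched as WSGI middleware; the class, its name and the default limit are hypothetical, not something OpenStack ships today:

import eventlet.semaphore

class ConcurrencyLimiter(object):
    """Hypothetical middleware capping how many requests reach the request processor."""

    def __init__(self, app, limit=100):   # 100 is an arbitrary default
        self.app = app
        self.sem = eventlet.semaphore.Semaphore(limit)

    def __call__(self, environ, start_response):
        # requests beyond the limit queue on the semaphore instead of
        # competing for the request processor's resources
        with self.sem:
            return self.app(environ, start_response)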

Which is a better design overall?
This $1M question depends on many factors including performance, manageability, security, scalability and other customer requirements. Beyond the technical aspects there may be other considerations (e.g. enabling a customer to gain better control of the service). One of the reasons for putting out this post is to get people to consider the implications of the two designs.

Which is a better design technically?
Different factors come into play here. The current design seems best tuned for memory bound servers that can suffice with a single core and seek to service as many requests as possible concurrently. I/O bound applications may resort to thrashing under the current approach and could potentially scale better with the presented alternative approach. The patterns used by each application server may differ significantly. Hence, the right level of concurrency should be tuned per server in order to gain 100% utilization of each of the servers. Measures need to be designed in to allow such tuning and control.

One significant concern is that there seem to be no known measures to determine the scalability or resulting performance of the two designs. Further, a search in Google Scholar for “greenlet eventlet performance” yields zero results. Eric Day posted results that include an Eventlet micro-benchmark, yet it is unclear how these results can be compared to the alternative node design discussed here. The man behind mod_wsgi, Graham Dumpleton, is of the view that one can’t meaningfully compare two web servers detached from the specific application requirements. A free translation to our case could be: we need to support alternative wsgi servers as part of OpenStack so that people could evaluate which is the right/best server for their specific app. – {TBD ask Graham if this aligns with his view}.

OpenStack’s extendability
Adam Young raised the issue of using a standard Web Server instead of the integral web front-end for OpenStack’s Keystone. In his note he brought up an interesting argument: Eventlet’s asynchronous I/O design is completely dependent on removing any blocking code from the application. As long as the code is written in Python, Eventlet has trickery to monkey-patch it. This is likely to become a limiting factor for the use of DB drivers written in C and accessed via a Python interface. In fact, any C extension code may block, such that OpenStack’s extendability may become an issue.
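
To illustrate the distinction, a small sketch; pushing the native call through Eventlet’s tpool is one possible workaround, not something OpenStack does today:

import eventlet
from eventlet import tpool

eventlet.monkey_patch()   # patches blocking calls made from Python code

def call_native_driver(driver_func, *args):
    # A call into a C extension bypasses the monkey patching and would
    # block every green thread in the process.  tpool.execute() runs it
    # in a real OS thread, letting the hub keep scheduling green threads.
    return tpool.execute(driver_func, *args)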

Using a Proxy Server
In view of the limited feature support offered by the current OpenStack integral web front-end, it was suggested that administrators may deploy a Proxy Server in front of the OpenStack service. Yet it is unclear whether such a Proxy would still be needed/used if a per-node full-fledged Web Server were used instead of the current per-server web front-end.

The next steps…

  1. This post is intended to be updated with any new information gathered on the subject.
  2. Adam Young posted a description of how to use Keystone with httpd. I will be posting a description of how to use Swift with Apache2.
  3. As I work on attaching Swift as a request processor to Apache2, I am encountering some non-standard behavior of the Swift wsgi application – these issues will hopefully be pushed back as bug fixes.
  4. Some comparative performance measurements are needed to gain understanding of specific examples.
  5. Push for a blueprint to add such support: Swift first (and, if the commonality is reasonable, other servers as well).

(1) Thanks to Nadav Harel for pointing this out.


3, 2, 1, Launching…

As a first stage, this blog will be used for covering OpenStack open source activities.

Specifically, my current interest is in Swift, a well designed, scalable object store. As I explore the Swift code, I will be using this blog to dive into the design choices and summarize some of the discussions I am having with others in the field on the subject matter.

The intent of each post is to dive deeply into one topic, be it a design choice, an issue with the solution, a limitation, a consequence, etc. I will do my utmost to continuously update my posts as I am hit with new understandings and additional views from colleagues.

A word of warning: everything within this blog is provided as is and expresses my own understanding, views and thoughts, plus a free-style summary of views made by others – nothing more. This blog is not to be considered an authoritative source of information for anything but my name and my poor English.

Lastly, I am having fun putting this stuff out, so don’t take anything written too seriously, or you may fail to see the forest for the trees.