Frameworks part 2: server talks to framework – mod_php, mod_wsgi and friends

What happens behind the scenes when you unpack a web framework like Django, CakePHP or CodeIgniter, as we did in the previous post, and you see the welcome message in your browser? How does the server get the framework to do the job it’s supposed to do? This post is about that.

(The Warriors of the Net video)

Have you seen Tron? The original 1982 movie introduced a very entertaining way of representing computer programs as little men running around busily inside the computer, programmed by the creative types and doing things for the users. For the sake of this post we are sometimes going to imagine that the machine running the web server is that computer, that the web framework is the program, and you are going to be both the creative and the user. We’ll be using the Tron metaphor to understand the very basic dynamics between web servers and web frameworks. And since I’m a girl, all the little characters inside the server will be little program girls too.

In Tron, data gets transported through data disks that we are going to refer to as frisbees. When a user types a URL into the browser, she is launching a frisbee (an HTTP request). The browser knows that it’s a web frisbee, so she’ll have the web server deal with it. The web server is a powerful program, so she delegates a lot. Web scripts written in PHP, Python or Perl are industrious little girls that do a lot of work, but they can’t do it on their own because they are just text, so they need to be interpreted before they can be run. A recipe can’t produce anything edible without someone to cook the meal. Which means there are other little Tron girls involved, interpreters, which look at the recipe, go through the steps, and output the results. Sometimes these girls are part of the Apache inner team: modules, like mod_php. Other times Apache uses gateway interface modules like mod_cgi and mod_wsgi, which hand the request to a program through a standard interface and bring back the results. That’s the basic idea. CakePHP and CodeIgniter use mod_php. Django is written in Python, so mod_php can’t help, and mod_wsgi is the preferred option.

BTW, for Tron like animated tutorials about how the Internet works, check out the Warriors of the Net video (available in different languages), embedded above.

THE POST INDEX

The bottom line – from static to dynamic to frameworks
URL handling
CakePHP, say hi
Mod_rewrite is a bunch of programs forwarding frisbees around
CodeIgniter, still the PHP way
WSGI – the server talks to Python and Django
Conclusion

THE BOTTOM LINE – FROM STATIC TO DYNAMIC TO FRAMEWORKS (back to index)

So, the bottom line is: a web framework is not a special web server handling requests on its own. It is just a dynamic website written in a certain way, and it needs a server like any other web page does. Django does ship with a little web server, but that is only for development and playing around. It is not so much a part of Django as something sent along with Django to help the user get it up and running sooner. In order to understand dynamic web pages, we need to understand the static ones, so skip the next paragraph if you already know all that. And skip the paragraph after that if you already understand dynamic web pages too. After that we’ll be dealing with the web frameworks as such.

With a static web page, you type a URL into a browser, say http://www.example.com/content/page/1, and the browser contacts the server hosting the website http://www.example.com. The server then looks into the /content/page/ subdirectory of the “document root”, the folder where the web accessible material is held, like /var/www/html. Basically, in this case, the server would look for a file or a directory called 1 in the /var/www/html/content/page folder. This file or folder will be text, an image, HTML or something similar, already existing on the server in its definitive form, and it is shipped to the user as found. The Tron version is that the user launches a frisbee, the web server takes the frisbee and reads what the user wants, then goes into the storage room where she finds a ready package and ships it to the user. That is another frisbee, the response one. Static pages are always the same: if you type in the same URL you will always get the same file, read from the disk just as it is, until someone edits the file on the server.
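To make the lookup concrete, here is a minimal Python sketch of how a URL path could be mapped onto a file under the document root. The function name static_path is made up for illustration, and a real server does much more (security checks, directory indexes, MIME types), so take it as a sketch of the idea only.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# The example document root from the text.
DOCUMENT_ROOT = PurePosixPath("/var/www/html")

def static_path(url, document_root=DOCUMENT_ROOT):
    """Return the filesystem path a static server would look up for this URL.

    Hypothetical helper: it ignores things a real server must handle,
    such as ".." segments and directory index files.
    """
    url_path = urlsplit(url).path        # "/content/page/1"
    relative = url_path.lstrip("/")      # "content/page/1"
    return document_root / relative

print(static_path("http://www.example.com/content/page/1"))
# /var/www/html/content/page/1
```

The host name only tells the browser which server to contact. Everything after it is what the server uses to find the file.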

With a dynamic website, the web pages and sometimes even images are created dynamically, in real time, every time the user asks for a page. This usually happens when the user requests a file that contains a PHP, Python or Perl script, which gets interpreted and run, and the output (text, HTML, images, etc.) gets shipped to the user. In Tron speak, instead of a storage room imagine a factory, where other programs assemble the package that is sent to the user, personalized for the user, every time. Think webmail: every time you log in, the programs on the server manage to find your mailbox for you and fetch the latest e-mails. It would be impossible for the website to statically contain all the e-mails you are ever going to receive, in hand-written HTML.
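As a rough illustration of the factory idea, this Python sketch builds the HTML for an inbox at request time instead of reading a finished file from disk. The MAILBOXES dictionary and the render_inbox function are invented for the example. A real webmail application would talk to a mail store or a database.

```python
from html import escape

# Hypothetical in-memory "mail store", standing in for a real database.
MAILBOXES = {
    "alice": ["Lunch on Friday?", "Your invoice"],
    "bob": [],
}

def render_inbox(user):
    """Assemble the inbox page for one user, fresh on every call."""
    subjects = MAILBOXES.get(user, [])
    items = "".join(f"<li>{escape(subject)}</li>" for subject in subjects)
    return f"<h1>Inbox for {escape(user)}</h1><ul>{items}</ul>"

print(render_inbox("alice"))
# <h1>Inbox for alice</h1><ul><li>Lunch on Friday?</li><li>Your invoice</li></ul>
```

The same function produces a different page for every user, and a different page for the same user as soon as new mail arrives.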

URL HANDLING (back to index)

A web framework is a tool for building dynamic web sites. A web site created with a web framework is still just a dynamic website, only it’s created in a certain way, built around a framework ‘skeleton’. So, when we download and unpack the framework, we are creating another dynamic website, built around the framework. The web server acts as it normally does: it handles the user requests, sees that the requests are the framework’s business and delegates the dynamic page creation to the framework. Details depend on the framework. The first thing that a web framework does is contact a URL handler.

URL handling is the method of associating patterns of URLs with specific behaviors. So we can define behaviors such as: when the framework gets the URL http://www.example.com/blog/post/1, the framework which “lives” on http://www.example.com is programmed to look into the database, and if a post exists that has an id of 1, it generates a web page displaying the contents of that post. Similarly, you can associate any number of patterns to any number of behaviors. I’ll be explaining the details of URL patterns in another post. This one is about what happens right before that, about the server contacting the framework to get the work done, before the framework analyzes the URL and decides what to do.
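Here is a small Python sketch of that idea: a list of URL patterns, each tied to a function that produces the page. The names POSTS, show_post, list_posts and handle are assumptions made for this example, and no framework spells it exactly this way, but Django, CakePHP and CodeIgniter all do something along these lines.

```python
import re

# Stand-in for the database table of blog posts.
POSTS = {1: "Hello, framework!"}

def show_post(post_id):
    """Behavior for /blog/post/<id>: display one post if it exists."""
    post = POSTS.get(int(post_id))
    if post is None:
        return "404 Not Found"
    return f"<h1>Post {post_id}</h1><p>{post}</p>"

def list_posts():
    """Behavior for /blog/: list the ids of all posts."""
    return "Posts: " + ", ".join(str(post_id) for post_id in sorted(POSTS))

# Each entry associates a URL pattern with a behavior.
URL_PATTERNS = [
    (re.compile(r"^/blog/post/(\d+)$"), show_post),
    (re.compile(r"^/blog/$"), list_posts),
]

def handle(path):
    """Find the first pattern that matches the path and run its behavior."""
    for pattern, handler in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return handler(*match.groups())
    return "404 Not Found"

print(handle("/blog/post/1"))
# <h1>Post 1</h1><p>Hello, framework!</p>
```

The captured part of the URL (the 1) ends up as an argument to the function, which is how the id travels from the address bar to the database lookup.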

How do the server and the framework get in touch?

CAKEPHP, SAY HI (back to index)

CakePHP and CodeIgniter work with a proper server like Apache right away, because PHP is almost always used with a web server. The mod_php module of the Apache web server is normally installed, so Apache has no problem talking to PHP applications. In Tron terms, this means that the Apache web server program has a team of experts called modules, programs working for her on specific tasks. One of these experts, mod_php, does the job of having PHP code interpreted and run for her.

So, the framework contains PHP scripts, and Apache has mod_php interpret and run them. The clever thing about how frameworks are organized is that they create a single point of entry that all the users’ orders pass through. So, instead of having the server access scripts like displaypost.php and listposts.php directly, the server just talks to a script called index.php, which looks at the user’s request and tells the controller (or the view, if it’s Django) what to do. The controller is the part of the Model View Controller pattern that organizes the other parts. It gets the logic and data from the Model and then has the View help with displaying that to the user. With CakePHP, the server calls the index.php script, and index.php calls the controller. That is why you sometimes see index.php show up in URLs, with parameters passed to it which tell it exactly what to do. With PHP you can have index.php in the URL, or you can avoid that by using the Apache module mod_rewrite. You won’t see index.php in the URL, but it will still be working for you in the background. In Tron terms, the idea is that not all URLs are what they appear to be, and Apache has this little helper called mod_rewrite to explain what the user really means when she says I want this or that.

After downloading and unpacking CakePHP you get a welcome page right away. But it may or may not be nice and colorful with working CSS style files. If it’s not nice and colorful you’ll find yourself reading the manual page about mod_rewrite and .htaccess files, and maybe wondering what on Earth is going on. If you don’t want to bother with all that right away, you can just delete (or rename) the .htaccess files in three locations and uncomment a line in the CAKE/app/config/core.php file that goes:

Configure::write('App.baseUrl', env('SCRIPT_NAME'));

As the manual explains: “This will make your URLs look like http://www.example.com/index.php/controllername/actionname/param rather than http://www.example.com/controllername/actionname/param.” If that’s ok for you, you can skip the next section of this post. Otherwise, you can configure this frisbee forwarding module, so that Apache can have another way of understanding where it keeps its CakePHP stuff, including the CSS files. This is explained in this part of the CakePHP online manual.

MOD_REWRITE IS A BUNCH OF PROGRAMS FORWARDING FRISBEES AROUND (back to index)

Mod_rewrite basically throws request frisbees around your server. In Tron terms, it’s just a bunch of little programs who have this fun job of waiting for a frisbee to come their way so they can hurl it to other little programs. Mod_rewrite is an Apache module, like all the mod_something bunch. You can find a nice mod_rewrite tutorial here, which I found through an answer Nandizzly gave someone on Stack Overflow, the Q&A site for coders. Basically, if a user asks for a certain file, mod_rewrite can tell the server to send out another file instead, while the URL in your browser still looks the same. Mod_rewrite is clever: it can have conditionals, so you can tell it to substitute the file only under certain conditions, for instance if the requested file does not exist. And mod_rewrite can also add stuff to the URL along the way, like values passed with GET, say a question mark followed by name=value. Values like these are why you can bookmark search engine results: all the useful information about what you are searching for is in the URL.
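The name=value part is easy to see with Python’s standard library. The search URL below is made up for the example. The point is only that everything the server needs to repeat the search travels inside the URL itself.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical search URL with two GET parameters.
url = "http://www.example.com/search?q=tron&page=2"

query_string = urlsplit(url).query   # "q=tron&page=2"
params = parse_qs(query_string)      # each name maps to a list of values

print(params)
# {'q': ['tron'], 'page': ['2']}
```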

In this CakePHP scenario, we have two index.php files. One is in the directory where your server would normally go for CakePHP files, like /var/www/html/cake_1_3 under the document root, and it is used when mod_rewrite is not working. The other is in app/webroot, and it is the one mod_rewrite leads to. If you have modified your httpd.conf file so that Apache uses mod_rewrite in that directory, and if you have kept the .htaccess files, something else happens. Imagine a user wants the blog post with an id of 1 on the http://www.example.com website. Without mod_rewrite, the user asks for http://www.example.com/cake_1_3/index.php/blog/id/1 or something like that, depending on the configuration. With mod_rewrite, the user asks for http://www.example.com/cake_1_3/blog/id/1 or something similar. What happens? There is no directory /var/www/html/cake_1_3/blog, no directory /var/www/html/cake_1_3/blog/id and no file or directory /var/www/html/cake_1_3/blog/id/1. And still everything works. How come? The server looks into the /var/www/html/cake_1_3/ directory and runs into a .htaccess file that contains the following rules:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^$ app/webroot/ [L]
RewriteRule (.*) app/webroot/$1 [L]
</IfModule>

That means that all users requesting http://www.example.com/cake_1_3/ with no characters added to it, instead of being served /var/www/html/cake_1_3, will be served /var/www/html/cake_1_3/app/webroot. If the request is for the same URL with some characters after that, like http://www.example.com/cake_1_3/blog/id/1, the server looks for /var/www/html/cake_1_3/app/webroot/blog/id/1. Now that still doesn’t exist. No problem. The app/webroot directory has an .htaccess file of its own, which contains:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
</IfModule>

So if the thing we are looking for is neither a file nor a directory, and this applies to /var/www/html/cake_1_3/app/webroot/blog/id/1 which does not exist, the server will call on app/webroot/index.php and attach parameters to it with ?url=$1, where $1 is the part of the URL that goes after http://www.example.com/cake_1_3/, which in this case is blog/id/1. That part will be used by the framework to figure out what to do. In this case, it will use the controller called blog, which contains the function called id, which will be called with the parameter 1. The controller is the part of the framework that coordinates all the parts. In this case it will consult the model to get the data for the blog post with the id of 1, and then use an appropriate view to display that on the screen.
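The whole chain can be imitated in a few lines of Python. This is only a simulation of the two .htaccess files quoted above, not how Apache or CakePHP are implemented. The function names and the set of existing files are invented for the example.

```python
def root_htaccess(path):
    """Mimic cake_1_3/.htaccess; path is relative to cake_1_3/."""
    if path == "":                    # RewriteRule ^$ app/webroot/
        return "app/webroot/"
    return "app/webroot/" + path      # RewriteRule (.*) app/webroot/$1

def webroot_htaccess(path, existing):
    """Mimic app/webroot/.htaccess; path is relative to app/webroot/.

    `existing` is a hypothetical set of files and directories that really
    exist there, standing in for the !-d and !-f conditions.
    """
    if path in existing:
        return path                   # real files (CSS, images) are served as-is
    return "index.php?url=" + path    # everything else goes to the framework

def route(url_param):
    """Split the url parameter into controller, action and parameters.

    Assumes at least a controller and an action are present.
    """
    controller, action, *params = url_param.split("/")
    return controller, action, params

existing = {"index.php", "css/style.css"}   # hypothetical webroot contents

step1 = root_htaccess("blog/id/1")
step2 = webroot_htaccess(step1[len("app/webroot/"):], existing)
print(step1)                # app/webroot/blog/id/1
print(step2)                # index.php?url=blog/id/1
print(route("blog/id/1"))   # ('blog', 'id', ['1'])
```

Note how a request for css/style.css would pass through the second step untouched, which is why the style files start working once mod_rewrite is set up correctly.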

These examples were just meant to explain the basic server-framework interaction. Of course, you can get better URLs than that, but that will be discussed in future posts.

CODEIGNITER, STILL THE PHP WAY (back to index)

CodeIgniter is similar. There is an index.php in the URL and you can remove it with the use of an .htaccess file. The online documentation suggests rewrite rules similar to:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond $1 !^(index\.php|images|robots\.txt)
RewriteRule ^(.*)$ index.php/$1 [L]
</IfModule>

I’ve added the IfModule part and removed a slash before index to make it work (since the .htaccess file was in the same directory as index.php). I’m unsure about the slash part.

That means that the server treats any string in the URL after http://www.example.com/CodeIgniter/ that doesn’t start with index.php, images or robots.txt as if it were index.php/ followed by that string, without index.php having to appear in the browser’s address bar.
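The same rule can be tried out in Python with a regular expression. This is a simulation for understanding the rule, with a made-up function name, and it says nothing about how Apache evaluates rules internally.

```python
import re

# Same pattern as in the RewriteCond line: paths that start with one of
# these names are left alone.
EXCLUDED = re.compile(r"^(index\.php|images|robots\.txt)")

def rewrite(path):
    """Imitate the CodeIgniter rule; path is relative to the CodeIgniter directory."""
    if EXCLUDED.search(path):
        return path                   # condition fails, the rule is skipped
    return "index.php/" + path        # RewriteRule ^(.*)$ index.php/$1

print(rewrite("blog/view/1"))         # index.php/blog/view/1
print(rewrite("images/logo.png"))     # images/logo.png
print(rewrite("robots.txt"))          # robots.txt
```

The exclusion of index.php itself matters: without it, the rewritten path would match the rule again and be rewritten over and over.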

The CodeIgniter documentation doesn’t explain the whole mod_rewrite situation. I wonder if anyone was confused because of that.

WSGI – THE SERVER TALKS TO PYTHON AND DJANGO (back to index)

How does the web server talk to Django and pass it the requests coming from the users’ browsers?

The recommended way is WSGI. Previously, mod_python was also used, but that appears to be outdated. There is also the CGI variant, but Django much prefers the WSGI (there are discussions about this on Stack Overflow, like this one). What do all these acronyms mean?

With modules like mod_php, Apache has a module that interprets and runs PHP scripts internally. With mod_cgi, it calls on an external program, such as a standalone PHP interpreter, that does that. The GI letters in CGI and WSGI stand for Gateway Interface. They are different interfaces that create gateways between servers and scripts that generate web pages dynamically. C stands for Common, WS stands for Web Server, and then there’s FastCGI as well.

Wikipedia explains that “Historically Python web application frameworks have been a problem for new Python users because, generally speaking, the choice of web framework would limit the choice of usable web servers, and vice versa. Python applications were often designed for either CGI, FastCGI, mod_python or even custom API interfaces of specific web-servers.” (2011-02-04)

SOME LINKS:

http://docs.djangoproject.com/en/dev/howto/deployment/modwsgi/
http://code.google.com/p/modwsgi/
http://www.wsgi.org/wsgi/, http://www.wsgi.org/wsgi/Learn_WSGI
http://www.python.org/dev/peps/pep-0333/
http://stackoverflow.com/questions/3319545/mod-wsgi-mod-python-or-just-cgi
http://www.electricmonk.nl/docs/apache_fastcgi_python/apache_fastcgi_python.html
http://grok.zope.org/documentation/tutorial/installing-and-setting-up-grok-under-mod-wsgi/installing-and-configuring-mod-wsgi

WSGI is how web servers talk to Python applications, such as Django based websites. The user opens a browser, types in a URL and presses enter, the browser sends a request to the server, and the server uses the WSGI gateway interface to run the Django application. The Apache module, mod_wsgi, does not know anything about Django itself. It hosts a Python interpreter, either inside Apache’s own processes or in separate daemon processes that it manages, and passes each request to the application through the standard WSGI interface. In Tron terms, instead of Apache having a Django expert in her team of experts, she has an expert who speaks a standard language that any Python web application understands.
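On the Python side, the interface is surprisingly small. Below is a minimal WSGI application in the shape described by PEP 333 (and PEP 3333, its update for Python 3). The body text is made up for the example. Django provides a callable like this for you, so this sketch only shows what the server expects to find.

```python
def application(environ, start_response):
    """A minimal WSGI application.

    The server calls this once per request. `environ` is a dictionary
    describing the request, and `start_response` is a function the server
    provides for sending the status line and the headers.
    """
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")
    body = f"{method} request for {path}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]   # an iterable of byte strings
```

Any server that knows how to call a function like this can run any application that provides one, which is the whole point of having a common interface.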

In order to make that work, you need to install the mod_wsgi Apache module. Then you need to add a line to the httpd.conf file that goes something like this, as Django online documentation suggests:

WSGIScriptAlias / /path/to/mysite/apache/django.wsgi

“This tells Apache to serve any request below the given URL using the WSGI application defined by that file.” The django.wsgi file is a script which explains the details and gets the WSGI mechanism running.

The development server you use when just trying out Django does the same thing. If you want to take a peek at the source code, you can start in django/core/, with the management/commands/runserver.py and handlers/wsgi.py files. All of that is in your site-packages directory (see the previous post in the series).
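To get a feeling for what a development server does, here is a sketch using wsgiref from the Python standard library. This is not Django’s runserver code, just a small stand-in. The names hello_app, call_like_a_server and serve, and the host and port, are assumptions for the example.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    """A tiny WSGI application standing in for a Django site."""
    body = b"Hello from a WSGI app"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_like_a_server(app, path="/"):
    """Do by hand what a WSGI server does for one request."""
    environ = {}
    setup_testing_defaults(environ)   # fill in the keys a real server would provide
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        seen["headers"] = headers

    body = b"".join(app(environ, start_response))
    return seen["status"], body

def serve(host="127.0.0.1", port=8000):
    """Run the application on a small development server until interrupted."""
    with make_server(host, port, hello_app) as httpd:
        httpd.serve_forever()

print(call_like_a_server(hello_app))
# ('200 OK', b'Hello from a WSGI app')
```

Calling serve() and opening http://127.0.0.1:8000/ in a browser should show the same greeting. Like Django’s own development server, this kind of server is meant for trying things out, not for a live site.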

CONCLUSION (back to index)

Framework based websites/applications are just like any other dynamic website/application. With PHP, Apache has a module that interprets the scripts. With Python, it uses a module that passes requests to the application through the WSGI interface.

Corrections and comments are useful and welcome. I’m still learning. I’ll always be learning.

THE WHOLE SERIES, SO FAR:

0) No More Static Web Sites (learning from mistakes #1)
1) part 1 : downloading – planting the framework tree
2) part 2: how does the server talk to the framework?
3) Django, CakePHP and Codeigniter, part 3: Models, data, relationships and foreign keys

About apprenticecoder

My blog is about me learning to program, and trying to narrate it in interesting ways. I love to learn and to learn through creativity. For example I like computers, but even more I like to see what computers can do for people. That's why I find web programming and scripting especially exciting. I was born in Split, Croatia, went to college in Bologna, Italy and now live in Milan. I like reading, especially non-fiction (lately). I'd like to read more poetry. I find architecture inspiring. Museums as well. Some more than others. Interfaces. Lifestyle magazines with interesting points of view. Semantic web. Strolls in nature. The sea.
This entry was posted in frameworks, tutorials.
