Milan’s public transport (atm-mi) path finder simplified interface, PHP script (one week scripting exercise n. 2)

I’ve written a script to create a simpler interface for Milan’s public transport (atm-mi) path finder web site. It took me a few days, and much of that time went into figuring out how to deal with exceptional cases (imprecise addresses etc.).

I even managed to make a small improvement to the original interface. When the user types an imprecise address, the website offers a select of similar addresses, but loses the street number. I, instead, grab the street number and add it to each option in the select.

It is very interesting to explore the Internet this way! With a few lines of code (or in other ways), you can take the information and put whatever interface you want on it.

My atm-mi public transport path finding simplified interface PHP script on a mobile phone screen

I used some regular expressions this time, but I also learned an important lesson about them: it is important to know regular expressions, but just as important to know how to avoid them when they are not necessary and a simple string function can do just as well. I’ve also learned that complex regular expressions need to be optimized if you don’t want them to crash the server.

The Internet is huge and full of useful information, and we access that information in all kinds of ways. Not all ways are equally practical, though. For instance, I tried using the site of Milan’s public transport company (http://www.atm-mi.it) to find a way of getting from point A to point B. It took a long time to load over my mobile phone’s Internet connection, having to load images and Google Maps and all, and all that complexity also made it harder to navigate. So I’ve written a small PHP script that creates an alternative, very minimal interface for the same purpose. It still contacts the atm-mi.it website, but only downloads the HTML text. Then it gets the relevant part of the page, and displays only that. Sounds simple. And it was. So I took a couple of days to write it, and decided that it’s going to be my ‘script in a week (or less)’ exercise number 2.
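The "get the relevant part of the page" step can be sketched with plain string functions. This is a Python illustration, not the PHP code of the script itself: the function name extract_between, the sample page and the markers are all made up, and the download step is left out.

```python
def extract_between(html, start_marker, end_marker):
    """Return the part of html from start_marker to end_marker (both included).

    Returns None if either marker is missing.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    end = html.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    return html[start:end + len(end_marker)]


# A made-up page standing in for the downloaded HTML text.
sample_page = (
    "<html><body><img src='map.png'>"
    "<div id='result'>Take tram 1 to Cairoli.</div>"
    "<script>heavy()</script></body></html>"
)

fragment = extract_between(sample_page, "<div id='result'>", "</div>")
print(fragment)
```

Only the fragment is then shown to the user; the images and scripts around it are never requested.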

Like the one-week-scripting exercise number one, this one also deals with “powerbrowsing” and web scraping. Once upon a time the Internet was the Internet, where the information moved around hectically through networks, and the Web was the Web, a nice place you visited, more or less nice and more or less static. Things have changed. Sometimes your e-mail, which used to travel to you, now appears in the browser – seemingly part of the Web, and you have to go to it instead of waiting for it to come to you. Other times the web pages come to you, through the RSS feeds.  There are many ways of accessing the information on the Internet, and you can mix and match flows of data, filter them, create different interfaces and access points, through the command line or the browser or the mail or any other way.

Where did I start in creating this script? First I wanted to see what happens behind the scenes when I click around the page I was interested in, so I used the ‘Live HTTP Headers’ add-on for Firefox. It lets me discover what the browser and the server are telling each other. When the browser asks the server for a web page, it usually uses a GET request, which should be used when we only want to read from the server and not write to it (no side effects). With a GET request we can send some values to the server, but they are sent as part of the URL. With a POST request, we can send more data to the server, in the request body. I went to the atm-mi site and saw that when I insert two addresses and click ‘Calcola’ to get the path that leads from one to the other, it sends a complex POST request to the server, to “http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx?bwid=778906ea-dbac-4d25-a4fe5cd41fc837d0-4d262339&wbt=nav&contextname=778906ea-dbac-4d25-a4fe5cd41fc837d0-4d262339&ORIGINE=milano&DESTINAZIONE=milano&sthm=via%20dante%201&edhm=corso%20sempione%2014”, with a lot more data in the request body. That was confusing. So was the view of the web site’s source code.
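To see which values travel in the URL itself, the query string of the request above can be split into its parameters. This is a small Python sketch using the standard urllib.parse module, not part of the PHP script; it only looks at the URL and says nothing about the extra data in the POST body.

```python
from urllib.parse import urlsplit, parse_qs

# The URL of the POST request quoted above.
url = (
    "http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx"
    "?bwid=778906ea-dbac-4d25-a4fe5cd41fc837d0-4d262339&wbt=nav"
    "&contextname=778906ea-dbac-4d25-a4fe5cd41fc837d0-4d262339"
    "&ORIGINE=milano&DESTINAZIONE=milano"
    "&sthm=via%20dante%201&edhm=corso%20sempione%2014"
)

# parse_qs decodes %20 back into spaces and maps each name to a list of values.
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    print(name, "=", values[0])
```

The two addresses show up as sthm and edhm; the remaining parameters look like session or widget identifiers.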

Screen shot from atm-mi.it web site.

I tried to replicate the request, changing some values and leaving some values out, but wasn’t very satisfied. I looked around the web site, finding a ‘mobile’ section, but it didn’t include path calculation. I looked at the widgets, but that didn’t help much. Finally I came across a way to make a GET request and get what I wanted.

When you search for a path between addresses it gives you an option to email the path instructions to a friend. The friend receives a simple URL for a working GET request, formed like this: http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx?s_place_hd=milano&e_place_hd=milano&s_add_hd=via%20dante%201&e_add_hd=corso%20sempione%2014&bwdatehh=634296481900000000. Initially, I wanted to study the bwdatehh field, but in the end I decided to just leave it out.
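Building such a URL from two addresses might look like this. It is a Python sketch rather than the PHP of the script: the parameter names come from the emailed URL above, bwdatehh is left out as described, and the function name build_path_url and its default city values are assumptions made for the example.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx"


def build_path_url(start_address, end_address,
                   start_city="milano", end_city="milano"):
    """Build the GET URL for a path search, without the bwdatehh field."""
    params = [
        ("s_place_hd", start_city),
        ("e_place_hd", end_city),
        ("s_add_hd", start_address),
        ("e_add_hd", end_address),
    ]
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_path_url("via dante 1", "corso sempione 14")
print(url)
```

Encoding the values through urlencode also takes care of characters such as slashes or accented letters that a user might type into an address.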

But what happens if the user inserts an imprecise address? The atm-mi website offers a choice, a select input field, of similar addresses. So I decided to grab that select, modify it slightly, and use it on my simplified interface. It was strange to notice that I spent more time dealing with possible particular user behaviors, than anything else. What if the user inserts something strange, what if the address doesn’t exist, what if it isn’t precise. If the user behavior was more predictable, most programs would be much shorter. But that would also take away most of the fun in software, especially when it comes to innovation.
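Grabbing the options of such a select can be sketched with Python's standard html.parser module. This is an illustration only, not the PHP code of the script: the sample markup, the field name and the class name OptionCollector are invented, and the real atm-mi page may be structured differently.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, text) pairs for every <option> element fed to it."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False
        self._value = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._in_option:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._in_option:
            self.options.append((self._value, "".join(self._text).strip()))
            self._in_option = False


# Made-up markup standing in for the list of similar addresses.
sample = """
<select name="hypothetical_start">
  <option value="1">VIA DANTE</option>
  <option value="2">VIA DANTE ALIGHIERI</option>
</select>
"""

collector = OptionCollector()
collector.feed(sample)
collector.close()
print(collector.options)
```

An empty list of options is one of the "strange" cases to handle: it can mean that the address was precise, or that nothing similar was found at all.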

web interfaces screenshot

I also managed to introduce a small improvement to the interface. When the user enters an imprecise address on the atm-mi.it website, they get a select of similar addresses, but the website loses the street number they entered. That is why I decided to grab the number from the end of the address (or a number followed by a slash and a letter etc.) and add it to every option in the select, so that the street number is preserved.
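The street number handling can be sketched as follows. It is a Python illustration under assumptions, not the PHP code of the script: the pattern only covers a number at the end of the address, optionally followed by a slash and a single letter, and the function names are made up.

```python
import re

# A number at the very end of the address, optionally followed by "/letter".
STREET_NUMBER = re.compile(r"\s(\d+(?:/[A-Za-z])?)\s*$")


def split_street_number(address):
    """Return (street, number); number is None if the address has none."""
    match = STREET_NUMBER.search(address)
    if match is None:
        return address.strip(), None
    return address[:match.start()].strip(), match.group(1)


def add_number_to_options(options, number):
    """Append the street number to every suggested address."""
    if number is None:
        return list(options)
    return [f"{option} {number}" for option in options]


street, number = split_street_number("via dante 12/a")
suggestions = add_number_to_options(["VIA DANTE", "VIA DANTE ALIGHIERI"], number)
print(street, "|", number)
print(suggestions)
```

A street name that merely contains a number, such as "via 20 settembre", is left alone, because the pattern requires the number to be the last thing in the address.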

I’m curious to start using this script outdoors, for actually finding my way to places.

I have learned a lot writing this script. I wouldn’t be surprised if it weren’t perfect: programming has very much to do with predicting the unexpected and adapting to what you haven’t predicted. I can’t predict everything, so I must learn from my mistakes.

I would be very thankful for any comments or corrections.

2010-01-21: I have upgraded the script. Read about it here.


About apprenticecoder

My blog is about me learning to program, and trying to narrate it in interesting ways. I love to learn and to learn through creativity. For example I like computers, but even more I like to see what computers can do for people. That's why I find web programming and scripting especially exciting. I was born in Split, Croatia, went to college in Bologna, Italy and now live in Milan. I like reading, especially non-fiction (lately). I'd like to read more poetry. I find architecture inspiring. Museums as well. Some more than others. Interfaces. Lifestyle magazines with interesting points of view. Semantic web. Strolls in nature. The sea.

2 Responses to Milan’s public transport (atm-mi) path finder simplified interface, PHP script (one week scripting exercise n. 2)

  1. +mala says:

    Hi Olja, I have read your article and source code and I found them very interesting. Just out of curiosity, what is the difference in size (i.e. downloaded KB) between your “light” version and the official one? Considering that the original has images and a lot of HTML code I think the difference should be huge… and this is really cool as it would allow users to access information in a much cheaper way.
    I also liked your moderate approach towards regexps. When I was reading the code I first thought “why didn’t she do this with a regexp”, then I realized it’s just me that, having a hammer in my hands, I look at everything around me as if it was a nail… 😉

    • Thank you so much for your comment!
      The difference between the amount of downloaded KB with my interface and those downloaded with the official one is actually significant. I’ll try to calculate the exact numbers when I find the time (shame for the Google maps content which gets lost for now). And my interface is also easier to use, so it’s not just the equivalent of using lynx or another textual browser. You only get shown the form you need to compile, nothing else. And only given the answer you actually need. The first version of the script was rather basic, in the second I also combined two pages into one and did some behind the scenes “clicking” on behalf of the user. I’m getting very interested in the mash-up techniques that are now getting very popular and elaborate (more in my next posts).
      As for the regexps, thanks for the appreciation. I still haven’t found the right balance for their usage. For now the logic is: if I can do it with simple string manipulation methods, I don’t use a regexp. Maybe it’s because I managed to crash Apache a few times:) (on a test virtual machine managed by a sysadmin who, rightly, thinks that increasing the available memory would be ‘diseducativo’, harmful for my learning). Although that doesn’t mean I won’t study regexp optimization methods so I can use them more and more without crashing Apache. 🙂
