I’ve written a script that creates a simpler interface for the path finder on Milan’s public transport (atm-mi) web site. It took me a few days, and much of that time went into figuring out how to handle exceptional cases (imprecise addresses etc.).
I even managed to make a small improvement to the original interface. When the user types an imprecise address, the website offers a select of similar addresses, but loses the street number. I, instead, grab the street number and add it to each option in the select.
It is very interesting to explore the Internet this way! You can take information and build your own interface to it, with a few lines of code (or in other ways).
I used some regular expressions this time, but I also learned an important lesson about them: it is important to know regular expressions, but just as important to know how to avoid them when they are not necessary and a simple string function can do just as well. I’ve also learned that complex regular expressions need to be optimized, if you don’t want them to crash the server.
The Internet is huge and full of useful information, and we access that information in all kinds of ways. Not all ways are equally practical, though. For instance, I’ve tried using the site of Milan’s public transport company (http://www.atm-mi.it) to find a way of getting from point A to point B. It took a long time to load over my mobile phone’s Internet connection, having to load images and Google maps and all, and it was also harder to navigate with all that complexity. So I’ve written a small PHP script that creates an alternative, very minimal interface for the same purpose. It still contacts the atm-mi.it website, but only downloads the HTML text. Then it gets the relevant part of the page, and displays only that. Sounds simple. And it was. So I took a couple of days to write it, and decided that it’s going to be my ‘script in a week (or less)’ exercise number 2.
Like the one-week-scripting exercise number one, this one also deals with “powerbrowsing” and web scraping. Once upon a time the Internet was the Internet, where the information moved around hectically through networks, and the Web was the Web, a nice place you visited, more or less nice and more or less static. Things have changed. Sometimes your e-mail, which used to travel to you, now appears in the browser – seemingly part of the Web, and you have to go to it instead of waiting for it to come to you. Other times the web pages come to you, through the RSS feeds. There are many ways of accessing the information on the Internet, and you can mix and match flows of data, filter them, create different interfaces and access points, through the command line or the browser or the mail or any other way.
For creating this script, where did I start? First I wanted to see what happens behind the scenes when I click around the page I was interested in. So I used the ‘Live HTTP Headers’ add-on for Firefox. It lets me see what the browser and the server are telling each other. When the browser asks the server for a web page, it usually uses a GET request, which should be used when we only want to read from the server, not write to it (no side effects). With a GET request we can send some values to the server, but they are sent as part of the URL. With a POST request, we can send more data to the server, in the request body. I went to the atm-mi site and saw that when I insert two addresses and click ‘Calcola’ to get the path that leads from one to the other, it sends a complex POST request to “http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx?bwid=778906ea-dbac-4d25-a4fe5cd41fc837d0-4d262339&wbt=nav&contextname=778906ea-dbac-4d25-a4fe5cd41fc837d0-4d262339&ORIGINE=milano&DESTINAZIONE=milano&sthm=via%20dante%201&edhm=corso%20sempione%2014”, with a lot more data in the request body. That was confusing. So was the view of the web site’s source code.
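To make the GET/POST difference concrete, here is a small Python sketch (my script is in PHP, so this is only an illustration). The endpoint and field names are made up for the example. It builds both kinds of request without sending anything: with GET the values end up in the URL, with POST they go into the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and field names, for illustration only.
BASE_URL = "http://example.com/search"
params = {"from": "via dante 1", "to": "corso sempione 14"}

# GET: the values are appended to the URL as a query string.
get_request = Request(BASE_URL + "?" + urlencode(params))

# POST: the same values are encoded into the request body instead,
# and the URL itself stays clean.
post_request = Request(BASE_URL, data=urlencode(params).encode("ascii"))

# Nothing is sent over the network; we only inspect the request objects.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

A request like the one the atm-mi site sends mixes both: some values in the URL and a much larger set in the body.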
I tried to replicate the request, changing some values and leaving others out, but I wasn’t very satisfied with the results. I looked around the web site and found a ‘mobile’ section, but it didn’t include path calculation. I looked at the widgets, but that didn’t help much. Finally I came across a way to make a GET request and get what I wanted.
When you search for a path between addresses it gives you an option to email the path instructions to a friend. The friend receives a simple URL for a working GET request, formed like this: http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx?s_place_hd=milano&e_place_hd=milano&s_add_hd=via%20dante%201&e_add_hd=corso%20sempione%2014&bwdatehh=634296481900000000. Initially, I wanted to study the bwdatehh field, but in the end I decided to just leave it out.
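Building such a URL from two addresses can be sketched like this in Python. The parameter names come from the URL above, and `bwdatehh` is left out; the function name `build_path_url` and its defaults are my own choices for the example, and I make no claim that the site still answers at this address.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://www.atm-mi.it/it/Giromilano/Pagine/default.aspx"


def build_path_url(start_address, end_address,
                   start_place="milano", end_place="milano"):
    """Return a GET URL for a path search, without the bwdatehh field."""
    params = [
        ("s_place_hd", start_place),
        ("e_place_hd", end_place),
        ("s_add_hd", start_address),
        ("e_add_hd", end_address),
    ]
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_path_url("via dante 1", "corso sempione 14"))
```

For the two addresses in the example this gives the same URL as the one in the e-mail, minus the `bwdatehh` part.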
But what happens if the user inserts an imprecise address? The atm-mi website offers a choice of similar addresses in a select input field. So I decided to grab that select, modify it slightly, and use it in my simplified interface. It was strange to notice that I spent more time dealing with unusual user behavior than on anything else. What if the user inserts something strange? What if the address doesn’t exist? What if it isn’t precise? If user behavior were more predictable, most programs would be much shorter. But that would also take away most of the fun in software, especially when it comes to innovation.
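Grabbing a select out of a downloaded page can be done in several ways. As one possibility, here is a Python sketch that uses the standard library’s HTML parser to collect the options of the first select it finds. The class name and the sample HTML are invented for the example; the real atm-mi markup is certainly different.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, text) pairs for the options of the first <select>."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_select = False
        self._finished = False
        self._current = None  # (value, text parts) of the open <option>

    def _close_option(self):
        if self._current is not None:
            value, parts = self._current
            self.options.append((value, "".join(parts).strip()))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if self._finished:
            return  # only the first <select> is of interest
        if tag == "select":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._close_option()  # tolerate a missing </option>
            self._current = (dict(attrs).get("value") or "", [])

    def handle_endtag(self, tag):
        if tag == "option":
            self._close_option()
        elif tag == "select" and self._in_select:
            self._close_option()
            self._in_select = False
            self._finished = True

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)


# Made-up fragment standing in for the downloaded page.
sample_html = """
<p>Some other page content</p>
<select name="start">
  <option value="VIA DANTE">VIA DANTE</option>
  <option value="VIA DANTE ALIGHIERI">VIA DANTE ALIGHIERI</option>
</select>
"""

parser = SelectOptionParser()
parser.feed(sample_html)
print(parser.options)
```

A parser is more forgiving than a hand-written pattern when the markup changes slightly, which is one way of avoiding a regular expression where it isn’t needed.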
I also managed to introduce a small improvement to the interface. When the user inserts an imprecise address on the atm-mi.it web site, he gets a select of similar addresses, but the web site loses the street number he has inserted. That is why I decided to grab the number from the end of the address (or a number followed by a slash and a letter etc.) and add it to every option in the select, so that the street number is preserved.
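The idea can be sketched in Python roughly as follows. The pattern is my guess at a reasonable rule (a number at the very end, optionally followed by a slash and one letter); it is not the exact expression from the script, and the function names are made up for the example.

```python
import re

# A number at the end of the address, optionally followed by "/" and a letter,
# e.g. "via dante 1" or "corso sempione 14/a". Assumed rule, for illustration.
STREET_NUMBER = re.compile(r"\s+(\d+(?:/[A-Za-z])?)\s*$")


def extract_street_number(address):
    """Return the trailing street number, or None if there isn't one."""
    match = STREET_NUMBER.search(address)
    return match.group(1) if match else None


def add_number_to_options(options, number):
    """Append the street number to every suggested address."""
    if number is None:
        return list(options)
    return [f"{option} {number}" for option in options]


number = extract_street_number("via dante 1")
print(add_number_to_options(["VIA DANTE", "VIA DANTE ALIGHIERI"], number))
```

Because the pattern is anchored to the end of the string, a number in the middle of a street name (as in "via 20 settembre") is not mistaken for a street number.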
I’m curious to start using this script outdoors, for actually finding my way to places.
I have learned a lot writing this script. I wouldn’t be surprised if it isn’t perfect: programming has very much to do with predicting the unexpected and adapting to what you haven’t predicted. And I can’t predict everything. And I must learn from my mistakes.
I would be very thankful for any comments or corrections.
2010-01-21 I have upgraded the script. Read about it here.