2011 in review, happy 2012 to all!

What a year! I've returned to programming, learned so much and found a new job! I entered the world of e-learning platforms both as a student and as a coder, I've met Drupal and deepened my knowledge of CodeIgniter, I extended my object-oriented construction and patterns skills, practiced some SQL, and even tried taking courses in machine learning and AI!

I’ve figured out a lot about what to do and what not to do, about managing my energies and nerves, preserving my enthusiasm through it all – after all, this is a marathon, not a sprint!

Now a new year starts, and I’m a Junior Developer. Let’s hope it all goes well. I’ll keep you posted. Yes, I haven’t blogged lately, but that’s about to change.

Happy 2012 to you all!

BTW, the WordPress.com stats helper monkeys prepared a 2011 annual report for my blog.

Here’s an excerpt:

A New York City subway train holds 1,200 people. This blog was viewed about 7,200 times in 2011. If it were a NYC subway train, it would take about 6 trips to carry that many people.

Click here to see the complete report.

Posted in musings

Beyond good enough: the more you study the easy parts, the better you'll be at what's difficult

Ok, I'm back. It's been a busy few months, and I'll likely explain why in another post. I'm a busy PHP Junior Developer now, and I'm also taking three online courses (offered in partnership with Stanford University), as you've read in the previous posts: Introduction to Databases, Machine Learning and Artificial Intelligence. And, obviously, I'm having a hard time balancing it all: especially the time, the energy and the motivation. I've just taken my DB midterm quiz, and I'm not too happy with the results. I'm ok with the AI homework, if I manage to submit the one that is due tomorrow. And I'm behind with the Machine Learning programming exercises.

I'm not a star student in these courses. But that's ok, this is just a test anyway, to see whether I can both work and study. I need to learn some of the topics, but mostly I need to learn from my mistakes and learn how to learn.

I have, clearly, made mistakes. But the important part is, what have I learned from them? Here’s a top 5 that answers that. Click “Continue reading” for the details.

5) Plan ahead. Plan the rest days as well. Then live each day to the fullest.

4) Add some aesthetics and some fun. Take great notes. Detailed great notes.

3) Don’t consider a course easy if a part of it is easy, or difficult if a part of it is difficult.

2) 80% of the work may be done with 20% effort, but A students do more than that.

1) The more effort you put into the easy stuff, the better you will be on the trickier parts.

Continue reading

Posted in learn-from-mistakes, learn-to-learn, motivational, musings

First steps in Machine Learning

My e-learning has started, a week before the official start of the Stanford online classes. Some videos and quizzes are already available, and we students have started strong. The Machine Learning class Twitter account has been announcing some impressive numbers: tens of thousands of students enrolled and quizzes attempted.

I have already started learning the basics of Machine Learning, beginning with the definition of Machine Learning itself. At a certain point the lectures note that for some problems it is better to let the computer learn by itself than to program it explicitly. That's what the 1959 definition by Arthur Samuel is about: it states that ML is the field of study that gives computers the ability to learn without being explicitly programmed. This may sound strange, since we know that computers can't really do anything we haven't programmed them to do, or some variation of that. But think about this: Arthur Samuel was not a great checkers player, yet he managed to teach the computer to play and improve with every game, until the computer became better at checkers than he was. Likewise, when scientists wanted to teach a helicopter to fly autonomously, they found that the best thing was to let it learn on its own. But how does that happen?

The same way we do: computers sometimes learn from experience. They repeat and learn, repeat and learn. That brings us to the second definition of Machine Learning: it allows computers to improve their performance on a specific task over time, from experience. It was coined by Tom Mitchell in 1998, and the complete phrasing is that a computer is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

What does that mean concretely? One way is that we give the computer a training set of data as input, and it tries to find a function that fits the data while keeping the errors small; it basically infers a function it can then use to predict further, similar results (like apartment prices, given a set of variables). And I've already taken a sneak peek at the mathematics behind that.
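
Just to give a taste of that peek (this is my own rough summary of the simple linear regression setup from the course, so treat the exact symbols as my paraphrase): with a single input variable x, the learned function is a straight line, and learning means picking the two parameters that make the average squared error over the m training examples as small as possible:

h(x) = theta0 + theta1 * x
J(theta0, theta1) = (1/(2m)) * sum over the m training examples of ( h(x_i) - y_i )^2

In the apartment example, x_i would be something like the size of apartment i and y_i its known price; minimizing J (for instance with gradient descent) picks the line that best predicts prices for apartments the computer has never seen.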

All of which is extremely exciting, and I would like to thank Stanford, Prof. Andrew Ng and the whole ml-class.org team for this opportunity.

Click “Continue reading” for more details.

[EDIT] You can find some Machine Learning video lectures by Andrew Ng at http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Continue reading

Posted in Artificial Intelligence, Machine Learning, online courses

Stanford classes: Artificial Intelligence, Machine Learning and Introduction to Databases, free, online

I'm about to take my e-learning to a new level – I enrolled in all three of the Stanford online classes: Artificial Intelligence, Machine Learning and Introduction to Databases. It's free, it's serious – with tests and all, and it's going to take a lot of time. How am I going to make it, now that I have an 8-hour work day? I'll just have to find a way to make it work, plus find extra time for my first open source contributions and GitHub experiences, the blog, web app security updates, etc. I'm just hoping that this thing a friend of mine said once is true: the more things you do, the more things you manage to do. Activity furthers activity.

You can still join the classes for a few days, so go to http://www.ai-class.com/, http://www.ml-class.org and http://www.db-class.org if you are interested. Click “Continue reading” if you want to learn more.

[EDIT] You can find some Machine Learning video lectures by Andrew Ng at http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Continue reading

Posted in online courses

Algorithm basics – recognize patterns, avoid nested loops and divide and conquer often

Algorithms. Now there's a huge topic. Where do you start? Good question. There is the theory, the usual searching and sorting, the math, the complexity, the scary details to be precise about. So you start studying, memorizing, understanding, and the purpose of it all maybe escapes you or gets lost along the way. It doesn't need to be that way. So, what's the point? Algorithms teach us to think. Algorithms win us performance, and performance is like a currency we can trade for other things (as Prof. Charles Leiserson explains in the MIT lecture I've already blogged about). Even if you don't do algorithm-heavy programming, for Google or the like, you can benefit from learning some basic patterns and antipatterns: quadratic is bad, so avoid nested loops; sometimes it's better to sort first so you can search faster later; and so on.

What on Earth am I talking about? As always, in order to reason we need to measure. How do we measure a program's performance in a way that shows the speed of the software independently of the computer it runs on? We concentrate on the speed of growth, not the speed itself. We don't care so much about the time (or maybe memory space) it takes to execute some piece of code for an input of size N, as about how fast that cost grows as the input size N grows. Something is slow when it doesn't scale well, and might need to scale.

So we choose an instruction to follow, count how many times it gets executed, and create a function that calculates that count depending on the size of the input, such as C1*n^2 + C2*n + C3 for a nice traditional nested loop. Then we ignore the constants, and try to fit that function, and the algorithm, into a family of algorithms. Algorithms in a family share an upper bound: a line they will never cross, for a sufficiently large N. That's Big O notation, simplified. Big O notation gives us a function that is an upper bound on our algorithm's growth, usually quoted for the worst-case scenario. There are other notations for other bounds as well. So, mostly we are interested in the worst case, but sometimes it is also important to know the lower bound, that is, to know that a certain operation can't be done with less than a certain amount of work.
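
To pin down that "line they will never cross" idea, here is the standard definition in the same plain notation I've been using (nothing below is specific to any particular algorithm):

f(n) = O(g(n)) means: there exist constants c > 0 and n0 such that f(n) <= c * g(n) for every n >= n0

So, for the nested-loop count above, C1*n^2 + C2*n + C3 = O(n^2): take c = C1 + C2 + C3 and n0 = 1 (assuming the constants are positive) and, past that point, the lower-order terms and the constants stop mattering.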

As the Wikipedia page explains:

“Suppose an algorithm is being developed to operate on a set of n elements. Its developers are interested in finding a function T(n) that will express how long the algorithm will take to run (in some arbitrary measurement of time) in terms of the number of elements in the input set. The algorithm works by first calling a subroutine to sort the elements in the set and then perform its own operations. The sort has a known time complexity of O(n^2), and after the subroutine runs the algorithm must take an additional 55n^3 + 2n + 10 time before it terminates. Thus the overall time complexity of the algorithm can be expressed as

T(n) = O(n^2) + 55n^3 + 2n + 10.

This can perhaps be most easily read by replacing O(n^2) with “some function that grows asymptotically slower than n^2”. Again, this usage disregards some of the formal meaning of the “=” and “+” symbols, but it does allow one to use the big O notation as a kind of convenient placeholder.”

Let's demystify the very basics, then. If I get something wrong, please don't be too harsh on me. I'm only just learning, and putting it in writing for myself, and maybe for others, so I finally get to learn it well and can stop having to study it from the start over and over again :)

O(n^2)

If you have to remember just one thing, make it this one – quadratic is bad. That pretty much means: avoid nested loops. I'm serious, quadratic (or worse: cubic, etc.) running times grow like crazy – the same thing that takes seconds with an n log n growth rate takes weeks with n^2. That can become important if we do some heavy calculations. But even in less extreme conditions, it's nice to be careful. As I've mentioned before in this post, and also in a previous post, performance is like currency: if we save up enough of it, we can trade it for features, usability and whatnot. But we have to save it up first. So, if you can avoid doing something n times for each of m items, do.
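
To show just how crazy, here is some back-of-the-envelope arithmetic; the machine speed is an assumption I'm making up only to get concrete numbers. Say n is ten million items and the machine does about 10^8 simple operations per second:

n * log2(n) ≈ 10^7 * 23 ≈ 2.3 * 10^8 operations, or roughly 2 seconds
n^2 = 10^14 operations, or about 10^6 seconds, which is more or less 11 days

Same input, same machine, and the quadratic version goes from a couple of seconds to a week and a half.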

Like most things, algorithms are about pattern recognition. We learn to look at a piece of code, at an algorithm, and figure out what type of behaviour we can expect from it.

To read an explanation of Big O notation and some info on logarithmic growth times, click “Continue reading”. Continue reading

Posted in Algorithms

5 productivity ideas: every smart hacker always has a next doable step

“We overestimate what we can do in a day, and underestimate what we can do in a year.”

1) MICROMANAGE THE YEAR

Since I found that phrase on the Web, it's been in my toolbox, helping me organize my time. I used to make huge plans, start out big, then lose steam and feel disappointed. Too many expectations, too little planning. Planning is key. Planning, and keeping track of what you actually do, since – surprisingly – our idea of where our hours have gone doesn't really match the reality of what's been going on. Learning to organize yourself can be scary, but not once you get used to it. You'll have to find what works for you, and also look for existing techniques in books and on websites (it's hard to find the really good ones; it comes down to friends' advice, luck and serendipity). A year is a lot of time, but it's even more time if you use it well. I give myself objectives for the season. Focus on something for the month and the week. And then I parcel out the days of the week into sections: before arriving at work (the commute and all), the work before lunch, the lunch break, the after-lunch time at work, the evening commute, the evening. Plus the weekends. Each of these 1-, 2- or 4-hour time blocks has a meaning and a purpose. It's not easy; as I mentioned previously, an easy game is a contradiction in terms (a quotation from the “Spaghetti hacker” book).

2) EVERY SMART HACKER ALWAYS HAS A NEXT DOABLE STEP (POP REFERENCE): A NEXT DOABLE STEP MAKES DREAMS INTO REALITIES

3) SEPARATE PLANNING FROM DOING

4) WORK EXPANDS TO FILL THE TIME AVAILABLE FOR ITS COMPLETION

5) IF YOU HIT A WALL JUST REMEMBER, NO WALL LASTS FOREVER

Click “Continue reading” to get the details:) Continue reading

Posted in learn-to-learn, motivational

Trees with nested sets, conditional SQL with the Case Statement, SQL beyond the very basics made easier

As Chad Fowler notes in his wonderful book “The Passionate Programmer”, the relationship between programmers and databases is not always the best one. Towards the end of chapter 7, “Be a Generalist”, he writes that:

“software developers are growing increasingly lazy and ignorant about how to work with databases.”

We are delegating too much to DBAs. I found it reassuring that I wasn’t the only one to have had the same problem. When we fear something, we avoid studying it in detail, and I used to fear databases. Then I decided to get to know them better, and have been learning something new every week at least.

Lately I picked up a book called “SQL Antipatterns”, which is very interesting and is making this voyage beyond the very basics of (My)SQL more fun and easier. It was written by Bill Karwin and published by the Pragmatic Bookshelf in 2010. It presents 24 antipatterns (bad practices), legitimate uses of each, and possible alternatives. It ends with some thoughts on normalization.

“To call SELECT only one statement in that language is like calling an engine only one part of an automobile.”

The book invites us to look beyond the basics and the obvious and the widespread habits in SQL.

The second antipattern in the book is about trees. Naive trees. It notes that if we have to keep comments in a database, where each comment can be a reply to another comment, and we create our table so that each comment row “knows” its parent comment, then we can only query so many levels of the tree at a time, because a JOIN will only take us so far. Adding a node is easy with this Adjacency List method: the new row just needs to know its parent's id. But deleting is more complex.
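
Here is a minimal sketch of what that looks like; the table and column names are made up by me for illustration, they are not the book's. With the Adjacency List, fetching even three levels of replies already needs one self-join per level, and there is no single plain query for “all descendants, however deep”:

-- Adjacency List sketch (assumed schema): every comment only knows its parent.
-- comments(comment_id, parent_id, body), with parent_id NULL for top-level comments.
SELECT c1.body AS level1, c2.body AS level2, c3.body AS level3
FROM comments AS c1
  LEFT JOIN comments AS c2 ON c2.parent_id = c1.comment_id
  LEFT JOIN comments AS c3 ON c3.parent_id = c2.comment_id
WHERE c1.comment_id = 1;
-- One more level? Add another LEFT JOIN by hand. And so on.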

There are three alternatives: Path Enumeration (fun to learn that this is what a UNIX path like /usr/local/lib really is), Nested Sets (the one I took an interest in), and Closure Table (a neat separate table which contains the node relationships).

It was fun to discover that, with nested sets, each node knows not its parent but its “territory”: the “left” and “right” values between which all of its descendants are found. If our node's values fall between another node's left and right, that other node is an ancestor. This means that we can query the descendants and the ancestors simply, with an ON … BETWEEN … WHERE, and we can insert a node by first doing a big shift with a CASE statement.
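
As a hedged sketch (the same made-up comments table as above, this time with nsleft and nsright columns instead of parent_id), the descendants and ancestors queries look roughly like this:

-- Nested Sets sketch (assumed schema): comments(comment_id, nsleft, nsright, body).

-- All descendants of comment #4: every node whose left value sits inside
-- the ancestor's (nsleft, nsright) territory. Comment #4 itself comes back too;
-- add AND descendant.comment_id <> 4 to leave it out.
SELECT descendant.*
FROM comments AS ancestor
  JOIN comments AS descendant
    ON descendant.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
WHERE ancestor.comment_id = 4;

-- All ancestors of comment #6: the same join with the roles reversed.
SELECT ancestor.*
FROM comments AS ancestor
  JOIN comments AS descendant
    ON descendant.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
WHERE descendant.comment_id = 6;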

The Nested Set Model

The Nested Set Model diagram from Wikipedia, http://en.wikipedia.org/wiki/File:NestedSetModel.svg

The Nested Sets Model

The Nested Sets Model Representation from Wikipedia, http://en.wikipedia.org/wiki/File:Clothing-hierarchy-traversal-2.svg

What’s the Case Statement? It’s the way to have some conditional logic in your SQL code (CASE/WHEN/THEN/ELSE). Curious about how to do all of that? Click “Continue reading” and get your queries 🙂 If you have actually used any of this, especially the nested sets or the Case Statement, please leave a comment! It would be very useful. Continue reading
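
In the meantime, here is a small hedged taste of the CASE in action: the “big shift” that makes room before inserting a new node into the nested-sets table sketched above (the value 7 is just an example, standing in for the parent node's nsright):

-- Open a gap of width 2 at position 7: CASE lets a single UPDATE decide,
-- row by row, whether each column needs shifting.
UPDATE comments
SET nsleft  = CASE WHEN nsleft  > 7  THEN nsleft  + 2 ELSE nsleft  END,
    nsright = CASE WHEN nsright >= 7 THEN nsright + 2 ELSE nsright END
WHERE nsright >= 7;

-- The new node then slots into the gap as the parent's last child.
INSERT INTO comments (nsleft, nsright, body)
VALUES (7, 8, 'a brand new reply');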

Posted in books, SQL