The Roach is rather boring - I tend to go to work, come home, go to the gym (sometimes), work on my site (sometimes) and play some games before hitting the sack. What I do at work is interesting, but it still involves sitting at a desk staring at a computer screen - which places me high in the bracket of people with advanced anti-social skills according to the more "normal" people around me. The guy who sits next to me at work, Prabhakar, is NOT rather boring though. Sure, he sits in front of a computer screen at work too, but he is an enthusiastic participant in several events and extra-curricular activities that don't involve being in front of one. He is probably a bit like what I must have been (or hoped to be) before I got fat, old and cynical (read: lazy).
Anyway, let's not dwell on the past or my rotundity here. It so happens that Prabhakar had attended a big developer summit last year - because he tends to do such interesting things. Hence he knew about the event this year too, and got an invite from the organizers to attend at a discounted price. Since I sit right next to him, and occasionally grumble to him about various things, he was nice enough to include me in his summit plans.
So while I would normally stay well away from something titled GIDS… Great Indian Developer Summit - keyword Developer - I found myself thinking, "Hey, maybe it won't be such a bad idea to attend. I'm sure the food will be good… and it means not having to wear formals for 2 working days!"
Why the anti-developer stand? Well, if you have been put through any kind of technical training, you will be quick to realize that techies can be a very boring lot when it comes to making presentations. And although I am a Computer Engineer according to my Bachelor's Degree, I have little love for coding in Java - which seems to be everyone's favorite cup of coffee, and hence an implicit theme at any Developer Summit.
I also admit I view the Java developer community with a degree of disdain. It's tough to put my finger on why exactly, but it probably has to do with their seeming narrow-mindedness. Caught up in their mundane code, they are often unable to appreciate anything else the world comes up with unless it makes their job a little easier. Oh, and they make horrid presentations and websites - but I shall leave that rant for another day.
You may now wonder why I seem to be specific about being Anti-Java-Developer, or to use their naming convention antiJavaDeveloper. I'm not. I'm just anti-developer. It's just that I am from the Java side (kind of like being from the girl's side at an Indian wedding, or the Light side if we were Jedi) at work and there just seem to be a whole lot more Java people in the world. I haven't interacted enough with the .Net variety to have formed as strong an opinion, but I assume they have similarly endearing qualities.
Anyway. Given this notion I have of the developer community at large (the part about not being able to appreciate the value of new tech and concepts), I should not have been surprised to see sessions with 4 participants (including Prabhakar and me), or people walking out of sessions midway. But I would never have imagined this could be the case with Cloud Computing.
I was wrong.
Cloud Computing is everywhere. I would not even excuse my non-IT industry readers if they claimed they had not heard of it. With the amount of buzz created around the concept over the past few years, it is almost impossible not to have at least heard the term, and then wondered what it could possibly be. Large billboards by Microsoft talking about Cloud Computing just outside major airports in India would surely have caught your eye if other media like newspapers or the internet did not. Most importantly, even for people like me (I'm one of the few guys who comprise the Cloud Computing Center of Excellence at work), Cloud Computing is still not a cut-and-dried concept. That's because it's still relatively new, and everyone seems to have a different definition for it. So there's always more to learn even if you do know a bit about it.
Strangely though, the developer community seemed least interested in understanding what Cloud Computing is. Funnier still is the fact that GIDS 2010 was sponsored in large part by Amazon Web Services. Poor chaps. The fact that they had 6 sessions (including 2 Keynote addresses and a Workshop) must have helped though - in terms of the probability of developers attending some of them. Some of the other guys were not so lucky.
Matthew McCullough was one such unlucky presenter. Here's Matthew, according to the GIDS website:
Matthew McCullough is an energetic 12 year veteran of enterprise software development, open source education, and co-founder of Ambient Ideas, LLC, a Denver consultancy. Matthew currently is a member of the JCP, reviewer for technology publishers including O'Reilly, author of the DZone Maven RefCard, and President of the Denver Open Source Users Group. Matthew jumps at opportunities to evangelize and educate teams on the benefits of open source. His current focuses are Cloud Computing, Maven, iPhone, Distributed Version Control and OSS Tools.
Matthew presented a great session (with a not-so-great turnout) on Hadoop - an Apache project that is used by ALL the big names in the IT industry for highly scalable, distributed, high-performance computing. That is what Cloud Computing is, amongst other things. You can check out who uses Hadoop here if you don't take my word for it.
Matthew opened his session by giving Moore's Law a formal burial. The era of ever-increasing clock speeds that most of us grew up with is done. In fact, according to Matthew, the computers we have today are probably the fastest (in terms of clock speed) we will ever own! It makes sense really - gone are the days when every year Intel would bring out a processor that was a few hundred MHz, or even a GHz, faster than the previous year's CPU. Instead, both Intel and AMD have gone the multi-core way.
Multi-CPU = parallel processing. Hence, as Matthew put it, the "era of processing parallelism has begun".
Hadoop runs on as many computers as you give it, in parallel. It uses a programming model called MapReduce, built on a paper written by the brains at Google. What it does is take a large problem, split it into smaller tasks that are assigned to individual machines in the cluster to run in parallel, and then combine the output of all the machines to get the final result.
To make the whole system more efficient, MapReduce works by moving around the code for the problem rather than the (much larger) data it has to operate on. This is completely opposite to the way most other systems work - including your CPU. Just to give you some perspective: in the case of Hadoop MapReduce jobs, the code is often in the Kilobyte range, while the data it operates on can be in Petabytes (10¹² KB). Imagine if Hadoop tried moving that data around instead of just sending the code to it.
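To make that a little more concrete, here is roughly what a MapReduce job looks like in Java - the classic word-count example that Hadoop's own documentation uses. Treat it as a sketch against the Hadoop API of around that time (the class names and the input/output paths are up to you), not production code:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each task reads one chunk of the input and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework groups all the 1s for each word; we just sum them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mapper and reducer are all you really write; splitting the input, shipping the code out to each machine and collecting the results is the framework's job.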
Hadoop comes with its own filesystem, HDFS (Hadoop Distributed File System), which handles its parallel, distributed nature and the aforementioned mammoth files. HDFS takes a large file, chops it up into 64MB chunks (called blocks), and spreads them evenly across a cluster of machines. It also replicates each block across the cluster - giving it reliability and high availability. Each small MapReduce task mentioned earlier operates on a single such block. Since the blocks are spread out onto different machines, the MapReduce tasks can run in parallel on the individual machines where the data is stored, instead of moving data around the cluster.
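If you're curious where your file's blocks actually end up, the HDFS client API will tell you. A minimal sketch, assuming a Hadoop client configured to talk to your cluster (the ShowBlocks class name is mine; pass an HDFS file path as the argument):

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    // Picks up the cluster's address from core-site.xml / hdfs-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    FileStatus status = fs.getFileStatus(new Path(args[0]));

    // One BlockLocation per block of the file, with the hosts holding its replicas.
    for (BlockLocation block
        : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.printf("offset %d, length %d, hosts %s%n",
          block.getOffset(), block.getLength(),
          Arrays.toString(block.getHosts()));
    }
  }
}
```

That host list is exactly what the MapReduce scheduler looks at when it tries to run each task on a machine that already holds the block.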
Hadoop also comes with some other interesting stuff, such as Pig, HBase, Chukwa and Hive. You can check them out at the Hadoop site.
Matthew revealed the economics behind Hadoop's success. For $74.85 you can either buy 4GB of RAM, or a 1TB HDD - that's 256 times the capacity for the same money. With Hadoop you can use that 1TB HDD effectively for processing, the way RAM is usually used on a single machine. Even if your Math is as bad as mine, you will know which one gives you more bang for the buck.
Also, for $10,000 you could buy 10 decent desktop machines, or one really cool server. With Hadoop you can combine the power of all 10 desktops by using them in parallel, and automatically handle machine failures. Which would you buy? The combined power of 10 desktops would beat the single server, in case you were wondering.
And it is not just you - Google does this too, so it has got to make sense. Google's Data Centers apparently run on thousands of desktop-class machines. Sure, several dozen die every week, but who cares? Failure is inevitable, as Matthew says. So why pay the price for high-end servers? They too will fail one day.
Perhaps the good thing about not many people knowing what Cloud Computing is, is that we got a lot of time to talk to Matthew after the session. Other than 2 guys from IBM, we were the only ones who knew anything about Hadoop and could talk to him "intelligently". It brought a smile to his face too - to finally meet someone at the summit who knew what he was talking about. As he put it, we were amongst perhaps 1500 people worldwide who seemed to understand Cloud Computing and Hadoop.
For this I can only thank my managers at work, Nirmallya and Venu - for their foresight, and the developer community in general - for having none.