Compiling nginx 1.4.0 With SPDY on CentOS 6

Just a few days ago, nginx 1.4.0 was released to the public. The version bump adds a lot of new capabilities for your web stack. The most interesting one for me is support for the SPDY 2 protocol.

Here is an excerpt from the Chromium SPDY page:

As part of the “Let’s make the web faster” initiative, we are experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY (pronounced “SPeeDY”), an application-layer protocol for transporting content over the web, designed specifically for minimal latency.  In addition to a specification of the protocol, we have developed a SPDY-enabled Google Chrome browser and open-source web server. In lab tests, we have compared the performance of these applications over HTTP and SPDY, and have observed up to 64% reductions in page load times in SPDY. We hope to engage the open source community to contribute ideas, feedback, code, and test results, to make SPDY the next-generation application protocol for a faster web.

For SPDY to work, you need an SSL certificate and at least OpenSSL 1.0.1c to compile nginx and serve a website with it. SPDY relies on NPN (Next Protocol Negotiation), a TLS extension that OpenSSL supports from 1.0.1 onward, and CentOS 6 only provides 1.0.0. According to a blog post here, we can just add a repo to get a newer OpenSSL that works nicely.

Here are the steps needed to compile nginx with SPDY support:
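# Run as root: rpm, yum and make install need root privileges
# Add the Axivo repository and pull in a newer OpenSSL
$ rpm -ivh --nosignature http://rpm.axivo.com/redhat/axivo-release-6-1.noarch.rpm
$ yum --enablerepo=axivo update openssl
# Download and unpack the nginx source
$ mkdir -p /opt/src
$ cd /opt/src
$ wget http://nginx.org/download/nginx-1.4.0.tar.gz
$ tar xfz nginx-1.4.0.tar.gz
$ cd nginx-1.4.0
# Configure with SSL and SPDY enabled, then build and install
$ ./configure --with-pcre --with-http_ssl_module --with-http_spdy_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_stub_status_module --prefix=/usr/local/nginx
$ make -j4
$ make install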

Now that the steps above are done, it’s time to enable SPDY for your websites, assuming that you already have a working nginx configuration with SSL enabled. It’s actually really simple: add the spdy parameter to the listen directive. The full explanation is in nginx’s SPDY documentation.

server {
    listen 443 ssl spdy;
    ssl_certificate server.crt;
    ssl_certificate_key server.key;
    ...
}

Now test your website at spdycheck.org to see if your SPDY implementation is successful. Cheers!

chuck-norris-php

It is what it is: a PHP client to get Chuck Norris jokes from The Internet Chuck Norris Database, icndb.com.

The code is at the usual place: https://github.com/tistaharahap/chuck-norris-php

Chuck will come after you…you have been warned!

Naive Bayes Classifier – Revisited

Over the last week, I’ve been following up on a side project to do machine learning with Urbanesia’s comprehensive data. A lot of late-night reading and fiddling with unfamiliar code were the highlights of my week. I want to elaborate on my implementations and on how different technologies affect the benchmarks, particularly classification performance.

The repo for the code is on GitHub here.

Between the first batch of code and now, I have made lots of changes to the code and also to the data store. I wasn’t sure at first which database would give the best performance. I’m testing on fairly low-spec hardware, a late 2011 MacBook Air with 4 GB of DDR3, an SSD and a 1.7 GHz Intel Core i5, which is nothing compared to a real server. By the way, although she’s relatively low spec, she’s got a name: Claire.

My first challenge was to abstract the data stores and deal with the algorithm later. To keep things familiar and easy, MySQL was the first store I dealt with. After getting the tables ready, I coded the algorithm with help from Alexandru Nedelcu’s excellent Hacker News posting on implementing a Naive Bayes Classifier in Ruby. The alpha version was produced.

The alpha performed really badly: it took more than 1,000 seconds to classify a single word. As expected, MySQL was not up to the task. Since the data is actually a collection of words, I was intrigued to use MongoDB as the data store. Because the abstraction layer was already there, writing a MongoDB store was quite painless, and I hoped to get better results. With MongoDB, the benchmark showed that it still took more than 400 seconds to classify a single word. Not good enough. I wasn’t prepared to write scheduled backend services that would blow up the servers with at least 50,000 users, not to mention the 200,000+ businesses we have. It would be a sysadmin’s nightmare.

Real work was catching up with side projects, so I decided to take a break until last week, when I managed to get some time to write more code. I read through Hacker News looking for the perfect NoSQL database for the data we have. I remembered a friend of mine, Dondy Bappedyanto, talking about Redis and how it is a superset of Memcache. So I went straight to Redis.io and compiled the source code.

Disclaimer: I knew the algorithm wasn’t as optimized as I would have liked with the MySQL and MongoDB stores. I wanted to focus on macro optimizations first and do micro optimizations afterwards.

Redis is quite unique because it’s “Memcache-like”, storing data as key-value pairs. The logic changes dramatically, and learning more about Redis’ data types helps a lot. My aim was to study Redis while doing the project, so I opted to write the code with primitive data types first and optimize along the way. So with a lousy algorithm and a not-so-optimized data model in Redis, I classified a keyword and it was instant love. It only took ~1 second to do it.

So in my mind, I already had the optimization I wanted on a macro level, and it was time to get dirty. Since I enjoy new stuff as it comes up, I researched implementations of the Naive Bayes Classifier in other languages. I was thinking about implementing a Node.js + Socket.io proxy to handle the JavaScript communication with our V2 client-side code, and I was interested to know more about Node.js.

A quick Google search introduced me to several Node.js modules that do the job. One I was particularly interested in was Classifier by Heather Arthur. I read through the source code and found some clever methods to speed things up: get all the data first and do the calculations afterwards. But I was curious about Node.js and wanted to learn to code with it. So I wrote a more optimized version of my previous PHP algorithm and implemented it in JavaScript. I wanted to know how my code would perform against the Classifier Node.js module. Both used Redis as the data store.

The quick answer is that both my code and the Classifier module achieved sub-second performance, classifying single keywords in ~300 milliseconds. This was a great morale boost, but the fun only lasted a while. It turns out that sometimes both implementations won’t spit out results on medium to large datasets. Being a newbie with Node.js, I didn’t know what to do. My guess is that it’s got something to do with memory, because both implementations didn’t emit the finish events. It could be a Node.js problem, or a problem in the redis and hiredis node modules.

This made me code in PHP again. I heavily modified the PHP implementation to get the data first and calculate later. I was surprised by the result: it took only ~0.01 second to classify a single keyword after the optimization. This gave me the idea of doing the calculation in PHP and using Node.js + Socket.io as a frontend to JavaScript clients.
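To make the "get the data first, calculate later" idea concrete, here is a minimal sketch in JavaScript. It is not the code from the repo. FakeStore is a hypothetical in-memory stand-in for a key-value store, and the "count:<category>:<word>" key layout is an assumption made up for this example. With Redis, the batched read could be a single MGET.

```javascript
// Hypothetical in-memory stand-in for a key-value store such as Redis.
// It counts round trips so the two access patterns can be compared.
class FakeStore {
  constructor(data) {
    this.data = data;
    this.roundTrips = 0;
  }
  // One round trip per key.
  get(key) {
    this.roundTrips += 1;
    return this.data[key] || 0;
  }
  // One round trip for any number of keys.
  getMany(keys) {
    this.roundTrips += 1;
    return keys.map((key) => this.data[key] || 0);
  }
}

// Slow pattern: query the store inside the calculation loop.
function totalsOneByOne(store, words, categories) {
  const totals = {};
  for (const category of categories) {
    totals[category] = 0;
    for (const word of words) {
      totals[category] += store.get(`count:${category}:${word}`);
    }
  }
  return totals;
}

// Faster pattern: fetch every count in one go, then calculate in memory.
function totalsBatched(store, words, categories) {
  const keys = [];
  for (const category of categories) {
    for (const word of words) keys.push(`count:${category}:${word}`);
  }
  const counts = store.getMany(keys);
  const totals = {};
  categories.forEach((category, c) => {
    totals[category] = 0;
    words.forEach((_, w) => {
      totals[category] += counts[c * words.length + w];
    });
  });
  return totals;
}

// Made-up sample counts.
const sampleCounts = {
  "count:food:sushi": 3,
  "count:food:bar": 1,
  "count:nightlife:bar": 5,
};
const slowStore = new FakeStore(sampleCounts);
const fastStore = new FakeStore(sampleCounts);
const slowTotals = totalsOneByOne(slowStore, ["sushi", "bar"], ["food", "nightlife"]);
const fastTotals = totalsBatched(fastStore, ["sushi", "bar"], ["food", "nightlife"]);
console.log(slowStore.roundTrips, fastStore.roundTrips); // 4 1
```

Both functions return the same totals. The only difference is how many times the store gets hit, and that number grows with words × categories in the first pattern.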

Since it was really painless to do WebSocket with Socket.io, it took only a few minutes to produce the Node.js frontend, available here. In a subjective benchmark, it took 68 milliseconds to classify and deliver the result to JavaScript clients. This was a near-realtime result, and I had found my solution.

Last night was full of fiddling with the algorithm, trying to get the best accuracy out of it, and after last night and today the PHP implementation is now at version 0.3.0. A coding session this afternoon led to a helper that produces a blacklist/stopword list from a collection of text. I couldn’t just import the most frequent words into the blacklist collection because it’s really subjective and depends on the language. Urbanesia’s data is a mix of Indonesian and English, so it will take more time to analyze. If there’s an acceptable automation method, I will share it at the repo.
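As a rough illustration of why frequency alone is not enough, here is a small JavaScript sketch. It is not the helper from the repo. The function name stopwordCandidates, the minShare threshold and the sample sentences are all made up for this example. It only proposes candidates, and someone still has to review them.

```javascript
// Returns words that occur in at least `minShare` of the documents,
// sorted by document frequency (highest first), ties broken alphabetically.
function stopwordCandidates(documents, minShare) {
  const docFrequency = new Map();
  for (const text of documents) {
    // Count each word once per document. Splitting on non-alphanumerics
    // is a simplification that only suits plain ASCII text.
    const words = new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
    for (const word of words) {
      docFrequency.set(word, (docFrequency.get(word) || 0) + 1);
    }
  }
  return [...docFrequency.entries()]
    .filter(([, count]) => count / documents.length >= minShare)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([word]) => word);
}

// Made-up sample text.
const sampleTexts = [
  "Sushi di Jakarta yang enak",
  "Bar yang ramai di Kemang",
  "Kopi enak di Bandung",
];
const candidates = stopwordCandidates(sampleTexts, 0.6);
console.log(candidates); // [ 'di', 'enak', 'yang' ]
```

In this sample, "di" and "yang" are reasonable stopwords, but "enak" (delicious) also passes the threshold even though it carries meaning. That is exactly the subjective part.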

The conclusion of this project is to think less and do more. Machine learning algorithms are available throughout the Internet. I mean, smart and talented developers before and after us will keep finding new ways to organize data; it’s the implementation that counts. Each problem has its own domain and I’m sure my code will not cater to all problems. However, learning by doing is also an excellent experience.

A Naive Bayes Classifier calculates probabilities while treating each keyword as independent of the other keywords being classified, so it’s really well suited to mining preferences, related content, etc. But in cases where a group of keywords is actually what we want to know about, the Naive Bayes Classifier’s accuracy won’t be so great. This calls for another solution. If you have any ideas about this, please do comment, I would love to know what you think.
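For readers who want to see the independence assumption in code, here is a minimal Naive Bayes sketch in JavaScript. It is not the implementation from the repo. The class name, the training data and the add-one smoothing are my own assumptions for illustration.

```javascript
// Minimal Naive Bayes sketch for illustration only.
class NaiveBayes {
  constructor() {
    this.wordCounts = {}; // category -> word -> count
    this.totalWords = {}; // category -> number of words seen
    this.docCounts = {};  // category -> number of training documents
    this.totalDocs = 0;
    this.vocabulary = new Set();
  }

  train(category, words) {
    if (!this.wordCounts[category]) {
      this.wordCounts[category] = {};
      this.totalWords[category] = 0;
      this.docCounts[category] = 0;
    }
    this.docCounts[category] += 1;
    this.totalDocs += 1;
    for (const word of words) {
      this.wordCounts[category][word] = (this.wordCounts[category][word] || 0) + 1;
      this.totalWords[category] += 1;
      this.vocabulary.add(word);
    }
  }

  // log P(category) + sum of log P(word | category).
  // Adding one term per word is the independence assumption.
  score(category, words) {
    let logProb = Math.log(this.docCounts[category] / this.totalDocs);
    const denominator = this.totalWords[category] + this.vocabulary.size;
    for (const word of words) {
      const count = this.wordCounts[category][word] || 0;
      logProb += Math.log((count + 1) / denominator); // add-one smoothing (assumption)
    }
    return logProb;
  }

  classify(words) {
    let best = null;
    let bestScore = -Infinity;
    for (const category of Object.keys(this.docCounts)) {
      const current = this.score(category, words);
      if (current > bestScore) {
        bestScore = current;
        best = category;
      }
    }
    return best;
  }
}

// Made-up training data.
const nb = new NaiveBayes();
nb.train("food", ["sushi", "ramen", "cheap"]);
nb.train("nightlife", ["bar", "live", "music"]);
console.log(nb.classify(["sushi"])); // food
```

Because each word contributes its own term, ["live", "music"] and ["music", "live"] get the same score up to floating-point rounding. The classifier never sees the phrase as a unit, which is the limitation described above.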

Cheers!

[Techtorial] Responsive With Zurb Foundation & HTML5

Responsive techniques for websites have been around for a while now, but not many websites here in Indonesia are responsive. For me, being responsive comes out of necessity: mobile web traffic is increasing very rapidly, and being responsive is the next logical step even if you already have a mobile website. It will look good to search engines too :)

Zurb Foundation is one of a handful of frontend responsive frameworks out there. However, to make your website really resilient, you should start from the server side. There’s a slide show here by Yiibu covering all the reasons why being responsive starts on the server side. Keep in mind that frontend responsive frameworks do not actually help optimize the images your clients will download, nor do they strip HTML fragments that shouldn’t be included. They simply hide them with CSS. With some clever JavaScript you could remove them from the DOM, but then again that’s more work for handsets with minimal CPU power.

To put things into perspective, when you make your website responsive, you start Mobile First. Why? Because the mobile version is considered the lowest fidelity in terms of Information Architecture and also from a visual point of view. By doing this, you can actually devise a scale of priorities and get to know your products/features more deeply. If you’re still not sure, you can always A/B test it.

For the purpose of this Techtorial, we’re going to build a simple news reader application for my friend’s tech blog, DailySocial.net. We’re going to extract content using only frontend technologies, JavaScript to be exact. There will be no caching (persistence) at all. You are welcome to fork and do your own implementation; you could try to persist using Local Storage or even cookies (beware of the 4 KB limit for cookies).
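If you want to try the Local Storage route yourself, here is a minimal sketch. It is not part of the tutorial’s repo. The function names, the "articles" key and the max-age check are assumptions for illustration. The storage argument is anything with getItem and setItem, so in a browser you would pass window.localStorage.

```javascript
// Saves the articles together with a timestamp.
function saveArticles(storage, key, articles) {
  storage.setItem(key, JSON.stringify({ savedAt: Date.now(), articles }));
}

// Returns the cached articles, or null on a miss, an expired entry
// or a corrupted entry.
function loadArticles(storage, key, maxAgeMs) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw);
    if (Date.now() - entry.savedAt > maxAgeMs) return null; // too old
    return entry.articles;
  } catch (err) {
    return null; // corrupted entry, treat it as a cache miss
  }
}

// Tiny in-memory stand-in so the sketch also runs outside a browser.
function memoryStorage() {
  const items = new Map();
  return {
    getItem: (k) => (items.has(k) ? items.get(k) : null),
    setItem: (k, v) => {
      items.set(k, String(v));
    },
  };
}

const storage = memoryStorage();
saveArticles(storage, "articles", [{ title: "Hello" }]);
const cached = loadArticles(storage, "articles", 60 * 1000);
console.log(cached[0].title); // Hello
```

Local Storage only stores strings, which is why the articles go through JSON.stringify on the way in and JSON.parse on the way out.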

Before we begin, these are the things you will need:

  1. Zurb Foundation 3.0
  2. Code editor, Smultron will suffice for Mac users
  3. Some HTML5 Knowledge
  4. Github repository for the source code in this tutorial is available here

We’re gonna start by building the layout. Every layout element in Foundation is built on a grid. Every row has 12 columns with a default gutter size of 30 pixels. The interesting part of Foundation 3.0 is that it now provides mobile columns. The mobile grid spans 4 columns, and it is really useful in cases where on the desktop the first column is 4 columns wide and the second is 8, while on the mobile version you’d want them divided equally. You can create your own custom Foundation build, by the way.

To start, we want a layout with the website name at the top, articles in the belly, a sidebar on the right and some fancy notes in the footer.

You can see how easy it is to do the columns. Of course, it’s just a single row occupying the whole width. Let’s make it more interesting by dividing the belly into 2 columns, the articles and a sidebar.

When you resize your browser, the layout automagically adjusts to the screen resolution. Looking at the code above, I encourage you to hack the left column by adding a mobile-three class, and a mobile-one class to the right column. The behaviour of the layout changes: the sidebar now always stays on the right. Doesn’t look good, right? Revert it.

Now the footer is done and you’ve got a working layout that is responsive and ready for some JavaScript manipulations.

The data source comes from DailySocial’s JSONP endpoint. If you don’t know what JSONP is, there’s a good read about it here. Because of the nature of JSONP, all we have to do is create a callback function in JavaScript and then include a script from DailySocial’s JSONP endpoint after the callback function is declared.
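Here is a minimal sketch of that JSONP pattern. It is not the code from the tutorial’s repo. The endpoint URL is a placeholder, and the "callback" parameter name and the response shape ({ posts: [{ title }] }) are assumptions, so check what the real endpoint expects and returns.

```javascript
// Collected titles; a real page would render them into the layout instead.
const titles = [];

// The callback must exist globally before the JSONP script loads.
// The response shape ({ posts: [{ title }] }) is an assumption.
function handleArticles(response) {
  for (const post of response.posts || []) {
    titles.push(post.title);
  }
}
globalThis.handleArticles = handleArticles;

// Appends the callback name as a query parameter.
// The parameter name "callback" is an assumption; endpoints differ.
function jsonpUrl(endpoint, callbackName) {
  const separator = endpoint.includes("?") ? "&" : "?";
  return endpoint + separator + "callback=" + encodeURIComponent(callbackName);
}

// In a browser, the loaded script is the server's reply,
// which is just a call like handleArticles({...}).
function loadJsonp(endpoint, callbackName) {
  if (typeof document === "undefined") return; // not running in a browser
  const script = document.createElement("script");
  script.src = jsonpUrl(endpoint, callbackName);
  document.head.appendChild(script);
}

// Placeholder URL, not the real DailySocial endpoint.
loadJsonp("https://example.com/feed", "handleArticles");
```

The order matters. If the script tag loads before handleArticles is defined, the reply calls a function that doesn’t exist yet and the data is lost.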

We’re done! Quick and painless to finish this Techtorial, right? There are a lot of improvements in store for this DailySocial reader. In the next tutorial, we’re gonna cache our JSON in the browser’s Local Storage. So for now, have fun with the code!


Batista R Harahap
Jl. Bango II/29C, Pondok Labu
Cilandak , DKI Jakarta , 12450 Indonesia
62817847023
