

Compiling nginx 1.4.0 With SPDY on CentOS 6

Just a few days ago, nginx 1.4.0 was released to the public. The version bump adds a lot of new capabilities for your web stack. The most interesting one for me is support for the SPDY 2 protocol.

An excerpt from the Chromium SPDY page reads:

As part of the “Let’s make the web faster” initiative, we are experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY (pronounced “SPeeDY”), an application-layer protocol for transporting content over the web, designed specifically for minimal latency.  In addition to a specification of the protocol, we have developed a SPDY-enabled Google Chrome browser and open-source web server. In lab tests, we have compared the performance of these applications over HTTP and SPDY, and have observed up to 64% reductions in page load times in SPDY. We hope to engage the open source community to contribute ideas, feedback, code, and test results, to make SPDY the next-generation application protocol for a faster web.

For SPDY to work, you need an SSL certificate and at least OpenSSL 1.0.1c to compile nginx and serve a website successfully. SPDY needs OpenSSL built with NPN (Next Protocol Negotiation) support, and CentOS only provides 1.0.0. According to a blog post here, we can just add a repo to get a suitable OpenSSL.

Here are the steps needed to compile nginx with SPDY support (run the rpm, yum and make install commands as root):
$ rpm -ivh --nosignature http://rpm.axivo.com/redhat/axivo-release-6-1.noarch.rpm
$ yum --enablerepo=axivo update openssl
$ cd /opt/src
$ wget http://nginx.org/download/nginx-1.4.0.tar.gz
$ tar xfz nginx-1.4.0.tar.gz
$ cd nginx-1.4.0
$ ./configure --with-pcre --with-http_ssl_module --with-http_spdy_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_stub_status_module --prefix=/usr/local/nginx
$ make -j4
$ make install

Now that the steps above are done, it’s time to enable SPDY for your websites, assuming you already have a working nginx configuration with SSL enabled. It’s actually really simple; the full explanation is in nginx’s SPDY documentation.

server {
    listen 443 ssl spdy;
    ssl_certificate server.crt;
    ssl_certificate_key server.key;
    ...
}

Now test your website at spdycheck.org to see if your SPDY implementation is successful. Cheers!

chuck-norris-php

It is what it is: a PHP client to get Chuck Norris jokes from The Internet Chuck Norris Database, icndb.com.

The code is at the usual place: https://github.com/tistaharahap/chuck-norris-php

Chuck will come after you…you have been warned!

Naive Bayes Classifier – Revisited

During the last week, I’ve been following up on a side project to do machine learning with Urbanesia’s comprehensive data. A lot of late-night reading and fiddling with unfamiliar code were the highlights of my week. I want to elaborate on my implementations and on how different technologies affect the benchmarks, particularly classification performance.

The repo for the code is on GitHub here.

Between the first batch of code and now, I have made lots of changes to the code and to the data store. At first I wasn’t sure which database would give the best performance. I’m testing on fairly low-spec hardware, a late-2011 MacBook Air with 4 GB of DDR3 RAM, an SSD and a 1.7 GHz Intel Core i5, which is nothing compared to a real server. By the way, although she’s relatively low spec, she’s got a name: Claire.

My first challenge was to abstract the data stores and deal with the algorithm later. To keep things familiar and easy, MySQL was the first store I dealt with. After getting the tables ready, I coded the algorithm with help from Alexandru Nedelcu’s excellent Hacker News posting on implementing a Naive Bayes Classifier in Ruby. That produced the alpha version.

The alpha performed really badly: it took over 1,000 seconds to classify a single word. As expected, MySQL was not up to the task. Since the data is actually a collection of words, I was intrigued by MongoDB as the data store. With the abstraction layer already in place, writing a MongoDB store was quite painless, and I hoped for better results. The benchmark showed that with MongoDB it still took over 400 seconds to classify a single word. Still not good enough. I wasn’t prepared to write scheduled backend services that would overwhelm the servers with at least 50,000 users, not to mention the 200,000+ businesses we have. It would be a sysadmin’s nightmare.

Real work was catching up with my side projects, so I took a break until last week, when I managed to get some time to write more code. I read through Hacker News looking for the perfect NoSQL database for the data we have. I remembered a friend of mine, Dondy Bappedyanto, talking about Redis and how it is a superset of Memcache. So I went straight to Redis.io and compiled the source code.

Disclaimer: I knew the algorithm wasn’t as optimized as I would have liked with the MySQL and MongoDB stores. I wanted to focus on macro optimizations first and do micro optimizations afterwards.

Redis is quite unique because it’s “Memcache-like”, storing data as key-value pairs. The logic changes dramatically, and learning more about Redis’ data types helps a lot. My aim was to study Redis while doing the project, so I opted to write the code with primitive data types first and optimize along the way. So with a lousy algorithm and a not-so-optimized data model in Redis, I classified a keyword and it was instant love. It took only ~1 second.

So in my mind, I already had the optimization I wanted at the macro level; it was time to get dirty. Since I enjoy trying new things as they come up, I researched implementations of the Naive Bayes Classifier in other languages. I was thinking about implementing a Node.js + Socket.io proxy to handle communication with our V2 client-side JavaScript code, and I was interested in knowing more about Node.js.

A quick Google search introduced me to several Node.js modules that do the job. One I was particularly interested in was Classifier by Heather Arthur. I read through the source code and found some clever methods to speed things up: get all the data first and do the calculations afterwards. But I was curious about Node.js and wanted to learn to code with it. So I wrote a more optimized version of my previous PHP algorithm and implemented it in JavaScript. I wanted to know how my code would perform against the Classifier Node.js module. Both used Redis as the data store.

The quick answer is that both my code and the Classifier module achieved sub-second performance, classifying single keywords in ~300 milliseconds. This was a great morale boost, but the fun only lasted a while. It turned out that sometimes both implementations wouldn’t return results on medium to large datasets. Being a newbie with Node.js, I didn’t know what to do. My guess is that it has something to do with memory, because neither implementation emitted its finish events. It could be a Node.js problem, or rather a problem with the redis and hiredis node modules.

This made me code in PHP again. I heavily modified the PHP implementation to get the data first and calculate later. I was surprised by the result: it took only ~0.01 seconds to classify a single keyword after the optimization. This gave me the idea to do the calculation in PHP and use Node.js + Socket.io as a frontend for JavaScript clients.
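To show why “get the data first, calculate later” helps, here is a rough JavaScript sketch. It is not the code from the repo: `createStore` is a hypothetical in-memory stand-in for a remote key-value store that only counts round trips, the `count:category:word` key layout is an assumption, and the “score” is a plain sum of counts rather than the real probability math. (Redis does offer MGET for fetching many keys in one call.)

```javascript
// Hypothetical stand-in for a remote key-value store.
// It counts round trips instead of doing real network I/O.
function createStore(data) {
  return {
    roundTrips: 0,
    get(key) {
      this.roundTrips += 1;
      return data.get(key) || 0;
    },
    mget(keys) {
      this.roundTrips += 1;
      return keys.map((k) => data.get(k) || 0);
    },
  };
}

const categories = ["food", "shopping"];
// Assumed key layout, for illustration only.
const keyFor = (category, word) => `count:${category}:${word}`;

// Interleaved: one store lookup per (category, word) pair while calculating.
function scoreInterleaved(store, words) {
  const scores = {};
  for (const c of categories) {
    scores[c] = 0;
    for (const w of words) scores[c] += store.get(keyFor(c, w));
  }
  return scores;
}

// Fetch first, calculate later: build every key, fetch once, compute in memory.
function scoreBatched(store, words) {
  const keys = categories.flatMap((c) => words.map((w) => keyFor(c, w)));
  const counts = store.mget(keys);
  const scores = {};
  categories.forEach((c, i) => {
    scores[c] = 0;
    words.forEach((_, j) => {
      scores[c] += counts[i * words.length + j];
    });
  });
  return scores;
}

const data = new Map([
  ["count:food:sate", 3],
  ["count:shopping:mall", 5],
]);
const slow = createStore(data);
const fast = createStore(data);
console.log(scoreInterleaved(slow, ["sate", "mall"]), slow.roundTrips);
console.log(scoreBatched(fast, ["sate", "mall"]), fast.roundTrips);
```

Both functions return the same scores; the interleaved version makes one round trip per category and word, the batched version makes one in total.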

Since it is really painless to do WebSocket with Socket.io, it took only a few minutes to produce the Node.js frontend, available here. In a subjective benchmark, it took 68 milliseconds to classify and deliver the result to JavaScript clients. This was a near-realtime result, and I had found my solution.

Last night was spent fiddling with the algorithm, trying to get the best accuracy out of it, and between last night and today the PHP implementation reached version 0.3.0. A coding session this afternoon led to a helper that produces a blacklist of stopwords from a collection of text. I couldn’t just import the most frequent words into the blacklist collection because the choice is really subjective and depends on the language. Urbanesia’s data is a mix of Indonesian and English, so it will take more time to analyze. If I find an acceptable automation method, I will share it in the repo.
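One possible starting point for such a helper, sketched in JavaScript. This is an assumption-laden illustration, not the helper from the repo: `stopwordCandidates`, the document-frequency threshold `minShare` and the sample texts are all made up for the example.

```javascript
// Returns words that appear in at least `minShare` of the texts.
// These are only candidates; a human still has to review them.
function stopwordCandidates(texts, minShare) {
  const docFreq = new Map();
  for (const text of texts) {
    // Count each word once per text (document frequency, not raw frequency).
    const unique = new Set(text.toLowerCase().match(/\p{L}+/gu) || []);
    for (const w of unique) docFreq.set(w, (docFreq.get(w) || 0) + 1);
  }
  return [...docFreq.entries()]
    .filter(([, n]) => n / texts.length >= minShare)
    .map(([w]) => w)
    .sort();
}

// Made-up sample mixing Indonesian and English.
const sample = [
  "Restoran yang enak di Jakarta",
  "Cafe yang nyaman di Kemang",
  "The best coffee in Jakarta",
];
console.log(stopwordCandidates(sample, 0.6));
```

On this sample the candidates include “jakarta” next to “di” and “yang”, which is exactly why frequent words can’t be imported into the blacklist blindly.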

The conclusion of this project is to think less and do more. Machine learning algorithms are available throughout the Internet; smart and talented developers before and after us will keep finding new ways to organize data, but it’s the implementation that counts. Each problem has its own domain, and I’m sure my code will not cater to all problems. However, learning by doing is also an excellent experience.

The Naive Bayes Classifier calculates a probability for each keyword, treating every keyword as independent of the other keywords being classified. That makes it really well suited to mining preferences, related content, etc. But in cases where a group of keywords is actually what we want to know about, the Naive Bayes Classifier’s accuracy won’t be so great. This calls for another solution. If you have any ideas about this, please do comment; I would love to know what you think.
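To make the independence assumption concrete, here is a minimal in-memory sketch in JavaScript. It is not the implementation from the repo: the function names, the add-one (Laplace) smoothing, the use of log probabilities and the training data are all assumptions for illustration.

```javascript
// Minimal Naive Bayes sketch (illustrative only).
function createClassifier() {
  return { docs: new Map(), words: new Map(), totals: new Map(), vocab: new Set(), docCount: 0 };
}

function train(model, category, keywords) {
  model.docs.set(category, (model.docs.get(category) || 0) + 1);
  model.docCount += 1;
  if (!model.words.has(category)) model.words.set(category, new Map());
  const counts = model.words.get(category);
  for (const w of keywords) {
    counts.set(w, (counts.get(w) || 0) + 1);
    model.totals.set(category, (model.totals.get(category) || 0) + 1);
    model.vocab.add(w);
  }
}

function classify(model, keywords) {
  let best = null;
  let bestScore = -Infinity;
  for (const [category, docs] of model.docs) {
    // Start with log P(category).
    let score = Math.log(docs / model.docCount);
    const counts = model.words.get(category);
    const total = model.totals.get(category) || 0;
    // Each keyword contributes on its own: this is the "naive" independence.
    for (const w of keywords) {
      const count = counts.get(w) || 0;
      // Add-one smoothing (an assumption) avoids log(0) for unseen words.
      score += Math.log((count + 1) / (total + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = category;
    }
  }
  return best;
}

const model = createClassifier();
train(model, "food", ["nasi", "goreng", "restaurant"]);
train(model, "food", ["sate", "restaurant", "cafe"]);
train(model, "shopping", ["mall", "boutique", "sale"]);
console.log(classify(model, ["restaurant", "sate"]));
```

Because every keyword is scored separately, the order of the keywords and any meaning carried by the group as a whole are ignored, which is the limitation described above.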

Cheers!

Gotchas Upgrading C# Projects to Visual Studio 2012

My previous post was about my experience developing for Windows 8 RTM using Visual Studio 2012. There were BREAKING changes with Visual Studio 2012, and I feel obliged to map out my own efforts for the benefit of other developers. So I’m going straight to the topic. Most of the content is NOT from my own experience; I put it down here to compile the gotchas.

Everything you see here is actually available as a whitepaper published by Microsoft here.

From Shai Raiten’s excellent guide here, find and replace every name in the Old Value column below with the value to its right:

Type    Old Value                              New Value
Class   PointerEventArgs                       PointerRoutedEventArgs
Style   FocusVisualBlackStrokeBrush            FocusVisualBlackStrokeThemeBrush
Style   FocusVisualWhiteStrokeBrush            FocusVisualWhiteStrokeThemeBrush
Style   ApplicationHoverTextBrush              ApplicationPointerOverForegroundThemeBrush
Style   ApplicationPressedTextBrush            ApplicationPressedForegroundThemeBrush
Style   ControlDisabledTextBrush               ButtonDisabledForegroundThemeBrush
Style   ApplicationSecondaryTextBrush          ApplicationSecondaryForegroundThemeBrush
Style   AppBarItemForegroundBrush              AppBarItemForegroundThemeBrush
Style   AppBarItemBackgroundBrush              AppBarItemBackgroundThemeBrush
Style   AppBarItemHoverBackgroundBrush         AppBarItemPointerOverBackgroundThemeBrush
Style   AppBarItemHoverForegroundBrush         AppBarItemPointerOverForegroundThemeBrush
Style   AppBarItemPressedForegroundBrush       AppBarItemPressedForegroundThemeBrush
Style   AppBarItemDisabledForegroundBrush      AppBarItemDisabledForegroundThemeBrush
Style   BackButtonBackgroundBrush              BackButtonBackgroundThemeBrush
Style   BackButtonGlyphBrush                   BackButtonForegroundThemeBrush
Style   BackButtonPressedGlyphBrush            BackButtonPressedForegroundThemeBrush
Style   BackButtonHoverBackgroundBrush         BackButtonPointerOverBackgroundThemeBrush
Style   BackButtonHoverGlyphBrush              BackButtonPointerOverForegroundThemeBrush
Style   ListViewItemPlaceholderRectBrush       ListViewItemPlaceholderBackgroundThemeBrush
Style   ListViewItemOverlayBackgroundBrush     ListViewItemOverlayBackgroundThemeBrush
Style   ListViewItemOverlayTextBrush           ListViewItemOverlayForegroundThemeBrush
Style   ListViewItemOverlaySecondaryTextBrush  ListViewItemOverlaySecondaryForegroundThemeBrush
Method  this.Dispatcher.InvokeAsync(           this.Dispatcher.RunAsync
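
The renames in the table can be applied mechanically. A rough sketch in JavaScript (Node.js), assuming you read each .cs or .xaml file into a string yourself: `applyRenames` and the `renames` map are hypothetical names, and the map holds only a few rows from the table. The Method row is left out because it is not a plain identifier.

```javascript
// A few rows from the table above; extend with the remaining rows as needed.
const renames = new Map([
  ["PointerEventArgs", "PointerRoutedEventArgs"],
  ["AppBarItemForegroundBrush", "AppBarItemForegroundThemeBrush"],
  ["AppBarItemHoverForegroundBrush", "AppBarItemPointerOverForegroundThemeBrush"],
  ["BackButtonGlyphBrush", "BackButtonForegroundThemeBrush"],
]);

function applyRenames(source, renames) {
  // The keys are plain identifiers, so no regex escaping is needed.
  // \b keeps the match to whole identifiers, and a single pass means
  // already-replaced text is never scanned again.
  const pattern = new RegExp(`\\b(?:${[...renames.keys()].join("|")})\\b`, "g");
  return source.replace(pattern, (match) => renames.get(match));
}

const xaml = 'Foreground="{StaticResource AppBarItemHoverForegroundBrush}"';
console.log(applyRenames(xaml, renames));
```

Run it on a copy of the project first and review the diff, since a whole-word match can still hit an unrelated identifier that happens to share a name.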

My own additions:

Type    Old Value                    New Value
XAML    VerticalScrollMode="Rails"   IsVerticalRailEnabled="True"

If you’d like to contribute your own experience, please leave a comment and I will update the list accordingly.


Batista R Harahap
Jl. Bango II/29C, Pondok Labu
Cilandak , DKI Jakarta , 12450 Indonesia
62817847023
