<ul class="posts">
    
		<li>
			<div class="idea">
				
					<h1><a href="ruby-on-rack-datamapper-and-thin.html">Ruby on Rack, Datamapper &amp; Thin</a></h1>
					
					<div class="postdate">14 April, 2013
						<ul>
						
						</ul>
					</div>
					
					<p>A while back I started a hobby project and wanted to try a tech stack other than the usual .NET MVC or Rails that I&#8217;m used to.</p>

<p>I was also looking for something very lightweight. I&#8217;ve used Rack as middleware in Rails applications before, but never on its own, although I knew it was capable of serving HTTP requests by itself.</p>

<p>I had a look at Sinatra as well, which I&#8217;ve played around with briefly. I might get around to writing a Sinatra-based app sometime soon.</p>

<p>Starting with Rack, I had to achieve the following goals:</p>

<ul>
<li>Create routes</li>

<li>Serve static content</li>

<li>Serve content from database</li>

<li>Keep the endpoints secure (https).</li>
</ul>

<h3 id='routing'>Routing</h3>

<p>Rack lets you attach an application, such as a <code>Proc</code>, to a URL path with the <code>map</code> method. The application is called whenever a request for that path arrives.</p>

<p>An example of defining a Rack route:</p>

<pre><code>map &quot;/hello&quot; do
    # The body must respond to each, so the string is wrapped in an array.
    run Proc.new {|env| [200, {&quot;Content-Type&quot; =&gt; &quot;text/plain&quot;}, [&quot;Hello world!&quot;]]}
end</code></pre>

<p>Put this in the <code>config.ru</code> file and start the app with <code>rackup</code>, and it exposes <code>localhost:&lt;port&gt;/hello</code> as an endpoint that returns &#8220;Hello world!&#8221;.</p>
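<p>To make the contract concrete, here is a minimal sketch that calls such an application directly, without a server. The hand-built <code>env</code> hash and the variable name <code>hello</code> are my own stand-ins for illustration; a real server passes a much fuller environment.</p>

```ruby
# A Rack application is any object that responds to call(env) and returns
# a three-element array: status, headers, and a body that responds to each.
hello = proc do |env|
  [200, { "Content-Type" => "text/plain" }, ["Hello world!"]]
end

# env is normally built by the web server; this hand-written hash is a
# stand-in that contains only the keys a router would look at.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hello" }

status, headers, body = hello.call(env)

puts status
puts headers["Content-Type"]
body.each { |chunk| puts chunk }
```

<p>Because the application is just a callable, it can be exercised like this in a unit test before it is ever mounted with <code>map</code>.</p>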

<p>So far so good. Now how do we tell Rack to do something beyond &#8220;Hello world!&#8221;?</p>

<p>I did this by creating a class that takes a request and processes it. An example class that looks up a user:</p>

<pre><code>require &#39;yaml&#39;

class UserLookup

    # Rack entry point: looks up a user by the employee_id query parameter.
    def call(env)
        load_messages
        @req = Rack::Request.new(env)
        if @req.get?
            user = User.first(:employee_id =&gt; @req.GET[&#39;employee_id&#39;])
            return [404, {&quot;Content-Type&quot; =&gt; &quot;text/plain&quot;}, [@messages[&quot;unrecognized_user&quot;]]] if user.nil?
            # The body is JSON, so the content type says so.
            [200, {&quot;Content-Type&quot; =&gt; &quot;application/json&quot;}, [user.to_json(:methods =&gt; [:reserved_books, :full_name, :interests])]]
        else
            # Only GET is supported; the message is keyed by the status code in en.yml.
            [405, {&quot;Content-Type&quot; =&gt; &quot;text/plain&quot;}, [@messages[405]]]
        end
    end

    private

    # Loads the response messages from config/en.yml, relative to the working directory.
    def load_messages
        @messages = YAML::load(File.read(File.expand_path(&#39;config/en.yml&#39;,&#39;.&#39;)))
    end

end</code></pre>

<p>In <code>config.ru</code>, add a mapping for this lookup:</p>

<pre><code>map &quot;/user&quot; do
    run UserLookup.new
end</code></pre>

<p>This exposes another endpoint:</p>

<pre><code>localhost:&lt;port&gt;/user</code></pre>

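<p>The <code>UserLookup</code> class needs to accept parameters. One way of sending them is via the query string of the request, for example <code>/user?employee_id=42</code>. <code>Rack::Request</code> exposes the parsed query string as a hash through its <code>GET</code> method, which is what <code>@req.GET[&#39;employee_id&#39;]</code> reads in the class above.</p>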
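<p>As a rough illustration of what that lookup involves, the sketch below parses a query string using only Ruby&#8217;s standard library. The helper name <code>query_params</code> and the sample value <code>42</code> are hypothetical; in the real application <code>Rack::Request</code> does this work.</p>

```ruby
require "uri"

# Hypothetical helper: turns the raw QUERY_STRING of a Rack env into a hash.
# In the real application Rack::Request does this parsing.
def query_params(env)
  URI.decode_www_form(env.fetch("QUERY_STRING", "")).to_h
end

# Hand-built env for a request to /user?employee_id=42
env = { "REQUEST_METHOD" => "GET", "QUERY_STRING" => "employee_id=42" }

params = query_params(env)
employee_id = params["employee_id"]

puts employee_id.inspect
```

<p>Note that the value arrives as a string, and that a missing parameter simply comes back as <code>nil</code>, so the lookup has to cope with both.</p>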

<h3 id='nested_routes'>Nested Routes</h3>

<p>Nesting routes with Rack is simple enough:</p>

<pre><code>map &quot;/foo&quot; do
	map &quot;/bar&quot; do
		run FooBar.new
	end
	map &quot;/baz&quot; do
		run FooBaz.new
	end
end</code></pre>

<p>This exposes two endpoints, &#8220;/foo/bar&#8221; and &#8220;/foo/baz&#8221;.</p>
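<p>The effect of the nested mapping can be pictured as a lookup by path prefix. The sketch below is only my mental model, written in plain Ruby; <code>ROUTES</code> and <code>dispatch</code> are hypothetical names, and this is not how Rack itself is implemented.</p>

```ruby
# Hypothetical stand-in for the prefix matching that nested map blocks
# give you; this is an illustration, not Rack's implementation.
ROUTES = {
  "/foo/bar" => proc { |env| [200, { "Content-Type" => "text/plain" }, ["bar"]] },
  "/foo/baz" => proc { |env| [200, { "Content-Type" => "text/plain" }, ["baz"]] }
}

def dispatch(env)
  path = env["PATH_INFO"]
  # A route matches the exact path or anything below it.
  _prefix, app = ROUTES.find do |route, _app|
    path == route || path.start_with?(route + "/")
  end
  return [404, { "Content-Type" => "text/plain" }, ["Not Found"]] if app.nil?
  app.call(env)
end

status, _headers, body = dispatch({ "PATH_INFO" => "/foo/baz" })
puts status
puts body.join
```

<p>A path outside the mapped prefixes, such as <code>/foo</code> on its own, falls through to the 404 response in this sketch.</p>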

<h3 id='serving_static_content'>Serving Static Content</h3>

<p>So now all requests hit our application, and each route mapping invokes a Rack application. But like every other web application, we have static resources to serve as well. Serving static content via Rack takes a different kind of mapping:</p>

<pre><code>map &quot;/scripts&quot; do
	run Rack::Directory.new(File.expand_path(&quot;./scripts&quot;))
end

map &quot;/css&quot; do
	run Rack::Directory.new(File.expand_path(&quot;./css&quot;))
end

map &quot;/img&quot; do
	run Rack::Directory.new(File.expand_path(&quot;./img&quot;))
end</code></pre>

<p>And that&#8217;s all it takes to serve the static content for a web application.</p>
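<p>For a feel of what a static endpoint has to do, here is a minimal sketch in plain Ruby. <code>StaticFiles</code> is a hypothetical class written for this illustration, not the code behind <code>Rack::Directory</code>; it resolves the request path under a root directory and refuses anything that escapes that root.</p>

```ruby
require "tmpdir"

# Hypothetical stand-in for a static-file endpoint. It resolves the request
# path under a root directory and refuses anything outside that root.
class StaticFiles
  def initialize(root)
    @root = File.expand_path(root)
  end

  def call(env)
    path = File.expand_path(File.join(@root, env["PATH_INFO"]))
    # Guard against paths such as /../secret that climb out of the root.
    inside_root = path.start_with?(@root + File::SEPARATOR)
    found = inside_root ? File.file?(path) : false
    return [404, { "Content-Type" => "text/plain" }, ["Not Found"]] unless found
    [200, { "Content-Length" => File.size(path).to_s }, [File.binread(path)]]
  end
end

# Try it against a throwaway directory containing one stylesheet.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "app.css"), "body {}")
  app = StaticFiles.new(dir)
  status, _headers, body = app.call({ "PATH_INFO" => "/app.css" })
  puts status
  puts body.join
end
```

<p>The sketch leaves out content types, caching headers and directory listings, which is exactly the kind of detail that makes the ready-made Rack classes worth using.</p>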

<h3 id='setup_thin_webserver'>Setting up the Thin web server</h3>

<p>Thin is a web server designed for high throughput. More about Thin is available <a href='http://code.macournoyer.com/thin/'>here</a>. Adding the <code>thin</code> gem to the <code>Gemfile</code> and running <code>bundle install</code> was enough to set up Thin for me. By default, <code>rackup</code> starts the application with WEBrick. Once the <code>thin</code> gem is installed, it starts the application with Thin instead.</p>

<h3 id='why_thin'>Why Thin?</h3>

<p>There are various selling points for Thin. There is a comparison of the number of concurrent requests Thin can serve versus Mongrel, WEBrick etc., so high concurrency is reason enough.</p>

<p>My motive was simple: I needed my website to run under SSL (https). My web application accesses the user&#8217;s webcam through WebKit&#8217;s <code>webkitGetUserMedia</code> API, and it is annoying that the browser asks for confirmation every time this API is called. On secure sites the browser (Chrome) asks only once and remembers the answer for subsequent requests.</p>

<p>Thin comes with an option to enable SSL, so why not use it?</p>

<p>Enabling SSL over Thin is straightforward enough. As a first step I had to generate a self-signed certificate.</p>
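<p>There are several ways to produce such a certificate. One option, sketched here with Ruby&#8217;s standard <code>openssl</code> library, is shown below. The host name <code>localhost</code>, the one-year validity and the serial number are assumptions chosen for local development, not values from my actual setup.</p>

```ruby
require "openssl"

# Generate a 2048-bit RSA key pair for the server.
key = OpenSSL::PKey::RSA.new(2048)

# Subject and issuer are identical, which is what makes the certificate
# self-signed. "localhost" is an assumed host name for local development.
name = OpenSSL::X509::Name.parse("/CN=localhost")

cert = OpenSSL::X509::Certificate.new
cert.version = 2 # the value 2 means X.509 version 3
cert.serial = 1
cert.subject = name
cert.issuer = name
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after = Time.now + 365 * 24 * 60 * 60
cert.sign(key, OpenSSL::Digest.new("SHA256"))

key_pem = key.to_pem   # contents for a file such as server.key
cert_pem = cert.to_pem # contents for a file such as server.crt

puts cert.subject
```

<p>Writing <code>key_pem</code> and <code>cert_pem</code> to files gives the key and certificate pair that the server is pointed at. Browsers will still warn about a self-signed certificate until it is explicitly trusted.</p>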

<p>After that, all I had to do was start Thin from the same directory as <code>config.ru</code> with these options:</p>

<pre><code>thin start --ssl --ssl-key-file ../cert/server.key --ssl-cert-file ../cert/server.crt</code></pre>

<p>Without Thin, the other way to set up SSL is to put the application behind an <code>nginx</code> proxy and configure <code>nginx</code> to use SSL.</p>

<h3 id='data_access__datamapper'>Data access - DataMapper</h3>

<p>I won&#8217;t repeat things here; DataMapper&#8217;s website is elaborate enough on &#8220;<a href='http://datamapper.org/why.html'>Why DataMapper?</a>&#8221;.</p>

<p>However, I do want to write down my experiences using DataMapper (beyond regular table mappings). That is possibly a post in itself, as this one has got long enough.</p>

<p>One thing I can say is that I am in love with DataMapper, especially since, these days, most of my ORM time is consumed by NHibernate!</p>
				
			</div>
		</li>
    
		<li>
			<div class="idea">
				
					<h2><a class="postlink" href="async-controller-testing-nunit-gotchas.html">Async Controller testing gotchas with NUnit</a></h2>
					<div class="postdate">27 March, 2013
						<ul>
						
						</ul>
					</div>
					<p>NUnit released version 2.6.2 with support for testing asynchronous methods. I decided to give it a shot.</p>

<p>My trial was to write some tests for an async Web API controller. Let&#8217;s say we have a controller called <code>UserController</code>, and I want to add an action called <code>Get</code> which looks like this:</p>

<pre><code>public Task&lt;User&gt; Get(string userId)
{
        if (string.IsNullOrEmpty(userId))
        {
            throw new ArgumentException(&quot;No username specified.&quot;);
        }

        return Task.Factory.StartNew(
            () =&gt;
                {
                    // Same repository method that the tests mock.
                    var user = userRepository.GetByUserName(userId);
                    if (user != null)
                    {
                        return user;
                    }

                    throw new HttpResponseException(Request.CreateErrorResponse(
                    	HttpStatusCode.NotFound, string.Format(&quot;{0} - username does not exist&quot;, userId)));
                });
}</code></pre>

<p>Great, so we have an action that is asynchronous. Now there are two kinds of tests that we would have to write, one for the happy path, and another to ensure that the exception is thrown with the right properties set.</p>

<p>Here&#8217;s how I ended up writing the happy path test -</p>

<pre><code>[Test]
public async void ShouldGetUserDetails()
{
    var mockUserRepository = new Mock&lt;IUserRepository&gt;();
    mockUserRepository.Setup(x =&gt; x.GetByUserName(It.IsAny&lt;string&gt;())).Returns(new User { UserName = &quot;foo&quot; });
    var userController = new UserController(mockUserRepository.Object);

    var user = await userController.Get(&quot;foo&quot;);

    Assert.IsNotNull(user);
    Assert.That(user.UserName, Is.EqualTo(&quot;foo&quot;));
}</code></pre>

<p>All good! The test passes. The only difference here is the use of <code>async</code> and <code>await</code> keywords in the test.</p>

<p>Now comes the fun part: I tried writing tests to assert on the exception. It took some effort and <a href='http://stackoverflow.com/questions/15634542/nunit-async-test-exception-assertion'>guidance on StackOverflow</a> to figure out a way.</p>

<p>The question lists the various attempts that I expected to work but didn&#8217;t, and the accepted answer explains why.</p>

<p>Following the hint from the answer, I ended up writing a helper method that asserts the type of the exception thrown and lets me verify specific properties on it. Here is how I did it -</p>

<pre><code>public static class AssertEx
{
    public static async Task ThrowsAsync&lt;TException&gt;(Func&lt;Task&gt; func) where TException : class
    {
        await ThrowsAsync&lt;TException&gt;(func, exception =&gt; { });
    } 

    public static async Task ThrowsAsync&lt;TException&gt;(Func&lt;Task&gt; func, Action&lt;TException&gt; action) where TException : class
    {
        var exception = default(TException);
        var expected = typeof(TException);
        Type actual = null;
        try
        {
            await func();
        }
        catch (Exception e)
        {
            exception = e as TException;
            actual = e.GetType();
        }

        Assert.AreEqual(expected, actual);
        action(exception);
    }
}</code></pre>

<p>And this is how I use it</p>

<pre><code>[Test]
public async void ShouldThrow404WhenNotFound()
{
    var mockUserRepository = new Mock&lt;IUserRepository&gt;();
    mockUserRepository.Setup(x =&gt; x.GetByUserName(It.IsAny&lt;string&gt;())).Returns(default(User));
    var userController = new UserController(mockUserRepository.Object) { Request = new HttpRequestMessage() };

    Action&lt;HttpResponseException&gt; asserts = exception =&gt; Assert.That(exception.Response.StatusCode, Is.EqualTo(HttpStatusCode.NotFound));
    await AssertEx.ThrowsAsync(() =&gt; userController.Get(&quot;foo&quot;), asserts);
}</code></pre>

<p>Note that the asserts are now wrapped into an action and passed to <code>AssertEx.ThrowsAsync</code>.</p>
					
				
			</div>
		</li>
    
		<li>
			<div class="idea">
				
					<h2><a class="postlink" href="pragmatism-and-purity.html">Pragmatism and Purity</a></h2>
					<div class="postdate">29 January, 2013
						<ul>
						
						</ul>
					</div>
					<p>One of the lessons I&#8217;ve learnt from experience is that there are good systems and there are bad systems, but both do the job for their users. The difference between good and bad comes into the picture only when a system fails to adapt or evolve at the same pace as (if not faster than) its users&#8217; needs.</p>

<p>I am motivated to write this post by some of the messages exchanged between me and another user at <a href='http://stackoverflow.com/questions/14374075/timeout-connecting-to-sql-server-express-2012#comment20060455_14374075'>StackOverflow</a>. The question admits inefficiencies in the system, and seeks a way to work within the limitations. Personal abuse aside, there is one school of thought that mandates drastic changes to fix the root cause.</p>

<p>In reality, very few root causes are easily fixable. If they were, I&#8217;d expect them to be fixed already and all conversations around them would not be happening.</p>

<p>I&#8217;ve been involved in multiple transformation projects, where the challenge is to upgrade the system landscape without disrupting the users&#8217; lives. Martin Fowler wrote about the <a href='http://martinfowler.com/bliki/StranglerApplication.html'>Strangler Application</a> pattern, where he gives some pointers on how it can help in rewriting legacy applications. There is a discussion on <a href='http://stackoverflow.com/questions/1118804/application-strangler-pattern-experiences-thoughts'>StackOverflow</a> with some good insights on the challenges and the approaches that can be taken to phase out an application.</p>

<p>On the practical front, there are various types of challenges, and not all of them are technical. But this is a bigger topic, and possibly worth another writeup.</p>

<h3 id='some_observations'>Some Observations</h3>

<ul>
<li>There is no perfect system. No matter how much you optimize, there are always going to be a few areas to improve upon. The main reason is that software development projects work towards a moving target: a changing business.</li>

<li>A purist is someone who is tempted to do everything by the book, to write the perfect piece of software. A lot of this comes from the personal drive to write the best piece of code. A pragmatic programmer, on the other hand, is someone who can weigh both sides and decide on a balance.</li>

<li>No one has infinite time or resources. So there is always going to be someone who draws the line on how perfect the software needs to be. Worse still, once a project is marked as complete, only a fraction of the time and resources spent earlier goes into maintaining it. Hence, someone will have to make the call between features and technical maintenance, and in most cases maintenance is driven by business needs.</li>
</ul>

<h3 id='in_summary'>In Summary</h3>

<p>While purism is good, and purists have a place in criticizing projects (constructively, of course), almost all of the ones I&#8217;ve come across lack something more important (at least in my mind): empathy. A little pragmatism goes a long way in achieving the goals, but not at the cost of technical compromise. There are often decisions made and shortcuts taken which will make a purist cry. But sometimes, situations demand it. The fact that people are crying out for help indicates the urgency involved. However, part of being pragmatic also means tracking the system&#8217;s technical debt. Close the debt as soon as possible, or there is a heavy price to pay later.</p>
					
				
			</div>
		</li>
    
		<li>
			<div class="idea">
				
					<h2><a class="postlink" href="sources-for-serendipitous-learning.html">My sources for serendipitous learning</a></h2>
					<div class="postdate">29 November, 2012
						<ul>
						
						</ul>
					</div>
					<p>A few days ago, I wrote a <a href='/how-open-source-community-helps-me.html'>post</a> about how the open source community helps me. I&#8217;ve been thinking about it, and have realized that open source projects are just one of my sources of learning.</p>

<p>In this post, I will try to put my thoughts together on what else has helped me.</p>

<p><em>Disclaimer</em> - It&#8217;s probably worth mentioning here that I like to keep discovering things I do not know about and spend some time getting familiar with them. Quite a few topics do not hold my interest; a handful do. The sources and methods mentioned here are my way of discovering and learning, and they may not be for everyone.</p>

<h3 id='at_work'><em>At work</em></h3>

<p><em>Pair Programming</em> - Pairing helps me at various levels. Sometimes we end up talking about fundamentals of design, and at other times we debate what a good name for a variable is. What do I learn from this chit-chat?</p>

<ul>
<li>One more way of doing the same thing</li>

<li>Another example of what won&#8217;t work, before it actually doesn&#8217;t</li>
</ul>

<p>Pair programming isn&#8217;t just two people sitting together in front of a large screen. I have paired with folks who do their homework, and the next morning we share notes. It is twice the reading/research I can do in a night.</p>

<p><em>Community (debates/experiences)</em> - A community is only as strong as its interaction. In my organization, the programmer community is very active; on a given day we actively discuss close to 10 topics.</p>

<h3 id='books'><em>Books</em></h3>

<p>This is quite obvious. Right from our school days, books have been a very reliable source of knowledge. However, in school/college, we are fortunate enough to be prescribed a list of books that we should read. In the professional world, we are always hunting for a good book on a topic. It becomes a discovery when we find a book or a chapter that is brilliant.</p>

<p>Stephen Hawking&#8217;s <a href='http://www.hawking.org.uk/a-brief-history-of-time.html'>A Brief History of Time</a> is one of the books that answered a very basic question from my school days. Heisenberg&#8217;s Uncertainty Principle is one of the popular principles in physics. I had read various textbooks defining what it is, but it was Hawking&#8217;s two-and-a-half-page description that actually made sense to me (if you haven&#8217;t already, I would recommend reading chapter 4 of this book).</p>

<p>Now, I wasn&#8217;t reading this book because I wanted to learn Heisenberg&#8217;s principle. In fact, I read it after I had completed my exams on the topic. And the fact that it made me understand a principle which I already knew by heart gave me that feeling which made me want to jump.</p>

<h3 id='coursera'><em>Coursera</em></h3>

<p>Online courses have taken off in the recent past, and I have been following them as well. What is wonderful about sites like <a href='http://www.coursera.org'>Coursera</a> is that there is a variety of courses and most of them are concise.</p>

<p>I enrolled in a few courses. Naturally, one learning from each course is the subject itself. What&#8217;s been serendipitous for me is the perspectives and side learnings. For example, I took the Machine Learning course taught by Prof. Andrew Ng.</p>

<p>Besides learning machine learning concepts/algorithms, I learnt to use Octave. But my a-ha moment was when Prof. Ng demonstrated how vectorization can help optimize nested for loops.</p>

<h3 id='internet'><em>Internet</em></h3>

<p>The Internet is obviously a wealth of knowledge. A few resources that I cherish and follow:</p>

<ul>
<li>Ben Pierce&#8217;s collection of articles on <a href='http://www.cis.upenn.edu/~bcpierce/courses/670Fall04/GreatWorksInPL.shtml'>Great Works in Programming Languages</a></li>

<li><a href='http://news.ycombinator.com'>Hacker News</a></li>

<li><a href='http://lambda-the-ultimate.org/'>Lambda the Ultimate</a></li>
</ul>
					
				
			</div>
		</li>
    
		<li>
			<div class="idea">
				
					<h2><a class="postlink" href="your-search-can-be-only-as-good-as-your-data.html">Your search can be only as good as your data</a></h2>
					<div class="postdate">27 November, 2012
						<ul>
						
						</ul>
					</div>
					<p>Very recently I was involved in a project that required us to implement a search feature. The setup is quite simple, and generic:</p>

<p>1) An administration application, responsible for authoring/creating products and their attributes.</p>

<p>2) A catalogue application that also powers the public facing website, which has details about products.</p>

<p>In this project, the product managers have been consolidating Product details in their respective areas for a while (a few years). We were tasked with retrofitting the search engine to use the data.</p>

<p>Seems simple enough? Here are a few points that I learnt from this task.</p>

<h3 id='data_issues_can_exist_at_all_level'>Data issues can exist at all levels</h3>

<h3 id='storagedesign'><em>Storage/Design</em></h3>

<p>Bad data isn&#8217;t just about <strong><em>wrong data types</em></strong>. Numbers stored as strings and date-time woes combined with time zone complexities can be a data warehouse nightmare. Often, we had to add ad-hoc conditions to the data extraction logic.</p>

<p><strong><em>Data integrity</em></strong> is another issue. Legacy databases have often grown in an ad-hoc fashion, as a result of siloed application development. This leads to cross-database lookups, often not protected by foreign keys and the like.</p>

<p>With such siloed databases being out of sync, data is duplicated, and not always in its entirety. So <strong><em>de-duplication</em></strong> of data is one of the overheads.</p>
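<p>To give a flavour of those ad-hoc conditions, here is a small sketch in Ruby. The product rows, the field names and the <code>parse_price</code> helper are all made up for illustration and are not taken from the project; the sketch converts string prices to numbers and de-duplicates on a normalised name.</p>

```ruby
# Hypothetical product rows as they might come out of a legacy extract:
# prices stored as strings, inconsistent casing and spacing in names.
rows = [
  { name: "Widget ", price: "19.99" },
  { name: "widget",  price: "19.99" },
  { name: "Gadget",  price: "n/a" }
]

# Convert a price string to a Float, or nil when it is not numeric.
def parse_price(text)
  Float(text)
rescue ArgumentError, TypeError
  nil
end

cleaned = rows.map do |row|
  { name: row[:name].strip.downcase, price: parse_price(row[:price]) }
end

# De-duplicate on the normalised name, keeping the first occurrence.
unique = cleaned.uniq { |row| row[:name] }

unique.each { |row| puts row.inspect }
```

<p>Even in this toy case there are judgement calls, such as which duplicate to keep and what to do with a price that cannot be parsed, and those are exactly the decisions that pile up in real extraction logic.</p>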

<h3 id='human_error'><em>Human Error</em></h3>

<p>Most data collection mechanisms involve manual entry. <strong><em>Mis-spelling, mis-tagging, mis-classification etc. can have a catastrophic effect</em></strong> in the long run. The effect is magnified when dependencies on such data get added. Not only is the entity itself polluted, but it ends up affecting the entire chain.</p>

<p>Humans can easily be misguided, and lack of training only makes it worse. In some cases, excessive tagging is perceived as something that will boost the product, and product managers have been doing this religiously. What they do not realize is that <strong><em>too much data is noise</em></strong>; it is pollution.</p>

<h3 id='conclusion'>Conclusion</h3>

<p>Bad data can be caused by bad design or by poor data capture mechanisms. Sometimes bad data can be fixed by manual curation. At other times there are patterns in the bad data, and automated tasks can help clean it up. However, there is a good chance that the data is not cleaned up at all.</p>

<p>No matter what tool/technology one looks at (be it search tools like Solr, or other data-warehouse tools), the bottleneck in getting search (or any analytics) right is going to be data quality. These tools are mature enough to be close to perfect, given a perfect dataset. So whenever there is a mismatch between expected and actual behaviour, I would first look at the data. Most problems/bugs that I have faced while implementing search were because of bad data.</p>
					
				
			</div>
		</li>
    
</ul>
