Archive for February, 2012
Quantum Computers @ IBM
Optimizing Collatz Sequence with Dynamic Programming
Even though Kurtz and Simon proved that a natural generalization of the Collatz problem is algorithmically undecidable, it is still fairly easy to brute force the 3n+1 conjecture with large values of n and empirically see it converge. Project Euler’s problem 14 queries about which starting number, under one million, produces the longest sequence?
Since Premature optimization is often considered root of all evil, a brute force way to approach this problem would be as simple as follows.
private static void CollatzSequence (ulong max)
{
ulong startingNum = 0;
ulong longestSequence = 0;
for (ulong i = 1; i < max; i++)
{
ulong member = i;
ulong sequenceCnt = 0;
while (member != 1)
{
if (member%2 == 0)
member = member/2;
else
member = 3*member + 1;
sequenceCnt++;
}
if (longestSequence < sequenceCnt)
{
longestSequence = sequenceCnt;
startingNum = i;
}
}
Console.WriteLine("Sequence Starter:" + startingNum + " - Longest Sequence=" + longestSequence);
}
The core of the algorithm is within the convergence test stated below
while (member != 1)
{
if (member%2 == 0)
member = member/2;
else
member = 3*member + 1;
sequenceCnt++;
}
This executes in 5601.2944 ms or approx 5.6 seconds resulting in starting element to be 83779 with the longest sequence 524 elements. Not bad but this can definitely be optimized.
By looking at the pattern of sequence generated, it’s clear that there is constant repetition or cycle which occurs within the sequence before converging to 1. Therefore, if we can break down this execution into segments by storing the length of repeated cycles, these segments can then be “appended” later to count the total steps in the convergence path. This is essentially the application of dynamic programming which is a very powerful algorithmic paradigm. In DP, a problem is solved by identifying a collection of sub problems (segments) and handling them one by one, smallest first, using the answers to small problems to help figure out larger ones. Some good dynamic programming problems can be found here.
In order to record and retrieve the earlier steps as segments, I decided to use a simple hashtable. This hashtable will store an element in sequence as a key and the count of elements in the path as value. Therefore, whenever a familiar recurrence happen, we can easily find the segment and stop the processing to move on.
if (dynPrg.ContainsKey(member))
{
sequenceCnt += ulong.Parse(dynPrg[member].ToString());
member = 1;
}
By using this approach, we get the same results in about 4 seconds however there is a time-space trade-off. There are approx 91K entries in the hashtable now; Hashtable can be pruned for further optimization. The complete listing follows.
private static void CollatzSequenceDP(ulong max)
{
ulong startingNum = 0;
ulong longestSequence = 0;
var dynPrg = new Hashtable();
for (ulong i = 1; i < max; i++)
{
ulong member = i;
ulong sequenceCnt = 0;
ulong firstterm = 0;
while (member != 1)
{
if (member % 2 == 0)
member = member / 2;
else
member = 3 * member + 1;
if (firstterm == 0)
firstterm = member;
if (dynPrg.ContainsKey(member))
{
sequenceCnt += ulong.Parse(dynPrg[member].ToString());
member = 1;
}
sequenceCnt++;
}
if (longestSequence < sequenceCnt)
{
longestSequence = sequenceCnt;
startingNum = i;
}
if (!dynPrg.ContainsKey(firstterm))
dynPrg.Add(firstterm, sequenceCnt-1);
}
Console.WriteLine("Sequence Starter:" + startingNum + " - Longest Sequence=" + longestSequence + " " + dynPrg.Count);
}
Comparing the results of regular execution and one with dynamic programming approach, we see in the graph below that DM easily out performs unoptimized brute force in time as the execution path gets longer. Following graph is plotted with over 1000 data points for values of n ranging from 10-1000000 step 1000.
Further improvements include parallelization of the algorithm and introducing prime sieves. I will be revisiting it with the Parallel.For construct.
Last but not least, speaking of Dynamic Programming, let’s not confuse this with term with computer programming. The term was chosen by Richard Bellmen, the inventor of paradigm to describe schedule. The idea has its roots in mathematics and is a general technique for solving complex optimization problems which can be decomposed into smaller overlapping sub problems.
Thriving with Paradigm Shifts (or how not to become a dinosaur)
Like most production systems, life happens in real time; and there has been several major paradigm shifts happening lately. It’s easy to shrug AJAX off as revamped XMLHTTP from late 90′s but we all know this isn’t exactly the case. With the thought process, frameworks and development around emerging technologies, sands are shifting faster than ever before.
For an outside viewer, changes are huge. What follows is not absolute technology-counterpart-comparison but rather area under the drift. Traditional RDBMS to NoSQL, traditional Web Servers to Node and nginx, from typesafe languages to dynamic typing, tradional MVC (spring/asp.net) to full stack scaffoldings (RoR), HTTP/SOAP to REST/JSON, postbacks to async, scaling up to Hadoop, ACID to CAP, proprietary data centers to cloud and the list goes on. Again, this list is not necessarily an absolute comparison of technology counterparts but these analogies have definitely come a long way since Tim O’Reilly did his famous what is web 2.0 definition and table (see below) in 2005.
| Web 1.0 | Web 2.0 | |
|---|---|---|
| DoubleClick | –> | Google AdSense |
| Ofoto | –> | Flickr |
| Akamai | –> | BitTorrent |
| mp3.com | –> | Napster |
| Britannica Online | –> | Wikipedia |
| personal websites | –> | blogging |
| evite | –> | upcoming.org and EVDB |
| domain name speculation | –> | search engine optimization |
| page views | –> | cost per click |
| screen scraping | –> | web services |
| publishing | –> | participation |
| content management systems | –> | wikis |
| directories (taxonomy) | –> | tagging (“folksonomy”) |
| stickiness | –> | syndication |
For computer scientists and software engineers with sound foundation in algorithms, operating systems, database systems (including distributed databases), computer networking and enterprise architecture, the situation isn’t quite hostile. There is no need to feel threatened by falling behind the learning curve due to lack of fundamental understanding and aptitude. However one should definitely be concerned if it’s due to, in Seth Godin’s terms, lizard brain laziness in learning and prototyping about these up and coming changes.
The fundamentals of distributed databases hasn’t changed much; however the ever-networked nature of world wide web has made it more plausible then ever to use them in practice. To draw an analogy from OSI model, majority of delta is happening in the application layer. This learning curve can only be overcome through training activities learning and persistent efforts. The same way we have avoided architectural astronauts and anti-patterns by writing code and optimizing it to solve real business and technology problems, it would be bridged by coding up mapreduce or improve technical vocabulary while idling through yet another View Model implementation. At least that’s how I envision it.
“Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road.”-Stewart Brand
Feb 15th – Learn Command Query Responsibility Segregation and Mercurial
San Gabriel Valley .NET Developers group’s next meeting is Wed Feb 15th. Please swing by to learn more about Command Query Responsibility Segregation and Mercurial source control usage in a team environment. See details below or on the sgv.net user group website
Abstract: Command Query Responsibility Segregation (CQRS) is an approach, a mindset to reduce complexities in application development. In its essence, CQRS is the creation of two objects where there was previously only one. CQRS is a very simple pattern that enables opportunities for architecture that may otherwise not exists. I will discuss what CQRS is in its simplest form and how it can be used to; capture business intent, data warehousing, event sourcing, and help reduce application complexity. After giving a brief overview of CQRS, I walk through developing a simple application using DDD and CQRS.
Abstract: Working with Mercurial in a team
Mercurial is a cross-platform, distributed revision control tool for software developers. Mercurial’s major design goals include high performance and scalability, decentralized, fully distributed collaborative development, robust handling of both plain text and binary files, and advanced branching and merging capabilities, while remaining conceptually simple. This discussion will give an introduction to Mercurial, as well as the client tools fully integrated in Visual Studio, and various work flows that can be used with Mercurial in your team.
On Unit testing Bentley’s binary search overflow condition
One of the few unit tests one would write for a simple arithmetic operation such as addition is a check for overflow i.e. if the resulting value is of the operand type and therefore does not cause overflow error. More often than not, programmers mistakenly consider this issue to be GC’s responsibility in case of managed code languages (C#, Java). The examples below are in C# adapted from java implementations in Joshua Bloch’s post.
A classic case of uncaught overflow which remain undiscovered for 20 years was in Jon Bentley’s amazing Programming Pearls book implementation of binary search. (The book by the way is a highly recommended reading for every engineer!). The erroneous code follows.
public static int BinarySearch(int[] a, int key)
{
int low = 0;
int high = a.Length - 1;
while (low key)
high = mid - 1;
else
return mid; // key found
}
return -(low + 1); // key not found.
}
The bug resides in the addition line #6 where int mid = (low + high) / 2; i.e. in case of int.Max. Modern compilers are intelligent enough to detect these errors during compile time
So for example, all the following three statements will “Overflow in constant value computation”
int a = int.MaxValue + 1;
var a = int.MaxValue + 1;
dynamic a = int.MaxValue + 1;
However, the compiler would not protect you from shooting your own foot if you insist on doing things like
int max = int.MaxValue;
int a = max + 1;
In this case, what you get in the value a is -2147483648 which is int.MinValue. If developer thinks var would protect him, think again because CLI considers int+int as int. Therefore, the following will result in int.MinValue as well.
int max = int.MaxValue;
var a = max + 1;
as well as
dynamic a = max + 1;
Now get back to the original topic of binary search overflow. A simple test to get this value would be
BinarySearch(Enumerable.Range(0, int.MaxValue - 1).ToArray(), new Random().Next());
where you would create a large array of pseudo random numbers. However, an effort to create an array of size in.Max would result in an out of memory exception. A test method stub looks like follows.
[TestMethod()]
public void BinarySearchTest()
{
int[] a = Enumerable.Range(0, int.MaxValue - 1).ToArray();
int key = 0; // TODO: Initialize to an appropriate value
int expected = 0; // TODO: Initialize to an appropriate value
int actual;
actual = Program.BinarySearch(a, key);
Assert.AreEqual(expected, actual);
Assert.Inconclusive("Verify the correctness of this test method.");
}
Trying to simulate this with shorts is still hard because counter intuitively, the sum of shorts is actually an integer. Therefore, the following statement results in a compile time error.
short c = 10, d = 10;
short e = c + d;
Therefore you cannot simulate the condition unless you cast it back
short mid = (short)((low + high)/2);
which kind of makes it a moot point anymore. And this is the reason why this bug has manifested itself without resolution for such a long time in a popular algorithm like binary search. The fixed version substitutes
short mid = (short)((low + high)/2);
with
int mid = low + ((high - low) / 2);
which are both functionally equivalent but the second statement does not go out of bounds since the subtraction operation along with division guarantees the int.Max bound.
public static int BinarySearchFixed(int[] a, int key)
{
int low = 0;
int high = a.Length - 1;
while (low key)
high = mid - 1;
else
return mid; // key found
}
return -(low + 1); // key not found.
}
It can be optimized by using the bitwise shift operator >> in C# for dividing by 2. Since we don’t have an unsigned shift, there would be some casting needed to unsigned int essentially increasing the range.
public static int BinarySearchOptimized(int[] a, int key)
{
int low = 0;
int high = a.Length - 1;
while (low <= high)
{
int mid = (int)(((uint) high + (uint) low) >> 1);
int midVal = a[mid];
if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else
return mid; // key found
}
return -(low + 1); // key not found.
}
Is this perfect? May be. How about uint to int casting here, is this downsizing going to work. In most cases, it would. In order to test, the empirical testing and formal methods would help but nothing is guaranteed and hence the iterative process of static code analysis and peer code reviews.
BinarySearch VS.NET 2010 Solution Download
Developer’s Contest hosted by UCLA Anderson School of Management
WHAT:The Developer’s Contest hosted by the Entrepreneur Association @ the UCLA Anderson School of Management
HOW IT WORKS:You and a team (up to three people) develop a business idea from scratch over a single weekend to win cash prizes.
WHEN:February 24 – February 26
WHY:
- Develop a business idea from inception
- Show off your programming talent
- Meet potential employers and aspiring entrepreneurs
- Develop your resume
Format in brief:
Up to 10 UCLA Anderson students will give a 1-minute pitch of their business idea to a group of developers at the Developer Contest kickoff on Friday. After the developers have heard all 10 pitches, they will then vote for the idea that they want to develop that weekend.
After the vote, the pitching Anderson student who received the most votes will have 20-30 minutes to answer Q&A with the developers about the idea and go into further detail. The developers may form teams of up to 3 members, with the option of selecting an Anderson student as a consultant.
The developer teams are given the weekend to develop a prototype and to prepare a presentation of their final product. Sunday evening each developer team will present to a board of judges. The winning team/developer will be selected by the judges and will receive cash and/or other prizes.
UCLA Anderson School of Management, 110 Westwood Plaza, Los Angeles, CA 90095
MORE INFORMATION AND RSVP ONLINE: http://bit.ly/
How to tweet (or blog) in Urdu
Gone are the days of inpage; now writing Urdu universally in unicode is rather easy on both mac and PC (and Linux). Yesterday I was asked by a friend regarding how to tweet in Urdu. Here is a short 3 step guide without reinventing the wheel.
1. If you are not familiar with Urdu phonetic keyboard, the quickest way is to use Google Transliterate http://www.google.com/transliterate/Urdu
2. Urdu keyboard is quite easy to learn. If you want native support of Urdu in your operating system, follow the steps according to your OS.
Since I use both MAC and PC for native urdu support, I have tried and tested both the mechanisms and they work just fine.
3. If you are using transliteration, you would copy and paste the Urdu text into your favorite twitter / blogging client. Not all clients support unicode; tweetdeck didn’t use to but twitter.com does support Unicode.
Happy tweeting / blogging.
Session Notes – Practical AppFarbic @ Southern California .NET Developers Group
Last night I presented on appfabric at Southern California .NET Developers Group in Buena Park. This talk was an expanded version of my earlier talk in the code camp talk last weekend. I get a chance to talk a little more about network topology and enterprise load balancing scenarios where appfabric caching and session management really helps. I also touched upon few topics including AppFabric Caching Admin tool, Concurrency Models (Windows Server AppFabric Caching), Windows Server AppFabric Caching Concepts, Windows Server AppFabric Caching Logical Architecture, Windows Server AppFabric Caching Physical Architecture and Concepts and Architecture for app fabric design and deployment. My recently submitted tip on code project regarding Windows Server AppFabric Service Validation was also demonstrated.
Last but not least, one of the attendees brought up an excellent question of how to handle HIPAA and PCI compliant data in the cloud. To the best of my knowledge, based on my last conversations at the cloud summit in LA, the best approach is to do a hybrid cloud implementation i.e. public cloud CDN Style for the public facing sites while keep the sensetive data in-house where your internal data center is PCI/HIPAA compliant. Feel free to check with Lynn since she has been following this area closely.
Thanks to the great audience including celebrities like Jeremy Clark . Special thanks to Art Villa and Janet Chung for the speaking opportunity. For links and code sample, please see my previous talk.

