Sanstech

Ideas, Knowledge, Technology, Computer Science, Experience associated with my work and some geeky stuff I progressively encounter during my journey towards enlightenment. Read on…

  • RSS RSS Feed

    • The Pragmatic Programmer
      I finished reading The Pragmatic Programmer by Andrew Hunt and David Thomas. It’s not a new book in the market but I was curious to read this. The technology topics covered, are not any different from those found in most software engineering books, but the way they’re presented using Pragmatic Philosophy Approach, is remarkable. Code […]
    • 2013 in review
      The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog. Here’s an excerpt: A San Francisco cable car holds 60 people. This blog was viewed about 1,200 times in 2013. If it were a cable car, it would take about 20 trips to carry that many people. Click here to see the […]
    • Goodbye, Ness!
      It had to happen sometime. I thought Feb 2013 was the right time. I quit Ness after a long 5 years and 4 months of stay, in Feb. I joined FICO (formerly, Fair Isaac) last Feb.  While I get an opportunity to work with many varied stakeholders like Scientists, Architect, Product Management, Peer Developers, PMO, Technical Publications and also […]
    • Meta: information retrieval from Tweets
      I pick significant problems randomly sometimes and enjoy solving them, or at least attempt designing api :-). Here’s one such problem! Problem: How’d you go about finding meta information about a person’s tweets? NOTE: a) Tweet == Twitter updates b) Meta information –> Loosely defined. You can interpret it anyway you want –> Frequency, topics, follower […]
    • Understanding Big Data
      It’s been a while, since I last posted! To keep this rolling (I’m hardly getting any time to post my own articles or stuff about my experiments these days 😦  ), I just wanted to share this ebook on Big Data titled  Understanding Big Data: Analytics for Enterprise Class and Streaming Data. Cheers!
  • Twitter Updates

Number to Text Representation Problem

Posted by sanstechbytes on April 22, 2012

Problem:

Given a number (up to a billion), display the string (text) representation of it. For eg.: 123456789 should be represented as “One Hundred and Twenty Three Million Four Hundred and Fifty Six thousand Seven Hundred and Eighty Nine. ‘and’ should only be used after hundred.

Solution:

Approach 1: In a OO approach, a number can be represented by Number class. Each Number object has a list of Triplet objects (composition). Triplet represents a group of max 3 digits and can’t exist without a Number.  With this approach, it’s easy to accommodate different text representation systems like British English International Unit (million etc), British English Indian Units (lac, crore).

I wrote the classes (see below) for both approaches including JUnit Tests. I’ve captured the outputs for sample runs in the project folder that you can download here.

Approach 2: Assume Text representation in British English. Think of converting 123456789 into “One Hundred and Twenty Three Million Four Hundred and Fifty Six thousand Seven Hundred and Eighty Nine.

So, convert(123456789) = convert(123) + “million” + convert(456) + “thousand” + convert(789). [Courtesy: Gayle Laakmann of careercup.com]. This makes use of the technique of recursion. A lot of new code has to be written to extend this behavior though. There’re too many special cases to handle.

Code: Approach 1:


package numbertextoop;

import java.util.LinkedList;
import java.util.List;

import utils.NumberTextUtils;

/**
* This class represents a given number. Can be used to display in British
* English phrase the number Eg: 1234 in British English phrase: One Thousand
* Two Hundred and Thirty Four This class can accommodate text representation
* types like British, Indian etc with the constructor provided 04/14/2012 @author
* sanjeev 04/19/2012 @author sanjeev
*
* @version 1.1
*/

public class Number {

private int num;
private List<Triplet> triplets;

/**
* Assume default text representation type as British
*/
public Number(int i) {

this.num = i;

/**
* Handle negative number
*/
if (i < 0) {
i = -1 * i;
}

int length = Integer.toString(i).length();

if (length % NumberTextUtils.THREE == 0) {
length = length / NumberTextUtils.THREE;
} else if (length % NumberTextUtils.THREE > 0) {
length = length / NumberTextUtils.THREE + 1;
}

this.triplets = new LinkedList<Triplet>();
buildTriplets(i, length);
}

/**
* Allows for specifying text representation system, British, Indian etc
* text representationType could be enum - TextRepresentationType.BRITISH,
* TextRepresentationType.INDIAN. In Case of 'Indian', constants like Lac,
* Crore, should be added appropriately in NumberTextUtils.java
*/
public Number(int i, String representationType) {

}

/**
* @param num
* @param length
*            Builds triplet objects and arranges the position (hundredth,
*            tenth, unit) of the digits forming a triplet
*/
private void buildTriplets(int num, int length) {
while (length != 0) {
int tripletValue = num % 1000;

Triplet triplet = new Triplet(tripletValue);
triplet = triplet.matchTripletPlaces(tripletValue);

System.out.println("triplet" + triplet.getHundredthPlace() + "."
+ triplet.getTenthPlace() + ":" + triplet.getUnitPlace());
((LinkedList<Triplet>) this.triplets).addFirst(triplet);

num = num / NumberTextUtils.THOUSAND;
length--;
}
}

/**
* String representation of this number
*
*/
public String toString() {

/**
* When num = 0, returns "Zero" or if accessed thro
*
*/
if (this.num == 0) {
return NumberTextUtils.ZERO_TEXT;
}

StringBuilder numTextBuilder = new StringBuilder();

/**
* Negative number
*
*/
if (this.num < 0) {
numTextBuilder.append("Negative ");
}

return constructEnglishText(this.triplets, numTextBuilder).toString();
}

/**
* Constructs the British English text from this group of triplets formed
* from
*
* @param num
* @param triplets
*            List
*/
private StringBuilder constructEnglishText(List<Triplet> triplets,
StringBuilder numTextBuilder) {
int mapIndex = triplets.size();

for (Triplet triplet : triplets) {
numTextBuilder = numTextBuilder.append(triplet.toString()).append(
" ");

/**
* When num = 1,000,000: print only one million
*/
if (triplet.getHundredthPlace() == 0
&& triplet.getTenthPlace() == 0
&& triplet.getUnitPlace() == 0
&& mapIndex < this.triplets.size()) {
numTextBuilder = numTextBuilder.append("");
} else {
numTextBuilder = numTextBuilder.append(
NumberTextUtils.tripletMap().get(mapIndex)).append(" ");
}

mapIndex--;
}

return numTextBuilder;
}

public int getNum() {
return num;
}

public String getText() {
return this.toString();
}

public List<Triplet> getTriplets() {
return triplets;
}

}

 

package numbertextoop;

import utils.NumberTextUtils;

/**
* This class represents a group of digits up to a maximum of 3 that form a
* given number. 04/14/12 @author sanjeev 04/19/12 @author sanjeev
*
* @version 1.1
*/
public class Triplet {

private int hundredthPlace;
private int tenthPlace;
private int unitPlace;
private int value;

public Triplet(int tripletGroupNum) {
this.value = tripletGroupNum;
}

/**
* Arranges the place of digits for hundredth, tenth and unit positions in
* this triplet
*
* @param tripletValue
* @return Triplet
*/
public Triplet matchTripletPlaces(int tripletValue) {
this.setHundredthPlace(tripletValue / NumberTextUtils.HUNDRED);

tripletValue = tripletValue % NumberTextUtils.HUNDRED;
this.setTenthPlace(tripletValue / NumberTextUtils.TEN);

tripletValue = tripletValue % NumberTextUtils.TEN;
this.setUnitPlace(tripletValue);

return this;
}

/**
* Returns the string representation of this triplet
*
* @see java.lang.Object#toString()
*/
public String toString() {
return constructTripletText().toString();
}

/**
* @return tripletText StringBuilder
*/
private StringBuilder constructTripletText() {

StringBuilder tripletText = new StringBuilder();

tripletText = appendHundredthPlaceText(tripletText);

/** 'Teen' candidate check: 13 -> Thirteen, 11 -> Eleven */
StringBuilder teenCandidateBuilder = new StringBuilder();
String teenCandidate = teenCandidateBuilder.append(
String.valueOf(this.tenthPlace)).append(
String.valueOf(this.unitPlace)).toString();
int teenCandidateValue = Integer.parseInt(teenCandidate);

tripletText = appendTenthPlaceText(tripletText, teenCandidateValue);

tripletText = appendUnitPlaceText(tripletText, teenCandidateValue);

return tripletText;
}

/**
* Unit place text candidates Unit place : Five, Zero, Two etc
*/
private StringBuilder appendUnitPlaceText(StringBuilder tripletText,
int teenCandidate) {
/** Print unit place text: 1 -> one, 9 -> nine etc */
if (this.unitPlace != 0) {
if (isATeenNumber(teenCandidate)) {
return tripletText;
}

tripletText = appendDigitText(tripletText, this.unitPlace);
}

return tripletText;
}

/**
* Tenth place and 'Teen' text candidates Tenth place : Five, Eleven,
* Thirteen etc.
*/
private StringBuilder appendTenthPlaceText(StringBuilder tripletText,
int teenCandidate) {
if (this.tenthPlace != 0) {
if (isATeenNumber(teenCandidate)) {
tripletText = tripletText
.append(NumberTextUtils.teens[this.unitPlace]);
} else {
tripletText = tripletText
.append(NumberTextUtils.tens[this.tenthPlace - 1]);
}
tripletText.append(" ");
}

return tripletText;
}

/**
* Hundredth place text candidates Hundredth place : Two Hundred, Two
* Hundred and etc.
*/
private StringBuilder appendHundredthPlaceText(StringBuilder tripletText) {
if (this.hundredthPlace != 0) {
if (this.tenthPlace == 0 && this.unitPlace == 0) {
tripletText = appendDigitText(tripletText, this.hundredthPlace)
.append(" Hundred ");
} else {
tripletText = appendDigitText(tripletText, this.hundredthPlace)
.append(" Hundred and ");
}
}

return tripletText;
}

/**
* @param tripletText
* @param digit
* @return
*/
public StringBuilder appendDigitText(StringBuilder tripletText, int digit) {
tripletText.append(NumberTextUtils.digits[digit]);
return tripletText;
}

/**
* @param teenCandidate
* @return true if the number belongs to [11, 12, 13, 14, 15, 16, 17, 18,
*         19]
*/
public boolean isATeenNumber(int teenCandidate) {
return (teenCandidate >= NumberTextUtils.ELEVEN && teenCandidate < NumberTextUtils.TWENTY);
}

public int getValue() {
return value;
}

public int getHundredthPlace() {
return hundredthPlace;
}

public void setHundredthPlace(int hundredthPlace) {
this.hundredthPlace = hundredthPlace;
}

public int getTenthPlace() {
return tenthPlace;
}

public void setTenthPlace(int tenthPlace) {
this.tenthPlace = tenthPlace;
}

public int getUnitPlace() {
return unitPlace;
}

public void setUnitPlace(int unitPlace) {
this.unitPlace = unitPlace;
}

}

Code: Approach 2:

package numbertextalgo;

import utils.NumberTextUtils;

/**
* This class represents a given number in British English phrase Eg: 1234 in
* British English phrase: One Thousand Two Hundred and Thirty Four
*
* @author sanjeev, 14-Apr-2012
* @version 1.0
*/

public class NumberText {

public NumberText() {

}

public static String numberToString(int number) {
if (number == 0) {
return NumberTextUtils.ZERO_TEXT;
} else if (number < 0) {
return NumberTextUtils.NEGATIVE_TEXT + numberToString(-1 * number);
}

int count = 0;
String str = "";

while (number > 0) {
if (number % NumberTextUtils.THOUSAND != 0) {
str = numberToString100(number % NumberTextUtils.THOUSAND)
+ NumberTextUtils.bigs[count] + " " + str;
number /= NumberTextUtils.THOUSAND;
count++;
}
}

return str;
}

private static String numberToString100(int number) {
String str = "";

/** Convert hundredth place */
if (number >= NumberTextUtils.HUNDRED) {
str += NumberTextUtils.digits[number / NumberTextUtils.HUNDRED - 1]
+ " Hundred and ";
number %= NumberTextUtils.HUNDRED;
}

/** Convert NumberTextUtils.tens place */
if (number >= NumberTextUtils.ELEVEN
&& number <= NumberTextUtils.NINETEEN) {
return str + NumberTextUtils.teens[number - NumberTextUtils.ELEVEN]
+ " ";
} else if (number == NumberTextUtils.TEN
|| number >= NumberTextUtils.TWENTY) {
str += NumberTextUtils.tens[number / NumberTextUtils.TEN - 1] + " ";
number %= NumberTextUtils.TEN;
}

/** Convert ones place */
if (number >= NumberTextUtils.ONE && number <= NumberTextUtils.NINE) {
str += NumberTextUtils.digits[number - 1] + " ";
}

return str;
}
}

 

package utils;

import java.util.HashMap;
import java.util.Map;

/**
* This class is used as a utility class for representing a number in text
* 04/14/12 @author sanjeev 04/19/12 @author sanjeev
*
* @version 1.1
*/
public class NumberTextUtils {

public static String[] digits = new String[] { "Zero", "One", "Two",
"Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
public static String[] teens = new String[] { "", "Eleven", "Twelve",
"Thirteen", "Fourteen", "Fifteen", "Sixteen", "Seventeen",
"Eighteen", "Nineteen" };
public static String[] tens = new String[] { "Ten", "Twenty", "Thirty",
"Forty", "Fifty", "Sixty", "Seventy", "Eighty", "Ninety" };
public static String[] bigs = new String[] { "", "Thousand", "Million",
"Billion" };

public static final String NEGATIVE_TEXT = "Negative";
public static final String ZERO_TEXT = "Zero";

public static final int ONE = 1;
public static final int THREE = 3;
public static final int NINE = 9;
public static final int TEN = 10;
public static final int TWENTY = 20;
public static final int ELEVEN = 11;
public static final int NINETEEN = 19;
public static final int HUNDRED = 100;
public static final int THOUSAND = 1000;

public static Map<Integer, String> tripletGroupMap = new HashMap<Integer, String>();

static {
tripletGroupMap.put(1, "");
tripletGroupMap.put(2, "Thousand");
tripletGroupMap.put(3, "Million");
tripletGroupMap.put(4, "Billion");
}

public static Map<Integer, String> tripletMap() {
return tripletGroupMap;
}
} 

 

package numbertextoop;

/**
* Test Class with main() for Number.java
*
* @author sanjeev
*
*/
public class TestNumber {

public static void main(String[] args) {
int num = 1000000000;
Number number = new Number(num);

sop(num + " in British English Phrase using OOP: " + number.getText());
}

private static void sop(Object o) {
System.out.println(o);
}

}

Posted in Java | Leave a Comment »

Researching Research…

Posted by sanstechbytes on March 31, 2012

In a world of rapidly developing technologies, the citizens that shape up the technology landscape require a combination of at least a two or more skills:

a)  Hard (Technical) Skills
b)  Soft Skills (Leadership, Motivation, Emoti. Maturity, Communication etc.)
c)  Ability to acquire new skills.

I strongly believe that the third skill  above, is significantly enhanced by research and have experienced it myself. Given the complexity of nature of interaction of components, different subsystems and speed with which the technology landscape evolves or changes, the ability to acquire new skills to relate to that landscape, is a key attribute for anyone.

What is research after all? We often hear the word ‘research’ being used in R&D organizations and academia, and also not so often in Service Organizations. In a very literal sense, in R&D organizations and academia, research mostly refers to the first-class research that’s driven by the the sheer new idea, and the efforts converging to a well-defined charter and eventually, resulting in filing of a patent, in publications, in research journals etc. It’s mostly used when there’s a talk about the work based on some innovative idea. As Mihaly Csikszentmihalyi puts it, In a very literal sense, when the work done during research draws attention of or is submitted to, the body constituted by a group of people registered as experts in the that field, that have the authority to approve or disapprove that work. In Service Organizations, it’s used in the context of prototyping task with lot of unknowns to start with, in the context of troubleshooting an issue or improving performance of a system. One of the myths, is that, research is an activity with no realistic goal or objective.

As Uday Khedkar of IIT Bombay, puts it in one of his presentations on the topic: “Research could involve any of the below or none of the below activities in our world like – Building a new device, Writing a new software, Repairing a device or Debugging a software, Drawing a conclusion from a lot of data, Proving a theorem, Formulating a theorem”. With different meanings attributed to the word ‘research’,  research, in essence, is a game of innovative ideas that are significant. The significance of ideas could lie in any of – Beauty, Utility, Enhancement of Knowledge.

The ingredients of good research are:
a) Innovation
– An idea that changes one’s behavior contrary to invention that doesn’t change one’s behavior. Idea is the most generated entity in the world. But, if doesn’t change one’s behavior, although it
embarks a new way of thinking and highlights a new way of building things, it remains as an invention.

b) Aesthetics
As Henry Poincare puts it, “Scientists study science not because it is useful, but because it is
beautiful. Here I do not talk about the beauty of appearance or beauty of qualities . . . Here I talk about that profound beauty which comes from a harmonious order of parts . . .”. Take painting or any design activity.
c) Other important aspects viz., Completeness
– Rigour
– Empirical demonstration
– Effective communication

Rigour removes imprecision and adds concreteness. It makes an idea very immune to personal interpretation. Empirical demonstration is through an evidence acquired by observation and experimentation. It opens up new channels for innovation. At an abstract level, it’s about capturing the interaction of  systems with each other; and drawing conclusions based on consistency in the type and nature of interactions. Humans beings can be considered to be made up of some ‘systems’ as well!  When research is not effectively communicated, the innovative ideas, research is not termed as real research by the field.

In Service Organizations, the rigour is often lacking in the way people deliver software, when they give excuse of client deadlines, to hide their inability to learn to work with rigour for delivering a mediocre quality code or software.

When we talk about the process of research, we talk about parameters like The Spirit of Inquiry, Breadth vs Depth, Ability to Abstract and Modularize, Moving from Confusion Phase to start with to Conviction Phase in the end, Research in Academia vs. Research in Industry,

The Ten Commandments of Creativity in Research
1. Seek extension of an earlier known solution
2. If you have a solution, find a problem
3. Find out the right questions to ask
4. Seek generality by removing specificities
5. Seek symmetry by testing for duality
6. Observe patterns in ideas
7. Distill the essence, refine your ideas
8. Distinguish the relevant from the irrelevant
9. Build levels of abstractions and migrate between them
10. Mix deep thinking with routine mechanical work.

Research is often seen as a major source of ideas and by implication, of innovation. One can  internalize ‘research’ nature with practice. Research makes a researcher, a better learner. Although grooming this nature is a leadership attribute for one, to build efficient and quality teams and build robust and long-lasting systems, it’s also of paramount criticality to develop the skill or the ability to switch to different mindset in order to thrive in a dynamic nature of business, especially in service organizations.

Recommended Reading:
1. An Article to clear up some Misconceptions about the nature of Research
2. Richard Hamming on Research
3. Mihaly Csikszentmihalyi on Creativity and Flow

Posted in Research | Leave a Comment »

Performance Engineering

Posted by sanstechbytes on January 26, 2012

I’m invested in MIT OpenCourseWare. That reminds me to go back to MIT site to look for stuff that I like to learn with a first-principles based approach! ;-).

Below are some of the videos of the lectures taught under 6.172 – Performance Engineering of Software Systems and I find ’em  very useful. Check out this MIT Course Page playlist for more related lectures.

Distributed Systems:

Basic Performance Engineering:

Posted in Research | Leave a Comment »

Document Search

Posted by sanstechbytes on December 29, 2011

I recently started working on a trivial problem of search by making use of symbol table with the future goal of enhancing the API designed, for Large Scale and Memory requirements. I just wanted to share this experience in the meantime.

The Problem:
Find the list of documents that contain a particular string. The documents are stored as simple text files in a flat structure having a single directory in a Library. Keep in mind that searching for a word will not be a one-time activity. There can be thousands of calls to the search API exposed to all the clients at the same time. Assume that the public document libraries are not getting modified i.e.:
a. No new documents are being added to the library.
b. No existing documents are being modified.
c. No existing documents are being deleted.
Define a data-structure to hold the necessary information about the documents so that the above API can respond back in minimum time-complexity. Write an implementation for populating information in the data-structure you have defined above.

The Solution:
Given that there can be thousands of calls to one API to retrieve similar info from the same data, we can accept the burden of pre-processing the data. This pre-processing can be performed at the time of server start-up. For the sake of simplicity, let’s consider that the string searched for, in the documents, contains complete words,it can contain duplicate words. If the string contains one word, then our API should retrieve a set of documents that contain the word. If the string contains more than one word, then the intersection of sets of documents that contain individual words will give us the set of documents .

A data structure that maintains key and value pair could be very much optimal, when the purpose of search is to, say, for instance, find definition, find relevant pages, find the song to download, find type and value, find process transactions. For the above problem, we could maintain a pre-processed dictionary with key as Keyword and value as set of documents that contain the keyword searched for, in the documents.

I’ve considered simple search mechanism, excluding the complexities like distributing documents on different machines, assigning the task of alphabetical arragement of keywords to a particular machine, merging the list of documents with sorted keywords etc. As you’re thinking now, the complexities refer to the Large Scale and Memory needs which are the inherent while searching millions of documents. We’re also thinking about doing mergesort, using Hadoop etc. As I work on this in my free time, I want to enhance this API to develop for Large Scale and Memory needs. As I’m talking about Java implementation, we can implement Singleton Design pattern by creating a single instance of the Search API to service all the client requests, per WAR, per classloader, per JVM and ensuring that single instance can’t be cloned.  Note that I’ve used MyST as a wrapper that uses TreeMap API (implements height balancing Red-Black Tree. Worst-case O(log n) time for insert and search operations). HashMap API approach that offers O(c) for most cases and O(n) worst-case for insert and search operations can be considered. If we’re thinking about sorting the documents based on keywords, (for large scale and memory needs), I think, we could readily use TreeMap that maintains the sorted order of it’s keys.

Below is the code:

package search;
import java.io.File;
/**
 *
 * @author Sanjeev Kulkarni
 * Test Client for FileSearcher.java
 */
public class TestFileSearcher {
/**
 * @param args
 */
 public static void main(String[] args) {
 FileSearcher fileSearcher = new FileSearcher();
 String keyWord = "www.kulkarni.org";
 MySET set = fileSearcher.search(keyWord);
 sop("Files containing  " + keyWord + " - : ");
 if (set.isEmpty()) {
 sop("Keyword not found in any of the files");
 return;
 }
 for (File file : set) {
 sop( file.getName() +" at " + file.getAbsolutePath());
 }
 }
/**
 * Utility that prints output to the console
 */
 private static void sop(Object o) {
 System.out.println(o);
 }
}

package search;

import java.io.File;

interface ISearcher {
    /*
     * * @param keyWord - a single word to be searched in the available
     * documents * @returns List - a list of files in which the word is
     * present
     */

    // It can't contain duplicate elements.
    // It has to be a SET since each file (Document) has a unique name
    MySET search(String keyWord);

}

package search;
import java.io.File;

public class FileSearcher implements ISearcher {

    private MyST> myST = null;

    public MySET search(String keyWord) {
        FileSearchFactory fileSearchfactory = FileSearchFactory.getInstance();
        myST = fileSearchfactory.st;
        MySET set = new MySET();
        if (myST.contains(keyWord)) {
            set = myST.get(keyWord);
        }
        return set;
    }

}

package search;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

/**
 *
 * @author Sanjeev Kulkarni
 * @version 1.0 Singleton implementation for indexing files based on keyword, to
 *          provide single instance of Symbol Table(MyST class) for all
 *          concurrent client requests. When the first client queries for a
 *          particular keyword, file indexing is done and then search in
 *          performed using O(log n). Hashtable offering O(c) - constant time
 *          operation can be considered. Given that - The files(public document
 *          libraries) are not getting modified, i.e. a. No new documents are
 *          being added to the libraries. b. No existing documents are being
 *          modified and c. No existing documents are being deleted, from the
 *          second client request onwards, only search is performed using Binary
 *          Search taking O(log n).
 *
 */

public class FileSearchFactory {

    private static FileSearchFactory instance = null;
    protected MyST> st = null;
    private static final int SIZE = 3;

    private FileSearchFactory() throws FileSearchException {
        // key = word, value = set of files containing that word
        // This is done only once, per ClassLoader, per Application.
        // 'n' clients get handle to one instance of FileSearchFactory
        // loaded in one class loader, in one WAR deployment
        indexFilesForKeyWords();
    }

    private void indexFilesForKeyWords() throws FileSearchException {
        st = new MyST>();
        // Read in files from the directory
        // For demo, below hard-coded paths are used

        String[] filePaths = new String[SIZE];
        filePaths[0] = "D:/devzone/projects/search/doc1.txt";
        filePaths[1] = "D:/devzone/projects/search/doc2.txt";
        filePaths[2] = "D:/devzone/projects/search/doc3.txt";

        // create inverted index of all files
        System.out.println("Indexing files..");
        for (String filePath : filePaths) {
            System.out.println("  " + filePath);
            File file = new File(filePath);
            Scanner in = null;

            try {
                in = new Scanner(file);
            } catch (FileNotFoundException fnfe) {
                throw new FileSearchException(
                        "File wasn't found while indexing files");
            }

            while (in.hasNext()) {
                String word = in.next();
                if (!st.contains(word))
                    st.put(word, new MySET());
                MySET set = st.get(word);
                set.add(file);
            }
        }
    }

    public static synchronized FileSearchFactory getInstance() {
        // lazy initialization
        if (instance == null) {
            try {
                instance = new FileSearchFactory();
            } catch (FileSearchException e) {
                System.out.println(e.getMessage());
            }
        }
        return instance;
    }

    // preventing client's invoking Object.clone(), to ensure complete singleton
    // implementation
    public Object clone() throws CloneNotSupportedException {
        throw new CloneNotSupportedException();
    }

}

/**
 *
 * @author Sanjeev Kulkarni
 * @version 1.0
 *  Custom exception for messages in file search
 */
public class FileSearchException extends Exception {

    public FileSearchException(){
        super();
    }

    public FileSearchException(String msg){
        super(msg);
    }

    //add machines id's

}

package search;

import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

public class MySET> implements Iterable {

    private TreeSet set;

    /**
     * Create an empty set.
     */
    public MySET() {
        set = new TreeSet();
    }

    /**
     * Is this set empty?
     */
    public boolean isEmpty() {
        return set.isEmpty();
    }

    /**
     * Add the key to this set.
     */
    public void add(Key key) {
        set.add(key);
    }

    /**
     * Does this set contain the given key?
     */
    public boolean contains(Key key) {
        return set.contains(key);
    }

    /**
     * Delete the given key from this set.
     */
    public void delete(Key key) {
        set.remove(key);
    }

    /**
     * Return the number of keys in this set.
     */
    public int size() {
        return set.size();
    }

    /**
     * Return an Iterator for this set.
     */
    public Iterator iterator() {
        return set.iterator();
    }

    /**
     * Return the key in this set with the maximum value.
     */
    public Key max() {
        return set.last();
    }

    /**
     * Return the key in this set with the minimum value.
     */
    public Key min() {
        return set.first();
    }

    /**
     * Return the smallest key in this set >= k.
     */
    public Key ceil(Key k) {
        SortedSet tail = set.tailSet(k);
        if (tail.isEmpty())
            return null;
        else
            return tail.first();
    }

    /**
     * Return the largest key in this set <= k.
     */
    public Key floor(Key k) {
        if (set.contains(k))
            return k;

        // does not include key if present (!)
        SortedSet head = set.headSet(k);
        if (head.isEmpty())
            return null;
        else
            return head.last();
    }

    /**
     * Return the union of this set with that set.
     */
    public MySET union(MySET that) {
        MySET c = new MySET();
        for (Key x : this) {
            c.add(x);
        }
        for (Key x : that) {
            c.add(x);
        }
        return c;
    }

    /**
     * Return the intersection of this set with that set.
     */
    public MySET intersects(MySET that) {
        MySET c = new MySET();
        if (this.size() < that.size()) {
            for (Key x : this) {
                if (that.contains(x))
                    c.add(x);
            }
        } else {
            for (Key x : that) {
                if (this.contains(x))
                    c.add(x);
            }
        }
        return c;
    }

}

package search;

import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 *
 * @author Sanjeev Kulkarni
 * @version 1.0
 * Symbol Table with Key = keyword, Value = SET of Files that
 * contain this keyword
 * @param 
 * @param 
 */

public class MyST, Value> implements Iterable {
    private TreeMap st;

    /**
     * Create an empty symbol table.
     */
    public MyST() {
        st = new TreeMap();
    }

    /**
     * Put key-value pair into the symbol table. Remove key from table if
     * value is null.
     */
    public void put(Key key, Value val) {
        if (val == null) st.remove(key);
        else             st.put(key, val);
    }

    /**
     * Return the value paired with given key; null if key is not in table.
     */
    public Value get(Key key) {
        return st.get(key);
    }

    /**
     * Delete the key (and paired value) from table.
     * Return the value paired with given key; null if key is not in table.
     */
    public Value delete(Key key) {
        return st.remove(key);
    }

    /**
     * Is the key in the table?
     */
    public boolean contains(Key key) {
        return st.containsKey(key);
    }

    /**
     * How many keys are in the table?
     */
    public int size() {
        return st.size();
    }

    /**
     * Return an Iterator for the keys in the table.
     * To iterate over all of the keys in the symbol table st, use the
     * foreach notation: for (Key key : st).
     */
    public Iterator iterator() {
        return st.keySet().iterator();
    }

    /**
     * Return an Iterable for the keys in the table.
     * To iterate over all of the keys in the symbol table st, use the
     * foreach notation: for (Key key : st.keys()).
     */
    public Iterable keys() {
        return st.keySet();
    }

    /**
     * Return the smallest key in the table.
     */
    public Key min() {
        return st.firstKey();
    }

    /**
     * Return the largest key in the table.
     */
    public Key max() {
        return st.lastKey();
    }

    /**
     * Return the smallest key in the table >= k.
     */
    public Key ceil(Key k) {
        SortedMap tail = st.tailMap(k);
        if (tail.isEmpty()) return null;
        else return tail.firstKey();
    }

    /**
     * Return the largest key in the table <= k.
     */
    public Key floor(Key k) {
        if (st.containsKey(k)) return k;
        // does not include key if present (!)
        SortedMap head = st.headMap(k);
        if (head.isEmpty()) return null;
        else return head.lastKey();
    }
}

Posted in Java | Leave a Comment »

Applying Computational Thinking: A Consumer’s Perspective

Posted by sanstechbytes on November 11, 2011

Abstract

In this white paper, I give a brief background on the challenges we face, when we, the consumers, set out to consume something in the process of accomplishing a task and the need for computational thinking, thereby arising. I describe ways to effectively address our inability in coming up with an efficient strategy in such situations and in particular, address our inability in making an optimal decision for a sub-problem of Bad Buy. I’ve outlined some steps that can be incorporated in our decisioning framework to aid us in making an Optimal Buy. With a singular focus on the topic of Computational Thinking, I’ve illustrated application of CT techniques for situations mostly relevant to ‘Consumer’ in day-to-day life. In the end, I draw some conclusions based on my observations and my own experience applying computational thinking, as a consumer. A ‘Consumer’ is not just referred to the one that buys items; without loss of generality, Consumer is anyone that consumes data.

Background

The abilities of an individual, including the abilities for abstract thought, understanding, communication, reasoning, learning, planning, emotional intelligence and problem solving define degree of his/her intelligence. The path to excellence in any field consists of journey through many layers of abstraction of knowledge needed to use and enhance these abilities. This applies to sheer accomplishment of a task as well. in this journey of accomplishing a goal, let’s say for all practical purposes, that each and every skill (to perform any task) is acquired in a layered manner through different abstraction layers with the level of difficulty increasing at each layer. In schools, some gifted students naturally build upon the above fundamental abilities, move through each and every layer of abstraction easily and some not-so-gifted struggle a bit to do so but end up being there, through hard work and practice, while majority of the students struggle to move to the more and more difficult layers of abstraction itself. For any given generation at any given point in time, vast majority of students are at a layer of abstraction with lower level of difficulty compared to a minority (or say < 40% of students) that are at a layer of abstraction with higher level of difficulty. Industries want people that are in these layers of abstraction where the real-world action is, when they graduate.

We’re not talking about just the chosen fields to draw these observations. Regardless of one’s years of schooling, everyone thinks computationally in completing a task because this kind of thinking is already embedded inside of us either because it was passed to us by our ancestors, or we’ve observed that it works and have incorporated it or by virtue of our study and practice. A task can be: Going to work in field, Answering multiple phone calls, Designing a system, Washing Clothes, Boarding a bus, Writing a program etc. Everything is algorithmic in nature.

Keywords

Pattern, Algorithm, Computation, Time, Space, Consumer, Buy, Market, e-Commerce

 

The Problem

There’re two types of problems I’m trying to address. One is the problem of a sub-optimal buy; and the other, the poor information consumption.

The information overload has impacted all spheres of our lives. Add to this our own vulnerabilities to deal with it, we suddenly find it difficult to make an optimal decision or consume rich information. We find ourselves in loops.

In all our outward dealings related to buying an item or consuming information, this has resulted in increasing the complexity of the whole process of consumption. We often end up making a sub-optimal buy or a bad buy.

Let’s look at it, from the perspective of trends in Education Sector. The lack of ability to think computationally, among graduates has led to the poor supply of man-power to industries in various sectors. The prevalent education system is the major contributor to such a situation. Researchers and Visionaries have increasingly felt the need for introducing appropriate courses at schools or for introducing mechanism to educate masses concerned with developing new way of thinking known as Computational Thinking, to bridge the academia-industry gap as an effort in tackling this issue. They’ve also simulated and have experimented with results on Computational Thinking, with an appropriate course material, at high-school and pre-university levels that can significantly help the cause. However, a course on Computational Thinking is still not taught at most schools.

Understanding Computational Thinking

Computational Thinking is a way of solving problems, designing systems, and understanding human behavior that draws on concepts fundamental to computer science. Computational thinking is thinking in terms of abstractions, invariably multiple layers of abstraction at once. Computational thinking is about the automation of these abstractions. The automaton could be an algorithm, a Turing machine, a tangible device, a software system—or the human brain. (Carnegie Mellon, n.d.) [Bold added for emphasis.]

Computational Thinking is the amalgam of thought processes involved in formulating problems and their solutions so that the solutions are represented in a form that can be effectively carried out by an information-processing agent. In simple terms, it’s the thinking that involves techniques like problem decomposition, pattern recognition, pattern generalization to define abstractions or models, algorithm design, and data analysis and visualization. As software engineers, we use these techniques; we acquire these skills with practice and build reliable computing systems. While CT draws on concepts from Computer Science, it’s relevance is increasingly being recognized in other areas of active study including algorithmic medicine, computational archaeology, computational economics, computational finance, computation and journalism, computational law, computational social science, and digital humanities.

Specific computational thinking techniques include: problem decomposition, pattern recognition, pattern generalization to define abstractions or models, algorithm design, and data analysis and visualization. In the next section, I’ve made an attempt to describe each of these techniques with examples, to develop better understanding of CT and in a way, address the said problems.

How Computational Thinking Solves Your Problem?

Can we apply computational thinking in daily life? Is Computational Thinking already a subsystem in a big system of our thought processes? Yes and Yes. Although engineers, scientists, architects, or in general those involved in accomplishing a task requiring abstraction and automation, use CT in their daily lives,  some of the examples given below still relate to events in everyone’s life, including those mentioned above.

1. Pattern Recognition:

The ability to notice similarities or common differences that will help us make predictions or lead us to shortcuts. Pattern recognition is frequently the basis for solving problems and designing algorithm.

Example:

Many a times we in IT are under enormous pressure to meet certain deadlines. Most of us usually search on Google or seek help from colleagues or think about solutions based on existing code or existing artifacts. It happened with few of my colleagues. All of us including Yours Truly, in our team didn’t have adequate knowledge of XSL syntax (we could generally use help from existing code or google). There’s a requirement to sort a bunch of records based on date and it was inside an XSL file. My peers got no help from existing code and spent roughly one day thinking about various approaches to solve, which they weren’t willing to use.

When they discussed with me, I predicted that since XSLT is a declarative language used for reorganizing document content, and a way to generate multiple results– such as HTML, WAP, and SVG–from the same content and as any programming language that is used for the tasks mentioned in previous line, provides routines for search and sort operations, XSLT might provide a sort routine. With XSLT, it’s mostly tag based and I keyed in and phew, that works! We solved the problem in 2 minutes. Searching and Sorting are some of the key operations supported by all programming languages that are used for tasks like above. I recognized the pattern and predicted. Five minutes, and we got our solution! You don’t need work experience all the time. It’s the thinking you need.

2. Pattern Generalization and Abstraction:

It refers to the ability to filter out information that is not necessary to solve a certain type of problem and generalize the information that is necessary. Pattern generalization and abstraction allows us to represent an idea or a process in general terms (e.g., variables) so that we can use it to solve other problems that are similar in nature.

Example:

All the people are inherently aware of the fact that due to middlemen (brokers), they always buy items at a heightened price than the original price. It’s a pattern that’s been observed since the mankind began trading. Why don’t people generalize this pattern and try to find ways to solve this problem and to make an efficient buy? They don’t have to be literate to be educated. Even with the internet accessible from mobile, desktop, laptop and reachable to masses, many people do not innovatively think and save time and money using Internet!

Think about buying an item at a lesser price than is sold at many places. Be it book, apparel, music CD, gadget etc. If you think computationally, you come up with abstractions about different intermediary businesses involved between the producer (manufacturer) and the consumer (end user). You also observe the pattern that each intermediary business makes some money in the form of commission and the final price the item is sold at is obviously higher than it was actually priced by the manufacturer. Instead of the B2B (Business 2 Business) model, if you use the B2C (Business 2 Consumer) model, through online shopping for example, there’s a saving in both money and time. Think Amazon.com, Think Flipkart.com etc.

Suppose you went to buy book from a Shop on Avenue Road in Bangalore (This is the place in Bangalore where books are sold usually at discounted rate of >= 20%). This is still the B2B model. In B2C model, you usually shop online from Amazon.com or Flipkart.com; you spend about 2 minutes in making online transaction at the same price. You save money and save the time required to go and buy the book from a shop. Even if you won’t buy online, you’ll avoid spending time at the shop looking at many items, if you do your homework online. You save time anyways.

3. Decomposition:

The ability to break down a task into minute details so that we can clearly explain a process to another person or to a computer, or even to just write notes for ourselves. Decomposing a problem frequently leads to pattern recognition and generalization, and thus the ability to design an algorithm.

Example:

Suppose you’re buying a Laptop that you want for software development or you want for high performance computing. If you’re a software engineer or you’re someone who’s work needs awareness of, or is associated with internals of a workstation, you probably have fair amount of knowledge on what Laptop components are critical for your needs. If you’re not aware of critical nature of components for your needs, you would go to an expert, or ask questions in online forums (the latter is more computational. If you’re thinking computationally, you decompose your Laptop into different components that make up the laptop and you would think the one with highest clock rate, latest chipset, highest RAM, Solid State Disk would give optimum performance. You visualize the performance results in your workstation based on the past experience patterns. Then, you google for products on internet and compare them on one page. You identify the range which meets your requirements of memory and data processing speed. Then you look at the individual brands and additional. You don’t straightaway look up for different brands on google and then are caught up mixing brands and unclear requirements ending up with overlap of wants and requirements. Decomposition saves time and gives a clear picture to you about your configuration and look and feel for your laptop. From the perspective of someone not so knowledgeable about PC components, this Laptop Buying Task can be considered to be solved algorithmically and more on this below.

4. Algorithm Design:

The ability to develop a step-by-step strategy for solving a problem. Algorithm design is often based on the decomposition of a problem and the identification of patterns that help to solve the problem. In computer science as well as in mathematics, algorithms are often written abstractly, utilizing variables in place of specific numbers.

Example:

Let’s consider a scenario when a layman wants to make an optimal decision in buying a mobile. Let’s also visualize the scenario when the so called computer-savvy person wants to do so, without applying computational thinking techniques. By ‘optimal’, I mean the type of thinking that results in deciding a great mobile for his budget and yet something that could offer rich visual and audio experience without compromising on the core requirements like call quality, decent battery backup etc, having invested minimum time in the process. By ‘layman’, I mean someone who can’t appreciate the technical specifications like 5.1 Channel Output, file formats like MP3, MP4, HD etc, Screen Resolution, and Memory (in-built and expandable) size etc, himself. Ultimately, everyone really wants to make an optimal decision!

With so much of competition in mobile phone market, there are plenty of products in both low-end and high-end segments on the offer by different mobile handset manufacturers. This makes the job of making optimal decision a little challenging. There is also lot of middlemen, ‘n’ number of commission-sucking middlemen in increasing order of commission sucking ability(this particularly refers to the second-hand mobile sellers), shops, malls, mobile showrooms, online stores, where mobiles are sold. From business perspective, the sellers need to keep themselves busy, and can’t afford to spend good time explaining tech specs to laymen. Laymen waste a lot of time on looking at lot of products (enticing) that are cheaper, that may turn out to be costlier after some time in terms of the experience they expected but weren’t able to express while buying a mobile and making. In the end, they make a bad buy for their budget.

Let’s talk about computer-savvy persons now. As it’s proven that “Computers / Internet addresses the human vulnerabilities of being lonely and fearful of intimacy”, when computer-savvy person wants to make a decision he’ll google for sites with listing of mobiles, actually many sites, he’ll start randomly scanning different products that are enticing, might read reviews of course(which is good to a degree). Think about the time he actually spends before he comes up with his “practical” requirements and fixing his budget“. He would have wasted lot of time.

Let’s assume that the layman or the computer-savvy person referred to above thinks computationally. Sometimes budget and requirements can have cyclic relationship from CT perspective. Ideally, there’s only a linear uni-directional relationship. Then:

As a first step, he notes down his principles of buying that include: Simplicity/Craze (No, I mean it, although the word ‘Principle’ is not usually referred to, in the context of characteristics like craze etc.), Rough handling, Battery Longevity, Visual Experience, Something pleasing to eyes and at the same time captured for memory, Audio Experience, Digital Life needs (Read documents, PDFs, Email, Surf web, Chat, Widgets, Games etc).

You might be thinking everyone thinks about these things. True, their thinking is not systematic, rather jumbled and one principle overlaps another in their buying process without CT. Remember, the objective is to make an optimal decision. Remember, you might make a great buy spending a lot of time but it’s not just about money. It’s very much about time as well. You might get lucky and make a great buy by spending little time and well within your budget.

1) In the next step, he translates these principles into requirements in terms of technical specifications like:

– Rough handling (robust outer build),

– Battery Longevity (type of battery Li-Ne etc),

– 5.1/ 7.1 channel output (Great Audio experience),

– Busy Life Business (Dual SIM / Triple SIM)

– HD/HTML5/etc for rich Visual experience,

– Android/Bada (widgets/apps),

– Visual *memorable* Experience (Camera 3.2 MP Etc),

– Digital Life Needs (3G, WiFi, Bluetooth, Memory Card with 8 Gb/16Gb etc (Read docs, Watch videos etc)). In case of laymen, they might seek counsel from sellers or some expert to translate their principles into requirements.

2) In the next step, he fixes his budget and allows a buffer of some money like 1k etc.

3) He uses Flipkart.com or any other site that allows for comparison of mobiles on price, tech spec etc (In case of laymen, he’ll seek sellers’ help at the shop).

4) He comes up with a shortlist of products to meet his requirements.

5) He checks if anyone in his radar (friends’ circle) has purchased a mobile that’s in his shortlist.

6) He reads reviews online or watches internals of mobile on YouTube.com or visits Chroma store(store where they let you touch the items for sale and get a feel of the item that you want to buy) etc (in case of layman).

7) If there turn out to be 2 or more matches meeting his all his requirements (including his budget of course), he uses intuition, look and feel to decide which mobile he finally wants to buy. He’s made an optimal decision and is happy with the time spent in his mobile buying process.

8) If there’re no matches, if he can wait for few months when newer mobiles meeting his requirements could be released to the market, he would wait.

9) Anyone that’s crazy will aim at high-price mobile although he mayn’t use most of the features in the mobile at all, for some reasons. It’s not an optimal decision from a computational thinking perspective. The one who thinks computationally, says, “I don’t need to use those features now, I’ll buy a low-price mobile for now and after 2 years, when I would need those features, I would buy the same mobile at probably 2/3 the price that the crazy person has bought now”. Computational Thinker users power of web and technology to assess the trends and situation and make his decision.

It’s a sequence of steps and it’s algorithmic.

Conclusion

From my own experience and observation, lot of us do not think computationally, and spend lot of time in building our Decisioning Framework and also in realizing or living the life we actually want to live. It’s quite impossible or rather extremely difficult to permanently fix some of the human vulnerabilities like being lonely and being fearful of intimacy, trying to explore something enticing, trying to explore something that can grab the attention of your senses. When there’s an information overload and there’s so much to consume digitally and through other means, computational thinking should serve as a quintessential skill to solve problems or dilemmas arising out of such situations or in general, accomplish any task. In these uncertain times and fluctuating economic conditions, CT is a skill that most people should realize the importance of, for ‘Economy Living’.

In summary, Computational thinking is about using Abstraction and using Automation. We’re all increasingly feeling the want to buy time for ourselves, especially those of us leading urban lifestyles. If we think computationally, we’ll save time (always) and money (in some cases) as well. It’s synonymous to thinking in Computer Science where time (time)-complexity is more important than space (money)-complexity in most cases, these days.

References:

1.Google University: Exploring Computational Thinking

2.Computational Thinking: ACM Publication Jeannette M. Wing PDF

3.Carnegie Mellon University

4. MIT Open Courseware Lecture – What do Computer Scientists do?

Posted in Research | 2 Comments »

Exponentiation by Squaring

Posted by sanstechbytes on October 7, 2011

To compute the value of a raised to the power of b (a to power b) , Math.pow(a, b) or some appropriate library function (depending on which language we’re talking about) is often used, which, of course, is optimized. Another efficient approach is Exponentiation by Squaring using recursion. For the underlying idea behind it, read here.

Although optimization achieved with this approach doesn’t appear to be significant for smaller integer exponents, it’s worth considering this for bigger  integer exponents in your exponentiation code.

See the code (in Java) below:

The time complexity is ~ O(c log n): Takes O(log n) for squaring’s and O(log n) for multiplications.

class ExponentiationTest {

public static void main(String[] args) {
// TODO Auto-generated method stub
double a = 2, b = -10; //
double pow = powerWithRecurSquaring(a, b);
if (b < 0) {
pow = 1 / pow;
}
System.out.println(a + ” to the power of  ” + b + ” = ” + pow);
}

// a power b, runs in ~ O(2 log b) time
public static double powerWithRecurSquaring(double a, double b) {
if (b < 0) {
b = -b; //   b = Math.abs(b);
} else if (b == 0) {
return 1;
} else if (b == 1) {
return a;
} else if (b == 2) {
return a * a;
} else if (b % 2 == 0) {
return powerWithRecurSquaring(powerWithRecurSquaring(a, b / 2), 2);
}
return powerWithRecurSquaring(a, b – 1) * a;
}

}

Posted in Algorithm | 2 Comments »

Free Video/Audio Lectures from MIT, Stanford, IIT/IISc, UC Berkely

Posted by sanstechbytes on September 25, 2011

Some of the video/audio lectures that I enjoy are (check out few video lectures embedded below as well) available at:  

My favorite is MIT OCW and it’s a great place for self-learners, educators across the globe.  

The interesting note is that you can download these world-class video/audio lectures covering topics like Art, Economics, Engineering, History, Literature, Medicine, Management, Math, Philosophy, Psychology, Science  etc, including other course materials for free!  Also, most of these are available on YouTube categorised under their respective University Channel or on freevideolectures.net.

Enjoy the lectures!

MIT – Introduction to Algorithms:

Stanford – Machine Learning:

NPTEL – Data Mining:

UC Berkeley Search Engines:

Posted in Programming | Leave a Comment »

Sorting in Linear Time

Posted by sanstechbytes on August 17, 2011

Sorting in linear time is achievable using COUNTING SORT, RADIX SORT, BUCKET SORT algos under special cases. All these trade space for time.  The point in this post is, to sort in linear time, in-place, for special case of array-or-the-list-or-the-attribute-that-can-have-only-two-distinct-values problem.

I call the technique of using two pointers to rearrange(sort) elements based on comparisons or perform math operations like add, subtract, multiply, squaring, divide to yield the target output, the “Two-Pointer Technique”.  This technique is used to efficiently solve problems like:
a) Finding the pair in array that add up to the given target(it’s a sub-problem of EXACT-SUBSET-SUM problem (refer to Introduction to Algorithms by CLRS))
b) Sorting an array with 0’s and 1’s or in general, any array that contains only two distinct values.
c) Finding the Pythagorean triples in an array etc.

The thing is, most of us (I’m talking to the java developers here) just use Collections.sort(…) or Arrays.sort(…)   for all our sorting needs , even for special cases like, the array-or-the-list-can-have-only-two-distinct-values problem.  In a related note, for problem of sorting ‘Person’ objects by age, we can think of implementing BUCKET SORT (runs in O(kn) time) where k could be < 130(age) etc, instead of merge sort of Array.sort(…)  or Collections.sort(…) that runs in O(n log n) worst-case.

The below sort  algorithms run in linear time, in-place and I think, are worth considering for those cases. Below  sort implementations (in Java) are based on Two-Pointer Technique. Think of using this technique to solve many problems. Comments follow in line.

Special mention is for sort(…) method to sort objects based on one field that can contain only one of any two values(say, gender – can only be F or M), for all practical purposes. This implementation compares with Collections.sort(in Java) as follows:
a) Avoids the cost of loading classes: Collections, (say)PersonComparator, Comparable
b) Runs in O(n) compared to Merge-Sort O(n log n) used in Collections.sort(…)
c) Reduces space complexity (merge sort uses auxiliary array of length n, to copy the elements in the sorted array). This is done, in-place (constant amount of space).

Try yourself quantifying the above differences with sample input and run. You’ll notice the significant efficiency achieved.

//Code

public static void testSort( ) {
 int[] a = new int[] { 1, 0, 0, 0, 0, 0, 1 };
 int x = 0, y = 1; // generalized to
System.out.println(“before sort”);
 // printArray(a);
 sortTwoVals(a, x, y); // for int’s
String[] s = new String[] { “Male”, “female”, “male”, “Female”, “male” };
 // printArray(s);
 String s1 = “Female”;
 String s2 = “male”;
sortTwoVals(s, s1, s2); //for strings
List list = new ArrayList();
 String F = “F”;
 String M = “M”;
 list.add(new Person(“Sanjeev”, 27, M));
 list.add(new Person(“Mridul”, 30, M));
 list.add(new Person(“Roark”, 27, M));
 list.add(new Person(“Howard”, 28, M));
 list.add(new Person(“Sheldon”, 23, M));
 list.add(new Person(“Leonard”, 31, F));
 list.add(new Person(“Shanti”, 25, F));
 list.add(new Person(“Pavan”, 27, M));
System.out.println(“Before sort…”);
 // printArray(list.toArray());
 String firstAfterSort = F, secondAfterSort = M;
 list = sort(list, firstAfterSort, secondAfterSort);
 System.out.println(“After sort…”);
 System.out.println(“Persons sorted according to their gender”);
 // printArray(list.toArray());
 }


// When two pointers (‘start’ and ‘end’ converge), we’re done with sorting.
// For integers
public static int[] sortTwoVals(int[] a, int x, int y) {
 int start = 0, end = a.length – 1;
 while (start < end) {
  if (a[start] == y) {
    while (a[end] == y) {
     if (end < start) {
     break; // Sorted case
    }
  end–;
 }

   if (a[end] == x) {
    a[end] = y;
    a[start] = x;
   }
  }
  start++;
 }
 return a;
}


// For Strings
public static String[] sortTwoVals(String[] s, String s1, String s2) {
 int start = 0, end = s.length – 1;
 while (start < end) {
  if (s2.equalsIgnoreCase(s[start])) {
   while (s2.equalsIgnoreCase(s[end])) {
    if (end < start) {
      break; // Sorted case
    }
   end–;
 }

  if (s1.equalsIgnoreCase(s[end])) {
   s[end] = s2;
   s[start] = s1;
  }
 }
 start++;
}
return s;
}


// For objects
public static List sort(List personsList, String firstAfterSort,
 String secondAfterSort) {
 Object[] persons = personsList.toArray();
 int start = 0, end = persons.length – 1;
// When two pointers converge, we’re done with sorting
 while (start < end) {
 if (secondAfterSort.equalsIgnoreCase(((Person) persons[start])
 .getGender())) {
 while (secondAfterSort.equalsIgnoreCase(((Person) persons[end])
 .getGender())) {
 if (end <= start) {
 break; // Sorted case
 }
 end–;
 }
 if (firstAfterSort.equalsIgnoreCase(((Person) persons[end])
 .getGender())) {
 Person swap = (Person) persons[end];
 persons[end] = persons[start];
 persons[start] = swap;
 }
 }
 start++;
 }
 return java.util.Arrays.asList(persons);
 }


public class Person {
 private String name;
 private int age;
 private String gender;
 private Address address;
public Person(String name, int age, String gender) {
 this.name = name;
 this.age = age;
 this.gender = gender;
 }
public Address getAddress() {
 return address;
 }
public void setAddress(Address address) {
 this.address = address;
 }
public int getAge() {
 return age;
 }
public void setAge(int age) {
 this.age = age;
 }
public String getName() {
 return name;
 }
public void setName(String name) {
 this.name = name;
 }
public String getGender() {
 return gender;
 }
public void setGender(String gender) {
 this.gender = gender;
 }
}


class Address {

}

Posted in Algorithm | Leave a Comment »

Find the kth Smallest Element in Array in Linear Time

Posted by sanstechbytes on July 24, 2011

Finding the kth smallest element in array is an application related to Order Statistics problem. For details on the algo used here, watch the video lecture embedded below.

Robert Sedgewick, whose lecture this approach has been inspired from, proposes random shuffle mechanism and partitioning strategy to select the kth smallest element. The strategy requires maintaining variables lo and hi to delimit the subarray that contains the index k of the element to be selected, using partitioning to shrink the size of the subarray (Step 2). The assumption made is that the input array to the algo doesn’t contain duplicates (This can be carried out as a preprocessing step, which is performed as first step in the algo below, though). Runs in linear time, O(n).

Algo:
1) Preprocess array to remove duplicates.
2) Rearrange the array from a[lo] through a[hi] and return an integer j such that a[lo] through a[j-1] are less than a[j], and a[j+1] through a[hi] are greater than a[j].
a) If k is equal to j, done.
b) Else If k is less than j, then continue working in the right subarray (by changing the value of hi to j-1).
c) Else If k is greater than j, then continue working in the right subarray (by changing lo to j+1).
3) Repeat step (2), until the array consists just of k.

MIT Video Lecture on Order Statistics:

Java Code:


/** Sedgewick's partitioning approach */
 public class QuickSelect {

private static Random random = new Random();

public static Comparable select(Comparable[] a, int k) {
 if (k = a.length) {
 throw new RuntimeException("Selected element out of bounds");
 }

shuffle(a);

int lo = 0, hi = a.length - 1;
 while (hi > lo) {
 int i = partition(a, lo, hi);
 if (i > k)
 hi = i - 1;
 else if (i < k)
 lo = i + 1;
 else
 return a[i];
 }
 return a[lo];
 }

private static int partition(Comparable[] a, int lo, int hi) {
 int i = lo;
 int j = hi + 1;
 Comparable v = a[lo];
 while (true) {
 // find item on lo to swap
 while (less(a[++i], v))
 if (i == hi)
 break;
 // find item on hi to swap
 while (less(v, a[--j]))
 if (j == lo)
 break; // redundant since a[lo] acts as sentinel
 // check if pointers cross
 if (i >= j)
 break;
 exch(a, i, j);
 }
 // put v = a[j] into position
 exch(a, lo, j);
 // with a[lo .. j-1] <= a[j] <= a[j+1 .. hi]
 return j;
 }

public static void shuffle(Object[] a) {
 int n = a.length;
 for (int i = 0; i < n; i++) {
 int r = i + uniform(n - i); // between i and N-1
 Object temp = a[i];
 a[i] = a[r];
 a[r] = temp;
 }
 }

/**
 * Return an integer uniformly between 0 and N-1.
 */
 public static int uniform(int N) {
 return random.nextInt(N);
 }

// is v < w ?
 private static boolean less(Comparable v, Comparable w) {
 return (v.compareTo(w) < 0);
 }

// exchange a[i] and a[j]
 private static void exch(Object[] a, int i, int j) {
 Object swap = a[i];
 a[i] = a[j];
 a[j] = swap;
 }

// Read strings from standard input, sort them, and print.
 public static void main(String[] args) {
 // Read input from StdIn here
 // String[] a = new String[]{"aa", "aaa", "bb", "ab", "ca", "bc",
 // "cab", "cc", "cad" };
 Integer[] a = new Integer[] { 2, 11, 10, 6, 8, 99, 7, 100, 101, 28,
 9 };
 int k = 2;
 Object kth = Quick.select(a, k - 1); // returns a[k-1]
 System.out.println(k + "th smallest element in given array = "
 + kth);
 }
 }

Posted in Algorithm | Tagged: | Leave a Comment »

Pythagorean Triples

Posted by sanstechbytes on July 10, 2011

A quick background on properties of Pythagorean triples can be got from here and here.

When I was browsing through topics in algorithm forum on stackoverflow.com , I found , sort of, an interesting challenge in finding Pythagorean Triples in the given array, saying:  Can’t be achieved in no better than O(n*n) time complexity(a unanimous opinion). This enthused me to attempt writing a faster than O(n*n) algo, but after much analysis and code completion, I couldn’t help joining the participants. In below lines, the code snippet follows. I wonder, if any algorithm enthusiast or researcher could point me to a faster algorithm than O(n*n), if there exists any.

Follow comments in-line to help yourself understand the algorithm used. This algorithm achieves O(k*n*n) time complexity where k is nearly equal to n/3  in most cases. For brevity, it’s an O(n*n) algo.

// O(n+n*log n+k*n*n) = O(k*n*n)= O(n*n) algo where k is nearly n/3 in
// most cases
// Returns the number of triples found.
// Returns
// -1, when n < 3,
// 1 when n = 3 and/or a triple is found and
// 0 when no triple found
public static int pythtriples(int[] pyth) {
int n = pyth.length;

// No duplicate removal algo is used before sort, to avoid the cost of
// O(n),as this will be taken care of below,in for loop.
// (Trivial) Shifting mechanism should be used in
// array with new length in-place(trivial)in case,
// either the element occurring in multiple places is 0 or neg integer

// Square all the numbers – O(n) time complexity.
for (int i = 0; i < n; i++) {
pyth[i] = pyth[i] * pyth[i];
}

// Sort them – O(n * log n) time complexity
Arrays.sort(pyth);

if (n < 3) {
return -1; // error : No of elements must be >= 3
}

// Simple case
if (n == 3) {
if (pyth[2] == pyth[1] + pyth[0])
return 1;
else
return 0;
}

// Compare pair-wise: if(a[i] == (a[j] + a[k]) then, it’s a triple),
// k<i, j<i
// O(k* n * n) time complexity, where k is nearly n/3, can be
int triples = 0, foundtripleLarge = 0;
for (int i = 2; i < n; i++) {
for (int j = 0; j < i; j++) {
for (int k = i – 1; k >= 0; k–) {
// If already found, don’t compare again for 3SUM. Avoids
// unnecessary check for duplicates
if (foundtripleLarge == pyth[i]) {
break;
}
if (pyth[i] == (pyth[j] + pyth[k])) {
System.out.println(“triples found: ” + pyth[i] + ” = ”
+ pyth[j] + ” + ” + pyth[k]);
triples++; // Counter for triples
foundtripleLarge = pyth[i];
break;
}
}
}
}
return triples;
}

Math: The pair-wise comparison follows: 1^2 + 2^2 + 3^2 + …..+n^2 = n(n+1)(2n+1)/6, nearly equal to 3n^2, same as our O(kn^2) algo above!

Posted in Algorithm | Tagged: | Leave a Comment »