Archive for the ‘Software’ Category

theory.info, a new project

Tuesday, July 12th, 2011


Recently I purchased the domain and created an interactive logo/visualization for Theory Information Analysis, a screenshot of which is shown above. Theory is a new project that I would like to represent my applied, real-world work, including quantitative consulting and applied research. (more…)

Drastic R speed-ups via vectorization (and bug fixes)

Friday, April 29th, 2011

Figure 1: A screenshot of the corrected and enhanced dynamic visualization of RDS. Green balls are the convenience sample, pink balls are subsequently recruited individuals, pink lines are links between network nodes that the process has explored, and numbers in circles give the sample wave number.

It is common to hear that R is slow, so when I needed to scale old R code (related to the material described in this post) to data 100 times larger than before, I was initially at a loss. The old code already took several days and about 4000 semi-parallel jobs to complete; with the data growing by a factor of 100, the task was becoming infeasible. Eventually, however, I achieved a more than 100-fold speedup of the R code by addressing two issues: (more…)
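The two specific issues are described in the full post and are not reproduced here. As a generic illustration of why vectorization pays off in interpreted languages, the sketch below uses Python with NumPy rather than R (an assumption made for the example; the closest base-R analogue of the vectorized call is `rowSums`). It computes node degrees from a random adjacency matrix once with explicit loops and once with a single vectorized sum, then checks that both agree. The helper names are hypothetical.

```python
import time

import numpy as np


def degrees_loop(adj):
    """Row sums of a 0/1 adjacency matrix via explicit Python loops."""
    n = adj.shape[0]
    deg = np.zeros(n, dtype=np.int64)
    for i in range(n):
        total = 0
        for j in range(n):
            total += adj[i, j]
        deg[i] = total
    return deg


def degrees_vectorized(adj):
    """The same row sums as one vectorized call."""
    return adj.sum(axis=1)


# Random undirected graph: symmetric 0/1 matrix with no self-loops.
rng = np.random.default_rng(0)
n = 300
upper = np.triu(rng.random((n, n)) < 0.05, k=1)
adj = (upper | upper.T).astype(np.int64)

t0 = time.perf_counter()
d_loop = degrees_loop(adj)
t1 = time.perf_counter()
d_vec = degrees_vectorized(adj)
t2 = time.perf_counter()

assert np.array_equal(d_loop, d_vec)
print(f"loop: {t1 - t0:.4f}s, vectorized: {t2 - t1:.6f}s")
```

The loop version pays interpreter overhead on every one of the n² element accesses, while the vectorized call runs a single compiled loop; the exact timings will vary by machine.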

Dynamic visualization of RDS version 2

Sunday, March 27th, 2011

Early this semester, I extended my visualization of the Respondent-Driven Sampling (RDS) process presented in this post to illustrate how the process evolves over time. The result is the second version, displayed here.

Please refer to the earlier post for a detailed description of the main functionality. The second version adds an alternate view of the process, which plots the portion of the underlying network discovered by the RDS process over time. To switch views at any time, press the "change view" button. The wide pink horizontal line in the alternate view marks the true population mean. (more…)

Dynamic visualization of RDS

Saturday, December 18th, 2010

The visualization below is the latest piece of my work with my advisor Joe Blitzstein on exploring the Respondent-Driven Sampling (RDS) process via simulation. (more…)
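For readers unfamiliar with the process, a heavily simplified RDS simulation can be sketched in Python as follows. This is a toy under stated assumptions, not the code behind the visualization: `random_graph` and `simulate_rds` are hypothetical helpers, the network is an Erdos-Renyi random graph, every respondent receives the same number of coupons, all coupons are redeemed, and recruits are chosen uniformly among not-yet-sampled neighbors. Seeds form wave 0 (the convenience sample), and each recruit's wave is one more than its recruiter's.

```python
import random


def random_graph(n, p, rng):
    """Erdos-Renyi G(n, p) graph as a dict: node -> list of neighbors."""
    nbrs = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def simulate_rds(neighbors, n_seeds, n_coupons, max_sample, rng):
    """Toy RDS process; returns a dict mapping each sampled node to its wave."""
    # Wave 0: the convenience sample of seeds, drawn uniformly at random.
    seeds = rng.sample(sorted(neighbors), n_seeds)
    wave = {s: 0 for s in seeds}
    frontier = list(seeds)
    while frontier and len(wave) < max_sample:
        next_frontier = []
        for node in frontier:
            # Each respondent recruits up to n_coupons unsampled neighbors.
            candidates = [v for v in neighbors[node] if v not in wave]
            rng.shuffle(candidates)
            for v in candidates[:n_coupons]:
                if len(wave) >= max_sample:
                    break
                wave[v] = wave[node] + 1
                next_frontier.append(v)
        frontier = next_frontier
    return wave


rng = random.Random(1)
graph = random_graph(200, 0.03, rng)
sample = simulate_rds(graph, n_seeds=5, n_coupons=3, max_sample=50, rng=rng)
print(f"sampled {len(sample)} nodes, reaching wave {max(sample.values())}")
```

Because recruitment only moves along edges, the sampled set is a union of trees grown from the seeds, which is why the portion of the network discovered so far is a natural quantity to visualize over time.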

Tradeoffs in estimation under Respondent-Driven Sampling, and Chernoff faces

Wednesday, October 6th, 2010

Recently I have been working hard on finalizing the paper that my advisor Joe Blitzstein and I are writing about estimation under Respondent-Driven Sampling (RDS). Specifically, the paper aims to develop general intuition about how the process works on networks with different topologies, and about what drives the performance (or lack thereof) of current estimators.

To do this, we simulated many networks of three main types (homophily, rich-gets-richer and inverse homophily), simulated many RDS processes of different configurations on each, and compared the performance of two point estimators, the well-established Volz-Heckathorn (VH) estimator and the plain sample mean, under each scenario. Among other findings, the VH estimator turned out to underperform the plain mean on the class of homophily networks considered, while outperforming it in some other cases. (more…)
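The two point estimators being compared are simple to state. Under the usual RDS assumptions, a respondent's chance of being sampled is taken to be roughly proportional to their degree d_i, so the VH estimator reweights each observed value y_i by 1/d_i, giving (Σ y_i/d_i) / (Σ 1/d_i), whereas the plain mean weights every respondent equally. A minimal Python sketch follows; the function names and toy numbers are hypothetical illustrations, not code or data from the paper.

```python
def sample_mean(y):
    """Plain (unweighted) mean of the sampled values."""
    return sum(y) / len(y)


def vh_estimate(y, d):
    """Volz-Heckathorn estimate: weight each respondent by 1 / degree."""
    if not y or len(y) != len(d):
        raise ValueError("y and d must be non-empty and of equal length")
    if any(di <= 0 for di in d):
        raise ValueError("degrees must be positive")
    weights = [1.0 / di for di in d]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Toy sample: binary trait y and reported degree d for five respondents.
y = [1, 0, 1, 1, 0]
d = [2, 4, 2, 8, 4]
print(sample_mean(y), vh_estimate(y, d))
```

When all degrees are equal the two estimates coincide; they differ when the trait is associated with degree in the sample, which is where the choice between them matters.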

Working with In4mation Insights

Thursday, September 30th, 2010

Since the summer of 2010, I have been fortunate to work on several projects with In4mation Insights, a leading market research firm based in Needham Heights, MA.

My role has led me to work closely with the firm’s partners Steve Cohen and Mark Garratt, who are both examples of impeccable professionalism and wit, as well as with other members of the team: Mark Irwin, Sanjib Mohanty and Ryan Hickey, who are all quite sharp. (more…)

Implementing statistical procedures for Software Productivity Research LLC

Thursday, July 8th, 2010

During the spring and early summer of 2010, I worked with Software Productivity Research LLC (SPR) on implementing core statistical functionality for their software product. The company provides consulting services in software development and has representatives in the US and China.

The software the company is developing aims to help clients estimate the cost, total duration and other parameters of a software development process based on historical data from past projects. (more…)
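SPR's actual methodology is not described in this excerpt. Purely as a hypothetical illustration of estimating a project parameter from historical data, one common textbook approach fits a power law, effort = A · size^B, by ordinary least squares on the logarithms. The sketch below assumes that model; `fit_power_law` and `predict` are hypothetical names, and the synthetic history is generated from a known law so the fit can be checked.

```python
import math


def fit_power_law(sizes, efforts):
    """OLS fit of log(effort) = log(A) + B * log(size); returns (A, B)."""
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # exponent B
    intercept = mean_y - slope * mean_x  # log(A)
    return math.exp(intercept), slope


def predict(a, b, size):
    """Point prediction of effort for a new project of the given size."""
    return a * size ** b


# Synthetic "history" generated exactly from effort = 3 * size ** 1.1.
sizes = [10, 25, 50, 120, 300]
efforts = [3 * s ** 1.1 for s in sizes]
a, b = fit_power_law(sizes, efforts)
print(a, b, predict(a, b, 200))
```

With real project histories the residuals would not be zero, and a practical tool would also need to quantify the uncertainty of each prediction, which this sketch does not attempt.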

Networks with homophily, an interesting visualization

Tuesday, June 8th, 2010

The research I am currently doing with my advisor Joe Blitzstein concerns networks with homophily. As Wikipedia puts it:

Homophily (i.e., love of the same) is the tendency of individuals to associate and bond with similar others. The presence of homophily has been discovered in a vast array of network studies. (more…)
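One standard way to quantify homophily for a categorical node attribute is Newman's assortativity coefficient, r = (Σ e_ii − Σ a_i²) / (1 − Σ a_i²), where e_ii is the fraction of edges joining two nodes of group i and a_i is the fraction of edge endpoints in group i. It equals 1 when every edge stays within a group, is near 0 under random mixing, and tends to be negative under inverse homophily. This is offered as a general illustration, not necessarily the measure used in this research; `homophily_stats` and the toy graph are hypothetical.

```python
from collections import Counter


def homophily_stats(edges, group):
    """Return (same-group edge fraction, random-mixing baseline, assortativity r)."""
    m = len(edges)
    if m == 0:
        raise ValueError("need at least one edge")
    # Fraction of edges whose endpoints share a label (sum of e_ii).
    e_same = sum(1 for u, v in edges if group[u] == group[v]) / m
    # Share of edge endpoints carrying each label (a_i).
    ends = Counter()
    for u, v in edges:
        ends[group[u]] += 1
        ends[group[v]] += 1
    baseline = sum((c / (2 * m)) ** 2 for c in ends.values())
    r = (e_same - baseline) / (1 - baseline) if baseline < 1 else float("nan")
    return e_same, baseline, r


# Toy undirected graph: three "A" nodes, two "B" nodes, one cross-group edge.
group = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(0, 1), (1, 2), (3, 4), (2, 3)]
print(homophily_stats(edges, group))
```

Comparing the raw same-group fraction with its random-mixing baseline matters because a large group will have many within-group edges even without any preference for similar others.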

Hello world, or the birth of my professional blog

Thursday, May 27th, 2010

Hello world!

I have finally gotten around to extending my website to include a section that documents current work developments in more detail. As the base software, I have used WordPress, open-source blogging software that turned out to be easy to install and customize. (more…)