Drastic R speed-ups via vectorization (and bug fixes)

Friday, April 29th, 2011

Figure 1: A screenshot of the corrected and enhanced dynamic visualization of RDS. Green balls are the convenience sample, pink balls are subsequently recruited individuals, pink lines are links between network nodes that the process has explored, and the numbers in circles give the sample wave number.

It is common to hear that R is slow, so when I had to scale old R code (pertaining to material described in this post) to operate on data 100 times larger than before, I was initially at a loss. The old code already took several days and about 4000 semi-parallel jobs to complete, and with the data growing by a factor of 100 the task was becoming infeasible. Eventually, however, I was able to achieve an over 100-fold speedup of the R code by addressing two issues: (more…)

Dynamic visualization of RDS

Saturday, December 18th, 2010

The visualization below is the last element of work with my advisor Joe Blitzstein on exploring the Respondent-Driven Sampling (RDS) process via simulation. (more…)

Working with In4mation Insights

Thursday, September 30th, 2010

Since the summer of 2010 I have been fortunate to work on several projects with In4mation Insights, a leading market research firm based in Needham Heights, MA.

My role has led me to work closely with the firm's partners, Steve Cohen and Mark Garratt, who have both been examples of impeccable professionalism and wit, and with other members of the team: Mark Irwin, Sanjib Mohanty, and Ryan Hickey, who are all quite sharp. (more…)

Implementing statistical procedures for Software Productivity Research LLC

Thursday, July 8th, 2010

Throughout the spring and early summer of 2010, I have been working with Software Productivity Research LLC (SPR) on implementing core statistical functionality for their software product. The company provides consulting services in the area of software development and has representatives in the US and China.

The software the company is developing aims to help its clients estimate the cost, total duration, and other parameters of the software development process based on historical data from past projects. (more…)