Posts Tagged ‘RDS’

JSM2011, and a final stretch at RDS

Thursday, August 18th, 2011

The Joint Statistical Meetings conference took place in Miami Beach on July 30-August 5. It went very well, and the definite highlight was the keynote lecture by Sir David Cox. Among the other sessions, the following stand out:

  1. A Frequency Domain EM Algorithm to Detect Similar Dynamics in Time Series with Applications to Spike Sorting and Macro-Economics by Georg M. Goerg, a student at CMU Stat. The talk was very enjoyable, and the ideas it conveyed were crisp and exciting. The main one is that a zero-mean time series can be treated as a histogram by representing it through its frequency distribution. This allows for an elegant non-parametric classification approach that minimizes the KL divergence between observed and simulated frequency histograms.
  2. Large Scale Data at Facebook by Eric Sun from Facebook. Though not groundbreaking, the talk was exciting as it described the work environment at Facebook and the approach taken to getting signals out of massive data. Mostly, curious facts were presented from analyzing the frequencies of word occurrences in user status updates, with the interesting part being the analysis framework developed to do that.
  3. Jointly Modeling Homophily in Networks and Recruitment Patterns in Respondent-Driven Sampling of Networks by my advisor Joe Blitzstein, about our most recent research on model-based estimation for Respondent-Driven Sampling (RDS). The approach we are developing looks set to have several very attractive features compared with current estimation techniques, and it is designed for the case of homophily of varying degree. An example is illustrated in Figure 1.

    Figure 1: An example of homophily, with the network plotted over the histogram of the homophily-inducing quantity (left), and the resulting (normalized) vertex degrees plotted over the same histogram (right).

    We hope to finish the relevant paper soon and open the approach to extensions by the research community.
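The histogram idea from the first talk in the list above can be made concrete with a small sketch. This is not the frequency domain EM algorithm from the talk, only an illustration of its central ingredient under my own assumptions: each series is turned into a normalized periodogram, and a new series is assigned to whichever simulated reference class has the smallest KL divergence from it. The function names, the sinusoid-plus-noise simulator and the class labels "slow" and "fast" are all hypothetical.

```python
import numpy as np


def frequency_histogram(x):
    """Normalized periodogram of a series: non-negative and summing to 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # enforce the zero-mean assumption
    power = np.abs(np.fft.rfft(x)) ** 2   # raw periodogram
    power = power[1:]                     # drop the zero-frequency bin
    return power / power.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same bins."""
    p = np.clip(p, eps, None)             # avoid log(0)
    q = np.clip(q, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def simulate(cycles, n, rng, noise_sd=0.5):
    """Hypothetical simulator: a sinusoid with `cycles` periods plus noise."""
    t = np.arange(n)
    return np.sin(2 * np.pi * cycles * t / n) + rng.normal(0.0, noise_sd, n)


def reference_histogram(cycles, n, rng, n_sims=20):
    """Average frequency histogram over several simulated series."""
    sims = [frequency_histogram(simulate(cycles, n, rng)) for _ in range(n_sims)]
    return np.mean(sims, axis=0)


def classify(x, references):
    """Return the label of the reference with the smallest KL divergence."""
    p = frequency_histogram(x)
    return min(references, key=lambda name: kl_divergence(p, references[name]))


rng = np.random.default_rng(0)
n = 256
references = {
    "slow": reference_histogram(5, n, rng),
    "fast": reference_histogram(40, n, rng),
}
label = classify(simulate(5, n, rng), references)
print(label)
```

Averaging several simulated periodograms smooths the reference histograms, which keeps the divergence from being dominated by noise in any single frequency bin.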

During the conference, I also had a chance to finish making a dynamic 3D visualization of a constrained optimization algorithm I developed for In4mation Insights, which is exciting. As for Miami Beach itself, it is a great place to go out and enjoy the good food, sun and beach. JSM2012 will be held in San Diego.

I created the visualization in this post using Processing.

Dynamic visualization, paper supplement 1

Saturday, May 28th, 2011


Dynamic visualization, paper supplement 2

Saturday, May 28th, 2011


Does homophily exist?

Monday, May 9th, 2011

This spring I gave two talks: one at the New England Statistics Symposium (NESS), hosted by the Department of Statistics at the University of Connecticut, and a post-qualifying talk in my home department. Both talks were on my work with my advisor Joe Blitzstein, and both drew heavily on the term homophily. The first talk presented refined simulation study results on design-based estimation, and the second was about model-based estimation under Respondent-Driven Sampling. For the latter, we consider the data collected in a recent study of populations at high risk of HIV conducted in San Diego. The study took nearly 2 years to complete and was aimed at collecting information describing behavioral and health aspects of the target population. It is both a privilege and a responsibility to be commissioned to analyze the collected data, as the results of the analysis may be used for subsequent policy decisions. Figure 1 shows the (anonymized) recruitment trees of the study.

San Diego study recruitment tree

Figure 1: San Diego study recruitment trees as functions of HIV status. On the x axis, observation means the HIV status group.


Dynamic visualization of RDS version 2

Sunday, March 27th, 2011

Early this semester, I worked on complementing my visualization of the Respondent-Driven Sampling (RDS) process presented in this post to illustrate its evolution over time. That was how the second version was created, which is displayed here.

Please refer to the earlier post for a detailed description of the main functionality. The second version implements an additional view of the process, which plots the portion of the underlying network discovered by the RDS process over time. To switch to the alternate view at any time, press the change view button. The wide pink horizontal line in the alternate view marks the true population mean. (more…)

Dynamic visualization of RDS

Saturday, December 18th, 2010

The visualization below is the last element of work with my advisor Joe Blitzstein on exploring the Respondent-Driven Sampling (RDS) process via simulation. (more…)

Tradeoffs in estimation under Respondent-Driven Sampling, and Chernoff faces

Wednesday, October 6th, 2010

Recently I have been working hard on finalizing the paper that my advisor Joe Blitzstein and I are writing about estimation under Respondent-Driven Sampling (RDS). Specifically, the paper aims to develop general intuition about how the process works on networks with different topologies, and about what drives the performance of current estimators (or the lack thereof).

To do this, we simulated many networks belonging to one of three main types (homophily, rich-gets-richer and inverse homophily), simulated many RDS processes of different configurations on each, and compared the performance of the well-established Volz-Heckathorn (VH) estimator and the plain vanilla mean as point estimators under each scenario. Among other findings, it turned out that the VH estimator underperforms the plain mean on the considered class of homophily networks, and prevails in some other cases. (more…)
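To make the comparison concrete, here is a minimal sketch of the two point estimators. It assumes the commonly cited form of the VH estimator, a mean weighted by the inverse of each respondent's reported degree; the function names and the toy sample are hypothetical, and this is not the simulation code used for the paper.

```python
import numpy as np


def plain_mean(y):
    """Unweighted sample mean of the observed quantity."""
    return float(np.mean(y))


def vh_estimate(y, degrees):
    """Inverse-degree-weighted mean (assumed form of the VH estimator)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(degrees, dtype=float)
    if np.any(d <= 0):
        raise ValueError("all reported degrees must be positive")
    w = 1.0 / d                           # high-degree respondents are down-weighted
    return float(np.sum(w * y) / np.sum(w))


# Toy sample (hypothetical): a binary trait and the reported network degrees.
y = [1, 0, 1, 0]
degrees = [10, 2, 5, 1]
print(plain_mean(y), vh_estimate(y, degrees))
```

In this toy sample the respondents with the trait also have the highest degrees, so the weighted estimate falls below the plain mean. When all degrees are equal, the two estimators coincide.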

Visualizing while at the Opening Workshop on Complex Networks at SAMSI

Tuesday, August 31st, 2010

It is now almost the end of my stay here in Research Triangle Park, NC at the Opening Workshop on Complex Networks organized by SAMSI. I presented a poster here on some of my work with Joe Blitzstein on estimation under respondent-driven sampling. It covered the simulation studies we have done to lay the foundations for our development of the new estimation method, as outlined in this post. I will prepare a post describing this earlier work once we submit a paper on it, which should be soon. I also had the pleasure of meeting other researchers working in the field, in particular Matt Salganik and Erik Volz. It was really enjoyable and inspiring to discuss problems relevant to estimation in RDS.

Apart from enjoying the workshop, I have had a chance to enjoy some Processing and to experiment with some ideas about visualizing high-dimensional dependent data (that is, when the number of dimensions is larger than 3). (more…)

Conferences in the summer of 2010

Friday, August 20th, 2010

This summer I attended the Joint Statistical Meetings (JSM) in Vancouver, and I was fortunate to be accepted to the Complex Networks Opening Workshop held by the Statistical and Applied Mathematical Sciences Institute (SAMSI) in North Carolina, near Chapel Hill. Both events are exciting and intellectually stimulating. (more…)

Networks with homophily, an interesting visualization

Tuesday, June 8th, 2010

The research I am currently involved in with my advisor Joe Blitzstein concerns networks with homophily. According to Wikipedia:

Homophily (i.e., love of the same) is the tendency of individuals to associate and bond with similar others. The presence of homophily has been discovered in a vast array of network studies. (more…)