The visualization below is the most recent part of my work with my advisor Joe Blitzstein on exploring the Respondent-Driven Sampling (RDS) process via simulation.

We aim to study the behaviour of RDS on networks whose topology depends on the quantity being surveyed. We are particularly interested in varying degrees of homophily in network topologies and in respondent recruitment patterns, because such features introduce biases and make existing estimation methods underperform, as I have outlined in this post. The visualization presented there summarizes our findings based on simulations, whereas the one presented here shows how the process actually unfolds under varying conditions.

In the visualization, a network of 100 vertices is plotted as a function of the underlying surveyed measurement. The vertices are initially jittered around the peaks of the measurement's histogram. Functionality:

- One can ask the vertices to spread out in the vertical direction by pressing the *spread out* button. Vertex and edge opacity, as well as vertex speed, are controlled with the sliders right next to the button.
- Different network topologies can be selected with the controls next to the *load network* button. The slider called *topology* selects one of 10 topology sensitivity constants (that is, by shifting this slider one gets slightly denser or sparser networks of the same type).
- The final group of controls sets the features of the RDS process. *sample size* determines when the process stops and *number of coupons* tells how many new respondents every node can recruit. Then there are controls for uniform or degree-proportional seed referral (that is, one can select whether the green nodes are selected purely at random from the pool of available nodes, or with weights proportional to the number of edges coming out of them), and, last but not least, the recruitment pattern. There are three possible recruitment patterns: preferential (similar to homophily: respondents try to recruit new participants whose quantity is as close as possible to their own), inverse preferential (the opposite of the previous pattern), and uniform (recruitment completely at random from among the connected vertices). The *next sample* and *next wave* buttons display new samples, or waves within a sample, respectively.
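
To make the controls above concrete, here is a minimal Python sketch of the kind of process they configure. It is not the code behind the visualization (which runs in R on the server). The edge rule `exp(-sensitivity * |x_i - x_j|)`, the bimodal measurement, the number of seeds, and all function and parameter names are assumptions made for illustration. The 50% non-recruitment chance is the one used in the visualization.

```python
import math
import random

def make_network(values, sensitivity, rng):
    """Assumed homophilous topology: link i and j with probability
    exp(-sensitivity * |x_i - x_j|); larger sensitivity gives sparser networks."""
    n = len(values)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < math.exp(-sensitivity * abs(values[i] - values[j])):
                adj[i].add(j)
                adj[j].add(i)
    return adj

def pick_seeds(adj, n_seeds, by_degree, rng):
    """Draw seeds without replacement, uniformly or proportionally to degree."""
    pool, seeds = list(adj), []
    for _ in range(n_seeds):
        weights = [len(adj[v]) if by_degree else 1 for v in pool]
        if sum(weights) == 0:          # only isolated vertices left
            weights = [1] * len(pool)
        seed = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(seed)
        seeds.append(seed)
    return seeds

def recruit(node, adj, values, sampled, coupons, pattern, rng):
    """Return up to `coupons` unsampled neighbours according to the pattern."""
    candidates = [v for v in adj[node] if v not in sampled]
    if pattern == "uniform":
        rng.shuffle(candidates)
    else:
        # preferential: closest values first; inverse: farthest values first
        candidates.sort(key=lambda v: abs(values[v] - values[node]),
                        reverse=(pattern == "inverse"))
    return candidates[:coupons]

def run_rds(adj, values, sample_size, coupons, n_seeds=3, by_degree=False,
            pattern="preferential", p_skip=0.5, rng=None):
    """Run one RDS sample and return it as a list of waves (lists of vertices)."""
    rng = rng or random.Random()
    wave = pick_seeds(adj, n_seeds, by_degree, rng)
    sampled, waves = set(wave), [wave]
    while wave and len(sampled) < sample_size:
        next_wave = []
        for node in wave:
            if rng.random() < p_skip:  # participant uses none of the coupons
                continue
            for v in recruit(node, adj, values, sampled, coupons, pattern, rng):
                if len(sampled) >= sample_size:
                    break
                sampled.add(v)
                next_wave.append(v)
        wave = next_wave
        if wave:
            waves.append(wave)
    return waves

rng = random.Random(1)
# Assumed bimodal measurement: two groups of 50 vertices each
values = [rng.gauss(0, 1) if i < 50 else rng.gauss(4, 1) for i in range(100)]
adj = make_network(values, sensitivity=1.5, rng=rng)
waves = run_rds(adj, values, sample_size=30, coupons=3, rng=rng)
print([len(w) for w in waves])  # wave sizes, starting with the seeds
```

Switching `pattern` to `"inverse"` or `"uniform"`, or setting `by_degree=True`, corresponds to the recruitment and seed controls described above, while `sensitivity` plays the role of the *topology* slider under the assumed edge rule.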

This visualization is very useful for demonstrating the intuition behind RDS to a non-technical audience. It also helps generate ideas about related problems. For example, recruitment is modeled with a 50% chance of non-recruitment (that is, every participant has a 50% probability of not using the available coupons). As a result, the process frequently dies out when the number of coupons is small. As the number of coupons increases, the process turns viral and consumes the network until the required sample size is reached. This observation matters when designing actual RDS-based surveys, because one usually wants an acceptable tradeoff between the breadth of RDS (the number of participants in the same wave) and its depth (the number of waves).
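
The die-out effect can be checked with a rough back-of-the-envelope simulation. The sketch below is a deliberate simplification and not the model behind the visualization: it ignores the network entirely (an unlimited pool of recruits), starts from a single seed, and assumes that a participant who does recruit uses all of their coupons. The function names and default values are hypothetical.

```python
import random

def process_reaches_target(coupons, target, n_seeds=1, p_skip=0.5, rng=random):
    """Simulate one recruitment chain; True if it reaches `target` participants."""
    sampled = n_seeds
    active = n_seeds                     # participants in the current wave
    while active > 0 and sampled < target:
        # each active participant skips recruitment with probability p_skip
        recruiters = sum(1 for _ in range(active) if rng.random() >= p_skip)
        new = min(recruiters * coupons, target - sampled)
        sampled += new
        active = new
    return sampled >= target

def success_rate(coupons, target=50, runs=2000, seed=0):
    """Fraction of simulated chains that reach the target sample size."""
    rng = random.Random(seed)
    hits = sum(process_reaches_target(coupons, target, rng=rng) for _ in range(runs))
    return hits / runs

for c in (1, 2, 3, 4):
    print(c, success_rate(c))
```

Under these assumptions each participant produces `coupons / 2` recruits on average, so with one or two coupons the chain cannot sustain itself, and with three or more it survives with noticeable probability. On a real network of 100 vertices the pool of unsampled neighbours shrinks as the sample grows, so these rates are best read as a rough upper bound.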

Perhaps the most interesting technical trait of the visualization is that it generates samples in real time via server-side calls to R. I used the following tools to build it: Processing, CGI and R.

While creating the visualization, I relied on the support and insightful comments of my dear advisor Joe Blitzstein. The related paper is almost ready and we plan to submit it any day now.

I plan to work more on Processing applications, as I am convinced that a compelling visualization is often key in conveying concepts and generating ideas.

Tags: Anchor Process, homophily, Joe, Processing, R, RDS, visualization
