Does homophily exist?

by Sergiy Nesterko on May 9th, 2011

This spring I gave two talks, one at the New England Statistics Symposium (NESS) hosted by the Department of Statistics, University of Connecticut, and a post-qualifying talk in my home department. Both talks were on my work with my advisor Joe Blitzstein, and both relied heavily on the concept of homophily. The first talk presented refined simulation study results on design-based estimation, and the second was about model-based estimation under Respondent-Driven Sampling (RDS). For the latter, we considered data collected in a recent study of populations at high risk of HIV conducted in San Diego. The study took nearly 2 years to complete and aimed to collect information on the behavioral and health aspects of the target population. It is both a privilege and a responsibility to be commissioned to analyze the collected data, since the results of the analysis may inform subsequent policy decisions. Figure 1 shows the (anonymized) recruitment trees of the study.


Figure 1: San Diego study recruitment trees by HIV status. The x axis shows the HIV status group of each participant.

There were 10 recruitment waves (including the initial participants), shown on the y axis. The x axis corresponds to the 6 HIV status groups. The most represented groups are 2, 5 and 6: self-reported chronic-stage HIV, self-reported not infected with HIV, and HIV status unknown, respectively. If we assume that groups 5 and 6 are similar and count the number of times participants were referred into the study by a respondent from a similar group, we obtain the result displayed in Figure 2.
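The counting step can be sketched in a few lines, assuming each referral is stored as a (recruiter group, recruit group) pair using the group codes from Figure 1. The edge list below is made up purely for illustration and is not the San Diego data; `merge_groups` and `similar_referral_counts` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical referral pairs (recruiter_group, recruit_group), coded 1..6 as in
# Figure 1. Illustrative values only, NOT the study data.
edges = [(2, 2), (2, 5), (5, 6), (6, 5), (5, 5), (2, 2), (6, 2), (5, 6)]


def merge_groups(group, merged=frozenset({5, 6})):
    """Map groups 5 and 6 onto one label, per the assumption that they are similar."""
    return 5 if group in merged else group


def similar_referral_counts(edges):
    """Count referrals where recruiter and recruit share a (merged) group vs. not."""
    counts = Counter()
    for recruiter, recruit in edges:
        same = merge_groups(recruiter) == merge_groups(recruit)
        counts["similar" if same else "dissimilar"] += 1
    return counts


print(similar_referral_counts(edges))
```

On this toy list, 6 of the 8 referrals stay within a (merged) group.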


Figure 2: Number of cases in which a participant was recruited into the San Diego study by a respondent from a similar HIV status group.

This finding supports the claim that there is homophily on HIV status in the recruitment process: otherwise, the bars would have been of roughly equal height. The observation is consistent with a broader body of evidence; for example, similar findings are reported in D. Abramovitz et al., “Using Respondent Driven Sampling in a Hidden Population at Risk of HIV Infection: Who do HIV-positive recruiters recruit?,” Sexually Transmitted Diseases 36, no. 12 (2009): 750.
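One way to make the "otherwise" concrete is to compare the observed share of same-group referrals with the share expected if a recruit's group were independent of the recruiter's group, given the observed group frequencies. This random-mixing baseline is an assumption chosen for illustration, not necessarily the comparison used in our analysis, and the pairs below are invented (groups 5 and 6 already merged into 5).

```python
from collections import Counter

# Hypothetical (recruiter_group, recruit_group) pairs after merging groups 5 and 6.
# Illustrative values only, NOT the San Diego data.
edges = [(2, 2), (2, 5), (5, 5), (5, 5), (5, 5), (2, 2), (5, 2), (5, 5)]


def similar_share(edges):
    """Observed fraction of referrals that stay within the same group."""
    return sum(a == b for a, b in edges) / len(edges)


def random_mixing_share(edges):
    """Expected same-group fraction if recruit group were independent of
    recruiter group, using the observed marginal group frequencies."""
    n = len(edges)
    recruiter = Counter(a for a, _ in edges)
    recruit = Counter(b for _, b in edges)
    return sum((recruiter[g] / n) * (recruit[g] / n) for g in recruiter)


print(similar_share(edges), random_mixing_share(edges))
```

Here the observed share (0.75) exceeds the random-mixing share (34/64, about 0.53); a gap in that direction is what homophily would produce.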

Because of homophily, the San Diego study organizers could, to some extent, have anticipated the HIV status of newly recruited participants from the HIV status of respondents in the current wave of the study. It follows that the “unusual” referrals (participants referred by respondents with a dissimilar HIV status) may matter more than referrals that follow the homophily pattern. It is therefore useful to view the RDS process and its recruitment structure as a function of homophily on the quantity being surveyed. Such an approach allows for different degrees of homophily, as manifested in RDS data. These ideas are at the foundation of the model-based estimation technique for RDS that my advisor and I are developing.
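The idea of anticipating the next wave can be illustrated with a simple first-order Markov view of recruitment, a common simplification in the RDS literature but not necessarily the model we are developing. The sketch estimates P(recruit group | recruiter group) from referral pairs, computes the expected group counts of the next wave under the hypothetical assumption of one recruit per respondent, and flags referrals whose estimated probability falls below a hypothetical `threshold`. All data and names here are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical (recruiter_group, recruit_group) pairs, groups 5 and 6 merged into 5.
# Illustrative values only, NOT the San Diego data.
edges = [(2, 2), (2, 5), (5, 5), (5, 5), (5, 5), (2, 2), (5, 2), (5, 5)]


def transition_matrix(edges):
    """Estimate P(recruit group | recruiter group) from observed referral pairs."""
    counts = defaultdict(Counter)
    for recruiter, recruit in edges:
        counts[recruiter][recruit] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


def predict_next_wave(current_wave, probs):
    """Expected recruits per group, assuming (hypothetically) that each current
    respondent brings in exactly one recruit drawn from its row of the matrix."""
    expected = Counter()
    for group in current_wave:
        for b, p in probs.get(group, {}).items():
            expected[b] += p
    return dict(expected)


def unusual_referrals(edges, probs, threshold=0.25):
    """Referrals whose estimated transition probability is below `threshold`."""
    return [(a, b) for a, b in edges if probs[a][b] < threshold]


P = transition_matrix(edges)
print(predict_next_wave([2, 5, 5], P))
print(unusual_referrals(edges, P))
```

With these toy pairs, a 5-to-2 referral has estimated probability 0.2 and is the only one flagged; in a real analysis such cross-group referrals are exactly the ones that carry information beyond what homophily already predicts.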

