Archive for May, 2011

Dynamic visualization, paper supplement 1

Saturday, May 28th, 2011


Dynamic visualization, paper supplement 2

Saturday, May 28th, 2011


Data science term in The Economist

Thursday, May 19th, 2011

It seems there is no stopping it now: the term data science appears prominently in the headline article of the current issue of The Economist.

The Economist

Compared with the rest of America, Silicon Valley feels like a boomtown. Corporate chefs are in demand again, office rents are soaring and the pay being offered to talented folk in fashionable fields like data science is reaching Hollywood levels. And no wonder, given the prices now being put on web companies.

It is indeed quite misleading that the term has the word science in it, as this implies an established field, while in fact the science of data is statistics. I wrote an earlier post on the subject in an attempt to single out what it is that distinguishes data science from statistics. That aside, however, the article supports the idea of rising demand for our profession, which is good news for specialists. Hopefully, the tech bubble mentioned there won't be inflated further by people who misuse the term data science.

Does homophily exist?

Monday, May 9th, 2011

This spring I gave two talks: one at the New England Statistics Symposium (NESS), hosted by the Department of Statistics at the University of Connecticut, and a post-qualifying talk in my home department. Both talks were on my work with my advisor Joe Blitzstein, and both drew heavily on the term homophily. The first talk presented refined simulation study results on design-based estimation, and the second was about model-based estimation under Respondent-Driven Sampling. For the latter, we consider data collected in a recent study of populations at high risk of HIV, conducted in San Diego. The study took nearly two years to complete and aimed to collect information describing behavioral and health aspects of the target population. It is both a privilege and a responsibility to be commissioned to analyze the collected data, as the results of the analysis may be used for subsequent policy decisions. Figure 1 shows the (anonymized) recruitment trees of the study.

San Diego study recruitment tree

Figure 1: San Diego study recruitment trees as functions of HIV status. On the x-axis, "observation" denotes the HIV status group.