coderandeffectsnetworksims.txt
july12.7z
I’m continuing to post implementation notes to accompany A Manual for Cultural Analysis, which I published last year together with two of my anthropologist colleagues at RAND. In my last installment, I provided links and advice for implementing CCA/PCA.
In this blog post I will address how to implement the network modeling method that is discussed in some detail in Chapter 4 of the manual. The central question is this: how do we tell which set of social connections is most important to the transmission of a cultural trait? Note: if a trait doesn’t transmit on some kind of social connection, then it can’t be socially learned, and so by definition it isn’t culture!
OK, but people are connected by ties of friendship, marriage, coworkership, Twitter, etc. So how do we decide which of the various ties are most relevant to a diffusing cultural trait? We cover this question in a lot of detail in the manual. I covered it with even more detailed simulations in my paper with my student Rouslan Karimov. We never got a supplement published for that paper, so I’m posting here the code you need to run the most important analysis from the paper: the method that works.
First, make sure R is installed. Download the files that are part of this blog post. Extract (unzip) the July12.7z file; I had to compress it to post it here. After it is extracted you should have a file named July12.RData. If you double click July12.RData, it should start R with the simulated data objects you need already in the workspace. Type ls() in the R command line to see what is in the workspace. If double clicking doesn’t work, start R, use the change directory feature in the dropdown menu to move your working directory to wherever you put July12.RData, and then type load("July12.RData").
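If you prefer to do the loading step from the command line, it can be sketched roughly as follows. This assumes July12.RData is in your current working directory; the folder in the commented-out setwd() call is a made-up placeholder, and the file.exists() check is only there so the snippet fails gently.

```r
# Sketch: load the extracted workspace and list what it contains.
# Assumption: July12.RData is in the current working directory. If it is not,
# uncomment setwd() and point it at your own folder (the path is a placeholder).
# setwd("C:/path/to/your/download/folder")
data_file <- "July12.RData"
if (file.exists(data_file)) {
  loaded <- load(data_file)  # load() returns the names of the objects it restored
  print(loaded)
} else {
  message("Could not find ", data_file, " in ", getwd())
}
```

After a successful load, ls() should show the same object names that load() reported.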
Think of this like a cooking show where some intermediate step is already baked. To create the things in the R workspace you just loaded you would need to simulate networks, simulate trees, then simulate characters diffusing/evolving on them, etc. I’m happy to provide the simulation code to anyone interested. Just email me. The focus of this blog post, however, is not building simulations but learning to apply dyadic regression with random effects to network/tree datasets.
OK, so then start running my code file called CodeRandEffectsNetworkSims.txt. Copying and pasting one line at a time into the R command line is a good way to learn how a piece of R code works. When you get to the loop you would have to paste in the whole loop for it to run; however, I recommend you instead set i equal to something, like i=1, and then walk through the body of the loop one line at a time as well. That lets you inspect what is happening inside the loop. One important part is the bit that sets up what you need to run the random effects regression, which controls for the repeated identities of the individuals. Each individual is repeated across all of the dyads (pairwise relationships) they are part of:
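library(cluster)  # provides daisy()
library(lme4)     # provides lmer()
# node IDs, laid out as matrices so every dyad can be tagged with its row and column individual
names.vector<-1:nrow(sn.adj1)
rows<-matrix(rep(names.vector,ncol(sn.adj1)),ncol=ncol(sn.adj1))
cols<-matrix(rep(names.vector,ncol(sn.adj1)),ncol=ncol(sn.adj1),byrow=TRUE)
# pairwise trait distances for simulation i (lower triangle, column by column)
outcome.vector<-as.vector(daisy(as.data.frame(netsims[,,i]),metric="euclidean"))
# one row per dyad: trait distance, tie in each network, and the two node IDs
temp.data<-data.frame(outcome.vector,sn.adj1[lower.tri(sn.adj1)],sn.adj2[lower.tri(sn.adj2)],rows[lower.tri(rows)],cols[lower.tri(cols)])
colnames(temp.data)<-c("outcome","sn.adj1","sn.adj2","rows","cols")
# dyadic regression with crossed random intercepts for the two node IDs
net.mod<-lmer(outcome~sn.adj1+sn.adj2+(1|rows)+(1|cols),data=temp.data)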
Within the lmer function call, the terms (1|rows) and (1|cols) specify the random effects: random intercepts for the identity of the row individual and the column individual of each dyadic data point. I like lmer in the lme4 package for random effects models (aka mixed or hierarchical models) in R, but another option is lme in the nlme package, though crossed random effects are more awkward to specify there. There are more options besides these as well, including in other statistical packages like SAS, which has some very good random effects modeling routines. I’m not going to discuss fully here why this is the best approach to determining which tree or network most governs the cultural diffusion process for a trait. Read the manual or Karimov and Matthews 2017 if you want the answer to that.
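If you want to see the same model structure without loading my workspace, here is a minimal self-contained sketch on made-up data. Everything in it is an assumption for illustration and not the simulation design from the paper: the make.net helper, the toy averaging loop that spreads the trait along the first network only, n = 40, and the tie probability. It also uses base R dist() in place of daisy() so that lme4 is the only package needed.

```r
library(lme4)

set.seed(1)
n <- 40  # number of individuals (toy value)

# Hypothetical helper: random symmetric 0/1 adjacency matrix with tie probability p
make.net <- function(n, p) {
  m <- matrix(0, n, n)
  m[lower.tri(m)] <- rbinom(n * (n - 1) / 2, 1, p)
  m + t(m)
}
sn.adj1 <- make.net(n, 0.1)
sn.adj2 <- make.net(n, 0.1)

# Toy diffusion on network 1 only: each step, move halfway toward the
# mean trait of one's network-1 neighbours (isolates keep their own value)
trait <- rnorm(n)
for (step in 1:5) {
  deg <- rowSums(sn.adj1)
  nbr.mean <- ifelse(deg > 0, as.vector(sn.adj1 %*% trait) / deg, trait)
  trait <- 0.5 * trait + 0.5 * nbr.mean + rnorm(n, sd = 0.1)
}

# Tag every dyad (lower triangle) with the IDs of its two individuals
names.vector <- 1:n
rows <- matrix(rep(names.vector, n), ncol = n)
cols <- matrix(rep(names.vector, n), ncol = n, byrow = TRUE)

temp.data <- data.frame(
  outcome = as.vector(dist(trait)),        # pairwise trait distances
  sn.adj1 = sn.adj1[lower.tri(sn.adj1)],   # tie in network 1 (0/1)
  sn.adj2 = sn.adj2[lower.tri(sn.adj2)],   # tie in network 2 (0/1)
  rows = rows[lower.tri(rows)],
  cols = cols[lower.tri(cols)]
)

# Dyadic regression with crossed random intercepts for the two node IDs
net.mod <- lmer(outcome ~ sn.adj1 + sn.adj2 + (1 | rows) + (1 | cols),
                data = temp.data)
print(fixef(net.mod))
```

The outcome is a distance, so a network that carries the trait should tend to get a negative coefficient (tied pairs are more similar), while an irrelevant network should sit near zero. With toy data this small you may also see lmer report a singular fit, which is a message rather than an error.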
To get to know how lmer works, be sure to run some of the example code provided in the lmer help file. From R you can get to the help for any function in a loaded package by typing ?function.name in the R command line. For example, after library(lme4), typing ?lmer will get you the lmer help file.
I will say that I think the simulations in Karimov and Matthews 2017 are more comprehensive than anything anyone else has done on this issue. We show that dyadic regression with random effects is a definitive solution. It works for multiple networks, or networks combined with trees. I’m sure one could create evil combinations of unmeasured confounding and measurement error where the method will fail, but in principle it works across all relevant conditions, whereas we show that other commonly used methods like lnam (sna R package) and MRQAP (aka Mantel test) do not. If you can fit a random effects regression model, then you can fit the method I’m recommending based on the simulations I’ve done. You don’t need any particular software package and you don’t need my code: just regress the trait distances on the network ties, include random effects for node IDs, and you’re done. I shouldn’t hear any more at conferences about how we can’t distinguish treelike inheritance from network diffusion, or determine which networks are important. Measure whatever networks or trees you think might matter and put them in the dyadic regression with random effects.
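To make the "networks combined with trees" case concrete, here is one way it could look in R. This is a sketch under my own assumptions for illustration, not the setup from the paper: the tree enters the regression as a matrix of pairwise tree distances (here the cophenetic distances of a toy hclust tree), the trait is made to follow the tree by construction, and the network is unrelated random noise. The names latent, tree.dist, dyads and tree.net.mod are all made up for this example.

```r
library(lme4)

set.seed(2)
n <- 40  # number of individuals (toy value)

# Hypothetical latent ancestry positions; the toy "tree" is a clustering of them
latent <- rnorm(n)
tree <- hclust(dist(latent), method = "average")
tree.dist <- as.matrix(cophenetic(tree))  # pairwise distances along the tree

# An unrelated random network (symmetric 0/1 adjacency matrix)
sn.adj1 <- matrix(0, n, n)
sn.adj1[lower.tri(sn.adj1)] <- rbinom(n * (n - 1) / 2, 1, 0.1)
sn.adj1 <- sn.adj1 + t(sn.adj1)

# Toy trait that follows the tree: latent position plus noise
trait <- latent + rnorm(n, sd = 0.3)

# Node IDs for the two members of every dyad (lower triangle)
ids <- 1:n
rows <- matrix(rep(ids, n), ncol = n)
cols <- matrix(rep(ids, n), ncol = n, byrow = TRUE)
lt <- lower.tri(sn.adj1)

dyads <- data.frame(
  outcome = as.vector(dist(trait)),  # pairwise trait distances
  tree.dist = tree.dist[lt],         # pairwise tree distances
  sn.adj1 = sn.adj1[lt],             # network tie (0/1)
  rows = rows[lt],
  cols = cols[lt]
)

# Tree and network compete as predictors in the same dyadic regression
tree.net.mod <- lmer(outcome ~ tree.dist + sn.adj1 + (1 | rows) + (1 | cols),
                     data = dyads)
print(summary(tree.net.mod)$coefficients)
```

The point of the design is that trees and networks become columns in the same dyad table, so adding another candidate structure just means adding another predictor.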