Blog Archives

Implementations for A Manual for Cultural Analysis: fixing Galton's problem

4/18/2019

In Chapter 5 of A Manual for Cultural Analysis we discuss how culture can result in emergent treelike patterns at the level of comparisons between groups. These nested patterns arise because many of our most enduring socially learned behaviors change very slowly (that’s a tautology of course, things that change quickly can’t be enduring). Anyway, when things change slowly, and as groups form and dissolve, this tends to result in fairly nested treelike branching patterns for a lot of culture at a very high group-y level.

This can result, however, in spurious findings for correlations between traits that in fact have no functional relationship but instead are both inherited through the same cultural pathways. In the cross-cultural literature this is known as Galton’s problem. See the manual if you want a discussion of this and why it is important.
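To make the problem concrete, here is a minimal simulation sketch in base R. It is not taken from the manual or the book chapter; the number of parent groups, the sample sizes, and names like sim.p.value are assumptions chosen only for illustration. Two traits are generated with no functional relationship, but each carries a value inherited from the same parent group. A plain regression that ignores this shared inheritance should reject the null far more often than the nominal 5%.

```r
set.seed(1)

n.clades <- 4    # hypothetical number of parent groups
n.per <- 10      # hypothetical number of descendant groups per parent

# Simulate two traits with NO functional relationship. Each trait is an
# inherited parent-group value plus a small group-level deviation.
sim.p.value <- function() {
  clade <- rep(seq_len(n.clades), each = n.per)
  x <- rnorm(n.clades)[clade] + rnorm(n.clades * n.per, sd = 0.3)
  y <- rnorm(n.clades)[clade] + rnorm(n.clades * n.per, sd = 0.3)
  # p-value for the slope from a regression that ignores shared inheritance
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}

p.values <- replicate(500, sim.p.value())

# With a valid test this proportion would be close to 0.05
false.positive.rate <- mean(p.values < 0.05)
print(false.positive.rate)
```

The 40 groups behave more like 4 independent datapoints than 40, which is why the uncorrected test is so overconfident.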

For an even fuller discussion of Galton’s problem, you can check out my new book chapter. Mine is chapter 8 entitled “Dealing with Culture as Inherited Information.” Hopefully people’s university libraries are picking up copies of this, because as a whole it really is an excellent book. Shoot me an email if you are having trouble finding it.

The data supplement for that book is my code. It is at the bottom of the book’s Wiley page; just scroll all the way down. You can download the supplement for free without buying the book. Go ahead and download it and you have a full R implementation for a variety of methods proposed to solve Galton’s problem. In the code I first show how to fit simple linear regression, which makes no correction at all for Galton’s problem. Then I show how to put in principal components that attempt to fix Galton’s problem, then network autoregression, mixed hierarchical models that use random effects for clumps in the network, and finally phylogenetic regression.

The book chapter shows simulation results demonstrating that several of these methods work when diffusion is the cultural process, but only the phylogenetic regression corrects for inheritance as a process. Note: in this case phylogenetic regression also works for diffusion because the network is highly treelike (i.e. nested). This is crucial! Only phylogenetic regression works irrespective of whether the main cultural process is diffusion or inheritance. Since we almost never know the main cultural process a priori, I recommend that for treelike networks we use phylogenetic regression and simply do not interpret a significant role for the phylogeny as necessarily indicative of inheritance. It could indicate diffusion on a treelike network.

FYI, I have a new preferred implementation for the phylogenetic model as compared to when I made the supplement for that book. My new preferred method is the phylolm function in the package of the same name. It is much easier to control whether phylogenetic parameters like lambda are bounded or unbounded in phylolm than when fitting the same model with gls, which is the approach my data supplement code shows. To fit phylolm, you still use the ape package to load in the phylogeny. Then give the phylogeny object itself straight to phylolm as a parameter (see the phylolm help file).
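A minimal sketch of that workflow, assuming the ape and phylolm packages are installed. The random tree and simulated traits are stand-ins I made up for illustration, not the data from the book supplement, and the bounds on lambda are arbitrary example values passed through the lower.bound and upper.bound arguments described in the phylolm help file.

```r
# Assumes the ape and phylolm packages are installed; skipped otherwise.
if (requireNamespace("ape", quietly = TRUE) &&
    requireNamespace("phylolm", quietly = TRUE)) {
  set.seed(123)

  # Random ultrametric tree standing in for a real phylogeny. With real data
  # you would read the tree from a file instead, e.g. with ape::read.tree().
  tree <- ape::rcoal(30)

  # Two traits evolved independently on the tree (Brownian motion)
  x <- ape::rTraitCont(tree)
  y <- ape::rTraitCont(tree)

  # Row names must match the tip labels so phylolm can align data and tree
  dat <- data.frame(x = x[tree$tip.label], y = y[tree$tip.label],
                    row.names = tree$tip.label)

  # Phylogenetic regression with Pagel's lambda; the bounds are illustrative
  fit <- phylolm::phylolm(y ~ x, data = dat, phy = tree, model = "lambda",
                          lower.bound = 1e-4, upper.bound = 1)
  print(summary(fit))
  print(fit$optpar)   # estimated lambda
}
```

The tree object goes straight into the phy argument, so there is no separate step of building a correlation structure as there is with gls.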

One quirk of phylolm is that it does not print BIC in the summary. I’ve advocated for BIC as a way to pick models, so here is how to get it with the AIC function. Suppose my.tree is a fitted phylolm model. You type AIC(my.tree,k=log(N)) where N is your sample size. This converts the AIC into the BIC. The principal difference between the two is that AIC always uses a penalty of 2 per estimated parameter, while BIC uses a penalty of log(N) per parameter. You can learn about this yourself from the AIC function help page.
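The same trick can be checked on an ordinary lm fit in base R, where the built-in BIC function is available for comparison. The simulated data and the lm model here are assumptions standing in for a fitted phylolm object.

```r
set.seed(42)
N <- 50
x <- rnorm(N)
y <- 1 + 0.5 * x + rnorm(N)
fit <- lm(y ~ x)   # stand-in for a fitted phylolm model

aic <- AIC(fit)                       # default penalty k = 2 per parameter
bic.via.aic <- AIC(fit, k = log(N))   # penalty log(N) per parameter

print(c(AIC = aic, BIC = bic.via.aic))
print(all.equal(bic.via.aic, BIC(fit)))   # agrees with the built-in BIC
```

Because log(N) is greater than 2 once N is 8 or more, BIC penalizes extra parameters more heavily than AIC in any realistically sized dataset.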

OK, so between this blog and my prior one I have provided implementation for 1) determining which network is important for your cultural trait and 2) correcting for Galton’s problem on your network if it is highly treelike. That still leaves a hole in the analytic pipeline if your network is not treelike. What to do then? Don’t worry, I’m on it! I have a set of NIH-funded projects about physician networks, which are highly non-treelike. I have a paper in preparation right now that shows the phylogenetic method predictably fails under this condition to correct Galton’s problem on a messy non-tree network. In fact, all the previous methods fail! So, I’m inventing a couple new methods and hopefully will have that paper submitted soon.

Implementations for A Manual for Cultural Analysis: Network Regressions

4/10/2019

Files for this post:
coderandeffectsnetworksims.txt (txt, 3 kb)
july12.7z (7z, 9748 kb)

I’m continuing to post implementation notes to accompany A Manual for Cultural Analysis, which I published last year together with two of my anthropologist colleagues at RAND. In my last installment, I provided links and advice for implementing CCA/PCA.

In this blog post I will address how to implement the network modeling method that is discussed in some detail in Chapter 4 of the manual. The central question is this: how do we tell which set of social connections is most important to the transmission of a cultural trait? Note: if a trait doesn’t transmit on some kind of social connection, then it can’t be socially learned, and so by definition it isn’t culture!

OK, but people are connected by ties of friendship, marriage, coworkership, Twitter, etc. So how do we decide which of the various ties are most relevant to a diffusing cultural trait? We cover this question in a lot of detail in the manual. I covered it with even more detailed simulations in my paper with my student Rouslan Karimov. We never got a supplement published for that paper, so I’m posting here the code you need to run the most important analysis from the paper: the method that works.

First, make sure R is installed. Download the files that are part of this blog post. Extract (unzip) the July12.7z file; I had to zip it to post it here. After it is extracted you should have a file July12.RData. If you double click the July12.RData file, it should start up R with the simulated data objects you need already in the workspace. Type ls() in the R command line to see what is in the workspace. If double clicking doesn’t work, then start R, use the change directory feature in the dropdown to move your active directory to wherever you put July12.RData, and type load("July12.RData").
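If you prefer to do the loading step from the command line, a minimal sketch looks like this. It assumes the extracted file is named July12.RData as described above; data.dir is a hypothetical placeholder for wherever you put it.

```r
# Hypothetical folder; replace with wherever you extracted the archive
data.dir <- "."
rdata.path <- file.path(data.dir, "July12.RData")

if (file.exists(rdata.path)) {
  # load() returns the names of the objects it restored, invisibly
  loaded.objects <- load(rdata.path)
  print(loaded.objects)
} else {
  message("Could not find ", rdata.path, "; check the folder with getwd()")
}
```

Capturing the return value of load() gives the same list of object names you would otherwise get by typing ls() in a fresh session.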

Think of this like a cooking show where some intermediate step is already baked. To create the things in the R workspace you just loaded you would need to simulate networks, simulate trees, then simulate characters diffusing/evolving on them, etc. I’m happy to provide the simulation code to anyone interested. Just email me. The focus of this blog post, however, is not about building simulations but learning to apply dyadic regression with random effects to network/tree datasets.

OK, so then start running my code file called CodeRandEffectsNetworkSims.txt. Simply copying and pasting one line at a time into the R command line is a good way to learn how a piece of R code works. When you get to the loop you would have to paste in the whole loop for it to run; however, I recommend you set i equal to something, like i=1, and then walk through the loop one line at a time as well. That will enable you to inspect what is happening in the loop. One important part is this bit where it defines what you need to run the random effects regression that controls for the repeated identities of the individuals. The individuals are being repeated across each of their network relationships:

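library(cluster)  # provides daisy()
library(lme4)     # provides lmer()

# Node IDs laid out as matrices, so each dyad can be tagged with its row node and column node
names.vector<-1:nrow(sn.adj1)

rows<-matrix(rep(names.vector,ncol(sn.adj1)),ncol=ncol(sn.adj1))
cols<-matrix(rep(names.vector,ncol(sn.adj1)),ncol=ncol(sn.adj1),byrow=TRUE)

# Pairwise trait distances for simulation i (lower triangle, column by column)
outcome.vector<-as.vector(daisy(as.data.frame(netsims[,,i]),metric="euclidean"))

# One row per dyad: trait distance, tie in each network, and the two node IDs
temp.data<-data.frame(outcome.vector,sn.adj1[lower.tri(sn.adj1)],sn.adj2[lower.tri(sn.adj2)],rows[lower.tri(rows)],cols[lower.tri(cols)])

colnames(temp.data)<-c("outcome","sn.adj1","sn.adj2","rows","cols")

# Dyadic regression with crossed random intercepts for the two node identities
net.mod<-lmer(outcome~sn.adj1+sn.adj2+(1|rows)+(1|cols),data=temp.data)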

Within the lmer function call, the terms (1|rows) and (1|cols) specify the random effects, which are just the identity of each row and column for each dyadic datapoint. I like lmer in the lme4 package for random effects models (aka mixed hierarchical models) in R, but another option is lme in the nlme package. There are more options besides these as well, including in other statistical packages like SAS, which has some very good random effects modeling routines. I’m not going to discuss fully here why this is the best approach to determining which tree or network most governs the cultural diffusion process for a trait. Read the manual or Karimov and Matthews 2017 if you want the answer to that.
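If you want to try the model without the downloaded workspace, here is a toy version with made-up inputs. Everything in it is an assumption for illustration: the group structure, the random second network, the single simulated trait, and the use of base R dist() in place of daisy(). It is not the simulation design from Karimov and Matthews 2017. The model fit is skipped if lme4 is not installed.

```r
set.seed(2019)
n <- 30
group <- rep(1:3, each = n / 3)

# Network 1: ties within groups (the network the trait actually follows)
sn.adj1 <- outer(group, group, "==") * 1
diag(sn.adj1) <- 0

# Network 2: random symmetric ties unrelated to the trait
sn.adj2 <- matrix(rbinom(n * n, 1, 0.2), n, n)
sn.adj2[upper.tri(sn.adj2)] <- t(sn.adj2)[upper.tri(sn.adj2)]
diag(sn.adj2) <- 0

# Trait values cluster by group, so network 1 ties imply small trait distances
trait <- c(0, 2, 4)[group] + rnorm(n, sd = 0.5)

# Row and column node IDs for every dyad
ids <- seq_len(n)
rows <- matrix(ids, n, n)
cols <- matrix(ids, n, n, byrow = TRUE)
low <- lower.tri(sn.adj1)

# dist() stores the lower triangle column by column, matching m[lower.tri(m)]
toy.data <- data.frame(outcome = as.vector(dist(trait)),
                       sn.adj1 = sn.adj1[low], sn.adj2 = sn.adj2[low],
                       rows = rows[low], cols = cols[low])

if (requireNamespace("lme4", quietly = TRUE)) {
  toy.mod <- lme4::lmer(outcome ~ sn.adj1 + sn.adj2 + (1 | rows) + (1 | cols),
                        data = toy.data)
  print(lme4::fixef(toy.mod))
}
```

In this toy setup the coefficient on sn.adj1 should come out clearly negative (tied nodes are closer in trait value), while the coefficient on sn.adj2 should be near zero.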

In terms of getting to know how lmer works, be sure to run some of the example code provided in the lmer help file. From R you can get to the help for any function by typing ?function.name in the R command line. For example, typing ?lmer will get you the lmer help file.

I will say that I think the simulations in Karimov and Matthews 2017 are more comprehensive than anything anyone else has ever done on this issue. We show that the dyadic regression with random effects is a definitive solution. It works for multiple networks, or networks combined with trees. I’m sure one could create evil combinations of unmeasured confounding and measurement error where the method will fail, but in principle it works across all relevant conditions, whereas I show that other commonly used methods like lnam (sna R package) and MRQAP (aka Mantel test) do not. If you can fit a random effects regression model, then you can fit the method I’m recommending based on the simulations I’ve done. You don’t need any particular software package and you don’t need my code: just regress the trait distances on the network ties, include random effects for node IDs, and you’re done. I shouldn’t hear any more at conferences about how we can’t distinguish treelike inheritance from network diffusion, or determine which networks are important. Measure whatever networks or trees you think might matter and put them in the dyadic regression with random effects.

Implementations for A Manual for Cultural Analysis: CCA/PCA

4/9/2019

Since my colleagues and I published A Manual for Cultural Analysis, some people have asked for R code examples of all the things we describe. That’s a fair critique of the manual as we originally published it, although I’ll note that most of the papers and books we referenced already provide implementations or point to them. Regardless, here on my blog I will write a set of posts that will point to everything you need to implement what’s in the manual.

The first part of the manual focuses on using Cultural Consensus Analysis (CCA) and Principal Component Analysis (PCA) as a first pass at understanding cultural data. Read the manual if you want to understand why this is such an appropriate first pass.

PCA has a large associated literature that I can’t overview here. For implementation, I think the best option is the prcomp function in the ‘stats’ R package that should come with any basic R install. The other option is princomp. I prefer prcomp because it uses SVD rather than eigenvalue decomposition, which is supposed to be slightly more accurate numerically. Also, this implementation allows for data structures that have more variables than datapoints, which is a common occurrence in cultural data.
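A quick base R sketch of that last point, using made-up random data with more variables than datapoints. The matrix dimensions are arbitrary assumptions for illustration.

```r
set.seed(7)
# Hypothetical data: 10 individuals (rows) by 25 variables (columns)
your.data <- matrix(rnorm(10 * 25), nrow = 10, ncol = 25)

pca <- prcomp(your.data)   # centers each variable by default
print(dim(pca$x))          # scores: individuals by components
print(dim(pca$rotation))   # loadings: variables by components

# princomp refuses data with more variables than datapoints
princomp.result <- try(princomp(your.data), silent = TRUE)
print(inherits(princomp.result, "try-error"))
```

The scores for the individuals are in pca$x and the loadings for the variables are in pca$rotation.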

CCA is a technique from cognitive anthropology, which is a subfield of cultural anthropology. Basically it works by performing PCA on the transpose of the usual individual by variable matrix, so you are performing PCA on a variable by individual matrix. This procedure results in loadings for the individuals on the components and scores for the variables, the reverse of the usual PCA procedure. Exactly why you might do this theoretically and when you might use CCA vs PCA is answered in the manual.

Skipping to implementation, the simplest way to do CCA is to use prcomp on the transpose of your data, like this: prcomp(t(your.data)). The t() is the transpose function in R.
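Here is a small sketch of the transpose approach on simulated responses. The shared answer key, the noise level, and the numbers of respondents and items are all assumptions made up for illustration, and this is only the simple prcomp shortcut described above, not the formal CCA model.

```r
set.seed(11)
n.items <- 40    # hypothetical number of questions
n.people <- 12   # hypothetical number of respondents

# Every respondent draws on the same underlying answer key, plus personal noise
answer.key <- rnorm(n.items)
responses <- sapply(seq_len(n.people),
                    function(p) answer.key + rnorm(n.items, sd = 0.5))
your.data <- t(responses)   # usual layout: individuals in rows, items in columns

# CCA-style analysis: PCA on the transposed (item by individual) matrix
cca <- prcomp(t(your.data))

people.loadings <- cca$rotation[, 1]   # one loading per individual
item.scores <- cca$x[, 1]              # one score per item

print(round(people.loadings, 2))
print(round((cca$sdev^2 / sum(cca$sdev^2))[1], 2))   # variance share of component 1
```

When respondents share a single answer key like this, their loadings on the first component all have the same sign and the first component takes most of the variance.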

There also are some packages specifically for CCA that allow you to fit more subtle forms of it, and allow you to ensure the mathematics are being done in more precisely the same way as in prior important articles by folks like Batchelder, Romney, and Handwerker among others (check the manual for refs). One option is the AnthroTools package for R. AnthroTools will implement the classic version of CCA, and it provides some neat data manipulation tools specific to common types of cultural anthropology data, such as free-lists. Another option with more advanced features is CCTpack, which implements both the classic CCA and more recently developed modifications, such as for contexts where there is more than one underlying cultural stance.

That should cover the options, at least in R, for implementing PCA and CCA as we described in A Manual for Cultural Analysis. Note that all R functions have example code that works down at the bottom of the help pages for them. I’ve learned a lot just by running those little examples and comparing the input data they used to the outputs generated by the functions.

Stay tuned to my personal blog over the next days and weeks because I am going to publish similar posts for the network analysis and phylogenetic analysis chapters. Feel free to leave comments here with questions or email me.

Author

This is my personal blog. The views expressed on this page are my own. My views should not be taken to represent the views of my mentors, employer, or any person or group other than myself.
