Since my colleagues and I published A Manual for Cultural Analysis, some people have asked for R code examples of all the things we describe. That’s a fair critique of the manual as we originally published it, although I’ll note that most of the papers and books we referenced already provide implementations or point to them. Regardless, here on my blog I will write a set of posts that will point to everything you need to implement what’s in the manual.
The first part of the manual focuses on using Cultural Consensus Analysis (CCA) and Principal Component Analysis (PCA) as a first pass at understanding cultural data. Read the manual if you want to understand why this is such an appropriate first pass.
PCA has a large associated literature that I can’t review here. For implementation, I think the best option is the prcomp function in the ‘stats’ R package, which ships with any standard R install. The other option is princomp. I prefer prcomp because it uses singular value decomposition (SVD) rather than eigenvalue decomposition, which is generally considered slightly more accurate numerically. Also, prcomp accepts data that have more variables than observations, which is a common occurrence in cultural data, whereas princomp requires more observations than variables.
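To make the prcomp versus princomp point concrete, here is a minimal sketch on simulated data. The object names (toy.data, pca.fit, wide.attempt) and the random values are my own stand-ins for illustration, not anything from the manual; swap in your own individual-by-variable matrix.

```r
# Simulated stand-in for cultural data (an assumption for illustration):
# 10 individuals (rows) by 25 variables (columns), so more variables than observations.
set.seed(42)
toy.data <- matrix(rnorm(10 * 25), nrow = 10, ncol = 25,
                   dimnames = list(paste0("person", 1:10), paste0("item", 1:25)))

# prcomp centers the columns by default; scale. = TRUE also standardizes them.
pca.fit <- prcomp(toy.data, center = TRUE, scale. = TRUE)

summary(pca.fit)         # variance accounted for by each component
head(pca.fit$rotation)   # loadings: variables in rows, components in columns
head(pca.fit$x)          # scores: individuals in rows, components in columns

# princomp stops with an error on this wide layout, so wrap the call in try().
wide.attempt <- try(princomp(toy.data), silent = TRUE)
inherits(wide.attempt, "try-error")
```

Whether to set scale. = TRUE depends on whether your variables share a common measurement scale; it is a choice I made for the sketch, not a recommendation from the manual.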
CCA is a technique from cognitive anthropology, which is a subfield of cultural anthropology. Basically, it works by performing PCA on the transpose of the usual individual-by-variable matrix, so you are performing PCA on a variable-by-individual matrix. This procedure yields loadings for the individuals on the components and scores for the variables, the reverse of the usual PCA output. Exactly why you might do this theoretically, and when you might use CCA vs PCA, is covered in the manual.
Skipping to implementation, the simplest way to do CCA is to run prcomp on the transpose of your data, like this: prcomp(t(your.data)). Here t() is the transpose function in R.
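Here is a rough sketch of that transposed call on simulated data, so you can see where the individuals and the variables end up in the output. Everything about the simulation is an assumption for illustration: the hypothetical answer.key stands in for a shared cultural model, and each simulated person reports it with some noise.

```r
# Simulated consensus data (an assumption for illustration): 12 individuals
# each give noisy versions of the same 30 underlying answers.
set.seed(1)
n.people <- 12
n.items <- 30
answer.key <- rnorm(n.items)  # hypothetical shared "answers"

# sapply() returns items in rows, so transpose to the usual individual-by-variable layout.
your.data <- t(sapply(seq_len(n.people),
                      function(i) answer.key + rnorm(n.items, sd = 0.5)))
dimnames(your.data) <- list(paste0("person", seq_len(n.people)),
                            paste0("item", seq_len(n.items)))

# CCA in its simplest form: PCA on the variable-by-individual matrix.
cca.fit <- prcomp(t(your.data))

cca.fit$rotation[, 1]    # loadings of each individual on the first component
head(cca.fit$x[, 1])     # scores of each variable on the first component
cca.fit$sdev^2           # component variances (eigenvalues)

# Share of total variance captured by the first component.
first.share <- cca.fit$sdev[1]^2 / sum(cca.fit$sdev^2)
first.share
```

Note that the rows of rotation are now people and the rows of x are items, which is the reversal described above.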
There are also some packages specifically for CCA that let you fit more subtle forms of it, and that let you ensure the mathematics are done in precisely the same way as in important prior articles by folks like Batchelder, Romney, and Handwerker, among others (check the manual for refs). One option is the AnthroTools package for R. AnthroTools implements the classic version of CCA, and it provides some neat data manipulation tools specific to common types of cultural anthropology data, such as free-lists. Another option with more advanced features is CCTpack, which implements both the classic CCA and more recently developed modifications, such as models for contexts where there is more than one underlying cultural stance.
That should cover the options, at least in R, for implementing PCA and CCA as we described in A Manual for Cultural Analysis. Note that most R functions have working example code at the bottom of their help pages. I’ve learned a lot just by running those little examples and comparing the input data they used to the outputs generated by the functions.
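If you haven’t tried this before, here is a small sketch of running a help page’s examples from the console, using prcomp as the case in point. The calls are standard base R; the prcomp examples happen to use the built-in USArrests dataset.

```r
# Open the help page for prcomp; the "Examples" section is at the bottom.
help("prcomp")

# Run the examples from that help page; ask = FALSE skips the pause before each plot.
example("prcomp", ask = FALSE)

# Look at the input data those examples use, to compare against the output.
head(USArrests)
str(USArrests)
```

The same pattern should work for princomp or for functions in any other installed package that documents examples.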
Stay tuned to my personal blog over the coming days and weeks, because I am going to publish similar posts for the network analysis and phylogenetic analysis chapters. Feel free to leave comments here with questions or email me.