{"id":514,"date":"2014-02-27T18:06:35","date_gmt":"2014-02-27T18:06:35","guid":{"rendered":"http:\/\/kktg.net\/sgr\/?page_id=514"},"modified":"2014-02-27T18:06:35","modified_gmt":"2014-02-27T18:06:35","slug":"sgr-code-5","status":"publish","type":"page","link":"http:\/\/kktg.net\/sgr\/sgr-code-2\/sgr-code-5\/","title":{"rendered":"SGR Code from Appendices"},"content":{"rendered":"<h3>R Code for Appendix B: Statistics with R<\/h3>\n<pre lang=\"r-script\" line=\"1\">                                              \r\n#===============================================================================\r\n# Appendix B -- Statistics with R\r\n#===============================================================================\r\n\r\n# datasets package -------------------------------------------------------------\r\n\r\nstr(ToothGrowth)                       # Show structure of ToothGrowth data\r\n\r\nTG = ToothGrowth                       # Shorten dataset name\r\nTG$supp2 = as.character(TG$supp)       # Add character version of supp\r\ndim(TG)                                # Show data frame dimensions\r\n\r\n# AII.1 Descriptive statistics =========================================== AII.1\r\n\r\nsummary(TG$len)                        # Summary of a numeric variable\r\nsummary(TG$supp)                       # Summary of a factor variable\r\nsummary(TG)                            # Get summary of all TG variables\r\n\r\n# Individual descriptive statistics --------------------------------------------\r\nmean(TG$len)                           # Mean of a numeric variable\r\nmedian(TG$len)                         # Median of a numeric variable\r\nmin(TG$len)                            # Minimum of a numeric variable\r\nmax(TG$len)                            # Maximum of a numeric variable\r\nsd(TG$len)                             # Standard Deviation of numeric variable\r\nvar(TG$len)                            # Variance of numeric variable\r\n\r\n# Note that sd and var are sample statistics (denominator = n-1).\r\n#  you'll have to calculate the population statistics yourself:\r\n\r\n# Calculate population variance ------------------------------------------------\r\npopvar = sum((TG$len - mean(TG$len))^2\/length(TG$len))\r\npopsd = sqrt(popvar)                   # Population standard deviation\r\nprint.table(c(\"Sample variance: \",     # Here is printing with some labels\r\n  round(var(TG$len), 2),\r\n  \"Population variance: \", \r\n  round(popvar, 2)))\r\n  \r\nprint(c(sd(TG$len), popsd))            # Straight print without labels\r\n\r\n# With missing values (not in book) -------------------------------------------- \r\n# Now the same things for a dataset with missing values.\r\n\r\nTG$len[11] = NA                        # Put in a missing value \r\n\r\npopvar2 = sum((TG$len - mean(TG$len, na.rm = T))^2, na.rm = T)\/\r\n  (length(which(!is.na(TG$len))))\r\npopsd2 = sqrt(popvar2)\r\n\r\nprint(c(sd(TG$len), popsd2))           # Straight print without labels\r\n\r\n# Treatment of missing values for descriptive statistics -----------------------\r\n\r\nmean(TG$len)                           # If we take the mean we get NA\r\nmean(TG$len, na.rm = T)                # This will do it for us\r\nmedian(TG$len, na.rm = T)              # Likewise for these other values\r\nmax(TG$len, na.rm = T)\r\nmin(TG$len, na.rm = T)\r\n\r\nlength(TG$len[is.na(TG$len) == T])     # Two ways to count missing values\r\nsummary(TG$len)\r\n\r\n# std deviation for a data frame -----------------------------------------------\r\n\r\nsapply(TG, sd, na.rm = T)              # Apply std dev function to data frame\r\n\r\n# The mode issue ---------------------------------------------------------------\r\n\r\nmode(TG$len)                           # Object TYPE-not mode statistic!\r\n\r\n# A mode workaround\r\n# Here we start by changing our numeric variable to a factor.  Then we count \r\n# count the number of times each value appears.  Finally we sort by that \r\n# count in decreasing order to see the mode.\r\n#\r\n# as.factor() turns the numeric variable into a factor\r\n# split() breaks the var into individual vectors by the values of the factor\r\n# the length function counts how long each of the resulting vectors is\r\n# sapply() applies the length function to each of those vectors\r\n# sort() sorts the resulting lengths from longest to shortest (decreasing=TRUE)\r\n\r\nsort(sapply(split(TG$len, as.factor(TG$len)), length), decreasing = T)\r\n\r\n# Calculating standard errors --------------------------------------------------\r\n\r\n# 1. Calculate standard error (no missing values in dataset)\r\nstd.error = sd(TG$len)\/sqrt(length(TG$len))\r\nstd.error                              # Display standard error\r\n\r\n# Calculate standard error (missing values in dataset)\r\n\r\nTG$len[5] = NA                         # Switch one value to missing\r\n\r\nstd.error = sd(TG$len, na.rm = T)\/sqrt(length(!is.na(TG$len)))\r\nstd.error                              # Display standard error\r\n\r\n# Appendix II.2 - Visualizing univariate data ============================ AII.2\r\n\r\n# histograms -------------------------------------------------------------------\r\n\r\npng(filename = \"illustrations\/fig-AII-1-univariate visualization 1.png\",\r\n  units = \"in\",                        # Set measurements in inches\r\n  res = 1200,                          # Set resolution at 1200dpi\r\n  width = 6,                           # Width at 6 inches\r\n  height = 4)                          # Height at 4 inches\r\n\r\n\r\npar(mfrow=c(1, 2),                     # Set multiple plots\r\n  mai = c(.85, .85, .5, .25))          # Set margins\r\nhist(TG$len)                           # Default histogram: numeric var\r\nhist(TG$len, breaks = 4)               # Histogram w\/ about 4 breaks\r\n\r\ndev.off()                              # Output png file\r\n\r\npng(filename = \"illustrations\/fig-AII-2-univariate visualization 2.png\",\r\n  units = \"in\",                        # Set measurements in inches\r\n  res = 1200,                          # Set resolution at 1200dpi\r\n  width = 6,                           # Width at 6 inches\r\n  height = 4)                          # Height at 4 inches\r\n\r\n\r\npar(mfrow=c(1, 2),                     # Set multiple plots\r\n  mai = c(.5, .5, .25, .25))           # Set margins \r\nplot(TG$supp)                          # For factors just use plot()\r\nboxplot(TG$len)                        # It doesn't get any easier than this\r\n\r\ndev.off()                              # Output png file\r\n\r\n\r\n# AII.3 - Bivariate descriptives: Measures of Association ================ AII.3\r\n\r\n# We'll use the built-in trees data.  For demonstration purposes we'll just\r\n# use the first 26 observations and add a character variable using the \r\n# built-in LETTERS vector (A-Z)\r\n\r\nstr(trees)                             # Show structure of built-in trees data\r\nmyTrees = trees[1:26,]                 # Use the first 26 observations\r\nmyTrees$treeID = LETTERS               # Add a letter ID for each obs\r\nmyTrees$even =                         # Add a factor variable for even and\r\n  as.factor(c(\"even\", \"odd\"))          #   odd observations\r\nsummary(myTrees)                       # Summary of new trees data\r\n\r\n\r\n# Correlation ------------------------------------------------------------------\r\n\r\nsapply(myTrees, is.numeric)            # Show which variables are numeric\r\ncor(myTrees[,-c(4, 5)])                # Correlation matrix for vars 1-3\r\n                                       # Generic numeric filter\r\ncor(myTrees[sapply(myTrees, is.numeric)])\r\n\r\n# Correlation output rounded ---------------------------------------------------\r\n\r\nround(cor(myTrees[-c(4, 5)]), 2)       # Output rounded to two place\r\n\r\n# Correlation with missing values ----------------------------------------------\r\n\r\nmyTrees$Girth[14] = NA                 # Add some missing values\r\nmyTrees$Height[3] = NA\r\nmyTrees$Volume[21] = NA\r\n\r\ncor(myTrees[-c(4, 5)])                 # Cor() chokes on missing values\r\ncor(myTrees[-c(4, 5)],                 # Trying to use every obs produces\r\n  use = \"everything\")                  #   NA if either var has an NA\r\ncor(myTrees[-c(4, 5)],                 # complete.obs uses only those\r\n  use = \"complete.obs\")                #   observations that have no NA's\r\ncor(myTrees[-c(4, 5)],                 # Pairwise uses obs that are \r\n  use = \"pairwise.complete.obs\")       #   complete for each pair\r\n  \r\n# Bivariate visualizations -----------------------------------------------------\r\n\r\npng(filename = \"illustrations\/fig-AII-3-bivariate visualization.png\",\r\n  units = \"in\",                        # Set measurements in inches\r\n  res = 1200,                          # Set resolution at 1200dpi\r\n  width = 6,                           # Width at 6 inches\r\n  height = 4)                          # Height at 4 inches\r\n\r\npar(mai = c(1, 1, .25, .25))           # Set margins\r\n\r\nplot(myTrees$Girth,myTrees$Height,     # Default 2 variable scatter plot\r\n  xlim = c(5, 20),ylim = c(40, 100))   # Set axis lengths\r\nabline(                                # Add 2 variable regression line\r\n  lm(myTrees$Height ~ myTrees$Girth))    \r\n\r\ndev.off()                              # Output png file \r\n\r\n# pairs plot -------------------------------------------------------------------\r\n\r\npng(filename = \"illustrations\/fig-AII-4-pairs plot.png\",\r\n  units = \"in\",                        # Set measurements in inches\r\n  res = 1200,                          # Set resolution at 1200dpi\r\n  width = 6,                           # Width at 6 inches\r\n  height = 4)                          # Height at 4 inches\r\n\r\npar(mai = c(0, 0, 0, 0))               # Set margins\r\n\r\npairs(myTrees[-c(4, 5)])               # Pairs to produce bivariate plots\r\n\r\ndev.off()                              # Output png file\r\n\r\n# AII.4  - Hypothesis testing ============================================ AII.4\r\n\r\n# Using edata--environmental data\r\n\r\nedata = read.delim(\"http:\/\/www.kktg.net\/R\/Appendix2Data.txt\",\r\n  colClasses =                         # Set column storage modes\r\n    c(\"character\", rep(NA, 8)),        #   1st col char, let R choose rest \r\n  header = TRUE,                       # Use column headers for variable names\r\n  row.names = 1)                       # Use first column for obs names\r\nedata$Kyoto = as.logical(edata$Kyoto)  # Make signatory status logical obj\r\n\r\nsummary(edata)                         # Summary of variables\r\n\r\n# Outliers ---------------------------------------------------------------------\r\n# Show outliers\r\n\r\nprint(edata[which(abs(edata$dCO2) > 2.5), c(\"Kyoto\", \"dCO2\")], digits = 2)\r\n\r\n# Drop outliers\r\nedata2 = edata[which(abs(edata$dCO2) < 2.5),]\r\n\r\nsummary(edata2)\r\n#-------------------------------------------------------------------------------\r\nvar.test(                              # Check variances to be sure\r\n  edata2$dCO2[edata2$Kyoto],           #   t-test is appropriate\r\n  edata2$dCO2[!edata2$Kyoto])          # This generates an F statistic.\r\n\r\n\r\nkCO2 = t.test(                         # Construct t-test to compare mean\r\n  edata2$dCO2[edata2$Kyoto],           #   change in CO2 output between\r\n  edata2$dCO2[!edata2$Kyoto])          #   signatories\/non-signatories.\r\n\r\nkCO2                                   # Print results\r\nsummary(kCO2)                          # Show elements in the t-test\r\nkCO2$statistic                         # Print just the t-statistic\r\nkCO2$p.value                           # Print just the p-value\r\n\r\n# AII.5 - Crosstabulation ================================================ AII.5\r\n\r\n# Using same edata\r\n\r\nedata2$VA98 =                          # Create a dichotomous measure of\r\n  ifelse(edata2$WBVA98 > 0, T, F)      #   voice and accountability\r\ndkxtab = xtabs(                        # Crosstab 1998 VA and Kyoto signing\r\n  formula = ~ edata2$VA98 + edata2$Kyoto)   \r\ndkxtab                                 # Display cell values\r\nsummary(dkxtab)                        # Display crosstab results (chi-square)\r\n\r\n#  crosstab using chisq.test() on contingency table ----------------------------\r\n\r\nVAK = table(                           # Get table showing frequencies\r\n  edata2$VA98,                         #   for high\/low V & A and\r\n  edata2$Kyoto)                        #   Kyoto signature status\r\n\r\nVAKtest = chisq.test(VAK,              # Chi square test for matrix\r\n  correct = FALSE)                     # Turn off continuity correction\r\n\r\nVAKtest                                # Display results\r\n\r\nVAKtest$expected                       # Show expected cell values matrix\r\n\r\n# AII.6 - Analysis of Variance (ANOVA) =================================== AII.6\r\n\r\nedata2$VA = ifelse(edata2$WBVA > 0, T, F)  # Create Voice indicator\r\n\r\n# Show compliance means by dem\r\ntapply(edata2$EG,                      # Tapply mean() to Econ Growth\r\n  edata2$Kyoto,                        # Partition data by Kyoto\r\n  FUN = mean,                          # Apply mean()\r\n  na.rm = T)                           # Remove missing observations\r\n\r\n# ANOVA analysis ---------------------------------------------------------------\r\n\r\nmyAOV = aov(EG ~ Kyoto, data = edata2) # ANOVA w\/aov()\r\nsummary(myAOV)                         # Display results\r\nmyAnova = anova(                       # ANOVA w\/anova()\r\n  lm(edata2$EG ~ edata2$Kyoto))        #  analyzing a linear model\r\nmyAnova                                # Display results\r\n\r\n# Use aov() results to get group means\r\nprint(model.tables(myAOV,\"means\"), digits = 2)\r\n\r\n# Do boxplot\r\n\r\npng(filename = \"illustrations\/fig-AII-5-boxplot.png\",\r\n  units = \"in\",                        # Set measurements in inches\r\n  res = 1200,                          # Set resolution at 1200dpi\r\n  width = 6,                           # Width at 6 inches\r\n  height = 4)                          # Height at 4 inches\r\n\r\npar(mai = c(.5, .5, .25, .25))         # Set margins\r\n\r\nboxplot(EG ~ Kyoto, data = edata2)     # Produce boxplot\r\npoints(1 + jitter(as.integer(edata2$Kyoto)), edata2$EG)\r\n\r\ndev.off()                              # Output png file\r\n\r\n\r\n# AII.7 - Linear Regression ============================================== AII.7\r\n\r\n# First we'll check on correlations\r\n\r\ncor(edata2$WBVA, log(edata2$GDPc),     # Check correlations\r\n  use=\"pairwise.complete.obs\")         # Use all complete observations\r\n\r\n# ------------------------------------------------------------------------------\r\nmyModel = lm(                          # Set up a linear model with\r\n  edata2$WBRQ~                         #  Quality of regulation as a\r\n   edata2$WBVA+                        #  function of Voice\/Accountability\r\n    log(edata2$GDPc))                  #  and the log of GDP per capita\r\n\r\nsummary(myModel)                       # Show model results\r\n\r\ncor(edata2[sapply(edata2,is.numeric)], # Correlation matrix for numeric vars\r\n  use = \"pairwise.complete.obs\")         # Use all obs with complete pairs\r\n\r\nmyDiag = ls.diag(myModel)              # set up model diagnostics\r\nattributes(myDiag)\r\nround(myDiag$correlation,2)            # Show correlation matrix\r\n\r\n\r\nplot(myModel$fitted.values,            # plot residuals v. fitted\r\n  myDiag$std.res)\r\n\r\nplot(myModel)                          # Generate standard lm plots\r\n\r\n<\/pre>\n<h3>R Code for Appendix D: Some Style Guidelines<\/h3>\n<pre lang=\"r-script\" line=\"1\">\r\n#===============================================================================\r\n# Appendix D - Some style guidelines\r\n#===============================================================================\r\n\r\n# The importance of style and some white space here and there\r\n         \r\n# Example 1 - No white space and no comments\r\n\r\nmyVariable=c(1,4,6,7,9,13,15,52,62,78)\r\npopvar=sum((myVariable-mean(myVariable))^2\/length(myVariable));popsd=\r\nsqrt(popvar);print.table(c(\"Variance: sample -\",round(var(myVariable),2),\r\n\"population -\",round(popvar,2)));print(c(sd(myVariable), popsd))\r\n\r\n# Example 2 - White space & comments\r\n\r\n# Create some data\r\nmyVariable = c(1, 4, 6, 7, 9, 13, 15, 52, 62, 78)\r\n\r\n# Calculate population variance\r\npopvar = sum((myVariable - mean(myVariable))^2\/length(myVariable))\r\n\r\n# Calculate population standard deviation\r\npopsd = sqrt(popvar) \r\n\r\n# Print results with labels\r\nprint.table(c(\"Variance: sample -\",\r\n  round(var(myVariable), 2),\r\n  \"population -\", \r\n  round(popvar, 2))\r\n)\r\n\r\n# Print results without labels\r\nprint(c(sd(myVariable), popsd))  \r\n\r\n# Indentation ------------------------------------------------------------------\r\n# Double space indentation\r\nmyData = read.delim(\"myDataFile.txt\", \r\n  headers = TRUE, \r\n  colClasses = c(\"character\", \"logical, \r\n    \"logical\", \"numeric\", \"factor\", \r\n    \"character\", \"numeric\", \"Date\")\r\n  na.strings = \"N\/A\")\r\n\r\n# Parenthetical alignment indentation  \r\nmyData = read.delim(\"myDataFile.txt\", \r\n                    headers = TRUE, \r\n                    colClasses = c(\"character\", \"logical, \"logical\", \r\n                                   \"numeric\", \"factor\", \"character\", \r\n                                   \"numeric\", \"Date\")\r\n                    na.strings = \"N\/A\")\r\n  \r\n# Placement of parentheses, brackets, and braces -------------------------------\r\n\r\nmyFunction = function(x, y, z){          # Open w\/ curly bracket on same line\r\n  print(\"This function doesn't do anything\")\r\n}                                      # End w\/ curly bracket on own line\r\n\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>R Code for Appendix B: Statistics with R #=============================================================================== # Appendix B &#8212; Statistics with R #=============================================================================== # datasets package &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;- str(ToothGrowth) # Show structure of ToothGrowth data TG = ToothGrowth # Shorten dataset name TG$supp2 = as.character(TG$supp) # Add character version of supp dim(TG) # Show data frame dimensions # AII.1 Descriptive statistics =========================================== [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":98,"menu_order":0,"comment_status":"closed","ping_status":"open","template":"","meta":{"footnotes":""},"class_list":["post-514","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/pages\/514","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/comments?post=514"}],"version-history":[{"count":1,"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/pages\/514\/revisions"}],"predecessor-version":[{"id":515,"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/pages\/514\/revisions\/515"}],"up":[{"embeddable":true,"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/pages\/98"}],"wp:attachment":[{"href":"http:\/\/kktg.net\/sgr\/wp-json\/wp\/v2\/media?parent=514"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}