Social scientists use a wide range of statistical methods. To keep this task view manageable, I have suppressed detail in some areas that are well covered by related task views (e.g., the
Spatial
task view for spatial statistics), and have pointed to those task views instead.
Most statistical data analysis in the social sciences is covered by the facilities in the base and recommended packages, which are part of the standard R distribution. In the package descriptions below, I identify base and recommended packages on first mention; packages that are not specifically identified as "R-base" or "recommended" are contributed packages.
One area of central interest to social scientists that I do not cover here is statistical graphics, even though this is one of the great strengths of R: Basic R graphics, trellis graphics (in the recommended
lattice
package), dynamic 3D graphs (via the
rgl
package), and the many packages that include facilities for various statistical graphs are just too extensive to detail here. Fortunately, a Graphics task view is currently in preparation.
If I have omitted something of importance, or if a new package or function should be mentioned here,
please let me know.
Linear and Generalized Linear Models:
Univariate and multivariate linear models are fit by the
lm
function, generalized linear models by the
glm
function, both in the R-base stats package. Beyond
summary
and
plot
methods for
lm
and
glm
objects, there is a wide array of functions that support these objects:

The generic
anova
function in the stats package constructs sequential analysis of variance and analysis of deviance tables, and can compute
F
and likelihood-ratio tests for nested models. (It is typical for other classes of statistical models in R to have
anova
methods as well.) The generic
Anova
function in the
car
package (associated with Fox,
An R and S-PLUS Companion to Applied Regression,
Sage, 2002) constructs so-called "Type-II" and "Type-III" tests for linear and generalized linear models.

F
and Wald tests for a variety of hypotheses are available from the
coeftest
and
waldtest
functions in the
lmtest
package, and the
linear.hypothesis
function in the
car
package. All of these functions permit the use of heteroscedasticity-consistent and heteroscedasticity/autocorrelation-consistent covariance matrices, as computed, e.g., by functions in the
sandwich
and
car
packages. Also see the
glh.test
function in the
gmodels
package. Nonlinear functions of parameters can be tested via the
delta.method
function in the
alr3
package (associated with Weisberg,
Applied Linear Regression, 3rd Ed.,
Wiley, 2005). The
multcomp
package includes functions for multiple comparisons. The
vuong
function in the
pscl
package tests non-nested hypotheses for generalized linear and some other models. Also see the
rms
package for tests on linear and generalized linear models.

A basic R installation has excellent facilities for linear and generalized linear model
"diagnostics," including, for example, hat-values and deletion statistics such as studentized
residuals and Cook's distances (
hatvalues,
rstudent, and
cooks.distance, all in the stats package). These are augmented by other packages: several functions in the
car
package, which emphasizes graphical methods, e.g.,
cr.plots
for component-plus-residual plots and
av.plots
for added-variable plots, in addition to numerical diagnostics, such as
vif
for (generalized) variance-inflation factors; the
dr
package for dimension reduction in regression, including SIR, SAVE, and pHd; and the
lmtest
package, which implements a wide variety of tests (e.g., for heteroscedasticity, nonlinearity, and autocorrelation). More diagnostic methods, e.g., for inverse-response plots, may be found in the
alr3
package. The
forward
package implements diagnostics based on a "forward search" (Atkinson and Riani,
Robust Diagnostic Regression Analysis,
Springer, 2000). Other collinearity diagnostics are in the
perturb
package. Diagnostics may also be found in the
rms
package.

Several packages contain functions that are useful for interpreting linear and generalized linear models that have been fit to data: The
qvcalc
package computes "quasi variances" for factors in linear and generalized linear models (and more generally). The
effects
package constructs effect displays, including, e.g., "adjusted means," for linear and generalized linear models. The
Zelig
package (see under
"Collections"
) creates displays for many kinds of statistical models.
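
As a minimal sketch of the base facilities described in this section, the following uses only the stats package and the built-in mtcars data set. The choice of data and of model formulas is an illustrative assumption, not something taken from the packages' documentation; the point is simply to show lm, glm, the generic anova function, and the basic deletion diagnostics working together.

```r
# Illustrative only: the data set (mtcars) and the formulas are assumptions.

# Two nested linear models fit with lm().
mod1 <- lm(mpg ~ wt, data = mtcars)
mod2 <- lm(mpg ~ wt + hp, data = mtcars)

# Incremental F-test for the nested models via the generic anova().
print(anova(mod1, mod2))

# A binomial logit model fit with glm(), compared to the
# intercept-only model by a likelihood-ratio (chi-square) test.
glm0 <- glm(am ~ 1, family = binomial, data = mtcars)
glm1 <- glm(am ~ wt, family = binomial, data = mtcars)
print(anova(glm0, glm1, test = "Chisq"))

# Hat-values, studentized residuals, and Cook's distances.
diag_stats <- data.frame(
  hat      = hatvalues(mod2),
  rstudent = rstudent(mod2),
  cooks    = cooks.distance(mod2)
)

# The three observations with the largest Cook's distances.
print(head(diag_stats[order(-diag_stats$cooks), ], 3))
```

The same pattern (fit, compare with anova, inspect diagnostics) carries over to most model classes in R that supply the corresponding methods.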
Analysis of Categorical and Count Data:
Binomial logit and probit models, as well as Poisson-regression and loglinear models for contingency
tables (including models for "overdispersed" binomial and Poisson data), can be fit with the
glm
function in the stats package. For overdispersed data, see also the
aod
package and the
glm.nb
function in the recommended
MASS
package (associated with
Venables and Ripley,
Modern Applied Statistics with S, Fourth Ed.
, Springer, 2002), which fits
negative-binomial GLMs. The multinomial logit model is fit by the
multinom
function in the
recommended
nnet
package, and ordered logit and probit models by the
polr
function in the MASS package. Also see the
MNP
package for the multinomial probit model, and
multinomRob
for the analysis of overdispersed multinomial data.
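
The glm-based models mentioned above can be sketched as follows, using only the stats package. The built-in data sets (mtcars, warpbreaks) and the formulas are assumptions chosen for illustration; the quasi-Poisson family is used here as one simple way of allowing for overdispersion.

```r
# Illustrative only: data sets and formulas are assumptions.

# Binomial logit and probit models for a binary response.
logit_fit  <- glm(am ~ wt, family = binomial(link = "logit"),  data = mtcars)
probit_fit <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)
print(coef(logit_fit))
print(coef(probit_fit))

# Poisson regression for a count response.
pois_fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# The same model with a quasi-Poisson family, which estimates a
# dispersion parameter instead of fixing it at 1.
qpois_fit <- glm(breaks ~ wool + tension, family = quasipoisson,
                 data = warpbreaks)

# Estimated dispersion; values well above 1 suggest overdispersion.
disp <- summary(qpois_fit)$dispersion
print(disp)
```

The coefficient estimates of the Poisson and quasi-Poisson fits coincide; only the standard errors are rescaled by the estimated dispersion.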
There are other noteworthy facilities for analyzing categorical and count data:

The
table
function in the R-base base package and the
xtabs
and
ftable
functions in the stats package construct contingency tables.

The
chisq.test
and
fisher.test
functions in the stats package may be used to test for independence in two-way contingency tables.

The
loglin
function in the stats package fits hierarchical loglinear models to contingency tables by iterative proportional fitting; the
loglm
function in the MASS package provides a formula-based front end to
loglin.

Also see the
brglm
package for bias reduction in binomial-response GLMs (useful, e.g., in cases of complete separation);
the
exactLoglinTest
package for exact tests of loglinear models; the
clogit
function in the
survival
package for conditional logistic regression; and the
vcd
package for graphical displays of categorical data.

The
gnm
package estimates generalized
nonlinear
models, and can be used, e.g., to fit certain specialized models to mobility tables.
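
A short sketch of the contingency-table facilities listed in this section, using only base R and the built-in UCBAdmissions table (an assumed example data set). It builds tables, tests independence in a two-way table, and fits the loglinear model of mutual independence both as a Poisson GLM and by iterative proportional fitting with loglin.

```r
# Illustrative only: UCBAdmissions is an assumed example data set.

# Collapse the three-way table to a two-way Admit-by-Gender table.
tab2 <- margin.table(UCBAdmissions, c(1, 2))
print(tab2)

# Tests of independence in the two-way table.
print(chisq.test(tab2))
print(fisher.test(tab2))

# Convert to a data frame of counts, then rebuild the table with xtabs()
# and display it in flat form with ftable().
ucb  <- as.data.frame(UCBAdmissions)
tab3 <- xtabs(Freq ~ Admit + Gender + Dept, data = ucb)
print(ftable(tab3))

# Loglinear model of mutual independence, fit as a Poisson GLM.
indep_glm <- glm(Freq ~ Admit + Gender + Dept, family = poisson, data = ucb)

# The same model fit by iterative proportional fitting.
indep_ipf <- loglin(UCBAdmissions, margin = list(1, 2, 3), print = FALSE)

# The two likelihood-ratio statistics should agree.
print(c(glm = deviance(indep_glm), loglin = indep_ipf$lrt))
```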
Other Regression Models:
It is possible to fit a very wide variety of regression models with the facilities provided by the base and recommended packages, and a much wider variety of models with contributed packages:

Nonlinear regression:
The
nls
function in the stats package fits nonlinear models by least squares.

Generalized least-squares regression and time-series regression:
The
gls
function in the
recommended
nlme
package fits models by generalized least squares. The
lm
function can also fit weighted least-squares regressions. Also see the
dynlm
package, which allows
lm
to handle time-series data structures, and the
dyn
package, which extends this
capability to
glm
and other regression functions that are sufficiently similar to
lm
in their internal structure.

Mixed-effects models:
The recommended
nlme
package, associated with Pinheiro and Bates,
Mixed-Effects Models in S and S-PLUS
(Springer, 2000), fits linear and nonlinear mixed-effects models, commonly used in the social sciences for hierarchical and longitudinal data. Generalized linear mixed-effects models may be fit by the
glmmPQL
function in the MASS package, and by the
lmer
function in the
Matrix
package (related to the
lme4
package, which largely supersedes
nlme
for
linear
mixed models). Also see the
lmeSplines
and
lmm
packages.

Generalized estimating equations:
The
gee
and
geepack
packages fit marginal models by generalized estimating equations.

Nonparametric regression analysis:
This is one of the conspicuous strengths of R. A standard
R installation includes several functions for smoothing scatterplots, including
loess.smooth
and
smooth.spline, both in the stats package. The
loess
function in the stats package fits simple and multiple-regression models by local polynomial regression. Generalized additive models are covered by several packages, including the recommended
mgcv
package, and the
gam
package, the latter associated with Hastie and Tibshirani,
Generalized Additive Models
(Chapman and Hall, 1990). Some other noteworthy contributed packages in this area are
gss, which fits spline regressions,
locfit, for local-polynomial regression (and also density estimation) (Loader,
Local Regression and Likelihood,
Springer, 1999),
sm, for a variety of smoothing techniques, including for regression (Bowman and Azzalini,
Applied Smoothing Techniques for Data Analysis,
Oxford, 1997), and
acepack
for ACE (alternating conditional expectations) and AVAS (additivity and variance stabilization) nonparametric transformation of the response and explanatory variables in regression.

Robust regression:
The
rlm
function fits linear models by M-estimation and
lqs
computes bounded-influence estimators; both are in the MASS package. (The
cov.rob
function in the same package computes a robust covariance-matrix estimator.)
Also see the
quantreg
package, which computes linear, nonlinear, and nonparametric
quantile regressions, as well as
lmrob
in
robustbase
and
lmRob
in
robust
for MM-estimation.

Structural-equation models:
The
sem
package fits general (i.e., latent-variable) SEMs by FIML, and structural equations in observed-variable models by 2SLS. Categorical variables in SEMs can be accommodated via the
polycor
package. The
systemfit
package implements a wider variety of estimators for observed-variables models, including nonlinear simultaneous-equations models. See also the
pls
package, for partial least-squares estimation, and the
gR
task view for graphical models.

Selection bias and censored regression:
Censored regression models, such as the tobit model, can be fit by the
survreg
function in the recommended
survival
package. The
rq
function in the
quantreg
package can estimate censored quantile-regression models. The
hurdle
and
zeroinfl
functions in the
pscl
package fit hurdle and zero-inflated Poisson and negative-binomial models to count data. The
heckit
function in the
micEcon
package implements two-step Heckman estimators to correct for sample-selection bias. Also see under
Survival Analysis
below.
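
Three of the base facilities mentioned in this section (nonlinear least squares, scatterplot smoothing, and weighted least squares) can be sketched with the stats package alone. The data below are simulated, and the model form, parameter values, starting values, span, and weights are all assumptions made purely for illustration.

```r
# Illustrative only: simulated data; every name and value is an assumption.
set.seed(123)
x <- seq(1, 10, length.out = 60)
y <- 5 * exp(-0.3 * x) + rnorm(60, sd = 0.1)
dat <- data.frame(x = x, y = y)

# Nonlinear least squares with nls(); start values are rough guesses.
nls_fit <- nls(y ~ a * exp(b * x), data = dat,
               start = list(a = 4, b = -0.2))
print(coef(nls_fit))

# Local polynomial regression with loess().
lo_fit <- loess(y ~ x, data = dat, span = 0.5, degree = 2)

# A smoothing spline, with the smoothing parameter chosen automatically.
ss_fit <- smooth.spline(dat$x, dat$y)

# Weighted least squares with lm(); these weights are arbitrary.
wls_fit <- lm(y ~ x, data = dat, weights = 1 / x)

# Compare fitted values from the parametric and nonparametric fits.
print(head(cbind(nls = fitted(nls_fit), loess = fitted(lo_fit))))
```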
Other Statistical Methods:
Here is a brief survey of implementations in R of other statistical methods commonly used by social scientists:

Survival (Event-History) Analysis:
There is an extensive implementation of methods of survival analysis in the recommended
survival
package, which is associated with Therneau and Grambsch,
Modeling Survival Data
(Springer, 2000). Also see the
eha,
survrec,
frailtypack, and
rms
packages.

"Dimensional" Analysis:
Exploratory maximum-likelihood factor analysis is implemented in the
factanal
function in the stats package, which also provides for varimax and promax factor rotation. (Confirmatory factor-analysis models can be fit with the
sem
package.) Additional rotations are available through functions in the
GPArotation
package. The
prcomp
and
princomp
functions in the stats package perform principal-components analysis. The
cmdscale
function in the stats package performs
metric
multidimensional scaling, while the
isoMDS
and
sammon
functions in the MASS package perform
nonmetric
multidimensional scaling. For methods of cluster analysis and mixtures see the
Cluster
task view. The
BradleyTerry2
package fits the Bradley-Terry model for paired comparisons. The
ltm
package fits Rasch and other item-response models to binary items. The
irr
package contains functions for assessing interrater reliability; also see the
psy
package.

Other Multivariate Statistics:
See the
Multivariate
task view, which includes information on graphs for visualizing multivariate data.

Missing Data:
A variety of packages implement methods for handling missing data by multiple imputation, including the
mix and
pan
packages associated with Schafer,
Analysis of Incomplete Multivariate Data
(Chapman and Hall, 1997), and the
mice
and
mitools
packages (the latter for drawing inferences from multiply imputed data sets). There are also some facilities for missing-data imputation in the general
Hmisc
package, which is described below, under
"Collections"
.

Bootstrapping and Other Resampling Methods:
The recommended package
boot, associated with Davison and Hinkley,
Bootstrap Methods and Their Application
(Cambridge, 1997), has excellent facilities for bootstrapping and some related methods. Also notable is the
bootstrap
package, associated with Efron and Tibshirani,
An Introduction to the Bootstrap
(Chapman and Hall, 1993), which has functions for bootstrapping and jackknifing.

Model Selection:
The
step
function in the stats package and the more broadly applicable
stepAIC
function in the MASS package perform forward, backward, and forward-backward stepwise selection for a variety of statistical models. The
regsubsets
function in the
leaps
package performs all-subsets regression. The
BMA
package performs Bayesian model averaging. Beyond these, see the
MachineLearning
task view.

Social Network Analysis:
There are several packages useful for social network analysis, including
sna
for sociometric analysis of networks (e.g., blockmodeling),
network
for manipulating and displaying network objects, and
latentnet
for latent position and cluster models for networks.

Bayesian Statistical Methods:
Because of its easy programmability, R is a natural environment within which to implement and use Bayesian methods, and there are many packages that provide such methods, including interfaces to external Bayesian software, such as BUGS. For details, see the
Bayesian
task view.

Spatial Statistics:
In addition to the recommended
spatial
package, see the
Spatial
task view for an extensive list of functions and packages for spatial data analysis.

Time-Series Analysis:
Beyond time-series regression (see
generalized least-squares regression,
above), R has very extensive facilities for time-series analysis, both in the standard R distribution and in contributed packages; for details, see the
Econometrics
and
Finance
task views.

Surveys:
The
sampling
package includes functions for selecting survey samples; the
survey
package includes functions for the analysis of data from complex sample surveys, among them functions for fitting linear and generalized linear models.

Meta-Analysis:
See the
meta
and
rmeta
packages.

Propensity Scores and Matching:
See the
Matching
and
MatchIt
packages.
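
Several of the stats-package functions surveyed in this section (prcomp, factanal, cmdscale, and step) can be tried directly on data sets that ship with R. The sketch below is illustrative only: the choice of data sets (USArrests, ability.cov, eurodist, mtcars), the number of factors, and the candidate regressors are assumptions.

```r
# Illustrative only: data sets and settings are assumptions.

# Principal-components analysis on standardized variables.
pca <- prcomp(USArrests, scale. = TRUE)
print(summary(pca))

# Exploratory maximum-likelihood factor analysis from a covariance
# matrix, with two factors and the default varimax rotation.
fa <- factanal(factors = 2, covmat = ability.cov)
print(loadings(fa))

# Metric multidimensional scaling of inter-city road distances.
mds <- cmdscale(eurodist, k = 2)
print(head(mds))

# Backward stepwise selection by AIC, starting from a larger model.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
sel  <- step(full, direction = "backward", trace = 0)
print(formula(sel))
```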
Collections of Functions:
There are some packages that are so heterogeneous that they are difficult to classify, yet contain functions (typically in multiple domains) that are potentially of interest to social scientists:

I have already made several references to the recommended
MASS
package, which is
associated with Venables and Ripley's
Modern Applied Statistics With S
. Other recommended
packages associated with this book are
nnet, for fitting neural networks (but also, as
mentioned, multinomial logistic-regression models);
spatial
for spatial statistics; and
class, which contains functions for classification.

The
Hmisc
and
rms
packages (both mentioned above), associated with Harrell,
Regression Modeling Strategies
(Springer, 2001), provide functions for data manipulation, linear models, logistic-regression models, and survival analysis, many of them "front ends" to or modifications of other facilities in R.

The
Zelig
package integrates a wide array of statistical models of interest to social scientists (see the
Zelig web site
for details).