Research Matters I, August 2016

Statistical Resources Update to Version 5

By Jon Starkweather, Ph.D., consultant, Research IT Services

This month’s article originally appeared in November of 2011, but it is periodically necessary to update it with more current resources. The original article was motivated by a workshop for graduate students and contains much of the same content presented in that workshop: Statistical Resources. The following materials are, for the most part, freely available through the World Wide Web. The resources mentioned below generally fall into three categories: the resources we at R&SS maintain, the resources available to UNT community members, and the resources available to the general public with access to the web.

NOTE: This article is particularly important this month, because UIT is migrating all webpages to Drupal 7 – all of our URLs are changing.

Research Computing Services Resources

The main Research Computing Services website offers several resources, both specific resources aimed at particular software and more general resources, e.g., Exam and Research Data Services. One of the key resources available to members of the UNT community is the opportunity to set up a consulting appointment with a Data Science and Analytics consultant. The link to contact DSA staff for consultation is prominently displayed on each of the pages associated with DSA. The link guides clients to a web interface, known as the Front Range system, which forwards the service request to DSA staff, who then contact the requestor directly – generally through email. Please read the frequently asked questions prior to submitting a Front Range request. It is also important to note that DSA staff maintains a rather extensive collection of digital and paper copies of articles, book chapters, and whole books. DSA staff members often lend copies of these – in whole or part – to clients so that clients can research various analytic or methodological concepts to their own satisfaction (and often to the satisfaction of their colleagues, advisors, or committees).

A second frequently used resource DSA offers consists of the instructional services for DSA-supported software. These were initially short courses offered in a classroom twice per semester; however, they have been migrated to an online format so that they may reach a wider audience and allow self-paced learning. These pages were designed to show how a particular software package can be used (e.g., R, SPSS, SAS); they are not designed to teach statistics or how to interpret statistics (although some interpretation is offered among the many pages). In fact, some of the software supported by DSA is not directly related to statistics (e.g., survey technology such as Zope and QSurvey). On each of the R, SPSS, and SAS short course pages you will also find links to resources specific to those software packages, from user manuals provided by the software producer (e.g., SPSS Manuals, CRAN Task Views) to other users’ guides or websites (e.g., Quick-R, STAT-L). There is even an R-specific search engine available called RSeek.

Another resource DSA offers is displayed right here: the contributions by DSA staff to this publication, Benchmarks Online, in the Research Matters column and its archives. Each article in the Research Matters column is linked to the previous article, and an index of Research Matters articles is maintained on the DSA website. The index is quite handy for finding particular topics (e.g., canonical correlation), rather than clicking back through the years of articles available through the column links.

R&SS offers a service for instructors at UNT through which we can provide a randomly sampled data set from a fictional population named Examplonia. Examplonia is a fictional country which provides a meaningful context for statistical analysis examples. The population data for Examplonia were generated to provide a statistical population from which random samples could be drawn for the completion of example statistical analysis problems. The current version of the Examplonia population contains a variety of univariate, bivariate, and multivariate effects, including random effects based on hierarchical structure. If you are an instructor for a statistics course, you may be interested in obtaining some simulated data for your class (e.g., data for in-class demonstrations or homework assignments). Learn more about the population by visiting the Examplonia webpage.
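For readers curious about the general idea, here is a minimal sketch of simulating a hierarchical population and drawing a random sample from it. It is not the Examplonia generator; the function names (build_population, draw_sample), the region structure, and all of the numeric settings are hypothetical choices made only for illustration, and the example uses Python's standard library rather than any DSA-supported package.

```python
import random
import statistics


def build_population(n_regions=5, people_per_region=2000, seed=2016):
    """Simulate a hierarchical population: people nested within regions.

    All names and numeric settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    population = []
    for region in range(n_regions):
        # Random effect: every person in a region shares that region's offset.
        region_offset = rng.gauss(0, 5)
        for _ in range(people_per_region):
            # Individual score = grand mean + region effect + individual noise.
            score = 50 + region_offset + rng.gauss(0, 10)
            population.append({"region": region, "score": score})
    return population


def draw_sample(population, n, seed=None):
    """Draw a simple random sample of n people without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


population = build_population()
sample = draw_sample(population, n=100, seed=1)

# Compare the population mean with the mean of one random sample.
print("population size:", len(population))
print("sample size:", len(sample))
print("population mean:", round(statistics.fmean(p["score"] for p in population), 2))
print("sample mean:", round(statistics.fmean(p["score"] for p in sample), 2))
```

Fixing the seed makes a sample reproducible, which can be convenient when every student in a class should receive the same data set; leaving the seed unset gives a different sample on each draw.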

R&SS also implemented some new services last year, all of which are focused on making software available to researchers through a web browser and relieving them of the need to download and install software. This means that Sage Mathematics and RStudio, along with the other services, can be accessed through a web browser. Sage Mathematics is mathematical computing software which can integrate the use of R. A brief introduction can be found at the Sage link above. RStudio is an integrated development environment for running the R statistical package. A brief introduction can be found here. Another new service is Tiki Wiki, an open source, freely available content management system (CMS). More information can be found here. These servers/services are available to faculty and advanced graduate students; however, those interested need to submit a request for an access account for each service. Once a user has set up an account, they can simply visit the servers using their preferred web browser and conduct analyses without having to install the software on their local machines.

Online Statistical Textbooks

The Rice Virtual Lab in Statistics is a valuable site for anyone interested in learning or teaching some of the basics of traditional, i.e., frequentist, statistics. The site offers several animations for understanding concepts which are often difficult for newcomers to statistics, e.g., sampling distribution characteristics and the Central Limit Theorem. The Rice University Virtual Lab in Statistics also offers an online – free; no registration required – introductory statistics textbook. The textbook is called HyperStat and contains chapters which cover the usual contents, such as describing univariate and bivariate data, elementary probability, the normal distribution, point estimation, interval estimation, null hypothesis testing, statistical power, t-tests, Analysis of Variance (ANOVA), prediction, chi-square, non-parametric tests, and effect size estimates.
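The sampling distribution ideas mentioned above can also be explored with a few lines of code. The following is a minimal sketch, not taken from the Rice site: it repeatedly draws samples from a skewed (exponential) population whose mean and standard deviation are both 1, then summarizes the resulting sample means. The function name sample_means, the choice of distribution, the number of replications, and the sample sizes are all assumptions made for illustration.

```python
import math
import random
import statistics


def sample_means(n, replications=5000, seed=42):
    """Return the means of many random samples of size n.

    Each sample is drawn from an exponential distribution with rate 1,
    a right-skewed population with mean 1 and standard deviation 1.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


for n in (1, 5, 25, 100):
    means = sample_means(n)
    observed_center = statistics.fmean(means)
    observed_spread = statistics.stdev(means)
    # Theory: the standard error of the mean is sigma / sqrt(n), with sigma = 1 here.
    expected_spread = 1.0 / math.sqrt(n)
    print(f"n={n:>3}  mean of means={observed_center:.3f}  "
          f"sd of means={observed_spread:.3f}  sigma/sqrt(n)={expected_spread:.3f}")
```

As the sample size grows, the standard deviation of the sample means should track sigma/sqrt(n), and a histogram of the means (not drawn here) would look increasingly normal even though the population itself is skewed.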

Another online repository of statistical resources is the site maintained by Michael Friendly at York University. The site offers links to resources for a variety of software, tutorials for specific analyses, and sections of links for statistical societies, associations, and academic departments, as well as links to more general computing resources (e.g., using Unix). A similar site listing various statistical resources on the web is maintained by Clay Helberg.

Statsoft, the company behind the statistical software Statistica, also offers web surfers a textbook covering a variety of statistical topics. The Statsoft site covers topics ranging from elementary concepts, basic statistics, and ANOVA/MANOVA to multivariate topics such as principal components and factor analysis, multidimensional scaling, and structural equation modeling. Unlike Statnotes, the Statsoft site does not offer software output or interpretation (although graphs and tables are often used). However, one handy feature of the Statsoft site is the interactive glossary; each hyperlinked word sends the user to the definition/entry for that word in the glossary. The Statsoft textbook is also available in printed form for $80 plus shipping.

Miscellaneous Other Resources

Another resource option for members of the UNT community, which is often overlooked, is the UNT library system. The library’s general catalog contains a monumental collection of resources, from textbooks being used in current courses to books which focus on the statistical analyses used in particular fields and authoritative books devoted to specific types of analysis (e.g., searching “logistic regression” yielded 66 results). Furthermore, the electronic resources offer access to thousands of periodicals (i.e., journals) from a variety of databases (e.g., EBSCOHost, Medline, ERIC, LexisNexis, and JSTOR). One of the databases most frequently used by R&SS staff is JSTOR, which contains many of the most prominent methodological and statistical journals – with almost all articles available (through the UNT portal) in full text (i.e., Adobe PDF format). Another commonly used resource is the Journal of Statistical Software, which contains articles on a variety of statistical computing applications/software, as well as articles covering statistical methods. One more often-consulted resource is the “little green books,” which are actually a series published by Sage. The Quantitative Applications in the Social Sciences series is a collection of thin, softcover books, each dealing with a specific research or statistical topic. There are approximately 170 books in the series, and a typical researcher would be hard pressed not to find something of value among them. The UNT library carries approximately 145 of the series’ editions, and the R&SS staff has collected most of the series as well. Of course, there are more general resources, such as Google, Scholarpedia, Wikipedia, and even YouTube, all of which can be useful.

Until next time, remember, GIYF – Google is your friend.