What is Data SGP?
Data SGP is an analysis tool that runs in the R statistical software environment. It is designed for educational assessment data, but it can be applied to other kinds of statistical data. To use it, you need a computer running Windows, macOS, or Linux with R installed; R is free software, and further details on system requirements are available on the Data SGP web page. The tool relies on advanced statistical functions. It is intended to be user-friendly, but it does require some familiarity with basic statistical concepts and with the basics of programming in R. Numerous online resources are available to help newcomers get started.
Although it relies on sophisticated statistical techniques, Data SGP is not really “big data”. Compared with the massive data sets analyzed in scientific applications, or the volumes of data processed by companies such as Facebook, it is quite small. Its primary purpose is to assemble metadata for academic research and to provide easy-to-use tools for exploring that data and extracting meaningful information, so it does not need to operate at that scale.
One of the main features that distinguishes Data SGP from competing tools is that it allows students to be compared across multiple assessments. This is especially valuable when evaluating how well education programs meet official state achievement targets, because it ensures that growth across the full range of assessment dates is taken into account.
It also makes it possible to identify students who are making good progress toward their targets even if they performed at the top of their grade level in some years but not in others. While this should not replace traditional benchmarking, it helps schools communicate to stakeholders that proficiency must be reached within a specified time frame, and it can serve as a powerful motivational tool for students.
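The idea of comparing growth rather than raw scores can be illustrated with a deliberately simplified sketch. It ranks each student's current score only against peers who had a similar prior score. This is not the method Data SGP uses internally (the SGP methodology is based on quantile regression); the function name, the `band_width` parameter, and the sample scores are all hypothetical and exist only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def simple_growth_percentiles(records, band_width=10):
    """Rank each student's current score among peers with a similar prior score.

    records: list of (student_id, prior_score, current_score) tuples.
    band_width: hypothetical width of the prior-score bands that define peers.
    Returns {student_id: percentile}, with percentiles on a 0..100 scale.
    """
    # Group students into peer bands according to their prior score.
    bands = defaultdict(list)
    for student_id, prior, current in records:
        bands[prior // band_width].append((student_id, current))

    result = {}
    for peers in bands.values():
        scores = sorted(current for _, current in peers)
        n = len(scores)
        for student_id, current in peers:
            # Mid-rank percentile: peers scoring below, plus half of the ties.
            below = bisect_left(scores, current)
            ties = bisect_right(scores, current) - below
            result[student_id] = round(100 * (below + 0.5 * ties) / n)
    return result


# Made-up scores: s1-s3 started low, s4-s5 started high.
records = [
    ("s1", 402, 430),
    ("s2", 405, 455),
    ("s3", 408, 441),
    ("s4", 651, 655),
    ("s5", 655, 649),
]
growth = simple_growth_percentiles(records)
```

In this made-up sample, s5 has a much higher current score than s2, yet s2 ends up with the higher growth percentile because s2 outperformed students who started from a similar place. That is the distinction between growth and raw achievement that the comparison across assessments is meant to capture.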
The main data set used by Data SGP is sgpData. It contains the records of students who took a statewide test over five consecutive annual administrations, with one row per student. The first column, ID, is the unique student identifier. The next block of columns, one per assessment year (GRADE_2013, GRADE_2014, GRADE_2015, GRADE_2016, and so on; the exact years depend on the installed version of the data), gives the grade level in which each student was tested that year. A matching block of SS_ columns holds the corresponding scale scores. Teacher information is anonymized and is provided as a separate lookup table, sgpData_INSTRUCTOR_NUMBER, which links students to instructors for each of the content areas tested.
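To make the layout concrete, here is a small sketch of what rows in this one-row-per-student shape look like and how one student's history can be read from them. The CSV text is invented sample data, not an extract of sgpData. The `SS_` scale score columns that pair with each `GRADE_` column are an assumption about the layout, and the helper names `load_wide` and `trajectory` are hypothetical.

```python
import csv
import io

# Invented sample in the one-row-per-student layout: ID, then a GRADE_<year>
# column and an assumed SS_<year> (scale score) column for each year.
WIDE_CSV = """ID,GRADE_2013,GRADE_2014,SS_2013,SS_2014
1001,3,4,412,447
1002,4,5,503,
1003,,3,,398
"""


def load_wide(text):
    """Parse the rows, turning blank cells into None and numbers into int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"ID": raw["ID"]}
        for name, value in raw.items():
            if name != "ID":
                row[name] = int(value) if value else None
        rows.append(row)
    return rows


def trajectory(row):
    """Return [(year, grade, scale_score)] for the years with a recorded score."""
    years = sorted(name.split("_", 1)[1] for name in row if name.startswith("SS_"))
    return [
        (int(year), row[f"GRADE_{year}"], row[f"SS_{year}"])
        for year in years
        if row[f"SS_{year}"] is not None
    ]


students = load_wide(WIDE_CSV)
first_student_history = trajectory(students[0])
```

Blank cells are kept as `None` rather than dropped, since a student who was not tested in a given year still occupies a full row in this layout.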
Most of the analyses supported by Data SGP go through a single function, studentGrowthPercentiles, which takes the data for a cohort of students and returns a growth percentile for each student. The function can be passed the character string ‘ALL_DATA’ to request that the original data be returned for inspection; this option is recommended. The lower-level functions that do the actual calculations, such as studentGrowthPercentiles, require WIDE formatted data, whereas the higher-level wrapper functions work with LONG formatted data. More detailed documentation on the WIDE data format can be found in the sgpData Analysis Vignette.
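The difference between WIDE data (one row per student, with a set of columns per year) and the alternative LONG layout (one row per student per year) is easiest to see by converting between them. The following is a minimal, language-neutral sketch rather than anything from the SGP package itself. The function names are hypothetical, and the LONG field names `YEAR`, `GRADE`, and `SCALE_SCORE` are chosen for illustration.

```python
def wide_to_long(wide_rows):
    """Turn one-row-per-student records into one-row-per-student-per-year records."""
    long_rows = []
    for row in wide_rows:
        years = sorted(key[len("SS_"):] for key in row if key.startswith("SS_"))
        for year in years:
            score = row.get(f"SS_{year}")
            if score is None:
                continue  # the student has no test record for this year
            long_rows.append({
                "ID": row["ID"],
                "YEAR": int(year),
                "GRADE": row.get(f"GRADE_{year}"),
                "SCALE_SCORE": score,
            })
    return long_rows


def long_to_wide(long_rows):
    """Collapse per-year records back into a single row per student."""
    wide = {}
    for rec in long_rows:
        row = wide.setdefault(rec["ID"], {"ID": rec["ID"]})
        row[f"GRADE_{rec['YEAR']}"] = rec["GRADE"]
        row[f"SS_{rec['YEAR']}"] = rec["SCALE_SCORE"]
    return list(wide.values())


# Invented sample rows; student 1002 has no record for 2014.
wide_rows = [
    {"ID": "1001", "GRADE_2013": 3, "SS_2013": 412, "GRADE_2014": 4, "SS_2014": 447},
    {"ID": "1002", "GRADE_2013": 4, "SS_2013": 503, "GRADE_2014": None, "SS_2014": None},
]
long_rows = wide_to_long(wide_rows)
round_trip = long_to_wide(long_rows)
```

Note that the round trip is not lossless in this sketch: years with no score are skipped when going to LONG, so the empty 2014 columns for student 1002 do not reappear when converting back.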