This repository contains the data and Python code (in a Jupyter notebook) for an assessment project that aims to identify measures of high school graduates’ preparedness for college.
Researchers have a great interest in enhancing college readiness for all youth and increasing the rate of postsecondary enrollment among graduates. The community is equally passionate about improving its students’ postsecondary success. For many of these students, college is not perceived as a viable option. Researchers posit that the district can improve college readiness among all district students by first identifying high schools to serve as “models of excellence” and then learning from these exemplars about best practices for producing “college-ready” students who enroll and persist in postsecondary education. Therefore, the available data was examined to recommend a model school.
Principal component analysis (a dimension reduction technique) was used to select viable criteria, and the model schools were chosen on the basis of those criteria. The most distinguishing factors for scoring appear to be:
- The standardized grades (mathematics, SAT assessments)
- College enrollment. The two metrics that seem most relevant are enrollment in a 4-year degree program (i.e., a strong education) and enrollment shortly after high school (an indicator of passion for education).
The dataset had data quality problems, however: it contained both missing values and duplicated entries. A strategy for handling them was worked out and implemented. Directions for further research (disaggregating the data by sub-region) were suggested.
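The cleaning and dimension reduction steps described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the school IDs, the four metric columns, and the numbers are made up, and the choices of keeping the first row per school and imputing missing values with column means are assumptions, since the handling strategy is not spelled out here. PCA is computed with a plain NumPy SVD.

```python
import numpy as np

def pca_on_school_metrics(school_ids, X, n_components=2):
    """Drop duplicated schools, impute missing values, standardize, run PCA."""
    X = np.asarray(X, dtype=float)
    # Keep the first row for each school ID (assumed de-duplication rule).
    _, first_idx = np.unique(np.asarray(school_ids), return_index=True)
    keep = np.sort(first_idx)
    X = X[keep]
    # Replace missing values with the column mean (assumed imputation rule).
    col_means = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_means, X)
    # Standardize so that each metric contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via SVD: rows of Vt are the principal directions (loadings).
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return keep, scores, Vt[:n_components], explained[:n_components]

# Hypothetical columns: math score, SAT score, 4-year enrollment rate,
# rate of enrollment shortly after high school. All values are invented.
school_ids = ["A", "B", "B", "C", "D", "E"]
metrics = [
    [0.62, 1050, 0.40, 0.55],
    [0.48, 980, 0.31, 0.47],
    [0.48, 980, 0.31, 0.47],      # duplicated entry for school B
    [0.75, np.nan, 0.52, 0.66],   # missing SAT value
    [0.39, 900, 0.22, 0.35],
    [0.81, 1180, 0.60, 0.71],
]

keep, scores, loadings, explained = pca_on_school_metrics(school_ids, metrics)
print("rows kept:", keep)
print("explained variance ratio:", np.round(explained, 3))
print("loadings of the first component:", np.round(loadings[0], 3))
```

Metrics with large absolute loadings on the leading components are the ones that separate schools most strongly, which is the sense in which PCA can suggest scoring criteria.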
This dashboard comprises near real-time interactive information on fatal and serious injury collisions on Tennessee roadways for the current and previous years.
The dashboard allows for a nuanced analysis of fatal and serious crashes through interactive filters and graphs powered by a SQL database and Tableau. Users can analyze trends and patterns by location, road conditions, time of day, victim demographics, and other parameters. The dashboard provides actionable insights to inform traffic safety policies, enforcement initiatives, infrastructure improvements, public education campaigns, and other countermeasures aimed at reducing crash-related deaths and injuries on Tennessee roads.
This dashboard was presented at the 2019 LifeSavers Conference, an annual injury prevention and traffic safety conference organized by the National Safety Council.
This repository comprises R programming scripts and data for an evaluation project with the objective of identifying schools with exceptional performance. Historically, the Tennessee Department of Education recognized the top 10 percent of schools in the state as Reward schools. Reward Performance schools constitute the top 5 percent of schools in terms of achievement as quantified by a one-year success rate. Reward Progress schools comprise the top 5 percent of schools regarding growth as gauged by the Tennessee Value-Added Assessment System.
Utilizing the provided data, Reward Performance and Reward Progress schools among K-8 institutions were identified via statistical analysis. This allows recognition of high-performing schools based on rigorous quantitative metrics of student outcomes and growth. The R-based analysis enables reproducible identification of Reward schools. Further examination of pedagogical and administrative practices at these exceptional schools could illuminate drivers of student success.
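The selection rule described above (top 5 percent by achievement, top 5 percent by growth) can be sketched as a quantile cutoff. The project itself uses R; the version below is a Python illustration only. The school names, the success rates, and the growth index are synthetic placeholders, and the function name `top_share` is hypothetical.

```python
import numpy as np

def top_share(names, values, share=0.05):
    """Return the names whose value is at or above the (1 - share) quantile."""
    values = np.asarray(values, dtype=float)
    cutoff = np.quantile(values, 1.0 - share)
    return [name for name, v in zip(names, values) if v >= cutoff]

# Synthetic data for 40 hypothetical K-8 schools.
names = [f"School {i:02d}" for i in range(40)]
success_rate = np.linspace(20, 59, 40)    # placeholder one-year success rates
growth_index = (np.arange(40) * 7) % 40   # placeholder growth measure

reward_performance = top_share(names, success_rate)
reward_progress = top_share(names, growth_index)

print("Reward Performance:", reward_performance)
print("Reward Progress:", reward_progress)
```

A school can appear in both lists, since achievement and growth are ranked independently.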
These two dashboards present comparative year-to-date and historical statistics on road fatalities in the state, divided into vehicle-related and driver/passenger-related components.
The dashboards use a SQL database and the Tableau analytics platform to enable interactive analysis of fatal crash trends over time. The visualizations combine fatality data for the current year with historical yearly totals going back more than a decade. Segmenting fatalities into vehicle-related vs. driver/passenger-related factors allows for a nuanced epidemiological understanding of crash mortality patterns.
The comparative dashboards employ data visualization principles to intuitively showcase fatality trends and disseminate actionable road safety insights to diverse stakeholders. Further analysis could relate the dashboard metrics to various interventions such as seatbelt campaigns, drunk driving crackdowns, improved road design, vehicle safety regulations, and other evidence-based policies aimed at reducing traffic-related mortality.
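A year-to-date comparison of the kind these dashboards display can be sketched as a single SQL aggregation. The example below is only an illustration: it uses an in-memory SQLite database from Python's standard library rather than the project's actual database, and the table name `fatalities`, its `crash_date` column, and the rows are invented, not real crash records.

```python
import sqlite3

# In-memory database with a hypothetical one-row-per-fatality table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fatalities (crash_date TEXT)")
conn.executemany(
    "INSERT INTO fatalities VALUES (?)",
    [
        ("2018-02-10",),
        ("2018-03-05",),
        ("2018-11-20",),
        ("2019-01-15",),
        ("2019-03-01",),
        ("2019-03-28",),
    ],
)

def ytd_counts(conn, as_of_month_day):
    """Count fatalities per year up to and including the given MM-DD."""
    query = """
        SELECT strftime('%Y', crash_date) AS year, COUNT(*) AS n
        FROM fatalities
        WHERE strftime('%m-%d', crash_date) <= ?
        GROUP BY year
        ORDER BY year
    """
    return dict(conn.execute(query, (as_of_month_day,)).fetchall())

counts = ytd_counts(conn, "03-31")
print(counts)
```

Comparing every year over the same calendar window keeps the current, incomplete year on equal footing with earlier years.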
The objective of this project was to identify patterns in land use changes in Tennessee and possible explanations for them. It comprised two sub-projects:
- Phase I (powered by R & Shiny App): Visualizing land use alterations utilizing an interactive dashboard.
- Phase II (powered by Python & SciKit): Identifying determinants contributing to land use changes via dimension reduction techniques.
A slideshow reviewing the methodology and results is included with the Phase II files.
The visualizations depicted various correlations between land availability and valuation across counties. Analyses suggested the predominant criteria influencing land value are proximity to major metropolitan areas and agricultural profitability.
In summary, this project leveraged statistical programming languages and multivariate analysis to glean scientific insights into drivers of land use trends. The interactive dashboards and dimension reduction models provide data-driven understanding of how exurban dynamics shape land use patterns over time. Further research could relate the identified factors to policies on zoning, land conservation, transportation infrastructure and urban development.
This Jupyter notebook was developed to demonstrate the concept of logistic regression and how to implement the technique in Python. The code fits a logistic regression model, prints the model summary, exports and prints the coefficients, calculates predicted probabilities, and plots the fitted model along with the original data. The model describes a binary response variable as a function of a single explanatory variable. The visualization helps show how well the logistic regression model fits the data.
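The core of that workflow can be sketched as below. This is not the notebook's code: to stay dependency-light it fits the model with a hand-written Newton-Raphson loop in NumPy instead of a statistics library, it omits the model summary and the plot, and the data (hours studied vs. pass/fail) are invented for illustration.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    X = np.column_stack([np.ones(len(x)), x])   # intercept column plus x
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # current predicted probabilities
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)                # gradient of the log-likelihood
        hessian = X.T @ (X * weights[:, None])  # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def predict_proba(beta, x):
    """Predicted probability that the response equals 1 for each x."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * np.asarray(x, dtype=float))))

# Invented example: explanatory variable x, binary response y.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)

beta = fit_logistic(x, y)
probs = predict_proba(beta, x)
print("intercept and slope:", np.round(beta, 3))
print("predicted probabilities:", np.round(probs, 3))
```

Plotting `probs` against `x` together with the observed 0/1 responses gives the S-shaped curve that the notebook's visualization refers to. Note that this loop assumes the classes overlap; on perfectly separable data the coefficients would diverge.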