Author

Farzin Maniei

ORCID Identifier(s)

0000-0002-2071-2043

Graduation Semester and Year

2023

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Civil Engineering

Department

Civil Engineering

First Advisor

Stephen P Mattingly

Abstract

The continuous expansion of highway and freeway networks further exacerbates the risk of traffic crashes and highlights the critical importance of freeway safety management. To improve overall road safety, many organizations acknowledge the need to pinpoint locations experiencing a higher-than-expected number of crashes (known as hotspot identification, HSID), identify the factors contributing to traffic crashes, and determine the most effective preventive measures. Previous studies highlighted two major drawbacks of this approach: (1) HSID based on the total number of crashes can incorrectly identify hazardous areas; and (2) the arbitrary selection of a fragment size (due to the lack of explicit recommendations), used to divide highways and freeways into small segments for data aggregation, may affect which factors correlate with crash rates in predictive models. This study addresses the urgent need to investigate the merits of expanding traffic crash analysis from total crashes to traffic crash subsets, and to provide a standard approach for recommending the fragment size when aggregating crash groups and roadway data based on three crash characteristics (i.e., crash units, manner of collision, and crash severity). The study first performs feature selection with a unique approach, called LSDBEM, that combines the Laplacian score with a distance-based entropy measure. After feature selection, the method applies an unsupervised clustering method, K-means clustering, to capture the pattern of traffic crashes on freeways within Dallas County and to provide a recommended fragment size (RFS) for data aggregation. The investigation applies the LSDBEM/K-means method to fragment sizes ranging from 0.10 mile to 0.25 mile in increments of 0.01 mile.
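
The abstract does not give the LSDBEM formulas, so the sketch below shows only the standard Laplacian score component, under stated assumptions. Each row of `X` is assumed to be one roadway fragment and each column one candidate feature; the k-nearest-neighbour graph, the heat-kernel width (mean squared distance), and the default `k=5` are illustrative choices, not the dissertation's settings. The distance-based entropy measure and the rule for combining it with the Laplacian score are omitted because the abstract does not describe them. Lower scores indicate features that better preserve the local neighbourhood structure of the data.

```python
import numpy as np


def laplacian_score(X, k=5, t=None):
    """Standard Laplacian score for each column of X (rows = samples).

    Sketch only: k-NN graph with heat-kernel weights; k and t are assumptions.
    Score_r = (f~' L f~) / (f~' D f~), with f~ the degree-weighted centred feature.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances between samples.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if t is None:
        # Assumed heat-kernel width: mean pairwise squared distance.
        t = float(d2.mean()) or 1.0
    # k nearest neighbours of each sample, excluding the sample itself.
    off = d2.copy()
    np.fill_diagonal(off, np.inf)
    nbrs = np.argsort(off, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrise the graph
    deg = W.sum(axis=1)
    L = np.diag(deg) - W  # graph Laplacian
    scores = np.full(m, np.inf)
    for r in range(m):
        f = X[:, r]
        f_t = f - (f @ deg) / deg.sum()  # remove the degree-weighted mean
        denom = f_t @ (deg * f_t)
        if denom > 0:
            scores[r] = (f_t @ L @ f_t) / denom
    return scores
```

A feature that is constant within each neighbourhood scores near zero, while a feature that varies randomly among neighbours scores higher, which is why features would be ranked in ascending order of score before clustering.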
To evaluate whether crash features or the total crash rate (TCR) should be used to establish the clustering pattern and the RFS, the study compares the LSDBEM/K-means results for TCR and FCRs. The dissertation assesses the impacts of using higher dimensions of traffic crash characteristics (across four different scenarios) in crash prediction models on model performance, the statistical significance of crash contributing factors, and the identification of crash hotspots. To investigate these higher dimensions, the study estimates a range of count-data regression models, including Poisson, negative binomial (NB), negative binomial type P (NBP), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated negative binomial type P (ZINBP), generalized Poisson type 1 (GP-1), generalized Poisson type 2 (GP-2), and hurdle regression models. The dissertation evaluates the performance and suitability of the RFS across the four scenarios of traffic crash characteristic dimensions. This analysis estimates crash prediction models with fragment sizes ranging from 0.10 mile to 0.25 mile in 0.01-mile increments and compares their performance using the root mean square error (RMSE) on the testing dataset. The investigation determines the circumstances that support adopting the RFS as the standardized approach for data aggregation. The results show that the LSDBEM/K-means clustering method provides a standardized approach to determine a recommended fragment size for data aggregation. The clustering results demonstrate that FCR-based clustering creates more cohesive clusters than TCR-based clustering, which supports the use of three traffic crash dimensions for safety analysis and modeling.
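
The comparison of TCR-based and FCR-based clustering can be illustrated with a minimal K-means sketch. The abstract does not name the cohesion measure, so this sketch assumes the mean silhouette coefficient, which is comparable across feature sets of different dimension; the deterministic farthest-first seeding is also an assumption chosen for reproducibility. In use, `X` would hold one row per fragment, with a single TCR column for TCR-based clustering or several FCR columns for FCR-based clustering; the choice of `k` is not stated in the abstract.

```python
import numpy as np


def farthest_first_init(X, k):
    # Assumed deterministic seeding: start at the first row, then repeatedly
    # pick the row farthest from all centres chosen so far.
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[int(d2.argmax())])
    return np.array(centres)


def kmeans(X, k, max_iter=100):
    """Lloyd's K-means; returns (labels, centres)."""
    X = np.asarray(X, dtype=float)
    centres = farthest_first_init(X, k)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment against the final centres.
    labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centres


def mean_silhouette(X, labels):
    """Mean silhouette coefficient (requires at least two clusters)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters get a silhouette of 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

Under this assumption, "more cohesive" would correspond to a higher mean silhouette for the FCR-based clustering than for the TCR-based clustering at the same fragment size.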
The additional crash dimensions reveal substantially different top ten hotspots for each crash group, especially when compared with hotspots identified using the total number of crashes (scenario 1). The crash prediction models for scenario 4, formed by all three dimensions, provide a better understanding of crash mechanisms, but scenario 4 may not work for all crash groups due to insufficient observations. The investigation of the RFS reveals that it minimizes multicollinearity among the explanatory variables. The evaluation of the testing RMSE shows that the minimum RMSE (RMSE_min) occurs at the RFS for some single-vehicle (SV) and multi-vehicle (MV) related crash groups. Moreover, for crash groups with sufficient non-zero observations, the RMSE at the RFS (RMSE_RFS) remains within 20% of RMSE_min, which makes the RFS an acceptable approach for standardized data aggregation when sufficient non-zero observations exist. Future studies need to confirm the benefit of the RFS for data aggregation, its appropriate use cases, and its impact on crash prediction models by examining other highways and freeways.
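
The fragment-size sweep and the 20% acceptance check can be sketched as follows. This assumes "within 20% of RMSE_min" means RMSE_RFS ≤ 1.2 × RMSE_min on the testing dataset; the model fitting step is not shown, and the function names are hypothetical. Fragment sizes are generated in integer hundredths of a mile so that the 0.10 to 0.25 mile range yields exactly 16 sizes without floating-point drift.

```python
import math


def fragment_sizes(start_hundredths=10, stop_hundredths=25, step_hundredths=1):
    """Fragment sizes in miles: 0.10, 0.11, ..., 0.25 (16 values)."""
    return [n / 100 for n in range(start_hundredths, stop_hundredths + 1, step_hundredths)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted crash counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rfs_within_tolerance(test_rmse_by_size, rfs, tolerance=0.20):
    """True if the testing RMSE at the RFS is within `tolerance` of the minimum.

    Assumed reading of the 20% rule: RMSE_RFS <= (1 + tolerance) * RMSE_min.
    """
    rmse_min = min(test_rmse_by_size.values())
    rmse_rfs = test_rmse_by_size[rfs]
    return rmse_rfs <= (1 + tolerance) * rmse_min
```

For example, with illustrative (not study) testing RMSE values `{0.10: 2.0, 0.15: 1.5, 0.18: 1.7}`, an RFS of 0.18 mile passes the check because 1.7 ≤ 1.2 × 1.5, whereas 0.10 mile would not.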

Keywords

Traffic crashes, Crash units, Manner of collision, Crash severity, Fragment size, Data aggregation, Unsupervised machine learning, K-means clustering, Laplacian score, Crash prediction models, Crash contributing factors, Hotspot identification

Disciplines

Civil and Environmental Engineering | Civil Engineering | Engineering

Comments

Degree granted by The University of Texas at Arlington

Available for download on Sunday, February 01, 2026