Md Ahmed Al Muzaddid, University of Texas at Arlington
William J. Beksi, University of Texas at Arlington


The TexCot22 dataset is a set of cotton crop video sequences for training and testing multi-object tracking methods. Each tracking sequence is 10 to 20 seconds long. The dataset contains a total of 30 sequences, of which 17 are for training and the remaining 13 are for testing. Two of the training sequences consist of roughly 5,000 annotated images, which can be used to train a cotton boll detection model. The video sequences were captured at 4K resolution and at different frame rates (e.g., 10, 15, and 30 fps). There are typically 2 to 10 cotton bolls per cluster. The average annotated bounding box is approximately 230 x 210 pixels (width x height). To cover varying lighting conditions and make the dataset robust to environmental changes, we recorded the field videos at different times of day. In total, there are roughly 30 x 300 (about 9,000) frames with 150,000 labeled instances. On average, there are 70 unique cotton bolls in each sequence.
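The figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the dataset tooling. The constant names and the `summarize` function are hypothetical, and the 3840 x 2160 frame size is an assumption about what "4K" means here, since the description does not give exact pixel dimensions.

```python
# Approximate figures taken from the dataset description.
NUM_SEQUENCES = 30
NUM_TRAIN = 17
NUM_TEST = 13
FRAMES_PER_SEQUENCE = 300        # approximate average
LABELED_INSTANCES = 150_000      # approximate total
AVG_BOX_W, AVG_BOX_H = 230, 210  # average bounding box, in pixels
DURATION_S = (10, 20)            # shortest and longest sequence, in seconds
FRAME_RATES = (10, 15, 30)       # example frame rates, in frames per second

# Assumption: "4K" is taken to mean UHD (3840 x 2160 pixels).
FRAME_W, FRAME_H = 3840, 2160


def summarize():
    """Derive a few summary statistics from the quoted figures."""
    total_frames = NUM_SEQUENCES * FRAMES_PER_SEQUENCE
    return {
        "total_frames": total_frames,
        # Mean number of labeled boxes per frame.
        "instances_per_frame": LABELED_INSTANCES / total_frames,
        # Share of the frame covered by one average box.
        "box_area_fraction": (AVG_BOX_W * AVG_BOX_H) / (FRAME_W * FRAME_H),
        # Frame counts implied by the duration and frame-rate ranges.
        "min_frames_per_sequence": DURATION_S[0] * min(FRAME_RATES),
        "max_frames_per_sequence": DURATION_S[1] * max(FRAME_RATES),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions, the totals work out to about 9,000 frames and about 17 labeled instances per frame, and an average bounding box covers well under 1% of the frame. The quoted durations and frame rates imply between 100 and 600 frames per sequence, which is consistent with the average of roughly 300.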