Explore the submissions to the 2024 Datathon Challenges below! Refer to the instructions for each type of challenge.

Timed Challenges

1. Claim Check-worthiness Annotation
You will be shown sentences from the United States presidential debates. Given each sentence, your task is to determine whether the general public will be interested in knowing whether (part of) the sentence is true or false. More specifically, you are asked to discern whether the sentence contains an important factual claim, an unimportant factual claim or no factual claim.

2. Claim Matching Annotation
You will be shown pairs of factual claims. In each pair, the claim on the left was fact-checked by PolitiFact between September 1, 2007 and October 21, 2020, and the claim on the right was made by a candidate during the 2012, 2016, or 2020 U.S. primary and general election debates. Besides the claims, we also provide contextual information, such as the claimant, PolitiFact's ruling and analysis of the claim, and the dialog before and after the claim in the debate. Given such a pair of claims, your task is to decide to what extent the PolitiFact fact-check can help vet the claim from the debate.

3. Truthfulness Stance Annotation
You will be shown tweet-claim pairs. In each pair, the claim on the left was fact-checked by PolitiFact, and the tweet on the right is displayed as it would appear on Twitter. Your task is to decide the truthfulness stance of the tweet towards the factual claim, i.e., whether the tweet expresses the belief that the factual claim is true or false.

4. Demographic Analysis of VR Usage
Investigate how different demographics (age, gender) interact with VR technology. Using the provided dataset, perform cluster analysis to identify distinct groups of users based on their reactions (motion sickness, immersion levels) and usage patterns (duration, choice of VR headset). Use association rule mining to discover interesting relationships between different variables in the dataset. For instance, is there a strong association between certain VR headsets and high levels of immersion or motion sickness?
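
As a rough illustration of the association rule question above, the sketch below computes support, confidence, and lift for a single rule in plain Python. The records, item names (headset=A, immersion=high), and helper functions are hypothetical and only stand in for whatever columns the provided dataset actually contains.

```python
# Hypothetical toy records; the real dataset's columns and values may differ.
sessions = [
    {"headset=A", "immersion=high"},
    {"headset=A", "immersion=high"},
    {"headset=A", "immersion=low"},
    {"headset=B", "immersion=low"},
    {"headset=B", "immersion=high"},
    {"headset=B", "immersion=low"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= record for record in records) / len(records)


def rule_metrics(antecedent, consequent, records):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_antecedent = support(antecedent, records)
    s_consequent = support(consequent, records)
    if s_antecedent == 0 or s_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_both = support(antecedent | consequent, records)
    confidence = s_both / s_antecedent
    lift = confidence / s_consequent
    return s_both, confidence, lift


rule_support, rule_confidence, rule_lift = rule_metrics(
    {"headset=A"}, {"immersion=high"}, sessions
)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

A lift above 1 suggests the headset and the immersion level co-occur more often than they would if independent; a full analysis would enumerate many candidate rules and filter them by minimum support and confidence.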

5. Urinalysis Test Results
Identify which features (test parameters) are the most predictive of the diagnosis outcome. Address the challenge of predictive modeling with potentially imbalanced classes in the diagnosis outcome. Conduct a thorough EDA on the urinalysis dataset to understand the distribution of various parameters like color, transparency, glucose levels, and their relationship with the diagnosis outcome. Investigate how different demographics (age, gender) correlate with specific urinalysis outcomes. For instance, which demographic groups are more likely to show negative or positive diagnoses? Are certain age groups or genders more associated with specific protein levels or glucose presence in their urinalysis results?
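
One common way to address imbalanced classes is to weight each class inversely to its frequency. The sketch below is a minimal plain-Python version of that idea, using the same formula as scikit-learn's "balanced" heuristic (n_samples / (n_classes * class_count)). The label values and the 8-to-2 split are made up for illustration and are not taken from the urinalysis dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical diagnosis labels with an 8:2 imbalance.
labels = ["NEGATIVE"] * 8 + ["POSITIVE"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class receives the larger weight, so misclassifying it costs more during training. Resampling and threshold tuning are alternative approaches that may suit some models better.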

6. Viz-a-Verse: A Tyler Sheridan Story
Inspired by the Tableau Data + Movie Challenges' Into the Spider-Verse Data Story created by Xinran Peng for Iron Viz 2024. Create a dynamic data story that combines multiple data visualization charts and infographics to explain the life of Tyler Sheridan and his roles in various movies, focusing especially on Ready Player One. You may use any visualization software, and you should collect your own dataset. Describe the dataset source, why you chose that dataset, and the insights you need your research questions to answer. Write a brief explanation of your visualization.

7. Innovative Data Intelligence Research Lab Data Visualization Challenges
Use the provided Bills and Voting Database (voting_records_visualization_challenge.zip) with SQLite3 or Python. Visualizations can include infographics, charts, maps, diagrams, and other visual aids designed to present information clearly and in engaging and insightful ways. You can visualize the given dataset as is. You can also process, transform, and analyze the dataset, and integrate it with other publicly available datasets. Your visualization can present the results of such processing, transformation, analysis, and integration. You choose how to present your visualization.
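
For readers unfamiliar with querying SQLite from Python, the sketch below shows the general pattern with the standard sqlite3 module. The votes table, its columns, and the sample rows are invented for illustration; the actual schema inside the provided archive is not described here, so listing the tables first (as the sketch does) is a sensible starting point.

```python
import sqlite3

# An in-memory database stands in for the provided file; with the real data
# you would pass the path to the extracted database file instead.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, not the actual challenge schema.
conn.execute("CREATE TABLE votes (bill_id TEXT, legislator TEXT, vote TEXT)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [
        ("HB1", "Adams", "yea"),
        ("HB1", "Baker", "nay"),
        ("HB1", "Clark", "yea"),
        ("HB2", "Adams", "nay"),
        ("HB2", "Baker", "yea"),
    ],
)

# Discover which tables exist before writing any analysis queries.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Aggregate yea votes per bill; the result is ready to feed a bar chart.
tally = conn.execute(
    "SELECT bill_id, SUM(vote = 'yea') AS yeas, COUNT(*) AS total "
    "FROM votes GROUP BY bill_id ORDER BY bill_id"
).fetchall()

for bill_id, yeas, total in tally:
    print(bill_id, yeas, total)

conn.close()
```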

Model Challenges

1. Innovative Data Intelligence Research Lab Machine Learning Challenges
Create a machine learning model that, given an input sentence, determines whether it is a factual statement that is worth fact-checking, i.e., whether the general public would be interested in knowing whether it is true or not. You can test your model by submitting to Kaggle. Use its leaderboard to understand the performance of your model so that you can improve it. The final submission must be made through Devpost.

2. Mayday Maestros
In a 2024 report by the National Oceanic and Atmospheric Administration (NOAA), 2023 was identified as a historic year in terms of how many billion-dollar natural catastrophes occurred within the United States alone. Twenty-eight separate climate-related natural catastrophes (frigid blizzards, blazing wildfires, scorching heat waves, calamitous floods, disastrous winds/tornadoes, fatal tropical storms/hurricanes, etc.) occurred, each causing damage to communities estimated at well over 1 billion dollars. 2023 was also an extremely deadly year, costing 492 lives, which makes it the 8th deadliest year for natural disasters since 1980. With climate change fueling such a surge in disaster costs, effective disaster responses need to be formulated and implemented to reduce the losses of life and property that will arise as the global climate crisis unfolds. Your goal as a disaster data professional is to formulate a data-driven optimal disaster response protocol that includes resource allocation such as medical supplies and clean water, maps out evacuation routes for at-risk communities, and creates an integrated communication plan for first responders and threatened communities. If you accept this vital mission, you will investigate the provided datasets to generate predictive models (machine learning models) and mitigate the impact of future disasters, ultimately enhancing community resilience and response capabilities.

3. Vision Quest Challenge
The objective of this challenge is to develop an image classifier for a refined set of categories. Participants will work with RGB images of objects across 256 diverse categories, from everyday objects to more unusual items. The dataset features a mix of conventional classification challenges alongside few-shot learning scenarios, where a limited number of images are provided for certain categories. This adjustment aims to push the boundaries of current image classification techniques and encourage innovative solutions.
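
One simple baseline for the few-shot categories is nearest-prototype classification: average the feature vectors of the few labeled examples in each category, then assign a query to the closest average. The sketch below assumes images have already been turned into feature vectors by some embedding model; the two-dimensional vectors and the category names are toy values, not part of the challenge data, and this is not presented as the intended solution.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest to `query` (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical support set: a few embedded examples per category.
support_set = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0]],  # a one-shot category
}
prototypes = {label: centroid(vectors) for label, vectors in support_set.items()}

prediction_a = nearest_prototype([1.0, 1.0], prototypes)
prediction_b = nearest_prototype([3.0, 5.0], prototypes)
print(prediction_a, prediction_b)
```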

4. No Drama Llama Traffic Jam-a
Participants are tasked with developing a model that recommends the ideal driving speed to prevent 'ghost' traffic on a given road segment. 'Ghost' traffic refers to traffic congestion that occurs without any apparent cause, leading to inefficiencies in traffic flow. Participants will use historical traffic data, road conditions, weather information, and other relevant factors to predict the optimal speed for a given road segment at a specific time.

Submissions from 2024

Urinalysis Test Data Analysis and Prediction, Nikhil Mhatre