Intro
We naturally enjoy comparing, ranking, and even labeling things as "The Greatest." This tendency spans virtually every area, including products, industries, and especially sports. Athletes are frequently ranked, but only a select few earn the prestigious distinction of being part of the "Greatest of All Time" (GOAT) conversation. For instance, NHL legend Wayne Gretzky famously carries the nickname "The Great One."
In the NFL, Tom Brady is widely regarded as the GOAT quarterback due to his unrivaled achievements across numerous statistical categories. However, many believe Patrick Mahomes' career trajectory is positioning him to potentially challenge Brady's status. Mahomes has already secured three Super Bowl victories and recently aimed to achieve something unprecedented: to win three consecutive Super Bowls. About a month ago, during Super Bowl LIX, Mahomes and the Kansas City Chiefs were handily defeated by the Philadelphia Eagles. Consequently, the debate about the NFL's quarterback GOAT once again became a hot topic among sports media pundits.
Video examples from recent sports talk shows comparing Mahomes and Brady:
- ‘Mahomes will never be Tom Brady’, Did Super Bowl LIX end the GOAT debate?: https://www.youtube.com/watch?v=bt4kMkDq-7o
- Can Patrick Mahomes still catch Tom Brady in the GOAT-bate?: https://www.youtube.com/watch?v=6zAgYtXFOCA
- Tom Brady Is Still The GOAT, Not Patrick Mahomes: https://www.youtube.com/watch?v=Hv3PbMk9FDM
- Mahomes never had ‘better team’ in Super Bowl, unlike Tom Brady | Pro Football Talk: https://www.youtube.com/watch?v=1E8cP52ymz8
🗒️ Note: The videos selected above are illustrative of the recent NFL quarterback GOAT debate. They were recorded after Super Bowl LIX, which does give us additional information (Mahomes' performance in a Super Bowl loss). However, these videos carry a recency bias that needs to be taken into consideration. In a few months, even if there aren't new NFL statistics to consider, that recency bias will have somewhat worn off.
We have seen what the sports media has to say about the Brady vs Mahomes debate. I thought it would be interesting to have Generative AI provide an analysis, forecast, and opinion of its own! In the next few sections, we will see how easily this can be approached with AI, along with potential optimizations and similar advanced implementations.
Background on Generative AI Approach
Artificial Intelligence will be used to demonstrate the research process and final probabilistic forecast. All the tools that will be used to generate this comparison can be found in OpenAI's ChatGPT platform (ChatGPT.com). Specifically, the following features were used:
- OpenAI GPT 4.5 Research Preview Model
- Basic Prompt Engineering (kept to a minimum to give AI the most autonomy)
- Deep Research (ChatGPT Pro Subscription) used 26 different web sources to gather the required intelligence
The key highlight among the tools used is OpenAI's Deep Research feature. OpenAI Deep Research is essentially an opinionated AI automation workflow. It operates by clearly defining the goal, investing significant time in robust internet-based research, and delivering a comprehensive, detailed report to the user. One caveat of using this automation is the duration it takes to complete; research tasks can sometimes require several minutes, occasionally up to 25 minutes. The additional processing time is essentially the trade-off for obtaining high-quality, accurate results.
The OpenAI Deep Research process, equipped with multiple automation steps and several tool integrations, significantly enhances outcome accuracy. This improvement is evident in the benchmark results shown below, using the Humanity's Last Exam dataset. This benchmark comprises 2,700 highly curated expert-level questions from various domains and is intended to be particularly challenging. The results clearly indicate that the Deep Research process delivers superior performance compared to individual AI models.
More information on Humanity's Last Exam can be found here with example questions and updated results: https://agi.safe.ai/
The prompts used for this research can be found in the shared ChatGPT conversation linked below. Note the iterative responses and where the web sources are listed: https://chatgpt.com/share/67c25ceb-a834-8006-8ce6-c5147d40da92
Summary Analysis Table: Project Mahomes Career and Compare to Tom Brady
🤖 AI Generated
The table below was AI Generated and copied from the Deep Research response: https://chatgpt.com/share/67c25ceb-a834-8006-8ce6-c5147d40da92
I cleaned up the table: statistics were clarified for consistency, and some minor statistics were removed.
Career Metric | Tom Brady (Final Career) | Patrick Mahomes (Projected Career) |
---|---|---|
Super Bowl Wins | 7 ✅ (Holds NFL Record) | 5 (projected) Assumes two more championships to finish with 5 total |
Super Bowl MVP Awards | 5 (Holds NFL Record) | 5 (projected) Likely earns MVP in most future SB wins |
NFL MVP Awards | 3 | 4 ✅ (projected) On pace to add 2 more to current 2 (2018, 2022) |
Pro Bowl Selections | 15 ✅ (Holds NFL Record) | 14 (projected) Consistent Pro Bowler (6 selections by 2023) |
First-Team All-Pro | 3 | 5 ✅ (projected) Could lead NFL in QB play a few more times (2 so far) |
Career Passing Yards | 89,214 ✅ (Holds NFL Record) | ≈80,000+ (projected) Maintains ~4,000+ yards/year into late 30s |
Career Passing TDs | 649 ✅ (Holds NFL Record) | ≈600+ (projected) Continues 30+ TDs/year; potential to approach Brady’s total |
Playoff Record | 35–13 ✅ (Holds NFL Record) | 30+ postseason wins (projected) 17–4 (as of 2025) |
Playoff TD Passes | 88 | 100± ✅ (projected) On track to break Brady’s record (46 TD in 21 games already) |
Playoff Passer Rating | 89.8 | ~105 ✅ (projected) Likely remains around 100+ (currently 105.4, highest ever) |
Probability Assessment: Will Mahomes Become the GOAT?
🤖 AI Generated
This section was AI Generated and copied directly from the Deep Research response: https://chatgpt.com/share/67c25ceb-a834-8006-8ce6-c5147d40da92
Taking all the above into account – championships, stats, awards, and the projected trajectory – how likely is it that Patrick Mahomes surpasses Tom Brady as the consensus GOAT quarterback? It is certainly possible, but far from guaranteed. Brady’s combination of 7 titles and longevity records is an enormous benchmark for Mahomes or anyone to eclipse. Mahomes would likely need to win at least 5+ Super Bowls total (i.e. 2 or more additional championships in the coming years) and maintain elite play for another decade to build the statistical resume to rival Brady. Given Mahomes’ talent and early success, he has a legitimate shot to approach those milestones. However, the NFL’s unpredictability (injuries, team changes, strong competitors) means we must temper expectations.
Probability Estimate: We assess the probability of Mahomes eventually overtaking Brady as the GOAT to be around ~30%. This represents a moderate chance – reflecting that Mahomes is on a Hall-of-Fame trajectory and is arguably ahead of where Brady was at the same point in his career, but also acknowledging that beating a legend’s legacy is exceedingly difficult. In other words, there’s roughly a one in three chance (in our estimation) that when Mahomes’ career is over, the majority of fans and analysts will rank him above Brady.
Explanation: A ~30% chance accounts for both the upside and the hurdles. On one hand, Mahomes could continue to pile up MVPs and possibly reach 5 or 6 Super Bowl titles, which would put him in GOAT contention. For example, if Mahomes adds even two more rings (to reach 5) while also finishing top-3 all-time in yards and TDs, many would argue that his era dominance (and superior efficiency stats) make him the GOAT even if he falls short of 7 rings. On the other hand, there’s a strong possibility that Mahomes ends up with an extraordinary career that still falls just short of Brady’s legacy – perhaps ending with on the order of 3–4 titles and top-5 passing totals. In that case, he’d likely be remembered as #2 all-time behind Brady. Indeed, after Mahomes’ loss in Super Bowl LIX, commentators noted that Mahomes “remains distantly behind Brady” in the GOAT race – he’s only halfway up the mountain in terms of rings (3 vs. 7).
Overview of AI Deep Research Analysis
📜 "All models are wrong, but some are useful."
-- George E. P. Box (British statistician and thought leader recognized for his foundational contributions to statistical modeling)
I consider the AI-generated "model" from OpenAI's Deep Research to be very powerful. Why? With simple prompt engineering, Deep Research was able to successfully:
- Identify the key criteria to effectively compare two NFL quarterback careers.
- Gather Patrick Mahomes' career statistics and project realistic final career statistics.
- Cite the various internet knowledge stores used.
- Summarize the key metrics, rank them, and present them in an easy-to-understand table.
- Take all of the information into account and provide a final quantitative probabilistic estimate (30%). The estimate (while not statistically verified) is quite reasonable.
Now, let's compare this to the analysis we typically see from major sports networks. When these sports shows are produced, multiple analysts and producers are involved in creating the segment. Yet, as demonstrated by the sample videos provided, many sports experts don't deliver even half the depth or thoroughness offered by this AI-generated analysis. Human-generated sports analyses often rely heavily on subjective projections and frequently overlook crucial comparison metrics.
This doesn't imply the AI model is flawless. For instance, it did not account for head-to-head matchups between Tom Brady and Patrick Mahomes, nor did it consider how Mahomes' reliance on mobility might lead to a steeper performance decline over time. These are two key areas that human analysts did cover. Still, this AI-generated model closely matches or even surpasses the quality of analysis commonly presented on major sports networks.
We must also consider cost efficiency in such analyses. Producing a sports segment on a major network that analyzes the Brady vs. Mahomes GOAT debate typically costs tens of thousands of dollars. This AI-driven analysis, however, achieved comparable or better results at a mere fraction of that cost and time, requiring only a handful of well-crafted prompts!
Possible Model Optimizations paired with Artificial Intelligence
While the Deep Research performed by OpenAI is impressive, it would be unreasonable to suggest there aren't significant opportunities for optimization. Optimizations can enhance both the analytical approach and the accuracy of the final probabilistic forecast (such as the likelihood of Mahomes surpassing Tom Brady). By structuring optimizations into a clear workflow or process, AI can be positioned to excel particularly in analytical tasks.
- Currently, statistical metrics (e.g., passing yards, Super Bowl wins) are treated as independent events. While this method is acceptable because certain errors can average out, optimal accuracy demands treating these as dependent, correlated events. For instance, Mahomes winning a Super Bowl inherently requires winning regular-season and playoff games (with rare exceptions). An improved model would explicitly account for these dependencies.
- It is possible to break the problem down into smaller components. Gathering the intelligence (statistics), simulating/projecting Patrick Mahomes' stats, comparing the metrics, and communicating the final analysis could be done as several separate steps. This would ensure that each step's quality is validated before any intelligence is passed on to the next step.
- Monte Carlo Simulations can be used to simulate Patrick Mahomes' remaining career statistics and achievements. This could create a range of probabilistic outcomes that includes a variety of constraints and rare events. For example, you could simulate Mahomes' career anchored on a typical quarterback aging curve. You could introduce "black swan" (rare) events like major injuries (Robert Griffin), suspensions (Michael Vick) or early retirement (Andrew Luck).
- Most of these comparison metrics are single-point estimates. For example, the AI-generated model projects Patrick Mahomes to pass for 600 touchdowns (TDs). An estimate that includes a confidence interval communicates uncertainty better. As an example, a sophisticated simulation could show a 95% confidence interval of 525-650 career touchdowns for Patrick Mahomes.
- Tom Brady's career data is static and should be sourced directly from historical records rather than relying on repeated web scraping. Similarly, Brady and Mahomes' six head-to-head matchups (unless Brady returns from retirement) are historical events. Brady won three of these matchups, including the critical AFC Championship game and the Super Bowl. These outcomes are established facts and should be easily accessible within a dedicated knowledge base for accurate comparative analysis.
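To make a few of these optimizations concrete, here is a minimal Monte Carlo sketch in Python. It is not what Deep Research did, and every numeric parameter in it (the starting touchdown total and age, the aging curve, the injury and playoff probabilities) is an assumption I chose for illustration rather than a fitted value. It shows three ideas from the list above: titles are modeled as dependent on reaching the playoffs, rare "black swan" events can shorten or end a career, and the output is a range of simulated outcomes instead of a single-point estimate.

```python
import random

# Illustrative assumptions only; none of these values are fitted to real data.
CURRENT_TDS = 245              # assumed career passing TDs at the start of the simulation
CURRENT_TITLES = 3             # three Super Bowl wins so far (from the article)
BRADY_TDS = 649                # Brady's career passing TDs (from the table above)
START_AGE, MAX_AGE = 30, 41    # assumed age range for the remaining seasons
TDS_PER_SEASON, TD_SD = 32, 6  # assumed healthy-season mean and spread
P_CAREER_ENDING = 0.02         # assumed yearly chance of early retirement or career-ending injury
P_MAJOR_INJURY = 0.08          # assumed yearly chance of a major injury
P_PLAYOFFS = 0.80              # assumed chance of reaching the playoffs at full strength
P_TITLE_GIVEN_PLAYOFFS = 0.15  # assumed chance of a title once in the playoffs


def aging_factor(age: int) -> float:
    """Assumed aging curve: full production through age 33, then 6% less per year."""
    return max(0.0, 1.0 - 0.06 * max(0, age - 33))


def simulate_career(rng: random.Random) -> tuple[int, int]:
    """Simulate one possible rest-of-career and return (career TDs, career titles)."""
    tds, titles = CURRENT_TDS, CURRENT_TITLES
    for age in range(START_AGE, MAX_AGE + 1):
        if rng.random() < P_CAREER_ENDING:
            break  # black swan: the career ends before this season
        factor = aging_factor(age)
        # A major injury reduces the share of the season that is played.
        share = rng.uniform(0.0, 0.6) if rng.random() < P_MAJOR_INJURY else 1.0
        season_tds = max(0.0, rng.gauss(TDS_PER_SEASON * factor, TD_SD)) * share
        tds += round(season_tds)
        # Dependent events: a title is only possible after reaching the playoffs.
        made_playoffs = rng.random() < P_PLAYOFFS * factor * share
        if made_playoffs and rng.random() < P_TITLE_GIVEN_PLAYOFFS:
            titles += 1
    return tds, titles


def summarize(n_sims: int = 20_000, seed: int = 7) -> dict[str, float]:
    """Run many simulated careers and report an outcome range plus two probabilities."""
    rng = random.Random(seed)
    results = [simulate_career(rng) for _ in range(n_sims)]
    td_totals = sorted(t for t, _ in results)
    return {
        "td_low_95": td_totals[int(0.025 * n_sims)],
        "td_high_95": td_totals[int(0.975 * n_sims) - 1],
        "p_five_plus_titles": sum(1 for _, k in results if k >= 5) / n_sims,
        "p_pass_brady_tds": sum(1 for t, _ in results if t > BRADY_TDS) / n_sims,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The low and high values are the 2.5th and 97.5th percentiles of the simulated career totals, so they describe the spread of simulated outcomes under these assumptions. Changing any of the assumed probabilities will move the results, which is exactly why each input should be validated or sourced from a knowledge base before the forecast is trusted.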
Other Approaches
The comparison between Tom Brady and Patrick Mahomes can also be approached as a decision-making problem. Framing the problem as a decision can be implemented using various components of the Decision Intelligence Framework.
More information on the Decision Intelligence Framework: https://github.com/bartczernicki/DecisionIntelligence.GenAI.Workshop
You may also be asking if OpenAI Deep Research can be replicated on other platforms. The answer is: Yes! For example, below are two AI automation pipelines that are similar to OpenAI Deep Research implemented on Azure OpenAI.
- Risk Analysis with Generative AI Reasoning (SEC 10-K)
https://github.com/bartczernicki/RiskAnalysisWithGenerativeAIReasoning
- Document Intelligence With Advanced AI Reasoning
https://github.com/bartczernicki/DocIntelligenceWithAdvancedAIReasoning
Conclusion
📜 "The difference between smart and stupid is half an hour."
-- Old Proverb, Anonymous
This proverb may sound harsh, but it emphasizes how significantly our understanding can improve with just a small investment of time. Surprisingly, even thirty extra minutes spent gaining knowledge can be a differentiator in enhancing your approach, recommendations, or decisions. Therefore, at a minimum, I highly recommend trying OpenAI Deep Research to gain that critical knowledge advantage. For solutions requiring more customized approaches, Microsoft AI services can help craft almost any AI automation. It truly is a game-changer, enabling you to tackle complex knowledge problems by having AI "spend the 30 minutes" on your behalf!
While there's room for further refinement, the current AI research automation capabilities are already transformative, significantly expanding access to insights traditionally reserved for expert analysis or costly management consulting engagements. Though not yet flawless, this technology undoubtedly represents a substantial advancement in research quality, efficiency, and cost-effectiveness.
Ultimately, products like OpenAI Deep Research (and other AI knowledge automations) not only enhance debates like the Mahomes-Brady discussion but also signal a broader shift toward democratizing high-quality analysis and insights. As these AI-driven methodologies continue to develop, we can expect widespread disruption across various sectors. Use this to your advantage.