Monitoring LLM behavior: Drift, retries, and refusal patterns
The stochastic challenge Traditional software is predictable: Input A plus function B always equals output C. This determinism allows engineers to develop robust tests.
Explore articles tagged with Large Language Models
The stochastic challenge Traditional software is predictable: Input A plus function B always equals output C. This determinism allows engineers to develop robust tests.
The whale has resurfaced. DeepSeek, the Chinese AI startup offshoot of High-Flyer Capital Management quantitative analysis firm, became a near-overnight sensation globally in January 2025 with the release of its open source R1 model that matched proprietary U.
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. We’re in a new era of AI-driven scams When ChatGPT was released in late 2022, it showed how easily generative AI could create human-like text.
After months of rumors and reports that OpenAI was developing a new, more powerful AI large language model for use in ChatGPT and through its application programming interface (API), allegedly codenamed "Spud" internally, the company has today unveiled its latest offering under the more formal name GPT-5. And to likely no one's surprise, it's hardly a "potato" in the disparaging sense of the word: GPT-5.
OpenAI introduced a new paradigm and product today that is likely to have huge implications for enterprises seeking to adopt and control fleets of AI agent workers. Called "Workspace Agents," OpenAI's new offering essentially allows users on its ChatGPT Business ($20 per user per month) and variably priced Enterprise, Edu and Teachers subscription plans to design or select from pre-existing agent templates that can take on work tasks across third-party apps and data sources including Slack, Goog.
A rogue AI agent at Meta passed every identity check and still exposed sensitive data to unauthorized employees in March. Two weeks later, Mercor, a $10 billion AI startup, confirmed a supply-chain breach through LiteLLM.
The standard guidelines for building large language models (LLMs) optimize only for training costs and ignore inference costs. This poses a challenge for real-world applications that use inference-time scaling techniques to increase the accuracy of model responses, such as drawing multiple reasoning samples from a model at deployment.
Anthropic is publicly releasing its most powerful large language model yet, Claude Opus 4.7, today — as it continues to keep an even more powerful successor, Mythos, restricted to a small number of external enterprise partners for cybersecurity testing and patching vulnerabilities in the software said enterprises use (which Mythos exposed rapidly).
OpenAI is making moves to try and court more developers and vibe coders (those who build software using AI models and natural language) away from rivals like Anthropic. Today, the firm arguably most synonymous with the generative AI boom announced it will begin offering a new, more mid-range subscription tier — a $100 ChatGPT Pro plan — which joins its free, Go ($8 monthly), Plus ($20 monthly) and existing Pro ($200 monthly) plans for individuals using ChatGPT and related OpenAI products.
Meta has been one of the most interesting companies of the generative AI era — initially gaining a loyal and huge following of users for the release of its mostly open source Llama family of large language models (LLMs) beginning in early 2023 but coming to screeching halt last year after Llama 4 debuted to mixed reviews and ultimately, admissions of gaming benchmarks. That bumpy rollout of Llama 4 apparently spurred Meta founder and CEO Mark Zuckerberg to totally overhaul Meta's AI operations i.
The age of agentic AI is upon us — whether we like it or not. What started with an innocent question-answer banter with ChatGPT back in 2022 has become an existential debate on job security and the rise of the machines.
AI vibe coders have yet another reason to thank Andrej Karpathy, the coiner of the term. The former Director of AI at Tesla and co-founder of OpenAI, now running his own independent AI project, recently posted on X describing a "LLM Knowledge Bases" approach he's using to manage various topics of research interest.
The baton of open source AI models has been passed on between several companies over the years since ChatGPT debuted in late 2022, from Meta with its Llama family to Chinese labs like Qwen and z. But lately, Chinese companies have started pivoting back towards proprietary models even as some U.
In the early days of large language models (LLMs), we grew accustomed to massive 10x jumps in reasoning and coding capability with every new model iteration. Today, those jumps have flattened into incremental gains.
Earlier this month, Microsoft launched Copilot Health, a new space within its Copilot app where users will be able to connect their medical records and ask specific questions about their health. A couple of days earlier, Amazon had announced that Health AI, an LLM-based tool previously restricted to members of its One Medical service, would….
Banking house JPMorgan Chase is asking its roughly 65,000 engineers and technologists to use AI tools as part of their regular workflow. Business Insider reported that managers are tracking how often staff use these tools.
Intercom is taking an unusual gamble for a legacy software company: building its own AI model. The 15-year-old, Dublin, Ireland-based massive customer service platform announced Fin Apex 1.
For decades, software companies designed their products for a single type of customer: a human being staring at a screen. Every button, menu, and dashboard existed to translate a person’s intention into a machine’s action.
OpenAI on Monday launched a set of interactive visual tools inside ChatGPT that let users manipulate mathematical and scientific formulas in real time — a genuinely impressive education feature that also serves as the company's most direct attempt yet to change the subject during the worst ten days of its corporate life. The new experience covers more than 70 core math and science concepts, from the Pythagorean theorem to Ohm's law to compound interest.
OpenAI launched Codex Security on March 6, entering the application security market that Anthropic had disrupted 14 days earlier with Claude Code Security. Both scanners use LLM reasoning instead of pattern matching.