Skip to main content
Get Template — $89

Search AI Workflow Pro

Search tools, categories, stacks, and pages

Fresh daily

AI News

Latest AI tool releases, research breakthroughs, and industry news.

AllReleasesResearchFundingTutorialsOpinion

Older

Meta-learning for wrestling

We show that for the task of simulated robot wrestling, a meta-learning agent can learn to quickly defeat a stronger non-meta-learning agent, and also show that the meta-learning agent can adapt to physical malfunction.

OpenAI Blog·Oct 11research

Nonlinear computation in deep linear networks

OpenAI Blog·Sep 29research

Learning to model other minds

We’re releasing an algorithm which accounts for the fact that other agents are learning too, and discovers self-interested yet collaborative strategies like tit-for-tat in the iterated prisoner’s dilemma. This algorithm, Learning with Opponent-Learning Awareness (LOLA), is a small step towards agents that model other minds.

OpenAI Blog·Sep 14research

Learning with opponent-learning awareness

OpenAI Blog·Sep 13research

More on Dota 2

Our Dota 2 result shows that self-play can catapult the performance of machine learning systems from far below human level to superhuman, given sufficient compute. In the span of a month, our system went from barely matching a high-ranked player to beating the top pros and has continued to improve since then. Supervised deep learning systems can only be as good as their training datasets, but in self-play systems, the available data improves automatically as the agent gets better.

OpenAI Blog·Aug 16research

Dota 2

We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules. The bot learned the game from scratch by self-play, and does not use imitation learning or tree search. This is a step towards building AI systems which accomplish well-defined goals in messy, complicated situations involving real humans.

OpenAI Blog·Aug 11research

Better exploration with parameter noise

We’ve found that adding adaptive noise to the parameters of reinforcement learning algorithms frequently boosts performance. This exploration method is simple to implement and very rarely decreases performance, so it’s worth trying on any problem.

OpenAI Blog·Jul 27research

Proximal Policy Optimization

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.

OpenAI Blog·Jul 20research

Robust adversarial inputs

We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives. This challenges a claim from last week that self-driving cars would be hard to trick maliciously since they capture images from multiple scales, angles, perspectives, and the like.

OpenAI Blog·Jul 17research

Hindsight Experience Replay

OpenAI Blog·Jul 5research

Teacher–student curriculum learning

OpenAI Blog·Jul 1research

Learning from human preferences

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.

OpenAI Blog·Jun 13research

Learning to cooperate, compete, and communicate

Multiagent environments where agents compete for resources are stepping stones on the path to AGI. Multiagent environments have two useful properties: first, there is a natural curriculum—the difficulty of the environment is determined by the skill of your competitors (and if you’re competing against clones of yourself, the environment exactly matches your skill level). Second, a multiagent environment has no stable equilibrium: no matter how smart an agent is, there’s always pressure to get smarter. These environments have a very different feel from traditional environments, and it’ll take a lot more research before we become good at them.

OpenAI Blog·Jun 8research

UCB exploration via Q-ensembles

OpenAI Blog·Jun 5research

Robots that learn

We’ve created a robotics system, trained entirely in simulation and deployed on a physical robot, which can learn a new task after seeing it done once.

OpenAI Blog·May 16research

Equivalence between policy gradients and soft Q-learning

OpenAI Blog·Apr 21research

Stochastic Neural Networks for hierarchical reinforcement learning

OpenAI Blog·Apr 10research

Unsupervised sentiment neuron

We’ve developed an unsupervised system which learns an excellent representation of sentiment, despite being trained only to predict the next character in the text of Amazon reviews.

OpenAI Blog·Apr 6research

Spam detection in the physical world

We’ve created the world’s first Spam-detecting AI trained entirely in simulation and deployed on a physical robot.

OpenAI Blog·Apr 1research

Evolution strategies as a scalable alternative to reinforcement learning

We’ve discovered that evolution strategies (ES), an optimization technique that’s been known for decades, rivals the performance of standard reinforcement learning (RL) techniques on modern RL benchmarks (e.g. Atari/MuJoCo), while overcoming many of RL’s inconveniences.

OpenAI Blog·Mar 24research