Improving reinforcement learning alone is not inventive


The European Patent Office’s Board of Appeal highlighted in case T1952/21 that claiming an improvement in reinforcement learning alone is not sufficient to justify the grant of a patent.

The patent application was directed to a machine learning system using reinforcement learning. The description explains that, in reinforcement learning, an agent explores the environment according to a set policy, which determines the action (such as moving right) the agent takes at every juncture as a function of its current state (e.g. its position in the environment). The agent receives rewards, which may be positive or negative. In this way, the agent can “learn” the value of the various actions and states. The goal of training is to maximize a value function which reflects the expected sum of rewards for taking a certain action.
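To make the paragraph above concrete, here is a minimal sketch of value learning in a toy setting. It is not taken from the application or the decision: the five-position corridor, the reward values, the hyperparameters, and the choice of tabular Q-learning are all assumptions made purely for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: positions 0..4, position 4 is the goal
ACTIONS = (-1, +1)    # action index 0 = move left, action index 1 = move right


def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # assumed rewards: positive at the goal, slightly negative elsewhere
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table q[state][action] estimating the expected discounted sum of rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Policy: mostly take the action currently valued highest, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for position, (left, right) in enumerate(train()):
        print(position, round(left, 3), round(right, 3))
```

In this toy setting the learned table ends up valuing "move right" above "move left" at every position short of the goal, which is the sense in which the agent "learns" the value of actions and states.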

The application builds upon an existing prior-art training system by injecting a degree of randomness into it: the policy and value networks use stochastic weights (e.g. random noise, or from models). The aim of the injection is to enable the system to explore the parameter space more fully, and thus to improve training.
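The general idea of stochastic weights can be sketched as follows. This is an illustration only, not the claimed system: the class name `NoisyLinear`, the helper `select_action`, the Gaussian noise, and the fixed `sigma` are hypothetical choices, and the application may parameterise the randomness quite differently.

```python
import random


class NoisyLinear:
    """Hypothetical linear layer whose weights are resampled as mean + sigma * eps on each call."""

    def __init__(self, n_in, n_out, sigma=0.1, seed=0):
        self.rng = random.Random(seed)
        # Mean weights; in a real system these would be learned during training.
        self.mu = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.sigma = sigma  # assumed fixed noise scale

    def forward(self, x, noisy=True):
        out = []
        for row in self.mu:
            total = 0.0
            for w_mean, x_i in zip(row, x):
                # Draw fresh Gaussian noise per weight when noisy; otherwise use the mean weight.
                eps = self.rng.gauss(0.0, 1.0) if noisy else 0.0
                total += (w_mean + self.sigma * eps) * x_i
            out.append(total)
        return out


def select_action(layer, state, noisy=True):
    """Pick the action with the highest score from the (possibly noisy) layer."""
    scores = layer.forward(state, noisy=noisy)
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    layer = NoisyLinear(n_in=3, n_out=2, sigma=0.5, seed=1)
    state = [1.0, 0.5, -0.2]
    print([select_action(layer, state) for _ in range(10)])
```

Because the weights differ from call to call, the same state can yield different actions, so exploration arises from the network itself instead of from a separate random action rule.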

The Applicant argued that the system design was motivated by technical considerations of the internal functioning of the computer. The Board rejected this argument, stating that there was no adaptation of the functioning of the computer and that the concept of reinforcement learning does not generally imply any technical context.

The Board also noted that the claim implied no technical use, and thus the claimed system would not, by itself, solve a technical problem. As a result, the Board rejected the claim in the main request.

The Applicant filed a further “auxiliary request” in which the claim specified a “control policy output” configured for controlling one of several technical systems. However, the Applicant gave no details about the specific technical system or model. The Board considered that the advantage of the claimed invention was not established over the whole breadth of the claim. The theoretical arguments advanced by the Applicant suggested that this advantage could be achieved in some cases, but the evidence was insufficient to lead the Board to conclude that the technical effect was present over the full breadth of the claim, as required by the EPO Decision G1/19.