Tofara Moyo

Abstract
We present a novel method for learning hierarchical abstractions that prioritize competing objectives, leading to improved global expected rewards. Our approach employs a secondary rewarding agent with multiple scalar outputs, each associated with a distinct level of abstraction. The traditional agent then learns to maximize these outputs hierarchically, conditioning each level on the maximization of the preceding level. We derive an equation that orders these scalar values and the global reward by priority, inducing a hierarchy of needs that informs goal formation. Experimental results on the Pendulum v1 environment demonstrate superior performance over a baseline implementation, achieving state-of-the-art results.
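The abstract does not state the paper's priority-ordering equation, so the sketch below illustrates one plausible reading of the setup: a secondary rewarding network emits one scalar per abstraction level, and those scalars plus the global environment reward are combined into a single training signal with geometrically decaying weights, so that each level dominates the ones below it. The names (`HierarchicalRewarder`, `prioritized_reward`) and the weighting scheme are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn


class HierarchicalRewarder(nn.Module):
    """Secondary rewarding agent (hypothetical sketch): maps an observation
    to K scalar outputs, one per abstraction level."""

    def __init__(self, obs_dim: int, num_levels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_levels),  # one scalar per abstraction level
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # shape: (batch, num_levels)


def prioritized_reward(level_values: torch.Tensor,
                       global_reward: torch.Tensor,
                       base: float = 10.0) -> torch.Tensor:
    """Combine the per-level scalars and the global reward into one signal.
    The geometric weighting (earlier levels weighted more heavily) is an
    assumption standing in for the paper's priority-ordering equation."""
    k = level_values.shape[-1]
    weights = base ** torch.arange(k, 0, -1, dtype=level_values.dtype)
    return (weights * level_values).sum(dim=-1) + global_reward


if __name__ == "__main__":
    # Pendulum-v1 observations are 3-dimensional (cos theta, sin theta, theta-dot).
    rewarder = HierarchicalRewarder(obs_dim=3, num_levels=3)
    obs = torch.randn(4, 3)
    env_reward = torch.randn(4)
    print(prioritized_reward(rewarder(obs), env_reward).shape)  # torch.Size([4])
```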
Benchmarks
| Benchmark | Methodology | Action Repetition | Average Decisions | Mean Reward |
|---|---|---|---|---|
| openai-gym-on-pendulum-v1 | TLA with Hierarchical Reward Functions | 0.8073 | 38.6 | -125.02 |