CodeReviewEnv — AI Agent Training Environment

Performance Benchmarks

Baseline Performance

Scores achieved by Gemini 2.0 Flash on each task — used as the reference baseline.

EASY

Basic Bug Detection

📄 utils/statistics.py

0.00

4 hidden bugs • Max 3 steps

MEDIUM

Security Vulnerability Review

📄 auth/user_manager.py

0.00

8 vulnerabilities • Max 5 steps

HARD

Concurrency Bug Hunt

📄 core/rate_limiter.py

0.00

8 subtle bugs • Max 8 steps

Interactive Playground

Live Environment Demo

Interact with the environment in real time. Select a task, start a review, and submit your findings.

CONTROL PANEL

Step 1 — Select Task

Task Difficulty

Step 2 — Write Your Review

Review fields: Line Number, Issue Type, Severity, Description, Suggested Fix, Verdict
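The review fields map naturally onto a small record type. The sketch below is one possible way to model a submission in Python. The field names, the severity levels (critical, major, minor), and the verdict strings are assumptions made for illustration, not the environment's confirmed schema.

```python
from dataclasses import dataclass, asdict, field
from typing import List

SEVERITIES = ("critical", "major", "minor")  # assumed severity levels


@dataclass
class ReviewComment:
    """One finding, mirroring the review fields (names are assumptions)."""
    line_number: int
    issue_type: str
    severity: str
    description: str
    suggested_fix: str = ""

    def __post_init__(self):
        if self.line_number < 1:
            raise ValueError("line_number must be 1 or greater")
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")


@dataclass
class Review:
    """A full submission: a list of comments plus an overall verdict."""
    verdict: str  # e.g. "approve" or "request_changes" (assumed values)
    comments: List[ReviewComment] = field(default_factory=list)

    def to_payload(self) -> dict:
        # asdict() recurses into the nested ReviewComment dataclasses.
        return asdict(self)
```

Validating in `__post_init__` rejects malformed findings before they are sent, which keeps a training loop from spending steps on requests the server would refuse.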

Architecture

How It Works

Three simple steps to train an AI code reviewer.

Reset

POST /reset

Start a new episode. Pass a task ID and receive the full code diff, file name, PR title, and max steps allowed.

Step

POST /step

Submit your review comments and verdict. Receive a dense reward signal, a done flag, and information about the remaining steps.

Grade

POST /grader

When done, grade the full episode. Receive a deterministic score from 0.0 to 1.0 based on detected issues and verdict accuracy.
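The three calls above can be chained into one episode loop. This is a minimal client sketch using only the standard library and the base URL http://localhost:7860. The JSON keys (`task_id`, `max_steps`, `done`, `score`) and the shape of the review payload are assumptions, since the request and response schemas are not spelled out here.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:7860"


def http_post(path: str, payload: dict) -> dict:
    """POST JSON to the environment and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(task_id: str,
                make_review: Callable[[dict], dict],
                post: Callable[[str, dict], dict] = http_post) -> float:
    """Run reset -> step (repeated) -> grader and return the final score.

    Every JSON key used below is an assumed name, not a confirmed field.
    """
    observation = post("/reset", {"task_id": task_id})
    max_steps = int(observation.get("max_steps", 1))
    for _ in range(max_steps):
        result = post("/step", make_review(observation))
        if result.get("done"):
            break  # the environment ended the episode early
    return float(post("/grader", {"task_id": task_id})["score"])
```

Passing the transport in as `post` lets the loop be exercised against a stub before pointing it at a running server.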

Beta Feature

Free Code Review

Paste any code — get instant AI feedback

AI-powered review. No ground truth scoring. For exploration and demos only.

A review usually takes 5-10 seconds, depending on code length.

Task Library

Available Tasks

From simple edge-case bugs to subtle concurrency nightmares.

EASY

Basic Bug Detection

utils/statistics.py

Review a Python utility module. Find edge-case bugs and performance issues hidden in seemingly functional code, such as empty-list crashes and O(n²) lookups.

ZeroDivisionError • IndexError • O(n²) complexity • Empty list guard

Max 3 steps • 4 bugs hidden • Score: 0.95
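To make these bug classes concrete, the sketch below shows guarded versions of three hypothetical helpers. The function names are illustrative assumptions and are not taken from utils/statistics.py.

```python
from typing import List, Sequence


def mean(values: Sequence[float]) -> float:
    """Average that guards the empty case instead of dividing by zero."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def last_or_default(values: Sequence[float], default: float = 0.0) -> float:
    """Avoids the IndexError that values[-1] raises on an empty sequence."""
    return values[-1] if values else default


def find_duplicates(values: Sequence[int]) -> List[int]:
    """Single pass with sets; rescanning a list per element would be O(n^2)."""
    seen, dupes = set(), set()
    for v in values:
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return sorted(dupes)
```

A reviewer working this task would be expected to flag the unguarded forms of each pattern: a division by `len(values)` with no emptiness check, an unconditional `values[-1]`, and a membership test against a list inside a loop.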

MEDIUM

Security Vulnerability Review

auth/user_manager.py

Analyze an authentication module riddled with security holes — SQL injection, hardcoded secrets, broken crypto, and dangerous deserialization.

SQL Injection • Hardcoded Secrets • MD5 Password Hash • pickle.loads RCE • Broken Auth

Max 5 steps • 8 vulnerabilities • Score: 0.90
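For orientation, here is a sketch of safer counterparts to four of the vulnerability classes named above, using only the Python standard library. The table layout, function names, and the `APP_SECRET_KEY` variable are hypothetical and do not come from auth/user_manager.py.

```python
import hashlib
import hmac
import json
import os
import sqlite3
from typing import Optional, Tuple


def find_user(conn: sqlite3.Connection, username: str):
    """Parameterized query: the driver binds the value, so input is never SQL."""
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchone()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted PBKDF2-HMAC-SHA256 instead of a bare MD5 digest."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def load_session(raw: bytes) -> dict:
    """JSON parsing cannot run code, unlike pickle.loads on untrusted bytes."""
    return json.loads(raw.decode("utf-8"))


def get_secret_key() -> str:
    """Read the secret from the environment rather than hardcoding it."""
    return os.environ["APP_SECRET_KEY"]  # hypothetical variable name
```

The iteration count is a placeholder; a real deployment would pick it from current guidance or use a dedicated password-hashing library.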

HARD

Concurrency & Architecture Bug Hunt

core/rate_limiter.py

Hunt subtle race conditions, silent exception swallowing, mutating-while-iterating crashes, and architectural flaws in a distributed rate limiter.

Race Conditions • Thread Safety • Silent Exceptions • Dict Mutation • deque vs list

Max 8 steps • 8 subtle bugs • Score: 0.29
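As a point of comparison for the bug classes listed here, the following is a minimal single-process sliding-window limiter that avoids them. It is a sketch under assumptions: the class and method names are invented for illustration, and it does not reflect how core/rate_limiter.py is actually structured or how a distributed limiter would coordinate across nodes.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls per client within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._hits = {}  # client id -> deque of request timestamps

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        with self._lock:  # the check and the append must happen atomically
            hits = self._hits.setdefault(client_id, deque())
            while hits and now - hits[0] >= self.window:
                hits.popleft()  # O(1); list.pop(0) would be O(n)
            if len(hits) >= self.max_calls:
                return False
            hits.append(now)
            return True

    def purge_idle(self) -> int:
        """Drop clients with no recent hits and return how many were removed."""
        now = self._clock()
        with self._lock:
            # Collect keys first: deleting from a dict while iterating over
            # it raises RuntimeError.
            idle = [cid for cid, hits in self._hits.items()
                    if not hits or now - hits[-1] >= self.window]
            for cid in idle:
                del self._hits[cid]
            return len(idle)
```

The injectable `clock` is a design choice for testability: window expiry can be checked without sleeping.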

EASY-MEDIUM

JavaScript Async Flow Review

api/fetcher.js

Review a JavaScript module handling data fetching and state updates. Find async/await pitfalls and race conditions.

Missing Await • Promise Errors • Race Condition • Caching Logic

Max 4 steps • 3 issues hidden • Score: 0.92

MEDIUM

Advanced SQL Injection Hunt

db/reports.js

Review a Node.js database service. Identify multiple sophisticated SQL injection patterns in dynamic queries.

Order By Injection • Limit Injection • Template Literals • Parameterization

Max 5 steps • 4 vulnerabilities • Score: 0.88
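ORDER BY targets cannot be bound as query parameters, which is why they are a common injection point in dynamic queries. The task file is Node.js, but the usual defence is language-independent, so it is sketched here in Python with `sqlite3`: validate the column against an allowlist, normalise the direction, and bind the limit as a parameter. The table and column names are hypothetical.

```python
import sqlite3
from typing import Tuple

ALLOWED_SORT_COLUMNS = {"id", "total", "created_at"}  # hypothetical columns
MAX_LIMIT = 100


def build_report_query(sort_by: str, direction: str, limit) -> Tuple[str, tuple]:
    """Return (sql, params) with identifiers validated and values bound."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # Anything other than an explicit DESC falls back to ASC.
    order = "DESC" if str(direction).upper() == "DESC" else "ASC"
    # int() raises ValueError on non-numeric input such as "10; DROP ...".
    safe_limit = max(1, min(int(limit), MAX_LIMIT))
    sql = f"SELECT id, total FROM reports ORDER BY {sort_by} {order} LIMIT ?"
    return sql, (safe_limit,)
```

Only values that passed the allowlist are interpolated into the SQL string; everything else travels as a bound parameter.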

MEDIUM

React Component Security

components/UserProfile.jsx

Review a React component for XSS risks (dangerouslySetInnerHTML) and sensitive data leaks in console/URLs.

XSS • Data Leaks • Sanitization • React Hooks

Max 4 steps • 3 issues hidden • Score: 0.90

HARD

Django Auth Logic Review

auth/middleware.py

Review Django middleware and auth backends for bypasses, timing attacks, and improper exception handling.

Auth Bypass • Timing Attack • Middleware Logic • Security Logic

Max 7 steps • 3 critical issues • Score: 0.85
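Two of these issue classes, timing attacks and bypass through lenient error handling, have a compact framework-independent illustration. The sketch below uses plain Python rather than Django's middleware API; the function names are hypothetical.

```python
import hmac


def token_matches(provided: str, expected: str) -> bool:
    """Constant-time comparison; `==` may return at the first mismatch."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


def authenticate(provided_token, expected_token: str) -> bool:
    """Fail closed: a missing or malformed token is a rejection, never a pass."""
    if not isinstance(provided_token, str) or not provided_token:
        return False
    return token_matches(provided_token, expected_token)
```

The important property is the default: every path that cannot positively verify the token returns `False`, so an unexpected input type cannot fall through to an authenticated state.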

HARD

Node.js Concurrency Issues

services/inventory.js

Identify race conditions and stale state updates in a Node.js singleton service responsible for inventory tracking.

Race Condition • Stale State • Atomicity • Singleton Pattern

Max 6 steps • 2 critical loops • Score: 0.82

Developer Reference

API Reference

All endpoints available at http://localhost:7860

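Method	Endpoint	Description
POST	/reset	Start a new episode for a given task ID.
POST	/step	Submit review comments and a verdict.
POST	/grader	Grade the full episode and return a score from 0.0 to 1.0.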

MEDIUM

JavaScript Async Bug Review

api/async_handler.js

Review an async JavaScript handler for unhandled promise rejections, missing await calls, callback hell, a memory leak from event listeners that are never removed, and swallowed errors.

Missing Await • Memory Leak • Promise.all • Callback Hell

Max 5 steps • 5 bugs hidden

HARD

API Security Review

api/endpoints.py

Review a FastAPI endpoints file with JWT algorithm bypass, wildcard CORS, missing rate limiting, exposed stack traces, IDOR, and debug mode enabled in production.

JWT Bypass • CORS Wildcard • IDOR • Stack Trace Leak

Max 6 steps • 6 vulnerabilities

MEDIUM

Database ORM Bug Review

models/database.py

Review SQLAlchemy ORM models for N+1 queries, missing foreign key indexes, uncommitted transactions, mutable default arguments, and missing cascade rules.

N+1 Query • Missing Index • Mutable Default • No Commit

Max 5 steps • 5 bugs hidden
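Of the bug classes in this task, the mutable default argument is the easiest to show without a database. The two functions below are illustrative only and are not taken from models/database.py.

```python
from typing import List, Optional


def add_tag_buggy(tag: str, tags: List[str] = []) -> List[str]:
    """Bug: the default list is created once and shared across calls."""
    tags.append(tag)
    return tags


def add_tag(tag: str, tags: Optional[List[str]] = None) -> List[str]:
    """Fix: use None as the sentinel and build a fresh list per call."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling `add_tag_buggy` twice without a second argument returns a list that still holds the first call's tag, because Python evaluates default values once at function definition time.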

HARD

JWT Auth System Review

auth/jwt_handler.py

Review a JWT auth system for weak secrets, missing token expiry, disabled signature verification, session fixation, missing token revocation, and timing attacks.

Weak Secret • No Expiry • Session Fixation • Timing Attack

Max 7 steps • 7 vulnerabilities

MEDIUM

Data Pipeline Bug Review

pipeline/processor.py

Review a batch data processing pipeline for memory leaks, missing None checks, silent int8 overflow, unclosed file handles, bare except clauses, and off-by-one slicing.

Memory Leak • int8 Overflow • File Handle Leak • Bare Except

Max 6 steps • 6 bugs hidden
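The sketch below shows defensive counterparts for five of these bug classes: None checks, int8 overflow, unclosed file handles, bare except clauses, and off-by-one slicing. It uses only the standard library, with `ctypes.c_int8` standing in for a fixed-width array type, and every function name is a hypothetical stand-in rather than code from pipeline/processor.py.

```python
import ctypes
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

INT8_MIN, INT8_MAX = -128, 127
WRAPPED = ctypes.c_int8(130).value  # -126: fixed-width storage wraps silently


def to_int8_checked(value: int) -> int:
    """Raise instead of wrapping when a value does not fit in int8."""
    if not INT8_MIN <= value <= INT8_MAX:
        raise OverflowError(f"{value} does not fit in int8")
    return value


def parse_values(lines: Iterable[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Return (accepted values, rejected raw lines) so no error is swallowed."""
    values, rejected = [], []
    for line in lines:
        if line is None:  # explicit None check before parsing
            continue
        try:
            values.append(to_int8_checked(int(line)))
        except (ValueError, OverflowError):  # named exceptions, not bare except
            rejected.append(line)
    return values, rejected


def read_lines(path: str) -> List[str]:
    """The with-block closes the handle even if reading raises."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Step by size so the final partial batch is included exactly once."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Returning the rejected lines alongside the parsed values keeps bad input visible to the caller, which is the opposite of what a bare `except: pass` does.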

Train AI Agents to Review Code
