Meta × Hugging Face OpenEnv Hackathon 2025

Train AI Agents to Review Code

The first OpenEnv environment for real-world code review. 13 tasks, 6 languages, dense rewards, deterministic grading.

13 Tasks
7 API Endpoints
0.71 Avg Score
100% Deterministic

Baseline Performance

Scores achieved by Gemini 2.0 Flash on each task — used as the reference baseline.

EASY

Basic Bug Detection

📄 utils/statistics.py

0.95

4 hidden bugs • Max 3 steps

MEDIUM

Security Vulnerability Review

📄 auth/user_manager.py

0.90

8 vulnerabilities • Max 5 steps

HARD

Concurrency Bug Hunt

📄 core/rate_limiter.py

0.29

8 subtle bugs • Max 8 steps
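
The 0.71 average shown in the page header is consistent with the three per-task scores listed for these tasks under Available Tasks (0.95, 0.90, 0.29). A quick check, assuming the headline figure is an unweighted mean over just these three baseline tasks (the page does not state how it is computed):

```python
from statistics import mean

# Scores listed for the three baseline tasks under "Available Tasks".
baseline_scores = {
    "Basic Bug Detection": 0.95,
    "Security Vulnerability Review": 0.90,
    "Concurrency & Architecture Bug Hunt": 0.29,
}

# Assumption: the headline number is an unweighted mean of these three scores.
average = mean(list(baseline_scores.values()))
print(f"{average:.2f}")  # prints 0.71
```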


Live Environment Demo

Interact with the environment in real time. Select a task, start a review, and submit your findings.


How It Works

Three simple steps to train an AI code reviewer.

01

Reset

POST /reset

Start a new episode. Pass a task ID and receive the full code diff, file name, PR title, and max steps allowed.

02

Step

POST /step

Submit your review comments and verdict. Receive a dense reward signal, done flag, and info about remaining steps.

03

Grade

POST /grader

When done, grade the full episode. Receive a deterministic score from 0.0 to 1.0 based on detected issues and verdict accuracy.
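
The three calls above can be sketched as a small client loop. This is a minimal sketch, not the environment's documented client: the endpoint paths and base URL come from this page, but every JSON field name (`task_id`, `done`, and the shape of the review action) and the helper names `post_json` and `run_episode` are assumptions made for illustration.

```python
import json
from typing import Callable, Optional
from urllib import request

BASE_URL = "http://localhost:7860"  # base URL listed under API Reference


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the environment and decode the JSON reply."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(
    task_id: str,
    review: Callable[[dict, Optional[dict]], dict],
    post: Callable[[str, dict], dict] = post_json,
    step_limit: int = 8,
) -> dict:
    """Reset, step until the episode is done, then grade.

    All request and response field names here are hypothetical.
    """
    observation = post("/reset", {"task_id": task_id})  # assumed request field
    feedback = None
    for _ in range(step_limit):  # client-side cap; the server enforces its own
        action = review(observation, feedback)  # e.g. comments plus a verdict
        feedback = post("/step", action)
        if feedback.get("done", True):  # assumed flag name; stop if it is missing
            break
    return post("/grader", {})  # assumed: grading needs no request body
```

Passing the transport in as `post` keeps the loop testable without a running server; a stub that returns canned dictionaries can stand in for `post_json`.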

BETA

Free Code Review

Paste any code — get instant AI feedback

AI-powered review. No ground truth scoring. For exploration and demos only.


    Available Tasks

    From simple edge-case bugs to subtle concurrency nightmares.

    EASY

    Basic Bug Detection

    utils/statistics.py

    Review a Python utility module. Find edge-case bugs and performance issues hidden in seemingly functional code — empty-list crashes and O(n²) lookups.

    ZeroDivisionError • IndexError • O(n²) complexity • Empty list guard
    Max 3 steps • 4 bugs hidden • Score: 0.95
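
The task file itself is not reproduced on this page. As an illustration of the two bug classes named above (empty-list crashes and O(n²) lookups), here is a sketch with hypothetical functions `average` and `find_duplicates`; they are not taken from utils/statistics.py.

```python
def average(values):
    """Mean of a list; returns 0.0 for an empty list instead of dividing by zero."""
    if not values:  # the empty-list guard a reviewer would ask for
        return 0.0
    return sum(values) / len(values)


def find_duplicates(items):
    """Return each repeated item once, in the order it first repeats.

    Sets give O(1) average membership checks, so the loop is O(n) overall.
    The O(n²) variant would test `item in some_list` inside the loop.
    """
    seen, reported, duplicates = set(), set(), []
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates
```

Returning 0.0 for empty input is one possible choice; raising a `ValueError` is an equally reasonable fix, and a review comment would normally say which one the caller expects.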
    MEDIUM

    Security Vulnerability Review

    auth/user_manager.py

    Analyze an authentication module riddled with security holes — SQL injection, hardcoded secrets, broken crypto, and dangerous deserialization.

    SQL Injection • Hardcoded Secrets • MD5 Password Hash • pickle.loads RCE • Broken Auth
    Max 5 steps • 8 vulnerabilities • Score: 0.90
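
To make the SQL injection class concrete, the sketch below contrasts string-built and parameterized queries using Python's built-in `sqlite3`. The table, the data, and the function names are hypothetical and are not taken from auth/user_manager.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")
conn.execute("INSERT INTO users VALUES ('bob', 'user')")


def find_user_unsafe(name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the injected OR clause matches every row
print(find_user_safe(payload))         # []: no user has that literal name
```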
    HARD

    Concurrency & Architecture Bug Hunt

    core/rate_limiter.py

    Hunt subtle race conditions, silent exception swallowing, mutating-while-iterating crashes, and architectural flaws in a distributed rate limiter.

    Race Conditions • Thread Safety • Silent Exceptions • Dict Mutation • deque vs list
    Max 8 steps • 8 subtle bugs • Score: 0.29
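
As an illustration of the race-condition and "deque vs list" themes, here is a minimal single-process sliding-window limiter whose check-then-append sequence is guarded by a lock. The class `SlidingWindowLimiter` is hypothetical; it is not the code in core/rate_limiter.py and does not attempt the distributed case.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (illustrative only)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = deque()           # O(1) pops from the left, unlike list.pop(0)
        self._lock = threading.Lock()  # guards the whole check-then-append sequence

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            # Drop timestamps that have left the window.
            while self._hits and now - self._hits[0] >= self.window:
                self._hits.popleft()
            if len(self._hits) >= self.limit:
                return False
            self._hits.append(now)
            return True
```

Without the lock, two threads can both see `len(self._hits) < self.limit` and both append, letting more than `limit` calls through.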
    EASY-MEDIUM

    JavaScript Async Flow Review

    api/fetcher.js

    Review a JavaScript module handling data fetching and state updates. Find async/await pitfalls and race conditions.

    Missing Await • Promise Errors • Race Condition • Caching Logic
    Max 4 steps • 3 issues hidden • Score: 0.92
    MEDIUM

    Advanced SQL Injection Hunt

    db/reports.js

    Review a Node.js database service. Identify multiple sophisticated SQL injection patterns in dynamic queries.

    Order By Injection • Limit Injection • Template Literals • Parameterization
    Max 5 steps • 4 vulnerabilities • Score: 0.88
    MEDIUM

    React Component Security

    components/UserProfile.jsx

    Review a React component for XSS risks (dangerouslySetInnerHTML) and sensitive data leaks in console/URLs.

    XSS • Data Leaks • Sanitization • React Hooks
    Max 4 steps • 3 issues hidden • Score: 0.90
    HARD

    Django Auth Logic Review

    auth/middleware.py

    Review Django middleware and auth backends for bypasses, timing attacks, and improper exception handling.

    Auth Bypass • Timing Attack • Middleware Logic • Security Logic
    Max 7 steps • 3 critical issues • Score: 0.85
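
The timing-attack class can be shown in a few lines. The sketch below uses the standard library's `hmac.compare_digest`; the function names are hypothetical and the snippet is not taken from auth/middleware.py.

```python
import hmac


def tokens_match_naive(supplied: str, expected: str) -> bool:
    # `==` may return as soon as the first differing character is found,
    # so response time can leak how much of the token was correct.
    return supplied == expected


def tokens_match(supplied: str, expected: str) -> bool:
    # compare_digest avoids content-based short-circuiting, so its running
    # time does not depend on where the two inputs differ.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```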
    HARD

    Node.js Concurrency Issues

    services/inventory.js

    Identify race conditions and stale state updates in a Node.js singleton service responsible for inventory tracking.

    Race Condition • Stale State • Atomicity • Singleton Pattern
    Max 6 steps • 2 critical loops • Score: 0.82

    API Reference

    All endpoints available at http://localhost:7860

    MEDIUM

    JavaScript Async Bug Review

    api/async_handler.js

    Review an async JS handler with unhandled promise rejections, missing await, callback hell, memory leak from uncleaned event listeners, and swallowed errors.

    Missing Await • Memory Leak • Promise.all • Callback Hell
    Max 5 steps • 5 bugs hidden
    HARD

    API Security Review

    api/endpoints.py

    Review a FastAPI endpoints file with JWT algorithm bypass, wildcard CORS, missing rate limiting, exposed stack traces, IDOR, and debug mode enabled in production.

    JWT Bypass • CORS Wildcard • IDOR • Stack Trace Leak
    Max 6 steps • 6 vulnerabilities
    MEDIUM

    Database ORM Bug Review

    models/database.py

    Review SQLAlchemy ORM models for N+1 queries, missing foreign key indexes, uncommitted transactions, mutable default arguments, and missing cascade rules.

    N+1 Query • Missing Index • Mutable Default • No Commit
    Max 5 steps • 5 bugs hidden
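
Of the bug classes listed for this task, the mutable default argument is the easiest to show in isolation. The functions below are hypothetical plain-Python examples, not code from models/database.py.

```python
def add_tag_buggy(tag, tags=[]):
    # Bug: the default list is created once, at function definition time,
    # and is shared by every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Fix: use None as the sentinel and build a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


add_tag_buggy("a")
print(add_tag_buggy("b"))  # ['a', 'b']: state leaked from the first call
add_tag("a")
print(add_tag("b"))        # ['b']
```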
    HARD

    JWT Auth System Review

    auth/jwt_handler.py

    Review a JWT auth system for weak secrets, missing token expiry, disabled signature verification, session fixation, missing token revocation, and timing attacks.

    Weak Secret • No Expiry • Session Fixation • Timing Attack
    Max 7 steps • 7 vulnerabilities
    MEDIUM

    Data Pipeline Bug Review

    pipeline/processor.py

    Review a batch data processing pipeline for memory leaks, missing None checks, silent int8 overflow, unclosed file handles, bare except clauses, and off-by-one slicing.

    Memory Leak • int8 Overflow • File Handle Leak • Bare Except
    Max 6 steps • 6 bugs hidden