The FICO Score for AI-Generated Code
Calibrated reliability scoring for every AI code commit
The Problem
AI coding assistants now write 30-50% of code in many organizations. But there's no standardized way to measure whether AI-generated code actually works in production.
- 49pt — score spread by language
- 15-35% — demo-to-production gap
- 0 — industry standards exist
Research confirms: capability gains don't improve reliability (Princeton, 2026). Better benchmarks don't mean better code (Sonar, 2025). AI code reliability is a separate axis requiring separate measurement.
What SHIP Does
SHIP provides a calibrated 0-100 reliability score for any AI-generated code commit, based on real CI/CD outcomes — not synthetic benchmarks.
Multi-Signal Scoring
Language base rates, task type multipliers, and repo-specific outcome data, calibrated against real-world CI pass/fail results.
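As a hedged illustration only — the base rate, task multiplier, and shrinkage constant below are made-up assumptions, not SHIP's published model — combining these signals could look like:

```python
# Illustrative sketch of multi-signal scoring. All numbers are assumptions;
# SHIP's actual weights and calibration are not reproduced here.

def ship_score(base_rate: float, task_multiplier: float,
               repo_passes: int, repo_total: int) -> float:
    """Blend a language base rate (adjusted for task type) with
    repo-specific CI outcomes, weighting repo data by sample size."""
    prior = min(base_rate * task_multiplier, 1.0)
    if repo_total == 0:
        blended = prior  # no repo history yet: fall back to the prior
    else:
        repo_rate = repo_passes / repo_total
        # Shrink toward the prior at small n (constant 20 is arbitrary here)
        weight = repo_total / (repo_total + 20)
        blended = weight * repo_rate + (1 - weight) * prior
    return round(100 * blended, 1)

print(ship_score(base_rate=0.72, task_multiplier=0.9,
                 repo_passes=45, repo_total=50))
```

With no repo history the score collapses to the language/task prior; as CI outcomes accumulate, the repo's own pass rate dominates.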
Statistical Calibration
ECE, MCE, and Brier scores measure how well our predictions match reality. Wilson score confidence intervals quantify uncertainty at small sample sizes.
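These are standard, well-defined statistics. A minimal self-contained sketch of the Brier score, ECE, and Wilson interval computations (the bin count and z-value are conventional defaults, not SHIP-specific settings):

```python
import math

def brier(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def ece(probs, outcomes, bins=10):
    """Expected Calibration Error: sample-weighted gap between mean
    confidence and accuracy within equal-width probability bins."""
    total, n = 0.0, len(probs)
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, p in enumerate(probs)
               if lo <= p < hi or (b == bins - 1 and p == 1.0)]
        if not idx:
            continue
        conf = sum(probs[i] for i in idx) / len(idx)
        acc = sum(outcomes[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(conf - acc)
    return total

def wilson_interval(successes, total, z=1.96):
    """Wilson score 95% CI for a pass rate; stays sensible at small n."""
    if total == 0:
        return (0.0, 1.0)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total
                           + z * z / (4 * total * total)) / denom
    return (centre - margin, centre + margin)

lo, hi = wilson_interval(9, 10)
print(f"9/10 passes -> 95% CI [{lo:.2f}, {hi:.2f}]")  # [0.60, 0.98]
```

Note how wide the Wilson interval stays at n = 10: a 90% pass rate on ten commits is still consistent with a true rate anywhere from roughly 0.60 to 0.98, which is exactly the uncertainty the score needs to surface.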
Continuous Learning
Every scored commit feeds back into the system. More data = better scores = more trust. The data flywheel creates a compounding advantage.
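One way to picture the flywheel — an illustrative sketch, not SHIP's published update rule — is a per-segment Beta posterior over the CI pass rate that each scored commit's outcome tightens:

```python
class SegmentRate:
    """Beta-distributed estimate of a segment's CI pass rate.
    Purely illustrative; the priors and update rule are assumptions."""
    def __init__(self, prior_passes: float = 2.0, prior_fails: float = 2.0):
        self.alpha = prior_passes   # pseudo-count of CI passes
        self.beta = prior_fails     # pseudo-count of CI failures

    def observe(self, ci_passed: bool) -> None:
        """Each scored commit's CI outcome feeds back into the estimate."""
        if ci_passed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

seg = SegmentRate()
for outcome in [True, True, True, False, True]:
    seg.observe(outcome)
print(f"estimated pass rate: {seg.mean:.2f}")  # posterior mean 6/9 ≈ 0.67
```

More observed outcomes shrink the posterior's variance, which is the compounding effect: segments with more scored commits get tighter, more trustworthy estimates.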
Integration Points
- REST API — Score any commit via POST /v2/score
- MCP Server — IDE integration for Claude Code, Cursor (npm i @vibeatlas/ship-mcp-server)
- Git Pre-Commit Hook — Score every commit automatically (npx ship-pre-commit install)
- GitHub Action — CI/CD pipeline integration
- CLI Tool — Command-line scoring (npx ship-score)
- Badge — README embedding (GET /v2/badge?repo=owner/repo)
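A REST API request could be assembled as below. The payload field names and the host are illustrative assumptions, since the full POST /v2/score schema is not reproduced on this page:

```python
import json

# Assumed request-body fields for POST /v2/score; check the API
# reference for the documented schema before relying on these names.
payload = {
    "repo": "owner/repo",
    "commit_sha": "abc123",
    "language": "python",
    "task_type": "feature",
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the `requests` package, an API host,
# and a token -- all placeholders here):
# requests.post("https://<api-host>/v2/score", data=body,
#               headers={"Authorization": "Bearer <token>",
#                        "Content-Type": "application/json"})
```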
EU AI Act Compliance
The EU AI Act (Regulation 2024/1689) requires transparency for AI-generated content (Article 50, effective August 2, 2026) and quality management for high-risk AI systems (Article 17).
SHIP provides the infrastructure for compliance:
- AI commit detection and labeling (Article 50)
- Quality scoring and outcome tracking (Article 17)
- Compliance reporting via GET /v2/compliance/{owner}/{repo}
Scientific Foundation
SHIP's methodology is grounded in 6 peer-reviewed papers:
- Agent Reliability Science — Princeton, 2026 (capability != reliability)
- Code Calibration & Correctness — UC Davis, 2024 (graduated decision-making)
- Multicalibration for Code LLMs — RheinMain, 2025 (subgroup calibration)
- OPENIA Correctness Assessment — VNU Hanoi, 2025 (internal representations)
- CLEAR Enterprise Framework — AAAI 2026 (multidimensional scoring)
- AI Code Quality & Security — Sonar, 2025 (Pass@1 != quality)
Live API Endpoints
POST /v2/score — Full reliability scoring
GET /v2/score/quick — Lightweight score lookup
GET /v2/patterns — Language reliability patterns
GET /v2/calibration — ECE/MCE/Brier metrics
GET /v2/report/{owner}/{repo} — Repo quality report
GET /v2/compliance/{owner}/{repo} — EU AI Act compliance
GET /v2/org/{org} — Organization-wide scoring
GET /v2/detection/{owner}/{repo} — AI tool detection stats
GET /v2/trends — Reliability time series
GET /v2/badge — README badge (SVG)
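For instance, embedding the SVG badge in a README could look like the following; the host is a placeholder and any query parameters beyond repo are assumptions:

```markdown
![SHIP score](https://<api-host>/v2/badge?repo=owner/repo)
```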
About Us
SHIP Protocol is built by VibeAtlas.
Selected as Hello Tomorrow Deep Tech Pioneer 2026.
- Jun 10-12 — Hello Tomorrow Global Summit, Amsterdam
- Jun 17-20 — VivaTech, Paris
- Aug 2 — EU AI Act Article 50 takes effect