The FICO Score for AI-Generated Code
Calibrated reliability scoring for every AI code commit
The Problem
AI coding assistants now write 30-50% of code in many organizations. But there's no standardized way to measure whether AI-generated code actually works in production.
- 49pt — score spread by language
- 15-35% — demo-to-production gap
- 0 — industry standards exist
Research confirms: capability gains don't improve reliability (Princeton, 2026). Better benchmarks don't mean better code (Sonar, 2025). AI code reliability is a separate axis requiring separate measurement.
What SHIP Does
SHIP provides a calibrated 0-100 reliability score for any AI-generated code commit, based on real CI/CD outcomes — not synthetic benchmarks.
Multi-Signal Scoring
Language base rates, task type multipliers, and repo-specific outcome data, calibrated against real-world CI pass/fail results.
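As a hedged illustration only — the base rate, task multiplier, and shrinkage constant below are made-up assumptions, not SHIP's published model — combining these signals could look like:

```python
# Illustrative sketch of multi-signal scoring. All numbers are assumptions;
# SHIP's actual weights and calibration are not reproduced here.

def ship_score(base_rate: float, task_multiplier: float,
               repo_passes: int, repo_total: int) -> float:
    """Blend a language base rate (adjusted for task type) with
    repo-specific CI outcomes, weighting repo data by sample size."""
    prior = min(base_rate * task_multiplier, 1.0)
    if repo_total == 0:
        blended = prior  # no repo history yet: fall back to the prior
    else:
        repo_rate = repo_passes / repo_total
        # Shrink toward the prior at small n (constant 20 is arbitrary here)
        weight = repo_total / (repo_total + 20)
        blended = weight * repo_rate + (1 - weight) * prior
    return round(100 * blended, 1)

print(ship_score(base_rate=0.72, task_multiplier=0.9,
                 repo_passes=45, repo_total=50))
```

With no repo history the score collapses to the language/task prior; as CI outcomes accumulate, the repo's own pass rate dominates.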
Statistical Calibration
ECE, MCE, and Brier scores measure how well our predictions match reality. Wilson score confidence intervals quantify uncertainty at small sample sizes.
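These are standard, well-defined statistics. A minimal self-contained sketch of the Brier score, ECE, and Wilson interval computations (the bin count and z-value are conventional defaults, not SHIP-specific settings):

```python
import math

def brier(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def ece(probs, outcomes, bins=10):
    """Expected Calibration Error: sample-weighted gap between mean
    confidence and accuracy within equal-width probability bins."""
    total, n = 0.0, len(probs)
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, p in enumerate(probs)
               if lo <= p < hi or (b == bins - 1 and p == 1.0)]
        if not idx:
            continue
        conf = sum(probs[i] for i in idx) / len(idx)
        acc = sum(outcomes[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(conf - acc)
    return total

def wilson_interval(successes, total, z=1.96):
    """Wilson score 95% CI for a pass rate; stays sensible at small n."""
    if total == 0:
        return (0.0, 1.0)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total
                           + z * z / (4 * total * total)) / denom
    return (centre - margin, centre + margin)

lo, hi = wilson_interval(9, 10)
print(f"9/10 passes -> 95% CI [{lo:.2f}, {hi:.2f}]")  # [0.60, 0.98]
```

Note how wide the Wilson interval stays at n = 10: a 90% pass rate on ten commits is still consistent with a true rate anywhere from roughly 0.60 to 0.98, which is exactly the uncertainty the score needs to surface.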
Continuous Learning
Every scored commit feeds back into the system. More data = better scores = more trust. The data flywheel creates a compounding advantage.
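One way to picture the flywheel — an illustrative sketch, not SHIP's published update rule — is a per-segment Beta posterior over the CI pass rate that each scored commit's outcome tightens:

```python
class SegmentRate:
    """Beta-distributed estimate of a segment's CI pass rate.
    Purely illustrative; the priors and update rule are assumptions."""
    def __init__(self, prior_passes: float = 2.0, prior_fails: float = 2.0):
        self.alpha = prior_passes   # pseudo-count of CI passes
        self.beta = prior_fails     # pseudo-count of CI failures

    def observe(self, ci_passed: bool) -> None:
        """Each scored commit's CI outcome feeds back into the estimate."""
        if ci_passed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

seg = SegmentRate()
for outcome in [True, True, True, False, True]:
    seg.observe(outcome)
print(f"estimated pass rate: {seg.mean:.2f}")  # posterior mean 6/9 ≈ 0.67
```

More observed outcomes shrink the posterior's variance, which is the compounding effect: segments with more scored commits get tighter, more trustworthy estimates.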
Integration Points
- REST API — Score any commit via POST /v2/score
- MCP Server — IDE integration for Claude Code, Cursor (npm i @vibeatlas/ship-mcp-server)
- Git Pre-Commit Hook — Score every commit automatically (npx ship-pre-commit install)
- GitHub Action — CI/CD pipeline integration
- CLI Tool — Command-line scoring (npx ship-score)
- Badge — README embedding (GET /v2/badge?repo=owner/repo)
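A REST API request could be assembled as below. The payload field names and the host are illustrative assumptions, since the full POST /v2/score schema is not reproduced on this page:

```python
import json

# Assumed request-body fields for POST /v2/score; check the API
# reference for the documented schema before relying on these names.
payload = {
    "repo": "owner/repo",
    "commit_sha": "abc123",
    "language": "python",
    "task_type": "feature",
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the `requests` package, an API host,
# and a token -- all placeholders here):
# requests.post("https://<api-host>/v2/score", data=body,
#               headers={"Authorization": "Bearer <token>",
#                        "Content-Type": "application/json"})
```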
EU AI Act Compliance
The EU AI Act (Regulation 2024/1689) requires transparency for AI-generated content (Article 50, effective August 2, 2026) and quality management for high-risk AI systems (Article 17).
SHIP provides the infrastructure for compliance:
- AI commit detection and labeling (Article 50)
- Quality scoring and outcome tracking (Article 17)
- Compliance reporting via GET /v2/compliance/{owner}/{repo}
Scientific Foundation
SHIP's methodology is grounded in 6 peer-reviewed papers:
- Agent Reliability Science — Princeton, 2026 (capability != reliability)
- Code Calibration & Correctness — UC Davis, 2024 (graduated decision-making)
- Multicalibration for Code LLMs — RheinMain, 2025 (subgroup calibration)
- OPENIA Correctness Assessment — VNU Hanoi, 2025 (internal representations)
- CLEAR Enterprise Framework — AAAI 2026 (multidimensional scoring)
- AI Code Quality & Security — Sonar, 2025 (Pass@1 != quality)
Live API Endpoints
POST /v2/score — Full reliability scoring
GET /v2/score/quick — Lightweight score lookup
GET /v2/patterns — Language reliability patterns
GET /v2/calibration — ECE/MCE/Brier metrics
GET /v2/report/{owner}/{repo} — Repo quality report
GET /v2/compliance/{owner}/{repo} — EU AI Act compliance
GET /v2/org/{org} — Organization-wide scoring
GET /v2/detection/{owner}/{repo} — AI tool detection stats
GET /v2/trends — Reliability time series
GET /v2/badge — README badge (SVG)
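For instance, embedding the SVG badge in a README could look like the following; the host is a placeholder and any query parameters beyond repo are assumptions:

```markdown
![SHIP score](https://<api-host>/v2/badge?repo=owner/repo)
```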
About Us
SHIP Protocol is built by VibeAtlas.
Selected as Hello Tomorrow Deep Tech Pioneer 2026.
- Jun 10-12 — Hello Tomorrow Global Summit, Amsterdam
- Jun 17-20 — VivaTech, Paris
- Aug 2 — EU AI Act Article 50 takes effect