Raw success rates are pulled toward a 50% prior using sample-size-weighted shrinkage: the fewer observations a tool has, the closer its score stays to 50%. This prevents tools with very few observations from showing extreme scores. A minimum of 3 unique repositories is required for a confident rating.
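The shrinkage step can be illustrated with a minimal sketch. The exact formula is not stated here, so this assumes a common pseudo-count form in which the prior counts as `prior_weight` imaginary observations; `prior_weight`, `shrink_success_rate`, and `is_confident` are hypothetical names and values chosen for illustration only.

```python
MIN_REPOS = 3  # from the text: minimum unique repositories for a confident rating


def shrink_success_rate(successes: int, total: int,
                        prior: float = 0.5, prior_weight: float = 10.0) -> float:
    """Pull a raw success rate toward `prior`.

    Assumption: the prior acts like `prior_weight` pseudo-observations.
    The value 10.0 is illustrative, not taken from the source.
    """
    if total < 0 or successes < 0 or successes > total:
        raise ValueError("need 0 <= successes <= total")
    return (successes + prior * prior_weight) / (total + prior_weight)


def is_confident(unique_repos: int) -> bool:
    """A rating counts as confident only with enough distinct repositories."""
    return unique_repos >= MIN_REPOS


if __name__ == "__main__":
    # 2 passes out of 2 is a raw 100%, but shrinks to well below that.
    print(shrink_success_rate(2, 2))
    # 200 passes out of 200 stays close to the raw rate.
    print(shrink_success_rate(200, 200))
```

Under this assumed form, a tool with no observations scores exactly the prior, and the influence of the prior fades as the number of observations grows.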
Each repository contributes equally regardless of commit volume, preventing a single prolific repo from dominating a tool's score. This corrects for sampling bias in the underlying dataset.
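One way to picture equal per-repository weighting is to compute a pass rate inside each repository first and then average those rates. The sketch below assumes outcomes arrive as `(repo, passed)` pairs; that shape and the function name `repo_balanced_rate` are hypothetical, not a description of the actual pipeline.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def repo_balanced_rate(outcomes: Iterable[Tuple[str, bool]]) -> Optional[float]:
    """Average of per-repository pass rates, so each repo counts once.

    Returns None when there are no outcomes at all.
    """
    passes = defaultdict(int)
    totals = defaultdict(int)
    for repo, passed in outcomes:
        totals[repo] += 1
        passes[repo] += int(passed)
    if not totals:
        return None
    per_repo = [passes[repo] / totals[repo] for repo in totals]
    return sum(per_repo) / len(per_repo)


if __name__ == "__main__":
    # Hypothetical data: repo "a" has 100 builds, repo "b" has only 2.
    data = [("a", True)] * 90 + [("a", False)] * 10 + [("b", False)] * 2
    pooled = sum(passed for _, passed in data) / len(data)
    print(pooled)                    # dominated by the prolific repo "a"
    print(repo_balanced_rate(data))  # both repos weigh the same
```

In this made-up data the pooled rate is driven almost entirely by the prolific repository, while the balanced rate treats the two repositories as equal voices.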
Scores are based on actual build pass/fail results, not synthetic benchmarks. Only outcomes with confidence ≥ 70% are included. Data is sourced from open-source repositories with public CI pipelines.
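The confidence cutoff amounts to a simple filter applied before any scoring. The sketch below assumes each outcome carries a `confidence` value between 0 and 1; the `BuildOutcome` record and its fields are hypothetical and only illustrate the threshold, not how confidence itself is derived.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_CONFIDENCE = 0.70  # from the text: only outcomes with confidence >= 70% count


@dataclass(frozen=True)
class BuildOutcome:
    """Hypothetical record for one CI build result."""
    repo: str
    passed: bool
    confidence: float  # assumed to be in the range 0.0 to 1.0


def confident_outcomes(outcomes: Iterable[BuildOutcome]) -> List[BuildOutcome]:
    """Keep only outcomes at or above the confidence threshold."""
    return [o for o in outcomes if o.confidence >= MIN_CONFIDENCE]


if __name__ == "__main__":
    sample = [
        BuildOutcome("a", True, 0.95),
        BuildOutcome("a", False, 0.69),  # dropped: below the cutoff
        BuildOutcome("b", True, 0.70),   # kept: the cutoff is inclusive
    ]
    for outcome in confident_outcomes(sample):
        print(outcome)
```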