About
Baseball Beluga is an independently built and actively maintained MLB statistical repository, developed and operated by Noah Wilson, a software engineer, data scientist, and lifelong baseball fan. The platform is designed as both a research tool and an analytical framework, built on the belief that disciplined, process-driven evaluation does not require proprietary systems or institutional access. It requires clean data and a consistent methodology.
The dataset is self-curated and spans four MLB seasons, with records collected at the individual game level. Every offensive and pitching performance is logged as a discrete entry tied to a specific game, date, team matchup, and season. This game-level granularity underpins the platform's analytical flexibility: it supports full-season aggregation, multi-season longitudinal comparison, rolling-window trend analysis, and separate treatment of regular-season and postseason performance.
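Because each entry is a single player-game, trend metrics can be rebuilt from raw counts instead of from precomputed averages. The following Python sketch computes a rolling-window batting average for one player and keeps regular-season and postseason games separate. It is illustrative only: the `BattingGameLog` record, its field names, and the `rolling_batting_average` function are hypothetical and are not taken from the platform's actual schema or code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BattingGameLog:
    # Hypothetical schema: one row per player per game.
    player_id: str
    game_date: date
    season: int
    is_postseason: bool
    hits: int
    at_bats: int


def rolling_batting_average(
    logs: List[BattingGameLog], window: int, postseason: bool = False
) -> List[Tuple[date, Optional[float]]]:
    """Return (game_date, rolling AVG) pairs over the last `window` games.

    Assumes `logs` holds a single player's games. AVG is recomputed from
    summed hits and at-bats, so games with more at-bats carry more weight
    and 0-for-0 games (undefined per-game AVG) need no special handling.
    """
    # Keep only the requested game type, then order chronologically.
    games = sorted(
        (g for g in logs if g.is_postseason == postseason),
        key=lambda g: g.game_date,
    )
    results = []
    for i in range(window - 1, len(games)):
        chunk = games[i - window + 1 : i + 1]
        hits = sum(g.hits for g in chunk)
        at_bats = sum(g.at_bats for g in chunk)
        # A window with no at-bats has no defined average.
        avg = hits / at_bats if at_bats else None
        results.append((games[i].game_date, avg))
    return results
```

Summing counts before dividing mirrors how a season batting average is calculated, so a rolling value over the full set of games matches the full-season figure.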
On the offensive side, the dataset captures hits, at-bats, runs, RBIs, walks, strikeouts, left on base, batting average, and OPS for each player appearance. Pitching records include innings pitched, hits allowed, runs, earned runs, walks, strikeouts, home runs, pitch count, and ERA per outing. Player position usage is tracked per game, and roster assignments are recorded with start and end dates to accurately reflect transactions and team changes across seasons.
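Recording roster assignments as dated stints lets any game be attributed to the team a player belonged to on that date, even after a trade. A minimal Python sketch of that lookup is shown below, assuming a hypothetical `RosterAssignment` record in which an open-ended stint has no end date and both start and end dates are inclusive. Those conventions are assumptions for illustration, not a description of the platform's actual tables.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class RosterAssignment:
    # Hypothetical record: one player's stint with one team.
    player_id: str
    team: str
    start_date: date
    end_date: Optional[date]  # None means the stint is still active


def team_on_date(
    assignments: Iterable[RosterAssignment], player_id: str, on: date
) -> Optional[str]:
    """Return the team a player was assigned to on a given date, or None."""
    for a in assignments:
        if a.player_id != player_id:
            continue
        # Assumed convention: inclusive start and end; open-ended stints
        # have no upper bound.
        if a.start_date <= on and (a.end_date is None or on <= a.end_date):
            return a.team
    return None
```

Returning `None` for dates outside every stint makes gaps (for example, a stretch before a player's first assignment) explicit rather than silently attributing them to a team.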
Player rankings are generated by two internally defined composite scoring models, one for batters and one for pitchers. Each model min-max normalizes its input metrics across the active player pool, applies weights, and combines the results into a single comparable score. The batting model weights OPS, RBI production, and run scoring most heavily, while the pitching model leads with ERA suppression, strikeout rate, and limiting hits and home runs allowed. Both models enforce a minimum game threshold so that scores rest on meaningful sample sizes rather than small-window noise. All metrics are traditional and verifiable; the models use no black-box inputs or third-party scoring systems.
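The scoring approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's model: the weights in `HYPOTHETICAL_PITCHER_WEIGHTS`, the metric names, the dictionary layout of player rows, and the choice to sum weighted values are all assumptions. Metrics where a lower value is better (such as ERA) are inverted after normalization so that a higher score is always better.

```python
from typing import Dict, Iterable, List

# Hypothetical weights for illustration only; the real weights are not published here.
HYPOTHETICAL_PITCHER_WEIGHTS = {"era": 0.35, "k_per_9": 0.30, "hits_per_9": 0.20, "hr_per_9": 0.15}
LOWER_IS_BETTER = {"era", "hits_per_9", "hr_per_9"}


def min_max(values: List[float], lower_is_better: bool = False) -> List[float]:
    """Scale values to [0, 1]; invert when a smaller raw value is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread in this metric, so it cannot separate players.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def composite_scores(
    players: List[Dict], weights: Dict[str, float], min_games: int,
    lower_is_better: Iterable[str] = (),
) -> Dict[str, float]:
    """Weighted sum of min-max normalized metrics for eligible players."""
    lower = set(lower_is_better)
    # Apply the minimum game threshold before normalizing, so small samples
    # do not stretch the min/max range.
    eligible = [p for p in players if p["games"] >= min_games]
    if not eligible:
        return {}
    scores = {p["name"]: 0.0 for p in eligible}
    for metric, weight in weights.items():
        normalized = min_max([p[metric] for p in eligible], metric in lower)
        for p, n in zip(eligible, normalized):
            scores[p["name"]] += weight * n
    return scores
```

Filtering by games played before normalizing is a deliberate choice in this sketch: a pitcher with a tiny sample and an extreme ERA would otherwise define the endpoints of the scale for everyone else.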
The platform is Noah's vehicle for applying an empirical, evidence-based evaluation philosophy to publicly available MLB data. Every element of the project, from data collection and cleaning to modeling and interpretation, is original work, built and maintained independently.