FinSignal Forecast
A full-stack financial NLP and forecasting platform that tests whether signals extracted from financial text improve return forecasts.
Tech stack
- Rust
- Python
- FinBERT
- XGBoost
- FastAPI
- React
- TypeScript
- PyTorch
- PostgreSQL
- MLflow
- Docker
- GitHub Actions
- Prometheus
- Grafana
- ONNX
- NVIDIA Triton
- Alpaca Paper Trading
- Rocket
- Tokio
- Parquet
- Pandas
Overview
FinSignal Forecast is a financial signal extraction and forecasting platform. It ingests market data, SEC filings, and financial news; turns financial language into model-ready sentiment features; compares price-based models against price-plus-text models; and displays forecasts, risk metrics, paper trading signals, and system health in an interactive dashboard.
Problem & solution
- Built timestamp-safe data pipelines for OHLCV prices, SEC metadata, financial text records, sentiment outputs, and model-ready feature rows
- Used FinBERT to extract sentiment, uncertainty, and financial tone signals from filings, earnings-related text, and market news
- Trained and compared naive baselines, price-based XGBoost models, and price-plus-text forecasting models with walk-forward validation
- Added leakage controls so that a text record joins a forecast only when its publication timestamp precedes the prediction cutoff
- Integrated Alpaca paper trading to test model-generated signals with position limits, order logs, risk controls, and simulated portfolio tracking
- Built a React research console showing signal lift, equity curve, drawdown, feature importance, sentiment trends, latest forecasts, confidence scores, paper orders, and API health
- Added MLflow tracking, Docker workflows, GitHub Actions CI, Prometheus metrics, Grafana dashboards, Rust gateway routes, and an ONNX plus Triton serving path
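The leakage control described above (a text record joins a forecast only after its publication timestamp) can be sketched as a point-in-time join. This is a minimal illustration using `pandas.merge_asof`, not the project's actual pipeline code. The column names `ticker`, `prediction_cutoff`, `published_at`, and `sentiment`, the helper `join_text_features`, and the sample rows are all assumptions made for the example.

```python
import pandas as pd


def join_text_features(forecast_rows: pd.DataFrame, text_records: pd.DataFrame) -> pd.DataFrame:
    """Attach to each forecast row the latest text record published strictly
    before that row's prediction cutoff (hypothetical column names)."""
    # merge_asof requires both frames to be sorted by their time key.
    left = forecast_rows.sort_values("prediction_cutoff")
    right = text_records.sort_values("published_at")
    return pd.merge_asof(
        left,
        right,
        left_on="prediction_cutoff",
        right_on="published_at",
        by="ticker",                # never match text across tickers
        direction="backward",       # only look at records from the past
        allow_exact_matches=False,  # assumption: same-instant records are excluded
    )


# Hypothetical sample data: one article published after the first cutoff.
forecast_rows = pd.DataFrame({
    "ticker": ["AAA", "AAA"],
    "prediction_cutoff": pd.to_datetime(["2024-01-02 16:00", "2024-01-03 16:00"]),
})
text_records = pd.DataFrame({
    "ticker": ["AAA"],
    "published_at": pd.to_datetime(["2024-01-02 18:00"]),
    "sentiment": [0.8],
})

joined = join_text_features(forecast_rows, text_records)

# Simple lineage check: every matched record predates its cutoff.
matched = joined.dropna(subset=["published_at"])
assert (matched["published_at"] < matched["prediction_cutoff"]).all()
print(joined)
```

In this sample, the article published at 18:00 on 2 January is invisible to the 16:00 forecast made that day and first appears in the following day's feature row.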
What I learned
- How to design a financial ML system around evidence instead of assuming the AI model wins
- How to prevent look-ahead bias with publication timestamps, prediction cutoffs, and feature lineage checks
- How to compare price-only models against text-enhanced models using Sharpe ratio, Sortino ratio, maximum drawdown, hit rate, turnover, and transaction-cost-adjusted returns
- How to connect data engineering, NLP, model training, paper trading, API serving, monitoring, and dashboard design into one production-style platform
- How to make advanced AI features defensible in an interview by separating real results, research backtests, paper trading evidence, and scaffolded infrastructure
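The evaluation metrics named in the list above can be computed from a series of per-period returns and positions. The sketch below uses common textbook conventions and is not taken from the project's code. The function name `evaluate_strategy`, the parameter `cost_per_unit_turnover`, the zero risk-free rate, the zero Sortino target, and the sample numbers are all assumptions.

```python
import numpy as np


def evaluate_strategy(asset_returns, positions, cost_per_unit_turnover=0.0, periods_per_year=252):
    """Summarise a strategy from per-period asset returns and the position
    held over each period (hypothetical helper, risk-free rate assumed 0)."""
    r = np.asarray(asset_returns, dtype=float)
    p = np.asarray(positions, dtype=float)

    gross = p * r
    # Turnover: absolute position change, counting the first entry from flat.
    turnover = np.abs(np.diff(p, prepend=0.0))
    net = gross - cost_per_unit_turnover * turnover

    # Equity curve starts at 1.0 so an initial loss counts as drawdown.
    equity = np.concatenate(([1.0], np.cumprod(1.0 + net)))
    peak = np.maximum.accumulate(equity)
    max_drawdown = float(np.max(1.0 - equity / peak))

    ann = np.sqrt(periods_per_year)
    std = net.std(ddof=1)
    # Downside deviation relative to a target return of 0.
    downside = np.sqrt(np.mean(np.minimum(net, 0.0) ** 2))
    active = p != 0  # hit rate is measured only over periods with a position

    return {
        "sharpe": float(ann * net.mean() / std) if std > 0 else float("nan"),
        "sortino": float(ann * net.mean() / downside) if downside > 0 else float("nan"),
        "max_drawdown": max_drawdown,
        "hit_rate": float(np.mean(gross[active] > 0)) if active.any() else float("nan"),
        "turnover": float(turnover.mean()),
    }


# Hypothetical four-period example: long, long, flat, short.
returns = [0.01, -0.02, 0.03, -0.01]
positions = [1, 1, 0, -1]

no_cost = evaluate_strategy(returns, positions)
with_cost = evaluate_strategy(returns, positions, cost_per_unit_turnover=0.001)
print(no_cost)
print(with_cost)
```

Running the same function on a price-only model's positions and a text-enhanced model's positions, with identical cost assumptions, gives a like-for-like comparison of the two.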