add-evaluation-tool

Tech Stack

Frontend

Backend

API & Contract

Database & Storage

Containerization & Orchestration

Testing & Quality

Testing

We standardized on pytest with the pytest-cov plugin for backend unit testing. Pytest already underpins our existing tests, offers first-class fixtures and monkeypatching for FastAPI services, and runs cleanly under Poetry. The pytest-cov plugin adds coverage reporting and lets us gate per-module thresholds without bolting on another runner, so a single, fast command validates both correctness and coverage.
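As a minimal sketch of the fixture and monkeypatching style described above, the tests below use pytest's built-in `monkeypatch` fixture to control an environment variable. The setting name `PLANTUML_TIMEOUT_SECONDS`, its default, and the function `parse_timeout_seconds` are hypothetical and exist only for illustration; they are not taken from our codebase.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 3.0  # hypothetical default, for illustration only


def parse_timeout_seconds() -> float:
    """Read a hypothetical PLANTUML_TIMEOUT_SECONDS setting from the environment."""
    return float(os.environ.get("PLANTUML_TIMEOUT_SECONDS", DEFAULT_TIMEOUT_SECONDS))


def test_uses_default_when_unset(monkeypatch):
    # monkeypatch restores the environment automatically after the test.
    monkeypatch.delenv("PLANTUML_TIMEOUT_SECONDS", raising=False)
    assert parse_timeout_seconds() == 3.0


def test_reads_override(monkeypatch):
    monkeypatch.setenv("PLANTUML_TIMEOUT_SECONDS", "1.5")
    assert parse_timeout_seconds() == 1.5
```

With pytest-cov installed, a command along the lines of `poetry run pytest --cov=app --cov-fail-under=80` would run the tests and fail the run when total coverage drops below the threshold. The package name `app` and the value `80` are placeholders, not our actual settings.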

Static analysis

Documentation & Diagramming

Security & Networking

Analytics

Instrumentation Tools

Why These Tools

We chose the OpenTelemetry SDK for analytics because it lets us track business-specific events (such as evaluation completions) that generic HTTP metrics do not capture. The SDK’s counter and histogram instruments let us measure user workflows end-to-end, which is essential for understanding whether users are finding value in the tool. Grafana Cloud provides the storage and querying infrastructure we need without requiring us to manage our own Prometheus instance, and Grafana’s dashboards make it easy for stakeholders to understand business metrics at a glance.
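A business event such as an evaluation completion could be recorded roughly as follows. This is a sketch under stated assumptions: the meter name `evaluation-tool`, the metric name `evaluations.completed`, and the `outcome` attribute are all hypothetical, and the exporter configuration that ships data to Grafana Cloud is not shown.

```python
from typing import Mapping, Optional, Protocol


class CounterLike(Protocol):
    """The one method of an OpenTelemetry Counter that this sketch relies on."""

    def add(self, amount: int, attributes: Optional[Mapping[str, str]] = None) -> None: ...


def make_evaluation_counter() -> CounterLike:
    """Create the counter through the OpenTelemetry metrics API."""
    # Imported here so the recording helper below can be unit-tested
    # without the opentelemetry packages installed.
    from opentelemetry import metrics

    meter = metrics.get_meter("evaluation-tool")  # hypothetical meter name
    return meter.create_counter(
        "evaluations.completed",  # hypothetical metric name
        unit="1",
        description="Number of evaluations that ran to completion",
    )


def record_evaluation_completed(counter: CounterLike, outcome: str) -> None:
    """Count one finished evaluation, tagged with a hypothetical 'outcome' attribute."""
    counter.add(1, {"outcome": outcome})
```

Passing the counter in as an argument keeps the recording helper easy to exercise under pytest with a simple fake, while production code would pass the result of `make_evaluation_counter()`.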

Observability

Instrumentation Tools

Why These Tools

We chose Grafana Beyla as our primary instrumentation tool because it provides automatic, zero-code observability for all HTTP traffic. This means we get comprehensive metrics and traces without modifying application code, reducing the risk of missing instrumentation and ensuring consistent coverage. The eBPF-based approach has minimal performance overhead and works regardless of the application framework.

The OpenTelemetry SDK complements Beyla by letting us instrument business logic that doesn’t involve HTTP requests (such as PlantUML parsing) and add custom attributes that provide business context. The SDK’s histogram instruments are essential for tracking parsing performance against our SLO (p95 < 3 seconds).
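To make the SLO arithmetic concrete, the sketch below checks a list of parse durations against the p95 < 3 seconds target. It computes the percentile directly from raw samples with the standard library, which is a stand-in for illustration: in the real pipeline a histogram instrument aggregates observations into buckets and the percentile would be estimated in the metrics backend. The function names are hypothetical.

```python
import statistics
from typing import Sequence

SLO_P95_SECONDS = 3.0  # the p95 target stated in the SLO


def p95(durations: Sequence[float]) -> float:
    """Return the 95th percentile of the samples (needs at least two)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(durations, n=100, method="inclusive")[94]


def meets_slo(durations: Sequence[float], threshold: float = SLO_P95_SECONDS) -> bool:
    """True when the p95 duration is strictly below the threshold."""
    return p95(durations) < threshold
```

For example, if 95 of 100 parses take 0.5 seconds and 5 take 4 seconds, the p95 stays well under the target, whereas 10 slow parses out of 100 push the p95 to 4 seconds and breach the SLO.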

Grafana Cloud was chosen over self-hosted solutions because it eliminates operational complexity while providing enterprise-grade features like long retention, high availability, and automatic scaling. The unified platform means we don’t need to manage separate systems for metrics, traces, and logs.

Grafana provides the visualization layer that makes our observability data actionable. Its PromQL support allows us to create complex queries for SLO monitoring, and its alerting system integrates with our notification channels (PagerDuty, Slack, Email) to ensure we respond quickly to issues.

CI/CD (Planned)