Jumpstart Incident Response with Grafana Assistant: A Pre-Built Infrastructure Knowledge Base Guide

Overview

When an unexpected alert fires, every second counts. The traditional approach—asking an AI assistant for help, then manually sharing context about your data sources, services, and metrics—wastes precious time. Grafana Assistant, the agentic observability assistant in Grafana Cloud, eliminates this friction by learning your infrastructure before you ask a single question. It automatically builds and maintains a persistent knowledge base of your environment, so when you need answers, it already knows what’s running, how services connect, and where to look. This guide walks you through enabling, configuring, and leveraging Grafana Assistant to slash incident response times and reduce context-sharing overhead.

Prerequisites

Before you begin, ensure you have the following:

A Grafana Cloud account (Stack, Pro, or Advanced plan that includes Grafana Assistant).
At least one configured data source in your Grafana Cloud stack: Prometheus, Loki, or Tempo (Assistant works best with all three).
Admin or Editor permissions in your Grafana organization to enable and manage Assistant.
Basic familiarity with Grafana dashboards and data source configuration is helpful but not required.

Step-by-Step Instructions

1. Enable Grafana Assistant

Navigate to your Grafana Cloud stack’s main interface. In the left-hand navigation menu, click Assist (or Assistant if that’s the label in your version). If this is your first time, you’ll see a welcome screen with an “Enable Assistant” button. Click it. No further configuration is needed; Assistant begins working in the background immediately.

2. Connect Your Data Sources

Assistant automatically discovers and scans all connected Prometheus, Loki, and Tempo data sources in your Grafana Cloud stack. To maximize its knowledge base:

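Go to Configuration > Data Sources (Connections > Data sources in newer Grafana versions) and confirm that every relevant Prometheus, Loki, and Tempo instance is listed. Also check which source is marked “Default”, so that it is not standing in for an instance you meant to add separately.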
If you use multiple Prometheus sources (e.g., one for metrics, one for alerts), keep them connected—Assistant scans them all in parallel.
For Loki data, enable structured log parsing if your logs are in JSON format (e.g., by using a json stage in your Promtail pipeline).
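
To double-check the list above without clicking through the UI, a small script can summarize which data source types are present. The sketch below assumes you call Grafana's HTTP API endpoint GET /api/datasources with a service account token; the base URL, the token, and the sample data source names are placeholders, and the helper names are made up for illustration. It only inspects the list of data sources and says nothing about what Assistant has actually scanned.

```python
import json
import urllib.request
from collections import defaultdict

# Data source types the guide recommends connecting.
REQUIRED_TYPES = ("prometheus", "loki", "tempo")


def fetch_datasources(base_url: str, token: str) -> list[dict]:
    """Fetch the data source list via GET /api/datasources.

    base_url and token are placeholders you must supply yourself.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/datasources",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_datasources(datasources: list[dict]) -> dict:
    """Group data source names by type and list required types that are absent."""
    by_type = defaultdict(list)
    for ds in datasources:
        by_type[ds.get("type", "unknown")].append(ds.get("name", "<unnamed>"))
    missing = [t for t in REQUIRED_TYPES if t not in by_type]
    return {"by_type": dict(by_type), "missing": missing}


# Sample payload shaped like the API response (names are invented).
sample = [
    {"name": "prom-metrics", "type": "prometheus", "isDefault": True},
    {"name": "prom-alerts", "type": "prometheus", "isDefault": False},
    {"name": "logs", "type": "loki", "isDefault": False},
]
summary = summarize_datasources(sample)
print(summary["missing"])  # ['tempo']
```

In a real stack you would pass the result of fetch_datasources(...) to summarize_datasources instead of the sample list. Any type reported as missing is one Assistant cannot learn from.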

3. Let the Background Scan Run

Once data sources are connected, Assistant’s swarm of AI agents initiates a scanning process:

Data source discovery: Identifies all configured Prometheus, Loki, and Tempo sources.
Metrics scans: Queries each Prometheus source to find services, deployments, and infrastructure components. This may take a few minutes depending on your data volume.
Enrichments via logs and traces: Correlates Loki log streams and Tempo trace structures with corresponding metrics, mapping log formats, trace IDs, and service dependencies.
Structured knowledge generation: For each discovered service or service group, Assistant produces a short reference document covering what the service does, its key metrics and labels, how it’s deployed (e.g., Kubernetes, EC2), its upstream/downstream dependencies, and its log/trace sources.
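
To make the shape of such a per-service entry concrete, here is a purely illustrative data structure. It is not Grafana Assistant's internal schema, which this guide does not describe; the class name, field names, and example values are all assumptions chosen to mirror the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceKnowledge:
    """Illustrative record only; not Grafana Assistant's real schema."""

    name: str
    description: str                      # what the service does
    key_metrics: list[str] = field(default_factory=list)
    labels: dict[str, str] = field(default_factory=dict)
    deployment: str = "unknown"           # e.g. "kubernetes" or "ec2"
    upstream: list[str] = field(default_factory=list)
    downstream: list[str] = field(default_factory=list)
    log_sources: list[str] = field(default_factory=list)
    trace_sources: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """One-line description of the service and its downstream dependencies."""
        deps = ", ".join(self.downstream) or "none"
        return f"{self.name} ({self.deployment}): depends on {deps}"


# Example values are invented for illustration.
checkout = ServiceKnowledge(
    name="checkout",
    description="Handles order placement",
    key_metrics=["checkout_request_duration_seconds"],
    labels={"namespace": "shop"},
    deployment="kubernetes",
    downstream=["payment", "inventory", "shipping"],
    log_sources=["loki"],
    trace_sources=["tempo"],
)
print(checkout.summary())
# checkout (kubernetes): depends on payment, inventory, shipping
```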

You don’t need to trigger anything—the scan runs automatically in the background. You can check progress by revisiting the Assistant page; a status indicator shows “Building knowledge base…” while scanning is active.

4. Ask Your First Question

Once the scan completes (typically within 5–15 minutes for a medium-sized stack), you can start asking questions. For example:

“Why is my checkout service slow?”
“Show me latency metrics for the payment service.”
“What are the downstream dependencies of the checkout service?”

Assistant uses its pre-loaded knowledge base to instantly understand:

That “checkout service” lives in a specific Prometheus data source.
Its latency metric is checkout_request_duration_seconds.
Its logs are stored as structured JSON in a Loki stream.
It depends on three downstream services (payment, inventory, shipping).

You do not need to share any context—Assistant already has the map. If you need to probe deeper, you can ask follow-up questions like “Show the trace of the last slow checkout request,” and Assistant will correlate logs and traces automatically.
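
For comparison, this is roughly the kind of query you would otherwise write by hand for the latency question. The sketch assumes checkout_request_duration_seconds is a Prometheus histogram, so that a matching _bucket series with an le label exists; the guide does not state the metric type, and the helper name is made up. It builds a PromQL string and does not show what Assistant itself generates.

```python
def latency_quantile_query(metric: str, quantile: float = 0.95, window: str = "5m") -> str:
    """Build a PromQL quantile query for a histogram metric.

    Assumes `metric` is a Prometheus histogram exposing `<metric>_bucket`.
    """
    if not 0 < quantile < 1:
        raise ValueError("quantile must be strictly between 0 and 1")
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le) (rate({metric}_bucket[{window}])))"
    )


query = latency_quantile_query("checkout_request_duration_seconds")
print(query)
# histogram_quantile(0.95, sum by (le) (rate(checkout_request_duration_seconds_bucket[5m])))
```

Knowing which metric name, data source, and label set to plug into a query like this is exactly the context the pre-built knowledge base is meant to supply.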

5. Verify and Expand the Knowledge Base

To check what Assistant has learned, ask “What do you know about my infrastructure?” or “List all services you discovered.” Assistant will output a summary. If you add new data sources or services later, they will be scanned during the next periodic refresh (by default every 4 hours). You can also trigger an immediate rescan from the Assistant settings page.

Common Mistakes

Mistake 1: Not Connecting All Relevant Data Sources

Assistant can only learn from data sources it can see. If you have a separate Prometheus for a critical microservice but it’s not added to your Grafana instance, that service will be invisible. Fix: Double-check your data sources list. At minimum, connect one Prometheus, one Loki, and one Tempo instance to get the full benefit of enrichment.

Mistake 2: Expecting Instant Results

Scans take time—especially for large environments. If you ask a question immediately after enabling Assistant, it may respond with “I haven’t finished learning your infrastructure yet.” Fix: Wait for the initial scan to complete (check the status indicator). Patience pays off: after the first scan, subsequent refreshes are faster.

Mistake 3: Overlooking Structured Log Formats

Assistant enriches its knowledge base using log formats. If your logs are plaintext or unstructured, it can still work but will have less detail about log fields. Fix: If possible, emit JSON from your applications or add JSON formatting in your log pipeline (e.g., using Logstash or Fluentd; see the Loki documentation). For existing JSON logs, you can add a json stage to your Promtail pipeline configuration, or parse fields at query time with LogQL’s json parser.
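
If you control the application, the simplest route to structured logs is to emit JSON at the source. The following is a minimal sketch using only Python's standard logging module; the field names (service, trace_id) and the logger name are assumptions for illustration, not a format Assistant requires. A production formatter would normally add a timestamp as well.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy selected fields passed via `extra=` (hypothetical field names).
        for key in ("service", "trace_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


# Write to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the record out of the root logger's handlers

logger.info("order placed", extra={"service": "checkout", "trace_id": "abc123"})
line = stream.getvalue().strip()
print(line)
```

Each emitted line is a flat JSON object, which log shippers and query-time parsers can split into fields without custom regular expressions.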

Mistake 4: Not Utilizing the Pre-Built Knowledge for Cross-Team Collaboration

Some teams assume Assistant is only for SREs. But it’s especially valuable for developers who don’t know the full infrastructure. Fix: Encourage all team members to ask questions about dependencies and metrics, even if they’re unfamiliar with the system. Assistant serves as living documentation that everyone can query.

Summary

Grafana Assistant transforms incident response by proactively learning your infrastructure, eliminating the need for repetitive context sharing. With zero configuration, it automatically discovers data sources, scans metrics, correlates logs and traces, and builds a structured knowledge base. By following the steps above—enable Assistant, connect data sources, let the scan run, and then ask questions—you can shave valuable minutes off every troubleshooting session. This feature is especially powerful for distributed teams where not everyone has the full picture. Start using Grafana Assistant today to turn “every conversation from scratch” into “straight into troubleshooting.”
