DevOps & Cloud · Featured

RAG Assistant — Microservices & Cloud Deployment

Intelligent RAG assistant split across 3 independent microservices, orchestrated with Kubernetes, secured with RBAC and HashiCorp Vault, and monitored via the Prometheus/Loki/Grafana stack.

2026
Completed (January 2026)
1 member

Technologies Used

Docker, Kubernetes, GitLab CI, Prometheus, Loki, Grafana, HashiCorp Vault, RBAC, Python, RAG

Development of an intelligent Retrieval-Augmented Generation (RAG) assistant split into 3 microservices that scale independently. It is deployed to a cloud environment with strict security policies and full observability.

🎯 Project Overview

This academic project explores modern cloud-native architecture patterns: microservices decomposition, container orchestration, infrastructure-as-code, and production-grade security. The RAG assistant answers user queries by retrieving relevant context from a knowledge base before generating responses.

🏗️ Microservices Architecture

The application is divided into 3 independent services, each deployable and scalable separately:

  1. Retrieval Service — Vector similarity search over the knowledge base
  2. Generation Service — Language model interface for response generation with retrieved context
  3. API Gateway — Single entry point, routing, authentication, and rate limiting
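To make the retrieve-then-generate flow concrete, here is a minimal sketch of what the Retrieval Service's core step could look like: rank stored passages by cosine similarity to the query embedding, then assemble the context-augmented prompt passed to the Generation Service. The function names, the toy 3-dimensional vectors, and the prompt layout are all illustrative assumptions, not the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, k=2):
    """Return the k passages whose embeddings are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the context-augmented prompt handed to the generation step."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy index; a real deployment would use an embedding model.
INDEX = [
    ([1.0, 0.0, 0.0], "Pods are the smallest deployable units in Kubernetes."),
    ([0.0, 1.0, 0.0], "Vault issues short-lived credentials."),
    ([0.9, 0.1, 0.0], "A Deployment manages a set of replica Pods."),
]

passages = retrieve([1.0, 0.0, 0.0], INDEX, k=2)
prompt = build_prompt("What is a Pod?", passages)
```

Keeping retrieval and prompt assembly as separate functions mirrors the service split: the first belongs behind the Retrieval Service, the second in front of the language model.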

☁️ Containerization & Orchestration

  • Docker — Each microservice containerized with optimized multi-stage builds
  • Kubernetes (K8s) — Cluster orchestration with independent HPA per service
  • Network Policies — Strict inter-service communication rules (least privilege)
  • RBAC — Role-Based Access Control for Kubernetes resources

🔐 Security

  • HashiCorp Vault — Dynamic secrets management; no hardcoded credentials
  • RBAC — Fine-grained access control on Kubernetes namespaces and resources
  • Network Policies — Pod-to-pod communication restricted to declared flows
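As an illustration of the "no hardcoded credentials" point, the sketch below shows how a Python service might read a secret that a Vault Agent sidecar has rendered to a file in the pod. The `/vault/secrets` directory is the injector's usual default, but the `SECRETS_DIR` variable, the `read_secret` helper, and the secret name are assumptions made for this example.

```python
import os
from pathlib import Path

# Assumed mount point; override with the hypothetical SECRETS_DIR variable.
SECRETS_DIR = Path(os.environ.get("SECRETS_DIR", "/vault/secrets"))


def read_secret(name, secrets_dir=SECRETS_DIR):
    """Read one rendered secret file and return it without surrounding whitespace."""
    path = Path(secrets_dir) / name
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError as exc:
        # Fail loudly instead of falling back to a default credential.
        raise RuntimeError(f"secret {name!r} not rendered at {path}") from exc
```

A service would call something like `read_secret("db-password")` at startup. Reading from a file rather than an environment variable lets the sidecar rotate the value without the credential ever appearing in the manifest or the image.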

🚀 CI/CD Pipeline (GitLab CI)

  • Docker image build and push on commit
  • Kubernetes manifest linting
  • Automated deployment to staging/production namespaces
  • Rollback triggers on health check failure
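The rollback trigger can be pictured as a small decision function run by a pipeline job after deployment. The sketch below is one possible shape, not the project's actual job: `probe` is a hypothetical callable that returns a truthy value when the service's health endpoint answers correctly, and the thresholds are arbitrary.

```python
import time


def should_roll_back(probe, attempts=5, max_failures=2, delay_s=0.0, sleep=time.sleep):
    """Call `probe` up to `attempts` times; return True once failures exceed `max_failures`."""
    failures = 0
    for attempt in range(attempts):
        try:
            healthy = bool(probe())
        except Exception:
            healthy = False  # a probe that raises counts as a failed check
        if not healthy:
            failures += 1
            if failures > max_failures:
                return True
        if attempt < attempts - 1:
            sleep(delay_s)  # wait between checks, but not after the last one
    return False
```

When the function returns `True`, the job could revert the release, for example with `kubectl rollout undo` on the affected Deployment. Tolerating a few failed checks avoids rolling back on a single slow start.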

📊 Observability Stack (PLG)

| Tool | Role |
|------|------|
| Prometheus | Metrics scraping (latency, throughput, error rate) |
| Loki | Centralized log aggregation from all pods |
| Grafana | Unified dashboards for metrics and logs |

Custom dashboards track AI generation latency P50/P95/P99 and retrieval hit rates.
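Assuming generation latency is exported as a Prometheus histogram (the source does not say which metric type is used), percentiles such as P95 are estimated from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The following is a simplified, standard-library sketch of that idea; the bucket bounds and counts are made-up sample data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with a (math.inf, total_count) bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(bound) or in_bucket == 0:
                return prev_bound  # nothing to interpolate over
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical latency buckets in seconds: 50 requests <= 0.5 s, 90 <= 1 s, ...
BUCKETS = [(0.5, 50), (1.0, 90), (2.0, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, BUCKETS)
p95 = histogram_quantile(0.95, BUCKETS)
p99 = histogram_quantile(0.99, BUCKETS)
```

Because the estimate interpolates within a bucket, its precision depends on how finely the bucket bounds cover the latency range of interest.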

Challenges

  • Decomposing a monolithic RAG system into independently deployable microservices
  • Managing secrets securely in a Kubernetes cluster without hardcoded credentials
  • Ensuring strict network isolation between services with Kubernetes Network Policies
  • Building observability for AI-specific metrics (generation latency, retrieval accuracy)

Solutions

  • Defined clear service boundaries as separate K8s Deployments with independent HPA
  • Integrated HashiCorp Vault for dynamic secret injection via Vault Agent sidecar
  • Implemented NetworkPolicy manifests allowing only declared pod-to-pod communication paths
  • Created custom Prometheus metrics in the generation service for P50/P95/P99 latency tracking
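To illustrate what "only declared pod-to-pod communication paths" means in practice, the sketch below builds a `networking.k8s.io/v1` NetworkPolicy as a Python dictionary and serializes it to JSON, which Kubernetes accepts alongside YAML. The policy name, namespace, labels, and port are hypothetical; the source does not list the project's actual flows.

```python
import json


def allow_ingress_policy(name, namespace, target_labels, source_labels, port):
    """Build a NetworkPolicy admitting TCP traffic to target pods only from source pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods with these labels, in the same namespace, may connect.
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical flow: the API gateway is the only caller of the retrieval service.
policy = allow_ingress_policy(
    name="retrieval-from-gateway",
    namespace="rag",
    target_labels={"app": "retrieval"},
    source_labels={"app": "api-gateway"},
    port=8000,
)
manifest = json.dumps(policy, indent=2)
```

Once a pod is selected by a policy with the `Ingress` type, any inbound traffic not matched by an `ingress` rule is dropped, so each declared flow needs its own rule.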

Outcomes

  • Fully operational RAG assistant deployed on Kubernetes with independent service scaling
  • Zero hardcoded secrets — all credentials dynamically injected via HashiCorp Vault
  • Complete PLG observability stack with AI latency dashboards
  • Automated CI/CD pipeline with GitLab CI covering build, manifest linting, deployment, and rollback
  • Production-grade Kubernetes security: RBAC, NetworkPolicies, namespaced isolation